Jan 21, 2019 - Relevance Feedback on Your Modern Web Stack

Relevance feedback is a well-established line of information retrieval research, with many open-source implementations such as Indri, Anserini and Terrier. Together with my PhD supervisor, Claudia Hauff, I wrote a paper about bringing Indri to the modern Web stack; it was recently published as a demo at ECIR 2019:

@inproceedings{moraes2019nodeindri,
  title={node-indri: moving the Indri toolkit to the modern Web stack},
  author={Moraes, Felipe and Hauff, Claudia},
  booktitle={Proceedings of the 41st European Conference on Information Retrieval},
  year={2019}
}
node-indri is a Node.js module that acts as a wrapper around the Indri toolkit, and thus makes an established IR toolkit accessible to the modern web stack.

node-indri was designed to expose many of Indri’s functionalities and to provide direct access to document content and retrieval scores for web development (in contrast to, for instance, the Pyndri wrapper).

This setup reduces the amount of glue code that has to be developed and maintained when researching search interfaces, which today tend to be developed with specific JavaScript libraries such as React.js, Angular.js or Vue.js.

After developing node-indri, we immediately incorporated it into SearchX, the open-source collaborative search system that I have been developing over the last almost two years of my PhD. You can find blog posts about it here on Claudia’s webpage.

SearchX’s backend supports multiple IR backends, such as Elasticsearch and the Bing API. In order to include node-indri as one of them, we implemented a Searcher class that provides search results (with or without snippets) in a paginated manner, optionally leveraging feedback documents.

This class exposes the functionality of Indri’s QueryEnvironment and RMExpander classes through the search method, which returns search results in a paginated manner. When a Searcher object is instantiated, it takes a configuration object as an argument. When search() is called without feedback documents, the standard query likelihood model is employed; otherwise, the relevance feedback expander RM3 is used. Depending on the configuration settings, the returned result list may contain document snippets (as provided by Indri’s SnippetBuilder), document scores, document text and other metadata. An example of this can be found in the image below (notice the bold terms in the snippets, showing expanded terms for the X query).
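The dispatch between query likelihood and RM3 described above can be sketched as follows. This is an illustrative mock, not node-indri’s actual implementation; the configuration fields and method signature are assumptions for the sake of the example:

```javascript
// Illustrative sketch of the Searcher dispatch logic; NOT node-indri's
// actual code. Field names and the search() signature are assumptions.
class Searcher {
  constructor(config) {
    // e.g. { index: "/path/to/index", pageSize: 10, snippets: true }
    this.config = config;
  }

  // Returns one page of results. With no feedback documents the standard
  // query likelihood model applies; otherwise RM3 expansion is used.
  search(query, page, feedbackDocs) {
    const model =
      feedbackDocs && feedbackDocs.length > 0 ? "rm3" : "query-likelihood";
    return {
      query: query,
      page: page,
      pageSize: this.config.pageSize,
      model: model,
      results: [], // filled by the underlying retrieval engine
    };
  }
}

const searcher = new Searcher({ index: "/path/to/index", pageSize: 10 });
console.log(searcher.search("jaguar", 1, []).model);        // "query-likelihood"
console.log(searcher.search("jaguar", 1, ["doc42"]).model); // "rm3"
```

The real Searcher of course delegates to Indri rather than returning placeholders; the point here is only the control flow that selects the retrieval model.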


We also implemented two further classes: Reader, which enables rendering a document’s content when a user clicks on it, and Scorer, which gives our backend direct access to document scores for reranking purposes.
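The intended shapes of these two classes can be sketched like this; the method names and signatures are hypothetical, made up here for illustration rather than taken from node-indri:

```javascript
// Hypothetical shapes for the Reader and Scorer classes described above;
// the method names and signatures are assumptions, not node-indri's API.
class Reader {
  // Return a document's stored content so the frontend can render it
  // after the user clicks on a result.
  read(docId) {
    return { docId: docId, text: "(document content goes here)" };
  }
}

class Scorer {
  // Expose retrieval scores so the backend can rerank results directly.
  score(query, docIds) {
    // Placeholder scores; a real implementation would query the index.
    return docIds.map((id, i) => ({ id: id, score: 1 / (i + 1) }));
  }
}

const exampleScores = new Scorer().score("jaguar", ["d1", "d2"]);
console.log(exampleScores.map((s) => s.score)); // [ 1, 0.5 ]
```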

All of the aforementioned classes were used in our user study (with more than 300 crowd workers recruited), published as an article in the Information Retrieval journal.

In our paper we also presented an efficiency analysis of node-indri, comparing it to Indri and Pyndri. We indexed two standard test corpora—Aquaint and ClueWeb12B—with Indri and measured the execution time for 10k queries of the TREC 2007 Million Query track across the three toolkits. The table below presents the overall query execution time of the three toolkits.

             Aquaint        ClueWeb12B
Indri        29s (0.30s)    1645s (20s)
Pyndri       25s (1.22s)    2262s (340s)
node-indri   25s (0.58s)    2058s (338s)

As the table shows, node-indri can be used in modern web backend development with efficiency comparable to Indri and Pyndri.

The node-indri repository is open-sourced at https://github.com/felipemoraes/node-indri.

Nov 13, 2018 - Search and Recommender Systems Papers with Code

Hey there! Inspired by the Papers with Code effort (https://paperswithcode.com/), I created a small instance of it for search and recommender systems research papers for which I found code available on GitHub: https://felipemoraes.github.io/se-recsys-paperswithcode/.

Another interesting effort I have followed for years now is GitXiv.

In my humble version, I collected a few papers that I encountered during my research and that have code available on GitHub. These papers are mainly from the SIGIR, CIKM, WSDM, EMNLP, ECIR and ICTIR conferences.

A plus is that I finally released the code I wrote during my Master’s here.

I encourage pull requests or issues (with your own or others’ papers with code) to make this small effort huge!

Jul 5, 2017 - Information Flow in Dynamic Information Retrieval

Back in 2016, I decided to take an Information Theory course at UFMG, lectured by Mário S. Alvim, as part of my Master’s coursework. In this course, Mário asks his students to write a report either about a topic in information theory not covered during the semester or on modeling something in their research field using concepts of information theory. I decided to go with the latter; however, I chose a topic of Mário’s expertise, information flow, and wrote a short report on how we could model information flow in dynamic information retrieval.

Months later, Mário and I reviewed the report and together with my Master’s supervisor, Rodrygo Santos, we decided to make this modeling available to the IR community. Thus, we got this study accepted as a short paper at ICTIR 2017:

@inproceedings{moraes2017modeling,
  author = {Felipe Moraes and Mário S. Alvim and Rodrygo L. T. Santos},
  title = {Modeling Information Flow in Dynamic Information Retrieval},
  booktitle = {Proceedings of the 3rd ACM International Conference on the 
    Theory of Information Retrieval},
  year = {2017},
  address = {Amsterdam, The Netherlands}
}

Here’s the abstract of our paper:

User interaction with a dynamic information retrieval (DIR) system can be seen as a cooperative effort towards finding relevant information. In this cooperation, the user provides the system with evidence of his or her information need (e.g., in the form of queries, query reformulations, relevance judgments, or clicks). In turn, the system provides the user with evidence of the available information (e.g., in the form of a set of candidate results). Throughout this conversational process, both user and system may reduce their uncertainty with respect to each other, which may ultimately help in finding the desired information. In this paper, we present an information-theoretic model to quantify the flow of information from users to DIR systems and vice versa. By employing channels with memory and feedback, we decouple the mutual information among the behavior of the user and that of the system into directed components. As a result, we are able to measure: (i) by how much the DIR system is capable of adapting to the user; and (ii) by how much the user is influenced by the results returned by the DIR system. We discuss implications of the proposed framework for the evaluation and optimization of DIR systems.
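The directed decoupling the abstract refers to can be sketched with Massey’s notion of directed information. This is my own shorthand, under the assumption that the paper follows Massey’s formulation; the paper’s exact notation may differ. Let $X^n$ denote the user’s actions and $Y^n$ the system’s responses over $n$ interaction rounds:

```latex
% Directed information from user to system: how much the system's current
% response depends on the user's past and present actions.
I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i ; Y_i \mid Y^{i-1})

% Conservation law (Massey & Massey): mutual information splits into two
% directed components, where Y^{n-1} is the system's delayed past output.
I(X^n ; Y^n) = I(X^n \to Y^n) + I(Y^{n-1} \to X^n)
```

Under this reading, the first directed term corresponds to (i), how much the system adapts to the user, and the second to (ii), how much the user is influenced by the results previously returned by the system.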

I hope that this paper inspires researchers to bring information theory foundations to IR.