Regularizing ad hoc retrieval scores

Fernando Diaz

doi:http://doi.acm.org/10.1145/1099554.1099722

The cluster hypothesis states: closely related documents tend to be relevant to the same request. We exploit this hypothesis directly by adjusting ad hoc retrieval scores from an initial retrieval so that topically related documents receive similar scores. We refer to this process as score regularization. Score regularization can be presented as an optimization problem, allowing the use of results from semi-supervised learning. We demonstrate that regularized scores consistently and significantly rank documents better than unregularized scores, given a variety of initial retrieval algorithms. We evaluate our method on two large corpora across a substantial number of topics.
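The abstract does not state the objective function or how it is solved, so the following is only a minimal sketch of one generic graph-smoothing scheme from semi-supervised learning, and not necessarily the formulation used in the paper. The function name `regularize_scores`, the `affinity` matrix of pairwise document similarities, the trade-off parameter `alpha`, and the fixed iteration count are all assumptions made for illustration. Each document's score is repeatedly pulled toward the weighted average of its neighbors' scores while staying anchored to its initial retrieval score.

```python
def regularize_scores(scores, affinity, alpha=0.5, iterations=50):
    """Smooth initial retrieval scores over a document-similarity graph.

    Hypothetical illustration: iterates f <- (1 - alpha) * y + alpha * W f,
    where y holds the initial scores and W is the row-normalized affinity
    matrix. This is a generic smoothing update, not the paper's exact method.
    """
    n = len(scores)

    # Row-normalize affinities so each document averages over its neighbors.
    weights = []
    for i in range(n):
        row = [affinity[i][j] if i != j else 0.0 for j in range(n)]
        total = sum(row)
        if total > 0:
            weights.append([w / total for w in row])
        else:
            # An isolated document has no neighbors; it keeps its own score.
            weights.append([1.0 if j == i else 0.0 for j in range(n)])

    smoothed = list(scores)
    for _ in range(iterations):
        smoothed = [
            (1 - alpha) * scores[i]
            + alpha * sum(weights[i][j] * smoothed[j] for j in range(n))
            for i in range(n)
        ]
    return smoothed


# Documents 0 and 1 are topically related; document 2 is unrelated to both.
initial = [1.0, 0.0, 0.5]
similarity = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
regularized = regularize_scores(initial, similarity)
```

In this toy example the two related documents end up with closer scores than they started with, while the unrelated document's score is unchanged. A larger `alpha` weights agreement among neighbors more heavily relative to the initial retrieval scores.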

bibtex

@inproceedings{diaz:regularization,
  year = {2005},
  title = {Regularizing ad hoc retrieval scores},
  publisher = {ACM Press},
  pages = {672--679},
  location = {Bremen, Germany},
  isbn = {1-59593-140-6},
  doi = {http://doi.acm.org/10.1145/1099554.1099722},
  booktitle = {CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management},
  author = {Fernando Diaz},
  address = {New York, NY, USA}
}