Vertical Selection in the Presence of Unlabeled Verticals

Jaime Arguello; Fernando Diaz; Jean-François Paiement

doi:10.1145/1835449.1835564

Vertical aggregation is the task of incorporating results from specialized search engines or verticals (e.g., images, video, news) into Web search results. Vertical selection is the subtask of deciding, given a query, which verticals, if any, are relevant. State-of-the-art approaches use machine-learned models to predict which verticals are relevant to a query. When trained using a large set of labeled data, a machine-learned vertical selection model outperforms baselines that require no training data. Unfortunately, whenever a new vertical is introduced, a costly new set of editorial data must be gathered. In this paper, we propose methods for reusing training data from a set of existing (source) verticals to learn a predictive model for a new (target) vertical. We study methods for learning robust, portable, and adaptive cross-vertical models. Experiments show the need to focus on different types of features when maximizing portability (the ability for a single model to make accurate predictions across multiple verticals) than when maximizing adaptability (the ability for a single model to make accurate predictions for a specific vertical). We demonstrate the efficacy of our methods through extensive experimentation for 11 verticals.

bibtex

@inproceedings{arguello:vertical-xfer,
  year = {2010},
  url = {http://doi.acm.org/10.1145/1835449.1835564},
  title = {Vertical selection in the presence of unlabeled verticals},
  series = {SIGIR '10},
  publisher = {ACM},
  pages = {691--698},
  numpages = {8},
  location = {Geneva, Switzerland},
  isbn = {978-1-4503-0153-4},
  doi = {10.1145/1835449.1835564},
  booktitle = {Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval},
  author = {Arguello, Jaime and Diaz, Fernando and Paiement, Jean-Fran\c{c}ois},
  address = {New York, NY, USA},
  acmid = {1835564}
}