Vertical aggregation is the task of incorporating results from specialized search engines, or verticals (e.g., images, video, news), into Web search results. Vertical selection is the subtask of deciding, given a query, which verticals, if any, are relevant. State-of-the-art approaches use machine-learned models to predict which verticals are relevant to a query. When trained using a large set of labeled data, a machine-learned vertical selection model outperforms baselines that require no training data. Unfortunately, whenever a new vertical is introduced, a costly new set of editorial data must be gathered. In this paper, we propose methods for reusing training data from a set of existing (source) verticals to learn a predictive model for a new (target) vertical. We study methods for learning robust, portable, and adaptive cross-vertical models. Experiments show that maximizing portability (the ability of a single model to make accurate predictions across multiple verticals) requires focusing on different types of features than maximizing adaptability (the ability of a single model to make accurate predictions for a specific vertical). We demonstrate the efficacy of our methods through extensive experimentation on 11 verticals.
@inproceedings{arguello:vertical-xfer,
year = {2010},
url = {http://doi.acm.org/10.1145/1835449.1835564},
title = {Vertical selection in the presence of unlabeled verticals},
series = {SIGIR '10},
publisher = {ACM},
pages = {691--698},
numpages = {8},
location = {Geneva, Switzerland},
isbn = {978-1-4503-0153-4},
doi = {10.1145/1835449.1835564},
booktitle = {Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval},
author = {Arguello, Jaime and Diaz, Fernando and Paiement, Jean-Fran\c{c}ois},
address = {New York, NY, USA},
acmid = {1835564}
}