<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Carsten Eickhoff</title>
	<atom:link href="http://www.carsten-eickhoff.com/wp/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.carsten-eickhoff.com/wp</link>
	<description></description>
	<lastBuildDate>Tue, 21 May 2013 11:47:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Copulas for Information Retrieval</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=430&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=copulas-for-information-retrieval</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=430#comments</comments>
		<pubDate>Thu, 04 Apr 2013 14:56:44 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Paper Abstract]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=430</guid>
		<description><![CDATA[In many domains of information retrieval, system estimates of document relevance are based on multidimensional quality criteria that have to be accommodated in a unidimensional result ranking. Current solutions to this challenge are often inconsistent with the formal probabilistic framework in which constituent scores were estimated, or use sophisticated learning methods that make it difficult [...]]]></description>
			<content:encoded><![CDATA[<p>In many domains of information retrieval, system estimates of document relevance are based on multidimensional quality criteria that have to be accommodated in a unidimensional result ranking.  Current solutions to this challenge are often inconsistent with the formal probabilistic framework in which constituent scores were estimated, or use sophisticated learning methods that make it difficult for humans to understand the origin of the final ranking. To address these issues, we introduce the use of <em>copulas</em>, a powerful statistical framework for modeling complex multi-dimensional dependencies, to information retrieval tasks.  We provide a formal background to copulas and demonstrate their effectiveness on standard IR tasks such as combining multidimensional relevance estimates and fusion of results from multiple search engines.  We introduce copula-based versions of standard relevance estimators and fusion methods and show that these lead to significant performance improvements on several tasks, as evaluated on large-scale standard corpora, compared to their non-copula counterparts.  We also investigate criteria for understanding the likely effect of using copula models in a given retrieval scenario.</p>
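<p>As a rough illustration of the general idea (not the method or code from the paper), the sketch below combines two per-criterion relevance probabilities of a document with a Clayton copula, one common copula family that models positive dependence between criteria. The choice of the Clayton family, the hand-picked dependence parameter <code>theta</code>, the assumption that both scores are already calibrated to [0, 1], and the helper names <code>clayton_copula</code> and <code>rank_by_copula</code> are all hypothetical.</p>
<pre>
```python
def clayton_copula(u: float, v: float, theta: float) -> float:
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if not theta > 0:
        raise ValueError("the Clayton family needs theta > 0")
    if min(u, v) == 0.0:
        return 0.0  # every copula satisfies C(0, v) = C(u, 0) = 0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def rank_by_copula(scores: dict, theta: float) -> list:
    """Rank documents by the joint score C(u, v) of two per-criterion probabilities.

    scores maps a document id to (u, v), e.g. (P(topical relevance), P(readability)),
    both assumed to be calibrated to the interval [0, 1].
    """
    joint = {doc: clayton_copula(u, v, theta) for doc, (u, v) in scores.items()}
    return sorted(joint, key=joint.get, reverse=True)


if __name__ == "__main__":
    docs = {"d1": (0.9, 0.2), "d2": (0.6, 0.6), "d3": (0.3, 0.95)}
    print(rank_by_copula(docs, theta=2.0))  # ['d2', 'd3', 'd1']
```
</pre>
<p>In the limit <code>theta</code> → 0, the Clayton copula approaches the independence copula C(u, v) = uv, i.e., simply multiplying the two scores; in practice, <code>theta</code> would be estimated from training data rather than fixed by hand.</p>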
<p>This joint work with Arjen P. de Vries and Kevyn Collins-Thompson has been accepted for full oral presentation at the <a href="http://sigir2013.ie/">36th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR)</a> in Dublin, Ireland.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=430</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ECIR 2013, Moscow, Russia</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=405&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ecir-2013-moscow-russia</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=405#comments</comments>
		<pubDate>Thu, 28 Mar 2013 15:53:56 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=405</guid>
		<description><![CDATA[My personal highlights from the accepted paper presentations include: Claudio Carpineto et al. &#8211; Semantic Search Log k-Anonymization with Generalized k-Cores of Query Concept Graph. (shared best paper) For k-anonymization of search engine log files, unique or infrequent queries can be removed in order to prevent individual users from being identifiable. Massive pruning can, however, [...]]]></description>
			<content:encoded><![CDATA[<p>My personal highlights from the accepted paper presentations include:</p>
<ul>
<li><strong>Claudio Carpineto et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_10">Semantic Search Log k-Anonymization with Generalized k-Cores of Query Concept Graph</a>. <strong>(shared best paper)</strong> For k-anonymization of search engine log files, unique or infrequent queries can be removed in order to prevent individual users from being identifiable. Massive pruning can, however, significantly reduce the coverage of log files. The authors propose a clustering method to anonymize log files based on query similarity rather than identity.</li>
<li><strong>Yashar Moshfeghi et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_2">Understanding Relevance: An fMRI Study</a>. <strong>(shared best paper)</strong> Are there neural differences in the processing of relevant and non-relevant documents? How does the human brain react to relevance? The authors present a first investigation of these questions.</li>
<li><strong>Aleksandr Chuklin et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_1">Using Intent Information to Model User Behavior in Diversified Search</a>. <strong>(best student paper)</strong> The authors propose an intent-aware click model to estimate relevance based on the user&#8217;s underlying search intent.</li>
<li><strong>Marc Bron et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_33">Example Based Entity Search in the Web of Data</a>. Using positive examples for entity search, the authors show performance gains when enriching entity queries with knowledge drawn from the context of the provided examples.</li>
<li><strong>Van Dang et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_36">Two-Stage Learning to Rank for Information Retrieval</a>. The authors introduce a multi-stage bootstrapped learning to rank process.</li>
<li><strong>Dongyi Guan et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_40">Increasing Stability of Result Organization for Session Search</a>. For faceted search, the authors employ external resources such as Wikipedia to improve the performance of the underlying result clustering.</li>
<li><strong>Xiaofei Zhu et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_54">Recommending High Utility Query via Session-Flow Graph</a>. Based on random walks in the clickthrough graph, the authors propose a query recommendation scheme that focuses on high-utility queries, i.e., queries estimated to return more useful results for the user.</li>
<li><strong>Maksim Zhukovskii et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_55">URL Redirection Accounting for Improving Link-Based Ranking Methods</a>. The authors show how redirections can significantly distort the web graphs used to compute PageRank and other structural quality indicators.</li>
<li><strong>Nima Asadi et al.</strong> &#8211; <a href="http://link.springer.com/chapter/10.1007/978-3-642-36973-5_13">Training Efficient Tree-Based Models for Document Ranking</a>. The authors investigate building balanced CART trees for LambdaMART learning to rank models. By biasing the training algorithm towards balanced, shallow tree ensembles, they achieve significant efficiency gains at the cost of only minimal losses in ranking effectiveness.</li>
</ul>
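<p>The frequency-based pruning that Carpineto et al. improve upon can be sketched in a few lines. This is a simplified illustration, not their k-core method: it assumes the log is a list of (user, query) pairs, treats a query as publishable once at least <code>k</code> distinct users have issued it, and uses a hypothetical function name. Reporting coverage makes the cost of aggressive pruning visible.</p>
<pre>
```python
from collections import defaultdict


def prune_infrequent_queries(log, k):
    """Frequency-based log anonymization: keep only queries issued by at least k distinct users.

    log is a list of (user_id, query) pairs. Returns the pruned log and its coverage,
    i.e. the fraction of original log entries that survive pruning.
    """
    users_per_query = defaultdict(set)
    for user, query in log:
        users_per_query[query].add(user)
    kept = [(user, query) for user, query in log if len(users_per_query[query]) >= k]
    coverage = len(kept) / len(log) if log else 1.0
    return kept, coverage


if __name__ == "__main__":
    log = [
        ("u1", "weather amsterdam"), ("u2", "weather amsterdam"), ("u3", "weather amsterdam"),
        ("u1", "rare disease symptoms"),
        ("u2", "ecir 2013"), ("u3", "ecir 2013"),
    ]
    pruned, coverage = prune_infrequent_queries(log, k=2)
    print(len(pruned), round(coverage, 3))  # 5 0.833
```
</pre>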
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=405</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DIR 2013 in Delft, The Netherlands</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=398&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=dir-2013-in-delft-the-netherlands</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=398#comments</comments>
		<pubDate>Fri, 11 Jan 2013 16:22:11 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=398</guid>
		<description><![CDATA[On 26 April, the 13th edition of the Dutch-Belgian Information Retrieval Workshop series, DIR 2013, will be hosted at Delft University of Technology in the Netherlands. The workshop serves as a forum for exchange and discussion on relevant challenges in the fields of information retrieval, data mining and natural language processing. DIR invites novel, previously [...]]]></description>
			<content:encoded><![CDATA[<p>On 26 April, the 13th edition of the Dutch-Belgian Information Retrieval Workshop series, DIR 2013, will be hosted at Delft University of Technology in the Netherlands. The workshop serves as a forum for exchange and discussion on relevant challenges in the fields of information retrieval, data mining and natural language processing. DIR invites novel previously unpublished work, compressed presentations of previous major international contributions, as well as demonstrations of applied research and industry applications.<br />
More information  can be found at: <a href="http://www.dir2013.org">http://www.dir2013.org</a>.</p>
<p><a href="http://www.dir2013.org"><img alt="" src="http://www.carsten-eickhoff.com/files/images/dir.jpg" title="DIR 2013 Website" class="aligncenter" width="876" height="520" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=398</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Exploiting User Comments for Audio-visual Content Indexing and Retrieval</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=392&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=exploiting-user-comments-for-audio-visual-content-indexing-and-retrieval</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=392#comments</comments>
		<pubDate>Tue, 04 Dec 2012 08:52:40 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Paper Abstract]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=392</guid>
		<description><![CDATA[State-of-the-art content sharing platforms often require users to assign tags to pieces of media in order to make them easily retrievable. Since this task is sometimes perceived as tedious or boring, annotations can be sparse. Commenting, on the other hand, is a frequently used means of expressing user opinion towards shared media items. We propose [...]]]></description>
			<content:encoded><![CDATA[<p>State-of-the-art content sharing platforms often require users to assign tags to pieces of media in order to make them easily retrievable. Since this task is sometimes perceived as tedious or boring, annotations can be sparse. Commenting, on the other hand, is a frequently used means of expressing user opinion towards shared media items. We propose the use of time series analyses in order to infer potential tags and indexing terms for audio-visual content from user comments. In this way, we mitigate the vocabulary gap between queries and document descriptors. Additionally, we show how large-scale encyclopedias such as Wikipedia can aid the task of tag prediction by serving as surrogates for high-coverage natural language vocabulary lists. Our evaluation is conducted on a corpus of several million real-world user comments from the popular video sharing platform YouTube and demonstrates significant improvements in retrieval performance.</p>
<p>This joint work with Wen Li and Arjen P. de Vries has been accepted for full oral presentation at the <a href="http://ecir2013.org/">35th European Conference on Information Retrieval (ECIR)</a> in Moscow, Russia.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=392</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Designing Human-Readable User Profiles for Search Evaluation</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=390&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=designing-human-readable-user-profiles-for-search-evaluation</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=390#comments</comments>
		<pubDate>Tue, 04 Dec 2012 08:49:33 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Paper Abstract]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=390</guid>
		<description><![CDATA[Forming an accurate mental model of a user is crucial for the qualitative design and evaluation steps of many information-centric applications such as web search, content recommendation, or advertising. This process can often be time-consuming as search and interaction histories become verbose. We present and analyze the usefulness of concise human-readable user profiles in order [...]]]></description>
			<content:encoded><![CDATA[<p>Forming an accurate mental model of a user is crucial for the qualitative design and evaluation steps of many information-centric applications such as web search, content recommendation, or advertising. This process can often be time-consuming as search and interaction histories become verbose. We present and analyze the usefulness of concise human-readable user profiles in order to enhance system tuning and evaluation by means of user studies.</p>
<p>This joint work with Kevyn Collins-Thompson, Paul Bennett and Susan Dumais has been accepted for poster presentation at the <a href="http://ecir2013.org/">35th European Conference on Information Retrieval (ECIR)</a> in Moscow, Russia.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=390</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Personalizing Atypical Web Search Sessions</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=385&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=personalizing-atypical-web-search-sessions</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=385#comments</comments>
		<pubDate>Mon, 12 Nov 2012 12:41:47 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Paper Abstract]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=385</guid>
		<description><![CDATA[State-of-the-art web search personalization treats users as static or slowly evolving entities with a given set of preferences defined by their past behavior. However, recent publications as well as empirical evidence suggest that there is a significant number of search sessions in which users diverge from their regular search profiles in order to satisfy atypical, [...]]]></description>
			<content:encoded><![CDATA[<p>State-of-the-art web search personalization treats users as static or slowly evolving entities with a given set of preferences defined by their past behavior. However, recent publications as well as empirical evidence suggest that there is a significant number of search sessions in which users diverge from their regular search profiles in order to satisfy atypical, non-recurring information needs. In this work, we conduct a large-scale inspection of real-life search sessions to further the understanding of this problem. Subsequently, we design an automatic means of detecting and supporting such atypical sessions. We demonstrate significant improvements over state-of-the-art web search personalization techniques by accounting for the typicality of search sessions. The merit of the proposed method is evaluated based on web-scale search session data spanning several months of user activity.</p>
<p>This joint work with Kevyn Collins-Thompson, Paul Bennett and Susan Dumais has been accepted for full oral presentation at the <a href="http://wsdm2013.org/">ACM International Conference on Web Search and Data Mining (WSDM)</a> in Rome, Italy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=385</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2012, Maui, Hawaii, USA</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=377&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=cikm-2012-maui-hawaii-usa</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=377#comments</comments>
		<pubDate>Mon, 12 Nov 2012 12:31:12 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=377</guid>
		<description><![CDATA[My personal highlights from the accepted paper presentations include: Ben Carterette et al. &#8211; Incorporating Variability in User Behavior into Systems Based Evaluation. The authors propose a log-file-based user modeling scheme for system evaluation, representing variability in user behavior in the form of distributions rather than point estimates. Shahzad Rajput et al. &#8211; Constructing Test Collections [...]]]></description>
			<content:encoded><![CDATA[<p>My personal highlights from the accepted paper presentations include:</p>
<ul>
<li><strong>Ben Carterette et al.</strong> &#8211; <a href="http://www.cikm2012.org/">Incorporating Variability in User Behavior into Systems Based Evaluation</a>. The authors propose a log-file-based user modeling scheme for system evaluation, representing variability in user behavior in the form of distributions rather than point estimates.</li>
<li><strong>Shahzad Rajput et al.</strong> &#8211; <a href="http://www.cikm2012.org/">Constructing Test Collections by Inferring Document Relevance via Extracted Relevant Information</a>. The authors demonstrate how passage-level relevance judgements can be propagated across documents in order to make the judging process more efficient.</li>
<li><strong>Jin Young Kim et al.</strong> &#8211; <a href="http://people.cs.umass.edu/~jykim/papers/cikm_2012_book_search_jykim.pdf">Understanding Book Search Behavior on the Web</a>. Based on six months of log data from the Open Library, the authors give a detailed overview of how, and with which tools, users search for books online. The resulting dataset is openly available to the research community.</li>
<li><strong>Mark Sanderson et al.</strong> &#8211; <a href="http://www.seg.rmit.edu.au/mark/publications/my_papers/cikm12_short_1.1.pdf">Differences in Effectiveness across Sub-Collections</a>. Different test collections have previously been reported to affect retrieval system performance scores. In this work, the authors quantify this effect by creating subsets of well-known collections.</li>
<li><strong>Mark Smucker et al.</strong> &#8211; <a href="http://www.mansci.uwaterloo.ca/~msmucker/publications/smucker-clarke-cikm2012.pdf">Stochastic Simulation of Time-Biased Gain</a>. The authors build on time-biased gain, the effectiveness metric introduced in the SIGIR&#8217;12 best paper, and propose a stochastic simulation to approximate the score numerically.</li>
<li><strong>Theodoros Lappas et al.</strong> &#8211; <a href="http://www.cikm2012.org/">Customizing Search Results for Non-Native Speakers</a>. The authors use the complexity of foreign-language text, as well as its similarity to the reader&#8217;s native language, as quality criteria in document ranking.</li>
</ul>
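<p>Time-biased gain discounts each relevant document by the time the user needs to reach it, and since that time varies between users, simulation is a natural way to estimate the expected score. The sketch below is a deliberately simple Monte Carlo version, not the calibrated model from the paper: the exponential decay with a 224-second half-life, the fixed snippet reading time, the click probabilities, the exponentially distributed reading times, and all function names are illustrative assumptions.</p>
<pre>
```python
import math
import random

# Assumed half-life of the decay in seconds; treat it as a tunable parameter.
HALF_LIFE = 224.0


def decay(t, half_life=HALF_LIFE):
    """Exponential decay D(t) = exp(-t * ln 2 / half_life): value halves every half_life seconds."""
    return math.exp(-t * math.log(2) / half_life)


def time_biased_gain(gains, times, half_life=HALF_LIFE):
    """TBG = sum_k g_k * D(T(k)), where T(k) is the time at which the user reaches rank k."""
    return sum(g * decay(t, half_life) for g, t in zip(gains, times))


def simulate_tbg(relevant, runs=10_000, seed=0, half_life=HALF_LIFE,
                 snippet_secs=5.0, mean_doc_secs=30.0,
                 p_click_rel=0.6, p_click_nonrel=0.2):
    """Monte Carlo estimate of expected TBG under a simple hypothetical user model.

    The user scans snippets top-down, clicks with a relevance-dependent probability,
    gains 1 when opening a relevant document, and spends an exponentially distributed
    time reading each clicked document. All parameter values are illustrative.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        t = 0.0
        for rel in relevant:
            t += snippet_secs                        # time to read the result snippet
            p_click = p_click_rel if rel else p_click_nonrel
            if rng.random() >= p_click:
                continue                             # snippet skipped, no click
            if rel:
                total += decay(t, half_life)         # gain discounted by elapsed time
            t += rng.expovariate(1.0 / mean_doc_secs)  # time spent reading the document
    return total / runs


if __name__ == "__main__":
    print(time_biased_gain([1, 0, 1], [0.0, 224.0, 448.0]))  # approximately 1.25
    print(simulate_tbg([True, False, True]))
```
</pre>
<p>Raising the number of runs reduces the variance of the estimate; swapping in empirically fitted time and click distributions is where the real modeling effort would go.</p>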
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=377</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIGIR 2012, Portland, Oregon, USA</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=355&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=sigir-2012-portland-oregon-usa</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=355#comments</comments>
		<pubDate>Mon, 20 Aug 2012 05:20:46 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=355</guid>
		<description><![CDATA[The 35th ACM SIGIR Conference was held in Portland, Oregon, USA. Every three years, the Gerard Salton Award is presented for long-lasting achievements in the field of information retrieval. This year, Prof. Dr. Norbert Fuhr received the award. My personal highlights from the accepted full paper presentations include: Van Dang and [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.sigir.org/sigir2012/">35th ACM SIGIR Conference</a> was held in Portland, Oregon, USA. Every three years, the Gerard Salton Award is presented for long-lasting achievements in the field of information retrieval. This year, Prof. Dr. Norbert Fuhr received the award.<br />
My personal highlights from the accepted full paper presentations include:</p>
<ul>
<li><strong>Van Dang and Bruce Croft</strong> &#8211; Diversity by Proportionality: An Election-based Approach to Search Result Diversification. The authors propose a voting algorithm that takes into account the proportions of a query&#8217;s underlying aspects.</li>
<li><strong>Ryen White and Eric Horvitz</strong> &#8211; <a href="http://research.microsoft.com/en-us/um/people/horvitz/health_search_persistence_SIGIR_2012.pdf">Studies of the Onset and Persistence of Medical Concerns in Search Logs</a>. The authors investigate medical web search sessions as recorded in search engine log files. According to their analysis, 80% of their users exhibit medical search sessions over a period of several months. Most notably, symptom-driven searches were found to precede searches for concrete conditions by a significant amount of time.</li>
<li><strong>Patrick Pantel et al.</strong> &#8211; <a href="http://www.patrickpantel.com/download/papers/2012/sigir12.pdf">Social Annotations on the Search Results Page: Utility and Prediction Modeling</a>. This work uses social network annotations such as likes, dislikes and expressions of expertise to augment search engine result pages. The evaluation is based on a large-scale simulated social network.</li>
<li><strong>Eugene Agichtein et al.</strong> &#8211; <a href="http://research.microsoft.com/en-us/um/people/ryenw/papers/AgichteinSIGIR2012.pdf">Search, Interrupted: Understanding and Predicting Search Task Continuation</a>. Some search tasks continue across session boundaries and resurface after extended stretches of time. The authors identify properties of continuing search tasks in order to predict whether a given task will end with the current session or resurface at a later point in time.</li>
<li><strong>Brent Hecht et al.</strong> &#8211; <a href="http://raubal.cartography.ch/Publications/RefConferences/fp490-bhecht.pdf">Explanatory Semantic Relatedness and Explicit Spatialization for Exploratory Search</a>. The authors present Atlasify, a system for exploring the relatedness of search queries in both the topical and the spatial domain.</li>
<li><strong>Yu-Heng Lei et al.</strong> &#8211; Where Is Who: Large-Scale Photo Retrieval by Facial Attributes and Canvas Layout. The authors present a sketch-based image retrieval system that matches photos by the number and positions of the people they contain. Face recognition further helps to refine queries by specifying whom to search for.</li>
<li><strong>Mehdi Hosseini et al.</strong> &#8211; <a href="http://www0.cs.ucl.ac.uk/staff/ingemar/Content/papers/2012/SIGIR2012.pdf">An Uncertainty-aware Query Selection Model for Evaluation of IR Systems</a>. The authors propose a query selection framework to identify the most effective subset of queries for the formation of evaluation corpora.</li>
</ul>
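<p>The election analogy behind the first paper can be sketched as greedy seat allocation: each query aspect receives "votes" in proportion to its weight, every selected document fills a "seat", and the Sainte-Laguë quotient decides which aspect is served next. This is a hedged sketch in the spirit of the approach rather than the authors' exact algorithm: the mixing weight <code>lam</code> = 0.5, the seat update, and the function name are assumptions, and the aspect weights and per-aspect document probabilities are taken as given.</p>
<pre>
```python
def proportional_diversify(candidates, aspect_weights, k, lam=0.5):
    """Greedy proportionality-based diversification (Sainte-Laguë style seat allocation).

    candidates: dict mapping doc id -> list of P(d | aspect_i), one value per aspect.
    aspect_weights: list of v_i, the share of the ranking each aspect "deserves".
    Returns a ranked list of at most k document ids.
    """
    seats = [0.0] * len(aspect_weights)   # how much each aspect is already covered
    remaining = list(candidates)
    ranking = []
    for _ in range(min(k, len(candidates))):
        # Sainte-Laguë quotient: aspects with many votes but few seats come first.
        quotients = [v / (2 * s + 1) for v, s in zip(aspect_weights, seats)]
        top = max(range(len(quotients)), key=lambda i: quotients[i])

        def score(doc):
            probs = candidates[doc]
            others = sum(q * p for i, (q, p) in enumerate(zip(quotients, probs)) if i != top)
            return lam * quotients[top] * probs[top] + (1 - lam) * others

        best = max(remaining, key=score)
        remaining.remove(best)
        ranking.append(best)
        # The selected document fills one seat, split over aspects by its coverage.
        total = sum(candidates[best])
        if total > 0:
            seats = [s + p / total for s, p in zip(seats, candidates[best])]
    return ranking


if __name__ == "__main__":
    docs = {"d1": [1.0, 0.0], "d2": [1.0, 0.0], "d3": [0.0, 1.0]}
    print(proportional_diversify(docs, aspect_weights=[0.7, 0.3], k=3))  # ['d1', 'd3', 'd2']
```
</pre>
<p>In the toy example, pure relevance to the dominant aspect would rank d1 and d2 first, whereas the proportional allocation interleaves d3 once the first aspect has received its fair share of seats.</p>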
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=355</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Downside of Markup: Examining the Harmful Effects of CSS and Javascript on Indexing Today&#8217;s Web</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=345&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-downside-of-markup-examining-the-harmful-effects-of-css-and-javascript-on-indexing-todays-web</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=345#comments</comments>
		<pubDate>Mon, 16 Jul 2012 20:59:14 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Paper Abstract]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=345</guid>
		<description><![CDATA[The continued development and maturation of advanced HTML features such as Cascading Style Sheets (CSS), JavaScript, and AJAX, as well as their widespread adoption by browsers, has enabled web pages to flourish with sophistication and interactivity. Unfortunately, this presents challenges to the web search community, as a web page&#8217;s representation in the browser (i.e., what [...]]]></description>
			<content:encoded><![CDATA[<p>The continued development and maturation of advanced HTML features such as <em>Cascading Style Sheets</em> (CSS), JavaScript, and AJAX, as well as their widespread adoption by browsers, has enabled web pages to flourish with sophistication and interactivity. Unfortunately, this presents challenges to the web search community, as a web page&#8217;s representation in the browser (i.e., what users see) can diverge dramatically from its raw HTML content (i.e., what search engines index and retrieve). For example, interactive pages may contain content in regions that are not visible before a user action, such as focusing a tab, but which are nonetheless still contained within the raw HTML. We study this divergence by comparing raw HTML to its fully rendered form across a number of metrics spanning presentation, geometry, and content, using a large, representative sample of popular web pages. We find that a large divergence currently exists, and we show via a historical analysis that this divergence has grown more pronounced over the last decade. Finally, we conduct a retrieval experiment which shows that this divergence is already influencing web retrieval in a negative manner, and that we can improve performance by making use of properties that are only available via pages&#8217; rendered forms. The general finding of our study is that continuing to index the web via simple HTML parsing will diminish the effectiveness of retrieval on the modern web.</p>
<p>This paper has been accepted for publication at <a href="http://www.cikm2012.org/">CIKM&#8217;12</a>, Maui, USA.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=345</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BooksOnline Workshop at CIKM 2012, Maui, Hawaii</title>
		<link>http://www.carsten-eickhoff.com/wp/?p=342&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=booksonline-workshop-at-cikm-2012-maui-hawaii</link>
		<comments>http://www.carsten-eickhoff.com/wp/?p=342#comments</comments>
		<pubDate>Fri, 22 Jun 2012 23:56:49 +0000</pubDate>
		<dc:creator>Carsten</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://www.carsten-eickhoff.com/wp/?p=342</guid>
		<description><![CDATA[Over the past years, Web culture has grown increasingly enticing, centring many services on social media and collaboratively shared content. The vast range of possible uses of such community platforms includes viral marketing, collaborative tagging, recommendation and content creation. BooksOnline&#8217;12 aims to offer a forum for bringing together expertise from academia, industry, [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past years, Web culture has grown increasingly enticing, centring many services on social media and collaboratively shared content. The vast range of possible uses of such community platforms includes viral marketing, collaborative tagging, recommendation and content creation. BooksOnline&#8217;12 aims to offer a forum for bringing together expertise from academia, industry, libraries and archives to facilitate the exchange of research on, and applications of, social media and collaboratively shared content in the field of digital libraries, with a specific focus on online books. In particular, the impact and social use of this technology among younger users, the so-called digital natives, is of great interest to a number of stakeholders, from DL researchers to educators and publishers. The focus of this year’s workshop will thus be on how to create engaging reading experiences that readers would want to share.</p>
<p>BooksOnline&#8217;12 will encourage making full use of the incentives and benefits of these major forms of massive online collaboration for digital libraries.</p>
<p><strong>Workshop Format</strong></p>
<p>The one-day workshop will include selected oral and poster sessions to present and discuss ongoing research efforts, and a break-out session to brainstorm new ideas, research directions, proposals and implementation strategies, finishing with presentations that summarize the results of the break-out sessions.<br />
As in previous years, we plan to host prominent keynote speakers from the area. Previous keynote speakers included Adam Farquhar (The British Library), Ville Miettinnen (Microtask), James Crawford (Google Books), John Ockerbloom (University of Pennsylvania), and Brewster Kahle (Internet Archive).</p>
<p>Find out more at: <a href="http://www.hci.usi.ch/BooksOnline12/">http://www.hci.usi.ch/BooksOnline12/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.carsten-eickhoff.com/wp/?feed=rss2&#038;p=342</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
