Prior work established the benefits of server-recorded user engagement measures (e.g. clickthrough rates) for improving the results of search engines and recommendation systems. Client-side measures of post-click behavior received relatively little attention despite the fact that publishers have now the ability to measure how millions of people interact with their content at a fine resolution using client-side logging. In this study, we examine patterns of user engagement in a large, client-side log dataset of over 7.7 million page views (including both mobile and non-mobile devices) of 66,821 news articles from seven popular news publishers. For each page view we use three summary statistics: dwell time, the furthest position the user reached on the page, and the amount of interaction with the page through any form of input (touch, mouse move, etc.). We show that simple transformations on these summary statistics reveal six prototypical modes of reading that range from scanning to extensive reading and persist across sites. Furthermore, we develop a novel measure of information gain in text to capture the development of ideas within the body of articles and investigate how information gain relates to the engagement with articles. Finally, we show that our new measure of information gain is particularly useful for predicting reading of news articles before publication, and that the measure captures unique information not available otherwise.
[.pdf] copy available on author’s site.