Michael J. Price Lab for Digital Humanities

Mining Goodreads

Literary Reception Studies at Scale

Jim English

John Welsh Centennial Professor of English; Director, Wolf Humanities Center; Faculty Director, Price Lab for Digital Humanities

Lyle Ungar

Professor of Computer and Information Science

Rahul H. Dhakecha

MSc Student, Computer Science

Scott Enderle

Digital Humanities Specialist, Penn Libraries

Funding Period: June 2016 to September 2018

Project Manager: James Pawelski
Director of Education and Senior Scholar, Master of Applied Positive Psychology Program

Previous Research Assistants: Tianli Han (MSc SEAS), Sharvin Shah (MSc SEAS), Daniel Sample, Alex Anderson, Savannah Lambert, Amy Stidham (CAS)

Supported by the Price Lab for Digital Humanities in partnership with the Positive Psychology Center and the World Well-Being Project.

This project studies readers’ habits and experiences of literary consumption via computational analysis of online reader reviews in the massively popular Goodreads social reading site. We are currently working with some three million reviews from the site and the corresponding reader data.  We began with all the reviews of the two main sets of novels in the Contemporary Fiction Database Project, a parallel stream of research supported by Price Lab.  These novels were either top-ten bestsellers for some year since 1960, or they were shortlisted for major fiction prizes in the US, UK, or other English-speaking countries in some year since 1960.  Contrary to the widespread view that cultural consumers are becoming less polarized between high and low culture (the “omnivore thesis”), we found very little overlap between readers of these two sets of books.  And we observed that when readers do read both commercial blockbusters and critically esteemed works, they not only use different vocabularies to describe them (as one would expect), but they shift sharply between contrasting linguistic registers, using a younger and more typically “female” register to discuss their reading of bestsellers and an older, more characteristically “male” register to discuss critically prestigious novels. The sense of a great divide between the popular and the prestigious, a divide that has always been strongly gendered, remains so firmly embedded that it induces a kind of linguistic code-switching as an unconscious effect.

The guiding principle of univorousness appears to extend as well to readers of popular genre fiction.  Comparing reviews of three additional sets of contemporary novels, works of Science Fiction, Detective Fiction, and Chick-Lit, we again find scant overlap among readers.  Ongoing analysis of a random set of 1750 high-volume readers (users who have reviewed at least 150 books on Goodreads) finds that even these most avid of readers look to fiction to provide relatively predictable and self-similar forms of reading pleasure rather than new, unexpected experiences.
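
As a rough illustration of the two measurements described above, the following Python sketch computes the overlap between the readers of two sets of books and draws a random sample of high-volume readers. It is a minimal sketch under stated assumptions, not the project's actual pipeline: the function names, the input shapes, and the use of the Jaccard index as the overlap measure are all hypothetical choices made for this example. Only the thresholds (at least 150 reviews, a sample of 1750 readers) come from the text.

```python
import random
from collections import Counter


def reader_overlap(readers_a, readers_b):
    """Jaccard overlap between two collections of reader IDs.

    Assumption: the project's own overlap measure is not specified;
    Jaccard (shared readers / all readers) is used here for illustration.
    """
    a, b = set(readers_a), set(readers_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def high_volume_sample(review_user_ids, min_reviews=150, sample_size=1750, seed=0):
    """Randomly sample readers who have written at least `min_reviews` reviews.

    `review_user_ids` is a hypothetical input: one user ID per review.
    """
    counts = Counter(review_user_ids)
    # Sort so that the sample is reproducible for a given seed.
    eligible = sorted(user for user, n in counts.items() if n >= min_reviews)
    rng = random.Random(seed)
    return rng.sample(eligible, min(sample_size, len(eligible)))
```

A value of `reader_overlap` near zero for, say, readers of bestsellers and readers of prize-listed novels would correspond to the "scant overlap" reported here.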

The next phase of our work is to analyze large sets of 5-star reviews corresponding to different groups of readers, so as to compare the different scales of value and the different forms or modes of “positive” reading experience corresponding to different genre preferences.  This analysis will be presented at a multi-day workshop on “Literature and Human Flourishing” in September 2018, and published in an Oxford UP volume co-edited by Jim English and Heather Love.

Sample Figure I:  Words that correlate positively with 4 or 5 star reviews of prize-nominated novels and negatively with 4 or 5 star reviews of bestsellers. Larger font = stronger correlation. Only reviewers who have written 4 or 5 star reviews of both kinds of novel are included.  For each included reviewer an equal number of bestseller reviews and prize-novel reviews is used. (Blue/red signals word frequency across all the reviews, with more red = more frequent.)
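
The sampling and correlation steps named in the caption can be sketched in Python roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code used to produce the figure: the review records (dictionaries with `user`, `kind`, `stars`, and `text` keys, where `kind` is either `"prize"` or `"bestseller"`), the whitespace tokenization, and the choice of the phi coefficient as the correlation measure are all hypothetical.

```python
import math
import random
from collections import defaultdict


def balance_reviews(reviews, seed=0):
    """Keep reviewers with 4 or 5 star reviews of both kinds of novel,
    using an equal number of each kind per reviewer."""
    rng = random.Random(seed)
    by_user = defaultdict(lambda: {"prize": [], "bestseller": []})
    for r in reviews:
        if r["stars"] >= 4:
            by_user[r["user"]][r["kind"]].append(r)
    balanced = []
    for groups in by_user.values():
        n = min(len(groups["prize"]), len(groups["bestseller"]))
        if n == 0:
            continue  # reviewer lacks a positive review of one kind
        balanced += rng.sample(groups["prize"], n)
        balanced += rng.sample(groups["bestseller"], n)
    return balanced


def word_correlations(reviews):
    """Phi coefficient between a word's presence in a review and the
    review being of a prize-nominated novel (positive = prize-leaning)."""
    n = len(reviews)
    n_prize = sum(r["kind"] == "prize" for r in reviews)
    total = defaultdict(int)
    in_prize = defaultdict(int)
    for r in reviews:
        for w in set(r["text"].lower().split()):
            total[w] += 1
            if r["kind"] == "prize":
                in_prize[w] += 1
    result = {}
    for w, n_w in total.items():
        a = in_prize[w]    # word present, prize review
        b = n_w - a        # word present, bestseller review
        c = n_prize - a    # word absent, prize review
        d = n - n_w - c    # word absent, bestseller review
        denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
        if denom > 0:
            result[w] = (a * d - b * c) / denom
    return result
```

In a figure like the one described, the words with the largest positive values from `word_correlations(balance_reviews(reviews))` would be the ones drawn in the largest font.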