Mining Goodreads

Literary Reception Studies at Scale

Jim English

John Welsh Centennial Professor of English; Director, Wolf Humanities Center; Faculty Director, Price Lab for Digital Humanities

Lyle Ungar

Professor of Computer and Information Science

Rahul H. Dhakecha

MSc Student, Computer Science

Scott Enderle

Digital Humanities Specialist, Penn Libraries

Project Dates: June 2016 to September 2018

Project Manager: James Pawelski
Director of Education and Senior Scholar, Master of Applied Positive Psychology Program

Previous Research Assistants: Tianli Han (MSc SEAS), Sharvin Shah (MSc SEAS), Daniel Sample, Alex Anderson, Savannah Lambert, Amy Stidham (CAS)

Supported by the Price Lab for Digital Humanities in partnership with the Positive Psychology Center and the Humanities and Human Flourishing Project.

This project studies readers’ habits and experiences of literary consumption through computational analysis of online reader reviews posted on Goodreads, the massively popular social reading site. We are currently working with some three million reviews from the site, along with the corresponding reader data. We began with all the reviews of the two main sets of novels in the Contemporary Fiction Database Project, a parallel stream of research supported by the Price Lab. These novels were either top-ten bestsellers in some year since 1960 or were shortlisted for a major fiction prize in the US, the UK, or another English-speaking country in some year since 1960.

Contrary to the widespread view that cultural consumers are becoming less polarized between high and low culture (the “omnivore thesis”), we found very little overlap between the readers of these two sets of books. And when readers do read both commercial blockbusters and critically esteemed works, they not only use different vocabularies to describe them (as one would expect) but also shift sharply between contrasting linguistic registers: a younger, more typically “female” register for bestsellers and an older, more characteristically “male” register for critically prestigious novels. The sense of a great divide between the popular and the prestigious, a divide that has always been strongly gendered, remains so firmly embedded that it induces a kind of unconscious linguistic code-switching.
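The page does not say how reader overlap between the bestseller set and the prize-shortlist set was measured. As one illustration only, the sketch below uses the Jaccard index (shared readers divided by all readers of either set), an assumed stand-in rather than the project's documented method. The input format, a list of hypothetical `(user_id, book_id)` review pairs, and the toy data are likewise invented for illustration.

```python
from typing import Iterable


def readers_of(reviews: Iterable[tuple[str, str]], books: set[str]) -> set[str]:
    """Return the users who reviewed at least one book in `books`.

    `reviews` is a hypothetical iterable of (user_id, book_id) pairs.
    """
    return {user for user, book in reviews if book in books}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data invented for illustration; not drawn from the project's corpus.
reviews = [
    ("u1", "bestseller_a"), ("u1", "bestseller_b"),
    ("u2", "bestseller_a"),
    ("u3", "prizewinner_x"), ("u3", "bestseller_b"),
    ("u4", "prizewinner_x"), ("u4", "prizewinner_y"),
]
bestsellers = {"bestseller_a", "bestseller_b"}
prize_list = {"prizewinner_x", "prizewinner_y"}

pop = readers_of(reviews, bestsellers)    # u1, u2, u3
prize = readers_of(reviews, prize_list)   # u3, u4
print(f"overlap = {jaccard(pop, prize):.2f}")  # 1 shared reader / 4 readers in total
```

On this toy input only `u3` reads both sets, so the script prints `overlap = 0.25`. A value near zero on real data would correspond to the "very little overlap" described above; other measures (for example, the share of each set's readers who also read the other) would serve equally well.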

The univorousness we observed appears to extend to readers of popular genre fiction as well. Comparing reviews of three additional sets of contemporary novels (Science Fiction, Detective Fiction, and Chick-Lit), we again find scant overlap among their readers. Ongoing analysis of a random sample of 1,670 high-volume readers (users who have reviewed at least 150 books on Goodreads) finds that even these most avid readers look to fiction for relatively predictable, self-similar forms of reading pleasure rather than new, unexpected experiences. Below are some of the interactive visualizations we created to explore the data on these 1,670 readers.
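The selection of high-volume readers can be sketched as a filter followed by a random draw. The two numbers, a 150-book threshold and a sample of 1,670, come from this page; everything else is assumed: the hypothetical `(user_id, book_id)` input format, counting distinct books rather than raw review records, sampling without replacement, and the fixed seed used for reproducibility.

```python
import random
from collections import Counter

MIN_BOOKS = 150     # threshold stated on this page
SAMPLE_SIZE = 1670  # sample size stated on this page


def sample_high_volume_readers(reviews, min_books=MIN_BOOKS, k=SAMPLE_SIZE, seed=0):
    """Draw k users, uniformly at random, from those who reviewed >= min_books books.

    `reviews` is a hypothetical iterable of (user_id, book_id) pairs; duplicate
    pairs are collapsed so each book counts once per user (an assumption).
    """
    books_per_user = Counter(user for user, _ in set(reviews))
    # Sort before sampling so the result depends only on the seed, not on input order.
    eligible = sorted(u for u, n in books_per_user.items() if n >= min_books)
    if len(eligible) < k:
        raise ValueError(f"only {len(eligible)} eligible users, cannot sample {k}")
    return random.Random(seed).sample(eligible, k)


# Toy data: three users who reviewed 149, 150, and 200 distinct books.
toy = [(f"user{n}", f"book{i}") for n in (149, 150, 200) for i in range(n)]
chosen = sample_high_volume_readers(toy, k=2)
print(sorted(chosen))
```

Here only `user150` and `user200` clear the threshold, so the script prints `['user150', 'user200']`; asking for more users than are eligible raises a `ValueError` rather than silently returning a smaller sample.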

Portions of this work were presented at a multi-day workshop on “Literature and Human Flourishing” in September 2018 and will be published in an Oxford University Press volume co-edited by Jim English and Heather Love.

Sample Figure I: Taste profiles of 1,670 randomly selected, highly active Goodreads users. Each point corresponds to a reader, with color indicating the genre of novel that reader primarily favors. A book's genre is determined by the "shelf" to which Goodreads users most often assign it. Mousing over a point discloses the number of novels in each genre the reader has reviewed.
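The genre rule in the caption (a book takes the shelf users most often assign it) and the per-reader counts behind each point can be sketched as follows. Several details here are assumptions, not documented choices: only shelves in a hypothetical `GENRES` set are considered (so generic shelves such as "to-read" are ignored), ties go to the alphabetically first shelf, a reader's favored genre is simply the most frequent genre in their profile, and the shelf names and counts are invented.

```python
from collections import Counter
from typing import Optional

# Hypothetical shelf names standing in for the genres discussed on this page.
GENRES = {"science-fiction", "detective", "chick-lit"}


def book_genre(shelf_counts: dict[str, int], genres: set[str]) -> Optional[str]:
    """Return the genre shelf assigned most often; ties go to the alphabetically first."""
    candidates = {s: n for s, n in shelf_counts.items() if s in genres}
    if not candidates:
        return None
    # max() keeps the first maximum it sees, so iterating in sorted order breaks ties.
    return max(sorted(candidates), key=candidates.__getitem__)


def taste_profile(user_books: list[str], genre_of: dict[str, Optional[str]]) -> Counter:
    """Count how many of a reader's reviewed books fall into each genre."""
    return Counter(genre_of[b] for b in user_books if genre_of.get(b) is not None)


def favored_genre(profile: Counter) -> Optional[str]:
    """The genre a reader reviews most often (the color of their point)."""
    return max(sorted(profile), key=profile.__getitem__) if profile else None


# Invented shelf counts for three books.
shelves = {
    "b1": {"to-read": 900, "science-fiction": 120, "fantasy": 40},
    "b2": {"detective": 75, "thriller": 60},
    "b3": {"science-fiction": 30, "detective": 30},  # a tie
}
genre_of = {book: book_genre(counts, GENRES) for book, counts in shelves.items()}
profile = taste_profile(["b1", "b2", "b3"], genre_of)
print(genre_of)
print(profile, favored_genre(profile))
```

With these inputs `b1` is classed as science fiction despite its large "to-read" count, `b3`'s tie resolves to "detective", and the reader's profile (two detective novels, one science fiction) makes "detective" the favored genre. The counts in `profile` correspond to what the mouse-over in the figure reports.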