PRESENTATION: Reconsidering Project Gutenberg’s Significance as an Early Digitization Project

Michael Hart’s Project Gutenberg is often regarded as the first ebook publisher, with Hart typing up the Declaration of Independence on a Xerox Sigma V at the University of Illinois at Urbana-Champaign (UIUC) in 1971. This mythology has persisted despite contradictory evidence: Hart did not coin the name ‘Project Gutenberg’ until the late 1980s; an early version of Milton’s Paradise Lost was an updated copy of a 1965 digitization by Joseph Raben at CUNY; and the Project’s first full book, the King James Bible, was not released until 1989.

In this paper, I challenge hagiographic accounts of Michael Hart’s early work by placing it within the broader context of early collaborative digitization work and innovations with the computer facilities at UIUC in the early 1970s. Computers and the Humanities, a prominent early digital humanities journal, notes a range of digitization projects during the 1960s, and the Oxford Text Archive, a digital publication interchange network, formed in 1976. These early projects were more active than Hart, who only began work in earnest in the 1990s with the benefit of Usenet, FTP, and Gopher. Furthermore, Hart acknowledged but never used UIUC’s PLATO (Programmed Logic for Automatic Teaching Operations), an early computer network with a larger audience than ARPANET in the early 1970s, to disseminate texts. By re-appraising Hart’s work within its historical and geographical context, the paper challenges the concept of a lone genius inventor of ebooks and proposes a more inclusive history of digital publishing.

PUBLICATION: The limits of Big Data for analyzing reading

Rowberry, Simon (2019), ‘The limits of big data for analyzing reading’. Participations. 16.1: 237-257. This was part of a special issue on Readers, Reading and Digital Media edited by DeNel Rehberg Sedo and Danielle Fuller.

Abstract: Companies including Jellybooks and Amazon have introduced analytics to collect, analyze and monetize the user’s reading experience. Ebook apps and hardware collect implicit data about reading, including progress and speed, as well as encouraging readers to share more data through social networks. These practices generate large data sets with millions, if not billions, of data points. For example, a copy of the King James Bible on the Kindle features over two million shared highlights. The allure of big data suggests that these metrics can be used at scale to gain a better understanding of how readers interact with books. While data collection practices continue to evolve, it is unclear how the metrics relate to the act of reading. For example, Kindle software tracks which words a reader looks up, but cannot distinguish accidental look-ups from deliberate ones, or otherwise link the act to the reader’s comprehension. In this article, I analyze patent filings and ebook software source code to assess the disconnect between data collection practices and the act of reading. The metrics capture data associated with software use rather than reading and therefore offer a poor approximation of the reading experience and must be corroborated by further data.

