PRESENTATION: Fragmentation and Discontinuity in Access to the USPTO Full-text and Image Database

October 27th, 2019

Abstract: Historians of technology use patent databases, including the United States Patent and Trademark Office (USPTO) Patent Full-Text and Image Database (PatFT), as evidence for technological development. While academics acknowledge the limitations of the content, such as citations as a proxy for influence, less attention has been paid to how digital methods for accessing patent data – search engines, PDFs and webpages – alter our perception of PatFT as a primary source.

The USPTO provides access to all granted patents in the United States, from Samuel Hopkins’s 1790 filing for the manufacture of potash onwards, via PatFT. There is a clear separation between patents filed before and after 1976, with the latter available as full searchable text, and the former available only as photographic facsimile copies of the original documents. Plans to automate patent processing and discovery date back to the 1950s with the “Bush Report”, but it was not until the 1990s that the US government began to invest in the digitisation of patent submission and publishing workflows.

The forty-year lag between exploring and adopting electronic workflows has led to a fragmented discovery system. PDF documents differ from the searchable HTML full text in structure and metadata. Classification numbers are updated for historical filings, but there is no record of this update in either the HTML or PDF version. Some files were scanned from earlier efforts at microfilm preservation. Original copies of up to 5 million patent documents stored in Franconia, Virginia were destroyed in 2018, while the National Archives primarily holds material filed prior to 1978. As a result, many patents granted between 1978 and 1990, if not later, are only available as digital surrogates without a print copy of the original. Even born-digital records filed as late as 2000 started as semi-structured text records, passing through a brief transition to SGML (Standard Generalized Markup Language) in 2001 before settling on the XML (Extensible Markup Language) standard in early 2002.
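A researcher building a bulk-processing pipeline over these records therefore has to branch on the era of each one. The following Python sketch illustrates that fragmentation; the era boundaries follow the dates above, but the record layouts, field names, and the `TTL:` prefix are hypothetical, invented for the example rather than taken from USPTO documentation.

```python
# Illustrative sketch only: era boundaries follow the abstract above, but
# the record layouts and field names are invented for demonstration and
# do not reflect the USPTO's actual schemas.
import xml.etree.ElementTree as ET
from typing import Optional

def detect_format(grant_year: int) -> str:
    """Map a grant year to the rough storage format described above."""
    if grant_year < 1976:
        return "facsimile"        # scanned image only, no searchable text
    if grant_year <= 2000:
        return "semi-structured"  # semi-structured text records
    if grant_year == 2001:
        return "sgml"             # brief SGML transition
    return "xml"                  # XML standard from early 2002 onwards

def extract_title(record: str, fmt: str) -> Optional[str]:
    """Pull a title field where the format carries machine-readable text."""
    if fmt == "facsimile":
        return None  # image-only: OCR would be required first
    if fmt == "xml":
        return ET.fromstring(record).findtext("title")
    # Semi-structured/SGML eras: a naive line-prefix scan (hypothetical)
    for line in record.splitlines():
        if line.startswith("TTL:"):
            return line[4:].strip()
    return None
```

Even this toy dispatcher shows the asymmetry at the heart of the argument: for pre-1976 facsimiles the pipeline can return nothing at all without an intervening OCR step, which is precisely where the discrepancies between image and full text enter the record.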

This paper examines how digital technologies and processes including data storage, digitisation, and Optical Character Recognition (OCR) shape our knowledge of the history of innovation recorded in patent databases. Through questioning the relationship between patents as data and as digital objects, I demonstrate how what Ian Milligan has termed the “illusionary order” of digitized archives restricts the USPTO PatFT’s use as a data source.

PRESENTATION: The Legacy of E-Readers Beyond Reading

October 25th, 2019

While reports of the ebook’s demise have been greatly exaggerated, dedicated e-readers have diminished in popularity as readers elect to use smartphones and tablets in their place. In this paper, I argue that beyond the e-reader’s continued role as a niche consumer electronic, it also played an important part in broader technological research and development that will continue to influence future trends in mobile computing and beyond.

Through case studies of e-readers’ early adoption of lithium-ion batteries and continual investment in bendable, low-power screens, I argue that the e-reader’s long-term legacy will be solidified by its proof-of-concept technologies rather than by on-screen reading. For example, NuvoMedia, developer of the Rocket eBook, designed the product to test the use of lithium-ion batteries, with the inventors later going on to found Tesla. Likewise, Amazon subsidiary Liquavista’s research into next-generation electronic paper was a precursor to foldable phones. Through analysing the cutting-edge elements of e-reader design historically, I demonstrate the importance of the device within wider digital culture.

PRESENTATION: An archaeology of patent databases as material objects

September 9th, 2019

Patent databases including Espacenet and the USPTO Patent Full-Text and Image Database (PatFT) offer rich sources for big data analysis to document the evolution of a technology or to demonstrate the value of a filing through citation networks. While scholars such as Manuel Trajtenberg, Douglas O’Reagan and Lee Fleming have complicated the relationship between citations and chains of influence between inventors, less attention has been paid to the affordances and limitations of the databases storing the underlying data. Researchers can access patent databases as searchable, online databases or via third-party sources such as the National Bureau of Economic Research (NBER) US Patent citation data files, but how does the structure, type, and availability of data shape our understanding of patents as historical evidence?

In this paper, I offer a case study of the layers of digitisation embedded within the USPTO PatFT and Patent Application Full-Text and Image Database (AppFT) to analyse what Ian Milligan terms the “illusionary order” of patent databases. PatFT and AppFT combined provide access to all granted patents in the United States, from Samuel Hopkins’s filing for the manufacture of potash in 1790 to weekly updates of new patents in early 2019. There is a clear separation between patents filed before and after 1976, with the latter available as full searchable text, and the former available only as photographic facsimile copies of the original document. While the USPTO began transitioning to digital workflows in the 1970s, the full text from this period can rely on automatic Optical Character Recognition (OCR) processes, leading to differences between the facsimile PDF copy of the patent and the semantically rich full text.

Discrepancies proliferate beyond simple historical distortions. PDF documents can differ from the HTML full text, and updates to patent classifications can create different versions of the same patent without acknowledgement of an update. Some files were scanned from earlier efforts at microfilm preservation. Original copies of up to 5 million patent documents stored in Franconia, Virginia were destroyed in 2018, while the National Archives primarily holds material filed prior to 1978. As a result, many patents filed prior to the complete adoption of digital workflows are only available as digital surrogates without a print copy of the original.

The provenance of these patents as digital objects is therefore uneven and can create an incomplete image of patent filings, which is only exacerbated in other national databases where even photo-facsimiles of all patents remain unavailable. This paper offers a bibliographic and media archaeological excavation of the database, documenting the digitisation of the USPTO’s workflows to contextualise the types of error that may affect our understanding of patents as documents.

PRESENTATION: Reconsidering Project Gutenberg’s Significance as an Early Digitization Project

July 9th, 2019

Michael Hart’s Project Gutenberg is often regarded as the first ebook publisher, with Hart typing up the Declaration of Independence on a Xerox Sigma V at the University of Illinois at Urbana-Champaign (UIUC) in 1971. This mythology has persisted despite contradictory evidence: Hart did not coin the name ‘Project Gutenberg’ until the late 1980s; an early version of Milton’s Paradise Lost was an updated copy of a 1965 digitization by Joseph Raben at CUNY; and the Project’s first full book, the King James Bible, was not released until 1989.

In this paper, I challenge hagiographic accounts of Michael Hart’s early work within the broader context of early collaborative digitization work and innovations with the computer facilities at UIUC in the early 1970s. Computers and the Humanities, a prominent early digital humanities journal, notes a range of digitization projects during the 1960s, and the Oxford Text Archive, a digital publication interchange network, formed in 1976. These early projects were more active than Hart, who only began work in earnest in the 1990s with the benefit of Usenet, FTP, and Gopher. Furthermore, Hart acknowledged but never used UIUC’s PLATO (Programmed Logic for Automatic Teaching Operations), an early computer network with a larger audience than ARPANET in the early 1970s, to disseminate texts. Through re-appraising Hart’s work within its historical and geographical context, the paper challenges the concept of a lone genius inventor of ebooks and proposes a more inclusive history of digital publishing.

PUBLICATION: The limits of Big Data for analyzing reading

July 9th, 2019

Rowberry, Simon (2019), ‘The limits of big data for analyzing reading’, Participations, 16.1: 237-257. This was part of a special issue on Readers, Reading and Digital Media edited by DeNel Rehberg Sedo and Danielle Fuller.

Abstract: Companies including Jellybooks and Amazon have introduced analytics to collect, analyze and monetize the user’s reading experience. Ebook apps and hardware collect implicit data about reading, including progress and speed, as well as encouraging readers to share more data through social networks. These practices generate large data sets with millions, if not billions, of data points. For example, a copy of the King James Bible on the Kindle features over two million shared highlights. The allure of big data suggests that these metrics can be used at scale to gain a better understanding of how readers interact with books. While data collection practices continue to evolve, it is unclear how the metrics relate to the act of reading. For example, Kindle software tracks which words a reader looks up, but cannot identify accidental look-ups or otherwise link the act to the reader’s comprehension. In this article, I analyze patent filings and ebook software source code to assess the disconnect between data collection practices and the act of reading. The metrics capture data associated with software use rather than reading, and therefore offer a poor approximation of the reading experience that must be corroborated by further data.

URL: [Open Access]

PUBLICATION: DIY Peer Review and Monograph Publishing in the Arts and Humanities

August 3rd, 2018

Butchard, Dorothy, Simon Peter Rowberry & Claire Squires (2018) “DIY Peer Review and Monograph Publishing in the Arts and Humanities”. Convergence. Online First.

Abstract: In order to explore monograph peer review in the arts and humanities, this article introduces and discusses an applied example, examining the route to publication of Danielle Fuller and DeNel Rehberg Sedo’s Reading Beyond the Book: The Social Practices of Contemporary Literary Culture (2013). The book’s co-authors supplemented the traditional ‘blind’ peer-review system with a range of practices including the informal, DIY review of colleagues and ‘clever friends’, as well as using the feedback derived from grant applications, journal articles and book chapters. The article ‘explodes’ the book into a series of documents and non-linear processes to demonstrate the significance of the various forms of feedback to the development of Fuller and Rehberg Sedo’s monograph. The analysis reveals substantial differences between book and article peer-review processes, including an emphasis on marketing in review forms and the pressures to publish, which the co-authors navigated through the introduction of ‘clever friends’ to the review processes. These findings, drawing on science and technology studies, demonstrate how such a research methodology can identify how knowledge is constructed in the arts and humanities and potential implications for the valuation of research processes and collaborations.

Open Access Version (Stirling Repository)

PRESENTATION: The End of Ebooks

July 15th, 2018

Abstract: Amazon have dominated the ebook market since the launch of the Kindle in 2007, but the next decade may be defined by the merger of the International Digital Publishing Forum (IDPF) with the World Wide Web Consortium (W3C) in January 2017. The merger resulted in the formation of the W3C Publishing Working Group with the remit to maintain the EPUB standard while working to future-proof digital publications as “first-class entities on the Web” in the form of Packaged Web Publications (PWP). The proposed PWP specification would mark a paradigm shift for the book trade, with ebooks gaining all the features of the modern Web rather than the more conservative EPUB specification.

The PWP specification is yet to be finalized, but during its development, Working Group participants have extensively debated the limits of the book and its digital representation. The new standard must satisfy a broad range of use cases including trade publishing, scholarly communication, journalism, and grey literature. In this presentation, I conduct an analysis, from a Science and Technology Studies perspective, of the consensuses and fractures that will shape the presentation of books in browsers. The W3C offer an unprecedented level of transparency in decision-making compared to prior ebook standards such as EPUB, revealing the human decisions behind algorithmic interventions by mark-up validators, InDesign export wizards, and web browsers. These ongoing discussions will not only shape the future of digital publishing, but also return to the question of “what is a book?” in the context of the early twenty-first century.

PUBLICATION: Continuous, not discrete

February 7th, 2018

Rowberry, Simon (2018), “Continuous, not discrete: The mutual influence of digital and physical literature.” Convergence. Online First.

Abstract:  The use of computational methods to develop innovative forms of storytelling and poetry has gained traction since the late 1980s. At the same time, legacy publishing has largely migrated to using digital workflows. Despite this possible convergence, the electronic literature community has generally defined their practice in opposition to print and traditional publishing practices more generally. Not only does this ignore a range of hybrid forms, but it also limits non-digital literature to print, rather than considering a range of physical literatures. In this article, I argue that it is more productive to consider physical and digital literature as convergent forms as both a historicizing process, and a way of identifying innovations. Case studies of William Gibson et al.’s Agrippa (A Book of the Dead) and Christian Bök’s The Xenotext Project’s playful use of innovations in genetics demonstrate the productive tensions in the convergence between digital and physical literature.

DOI: 10.1177/1354856518755049

Open Access Version (Stirling Repository)

PRESENTATION: Strategies for reconstructing the pre-history of the ebook through catalogue archives

September 15th, 2017

Abstract: Amazon’s dominance of the ebook trade since 2007 can be credited to their erasure of evidence about the historical development of ebooks prior to the launch of the Kindle. This activity included removing catalogue records for their ‘Ebook and E-Doc’ store, a strategy Amazon repeated with the removal of old public domain Kindle titles in 2014. Early ebook experiments prior to the Kindle were not financially lucrative but provided the foundation for the platform’s future success. In this presentation, I will explore the challenges of analysing contemporary digital publishing due to the shifting landscape prior to the Kindle’s entry to the market. I will use a case study of Microsoft’s LIT format (discontinued in 2012) and its dedicated catalogue of ebook titles to demonstrate the importance of the catalogue website for contemporary book historical research.
The preservation of the original ebooks is an optimistic ideal for platforms that have shut down; the files are therefore only available from consumers who have kept backups from at least half a decade ago. As a consequence, catalogues are vital evidence of what titles were available for sale. These corporate catalogue records are only partially available through the Internet Archive’s Wayback Machine, and their reconstruction and preservation allows for a more comprehensive understanding of the history of the ebook and the flow of content across platforms as they fall in and out of fashion. In this paper, I present some initial findings from reconstructing this catalogue and highlight the importance of archiving contemporary ebook catalogues to preserve important evidence of early twenty-first-century publishing practices.

PUBLICATION: Peer Review in Practice

September 11th, 2017

Butchard, Dorothy, Simon Rowberry, Claire Squires, & Gill Tasker (2017), “Peer Review in Practice”. BOOC.

Preface: The report Peer Review in Practice was originally published in beta version during Peer Review Week 2016. It was the first stage in a mini-project focusing on peer review as part of the broader Academic Book of the Future project; it reviews the existing literature on peer review and builds models for understanding traditional and emerging peer review practices.

The report underwent its own peer review. The beta version allowed readers to make comments upon the report, and a peer review was also commissioned by UCL Press. The former are still available on the beta version, while the latter is available here. The author of the latter (Professor Jane Winters of the School of Advanced Study, University of London) made her peer review anonymously, but agreed on request that her comments be made public and her identity revealed.

The comments we received on the beta version were from a small number of individuals, and provided some useful additional resources and suggestions. As discussed in much of the literature of peer review, however, it was difficult to encourage substantial numbers of scholars to participate in the open, post-publication peer review. We also noted that the comment function led to responses being made about individual sentences or paragraphs, rather than providing overall analysis of the report. Overall, as an experiment in open post-publication peer review, we had hoped to receive more responses that would enable the report to develop further an ongoing core of knowledge and analysis of peer review. This current version of the report also has a commenting function, and we encourage the scholarly and publishing community to engage further with our report, in order to make it a useful ongoing resource.

One of the points made in the traditional peer review was about the lack of information about monograph publishing, something which we flag up in the introduction to our report. There is little research currently written on this subject, although as part of our mini-project, we are working on a forthcoming journal article focusing on peer review and monograph publishing in the Arts and Humanities. There are also further research projects focusing on peer review, including that encapsulated in a report by Fyfe et al., Untangling Academic Publishing: A History of the Relationship Between Commercial Interests, Academic Prestige and the Circulation of Research (May 2017), and the forthcoming project on ‘Reading Peer Review’, headed by Professor Martin Eve. The next stage in our own research into peer review is examining the language of peer review in Arts and Humanities journals.

DOI: 10.14324/111.9781911307679.15

Repository Version