Social Reading of Harry Potter on the Kindle (from a distance)

October 20th, 2015

I’ve been seriously working on research for my history of the Kindle for a couple of years now and I’m still figuring out how to capture the impact of the Kindle on the scale of both the publishing/technology industry and the individual reader.

This tension is clearest when looking at the available data on reading and the shared highlights. There are a large number of individuals making personal choices behind the 500,000 shared highlights of a single edition of Wuthering Heights. If we scale this to over 4 million ebooks and 40 million Kindle users, it becomes extremely difficult to focus on both the local and global trends (and doubly so when access to the data is obfuscated or entirely unavailable): What counts as an appropriate sample? To what degree can individual highlights link to the mass of activity? How much data can I even get hold of?

While I ponder these questions, there’s still the problem of method. In order to figure this out, here’s a pilot study of the Harry Potter series as a complete unit that is manageable yet has received a fair amount of attention.

On the global level, shared highlights might not be able to tell us much about readership because an unknown number of readers choose not to highlight or share their efforts. The benefit of using Harry Potter, however, comes from the fact it is possible to gauge popularity across the series.

In recent versions of the Kindle software, a helpful “About This Book” pop-up box appears when a title is opened for the first time. Luckily, this pop-up contains the total number of shared highlights and how many unique sections of the title have been highlighted. (These may not necessarily be up-to-date, but all the data here comes from 20 October 2015.)

The data from the Harry Potter series reveals some interesting patterns. Figure 1 shows the total volume of shared highlights for each title, while figure 2 looks at the number of unique highlights per title. The most striking part of figure 1 is that the visible highlights (the top 10 most shared highlights) barely represent 10% of all shared highlights for any individual title.

Figure 1. Total highlights for each Harry Potter title and the visible top 10 highlights

Figure 2. Unique highlights for each Harry Potter title

While the two graphs appear to show that the popularity of the series plummets after the first novel, picks up towards the middle, and drops again at the end, there is a far simpler explanation: the longer books receive more highlights because there is more text to highlight.

The only notable exception is Harry Potter and the Philosopher’s Stone, where more readers are focusing on particular passages. The large increase in total highlights without a similar increase in unique highlights likely indicates that more people are reading the first book than the rest of the series, or at the very least, they lose enthusiasm after the first book.
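The comparison above rests on two simple ratios: how many shared highlights there are per unique passage, and what share of all highlights the visible top 10 account for. A minimal sketch of that arithmetic follows. The class name, field names, and numbers are placeholders invented for illustration; they are not the figures behind the graphs.

```python
from dataclasses import dataclass


@dataclass
class TitleHighlights:
    title: str
    total: int        # all shared highlights reported in "About This Book"
    unique: int       # distinct passages that have been highlighted
    top10_total: int  # sum of the counts shown for the ten visible highlights

    def highlights_per_passage(self) -> float:
        """Average number of shared highlights per distinct passage."""
        return self.total / self.unique

    def visible_share(self) -> float:
        """Fraction of all shared highlights accounted for by the top 10."""
        return self.top10_total / self.total


# Placeholder numbers for illustration only, NOT data from the Kindle store.
example = TitleHighlights("Example title", total=10_000, unique=2_500, top10_total=900)
print(f"{example.highlights_per_passage():.1f} highlights per passage")
print(f"{example.visible_share():.0%} of highlights sit in the visible top 10")
```

A title whose highlights-per-passage ratio is well above the rest of the series is one where readers converge on the same passages, which is the pattern described for the first book.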

The second macroscopic view we can get from the Popular Highlights is the location of the shared highlights. Jordan Ellenberg has coined the Piketty Index as a way of using popular highlight locations to see how far through a book a reader got before quitting. From the evidence I’m gathering, it looks like the top 10 shared highlights are more likely to appear at the beginning of a book than the end, but what about the Harry Potter series?


Figure 3. Top 10 Shared Highlights for each Harry Potter title

Across the series, readers are more likely to highlight passages at the end of the book than the beginning. Not only does this suggest that readers are likely to finish the books, but through looking at the content of the highlights from the end of the book, it is clear that some of the most popular parts of the titles are Dumbledore’s speeches to Harry and the denouement of the narrative. Given the make-up of Rowling’s series and the slow start of most of the books, this inversion makes sense.
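The location-based reading described above can be expressed as a single number: the average position of the popular highlights as a fraction of the book's length. The sketch below is one plausible way to compute it and may differ in detail from Ellenberg's own measure (for instance in how many highlights are averaged). The function name and the location values are hypothetical.

```python
from statistics import mean


def highlight_position_index(locations, book_length):
    """Mean position of the popular highlights as a fraction of the book.

    0 means every highlight sits at the very start, 1 at the very end.
    """
    if not locations:
        raise ValueError("need at least one highlight location")
    if book_length <= 0:
        raise ValueError("book_length must be positive")
    return mean(locations) / book_length


# Hypothetical Kindle locations for two imaginary 1,000-location books.
front_loaded = highlight_position_index([10, 20, 30, 40, 50], book_length=1000)
back_loaded = highlight_position_index([800, 850, 900, 950, 1000], book_length=1000)
print(front_loaded, back_loaded)
```

A low value suggests highlights clustered near the opening; a high value, as with the Harry Potter titles, suggests readers are still highlighting at the end.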

And that’s about as much as you can deduce from looking at the global level as far as I can tell. Once I’ve dug into the more traditional annotations and highlights of individual readers, I’ll compare the results with the broad patterns identified here.

Amazon is 20 years old – and far from bad news for publishers

July 16th, 2015

I was asked to write about the importance of Amazon for publishers for the company’s 20th anniversary in The Conversation. It was originally published here:

It has now been 20 years since Amazon sold its first book: the titillating-sounding Fluid Concepts and Creative Analogies, by Douglas Hofstadter. Since then publishers have often expressed concern over Amazon. Recent public spats with Hachette and Penguin Random House have heightened the public’s awareness of this fraught relationship.

It has been presented as a David and Goliath battle. This is despite the underdogs’ status as the largest publishing houses in the world. As Amazon has become the primary destination for books online, it has been able to lower book prices through its influence over the book trade. Many have argued that this has reduced the book to “a thing of minimal value”.

Despite this pervasive narrative of the evil overlord milking its underlings for all their worth, Amazon has actually offered some positive changes in the publishing industry over the last 20 years. Most notably, the website has increased the visibility of books as a form of entertainment in a competitive media environment. This is an achievement that should not be diminished in our increasingly digital world.

Democratising data

In Amazon’s early years, Jeff Bezos, the company’s CEO, was keen to avoid stocking books. He wanted to work as a go-between for customers and wholesalers: rather than building costly warehouses, Amazon would buy books as customers ordered them. This would pass the savings on to the customers. (It wasn’t long, however, until Amazon started building large warehouses to ensure faster delivery times.)

This promise of a large selection of books required a large database of available books for customers to search. Prior to Amazon’s launch, this data was available to those who needed it from Bowker’s Books in Print, an expensive data source run by the people who controlled the International Standard Book Number (ISBN) standard in the USA.

ISBN was the principal way in which people discovered books, and Bowker controlled this by documenting the availability of published and forthcoming titles. This made them one of the most powerful companies in the publishing industry and also created a division between traditional and self-published books.

Bowker allowed third parties to re-use their information, so Amazon linked this data to their website. Users could now see any book Bowker reported as available. This led to Amazon’s boasts that they had the largest bookstore in the world, despite their lack of inventory in their early years. But many other book retailers had exactly the same potential inventory through access to the same suppliers and Bowker’s Books in Print.

Amazon’s decision to open up the data in Bowker’s Books in Print to customers democratised the discovery of books that had previously been locked in to the sales system of physical book stores. And as Amazon’s reputation improved, they soon collected more data than Bowker.

For the first time, users could access data about what publishers had recently released and basic information about forthcoming titles. Even if customers did not buy books from Amazon, they could still access the information. This change benefited publishers as readers who can quickly find information about new books are more likely to buy new books.

World domination?

As Amazon expanded beyond books, ISBN was no longer the most useful form for recalling information about items they sold. So the company came up with a new version: Amazon Standard Identification Numbers (ASINs), Amazon’s equivalent of ISBNs. This allowed customers to shop for books, toys and electronics in one place.

The ASIN is central to any Amazon catalogue record and with Amazon’s expansion into selling eBooks and second hand books, it connects various editions of books. ASINs are the glue that connects eBooks on the Kindle to shared highlights, associated reviews, and second hand print copies on sale. Publishers, and their supporters, can use ASINs as a way of directing customers to relevant titles in new ways.

Will Cookson’s Bookindy is an example of this. The mobile app allows readers to find out if a particular book is available for sale cheaper than Amazon in an independent bookstore nearby. So Amazon’s advantage of being the largest source of book-related information is transformed into a way to build the local economy.

ASINs are primarily useful for finding and purchasing books from within the Amazon bookstore, but this is changing. For example, many self-published eBooks don’t have ISBNs, so Amazon’s data structure can be used to discover current trends in the publishing industry. Amazon’s data allows publishers to track the popularity of books in all forms and shape their future catalogues based on their findings.

While ISBNs will remain the standard for print books, ASINs and Amazon’s large amount of data clearly benefit publishers through increasing their visibility. Amazon has forever altered bookselling and the publishing industry, but this does not mean that its large database cannot be an invaluable resource for publishers who wish to direct customers to new books outside of Amazon.

The strange orthography of ebooks

February 9th, 2015

While the ebook has become a familiar concept since 2007 and the launch of the Kindle, there appears to be little consensus over how exactly to spell it. There appear to be three main contenders: e-book, ebook, and eBook.

Unfortunately, it is difficult to trace usage of small orthographic differences to see the popularity of each over time, but there are clear comparisons with the term ‘email,’ which started off with the hyphen (e-mail) but is now normally simply spelled email as it has become the standard form of communication over the postal system. In early discussions around ebooks, a hyphen similarly marked the emergent form as alien and distinct from its printed counterpart. Perhaps over time we will drop the hyphen and this elision will demonstrate how the ebook has become embedded within contemporary culture, as it is possible to trace with email.

But this leaves the question of the third orthographic variation, ‘eBook.’ While this may look like a riff on Apple’s branding for the iPod and associated devices, its history goes back much further to the first generation of commercial ebook devices, and the Rocket eBook in particular. A couple of other devices borrowed the orthography, and it appears to have caught on beyond the brand. Interestingly, since the ebook revival in 2006, this orthographic convention has not been widely copied, perhaps due to the dominance of Apple with that kind of orthography. Given its awkwardness, particularly when using the word at the beginning of a sentence, perhaps it should be used only with reference to these historic devices.

Twitterbots: Reading Automata

June 9th, 2014

One of Twitter’s unique selling points as a social network is its unerring focus on text. Even posting a picture or video independently generates a textual anchor for the media in the form of a URL. As a textual medium, Twitter’s primary currency is reading. The politics of reading, and more typically, not reading, characterizes a user’s relationship with their audience of followers and beyond.

It turns out that some of Twitter’s most voracious readers are not human at all, but rather the range of artisan bots that have emerged in the last couple of years. While they vary in type dramatically (Tully Hansen’s taxonomy covers this territory well), at the most fundamental level these bots are reading machines. On a basic level, the tweets emerge from the bots reading, and enacting, their source algorithms, but their literacy extends beyond that. These bots do not generate material out of nothing but rather read a variety of sources (Twitter searches, the dictionary, novels, ROM texts, headlines, and other assorted materials) and present their readings as new writings.

This sleight of hand, based upon the process of reading to write, is reminiscent of automata that have intrigued countless historical audiences. Through use of clockwork and other mechanisms, automata maintain an illusion of autonomy. Once the underlying mechanics have been figured out, the automata become either trivial or joyful for appreciating the underlying mechanics. Twitterbots garner similar reactions once the processes have been understood, although many make use of dynamic and timely reading sources to ensure proceedings do not become stale. Nonetheless, Adam Parrish’s @everyword, a project that did exactly what its name promises, in alphabetical order, is probably the most popular bot, despite its stable reading material and its relatively predictable ending (which was eventually subverted by words starting with é being collated after z).

@horse_ebooks, the most contentious, and previously beloved, of all Twitterbots has a special place in this analogy: the machine that was disappointingly all too human in the end. The grand reveal that the account which generated bizarrely poetic and uncommercial spam was in fact a human performance mirrors the trajectory of a human-run automaton affectionately known as the Mechanical Turk (picked as a name for Amazon’s crowdsourcing service due to its namesake’s “artificial artificial intelligence”). The Turk appeared to be a brilliantly gifted automatic chess-player that was actually entirely controlled by a human hidden in a secret compartment behind the false clockwork. The performance that fueled @horse_ebooks’s final years equally represented a kind of reverse Turing Test, whereby a human attempted to appropriate the linguistic tics of a bot.

These reading automata become much more interesting when considering the ways in which they challenge our notions of reading. Take, for example, Mark Sample’s Station 51000 (@_LostBuoy_), which plays with the tensions of reading on various levels currently being teased out in humanities research.  The bot mixes a reading of sections of Moby Dick with live data from the unmoored buoy classified as station 51000. Despite the specificity of location, the buoy still transmits a range of maritime data. The mash-up of a single, fixed, canonical work of literature with an erratic stream of nautical data goes beyond a comical clash of high culture, low culture and data—it reflects upon digital methodologies of reading that have emerged in recent decades including the use of “big data.” Of course, I’m not the first one to notice this, and the trend in Twitterbots more generally:

As automata, it is not up to these bots to make aesthetic decisions, as evidenced by Station 51000’s mixture of the literary and real-time feed. Instead they can be used to push the limits of what reading means, and occasionally make us smile or laugh.

Call for Papers: New Sites of Worship [SHARP 2014]

October 22nd, 2013

Next year’s SHARP conference in Antwerp (17-21 September 2014) has the central theme of ‘Religions of the Book’. I would be interested in submitting a proposal for a session on ‘New Sites of Worship’ and invite anyone interested in this theme to join this session.

The rise of new social networks and websites both general (e.g. Facebook, Twitter, Tumblr) and those geared towards reading (e.g. Goodreads, Shelfari, LibraryThing) have led to ‘new sites of worship’ for fandom of literary authors. Users have populated these sites to discuss their favourite authors and books. Occasionally this discourse has become out-of-control and fandom has become fanatical and discussion of the literary turns into worship.

The proposed session will explore the traces of rabid fandom online including but not limited to role-play, interaction with authors, obsession and misuses of social media. Please send a short abstract (400 words) to Simon Rowberry by Thursday 28 November if you want to participate in this session.

The Public and Private Nabokov

October 18th, 2013

It is well known that Nabokov projected a persona in his rare public statements and interviews. He used such occasions to stamp his authority on his texts and to preserve the myth of a solitary genius who was not fond of many other authors. It is unsurprising that his correspondence reveals a different, more personable character. Make no mistake, Nabokov-as-public-figure appears in some letters to publishers and authors as he denounces second-rate authors and those who introduce errors into his works!

One of the many examples of this discrepancy can be seen in Nabokov’s mentions of the typewriter in his correspondence and published interviews. Nabokov’s composition method after 1941 relied on index cards and pencils, a method he had transferred from his research into butterflies. When the time came to type up these index cards, he left the typing to his wife, Véra. Although this offers no direct evidence that Vladimir himself could not type, in personal correspondence to James Laughlin, an early American publisher of Nabokov’s novels, in November 1942, Nabokov admitted parenthetically that “I cannot type.” (Selected Letters, 43)

Three years later, however, he added a holograph note to a typed letter to Katharine White: “This is the first letter I have typed out myself in my life. Took me 28 minutes but came out beautifully.” (SL, 54) Here we can see an apprentice’s pride and an appreciation of the aesthetic value of typing, especially through the struggle. This moment of glory is in stark contrast to Nabokov’s public pronouncement years later (1963) in a Playboy interview with Alvin Toffler: “Yes I never learned to type.” (Strong Opinions, 29). Undoubtedly after his initial struggle, he did not take over all his typing duties, but under such a public statement lies a much more complex private engagement with the technology he dismisses.

Do we need an E-lit Short Title Catalogue?

September 3rd, 2013

I’ve spent the day successfully viewing the copy of William Gibson, Dennis Ashbaugh and Kevin Begos Jr.’s Agrippa: book of the dead at the National Art Library (once I’ve gathered my thoughts and reorganized my notes – I’ll write up my findings regarding the differences between this and other editions of the text) and being unsuccessful in an attempt to see Mark Hansen and Ben Rubin’s Listening Post currently installed next door at the Science Museum. These are two works of hybrid physical-digital literature that can’t easily be replicated and distributed widely among the community, but equally are incredible works that require physical interaction in order to fully appreciate them. Certainly, Agrippa was a completely different physical experience than I had imagined from any description I had read.

Along with other important works of digital literature, there are very few, if more than one, functional copies of these two artefacts. Preservation has been a pressing need within the community as the wealth of recent literature would suggest. In recent years there have been a promising number of acquisitions of important digital literature authors’ papers in libraries and a number of laboratories doing important work such as the Media Archaeology Lab, UCSC’s large collection of Japanese videogames, MIT’s Trope Tank, etc. (As an aside, I don’t know if there is any such lab in the UK yet?) The currently on-going NEH Office of Digital Humanities project, “Pathfinders”, has been instrumental in promoting preservation through documentary and “Let’s Play” practice. This is useful work for the institutionalization of digital literature but access remains an issue.

If some of these works are still executable in their original form, or preserved in some other form if they are very physical (e.g. Where can I access works built to run with CAVE?), how do we find out where these places are? Catalogues of digital works are being built: ELMCIP, Electronic Literature Directory, I ♥ E-Poetry, and so forth, but a common finding aid for where to travel to when you actually want to interact with these works is still missing.

A useful analogy for what might solve this problem can be found in one of the great undertakings of bibliographers in the last 150 years: the Short Title Catalogue (STC). The STC was a monumental undertaking to document the existing copies of books printed between 1475 and 1640 in the British Isles, noting the libraries that held the titles. Rather than noting all books that could have existed, it focused instead on those that survived. This allowed researchers to find copies of these rare materials. Surely many digital incunabula deserve similar treatment?

Now, the short title aspect of the work is no longer important as digital bibliographies allow for longer records, and the STC itself is now online with much more metadata than the crammed references of the printed original. Equally, the records of locations have changed since the revised edition, and with issues such as the Senate House Library’s proposed sales of their Shakespeare Folios, the catalogue remains in flux and libraries may no longer have the same holdings. The STC instead offers a starting point.

The finding aid aspect of the STC would be of great use for scholars interested in the material aspects of digital literature. If we had a centralized database offering the locations and any system requirements based upon the limitations, this could aid access to the original artefacts and enrich our understanding of early digital literature. Setting up such a database would be another step towards legitimizing the form as it would once more demonstrate the importance of the material form as something worth traveling for, rather than relying on the description of those earlier scholars lucky enough to interact with some of the more elusive and ephemeral works.

Error, Failure and Nabotov

August 26th, 2013

I’ve recently created my first Twitterbot, Vladimir Nabotov, by appropriating the code from Zach Whalen’s brilliant Pelafina Lièvre. The bot itself is fairly derivative and works on the principle of Markov chains, but the source material is esoteric enough to deserve further comment.
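For readers unfamiliar with the principle, a word-level Markov chain records which words follow each short run of words in a source text, then produces new text by repeatedly picking one of the recorded followers at random. The sketch below is a generic illustration under that description, not the code behind Nabotov or Pelafina Lièvre. The function names, the chain order of two words, and the sample sentence are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the words that follow it in the source."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain, max_chars=140, rng=None):
    """Walk the chain from a random state until the tweet limit or a dead end."""
    rng = rng or random.Random()
    state = rng.choice(sorted(chain))
    out = list(state)
    while state in chain:
        next_word = rng.choice(chain[state])
        if len(" ".join(out + [next_word])) > max_chars:
            break  # adding this word would exceed the character limit
        out.append(next_word)
        state = tuple(out[-len(state):])
    # Slice as a guard in case the starting state alone is over-long.
    return " ".join(out)[:max_chars]


# Stand-in source text; a real bot would read a much larger corpus.
sample = "the quick brown fox jumps over the lazy dog and the quick brown cat sleeps"
chain = build_chain(sample)
print(generate(chain, rng=random.Random(0)))
```

Because the chain only ever reproduces word sequences found in its source, any typographical errors or conjoined words in that source pass straight through to the output.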

The bot draws its source material from bootleg versions of Nabokov’s works available online with all the errors, typographical quirks and other peculiarities. Using Nabokov as source material is also problematized by Nabokov’s frequent code-switching between English, Russian and French.

Automatically generating tweets can often lead to failure, as the source might not be an interesting section of the text or the bot might post something offensive. Nabotov embraces this failure with the “dirty” source material. This can lead to some familiar motifs being recast in new forms:

Equally, there can be some interesting results with non-English languages, such as this snippet of Zembla:

These results would be undesirable in a usual twitterbot, but the missteps reveal problems with the texts readers are most likely to encounter if they are unwilling to purchase a copy of a carefully produced book. This can take the form of weird glyphs, errant punctuation (Nabokov loves his parentheses!) and conjoined words:

Failure is often bandied about as an important aspect of the digital humanities, and twitterbots certainly allow us to understand the potential failure (and harm) of generative writing, but equally, this failure can be channeled to examine other types of failure too.

More on “Lolita is Famous, Not I”

August 18th, 2013

Inspired by Juan Martinez’s excellent visualization comparing mentions of Lolita to Nabokov in the Google Books corpus (and the 55th anniversary of Lolita’s publication in America today!), I thought I would delve a little deeper into the Google Books n-gram data available and test the claims using the raw data (which admittedly is very rough and contains a lot of duplicates, but offers a rough estimate of volumes currently digitized by Google).

Looking at raw 2-grams (that is, all instances of “Lolita” and “Nabokov” with one word after it referenced in the most up to date data sets available from Google), there are two figures available for analysis: the number of times a word is mentioned in the complete corpus, and the volume of texts that reference the word at least once.

The online viewer does not distinguish between the two categories and is case-sensitive, so the raw data gives us more data to play with.
There is a clear difference between the total references and the number of books using the words. As one book may repeatedly mention Nabokov but never Lolita, popularity should be mapped by the number of texts using the word rather than the total references. Such a graph still shows that Nabokov is more popular in most years, other than the year of Lolita’s publication, as Juan noted, and, for some reason, the period from 1966 to 1969.
The average number of references per book further asserts Nabokov’s enduring popularity. Since Nabokov is mentioned on average more times than Lolita, not only is he discussed by a broader range of texts, but they are engaging with him as a subject in a deeper manner than Lolita.
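The three measures discussed here (total references, number of volumes, and average references per volume) can be derived from the raw files with a few lines of code. The sketch below assumes each row is tab-separated as n-gram, year, match count, volume count, which is my understanding of the 2012 release of the raw data and should be checked against the files actually downloaded. The function names and the sample rows are invented for illustration.

```python
from collections import defaultdict


def yearly_counts(lines, first_word):
    """Sum match and volume counts per year for 2-grams starting with `first_word`.

    Matching is case-sensitive. Summing volume counts across different 2-grams
    counts a book once per 2-gram it contains, so the volume figure is only a
    rough upper estimate.
    """
    totals = defaultdict(lambda: [0, 0])  # year -> [match_count, volume_count]
    for line in lines:
        ngram, year, matches, volumes = line.rstrip("\n").split("\t")
        if ngram.split(" ")[0] == first_word:
            totals[int(year)][0] += int(matches)
            totals[int(year)][1] += int(volumes)
    return {year: tuple(counts) for year, counts in sorted(totals.items())}


def mentions_per_volume(counts):
    """Average number of references per volume for each year."""
    return {year: matches / volumes for year, (matches, volumes) in counts.items()}


# Invented rows in the assumed layout, NOT real Google Books figures.
rows = [
    "Nabokov was\t1970\t30\t10",
    "Nabokov is\t1970\t20\t10",
    "Lolita is\t1970\t12\t8",
    "nabokov was\t1970\t5\t5",  # lower-case variant, ignored by the exact match
]
nabokov = yearly_counts(rows, "Nabokov")
lolita = yearly_counts(rows, "Lolita")
print(nabokov, mentions_per_volume(nabokov))
print(lolita, mentions_per_volume(lolita))
```

Plotting the volume counts per year gives the popularity comparison, while the per-volume averages give the depth-of-engagement comparison.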

There were also some interesting phrases that came out of the data, with their first date of use next to them:

  • Lolita sunglasses (1976)
  • Lolita complex (1959)
  • Lolita Delores (1962)
  • Nabokov Festival (1985)
  • Nabokov studies (1967)
  • Nabokov archive (1989)

A Guerrilla Digital Humanities

August 14th, 2013

N.B. This is a revised version of my presentation at Digital Humanities 2013.

Mark Sample’s “Unseen and Unremarked Upon” laments the lost years of digital humanities research from texts that exist on the wrong side of the copyright divide. Sample constructs a satirical counterfactual history of digital humanities studies of Don DeLillo, an author whom he posits is central to the digital humanities canon. (Sample) This concern is serious, since there is much to be gained from critical projects examining the mirroring of computer culture’s rise in twentieth century fiction.

The B© distinction (Kirschenbaum) has recently been supplanted by the dawn of ADMCA (Anno Digital Millennium Copyright Act). Conceptualized in 1995, around the time of mainstream adoption of the Web, and passed as legislation in 1998, the peak of the Dot.Com boom (Hilderbrand 103), ADMCA awkwardly heralded the age of digital copyright in an antiquated and hybrid manner. Although the DMCA has proved controversial, it has also allowed for guerrilla activity online. Since the owners of servers are not directly responsible for the material uploaded to them under the safe-harbour provisions, risks can be taken with the distribution of content before take-down, exemplified by user-generated content on YouTube. (Hilderbrand 242) This has led to a detection arms race where users flip remix videos and pitch-shift vocals in order to avoid algorithmic detection.

The countercultural space the DMCA safe-harbour provisions afford YouTube reflects the older phenomenon of Napster, another platform that permanently altered modes of transmission. Napster demonstrates the tensions between two histories of the success of a format (in this case, the MP3): the corporate history that celebrates the coming of the Device, often coupled with a “Killer App,” and the guerrilla history that celebrates the counterculture that developed around a more democratic form of sharing media. Lucas Hilderbrand suggests often the countercultural history predates the commercial one and is a vital part of a medium’s success (Hilderbrand 78). Jonathan Sterne has framed this as the Napster versus iPod moments (Sterne).

Within the same debate, Kenneth Goldsmith has argued that we have not yet had a Napster moment for literature, as texts are not bootlegged to the same degree as music and video (Goldsmith 82). Goldsmith, such a pivotal player in pushing the boundaries of copyright through his conceptual poetry – particularly in its latest manifestation in the Printing the Internet project – highlights a useful distinction in understanding how online media sharing differs from the dreaded term, piracy. If Adrian Johns’s introduction to Piracy reaches its apex with the revelation of a pirate corporation mirroring the entire structure of NEC, producing counterfeit goods purely for profit (Johns 1), bootlegging is the best way of describing online sharing activity other than a few outliers. This may appear to be an ideological choice, akin to the terrorist-freedom fighter binary, but as Hilderbrand suggests, “bootlegging functions to fill in the gaps of market failure… archival omissions… and personal collections.” (Hilderbrand 22–23)

The bootlegging of literature has started to take place in recent years. William Gibson’s aphorism “the street finds its own uses for things” rings true. Both amateurs and digital humanists have in recent years been undertaking guerrilla digital humanities, a phenomenon made possible in ADMCA. Guerrilla digital humanities can be broadly defined as the application of methods associated with digital humanities to texts still protected by copyright laws. As these projects must be covert and rarely are affiliated with institutions or grant funding, this underground activity can be equated to the Napster moment of contemporary literature.

There is an important caveat to this activity. Aaron Swartz’s mass download of public domain journal articles from JSTOR through MIT’s network and subsequent arrest ended tragically. Swartz’s operation fit well within the bounds of bootlegging given the articles were in the public domain and the method was an exemplar of guerrilla activity. The case demonstrates a divide that still exists and must be carefully toed by guerrilla digital humanists. Although it is unclear what Swartz intended to do with the articles, it is clear that a facsimile or representation of the original work is out-of-bounds in most cases.

Deformance (Samuels & McGann) such as visualization, mapping and other creative endeavours moves away from this. We are unlikely to be able to explore free scholarly critical editions of Salman Rushdie’s oeuvre in the near future, but deformative interpretations built on digital tools as guerrilla movements are likely to flourish. This is because “copyright actually insists on aesthetics and recognizes the importance of tangibility or interfaces: the law does not protect ideas, only expressions in fixed forms.” (Hilderbrand 82) New expressions in new forms at least one deformance removed from the original have the best chance of surviving.

Swartz’s case serves as a warning though, that guerrilla activity may come at the risk of the openness of the community. Digital humanities thrives on sharing, resharing and forking, but to undertake such work, guerrilla digital humanists must conduct their methods under a sleight of hand to avoid detection. The final result must be published in a samizdat culture, as official channels will be too public.

The blueprint for this form of activity is in uncreative or conceptual writing, an activity most frequently undertaken in print. In this framework, two of the most likely candidates for the Napster moment of text are creative adaptations of literature online. Fan fiction demonstrates the manipulability of thematic devices, while literary Twitterbots explore literature as potential language in oulipian ecstasy. Horse_ebooks provides the template for potential literature on Twitter and many anonymous literary adaptations have appeared of varying quality. The pattern, however, is obvious: Twitter is to text as YouTube is to video. Twitter provides a space for relatively low-risk experimentation in guerrilla activity.

In more recent years, Twitterbot auteurs have emerged, with Darius Kazemi and Mark Sample probably being the two most prolific. Generative versions of William Carlos Williams’s pithy poems, Bruno Latour meets swag culture, and My Favourite Things now explore the potential of Twitter literature. The constraint of 140 characters is oulipian on its own, but when connected with selective use of remixed source material, these bots produce mixed results, with some glorious moments of serendipity that close in on what appears to be self-consciousness. Equally, as Ian Bogost states in Alien Phenomenology, carpentry of this unfiltered sort can often lead to offensive moments unless strict filters regulate the underlying code. (Bogost 97–100) Post-publication filtering through retweets enables the cream of the crop to be shared with a wider audience.

Sample has also engaged in more extensive guerrilla activity online. House of Leaves of Grass (HoLoG) works through several layers of ADMCA deformations. HoLoG builds on Nick Montfort and Stephanie Strickland’s Sea and Spar Between, a textual mash-up of Emily Dickinson’s poetry with Moby-Dick, instantly recognisable due to the clash in typographic and lexical content, further remediated through a schizophrenic interface that disrupts the reading experience. Sample’s guerrilla operation is to transpose the subject of the remix to Walt Whitman’s Leaves of Grass – yes, freely available in the public domain – combined with Mark Danielewski’s House of Leaves, a text published in 2000 and still subject to copyright law. This remix survives ADMCA as long as it remains a guerrilla publication.

Guerrilla digital humanities is not limited to remediation and editions. Joyce scholarship is fraught with examples of those who have attempted unsuccessfully to explore digital applications of Joyce. FWEET approaches the problem differently by re-appropriating Finnegans Wake as a database, turning every line into a series of possibilities and connections to other parts of the text in various forms.

Playing on Lev Manovich’s Narrative-Database dichotomy (Manovich), FWEET disrupts the text on a narrative level and requires the reader to engage with the database connections that emerge within the text. Although this makes it difficult to read the text in its original form, it is possible to extract the text. FWEET works as a guerrilla publication on two levels. Firstly, it remediates a text whose copyright situation is still proving to be problematic; secondly, and more interestingly, most of the annotations are sourced from scholarly texts still protected by copyright.

A further interesting example can be found in the work of Vladimir Nabokov. The digital humanities link to Nabokov runs back to Ted Nelson’s apocryphal demonstration of Pale Fire for the Hypertext Editing System back in the late 1960s, for which Nelson received permission from G. P. Putnam’s Sons, Nabokov’s US publisher at the time. In more recent years, Brian Boyd’s Ada Online has offered an annotated version of Nabokov’s later work, Ada, or Ardor, for free online with the permission of Dmitri Nabokov. Dmitri appeared to be open to digital editions, suggesting that his father’s last unfinished novel would work well as a digital edition.

It is hard to tell how far this could go, but if we don’t ask, how do we know what the answer is? If we don’t work within the permitted boundaries of the DMCA, how can we know what is possible? As Bielstein states, “permissions are hard to avoid, but in principle you don’t want to ask permission” – asking permissions sets a precedent (Bielstein 10). Work towards integrating contemporary texts into the digital humanities can continue in the covert way as guerrilla digital humanities, or it could equally push for permissions and set a precedent for what creative interpretations of literary texts are possible under fair use. The only way forward is to test the waters.

Works Cited

Bielstein, Susan M. Permissions: A Survival Guide, Blunt Talk About Art as Intellectual Property. Chicago: University of Chicago Press, 2006. Print.

Bogost, Ian. Alien Phenomenology, or What It’s Like to Be a Thing. Minneapolis: University of Minnesota Press, 2012. Print.

Goldsmith, Kenneth. Uncreative Writing. New York: Columbia University Press, 2011. Print.

Hilderbrand, Lucas. Inherent Vice: Bootleg Histories of Videotape and Copyright. Durham and London: Duke University Press, 2009. Print.

Johns, Adrian. Piracy: The Intellectual Property Wars from Gutenberg to Gates. Chicago: The University of Chicago Press, 2009. Print.

Kirschenbaum, Matthew G. “The .txtual Condition: Digital Humanities, Born-Digital Archives, and the Future Literary.” Digital Humanities Quarterly 7.1 (2013): n. pag.

Manovich, Lev. The Language of New Media. Cambridge and London: The MIT Press, 2001. Print.

Sample, Mark. “Unseen and Unremarked On: Don DeLillo and the Failure of the Digital Humanities.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis and London: University of Minnesota Press, 2012. 187–201. Print.

Samuels, Lisa, and Jerome J. McGann. “Deformance and Interpretation.” New Literary History 30.1 (1999): 25–56. Print.

Sterne, Jonathan. MP3: The Meaning of a Format. Durham: Duke University Press, 2012. Print.