I’ve recently created my first Twitterbot, Vladimir Nabotov, by appropriating the code from Zach Whalen’s brilliant Pelafina Lièvre. The bot itself is fairly derivative and works on the principle of Markov chains, but the source material is esoteric enough to deserve further comment.
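
For readers unfamiliar with the principle, the following is a minimal sketch of word-level Markov chain generation. It is not Zach Whalen’s code: the function names `build_chain` and `generate`, the two-word state and the 140-character cap are all assumptions made for illustration, and the sample sentence is a neutral placeholder rather than anything from Nabokov.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` words to the words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, max_chars=140, rng=random):
    """Walk the chain from a random starting state until the tweet is full."""
    state = rng.choice(list(chain.keys()))
    output = list(state)
    # Stop when the current state has no recorded successor or the cap is hit.
    while state in chain:
        next_word = rng.choice(chain[state])
        if len(" ".join(output + [next_word])) > max_chars:
            break
        output.append(next_word)
        state = tuple(output[-len(state):])
    return " ".join(output)

# Placeholder source text (hypothetical, not from the bot's corpus).
source = "the quick brown fox jumps over the lazy dog and the quick brown cat"
print(generate(build_chain(source)))
```

Because the chain is built from whatever tokens `split()` finds, any glyphs, stray punctuation or conjoined words in the source pass straight through into the output.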

The bot draws its source material from bootleg versions of Nabokov’s works available online, with all their errors, typographical quirks and other peculiarities. Using Nabokov as source material is further complicated by his frequent code-switching between English, Russian and French.

Automatically generating tweets can often lead to failure, as the source might not be an interesting section of the text or the bot might post something offensive. Nabotov embraces this failure with the “dirty” source material. This can lead to some familiar motifs being recast in new forms:

Equally, there can be some interesting results with non-English languages, such as this snippet of Zembla:

These results would be undesirable in a typical twitterbot, but the missteps reveal problems with the texts readers are most likely to encounter if they are unwilling to purchase a copy of a carefully produced book. These problems can take the form of weird glyphs, errant punctuation (Nabokov loves his parentheses!) and conjoined words:

Failure is often bandied about as an important aspect of the digital humanities, and twitterbots certainly allow us to understand the potential failure (and harm) of generative writing, but this failure can equally be channeled to examine other types of failure too.

Inspired by Juan Martinez’s excellent visualization comparing mentions of Lolita to Nabokov in the Google Books corpus (and the 55th anniversary of Lolita’s publication in America today!), I thought I would delve a little deeper into the Google Books n-gram data available and test the claims using the raw data (which admittedly is very rough and contains a lot of duplicates, but offers a rough estimate of the volumes currently digitized by Google).

Looking at raw 2-grams (that is, all instances of “Lolita” and “Nabokov” followed by one other word in the most up-to-date data sets available from Google), there are two figures available for analysis: the number of times a word is mentioned in the complete corpus, and the number of texts that reference the word at least once.

The online viewer does not distinguish between the two categories and is case-sensitive, so the raw data gives us more to play with.
There is a clear difference between the total references and the number of books using the words. As one book may repeatedly mention Nabokov but never Lolita, popularity should be mapped by the number of texts using the word rather than by the total references. Such a graph still shows that Nabokov is more popular in most years other than the year of Lolita’s publication, as Juan noted, and, for some reason, the period from 1966 to 1969.
The average number of references per book further asserts Nabokov’s enduring popularity. Since Nabokov is mentioned on average more times than Lolita, not only is he discussed by a broader range of texts, but they are engaging with him as a subject in a deeper manner than Lolita.
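
The three measures discussed above (total references, number of books, and average references per book) can be sketched as follows. This is an illustration under stated assumptions, not the script used for the graphs: it assumes each raw row is tab-separated as n-gram, year, match count, volume count; the names `aggregate` and `mean_references_per_volume` are hypothetical; and the sample rows are invented numbers, not Google’s data.

```python
from collections import defaultdict

# Invented rows in the assumed layout:
# ngram <TAB> year <TAB> match_count <TAB> volume_count
SAMPLE_ROWS = [
    "Lolita complex\t1959\t4\t2",
    "Lolita was\t1959\t10\t5",
    "Nabokov wrote\t1959\t9\t3",
    "lolita dress\t1959\t1\t1",
    "Nabokov studies\t1967\t6\t2",
]

def aggregate(rows, targets=("lolita", "nabokov")):
    """Sum match and volume counts per (target word, year), ignoring case."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        ngram, year, matches, volumes = row.rstrip("\n").split("\t")
        first_word = ngram.split(" ")[0].lower()
        if first_word in targets:
            entry = totals[(first_word, int(year))]
            entry[0] += int(matches)
            # A book containing several different 2-grams is counted once
            # per 2-gram, so this sum is only a rough upper bound on books.
            entry[1] += int(volumes)
    return {key: tuple(value) for key, value in totals.items()}

def mean_references_per_volume(matches, volumes):
    """Average number of mentions per book; zero when no book is recorded."""
    return matches / volumes if volumes else 0.0

totals = aggregate(SAMPLE_ROWS)
for (word, year), (matches, volumes) in sorted(totals.items()):
    print(word, year, matches, volumes, mean_references_per_volume(matches, volumes))
```

Lower-casing the first word folds “Lolita” and “lolita” together, which the case-sensitive online viewer does not do by default.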

There were also some interesting phrases that came out of the data, with their first date of use next to them:

  • Lolita sunglasses (1976)
  • Lolita complex (1959)
  • Lolita Delores (1962)
  • Nabokov Festival (1985)
  • Nabokov studies (1967)
  • Nabokov archive (1989)

N.B. This is a revised version of my presentation at Digital Humanities 2013.

Mark Sample’s “Unseen and Unremarked Upon” laments the lost years of digital humanities research from texts that exist on the wrong side of the copyright divide. Sample constructs a satirical counterfactual history of digital humanities studies of Don DeLillo, an author who he posits is central to the digital humanities canon. (Sample) This concern is serious, since there is much to be gained from critical projects examining the mirroring of computer culture’s rise in twentieth-century fiction.

The B© distinction (Kirschenbaum) has recently been supplanted by the dawn of ADMCA (Anno Digital Millennium Copyright Act). Conceptualized in 1995, around the time of mainstream adoption of the Web, and passed as legislation in 1998, the peak of the Dot.Com boom (Hilderbrand 103), ADMCA awkwardly heralded the age of digital copyright in an antiquated and hybrid manner. Although the DMCA has proved controversial, it has also allowed for guerrilla activity online. Since the owners of servers are not directly responsible for the material uploaded to them under the safe-harbour provisions, risks can be taken with the distribution of content before take-down, exemplified by user-generated content on YouTube. (Hilderbrand 242) This has led to a detection arms race where users flip remix videos and pitch-shift vocals in order to avoid algorithmic detection.

The countercultural space the DMCA safe-harbour provisions afford YouTube reflects the older phenomenon of Napster, another platform that permanently altered modes of transmission. Napster demonstrates the tensions between two histories of the success of a format (in this case, the MP3): the corporate history that celebrates the coming of the Device, often coupled with a “Killer App,” and the guerrilla history that celebrates the counterculture that developed around a more democratic form of sharing media. Lucas Hilderbrand suggests that the countercultural history often predates the commercial one and is a vital part of a medium’s success (Hilderbrand 78). Jonathan Sterne has framed this as the Napster versus iPod moments (Sterne).

Within the same debate, Kenneth Goldsmith has argued that we have not yet had a Napster moment for literature, as texts are not bootlegged to the same degree as music and video (Goldsmith 82). Goldsmith, such a pivotal player in pushing the boundaries of copyright through his conceptual poetry – particularly in its latest manifestation in the Printing Out the Internet project – highlights a useful distinction in understanding how online media sharing differs from the dreaded term, piracy. If Adrian Johns’s introduction to Piracy reaches its apex with the revelation of a pirate corporation mirroring the entire structure of NEC, producing counterfeit goods purely for profit (Johns 1), bootlegging is the best way of describing online sharing activity other than a few outliers. This may appear to be an ideological choice, akin to the terrorist-freedom fighter binary, but as Hilderbrand suggests, “bootlegging functions to fill in the gaps of market failure… archival omissions… and personal collections.” (Hilderbrand 22–23)

The bootlegging of literature has started to take place in recent years. William Gibson’s aphorism “the street finds its own uses for things” rings true. Both amateurs and digital humanists have been undertaking guerrilla digital humanities, a phenomenon made possible in ADMCA. Guerrilla digital humanities can be broadly defined as the application of methods associated with digital humanities to texts still protected by copyright laws. As these projects must be covert and are rarely affiliated with institutions or grant funding, this underground activity can be equated to the Napster moment of contemporary literature.

There is an important caveat to this activity. Aaron Swartz’s mass download of public domain journal articles from JSTOR through MIT’s network and subsequent arrest ended tragically. Swartz’s operation fit well within the bounds of bootlegging given the articles were in the public domain and the method was an exemplar of guerrilla activity. The case demonstrates a divide that still exists and must be carefully toed by guerrilla digital humanists. Although it is unclear what Swartz intended to do with the articles, it is clear that a facsimile or representation of the original work is out-of-bounds in most cases.

Deformance (Samuels & McGann) such as visualization, mapping and other creative endeavours moves away from this. We are unlikely to be able to explore free scholarly critical editions of Salman Rushdie’s oeuvre in the near future, but deformative interpretations built on digital tools as guerrilla movements are likely to flourish. This is because “copyright actually insists on aesthetics and recognizes the importance of tangibility or interfaces: the law does not protect ideas, only expressions in fixed forms.” (Hilderbrand 82) New expressions in new forms at least one deformance removed from the original have the best chance of surviving.

Swartz’s case serves as a warning though, that guerrilla activity may come at the risk of the openness of the community. Digital humanities thrives on sharing, resharing and forking, but to undertake such work, guerrilla digital humanists must conduct their methods under a sleight of hand to avoid detection. The final result must be published in a samizdat culture, as official channels will be too public.

The blueprint for this form of activity is in uncreative or conceptual writing, an activity most frequently undertaken in print. In this framework, two of the most likely candidates for the Napster moment of text are creative adaptations of literature online. Fan fiction demonstrates the manipulability of thematic devices, while literary Twitterbots explore literature as potential language in oulipian ecstasy. Horse_ebooks provides the template for potential literature on Twitter and many anonymous literary adaptations have appeared of varying quality. The pattern, however, is obvious: Twitter is to text as YouTube is to video. Twitter provides a space for relatively low-risk experimentation in guerrilla activity.

In more recent years, Twitterbot auteurs have emerged, with Darius Kazemi and Mark Sample probably being the two most prolific. Generative versions of William Carlos Williams’s pithy poems, Bruno Latour meets swag culture, and My Favourite Things now explore the potential of Twitter literature. The constraint of 140 characters is oulipian on its own, but when combined with selective use of remixed source material, these bots produce mixed results, with some glorious moments of serendipity closing in on what appears to be self-consciousness. Equally, as Ian Bogost states in Alien Phenomenology, carpentry of this unfiltered sort can often lead to offensive moments unless strict filters regulate the underlying code. (Bogost 97–100) Post-publication filtering through retweets enables the cream of the crop to be shared with a wider audience.

Sample has also engaged in more extensive guerrilla activity online. House of Leaves of Grass (HoLoG) works through several layers of ADMCA deformations. HoLoG builds on Nick Montfort and Stephanie Strickland’s Sea and Spar Between, a textual mash-up of Emily Dickinson’s poetry with Moby-Dick, instantly recognisable due to the clash in typographic and lexical content, further remediated through a schizophrenic interface that disrupts the reading experience. Sample’s guerrilla operation is to transpose the subject of remix to Walt Whitman’s Leaves of Grass – yes, freely available in the public domain – with Mark Danielewski’s House of Leaves, a text published in 2000 and still subject to copyright law. This remix survives ADMCA as long as it remains a guerrilla publication.

Guerrilla digital humanities is not limited to remediation and editions. Joyce scholarship is fraught with examples of those who have attempted unsuccessfully to explore digital applications of Joyce. FWEET approaches the problem differently by re-appropriating Finnegans Wake as a database, turning every line into a series of possibilities and connections to other parts of the text in various forms.

Playing on Lev Manovich’s Narrative-Database dichotomy (Manovich), FWEET disrupts the text on a narrative level and requires the reader to engage with the database connections that emerge within the text. Although this makes it difficult to read the text in its original form, it is possible to extract the text. FWEET works as a guerrilla publication on two levels. Firstly, it remediates a text whose copyright situation is still proving to be problematic, but more interestingly, most of the annotations are sourced from scholarly texts still protected by copyright.

A further interesting example can be found in the work of Vladimir Nabokov. The digital humanities link to Nabokov runs back to Ted Nelson’s apocryphal demonstration of Pale Fire for the Hypertext Editing System back in the late 1960s, for which Nelson received permission from G. P. Putnam’s Sons, Nabokov’s US publisher at the time. In more recent years, Brian Boyd’s Ada Online has offered an annotated version of Nabokov’s later work, Ada, or Ardor, for free online with the permission of Dmitri Nabokov. Dmitri appeared to be open to digital editions, suggesting that his father’s last unfinished novel would work well as a digital edition.

It is hard to tell how far this could go, but if we don’t ask, how do we know what the answer is? If we don’t work within the permitted boundaries of the DMCA, how can we know what is possible? As Bielstein states, “permissions are hard to avoid, but in principle you don’t want to ask permission” – asking permissions sets a precedent (Bielstein 10). Working towards integrating contemporary texts into the digital humanities can continue in the covert way as guerrilla digital humanities or it could equally push for permissions and set a precedent for what creative interpretations of literary texts are possible under fair-use. The only way forward is to test the waters.

In a conversation with René Alladaye about his brilliant new book (The Darker Side of Pale Fire – the best introduction to Pale Fire currently available, although it’s only available through currently) at the recent Nabokov & France conference, the question of translating the Index came up. In most languages, this is not a problem, because the final entry, “Zembla, a distant northern land.” (PF, 315), which works as a fitting conclusion to the narrative, will naturally come last as “Z” is the last letter of the alphabet.

In non-Latin scripts, this is more problematic, most prominently in Nabokov’s native tongue, Russian, where “З” or “Z” is ninth of 33 characters. Véra Nabokov’s translation of Pale Fire for Ardis Press works round this by rephrasing the entry:
ЯЧЕЙКА яшмы, Зембля, далекая северная страна.
[Orbicle of jasp, Zembla, a far northern country]
Я [ya] is the last letter of the Russian alphabet. This raises a further question: what is “orbicle of jasp” – a quotation from line 558, “Terra the Fair, an orbicle of jasp” (PF, 54) – doing in front of Zembla, other than helping the entry retain its position? I don’t have any immediate answers, but a deeper analysis of the differences between the English and Russian indexes will surely help. Ultimately, the flow of the narrative is more important than the index’s order, indicating the importance in the Nabokovs’ collective mind of having the Zembla entry of the index close the text.
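
The ordering problem can be made concrete with a short sketch. The 33-letter alphabet string is standard; the helper names `alphabet_position` and `sort_key` are hypothetical, and the two middle headwords (“Комментарий” and “Поэма”) are invented stand-ins, not entries taken from the Ardis index.

```python
# The 33 letters of the modern Russian alphabet, in order.
RUSSIAN_ALPHABET = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"

def alphabet_position(letter):
    """Return the 1-based position of a letter in the Russian alphabet."""
    return RUSSIAN_ALPHABET.index(letter.upper()) + 1

def sort_key(entry):
    """Order entries letter by letter, skipping spaces and punctuation."""
    return [RUSSIAN_ALPHABET.index(ch) for ch in entry.upper()
            if ch in RUSSIAN_ALPHABET]

# First and last headwords come from the post; the middle two are
# hypothetical placeholders used only to show the ordering.
entries = ["ЯЧЕЙКА яшмы", "Поэма", "Зембля", "Комментарий"]

print(alphabet_position("З"), alphabet_position("Я"))
print(sorted(entries, key=sort_key))
```

Sorted this way, a bare “Зембля” headword falls near the front of the index, while an entry beginning with Я falls at the very end.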

Thanks to Marina Savina for helping to translate the Russian and for finding the reference to “orbicle of jasp” in the poem.

V. Nabokov. 1962. Pale Fire. New York: G. P. Putnam’s Sons.
V. Nabokov. 1983. Бледный огонь [Pale Fire]. Trans. by Véra Nabokov. Ann Arbor: Ardis Press.