How Complex the Mathematic!

Duquesne University mathematician and computer scientist Patrick Juola, who devised the JGAAP authorship attribution tool, is dedicated to making complex text analysis techniques accessible to scholars outside statistics departments. In a 2008 Digital Humanities conference abstract entitled “Authorship Attribution for the Rest of Us,” he conceded that “the statistics necessary for performing [attribution] can be onerous and mathematically formidable. For example, a commonly used analysis method, Principal Component Analysis (PCA), requires the calculation of ‘the eigenvectors of the covariance matrix with the largest eigenvalues,’ a phrase not easily distinguishable from Star Trek technobabble” (250). JGAAP seeks to “hide this complexity from the user” with a friendly framework and interface.
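Out of curiosity, I tried to see how much computation actually hides behind that phrase. What follows is a minimal Python sketch under some loud assumptions: the word frequencies are invented toy numbers (not drawn from any Cather or Home Monthly text), only three function words are counted, and the top eigenvector is found by power iteration, a simple stand-in for the full eigendecomposition that a library routine such as NumPy's `numpy.linalg.eigh` would normally perform. It is not how JGAAP implements PCA. It only illustrates the idea: center the data, build the covariance matrix, find the eigenvector with the largest eigenvalue, and project each text onto it.

```python
import math

# Hypothetical toy data: rates per 1,000 words of three function words
# ("the", "of", "and") in four short texts. Invented for illustration only.
FREQUENCIES = {
    "text_A": [60.0, 30.0, 25.0],
    "text_B": [62.0, 29.0, 27.0],
    "text_C": [48.0, 38.0, 33.0],
    "text_D": [47.0, 40.0, 31.0],
}


def center(rows):
    """Subtract each column's mean so every feature averages to zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance (dividing by n - 1) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=1000):
    """Power iteration: repeatedly multiply by the matrix and renormalize.

    For a symmetric covariance matrix whose largest eigenvalue is strictly
    dominant, the vector converges to the eigenvector with the largest
    eigenvalue (its sign is arbitrary).
    """
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # For a unit vector v, the Rayleigh quotient v . (A v) is the eigenvalue.
    eigenvalue = sum(v * av for v, av in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


def first_component_scores(rows):
    """Return the largest eigenvalue, its eigenvector, and each row's score."""
    centered = center(rows)
    eigenvalue, component = top_eigenpair(covariance_matrix(centered))
    # Each text's score is its centered frequencies projected onto the component.
    scores = [sum(c * w for c, w in zip(row, component)) for row in centered]
    return eigenvalue, component, scores


if __name__ == "__main__":
    names = list(FREQUENCIES)
    eigenvalue, component, scores = first_component_scores(list(FREQUENCIES.values()))
    print("largest eigenvalue:", round(eigenvalue, 2))
    print("loadings (the, of, and):", [round(w, 3) for w in component])
    for name, score in zip(names, scores):
        print(name, round(score, 2))
```

With these made-up numbers, texts A and B land on one side of the first component and texts C and D on the other, and the loadings show which words drive that split. That last point is exactly the ease of explanation that keeps PCA popular despite its modest effectiveness.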

In an abstract submitted to the 2009 Digital Humanities conference, Juola reiterated that the main goal of JGAAP is to make the tool accessible for “general-purpose use,” which does not require “a high degree of expertise from the user” (357). But how is an inexpert user expected to interpret and contextualize the results of JGAAP analyses, which arrive as neat author assignments in the graphical interface, backed by multiplying, dizzying lines of text and numbers in the Terminal window?

Over the past several weeks, I have used JGAAP to predict the authorship of several unsigned journalism pieces that appeared in Home Monthly circa 1896-1897, in an effort to ascertain whether Willa Cather might have written them. I have learned, however, that the tool's promise of bringing statistics to the masses has yet to be fully realized. In his writings on authorship attribution, Juola points out that even those comfortable with the mathematics of computational authorship attribution may not be able to explain why it works (or fails to). In a hundred-page discourse that appeared in 2006, he writes: “Understandability remains a major problem. One reason that PCA [Principal Component Analysis] and similar algorithms remain popular, despite their modest effectiveness, is that the reasons for their decisions can easily be articulated. …For researchers more interested in the ‘why’ than the ‘what,’ this ease of explanation is a key feature…” (321).

JGAAP's user-friendly interface, then, belies the range and complexity of the predictions it serves up to the attributionist. This past week, I have been talking with the enduringly helpful Brian Pytlik Zillig, Kay Walter, and Andy Jewell about how best to bring my foray into authorship attribution to a close. I've concluded that, rather than emerging with a definitive answer to the question of authorship surrounding the Home Monthly articles, I will examine more traditional methods of attribution and review the predominant computational methods. Additionally, I will offer my thoughts on the viability of current computational methods and their potential for use in the Center.

A final note for the week: as I was drafting this post and thinking about Juola's characterization of computational attribution as a “forensic discipline” (a sort of literary lie detector that might be used to fish out plagiarists or the authors of ransom notes), I noticed a June press release from CLIR reporting that it had received a Mellon grant “to investigate the possible relevance of declassified tools developed by the intelligence community to humanistic scholarship.” I'm looking forward to reading the CLIR report when it is released and learning more about this forensic interplay.

(Image: “Jan Hambourg and Willa Cather swimming in the south of France,” 1920, Philip L. and Helen Cather Southwick Collection, Archives and Special Collections, University of Nebraska-Lincoln.)
