Do digital humanists need data visualization?
The answer is yes — but. To elaborate, let me tell you a story about my experience with Mae as we worked on this project last summer. As we were preparing to present the results of our preliminary topic model of British books (1757-1795), we ran into a roadblock. We were collaborating with a local ed-tech startup, Windrush, who were putting together a dynamic, year-by-year topic network graph based on our data. They would set it up on their server; then Mae would show it off when she presented our work for Skidmore’s Faculty-Student Summer Research program. That was the plan. But the date for the presentation was coming up, and Windrush was swamped with other projects; they hadn’t been able to update their servers. I was beginning to worry that we might not have any compelling visuals to project behind Mae as she spoke! For a brief moment, we were stuck with the data itself.
This was initially discomforting. A lot of people working on DH projects have emphasized visualization — understandably so. Without speaking for others in the field, I will admit to being very excited that literary scholars might now have recourse to colorful pie charts and scatterplots to point at while talking. I had already used Gephi to put together a static graph of the model, and we had some ho-hum line charts, but I didn’t yet know enough about d3 to do anything really visually interesting on my own. We were also in that awkward forty-eight-hour period before the presentation — too little time to do anything ambitious — too much time to sit and do nothing.
My grouchiness manifested itself in a sour-grapes rant. “Who cares about visualization anyway? What a bunch of superficial window-dressing it all is. We don’t need to visualize our data.” I didn’t actually believe this. But I decided to behave as if I did. So I wrote an anti-visualization script. It didn’t make any pretty pictures; instead, it took the same measurements we used to create the topic graphs, and used them to print out lists of books.
This turned out to be more exciting than it might sound. Look at this graph:
The edges are weighted; topics that appear together more frequently in particular books are connected by thicker edges.[1] So for example, that link between Character and International Affairs? That’s dark because lots of individual books seem to talk about both of those topics together.
Now why is this interesting? As I look at this graph and ask myself that question, I immediately think that it’s interesting because it says something about how people in the eighteenth century were thinking about Character and International Affairs — it gives us a picture of this little corner of eighteenth-century readers’ and writers’ discursive universe. But that’s a very abstract idea. What’s a discursive universe? As soon as I started printing out lists of books, it seemed interesting for a completely different reason. The script I wrote printed out the top fifty books that contributed to that edge between Character and International Affairs — the books in which they appeared most often together. Suddenly I wasn’t looking at a discursive universe anymore; I was looking at books — representations of books, yes, but representations of physical books, not abstract concepts floating in cultural space. That thick edge between the two topics had become a list of things.
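For readers who want to see the mechanics, here is a minimal sketch of how a script like that might work. It is not the script itself. It assumes each topic is stored as a list of per-book proportions (as the footnote describes), computes the edge weight as cosine similarity, and ranks books by the product of the two topics' proportions in that book, which is each book's term in the dot product. That ranking rule, the function names, and the toy numbers are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors of topic proportions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a topic that appears in no book has no edges
    return dot / (norm_u * norm_v)


def top_books_for_edge(titles, topic_a, topic_b, n=50):
    """Return the n titles that contribute most to the edge between two topics.

    Assumption: a book's contribution is the product of the two topics'
    proportions in that book, i.e. its term in the dot product above.
    """
    contributions = [
        (a * b, title) for title, a, b in zip(titles, topic_a, topic_b)
    ]
    contributions.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in contributions[:n]]


# Toy data, invented for illustration: the proportion of each book
# assigned to each topic.
titles = ["Book A", "Book B", "Book C", "Book D"]
character = [0.40, 0.05, 0.30, 0.00]
international_affairs = [0.35, 0.50, 0.10, 0.20]

edge_weight = cosine_similarity(character, international_affairs)
top_two = top_books_for_edge(titles, character, international_affairs, n=2)

print(f"edge weight: {edge_weight:.3f}")
for title in top_two:
    print(title)
```

The point of the second function is the one made above: the same numbers that set the thickness of an edge can be unpacked into a ranked list of titles.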
As soon as I started thinking about those edges as being made of books, I realized that this graph has a physical interpretation. Each edge on this graph roughly represents the probability that if you, an eighteenth-century reader, had picked up — leafed through — read — reviewed — critiqued — disfigured — destroyed books about Character, you would also have been doing those things to books about International Affairs. And vice-versa.
When literary scholars talk about visualizations and statistical modeling, we seem to be talking about immaterial abstractions. We seem to be turning away from the material. But that’s not necessarily so. In fact, there are ways to conceptualize digital practices as things that allow us to engage, in even more thoughtful and theoretically informed ways, with things.
I sometimes worry that we obscure that fact when we emphasize visualization too heavily. Visualization allows us to take any kind of data and make it look like a thing. But it’s easy to forget whether the thing we then see is the thing we really ought to be looking at.
1. For the maths-oriented: we used cosine similarity between topics represented as proportion-of-book vectors.