Have you ever wondered about the structure of our discipline? How does it hang together? Most importantly, what would it feel like to destroy it? To drag it around like a rag doll? To tear it apart piece by piece? Click through and wonder no more…
Further down the page are some brightly colored balloons. But what are they and what does it all mean? Bear with me and I’ll explain. This visualization was born from a frustrated attempt to split the discipline of Classics into clean categories. My goal was to continue refining my meta-analysis (the ‘scoring engine’) for Bryn Mawr reviews, and see if there were any noticeable trends or differences when I grouped books by sub-discipline, like archaeology or philosophy. The trouble was that reliably separating books into those sub-disciplines proved rather difficult. WorldCat’s subject tags are incomplete, and almost never give a single answer. Almost every Classics book that carries the tag ‘literature’ also carries at least one for ‘history’ or ‘philosophy’, or both. This is not meant as criticism of WorldCat—in most cases, that is a wise choice. And that same trend toward interdisciplinarity made it difficult, bordering on impossible, for me to assign a single label to books so that I could train my computer to classify them.
Below is the result of an attempt to crack that problem from a different angle. I decided to look at the titles of all 11,897 books that were submitted to the BMCR for review over the last three decades (until September 2015). I broke each title down into its individual words, and then recast them as a network, where the nodes (the circles) are the words themselves, and edges (the lines) connect any words that have appeared together in a title. The hope was that by visualizing the discipline in this way, I might be able to detect some underlying structure that would make books easier to categorize. This approach shows some promise for classification, but I wondered if there might not be some value in simply showing people the overall structure of our discipline.
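For the curious, here is a rough sketch of how titles can be turned into that kind of network. It is not the code behind the visualization: the three titles are invented, the stopword list is a placeholder of my own, and the function names (`tokenize`, `build_network`) are hypothetical. It only illustrates the idea that words become nodes and shared titles become edges.

```python
import re
from collections import Counter
from itertools import combinations

# Placeholder stopword list (an assumption; the real list is not given here).
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "on"}

def tokenize(title):
    """Lowercase a title and return its words, minus stopwords."""
    words = re.findall(r"[^\W\d_]+", title.lower())
    return [w for w in words if w not in STOPWORDS]

def build_network(titles):
    """Return (node_counts, edge_counts) for a list of title strings.

    node_counts: total occurrences of each word (the circle sizes).
    edge_counts: number of titles in which a pair of words appears together.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for title in titles:
        words = tokenize(title)
        node_counts.update(words)
        # Each unordered pair of distinct words in a title is one edge.
        # Sorting makes ("athens", "women") and ("women", "athens") the same key.
        edge_counts.update(combinations(sorted(set(words)), 2))
    return node_counts, edge_counts

# Invented titles, for illustration only.
titles = [
    "Plato and the Athenian Democracy",
    "Women in Classical Athens",
    "Plato on Women",
]
nodes, edges = build_network(titles)
```

In this toy example `nodes["plato"]` is 2, and there is one edge between ‘plato’ and ‘women’ because they share a single title. The two counters are the raw material that a library such as d3 would need to draw the circles and lines.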
At first glance, it’s a bit of a hairball. Hover over the circles to see what words they represent. The circles are sized according to the total number of occurrences of a given word, and colored… well, I’ll leave that as a test. There are four colors, and they are assigned to words according to whether WorldCat would tend to label the word as an Archaeology, Literature, History, or Philosophy term. See if you can figure out which color corresponds to which label (it isn’t always obvious). The visualization is also interactive. Drag nodes around to get a sense of what they are tied to: double-click to destroy a node, and see how the network reshapes itself (yes, I know it’s 2016, but if you also want to be able to click and drag, you have to accept double-clicks. Mobile users, it’s ‘double-tap’ to you).
As an aside, this week I learned a hard lesson about making assumptions. I assumed that WordPress would allow me to embed interactive data visualizations, in addition to the static ones I usually show you. They do not. They seek and destroy “<script>” tags with extreme prejudice. To enjoy the interactivity, you’ll have to go here (it will open in a new window). Please do. Not being able to show you cool new things makes me sad. Also, seriously: this is your chance to pop Cicero like a balloon. Don’t miss it!
Network of Common Words in Classics
So, as viscerally satisfying as popping Cicero like a balloon might be, that isn’t the only reason to play with this. There can be practical benefits to understanding the structure of the discipline. Considering the field as a network of ideas can help you find and model an audience for your own work. You probably remember Marshall McLuhan’s famous dictum “the medium is the message”: it actually has a less-famous second half, “the audience is the content.” In scholarly publishing, the full quote is especially apt. People who write books about, for instance, Plato, also perforce read books about Plato. So if you want to find a bigger audience for your own book, or to see how big the audience might be, you can count up the values for all nodes that represent words in your own title. But no one writes a book or even an article about just a single idea. Any book is always a collection of ideas, so people writing a book that touches on any part of that collection can also be expected to read your work.
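Counting up the nodes for your own title can be sketched in a few lines. The numbers below are invented, and the function names (`audience_size`, `overlapping_titles`) are my own for this illustration. The first function adds up word frequencies, which counts a book twice if its title contains two of your words; the second counts each overlapping title once, which is probably the fairer estimate.

```python
from collections import Counter

def audience_size(title_words, node_counts):
    """Sum the node values for every distinct word in your title.

    Words that are not in the network contribute nothing. This is an
    upper bound: a book that shares two words with you is counted twice.
    """
    return sum(node_counts.get(word, 0) for word in set(title_words))

def overlapping_titles(title_words, corpus):
    """Count the titles in the corpus that share at least one word with yours."""
    wanted = set(title_words)
    return sum(1 for words in corpus if wanted & set(words))

# Invented node values, for illustration only.
node_counts = Counter({"plato": 120, "athens": 95, "democracy": 40})
my_title = ["plato", "athens", "rhetoric"]
estimate = audience_size(my_title, node_counts)

# Invented corpus of already-tokenized titles.
corpus = [
    ["plato", "athens"],
    ["women", "athens"],
    ["cicero", "rome"],
]
shared = overlapping_titles(my_title, corpus)
```

Here the first estimate is simply the ‘plato’ and ‘athens’ values added together, while the second finds two of the three toy titles touching on some part of your collection of ideas.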
Below is an area chart that I hope can help you understand some of the possibilities. It shows the 75 most common ideas (i.e., words) in Classics books over the last 25 years. In the default version, the circles’ radii are scaled according to simple frequency; if you click on the chart, it will switch to scaling them according to their ‘centrality’ within the network, specifically their eigenvector or Bonacich centrality.
Find Your Audience:
Frequency and Network Centrality of Common Words
All right: what on earth is that, and why should you care about it? In a network, ‘centrality’ is essentially a measure of how well-connected something is. People who have more friends can be said to be more central to a social network. Bonacich centrality is an attempt to measure not only how connected someone/something is, but how powerful it is within the network, or how much influence it has on the people/things connected to it. In our context, power means “For this idea, how likely is it that someone who is writing about an adjacent topic will also read a book about this?”
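If you would like to see the idea in motion, here is a minimal sketch of plain eigenvector centrality computed by repeated averaging (power iteration). It assumes the simplest variant, with no tuning parameters; the chart may well have been built with different settings. The edge list is invented, and `eigenvector_centrality` is a hand-rolled function for this illustration, not a library call.

```python
import math

def eigenvector_centrality(edges, iterations=200, tol=1e-9):
    """Score each node by how well-connected its neighbors are.

    edges: iterable of (word_a, word_b) pairs, treated as undirected.
    Returns a dict mapping each node to a score, normalized to unit length.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    score = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # New score = own score + sum of neighbors' scores. Keeping the
        # node's own score stops the iteration from oscillating on
        # tree-like networks; it does not change the final ranking.
        new = {
            node: score[node] + sum(score[m] for m in neighbors[node])
            for node in neighbors
        }
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {node: v / norm for node, v in new.items()}
        if max(abs(new[n] - score[n]) for n in new) < tol:
            return new
        score = new
    return score

# Invented edges, for illustration only.
edges = [
    ("plato", "athens"),
    ("plato", "philosophy"),
    ("plato", "socrates"),
    ("athens", "democracy"),
]
centrality = eigenvector_centrality(edges)
ranking = sorted(centrality, key=centrality.get, reverse=True)
```

In this toy network ‘philosophy’ and ‘democracy’ each have exactly one connection, yet ‘philosophy’ scores higher because its one neighbor is the well-connected ‘plato’. That is the difference between counting friends and measuring influence.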
For example, let’s say that you want to write a book about democratic Athens (and that you followed the link above so that you can see the bubbles re-size themselves according to network centrality; the ones shown here are only sized by raw frequency). This is already a strong choice of subject, since there are many books about Athens and hence a large audience. Now, what angle should you tackle this from, and what should you highlight? By measures of centrality, ‘Plato’ punches far above his (reputedly quite substantial) weight, while ‘Euripides’ punches below his and ‘women’ completely disappear, much to the delight of Thucydides, who is probably just jealous because he doesn’t make the chart at all. So a book about Athens that features enough of Plato to put him in the title is more likely to be widely read than many other alternatives.
It is important to note again that the chart only shows the 75 most common words by raw frequency. It does not necessarily show the most powerful ones. For that, you’ll have to wait for another installment.
This post was written by the Library of Antiquity’s data analytics specialist, who still prefers to remain anonymous. The analysis and visualizations were made using the Natural Language Toolkit and d3.