Help with Research: Using Tesserae for intertextuality, part 4

In our previous posts, we’ve talked about what intertextuality means, how computers can help you locate it, the differences between intertextuality and discourse analysis, how Tesserae can help you with the latter in particular, and how to limit the number of results you get in a Tesserae search. In this post, we finish going through the advanced features and talk about Tesserae’s most innovative search type: sound analysis.

In the last post, we looked at about half of the advanced features.

[Screenshot: the advanced search options in Tesserae]

We’ll go through these in order.

The frequency basis is a companion and refinement to the stop words and scores menus. Stop words are so common that you tell the computer to ignore them; frequent words are merely common enough that they are less likely to be significant. The computer doesn’t ignore them. Instead, it gives matches with a high number of frequent words a lower score. But sometimes the computer is wrong! (Remember our example with hoc opus, hic labor est?) Just because words are common doesn’t mean that they aren’t intertextual; you need to review them and make a judgment call.

The commonness of words is a spectrum: there aren’t a few words that are extremely common and then a bunch of words that are rare, but rather all words can be slotted in on a line of “common” to “uncommon”. Words that are common are more likely to appear together (in the same group of texts) than words that are not common, and because they’re more likely to appear, it’s less likely to be significant when they do. Because the texts we read don’t operate in a vacuum, the commonness or rareness of words is defined by text. For example, if both your source and target texts are about dogs, it won’t be unexpected or interesting that both contain the phrase Good boy! That’s pretty frequent when talking about dogs, and that is why it appears in both texts. The same phrase appearing in two texts about fish probably is interesting, and may well be an intentional echo/intertext, since it is much less common in this context.

Tesserae gives you the option of deciding what words are common based on the texts (target/source) and on the corpus (all texts on Tesserae). Your results will be more finely targeted if you choose texts, because then you are ranking the commonness of words against the same text — this is the equivalent of saying that the word terrier is common in our dogs book, but not common in the fish book. But if you were looking globally at all texts, terrier isn’t that common, and might fly under the frequency radar. So by choosing texts, you are more likely to receive results that are relevant.
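To make the texts-versus-corpus difference concrete, here is a minimal Python sketch. It is not Tesserae's own code: the three tiny "texts" are invented, and relative_frequencies is a hypothetical helper that simply measures each word's share of all the words in a list.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Return each word's share of all tokens in the given list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy stand-ins for one text and for the wider corpus (invented data).
dogs_book = "good boy the terrier ran and the terrier barked good boy".split()
fish_book = "the trout swam and the salmon swam upstream".split()
speeches = "the senator spoke and the crowd listened to the long speech".split()
corpus = dogs_book + fish_book + speeches

by_text = relative_frequencies(dogs_book)
by_corpus = relative_frequencies(corpus)

print(f"terrier within the dogs book: {by_text['terrier']:.3f}")
print(f"terrier across the corpus:    {by_corpus['terrier']:.3f}")
```

In this toy data, terrier takes up a larger share of the dogs book than of the corpus as a whole, so a texts-based frequency ranking would treat it as more common (and therefore less significant) than a corpus-based one would.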

The Maximum distance and Distance metric menus are related. Distance is how you define what constitutes a phrase. If we think of a simple sentence, like Marcus in villam Quinti ab horto ambulavit, you as a person can group the phrases fairly easily: Marcus… ambulavit, ab horto, in villam Quinti. But how do you know that Quinti depends on villam, rather than horto? Because you’ve read a lot of Latin. The computer is not as sophisticated as an advanced Latinist. The distance metrics are there to help you tell the computer how to chunk its Latin.

Let’s say we wanted the computer to recognize villam Quinti as a phrase (in, like most prepositions, is a stop word) even if there were further qualifiers: villam pulchram Quinti, for example. To suggest an intertext between the two-word phrase and the three-word phrase, we would need to include a maximum distance of three. This part is a little counter-intuitive: the maximum distance includes all of the words that you think are part of the phrase. In other words, Tesserae, like the Greeks and Romans, counts inclusively.
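Inclusive counting is easy to get wrong, so here is a small Python sketch of the idea. The function names are my own, not anything from Tesserae, and the maximum distances of 2 and 3 are used purely to mirror the example above.

```python
def inclusive_distance(tokens, first, second):
    """Count the words from `first` to `second`, including both ends."""
    start = tokens.index(first)
    end = tokens.index(second)
    return abs(end - start) + 1


def within_max_distance(tokens, first, second, max_distance):
    """True if the phrase fits inside the chosen maximum distance."""
    return inclusive_distance(tokens, first, second) <= max_distance


short_phrase = ["villam", "Quinti"]
long_phrase = ["villam", "pulchram", "Quinti"]

print(inclusive_distance(short_phrase, "villam", "Quinti"))      # 2
print(inclusive_distance(long_phrase, "villam", "Quinti"))       # 3
print(within_max_distance(long_phrase, "villam", "Quinti", 2))   # False
print(within_max_distance(long_phrase, "villam", "Quinti", 3))   # True
```

The "+ 1" is the whole point: subtracting positions alone would give 1 and 2, which is how most of us count gaps but not how the maximum distance counts words.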

The Basic Search has no maximum distance, and you may decide that this is the best way to go for your search (anything involving Horatian odes, for example, probably will not want to use this feature). But if you’re working with texts that are commonly end-stopped, like pentameters, or extremely short texts, like epigrams, you might want to experiment with the maximum distance. You can choose no max, 5 words, 10 words, 20 words, 30 words, 40 words, and 50 words; for the cases that I mentioned, I would choose 10 or 20 words for most pentameter lines, but maybe 40 or 50 for epigrams (remembering that Tesserae groups texts by the book, rather than by the individual poem, whereas you are probably interested in results within a single poem).

The distance metric adjusts how the computer measures distance in relation to score (or relevance). The default is frequency, and it works by having the computer find the two least common words in the space delineated by the maximum distance. So if you’ve chosen 10 words, it will find the least common two of the ten, on the assumption that they are the most significant. You can also choose span, which refers to how large the distance between words is. Words that are closer are ranked more highly.

Tesserae lets you choose whether you want to use the frequency or span from both sources added together as the distance metric (using our example above, that would give us a distance of 5: villam Quinti is 2, and villam pulchram Quinti is 3), or whether you want to use either based solely on the source text or solely on the target text. Because lower distances are deemed more relevant, choosing only one text may give you higher scores, but won’t necessarily change the relevance of your parallels.
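Here is how I picture the two metrics and the three ways of combining them, as a rough Python sketch. This is my reading of the menus, not Tesserae's actual implementation: the function names are hypothetical, the word frequencies are invented, and I am assuming that both metrics count inclusively.

```python
def inclusive(a, b):
    """Inclusive count of the positions from a to b."""
    return abs(a - b) + 1


def span_distance(matched):
    """Distance between the outermost matching words in a phrase."""
    positions = list(matched.values())
    return inclusive(min(positions), max(positions))


def frequency_distance(matched, frequency):
    """Distance between the two least common matching words in a phrase."""
    rarest, second_rarest = sorted(matched, key=lambda w: frequency[w])[:2]
    return inclusive(matched[rarest], matched[second_rarest])


def combine(source_distance, target_distance, basis="both"):
    """Use the source only, the target only, or both added together."""
    if basis == "source":
        return source_distance
    if basis == "target":
        return target_distance
    return source_distance + target_distance


# Matching words mapped to their positions in each phrase.
source = {"villam": 0, "Quinti": 1}    # villam Quinti
target = {"villam": 0, "Quinti": 2}    # villam pulchram Quinti
print(combine(span_distance(source), span_distance(target)))            # 5
print(combine(span_distance(source), span_distance(target), "source"))  # 2

# villam Quinti ab horto, with made-up frequencies for illustration only.
phrase = {"villam": 0, "Quinti": 1, "horto": 3}
invented_frequency = {"villam": 0.004, "Quinti": 0.001, "horto": 0.002}
print(span_distance(phrase))                           # 4
print(frequency_distance(phrase, invented_frequency))  # 3
```

The last two lines show why the choice matters: with three matching words, span measures from villam to horto, while frequency measures only between the two rarest words, Quinti and horto.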

Lastly, there’s the Feature menu. This is the most exciting and innovative feature of Tesserae, in my opinion. This menu allows you to choose whether you are matching the same word(s) or semantically related words: for example, bellus with pulcher, or proelia, bellum, and pugna. Although this feature is still being refined (and we’re excited to have a member of the Tesserae team talk about these advanced features in an upcoming post!), it is much closer to a human reader than other word-searching tools, like those found in Perseus.

Your options for the Feature menu are a little complex. The default is lemma, which is the dictionary entry of any given word. You can choose to search these terms more narrowly, by exact word (if, for example, you were interested in how many times Vergil uses the same set phrase, this would be a good way to find it). You can also choose to broaden your search by choosing semantic match, which would get you words that have broadly similar meanings, as seen above. Tesserae will also let you combine lemma + semantic match. Lastly, you can search by sound, which tries to capture words that have similar auditory effects. This is a feature that I can see being more useful in studies of poetry, where the sound effects of individual lines help enhance the meaning: for example, the r sound indicating anger.
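The difference between exact word, lemma, and semantic matching can be sketched in a few lines of Python. Everything here is a toy: LEMMAS and SYNONYM_SETS are tiny hand-made tables I wrote for illustration, whereas Tesserae draws on full dictionaries, and the matches function is a hypothetical name of my own.

```python
# Hand-made toy tables; real tools rely on full lexicons.
LEMMAS = {
    "proelia": "proelium",
    "pugnam": "pugna",
    "pugnas": "pugna",
    "pulchram": "pulcher",
}
SYNONYM_SETS = [
    {"proelium", "bellum", "pugna"},
    {"bellus", "pulcher"},
]


def lemma_of(form):
    """Look up the dictionary entry; fall back to the form itself."""
    return LEMMAS.get(form.lower(), form.lower())


def matches(a, b, feature="lemma"):
    """Decide whether two word forms match under the chosen feature."""
    if feature == "exact":
        return a.lower() == b.lower()
    lemma_a, lemma_b = lemma_of(a), lemma_of(b)
    same_lemma = lemma_a == lemma_b
    related = any(lemma_a in s and lemma_b in s for s in SYNONYM_SETS)
    if feature == "lemma":
        return same_lemma
    if feature == "semantic":
        return related
    if feature == "lemma+semantic":
        return same_lemma or related
    raise ValueError(f"unknown feature: {feature}")


print(matches("pugnam", "pugnas", "exact"))      # False
print(matches("pugnam", "pugnas", "lemma"))      # True
print(matches("proelia", "pugnam", "lemma"))     # False
print(matches("proelia", "pugnam", "semantic"))  # True
```

Each step down the menu casts a wider net: exact word finds only identical forms, lemma finds any form of the same dictionary entry, and semantic match reaches across entries with related meanings.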

As an example of how the sound effects work, here are the top five results of running our 200-stop word, target+source search with sound rather than lemma selected as the feature:

[Screenshot: Tesserae search results for Vergil’s Aeneid with sound selected as the feature]

As you can see, our top hit from before has now moved down to #5, while all of the other results are new. And while some of the words are similar (such as umbram and inumbrant) or the same (frondis/frondes), other words have the same sound effect but come from markedly different stems (mediae, odia). This is an innovative feature that brings a lot of new results to the fore.
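One simple way to approximate "similar sound" on a computer is to compare overlapping three-letter chunks (character trigrams) of each word. I am assuming that approach here purely for illustration; this sketch is not Tesserae's code, and the function names are my own. It does at least show how words from different stems can end up matching.

```python
def trigrams(word):
    """Return the set of overlapping three-letter chunks in a word."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


def shared_trigrams(a, b):
    """Three-letter chunks that two words have in common."""
    return trigrams(a) & trigrams(b)


for pair in [("umbram", "inumbrant"), ("frondis", "frondes"), ("mediae", "odia")]:
    print(pair, sorted(shared_trigrams(*pair)))
```

Under this scheme umbram and inumbrant share bra, mbr, and umb, while mediae and odia share only dia: different stems, but an overlapping run of sounds.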

Since we know that many, if not all, ancient texts were meant to be read aloud, searching by sound effect has the potential to open up new areas of inquiry. Does Vergil use -c- sounds at the beginning of lines more often? Do sad lines often end with an -eu- sound, like eheu? These are questions that I could now try to address using this feature.

So far, we’ve introduced all of the Latin-language tools on Tesserae. Tesserae is also working on Greek-Latin bilingual searches. Our next Tesserae post will be a guest post by a Tesserae team member explaining how to make the best use of those bilingual features!


~j.
