
Topic Modeling

When I chose the State of the Union speeches for this week’s assignment, I certainly did not expect what I found when looking for possible research topics using techniques from the digital humanities. Out of more than two hundred years of historical subjects that past presidents of the United States deemed important enough to address before Congress, what topics would I end up with using Miriam Posner’s techniques for developing and interpreting one’s own topics in the digital humanities? The answer, as I have already said, was unexpected, and it served to illustrate both the limitations and the surprising benefits of taking this digital, macroscopic view of history.
Once I had downloaded and unzipped the file containing the 223 presidential speeches to Congress, ranging from 1790 to 2013, I ran them through the Topic Modeling Tool to see what kind of topics it would come up with. Relying upon the experience of Professor David Thomas, I set the program to generate twenty-five topics, each showing its first twenty-five words. As expected, I received twenty-five different lists of words that the Topic Modeling Tool suggested were connected in some way. It was difficult to find a topic that caught my eye because, while some topic word clusters contained certain words that stood out, those words seemed to have no connection with the other words in their respective clusters, given the nature of the historical documents being analyzed. For instance, Topic 15 looked curious as it contained the words “america,” “americans,” and “American,” and I was intrigued by the fact that so many words referring back to this country were found in one topic. When I clicked on this topic and browsed through the first eight or so documents pertaining to it, I found no distinct pattern in why presidents chose to use these words. If I had taken the time to review all of the cited documents connected with this topic cluster, then perhaps a pattern would have arisen, or at least a few overall objectives. Still, when I went back to the list of random topic clusters and read them over a few times, something did jump out at me as curious. Given that the history of the United States is tightly interwoven with the histories of other countries, it was odd that very few countries were brought up in the topic clusters as subjects of discussion in the presidential speeches. The only countries that did appear within the listed topics were Britain, China, Cuba, the Soviet Union, Mexico, and Spain.
Out of these listed countries only Britain and Spain were listed in more than one topic cluster. Why were these two countries more prominent than any of the others past presidents have mentioned in their speeches?
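The check I did by eye, looking for which countries show up in more than one topic cluster, could also be automated. What follows is only a minimal sketch in Python, not what the Topic Modeling Tool itself does. The function names (`countries_by_topic`, `recurring`) are my own inventions, and the topic word lists are made up purely for illustration; they are not the clusters the tool actually gave me. The `top_n` parameter stands in for the number of words shown per topic.

```python
from collections import defaultdict

# Country tokens named in the post; "soviet" stands in for the Soviet Union.
COUNTRIES = {"britain", "china", "cuba", "soviet", "mexico", "spain"}


def countries_by_topic(topics, countries=COUNTRIES, top_n=25):
    """Map each country token to the topic ids whose first top_n words include it."""
    found = defaultdict(list)
    for topic_id, words in sorted(topics.items()):
        # Only look at the first top_n words, as the tool's display would.
        top_words = {word.lower() for word in words[:top_n]}
        for country in sorted(countries & top_words):
            found[country].append(topic_id)
    return dict(found)


def recurring(found, min_topics=2):
    """Return the countries that appear in at least min_topics topics."""
    return sorted(c for c, ids in found.items() if len(ids) >= min_topics)


# Hypothetical topic word lists, invented for this example only.
example_topics = {
    1: ["war", "spain", "army", "navy", "britain"],
    2: ["trade", "commerce", "spain", "treasury"],
    3: ["treaty", "britain", "mexico"],
    4: ["america", "americans", "american"],
}

found = countries_by_topic(example_topics)
print(found)
print(recurring(found))
```

Raising or lowering `top_n` changes how many words per topic are searched, so the same script could be rerun to see whether showing more words per cluster brings more countries into more than one topic.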
The first possibility is that if I expanded the number of words presented in each topic cluster when running the Topic Modeling Tool, I might well find more countries referenced in more than one topic cluster. However, if one accepts that the parameters I had set were good enough to develop topic ideas for digital projects, then this phenomenon is worth investigating. It is not so surprising to find Great Britain appearing as one of the two most common connecting threads between topics. This is because, more than any other country, Great Britain has been strongly connected to American history (whether as an enemy or an ally), so it is likely to be connected to many different subjects of discussion in these presidential addresses. Spain, on the other hand, is a rather curious country to appear so prominently. Of the two topics that included it, the first contained many words referring to war. My immediate assumption was that this word cluster could be labeled as documents referring to the Spanish-American War. The first few documents linked to this word cluster seemed to support this, purely because they were all speeches delivered during the late 19th century, including that century’s last two years: 1898 and 1899. However, when I read these first few speeches, there was no mention of Spain or war, but instead of the prosperity of the country. Therefore, I decided to take a look at the second word cluster that referenced Spain, only to find words and documents referring to the United States’ supposedly flourishing economy. Had I looked through all the cited documents, I might have found more references to war, specifically the Spanish-American War.
However, based on what I did find, I think the reason Spain was referenced more than once is that the Spanish-American War played a major role in what I now think is the bigger topic connecting these two word clusters: Pre-World Wars American Imperialism. It was because of this war that the United States was able to gain control of the Philippines and start its own empire (granted, a small one considering the territory that the European countries were controlling). Control of the Philippines did lead to newfound economic prosperity in the United States.

This connecting thread, I argue, is the main reason Spain appears more prominently than most of the other countries. I could very well be wrong considering I have not done as much investigation as is normally required in these situations, but for the purpose of this assignment, it seems like a logical theory to consider.


