The most famous rejected Master’s thesis
Kurt Vonnegut (1922-2007) was a brilliant American author best known for his novel Slaughterhouse-Five. Before his literary fame, he enrolled in the University of Chicago’s anthropology Master’s program but never finished.
Vonnegut submitted two Master’s theses that were both, unfortunately, rejected. In one thesis, he presented the idea that stories have a shape, based on the happiness of characters over the span of the work. For instance, a character can start the story contented, then hit a low point, only to rally at the end. Think of Cinderella or Star Wars. Or a character can start in a low place and finish either happy or sad, and so on.
Let’s call this Vonnegut’s thesis. Vonnegut lectured and wrote about his thesis often during his career.
While Vonnegut’s thesis is intuitively appealing and jibes with our experiences as consumers of fiction, it had not been proven rigorously.
While there is no precise definition for big data, by consensus it is the study of data sets that are too large or complex to analyze by hand. For instance, we could consider the entire Facebook social network, and the friendship interactions that occur between users. This dynamic social network has over a billion nodes and many more edges. The analysis of the vast Facebook network is in the realm of big data.
Big data lies at the intersection of computer science, mathematics, and engineering. It has applications in every discipline, and it is revolutionizing the way we view the world.
Every time you visit Amazon or Netflix and you see recommendations based on your buying or browsing habits, there are hidden big data analytics at work. The absolute monarch of big data is Google, which regularly scans trillions of web pages to answer search queries. Google uses sophisticated mathematical algorithms such as PageRank to rank the importance of web pages.
Thanks to big data analytics, the personalization of advertising, healthcare, finance, and most other aspects of our daily lives is the clear trend. We may not yet be at the level of the scene below from Minority Report, but this vision will be a reality soon.
Six Degrees of Emotional Arcs
A recent article presents convincing evidence for Vonnegut’s thesis. Big data researchers from the U.S. and Australia plotted the relative happiness or sadness of 1,737 stories. They used their own tool, called the Hedonometer, which measures happiness or sadness based on the frequency of certain words. Happy or positive words include “love” and “win”, while sad or negative words include “war” and “sick”.
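To make the word-counting idea concrete, here is a minimal sketch of a frequency-weighted happiness score. It is only an illustration under stated assumptions: the name `average_happiness`, the 1 to 9 scale, and the four scores in `TOY_SCORES` are all invented for this example and are not taken from the researchers' actual word list or code.

```python
import re
from collections import Counter

# Hypothetical happiness scores on a 1 (sad) to 9 (happy) scale.
# These numbers are invented for illustration only.
TOY_SCORES = {"love": 8.4, "win": 7.5, "war": 1.8, "sick": 2.0}

def average_happiness(text, scores=TOY_SCORES):
    """Frequency-weighted mean score of the scored words in `text`.

    Words missing from the lexicon are ignored.
    Returns None when no word in the text has a score.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total

happy = average_happiness("They win the game and fall in love, love, love.")
sad = average_happiness("The war left the whole village sick.")
print(happy, sad)  # the first passage scores well above the second
```

A word that appears three times counts three times, which is what "based on the frequency of certain words" amounts to in this toy version.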
Here is what they did. They broke each story into windows of 10,000 words (about thirty pages). They used the Hedonometer to generate a sentiment score for each window, and then slid the window across the book to plot the emotional arc of the story. The featured image of this post is the result of this experiment for J.K. Rowling’s Harry Potter and the Deathly Hallows.
Data sets came from Project Gutenberg, which is an open-source project that digitizes fictional and cultural works. Using three distinct approaches (singular value decomposition, Ward’s method, and machine learning), the research team clustered these emotional arcs into clearly defined curves. Every story conformed to one of six curves:
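The sliding-window step can be sketched in a few lines. This is a rough illustration, not the researchers' code: the function name `emotional_arc`, the `step` parameter (how far the window moves each time, which the description above does not specify), and the stand-in scorer `toy_score` are all assumptions made for the example.

```python
def emotional_arc(words, score_window, window=10_000, step=1_000):
    """Score overlapping windows of `words`, moving forward `step` words each time."""
    if not words:
        return []
    if len(words) <= window:
        # Story shorter than one window: score it as a single chunk.
        return [score_window(words)]
    return [score_window(words[start:start + window])
            for start in range(0, len(words) - window + 1, step)]

# Hypothetical stand-in scorer: +1 for "love", -1 for "war", averaged over the window.
def toy_score(chunk):
    return sum((w == "love") - (w == "war") for w in chunk) / len(chunk)

# Toy "story": happy start, sad middle, happy end. Tiny windows keep it readable.
story = ["love"] * 6 + ["war"] * 6 + ["love"] * 6
arc = emotional_arc(story, toy_score, window=6, step=3)
print(arc)  # [1.0, 0.0, -1.0, 0.0, 1.0]
```

The toy arc dips and then recovers, which is the fall-then-rise pattern. With real text, `toy_score` would be replaced by a proper sentiment scorer and the window would be the 10,000 words mentioned above.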
- Rags to riches (rise).
- Tragedy, or Riches to rags (fall).
- Man in a hole (fall–rise).
- Icarus (rise–fall).
- Cinderella (rise–fall–rise).
- Oedipus (fall–rise–fall).
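To give a feel for the first of those approaches, singular value decomposition, here is a toy sketch that finds the single most dominant shape in a handful of made-up arcs. It is not the paper's pipeline: the arcs are invented, the function name `dominant_mode` is hypothetical, and I have swapped in plain power iteration to approximate the top singular vector so that no linear-algebra library is needed.

```python
import math

def dominant_mode(arcs, iterations=200):
    """Approximate the top right singular vector of the mean-centred arc matrix.

    Runs power iteration on A^T A, where each row of A is one centred arc.
    """
    # Centre each arc on zero so only its shape matters, not its average mood.
    rows = [[x - sum(arc) / len(arc) for x in arc] for arc in arcs]
    n = len(rows[0])
    v = [float(j + 1) for j in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        proj = [sum(r[j] * v[j] for j in range(n)) for r in rows]      # A v
        v = [sum(r[j] * p for r, p in zip(rows, proj)) for j in range(n)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

# Invented arcs, each sampled at five points in the story.
arcs = [
    [1, 2, 3, 4, 5],  # steady rise
    [5, 4, 3, 2, 1],  # steady fall
    [0, 2, 4, 6, 8],  # steeper rise
    [3, 1, 0, 1, 3],  # fall then rise
]
mode = dominant_mode(arcs)

# Loading of each story on the mode: how strongly, and in which direction, it follows it.
loadings = [sum((x - sum(a) / len(a)) * m for x, m in zip(a, mode)) for a in arcs]
print(mode)
print(loadings)
```

In this toy data the dominant mode is a steady ramp. The rising stories load on it positively, the falling story loads negatively, and the fall-then-rise story barely loads on it at all, so it would need a second mode to describe it.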
Here is a reference to the paper.
Andrew J. Reagan, Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, and Peter Sheridan Dodds, The emotional arcs of stories are dominated by six basic shapes, arXiv:1606.07772
The automatization of literature
While it took decades to verify, Vonnegut’s thesis appears to be correct. There are six shapes that govern every story’s emotional arc. The thesis was tested with big data analytics on thousands of sources. I emphasize that the study was completely automated: no direct literary analysis of the stories was done, at least by humans.
I blogged recently about an analogous automated literary analysis of Game of Thrones. Using tools from network theory, the authors there identified the main characters as Jon Snow, Tyrion Lannister, and Sansa Stark. If you are watching the series on HBO (the latest season predates the next novel), then you can appreciate the accuracy of their analysis.
In other news, Google announced that it is using artificial intelligence to write a romance novel, basing its work on analyzing a large sample of such novels. We don’t yet have the output of Google’s AI endeavor, but it will be interesting to see how that turns out.
Are we in an era where we can dispense with literary analysis done by humans? Going even further, will novels and screenplays be authored by machines? The likely answer is no to both questions, or at least not for the near future. It remains likely, however, that a day will come when literature will be created, or at least co-created, by machines.
George R.R. Martin recently asked Stephen King: “How the f**k do you write so fast?” King claims he writes for three to four hours per day, and aims to produce six polished pages. At that pace, a three-hundred-page novel would take fifty days, or under two months.
Has anyone checked King for a pulse? Maybe he is a fiction-writing robot!