The shapes of stories: from Vonnegut to big data

The most famous rejected Master’s thesis

Kurt Vonnegut (1922-2007) was a brilliant American author best known for his novel Slaughterhouse-Five. Before his literary fame, he entered the University of Chicago’s anthropology Master’s program but never finished.

Vonnegut submitted two Master’s theses that were both, unfortunately, rejected. In one thesis, he presented the idea that stories have a shape, based on the happiness of characters over the span of the work. For instance, a character can start the story contented, then hit a low point, only to rally at the end. Think of Cinderella or Star Wars. Or a character can start at a low place, then end up happy or sad at the end, and so on.

Let’s call this Vonnegut’s thesis. Vonnegut lectured and wrote about his thesis often during his career.

While Vonnegut’s thesis is intuitively appealing and jibes with our experiences as consumers of fiction, it was never proven rigorously.

Until now.

Big data

While there is no precise definition of big data, by consensus it is the investigation of data sets that are too large or complex to analyze by hand. For instance, we could consider the entire Facebook social network, and the friendship interactions that occur between users. This dynamic social network has over a billion nodes and many more edges. The analysis of the vast Facebook network is in the realm of big data.

Big data lies at the intersection of computer science, mathematics, and engineering. It has applications in every discipline, and it is revolutionizing the way we view the world.

Every time you visit Amazon or Netflix and see recommendations based on your buying or browsing habits, there are hidden big data analytics at work. The absolute monarch of big data is Google, which regularly crawls an enormous number of web pages to answer search queries. Google uses sophisticated mathematical algorithms such as PageRank to rank the importance of webpages.
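
To give a flavor of how PageRank scores pages, here is a minimal Python sketch of the textbook power-iteration version of the algorithm. The toy link graph, the damping value of 0.85, and the function name are illustrative assumptions; this is not Google’s production system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank (ranks sum to 1).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: most links point at page "C".
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, page C collects links from every other page and ends up ranked highest, while page D, which nobody links to, ends up lowest.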

Thanks to big data analytics, the personalization of advertising, healthcare, finance, and most other aspects of our daily lives is the clear trend. We may not yet be at the level of the scene from Minority Report, but this vision will be a reality soon.

Six Degrees of Emotional Arcs

A recent article presents convincing evidence for Vonnegut’s thesis. Big data researchers from the U.S. and Australia plotted the relative happiness or sadness of 1,737 stories. They used a tool they call the Hedonometer, which measures happiness or sadness based on the frequency of certain words. Happy/positive words are things like “love” or “win” while examples of sad/negative words are “war” or “sick”.

Here is what they did. They broke each story into windows of 10,000 words (about thirty pages). They used the Hedonometer to generate a sentiment score for each window, and then slid the window across the book to plot the emotional arc of the story. The featured image of this post is the result of this experiment for J.K. Rowling’s Harry Potter and the Deathly Hallows.
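
The sliding-window idea is easy to sketch in Python. What follows is a toy version under loud assumptions: the four-word lexicon and its scores are invented for illustration (the real Hedonometer uses a much larger rated word list), the step size is my own choice, and the function names are hypothetical.

```python
# Hypothetical happiness scores on a 1 (sad) to 9 (happy) scale.
# These numbers are made up for illustration only.
TOY_SCORES = {"love": 8.0, "win": 7.5, "war": 2.0, "sick": 2.5}


def window_score(words, scores):
    """Average score of the rated words in one window (None if none are rated)."""
    rated = [scores[w] for w in words if w in scores]
    return sum(rated) / len(rated) if rated else None


def emotional_arc(text, scores, window=10000, step=1000):
    """Slide a fixed-size window across the text and score each position."""
    words = text.lower().split()  # crude tokenizer: no punctuation handling
    if len(words) <= window:
        return [window_score(words, scores)]
    return [
        window_score(words[start:start + window], scores)
        for start in range(0, len(words) - window + 1, step)
    ]


# A tiny "story" that starts happy, hits a low point, and recovers.
story = "love win love war sick war love win"
arc = emotional_arc(story, TOY_SCORES, window=3, step=1)
print(arc)
```

With a window of three words, the toy story produces six scores that dip in the middle and recover at the end, which is the fall–rise shape in miniature.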

The data came from Project Gutenberg, a free online library that digitizes literary and cultural works. Using three distinct approaches (singular value decomposition, Ward’s method, and machine learning), the research team clustered these emotional arcs into clearly defined curves. Every story conformed to one of six curves:

  1. Rags to riches (rise).
  2. Tragedy, or Riches to rags (fall).
  3. Man in a hole (fall–rise).
  4. Icarus (rise–fall).
  5. Cinderella (rise–fall–rise).
  6. Oedipus (fall–rise–fall).
The six emotional arcs uncovered in the paper.
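
Of the three approaches mentioned above, singular value decomposition is the simplest to sketch. The Python snippet below runs NumPy’s SVD on five synthetic arcs that I built by hand from a rising line and a “hole” shape; the data, the variable names, and the helper function are all assumptions for illustration, not the paper’s code or data.

```python
import numpy as np


def arc_modes(arcs):
    """Decompose a (stories x time points) matrix of arcs with SVD.

    Returns the modes (basic shapes), each story's coefficient on each
    mode, and the fraction of total variance each mode explains.
    """
    centered = arcs - arcs.mean(axis=1, keepdims=True)  # remove each story's mean
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    coefficients = U * S               # row i: how strongly story i uses each mode
    explained = S**2 / np.sum(S**2)
    return Vt, coefficients, explained


# Synthetic building blocks (illustrative only).
t = np.linspace(0.0, 1.0, 50)
rise = t - t.mean()                    # steady rise: "rags to riches"
hole = (t - 0.5) ** 2
hole = hole - hole.mean()              # fall then rise: "man in a hole"

# Five made-up stories: three rises of different strength or sign, two holes.
arcs = np.vstack([rise, 1.5 * rise, -rise, hole, -0.5 * hole])
modes, coefficients, explained = arc_modes(arcs)

print("variance explained by first two modes:", explained[:2].round(3))
print("sign of each story on mode 1:", np.sign(coefficients[:, 0]))
```

The first mode recovers the rising line up to an overall sign, and a story’s coefficient on that mode is positive or negative depending on whether it rises or falls. That is how a single mode can stand for both “rags to riches” and “riches to rags”.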

Here is a reference to the paper.

Andrew J. Reagan, Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, and Peter Sheridan Dodds, The emotional arcs of stories are dominated by six basic shapes, arXiv:1606.07772

The automatization of literature

While it took decades to verify, Vonnegut’s thesis appears to be correct. There are six shapes that govern every story’s emotional arc. The thesis was tested with big data analytics on more than a thousand stories. I emphasize that the study was completely automated: no direct literary analysis of the stories was done, at least by humans.

I blogged recently about an analogous automated literary analysis of Game of Thrones. Using tools from network theory, the authors there identified the main characters as Jon Snow, Tyrion Lannister, and Sansa Stark. If you are watching the series on HBO (the latest season predates the next novel), then you can appreciate the accuracy of their analysis.

In other news, Google announced that it is using artificial intelligence to write a romance novel, basing its work on analyzing a large sample of such novels. We don’t yet have the output of Google’s AI endeavor, but it will be interesting to see how that turns out.

Are we in an era where we can dispense with literary analysis done by humans? Going even further, will novels and screenplays be authored by machines? The likely answer is no to both questions, at least for the near future. It remains likely, however, that a day will come when literature is created, or at least co-created, by machines.

George R.R. Martin recently asked Stephen King: “How the f**k do you write so fast?” King claims he writes for three to four hours per day, and aims at producing six polished pages. That effort would result in a three hundred page novel in under two months.

Has anyone checked King for a pulse? Maybe he is a fiction-writing robot!

Anthony Bonato

4 thoughts on “The shapes of stories: from Vonnegut to big data”

  1. First time reading your blog! Before I launch into nitpicking, let me say that it looks like a fun blog and I will definitely come back again.

    Here’s my nitpick: “Every story conformed to one of six curves” just seems way too optimistic. What they mean is that THEY were able to cram every story into one of six boxes, whether or not it actually fit! Looking at the graphs in the “six emotional arcs” figure, it looks as if many have different numbers of maxima and minima from the “average” graph in the category they have been shoved into. Harry Potter is a case in point: its graph has nine maxima! You could only call it “rise-fall-rise” if you omitted almost all of the detail of the book. I would absolutely not consider this book to be “rise-fall-rise.”

    I haven’t looked at their paper yet; presumably in some sense these six arcs account for some large percentage of the variation in stories… but that is quite different from saying that “every story conforms” to one of the six arcs.


  2. […] I think the ending of the play felt wrong because we are used to certain narrative shapes in stories. Kurt Vonnegut championed this idea of narrative shapes in his master’s thesis in Anthropology. The thesis was rejected for being too simplistic but it turns out that Vonnegut was just way ahead of his time. Researchers looked at the fictions in Project Gutenberg and mapped their emotional arc based on thei…: […]

