In the chart,
- The blue line shows the count of words in each post ordered in sequence. For example, post 21 has 1370 words.
- The grey line shows a linear trend: the word count per post is increasing as the series progresses.
- The red line is a constant, the average number of words per post. The word count for post 21 is larger than both the trend and the average.
The code is just a beginning. Many more metrics will be added to analyze the text of a corpus. I want to analyze the style of the posts, and several word measures can be calculated: frequency, feeling, concreteness, complexity, etc. Together they profile the style of a post and can be used to compare it to the corpus. Even more interesting, this work builds a platform for computational understanding of a text. More to come.
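
For readers who want to reproduce the three lines in the chart, here is a minimal sketch in Python. It is not the code behind the chart: the tokenizing rule, the function names `word_count` and `linear_trend`, and the sample posts are all assumptions made for illustration.

```python
import re

def word_count(text: str) -> int:
    """Count word tokens, taken here as runs of letters, digits, or apostrophes."""
    return len(re.findall(r"[A-Za-z0-9']+", text))

def linear_trend(values: list[float]) -> list[float]:
    """Fit a least-squares line through (1, v1), (2, v2), ... and
    return its value at each post number (the grey line)."""
    n = len(values)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx if sxx else 0.0  # a single post has no slope
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in xs]

# Made-up stand-ins for the posts, in publication order.
posts = [
    "A short first post.",
    "The second post runs a little longer than the first.",
    "By the third post the writer has warmed up and has rather more to say.",
]

counts = [word_count(p) for p in posts]   # blue line
trend = linear_trend(counts)              # grey line
average = sum(counts) / len(counts)       # red line

for number, (count, fitted) in enumerate(zip(counts, trend), start=1):
    print(f"post {number}: {count} words, trend {fitted:.1f}, average {average:.1f}")
```

A post sits above both the trend and the average, as post 21 does in the chart, when its count exceeds both of the other two values printed on its row.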
Lila is a “cognitive” technology, i.e., natural language processing software to aid with reading and writing. It is initially intended to analyze and improve essays in a corpus. Below is a wireframe for a user interface, comparable to Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell.
Lila has unique functions:
- On a Home screen a user gets to enter an essay. Lila is intended to accept the text of individual essays created by a writer. An Analyze button begins the natural language processing that results in the screen above. The text is displayed, highlighting one paragraph at a time as the user scrolls down.
- The button set provides four functions. The Home button is for navigation back to the Home screen. The Save button allows the user to save an essay with analytics to a database to build an essay set or corpus. The Documents button navigates to a screen for managing the database. The Settings button navigates to a screen that can adjust configurations for the analytics.
- The graph shows the output of natural language processing and analytics for a “Feeling” metric, an aggregate measure based on sentiment, emotion and perhaps other measures. The light blue shows the variance in Feeling across paragraphs. The dark blue straight line shows the aggregate value for the document. The user can see how Feeling varies across paragraphs and in comparison to the whole essay. Another view will allow for comparison of single essays to the corpus.
- The user can choose one of several available metrics to be displayed on the graph. See the list of metrics below.
- All metrics are associated with individual words. Numeric values will be listed for a subset of the words.
- Topic Cloud. A representation of topics in an essay will be shown.
- Count. The straight count of words.
- Frequency. The frequency of words.
- Concreteness. The imagery and memorability of words. A personal favourite.
- Complexity. Ambiguity or polysemy, i.e., words with multiple meanings. Synonymy or antonymy. A measure of the readability of the text. Complexity can also be measured for sentences, e.g., number of conjunctions, and for paragraphs, e.g., number of sentences.
- Hyponymy. A measure of the abstraction of words.
- Metaphor. I am evaluating algorithms that identify metaphors.
- Form. Various measures are available to measure text quality, e.g., repetition.
- Readability by grade level.
- Thematic presence can be measured by dictionary tagging of selected words related to the work’s theme.
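
The graph described above compares a word-level metric across paragraphs with its value for the whole essay. A minimal sketch of that calculation in Python follows. Everything in it is an assumption made for illustration: the tiny `FEELING` lexicon and its scores are invented, the function names are hypothetical, and a plain average over scored words stands in for whatever aggregate Lila ends up using.

```python
import re

# Hypothetical toy lexicon mapping a word to a score between -1 and 1.
# A real metric would draw on a published sentiment or emotion resource.
FEELING = {"love": 1.0, "bright": 0.5, "dull": -0.5, "hate": -1.0}

def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def paragraphs(essay):
    """Split an essay on blank lines, dropping empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", essay) if p.strip()]

def mean_score(words, lexicon):
    """Average the scores of the words found in the lexicon.
    Returns None when no word in the text has a score."""
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else None

def profile(essay, lexicon):
    """Return (score per paragraph, score for the whole essay)."""
    per_paragraph = [mean_score(tokens(p), lexicon) for p in paragraphs(essay)]
    document = mean_score(tokens(essay), lexicon)
    return per_paragraph, document

essay = "I love the bright morning.\n\nI hate the dull evening."
per_paragraph, document = profile(essay, FEELING)
print(per_paragraph)  # light blue line: one value per paragraph
print(document)       # dark blue line: one value for the essay
```

Because the scores hang off individual words, swapping `FEELING` for a different word-to-score table (concreteness ratings, say) would reuse the same paragraph and document calculation.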
The intention is to help a writer evaluate the literary quality of an essay and compare it to the corpus. A little bit like spell-check and grammar-check, but packed with literary smarts. Where it is helpful to be conscious of conformity and variance, e.g., author voice, Lila can help. It is a modest step in the direction of an artificial intelligence project that will emerge in time. Perhaps one day Lila will live.