In the chart,
- The blue line shows the count of words in each post ordered in sequence. For example, post 21 has 1370 words.
- The grey line shows a linear trend: the word count per post is increasing as the series progresses.
- The red line is a constant, the average words per post. The word count for post 21 is larger than both the trend and the average.
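
For readers who want to reproduce the three lines, here is a minimal sketch of how they could be computed. It is not the code behind the chart. The names `word_count`, `linear_trend`, and the sample `posts` list are hypothetical, words are assumed to be whitespace-separated, and the trend is assumed to be an ordinary least-squares fit of word count against post number.

```python
from statistics import mean


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed, simple definition of a word)."""
    return len(text.split())


def linear_trend(counts):
    """Least-squares line through (post number, word count); needs at least 2 posts."""
    xs = range(1, len(counts) + 1)  # posts are numbered 1, 2, 3, ...
    x_bar = mean(xs)
    y_bar = mean(counts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical stand-in for the real posts, in publication order.
posts = [
    "one two three",
    "one two three four five",
    "one two three four five six seven",
]

counts = [word_count(p) for p in posts]    # blue line: words per post
slope, intercept = linear_trend(counts)    # grey line: linear trend
trend = [intercept + slope * x for x in range(1, len(counts) + 1)]
average = mean(counts)                     # red line: constant average

print(counts, trend, average)
```

A positive `slope` corresponds to the rising grey line, and a post sits above both reference lines when its count exceeds the matching `trend` value and `average`.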
The code is just a beginning. Many more metrics will be added to analyze the text of a corpus. I want to analyze the style of the posts, and several word measures can be calculated: frequency, feeling, concreteness, complexity, etc. Together they profile the style of a post and can be used to compare it against the rest of the corpus. Even more interesting, this work builds a platform for computational understanding of a text. More to come.