Testing Flesch-Kincaid Readability in Jekyll

Apr 10, 2015 Ruby

Introduction

One of the things I loved most about the Yoast SEO plugin for WordPress was the score it would give me on the readability of my content. This made it easy to adapt my writing to my target audience's reading ability. When I migrated to Jekyll, I lost a lot of the SEO benefits, so I took a stab at implementing it back in.

I did some quick Googling and stumbled upon the Lingua gem. This little treasure has a lot of features, one of which allows you to calculate the Flesch-Kincaid Readability of text. To add this functionality to Jekyll, I created a simple Rake task to let me know if a post needed to be reworded.

Keeping It Flesch With Lingua

The lingua gem handles most of this functionality, so let's add it to your list of dependencies.

In your Gemfile, add:

gem 'lingua'

Once it’s added, run bundle install.

If you are not using Bundler, you can simply install it with gem:

gem install 'lingua'

Once you've grabbed your dependencies, it's time to update your project's Rakefile.

Creating Your Rakefile Task

Below is a quick proof of concept I threw together, which grabs all your posts in _posts/ and calculates the Flesch reading score for each. I took the liberty of translating the numerical scores into the kind of reader who should be able to read the post. This ranges from an elementary schooler to a PhD academic.
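For context on where these numbers come from, the Flesch reading ease score is commonly defined as 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The sketch below applies that formula to counts you supply yourself. It is not Lingua's implementation, and `flesch_reading_ease` is a hypothetical name used only for illustration. It shows why very short, simple sentences can push a score above 100 and why long, syllable-heavy ones can push it below 0.

```ruby
# Hypothetical helper, for illustration only: the standard Flesch reading
# ease formula applied to counts you supply yourself.
def flesch_reading_ease(words:, sentences:, syllables:)
  words_per_sentence = words.to_f / sentences
  syllables_per_word = syllables.to_f / words
  206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)
end

# Short sentences made of one-syllable words: the score climbs past 100.
easy = flesch_reading_ease(words: 12, sentences: 3, syllables: 12)

# One long sentence full of long words: the score drops below 0.
hard = flesch_reading_ease(words: 40, sentences: 1, syllables: 90)

puts format("easy: %.2f, hard: %.2f", easy, hard)
```

That is why the mapping in the task needs a branch for scores above 100 and another for scores below 0.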

In your Rakefile, add the following:

require 'find'   # Find.find lives in the standard library but must be required
require 'lingua'

task :readability do
  
  # Create a simple function to do this
  def calculate_score(text)
    # Remove the front matter and code blocks, because they would skew the score
    frontmatter = /(---)((?:(?:\r?\n)+(?:\w|\s).*)+\r?\n)(?=---\r?\n)(.*?)/x
    # Lazy match, so prose between two separate fenced blocks is kept
    fenced_code = /`{3}[\s\S]*?`{3}/
    nested_code = /((?:^(?:[ ]{4}|\t).*$(?:\r?\n|\z))+)/x

    # Replace each match with an empty string ("#{$3}" was evaluated before gsub ran)
    parsed = text.gsub(frontmatter, "").gsub(fenced_code, "").gsub(nested_code, "")
    score = Lingua::EN::Readability.new( parsed ).flesch
    if score > 100
      return score, "an Elementary Schooler"
    elsif score.between?(80,100)
      return score, "a Middle Schooler"
    elsif score.between?(50,80)
      return score, "a High Schooler"
    elsif score.between?(30,50)
      return score, "an Average Adult"
    elsif score.between?(0,30)
      return score, "a College Level Student"
    else
      return score, "a PhD Academic"
    end
  end

  # Get all posts in ./_posts
  Find.find("./_posts/") do |post|
    if File.file?(post)
      score, level = calculate_score(File.read(post))
      puts "#{post} has a score of #{score}, which is suitable for #{level}"
    end
  end
end
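The stripping step is the part most likely to need tuning for your own posts, so here is a minimal standalone sketch of it that runs without Lingua. The `strip_non_prose` name and the simplified patterns are my own assumptions, not the regular expressions used in the task. It assumes the front matter opens the file between two `---` lines and that code fences start at the beginning of a line.

```ruby
# Hypothetical helper: strip the parts of a Markdown post that are not prose.
# Assumes YAML front matter, if present, opens the file between "---" lines.
def strip_non_prose(text)
  front_matter  = /\A---[ \t]*\r?\n.*?^---[ \t]*\r?\n/m  # leading YAML block
  fenced_code   = /^`{3}.*?^`{3}[^\n]*\n?/m              # lazy: one fence pair per match
  indented_code = /^(?: {4}|\t).*\n?/                    # lines indented by 4 spaces or a tab

  text.sub(front_matter, "")
      .gsub(fenced_code, "")
      .gsub(indented_code, "")
end

fence = "`" * 3
post = "---\ntitle: Example\n---\nSome prose.\n#{fence}ruby\nputs 1\n#{fence}\nMore prose.\n"
puts strip_non_prose(post)
```

One caveat with the indented-code pattern is that it also drops indented continuation lines inside nested lists, so it may remove a little prose as well.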

To run this new task, you can use the following command:

rake readability

Here is a snippet of the output after I ran it on my _posts/ directory:

FAILED: ./_posts/2012-02-14-dark-dreamweaver-theme--aurza.md has a score of -14.631031353135285, which is suitable for a PhD Academic
FAILED: ./_posts/2012-02-14-dreamweaver-theme-generator.md has a score of 4.533732394366211, which is suitable for a College Level Student
FAILED: ./_posts/2012-09-05-null-pointer-exception-when-calling-getwritabledatabase.md has a score of 10.74755555555555, which is suitable for a College Level Student
FAILED: ./_posts/2012-11-09-regular-expression-pattern-parsing-ifconfig.md has a score of 7.8300000000000125, which is suitable for a College Level Student
FAILED: ./_posts/2013-12-29-creating-bootable-installer-for-osx-mavericks.md has a score of -11.032681159420264, which is suitable for a PhD Academic
FAILED: ./_posts/2014-05-19-barnyard2-mysql-alert-view-for-snort.md has a score of 91.8459941520468, which is suitable for a Middle Schooler

Awesome, but that’s a little difficult to read.
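One way to make each line easier to scan is to drop the directory and extension, round the score, and align the columns. This is only a sketch: the `format_result` helper and the column widths are my own assumptions, not part of the task above.

```ruby
# Hypothetical helper: one aligned, rounded line per post.
def format_result(path, score, level)
  name = File.basename(path, ".*")  # drop the directory and the extension
  format("%-60s %8.2f  %s", name, score, level)
end

puts format_result(
  "./_posts/2014-05-19-barnyard2-mysql-alert-view-for-snort.md",
  91.8459941520468,
  "a Middle Schooler"
)
```

Calling it in place of the plain puts in the task would keep everything else unchanged.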

Taking It A Step Further

This output is cool: we just calculated our Flesch scores. But besides not being visually pleasing, it does not fit into a simple workflow such as rake test or jekyll build. Since I use continuous integration to build and deploy my blog, I wanted the build to fail if a post's reading level was too low or too high. My old build script was simply jekyll build.

The following Rake task exits with a non-zero status when a post falls outside the accepted range, so chaining it with the shell && operator keeps jekyll build from running:

require 'find'   # Find.find lives in the standard library but must be required
require 'lingua'

task :readability do
  
  # Create a simple function to do this
  def calculate_score(text)
    # Remove the front matter and code blocks, because they would skew the score
    frontmatter = /(---)((?:(?:\r?\n)+(?:\w|\s).*)+\r?\n)(?=---\r?\n)(.*?)/x
    # Lazy match, so prose between two separate fenced blocks is kept
    fenced_code = /`{3}[\s\S]*?`{3}/
    nested_code = /((?:^(?:[ ]{4}|\t).*$(?:\r?\n|\z))+)/x

    # Replace each match with an empty string ("#{$3}" was evaluated before gsub ran)
    parsed = text.gsub(frontmatter, "").gsub(fenced_code, "").gsub(nested_code, "")
    score = Lingua::EN::Readability.new( parsed ).flesch
    if score > 100
      return score, "an Elementary Schooler"
    elsif score.between?(80,100)
      return score, "a Middle Schooler"
    elsif score.between?(50,80)
      return score, "a High Schooler"
    elsif score.between?(30,50)
      return score, "an Average Adult"
    elsif score.between?(0,30)
      return score, "a College Level Student"
    else
      return score, "a PhD Academic"
    end
  end

  # Get all posts in ./_posts
  Find.find("./_posts/") do |post|
    if File.file?(post)
      score, level = calculate_score(File.read(post))
      if score > 80 || score < 15
        puts "FAILED: #{post} has a score of #{score}, which is suitable for #{level}"
        
        # Fail the build
        exit 1
      end
    end
  end
end

And then run it:

rake readability && jekyll build

My updated build script now returns this:

FAILED: ./_posts/2012-01-10-php-539-released.md has a score of 81.727311827957, which is suitable for a Middle Schooler

Once the command fails, you'll know which post failed and in which direction you need to reword it. If rake readability succeeds, the shell goes on to build my Jekyll site.
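Because the task exits at the first failing post, you only learn about one post per run. A possible variation, sketched below, collects every out-of-range post first and fails once at the end. The `out_of_range` helper is hypothetical, the thresholds simply mirror the 15 and 80 used above, and one hash entry is made up for illustration.

```ruby
# Hypothetical helper: return every post whose score falls outside the
# accepted range, instead of stopping at the first one.
def out_of_range(scores, min: 15, max: 80)
  scores.reject { |_post, score| score.between?(min, max) }
end

scores = {
  "./_posts/2012-01-10-php-539-released.md" => 81.727311827957,
  "./_posts/2012-02-14-dreamweaver-theme-generator.md" => 4.533732394366211,
  "./_posts/example-in-range.md" => 55.0 # made-up entry for illustration
}

failures = out_of_range(scores)
failures.each { |post, score| puts "FAILED: #{post} (#{score.round(2)})" }

# In a Rake task, a non-zero exit is what lets `&&` skip `jekyll build`:
# abort("#{failures.size} post(s) out of range") unless failures.empty?
```

Kernel#abort prints its message to standard error and exits with status 1, so the && chain behaves the same as with exit 1.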

Conclusion

By calculating the Flesch score of your content, you can improve the experience for your audience by helping them understand what you're trying to say. Another benefit is that the score helps identify run-on and incomplete sentences, since they greatly impact the score in either direction.

This post has a Flesch score of ``.

