A simple Nanoc filter to compress HTML using Nokogiri

When using Nanoc, there are standard filters for compressing CSS and minifying JavaScript, but there’s no equivalent for HTML. However, we can use Nokogiri to trim our HTML a little by removing some whitespace and getting rid of comments.

To do this, we’re going to build a custom Nanoc filter. Create a new file in lib/filters called html_compress.rb:

```ruby
# encoding: utf-8
require 'nokogiri'

class HTMLCompressFilter < Nanoc::Filter
  identifier :html_compress
  type :text

  def run(content, params = {})
    doc = Nokogiri::HTML(content)

    # Find comments.
    doc.xpath('//comment()').each do |comment|
      # Remove it unless it's a conditional comment.
      comment.remove unless comment.content =~ /\A(\[if|<!\[endif)/
    end

    doc.to_html
  end
end
```

Simply running the HTML through Nokogiri removes excess whitespace from the <head> section, with no extra steps necessary. The filter could be extended to remove whitespace from the <body> section too, but take care with whitespace-sensitive content, such as the contents of <pre> tags.

The other reduction uses an XPath selector to find every comment and removes most of them. The only ones left are conditional comments, which are often used to include or exclude elements depending on the version of Internet Explorer. The following example, generated by Compass, includes the ie.css stylesheet for Internet Explorer only:

```html
<!--[if IE]>
  <link href="/stylesheets/ie.css" media="screen, projection" rel="stylesheet" type="text/css" />
<![endif]-->
```
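To see which comments survive, the filter’s pattern can be tried on its own. The sketch below wraps it in a hypothetical helper, conditional_comment?, and runs it against strings like those Nokogiri returns from comment.content, which is the text between <!-- and -->. The <!\[endif alternative is what keeps the closing half of a "downlevel-revealed" comment such as <!--[if !IE]><!--> ... <!--<![endif]-->.

```ruby
# Same pattern the filter uses to recognise conditional comments.
CONDITIONAL_COMMENT = /\A(\[if|<!\[endif)/

# Hypothetical helper: true if the comment text should be kept.
def conditional_comment?(text)
  CONDITIONAL_COMMENT.match?(text)
end

# Example comment.content values.
samples = [
  "[if IE]>\n  <link href=\"/stylesheets/ie.css\" rel=\"stylesheet\" />\n<![endif]", # classic conditional comment
  '[if !IE]><!',       # opening half of a downlevel-revealed comment
  '<![endif]',         # closing half of a downlevel-revealed comment
  ' Site navigation ', # ordinary comment
  ' [if IE] '          # leading space defeats the \A anchor
]

samples.each do |text|
  puts "#{conditional_comment?(text) ? 'keep  ' : 'remove'} #{text.inspect}"
end
```

Because of the \A anchor, only comments whose text begins exactly with [if or <![endif are kept. Everything else, including a conditional comment written with a leading space, is removed.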

Apply the filter as normal by adding filter :html_compress to the compile rules in your Rules file wherever you want it to run. Because it works on the HTML of your pages, it’s best to apply it after any filters that generate that HTML, for example :erb or :kramdown.
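A compile rule might look something like the sketch below, which assumes Nanoc 3-style Rules. The '/articles/*/' pattern and the '/default/' layout are hypothetical names. Nokogiri::HTML treats its input as a complete document, and to_html adds a DOCTYPE plus <html> and <body> wrappers. Placing the filter after the layout step therefore means it sees the whole page rather than a fragment that would end up wrapped twice.

```ruby
compile '/articles/*/' do
  filter :kramdown        # generate HTML from Markdown first
  layout '/default/'      # wrap it in the site layout (hypothetical name)
  filter :html_compress   # then compress the complete page
end
```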
