Outputting JSON From Ruby

Last night I got the scraper to write an output.txt file which listed all the contents of words_by_selector. Today I want to make it write to JSON instead of plain text and I want to back it with some tests.

Updating our tests

Our current test for words_by_selector looks like this:

spec/crawler_spec.rb
it "should produce the correct Array of keywords" do
  expected_array = ['hello: 3', 'world: 2', 'foobar: 1']
  actual_array = @crawler.words_by_selector(@options[:post_selector], @options[:ignored_post_selector])
  actual_array.should eq(expected_array)
end

We’re going to need to break that sucker so it’ll produce something more like this:

spec/crawler_spec.rb
it "should produce the correct Hash of keywords" do
  expected_hash = {
      word_count: [
        {
          word: 'hello',
          count: 3
        },
        {
          word: 'world',
          count: 2
        },
        {
          word: 'foobar',
          count: 1
        },
      ]
    }

  actual_hash = @crawler.words_by_selector(@options[:post_selector], @options[:ignored_post_selector])
  actual_hash.should eq(expected_hash)
end

And we update words_by_selector to look like this:

tentacles/lib/tentacles/crawler.rb
def words_by_selector(selector, ignored_selector = nil)
  node = nodes_by_selector(selector).first
  if ignored_selector
    ignored = node.css(ignored_selector)
    ignored.remove()
  end
  words = words_from_string(node.content)
  count_frequency(words)

  sorted = @counts.sort_by { |word, count| count }
  sorted.reverse!
  sorted.map! do |word, count|
    { word: word, count: count }
  end
  { word_count: sorted }
end

Our new test should pass. Feel free to flip one of the numbers in the expected_hash to 99 or something to see it fail.

Now let’s make sure the runner takes the content out of the crawler and writes it to a JSON file.

spec/runner_spec.rb
it "should create a directory for our output" do
  @runner.run
  Dir.exists?('../../output').should be_true
end

it "should output the correct JSON" do
  @runner.run
  File.open("../../output/word_count.json") do |file|
    file.each_line do |line|
      puts line
    end
  end
end

And in runner.rb…

tentacles/lib/tentacles/runner.rb
def run
  @crawler = Tentacles::Crawler.from_uri(@options.uri)
  output = @crawler.words_by_selector(@options.post_selector, 'ul:last-child')

  Dir.mkdir('../../output') unless Dir.exists?('../../output')

  File.open("../../output/word_count.json", "w") do |file|
    file.puts JSON.pretty_generate(output)
  end
end

And there we go. Our first decent output from the crawler :D -Rob

You should follow me on Twitter here.

  • Mood: Awake
  • Sleep: 6
  • Hunger: 4
  • Coffee: 0