
Using Cucumber for Acceptance Testing

Adam Carnagey

Published on: March 20th, 2009

This post was previously on the Pathfinder Software site. Pathfinder Software changed its name to Orthogonal in 2016.

A consistent nuisance problem when testing Rails applications is the “unit test gap”. This happens when the model test passes and the controller test passes, but the application as a whole fails because there’s a mismatch between the output produced by the model and the input given to the controller test. In theory, Rails integration tests can solve this problem, but they aren’t really designed for it, and nobody uses them much anyway.

An easy-to-use tool that solves that end-to-end testing problem would be great. Somebody should really write a blog post about that.

So, as I was saying. We continually run into an issue with our clients over defining requirements at the level that developers need to keep going, without getting bogged down in long stretches of design. There’s a big gap between “users should be able to upload photos to their pages” and all the different details of permissions, validation and the like that need to be answered at some point. The Agile process suggests that those requirements be created as closely as possible to the code, which leads to the question of how best to keep the customer in the loop while the developers need these decisions to be made.

An easy-to-use tool that lets users read or create requirements that developers can build from and run as acceptance tests would be great. Somebody should definitely write a blog post about that.

Which brings us — finally — to Cucumber, a tool for creating automated acceptance tests. It’s flexible enough to solve both problems. It can be used as a developer tool to drive regular TDD testing, and as a client tool for managing requirements.

Self-promotion: If you want to read more about Cucumber and all kinds of Ruby testing, buy the beta version of Rails Test Prescriptions. Cucumber isn’t in the book yet, but will be in the next release, scheduled for the end of March. Buy now, and get all future updates free of charge.

It’s got loads of potential. And some pitfalls and tricks. The Cucumber web site has great documentation on installing and using it. I want to focus on how Cucumber can work within a development process.

Here’s a Cucumber test:

  Background:
    Given I am logged in
    Given I am in a project

  Scenario: Make High Priority
    Given I have a task with medium priority
    And I am on a page that displays tasks
    When I click on "priority_up" for that task
    Then the task should display high priority

The structure of a Cucumber test is pretty simple. You’ve got your “Given” lines, which specify prerequisites; then your “When” lines, which are user actions; and finally the “Then” lines, which list the expected results. The Background items are run before each scenario in the feature, and therefore consist primarily of “Given” lines.

This test is actually runnable as is, although Cucumber will just tell you that it doesn’t know what any of the lines actually mean yet. The Cucumber test runner output, by the way, is outstanding. Color coded results of each step, tied to the line where each step is defined, a nice summary, and snippets of code to insert for undefined steps. The output goes a long way toward making Cucumber easy to use.

In order to make this test actually work, you have to define its steps. Cucumber gives you a handy file to place the definitions in. Cucumber step definitions are Ruby code. Here is the definition for the first step in the scenario above:

Given /^I have a task with (.*) priority$/ do |priority|
  @task = Task.make(:priority => priority, :project => @project,
      :feature => @feature)
end

Let’s break this down a little bit. The starting point of a Cucumber step definition is a definition line containing a regular expression and taking a block. Any group specified in the regular expression is passed as an argument to the block (and matching text is very helpfully presented in bold from the command line output).
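
To make the group-to-argument mechanism concrete, here is a minimal plain-Ruby sketch. StepRegistry and its methods are hypothetical names invented for illustration; this is not how Cucumber is implemented, only an imitation of the idea that a pattern's captured groups become the block's arguments.

```ruby
# Hypothetical mini-registry (NOT Cucumber's implementation): it pairs a
# regular expression with a block, the way a step definition does.
class StepRegistry
  def initialize
    @steps = []
  end

  # Remember a pattern and the block to run when a line matches it.
  def define(pattern, &block)
    @steps << [pattern, block]
  end

  # Call the block of the first matching pattern, passing each captured
  # group as a block argument.
  def run(line)
    @steps.each do |pattern, block|
      match = pattern.match(line)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "undefined step: #{line}"
  end
end

registry = StepRegistry.new
registry.define(/^I have a task with (.*) priority$/) do |priority|
  { priority: priority } # stand-in for creating a real Task
end

task = registry.run("I have a task with medium priority")
other = registry.run("I have a task with high priority")
```

The same definition serves both lines, which is where the reuse of step definitions comes from.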

In this particular case, the text “I have a task with medium priority” matches this step definition, and passes “medium” to the block, where it’s used to create the new task (using the Machinist fixture replacement plugin; Cucumber plays very nicely with all the fixture replacements).

Here are the definitions for the other three steps.

Given /^I am on a page that displays tasks$/ do
  visit("/tasks")
end

This one is pretty straightforward and uses Webrat to simulate a browser hit to that specific URL.

When /^I click on "(.*)" for that task$/ do |button|
  #click_link(dom_id(@task, button))
  if button == "priority_up"
    visit("/tasks/#{@task.id}/upgrade")
  elsif button == "priority_down"
    visit("/tasks/#{@task.id}/downgrade")
  end
end

The commented line here is what I would do if it were a normal link, again using Webrat to simulate the click. However, Webrat doesn’t follow Rails link_to_remote links (at least not yet), which is what I actually have in this project. So I’m faking it with a truly ugly hack. I’m assuming there’s a better way to do this.
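
One small way to tidy the branch in that step, short of real link_to_remote support, might be a lookup table. This is only a sketch: priority_path is a hypothetical helper, and the two URL segments are taken from the step above.

```ruby
# Hypothetical helper: builds the URL the step would visit for a button.
def priority_path(task_id, button)
  # Button names and URL segments are the ones used in the step above.
  actions = {
    "priority_up"   => "upgrade",
    "priority_down" => "downgrade"
  }
  # fetch raises KeyError for an unknown button instead of doing nothing.
  "/tasks/#{task_id}/#{actions.fetch(button)}"
end

up_path   = priority_path(7, "priority_up")
down_path = priority_path(7, "priority_down")
```

Inside the step, the body would then shrink to a single visit call on the helper's result. Note one behavior change: the original if/elsif silently ignores an unrecognized button, while this version fails loudly.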

Finally, we get to the actual assertion.

Then /^the task should display (.*) priority$/ do |priority|
  response.body.should =~ /priority_task_#{@task.id}/
  response.body.should =~ /#{priority}/
end

Since the response here is an Ajax response, all I can do is a match on the output. If it were ordinary HTML, I could use the full weight of assert_select and the like.
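
To see what those two matches do and do not check, here is a plain-Ruby sketch. Both response bodies are made up for illustration, since the project's real Ajax output is not shown here; only the two patterns mirror the step above.

```ruby
# Hypothetical response bodies; the real markup is an assumption.
body       = 'Element.update("priority_task_42", "<span>high</span>");'
misleading = 'Element.update("priority_task_42", "<span>low</span>"); // highlight'

task_id  = 42
priority = "high"

# The same two checks as the Then step, written as plain predicates.
# Regexp.escape guards against priority text containing regex characters.
id_pattern       = /priority_task_#{task_id}/
priority_pattern = /#{Regexp.escape(priority)}/

passes = body.match?(id_pattern) && body.match?(priority_pattern)

# The second body also passes, although the task is shown as "low":
# the word "highlight" happens to contain "high".
fooled = misleading.match?(id_pattern) && misleading.match?(priority_pattern)
```

Because the two patterns are checked independently anywhere in the body, a loose match like this can pass for the wrong reason.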

With these definitions in place, the tests will run and fail. You can then spin off into your regular TDD process to make everything work. In practice, I actually do this one step at a time. Write the step definition, then make it work, then on to the next step. Over time, this takes the place of some controller and (especially) view testing. Also, you should find some ability to reuse step definitions.

I have several points to make about this.

It seems like the Cucumber steps should be written from the user perspective in plain language as much as possible. So “I should see a high priority icon” rather than “the page should contain high_priority.jpg”. Along those lines, though the Given statements probably do need to touch the database, the When and Then statements should work entirely at the user action and HTML response level. Complete end-to-end testing is, to me, the place where Cucumber has the most to offer.
It’s also clear that the amount of efficiency Cucumber adds or subtracts from your development process is largely dependent on how well you write the step definitions. The Given and When lines need to be as simple as possible, and everything relies on the Then lines giving an accurate representation of what the application is doing.
Cucumber tests have the same problems as view tests in general — if they are detailed enough to be fully exact, then they are probably too brittle, but a more robust test (based on DOM ID, for example), could easily be fooled by a sloppily written view layer.
The exercise of creating the Cucumber tests is, all by itself, quite useful in specifying exactly what the application is going to do. Unlike a lot of other requirements definition languages, Cucumber is really low-ceremony and easy to generate. So far, I’ve mostly used Cucumber in projects where the developer, not a client, is generating the tests. Even then, the process of thinking these scenarios through was very helpful in driving development. Having the tests be runnable was a bonus.
That said, it’s not hard for the tests to basically lie to you. The batch of tests referenced here is a feature I actually built using Cucumber. First time I tried it in a browser, with all the Cucumber tests passing, the browser failed. For an interesting reason. The priority column was a new addition, and the code assumed that the tasks would have one of three values. Existing tasks, however, had a nil value, which broke the code. A rake task to preset the existing tasks fixed the problem, though. So — this is not a substitute for actual human testing.
Which leads to another question about Cucumber: where to draw the line between Cucumber and existing tests. It seems to me that Cucumber is not the place to test all your error conditions (although if an error condition has a user-visible component, then a generic error test is probably appropriate). I’m not sure I’ve completely found the groove as to when a test goes in Cucumber and when it goes into a controller or view test. Model tests still do seem to be completely separate; I had no trouble writing model tests off of my Cucumber scenarios. (I know that some people are using Cucumber to replace controller tests; I’m not convinced yet that’s the way to go.)
I’d be very interested to see how this played out on a larger project. Would the number of feature files become unwieldy? Would multiple developers be continually reinventing the step definition wheel? Would that matter?
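
The nil-priority surprise from the points above is easy to reproduce in a few lines of plain Ruby. This is a sketch under assumptions: the method names are hypothetical, and "low" is a guess at the third priority value, since only medium and high are named here. The actual fix was a rake task that preset existing rows; the default shown below is a different, defensive technique.

```ruby
# Hypothetical priority logic that assumes one of three known values.
def next_priority(priority)
  order = %w[low medium high]             # "low" is an assumed value
  index = order.index(priority)           # nil when priority is unknown
  order[[index + 1, order.size - 1].min]  # raises NoMethodError on nil index
end

# Defensive variant: treat a missing priority as a default value.
def safe_next_priority(priority, default = "medium")
  next_priority(priority || default)
end

fresh = next_priority("medium") # a task built by the Given step

legacy = begin
  next_priority(nil)            # an existing row with no priority set
rescue NoMethodError
  :broken
end

patched = safe_next_priority(nil)
```

Every task created inside a scenario has a priority, so the tests never exercise the nil path that existing data hits in the browser.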

One thing that kept me from trying out Cucumber in the past was a variant on the same thinking that keeps people from testing in general – the fear that you’re just going to be writing more code, and taking more time, to little or no benefit.

My experience so far, as I explore what Cucumber can do, has been largely positive. Where I was starting with Cucumber and only a vague idea of how the user interaction would play out, writing the scenarios at the Cucumber level felt very valuable and gave the development a clear path that I wouldn’t have otherwise had. That said, there is extra code being written, and it’s clearly possible to get really tangled in getting the step definitions right.

As developers, we’re kind of conditioned to believe that if we’re solving a complicated problem, we’re doing something valuable. At its worst, writing the step definitions felt like a complicated problem that wasn’t adding much. Luckily, a lot of that was due to my own unfamiliarity with Cucumber. As I got better at it, most of the step definition part came quickly. Ultimately, I think I wrote better code as a result of starting with the Cucumber definitions — it definitely kept me from writing unneeded code, since most everything flowed from a Cucumber step.

I definitely recommend giving Cucumber a try — the potential for really cleaning up what is often the messy process of determining what a feature should do is very high. I’m really looking forward to reading (and writing) more details of what successful strategies with Cucumber look like.
