
Fixing flaky integration tests

This card shows basic techniques for fixing a flaky integration test suite, i.e. one that sometimes passes and sometimes fails. By "integration test" I mean a test script that remote-controls a web browser with tools like Selenium WebDriver.

Although the examples in this card use Cucumber and Selenium, the techniques are applicable to all languages and testing tools.

Why tests are flaky

Your tests probably look like this:

When I click on A
When I click on B
When I click on C
Then I should see effects of C

A test like this works fine most of the time and does not need to be fixed. However, as your frontend adds more JavaScript, AJAX and animations, this test might become "flaky". Flaky tests destroy trust in your test suite ("I thought that test is always red anyway") and should be stabilized ASAP.

The root cause of flaky tests is that your test script, your application code and the controlled browser are three different programs. These three programs run in parallel with little to no synchronization, which invites race conditions.

A race condition occurs whenever the test script makes unchecked assumptions about the browser's application state. The test above is full of race conditions like that: it might click on B before the click on A has been processed, before an animation triggered by A has finished, or before the B button has even been rendered. This might cause an error (B isn't in the DOM yet) or a click that silently has no effect (B is rendered but not yet interactive). The test will only pass with lucky timing.

Your testing tools have some default safeguards that prevent this from blowing up in your face most of the time. E.g. when finding an element before it is in the DOM, Selenium WebDriver and Capybara will wait for a short time before raising NoSuchElementError. This is sufficient for basic, server-rendered applications. However, as frontends become more complex (more JavaScript, AJAX requests, animations), race conditions become too severe to be caught by these default safeguards.
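
For illustration, here is roughly what that default safeguard gives you in a Ruby/Capybara suite. This is a minimal sketch; the .menu selector is made up, and the 2-second default assumes an unchanged Capybara configuration:

# Capybara finders re-query the DOM until the element appears
# or the default wait time elapses (2 seconds unless configured).
Capybara.default_max_wait_time = 2

page.find('.menu')       # waits up to 2 seconds, then raises Capybara::ElementNotFound
page.has_css?('.menu')   # also waits, but returns false instead of raising

# Nothing here checks whether the element has finished animating
# or already has its JavaScript event handlers attached.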

How to remove race conditions

Stabilizing flaky tests always means removing race conditions. In other words: your tests should not depend on lucky timing.

Below you will find some tools that should fix 99% of your flaky tests. Use as few or as many as you need.

Tool 1: Disable animations in tests

Animations are a common cause of race conditions: an element exists twice in the DOM during a page transition, or a button isn't yet interactive while it is fading in. I recommend simply disabling animations during tests.

While you can make animations work in tests, it's rarely worth your time. Remember that tests both speed you up (when catching regressions) and slow you down (when they break for the wrong reasons), and it's your job to find a sweet spot between testing everything and moving ahead.

In Unpoly you can globally disable animations like this:

up.motion.config.enabled = false

In AngularJS 1.x you can globally disable animations like this:

angular.module('myApp', []).run(['$animate', function($animate) {
  $animate.enabled(false);
}]);

Tool 2: Interleave actions and expectations

After every action, observe its effect before moving on to the next action. This way you know one action has completed before the next one starts.

Applied to the test above, it would look like this:

When I click on A
Then I should see effects of A
When I click on B
Then I should see effects of B
When I click on C
Then I should see effects of C

Note how instead of clicking on B right after clicking on A, we first observe the effects of clicking on A. This could be done by checking for the presence of a new element that appears after clicking on A.

By interleaving actions and expectations you can make sure your test script periodically synchronizes with the browser's application state.
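
In a Cucumber/Capybara suite, such an observation step is typically implemented with a waiting matcher, so it doubles as a synchronization point. A minimal sketch; the step wording and the .panel-a selector are made up for this example:

# features/step_definitions/example_steps.rb
Then('I should see effects of A') do
  # have_css retries until the element appears (or Capybara.default_max_wait_time
  # is exceeded), so "When I click on B" only runs once A's effect is visible.
  expect(page).to have_css('.panel-a', text: 'A was clicked')
end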

Tool 3: Ignore errors and retry

After every action or observation, ignore errors and expectation failures and retry until it either succeeds or a timeout is reached.

Applied to the test above, it would look like this:

When I click on A, retrying on errors until success or timeout
Then I should see effects of A, retrying on failure until success or timeout
When I click on B, retrying on errors until success or timeout
Then I should see effects of B, retrying on failure until success or timeout
When I click on C, retrying on errors until success or timeout
Then I should see effects of C, retrying on failure until success or timeout

When clicking on A causes a JavaScript error, or if A is not yet in the DOM, we wait 50 ms and click again. When we expect an element to have a given text and that expectation fails, we wait 50 ms and check the text again. Only when an action or observation keeps failing for 5 seconds do we let the test fail with the most recent error or expectation failure.

The catch-and-retry logic should be built into the functions you call when interacting with the browser. At makandra we use Spreewald's patiently helper for this; all Spreewald steps already use it. If you're using other testing tools, you can easily build a patiently-like mechanism yourself by porting its source code. Your testing toolchain might even contain a similar mechanism already, like FluentWait in Selenium's Java bindings.
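
If you want to build it yourself, the core idea fits in a few lines. This is only a simplified sketch (not Spreewald's actual implementation); the 5-second timeout and 50 ms polling interval mirror the numbers above, and it assumes rspec-expectations is loaded for the failure class:

# A simplified retry helper in the spirit of Spreewald's patiently.
def patiently(timeout = 5, interval = 0.05)
  started_at = Time.now
  begin
    yield
  rescue StandardError, RSpec::Expectations::ExpectationNotMetError => e
    # Give up once the timeout is reached, otherwise wait briefly and retry.
    raise e if Time.now - started_at > timeout
    sleep interval
    retry
  end
end

# Usage: wrap any action or expectation that may fail while the page settles.
patiently do
  page.find('.button').click
end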

Tool 4: Disable concurrent AJAX requests in tests

In applications that do a lot of AJAX, I've found it helpful to limit the number of concurrent requests to 1. This implicitly serializes UI interactions that trigger AJAX requests.

In Unpoly you can configure the proxy to only allow a single pending request at a time:

up.proxy.config.maxRequests = 1

Additional requests will be queued and sent once the pending request has completed.

Tool 5: Ensure no other scenarios are bleeding in

Sometimes a test is only flaky when scenarios run in a specific order. The usual culprit is global state that is not properly reset after each scenario (the hook below shows a few typical suspects).
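
What counts as global state depends on your application. The following Cucumber hook resets a few common suspects; these are my own examples, not an exhaustive list from this card:

# features/support/reset_global_state.rb
After do
  Timecop.return if defined?(Timecop)                              # unfreeze a mocked time
  Capybara.reset_sessions!                                         # drop browser session and cookies
  ActionMailer::Base.deliveries.clear if defined?(ActionMailer)    # discard mails sent by the scenario
end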

To rule out global state as the cause, you need to show that your flaky test both fails and passes for the same scenario order. Test runners encode this order as a seed.

bundle exec cucumber --order random features
# The test output ends with: Randomized with seed 19497

Each time you now call bundle exec cucumber --order random:19497 features, the scenarios run in the same order.

In case you use ParallelTests, it might be useful to get a final summary of the scenarios run and the seed per process:

DISPLAY=:17 bundle exec parallel_cucumber --serialize-stdout --verbose-process-command --test-options '--order random' features/

bin/cucumber --order random --profile parallel features/location/import.feature features/location/export.feature
Using the parallel profile...
.....................................................................................................................................................................................................................................................................................................................
22 scenarios (22 passed)
309 steps (309 passed)
1m22.599s
Randomized with seed 54619

bin/cucumber --order random --profile parallel features/bill/import.feature features/bill/export.feature
Using the parallel profile...
............................................
4 scenarios (4 passed)
44 steps (44 passed)
1m44.329s
Randomized with seed 54797

bin/cucumber --order random --profile parallel features/user/import.feature features/user/export.feature
Using the parallel profile...
.................................................................................................................................................................................................................................................................................................................................................................
17 scenarios (17 passed)
353 steps (353 passed)
1m45.641s
Randomized with seed 25304

Then you can rerun the features of one process in the same order, e.g. DISPLAY=:17 bundle exec cucumber --order random:25304 features/user/import.feature features/user/export.feature. Note: DISPLAY=:17 is not required; it just makes the browser spawn in a VNC window.

Note: There are cases where parallel test runs share global state, too. If you use the same uploads directory for all processes, one process will overwrite the files of another. Seeds will not help you here; you need to figure out manually why a test works in sequential order but not in parallel.
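
One way to avoid such collisions is to namespace shared directories by the process number that parallel_tests exposes. A sketch under the assumption that your app reads its uploads path from a single setting; UPLOADS_DIR is hypothetical, standing in for your CarrierWave root, Paperclip path or similar:

# features/support/parallel_uploads.rb
# parallel_tests sets TEST_ENV_NUMBER per process:
# "" for the first process, "2", "3", ... for the others.
process_suffix = ENV['TEST_ENV_NUMBER'].to_s

# Hypothetical setting: point each process at its own uploads directory.
UPLOADS_DIR = Rails.root.join('tmp', "uploads#{process_suffix}")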
