Fixing flaky E2E tests


An end-to-end test (E2E test) is a script that remote-controls a web browser with tools like Selenium WebDriver. This card shows basic techniques for fixing a flaky E2E test suite that sometimes passes and sometimes fails.

Although many examples in this card use Ruby, Cucumber and Selenium, the techniques are applicable to all languages and testing tools.

Why tests are flaky

Your tests probably look like this:

When I click on A
And I click on B
And I click on C
Then I should see effects of C

A test like this works fine most of the time and does not need to be fixed. However, as your frontend adds more JavaScript, AJAX and animations, this test might become "flaky". Flaky tests destroy the trust in your test suite ("I thought this was always red") and should be stabilized ASAP.

The root cause for flaky tests is that your test script, your application code and the controlled browser are three different programs. These three programs run in parallel with little to no synchronization, which opens the door to race conditions.

A race condition occurs whenever the test script makes unchecked assumptions about the browser's application state. The test above has plenty of them: It might click on B before the click on A has been processed, before an animation has finished, or before the B button has even been rendered. This might cause an error (B isn't rendered yet) or silently skip the click on B step (B is rendered but not yet interactive). The test will only pass with lucky timing.

Your testing tools have some default safeguards that prevent this from blowing up in your face most of the time. E.g. when finding an element before it is in the DOM, Selenium WebDriver and Capybara will wait for a short time before raising NoSuchElementError. This is sufficient for basic, server-rendered applications. However, as frontends become more complex (more JavaScript, AJAX requests, animations), race conditions will become too severe to be caught by these default safeguards.
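For example, Capybara's implicit waiting is controlled by a single setting that you may want to raise for a JavaScript-heavy frontend. A minimal sketch (the 5 second value is just an example, not a recommendation):

# Capybara retries finders and matchers for up to this many seconds
# before raising an error such as Capybara::ElementNotFound.
Capybara.default_max_wait_time = 5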

How to remove race conditions

Stabilizing flaky tests always means to remove race conditions. In other words: Your tests should not depend on lucky timing.

Below you will find some tools that should fix 99% of your flaky tests. Use as few or as many as you need.

Interleave actions and expectations

After every action, observe its effect before moving on to the next action. This way we know one action has completed before the next one starts.

Applied to the test above, it would look like this:

When I click on A
Then I should see effects of A
When I click on B
Then I should see effects of B
When I click on C
Then I should see effects of C

Note how instead of clicking on B right after clicking on A, we first observe the effects of clicking on A. This could be done by checking for the presence of a new element that appears after clicking on A.

By interleaving actions and expectations you can make sure your test script periodically synchronizes with the browser's application state.
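In a Cucumber suite this usually means backing the Then steps with a Capybara expectation that waits for the effect. Here is a minimal sketch (the step wording and the .effects-of-a selector are made up for illustration):

When('I click on A') do
  click_on 'A'
end

# Capybara's have_css matcher retries until the element appears or the
# wait time is exceeded, so the next step only runs after A took effect.
Then('I should see effects of A') do
  expect(page).to have_css('.effects-of-a')
end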

Ignore errors and retry

After every action or observation, ignore errors and expectation failures and retry until you succeed or a timeout is reached.

Applied to the test above, it would look like this:

When I click on A, retrying on errors until success or timeout
Then I should see effects of A, retrying on failure until success or timeout
When I click on B, retrying on errors until success or timeout
Then I should see effects of B, retrying on failure until success or timeout
When I click on C, retrying on errors until success or timeout
Then I should see effects of C, retrying on failure until success or timeout

When clicking on A causes a JavaScript error, or if A is not yet in the DOM, we wait for 50 ms and click again. Or when we expect an element to have a given text, and that expectation fails, we wait for 50 ms and check the text again. Only when an action or observation keeps failing for 5 seconds do we let the test fail with the most recent error or expectation failure.

The catch-and-retry logic should be built into the functions you call when interacting with the browser.
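Here is a minimal sketch of such a wrapper, assuming RSpec expectations; the helper name patiently, the 50 ms interval and the 5 second timeout are illustrative assumptions:

# Retries the given block on errors and expectation failures until it
# succeeds or the timeout is reached. Expectation failures are listed
# explicitly because they do not inherit from StandardError.
def patiently(timeout: 5, interval: 0.05)
  started = Time.now
  begin
    yield
  rescue StandardError, RSpec::Expectations::ExpectationNotMetError => e
    raise e if Time.now - started > timeout
    sleep interval
    retry
  end
end

# Usage: wrap every browser interaction and observation.
patiently { click_on 'B' }
patiently { expect(page).to have_css('.effects-of-b') }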

Disable animations in tests

Animations are a common cause for race conditions: An element exists twice in the DOM during a page transition, or a button isn't yet interactive while it is fading in. I recommend simply disabling animations during tests.

While you can make animations work in tests, it's rarely worth your time. Remember that tests both speed you up (when catching regressions) and slow you down (when they break for the wrong reasons), and it's your job to find a sweet spot between testing everything and moving ahead.

When you use CSS transitions or animations, you can globally disable them like this:

*, :before, :after {
  transition-property: none !important;
  animation: none !important;
}

To only include these styles for tests, see Detect the current Rails environment from JavaScript or CSS.
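A common approach in a Rails app is to keep these rules in a separate stylesheet and include it only in the test environment. A sketch, where the disable_animations stylesheet name is an assumption:

<%# app/views/layouts/application.html.erb %>
<%= stylesheet_link_tag 'disable_animations' if Rails.env.test? %>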

Disabling animations in Unpoly

In Unpoly you can globally disable animations like this:

up.motion.config.enabled = false

Disabling animations in AngularJS

In AngularJS 1.x you can globally disable animations like this:

angular.module('myApp', []).run(['$animate', function($animate) {
  $animate.enabled(false)
}])

Disable smooth scrolling

When clicking an element in an E2E test, the element is first scrolled into view.

When you have enabled smooth scrolling, the browser may still be scrolling the viewport when the element is clicked, causing the test to sometimes miss the element.

Note

The popular Bootstrap framework enables smooth scrolling by default.

You can address this by disabling smooth scrolling in tests:

body, html {
  scroll-behavior: auto !important;
}

If you have other scrolling elements with overflow-y: scroll or overflow-y: auto, add them to the CSS selector above.

You can also disable smooth scrolling at the driver level and skip the custom CSS:

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--disable-smooth-scrolling')
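These options then need to be passed to the Selenium driver that Capybara controls. A sketch, assuming a driver registered under the name :chrome:

Capybara.register_driver(:chrome) do |app|
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument('--disable-smooth-scrolling')
  # Hand the options to the Selenium driver Capybara will use.
  Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
end

Capybara.javascript_driver = :chrome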

Disable concurrent AJAX requests in tests

In applications that do a lot of AJAX, I've found it helpful to limit the number of concurrent requests to 1. This implicitly serializes UI interactions that make an AJAX request.

You can configure Unpoly to only allow a single pending request at a time:

// Unpoly 2
up.network.config.concurrency = 1

// Unpoly 1
up.proxy.config.maxRequests = 1

Additional requests will be queued and sent once the pending request has completed.

Use the capybara-lockstep gem (Ruby only)

The capybara-lockstep gem synchronizes Capybara commands with client-side JavaScript and AJAX requests. This greatly improves the stability of an end-to-end ("E2E") test suite, even if that suite has timing issues.
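Getting started is mostly a matter of adding the gem to your test setup. A minimal sketch (see the gem's README for the full, current instructions):

# Gemfile
group :test do
  gem 'capybara-lockstep'
end

The gem also needs a small snippet in your application layout so it can observe JavaScript and AJAX activity in the browser; its README describes the view helper to add there.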

Use feature-flags to disable features with heavy side effects

See Using feature flags to stabilize flaky E2E tests.

Adjust your Capybara configuration (Ruby only)

Capybara offers some global configuration options that can be fine-tuned. If your flickering tests are failing when the fill_in method is used, change the way Capybara clears a field before filling in a new value (Selenium only):

default_clear_strategy = :backspace
Capybara.default_set_options = {
  # Capybara defaults to clear input fields with native methods
  # As we experienced flickering Selenium features with this setting,
  # we switched to "send N backspaces".
  # Use @native-field-clearing to change the behavior of single features.
  clear: default_clear_strategy,
}

Before '@native-field-clearing or not @javascript' do
  Capybara.default_set_options.delete(:clear)
end

After '@native-field-clearing or not @javascript' do
  Capybara.default_set_options[:clear] = default_clear_strategy
end

Ensure no other tests are bleeding in

Sometimes a test is only flaky when it runs in a specific order. There is a lot of global state (e.g. database contents, the Rails cache, cookies and browser local storage, uploaded files) that needs to be reset after each scenario.

To rule out that your flaky test fails because of global state, you need to show that it both fails and passes when the scenarios run in the same order. This order is determined by a random seed.

With Cucumber you can run tests like this:

bundle exec cucumber --order random features

# The test output

Randomized with seed 19497

Each time you now call bundle exec cucumber --order random:19497 features, the scenarios run in the same order.

If you use ParallelTests it might be useful to get a final summary of the scenarios that were run and the seed for each process:

DISPLAY=:17 bundle exec parallel_cucumber --serialize-stdout --verbose-process-command --test-options '--order random' features/

bin/cucumber --order random --profile parallel features/location/import.feature features/location/export.feature
Using the parallel profile...
.....................................................................................................................................................................................................................................................................................................................

22 scenarios (22 passed)
309 steps (309 passed)
1m22.599s

Randomized with seed 54619

bin/cucumber --order random --profile parallel features/bill/import.feature features/bill/export.feature
Using the parallel profile...
............................................

4 scenarios (4 passed)
44 steps (44 passed)
1m44.329s

Randomized with seed 54797

bin/cucumber --order random --profile parallel features/user/import.feature features/user/export.feature
Using the parallel profile...
.................................................................................................................................................................................................................................................................................................................................................................

17 scenarios (17 passed)
353 steps (353 passed)
1m45.641s

Randomized with seed 25304

Then you can re-run the scenarios of one process in the same order, e.g. bundle exec cucumber --order random:25304 features/user/import.feature features/user/export.feature.

Note: There are cases where parallel tests share global state, too. If all processes use the same uploads directory, one process will overwrite the files of another. Seeds will not help you here; you need to figure out manually why tests pass when run sequentially but not in parallel.

Reduce the number of parallel test processes

This is a last resort if you cannot fix your test suite.

See Too many parallel test processes may cause flaky tests.

Debugging flaky tests

For debugging flaky issues, it might be helpful to halt after the failed step. This gives you the chance to interact with the browser and better understand the root cause.

# Run a scenario with `DEBUG_CUCUMBER=true bundle exec bin/cucumber some.feature:10 --format=pretty` to stop at the
# end of a failed scenario
After do |test_case|
  binding.pry if ENV['DEBUG_CUCUMBER'] && test_case.failed?
end

As flaky tests are especially hard to observe consistently, you could run them in a loop:

for i in {1..30}; do bundle exec cucumber features/my_flaky_test.feature:123; done

If the test does not flicker locally but only in a CI environment, change your .gitlab-ci.yml to run only the specific test in your branch. Then create an MR and start multiple pipelines via the GitLab UI.


License
Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra dev (2017-09-18 09:15)