
Fixing flaky E2E tests

An end-to-end test (E2E test) is a script that remote-controls a web browser with tools like Selenium WebDriver. This card shows basic techniques for fixing a flaky E2E test suite that sometimes passes and sometimes fails.

Although the examples in this card use Cucumber and Selenium, the techniques are applicable to all languages and testing tools.

Why tests are flaky

Your tests probably look like this:

When I click on A
When I click on B
When I click on C
Then I should see effects of C

A test like this works fine most of the time and does not need to be fixed. However, as your frontend adds more JavaScript, AJAX and animations, this test might become "flaky". Flaky tests destroy trust in your test suite ("I thought this test is always red") and should be stabilized ASAP.

The root cause for flaky tests is that your test script, your application code and the controlled browser are three different programs. These three programs run in parallel with little to no synchronization that would prevent race conditions.

A race condition occurs whenever the test script makes unchecked assumptions about the browser's application state. The test above is full of such race conditions: it might click on B before the click on A has been processed, before a running animation has finished, or before the B button has even been rendered. This might cause an error (B isn't rendered yet) or silently skip the click on B (B is rendered but not yet interactive). The test will only pass with lucky timing.

Your testing tools have some default safeguards that prevent this from blowing up in your face most of the time. E.g. when finding an element before it is in the DOM, Selenium WebDriver and Capybara will wait for a short time before raising NoSuchElementError. This is sufficient for basic, server-rendered applications. However, as frontends become more complex (more JavaScript, AJAX requests, animations), race conditions will become too severe to be caught by these default safeguards.
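With Capybara, for example, the length of this built-in waiting is controlled by a single global setting (5 seconds below is just an example value):

# Capybara retries finders and matchers for up to this many seconds
# before raising an error such as Capybara::ElementNotFound.
Capybara.default_max_wait_time = 5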

How to remove race conditions

Stabilizing flaky tests always means removing race conditions. In other words: your tests should not depend on lucky timing.

Below you will find some tools that should fix 99% of your flaky tests. Use as few or as many as you need.

Interleave actions and expectations

After every action, observe its effect before moving on to the next action. This way we know the previous action has completed before the next one starts.

Applied to the test above, it would look like this:

When I click on A
Then I should see effects of A
When I click on B
Then I should see effects of B
When I click on C
Then I should see effects of C

Note how instead of clicking on B right after clicking on A, we first observe the effects of clicking on A. This could be done by checking for the presence of a new element that appears after clicking on A.

By interleaving actions and expectations you can make sure your test script periodically synchronizes with the browser's application state.
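As a concrete sketch, assuming a Capybara-based Cucumber suite and a hypothetical .results panel that only appears once the click on A has taken effect, such an observing step could simply be:

# Hypothetical step definition: Capybara's have_css matcher retries internally
# until the panel rendered after clicking A is present (or a timeout is reached).
Then('I should see effects of A') do
  expect(page).to have_css('.results')
end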

Ignore errors and retry

After every action or observation, ignore errors and expectation failures and retry until it either succeeds or a timeout is reached.

Applied to the test above, it would look like this:

When I click on A, retrying on errors until success or timeout
Then I should see effects of A, retrying on failure until success or timeout
When I click on B, retrying on errors until success or timeout
Then I should see effects of B, retrying on failure until success or timeout
When I click on C, retrying on errors until success or timeout
Then I should see effects of C, retrying on failure until success or timeout

When clicking on A causes a JavaScript error, or if A is not yet in the DOM, we wait for 50 ms and click again. When we expect an element to have a given text and that expectation fails, we wait for 50 ms and check the text again. Only when an action or observation keeps failing for 5 seconds do we let the test fail with the most recent error or expectation failure.

The catch-and-retry logic should be built into the functions you call when interacting with the browser:

  • If you're using Capybara, functions like click_link or expectations like expect(page).to have_css('.foo') already retry internally.
  • At makandra we use Spreewald's patiently helper to retry arbitrary blocks of code. All Spreewald steps already use the patiently helper.
  • If you're using other testing tools, you can easily build a patiently-like mechanism yourself by porting its source code (see the sketch below).
  • Your testing tool chain might even contain similar mechanisms already, like FluentWait in the Java WebDriver implementation.
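If you need to roll your own, a minimal patiently-style helper could look like the following sketch. This is not Spreewald's actual implementation; the timeout and polling interval are arbitrary example values.

# Retry the block on errors until it succeeds or a timeout is reached.
# If your expectation failures do not derive from StandardError,
# add their error class to the rescue clause.
def patiently(timeout: 5, interval: 0.05)
  started_at = Time.now
  begin
    yield
  rescue StandardError => e
    raise e if Time.now - started_at > timeout
    sleep interval
    retry
  end
end

# Usage: keep trying until the button can actually be clicked.
patiently do
  page.find('#submit-button').click
end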

Disable animations in tests

Animations are a common cause of race conditions: an element exists twice in the DOM during a page transition, or a button isn't yet interactive while it is fading in. I recommend simply disabling animations during tests.

While you can make animations work in tests, it's rarely worth your time. Remember that tests both speed you up (when catching regressions) and slow you down (when they break for the wrong reasons), and it's your job to find a sweet spot between testing everything and moving ahead.

In Unpoly you can globally disable animations like this:

up.motion.config.enabled = false

When you use CSS transitions or animations, you can globally disable them like this:

*, :before, :after {
  transition-property: none !important;
  animation: none !important;
}

To only include these styles for tests, see Detect the current Rails environment from JavaScript or CSS.
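If you drive the browser with Capybara, recent Capybara versions also ship a global switch that injects similar animation-disabling CSS for you (assuming your Capybara version supports this setting), so you don't have to maintain the stylesheet above yourself:

# In your test setup, e.g. rails_helper.rb or features/support/env.rb.
Capybara.disable_animation = true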

In AngularJS 1.x you can globally disable animations like this:

angular.module('myApp', []).run(['$animate', function($animate) {
  $animate.enabled(false)
}])

Disable concurrent AJAX requests in tests

In applications that do a lot of AJAX, I've found it helpful to limit the number of concurrent requests to 1. This will implicitly serialize UI interactions that make an AJAX request.

You can configure Unpoly to only allow a single pending request at a time:

// Unpoly 2
up.network.config.concurrency = 1

// Unpoly 1
up.proxy.config.maxRequests = 1

Additional requests will be queued and sent once the pending request has completed.

Use the capybara-lockstep gem (Ruby only)

The capybara-lockstep gem synchronizes Capybara commands with client-side JavaScript and AJAX requests. This greatly improves the stability of an end-to-end ("E2E") test suite, even if that suite has timing issues.
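Getting started is mostly a matter of adding the gem to your test group. The snippet below is only a sketch; see the gem's README for the full setup, which also includes rendering its JavaScript snippet in your application layout.

# Gemfile
group :test do
  gem 'capybara-lockstep'
end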

Use feature-flags to disable features with heavy side effects

See Using feature flags to stabilize flaky E2E tests.

Adjust your Capybara configuration (Ruby only)

Capybara offers some global variables that can be fine-tuned. If your flaky tests fail when the fill_in method is used, change the way Capybara clears a field before filling in a new value (Selenium only):

default_clear_strategy = :backspace

Capybara.default_set_options = {
  # Capybara defaults to clearing input fields with native methods.
  # As we experienced flaky Selenium features with this setting,
  # we switched to "send N backspaces".
  # Use @native-field-clearing to change the behavior of single features.
  clear: default_clear_strategy,
}

Before '@native-field-clearing or not @javascript' do
  Capybara.default_set_options.delete(:clear)
end

After '@native-field-clearing or not @javascript' do
  Capybara.default_set_options[:clear] = default_clear_strategy
end

Ensure no other tests are bleeding in

Sometimes a test is only flaky when it runs in a specific order. There are many kinds of global state that all need to be reset after each scenario, for example the current time (when frozen in a test), records left in the database, or changed environment variables.
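A minimal sketch of such a reset, assuming a Cucumber suite where scenarios may freeze the time with Timecop (which global state actually needs resetting depends on your app):

# Runs after every scenario and undoes common global state changes.
After do
  Timecop.return if defined?(Timecop)  # un-freeze the current time
  Capybara.reset_sessions!             # drop cookies and other browser state
end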

To rule out global state as the cause, you need to show that your flaky test both fails and passes when it runs in the same order. Test runners identify this order by a seed.

With Cucumber you can run tests like this:

bundle exec cucumber --order random features

# The test output
Randomized with seed 19497

Each time you now call bundle exec cucumber --order random:19497 features, the scenarios are run in the same order.

If you use ParallelTests, it might be useful to get a final summary of the scenarios that were run and the seed for each process:

DISPLAY=:17 bundle exec parallel_cucumber --serialize-stdout --verbose-process-command --test-options '--order random' features/

bin/cucumber --order random --profile parallel features/location/import.feature features/location/export.feature
Using the parallel profile...
.....................................................................................................................
22 scenarios (22 passed)
309 steps (309 passed)
1m22.599s
Randomized with seed 54619

bin/cucumber --order random --profile parallel features/bill/import.feature features/bill/export.feature
Using the parallel profile...
............................................
4 scenarios (4 passed)
44 steps (44 passed)
1m44.329s
Randomized with seed 54797

bin/cucumber --order random --profile parallel features/user/import.feature features/user/export.feature
Using the parallel profile...
.........................................................................................................
17 scenarios (17 passed)
353 steps (353 passed)
1m45.641s
Randomized with seed 25304

Then you can rerun the feature files of one process in the same order, e.g. bundle exec cucumber --order random:25304 features/user/import.feature features/user/export.feature.

Note: There are cases where parallel test processes share global state, too. If you use the same uploads directory for all processes, one process will overwrite the files of another. Seeds will not help you here; you need to figure out manually why a test works when run sequentially but not in parallel.
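One common way to avoid such collisions, assuming ParallelTests (which sets TEST_ENV_NUMBER for each of its processes), is to give every process its own directory. uploads_path below is a hypothetical, app-specific setting:

# Sketch for config/environments/test.rb: a per-process upload directory.
# ParallelTests sets TEST_ENV_NUMBER to "", "2", "3", ... for its processes.
Rails.application.config.x.uploads_path =
  Rails.root.join('tmp', "uploads#{ENV['TEST_ENV_NUMBER']}")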

Reduce the number of parallel test processes

This is a last resort if you cannot fix your test suite.

See Too many parallel test processes may cause flaky tests.

Debugging flaky tests

For debugging flaky issues, it might be helpful to halt after the failed step. This gives you the chance to interact with the browser and better understand the root cause.

# Run an example with `DEBUG_CUCUMBER=true bundle exec bin/cucumber some.feature:10 --format=pretty`
# to stop at the end of a feature on a failure.
After do |test_case|
  binding.pry if ENV['DEBUG_CUCUMBER'] && test_case.failed?
end

