An end-to-end test (E2E test) is a script that remote-controls a web browser with tools like Selenium WebDriver. This card shows basic techniques for fixing a flaky E2E test suite that sometimes passes and sometimes fails.
Although many examples in this card use Ruby, Cucumber and Selenium, the techniques are applicable to all languages and testing tools.
Why tests are flaky
Your tests probably look like this:
When I click on A
And I click on B
And I click on C
Then I should see effects of C
A test like this works fine most of the time and does not need to be fixed. However, as your frontend adds more JavaScript, AJAX and animations, this test might become "flaky". Flaky tests destroy the trust in your test suite ("I thought this test is always red") and should be stabilized ASAP.
The root cause for flaky tests is that your test script, your application code and the controlled browser are three different programs. These three programs run in parallel with little to no synchronization to prevent race conditions.
A race condition occurs whenever the test script makes unchecked assumptions about the browser's application state. The test above has a lot of race conditions like that: It might click on B before A receives the click event, finishes animating, or renders the B button. This might cause an error (B isn't rendered yet) or silently skip the click on B step (B is rendered but not yet interactive). It will only pass with lucky timing.
Your testing tools have some default safeguards that prevent this from blowing up in your face most of the time. E.g. when finding an element before it is in the DOM, Selenium WebDriver and Capybara will wait for a short time before raising NoSuchElementError. This is sufficient for basic, server-rendered applications. However, as frontends become more complex (more JavaScript, AJAX requests, animations), race conditions will become too severe to be caught by these default safeguards.
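In Capybara, the length of this implicit waiting window is controlled by a global setting. Raising it slightly can help with JavaScript-heavy frontends (the value below is just an example):

# Capybara retries finders and matchers for up to this many seconds
Capybara.default_max_wait_time = 5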
How to remove race conditions
Stabilizing flaky tests always means removing race conditions. In other words: Your tests should not depend on lucky timing.
Below you will find some tools that should fix 99% of your flaky tests. Use as few or as many as you need.
Interleave actions and expectations
After every action, observe its effect before moving on to the next action. This way we know one action has completed before the next one begins.
Applied to the test above, it would look like this:
When I click on A
Then I should see effects of A
When I click on B
Then I should see effects of B
When I click on C
Then I should see effects of C
Note how instead of clicking on B right after clicking on A, we first observe the effects of clicking on A. This could be done by checking for the presence of a new element that appears after clicking on A.
By interleaving actions and expectations you can make sure your test script periodically synchronizes with the browser's application state.
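With Capybara the interleaved test above could look like the following sketch. The button labels and the .effect-of-* selectors are made up for this example; check for whatever element reliably appears after each action:

# Observe the effect of each click before performing the next one
# (button labels and selectors are hypothetical)
click_on 'A'
expect(page).to have_css('.effect-of-a')  # waits until A's effect is rendered

click_on 'B'
expect(page).to have_css('.effect-of-b')

click_on 'C'
expect(page).to have_css('.effect-of-c')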
Ignore errors and retry
After every action or observation, ignore errors and expectation failures until either succeeding or until a timeout is reached.
Applied to the test above, it would look like this:
When I click on A, retrying on errors until success or timeout
Then I should see effects of A, retrying on failure until success or timeout
When I click on B, retrying on errors until success or timeout
Then I should see effects of B, retrying on failure until success or timeout
When I click on C, retrying on errors until success or timeout
Then I should see effects of C, retrying on failure until success or timeout
When clicking on A causes a JavaScript error, or if A is not yet in the DOM, we wait for 50ms and click again. Or when we expect an element to have a given text, and that expectation fails, we wait for 50ms and check the text again. Only when an action or observation keeps failing for 5 seconds do we let the test fail with the most recent error or expectation failure.
The catch-and-retry logic should be built into the functions you call when interacting with the browser:
- If you're using Capybara, functions like click_link or expectations like expect(page).to have_css('.foo') already retry internally.
- At makandra we use Spreewald's patiently helper to retry arbitrary blocks of code. All Spreewald steps already use the patiently helper.
- If you're using other testing tools, you can easily build a patiently-like mechanism yourself by porting its source code (see the sketch below).
- Your testing tool chain might even contain similar mechanisms already, like FluentWait in the Java WebDriver implementation.
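A minimal sketch of such a retry helper is shown below. This is not Spreewald's actual implementation; a real version should only rescue a whitelist of retryable errors (element not found, stale element references, expectation failures) instead of rescuing everything:

# Retry the given block until it stops raising, or until the timeout is reached.
# Sketch only; restrict the rescued error classes in real code.
def patiently(timeout = 5, interval = 0.05)
  started_at = Time.now
  begin
    yield
  rescue Exception
    raise if Time.now - started_at > timeout
    sleep interval
    retry
  end
end

# Usage:
# patiently { expect(page).to have_css('.foo') }
# patiently { find('.button').click }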
Disable animations in tests
Animations are a common cause of race conditions: An element exists twice in the DOM during a page transition, or a button isn't interactive yet while it is fading in. I recommend simply disabling animations during tests.
While you can make animations work in tests, it's rarely worth your time. Remember that tests both speed you up (when catching regressions) and slow you down (when they break for the wrong reasons), and it's your job to find a sweet spot between testing everything and moving ahead.
When you use CSS transitions or animations, you can globally disable them like this:
*, :before, :after {
transition-property: none !important;
animation: none !important;
}
To include these styles only in tests, see Detect the current Rails environment from JavaScript or CSS.
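One simple approach (an illustration, not necessarily what the linked card describes) is to keep the rules above in a separate stylesheet that your layout only includes in the test environment:

<%# app/views/layouts/application.html.erb ('disable_animations' is a hypothetical stylesheet) %>
<%= stylesheet_link_tag 'disable_animations' if Rails.env.test? %>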
Disabling animations in Unpoly
In Unpoly you can globally disable animations like this:
up.motion.config.enabled = false
Disabling animations in AngularJS
In AngularJS 1.x you can globally disable animations like this:
angular.module('myApp', []).run(['$animate', function($animate) {
$animate.enabled(false)
}])
Disable smooth scrolling
When clicking an element in an E2E test, the element is first scrolled into view.
When you have enabled smooth scrolling, the browser may still be scrolling the viewport when the element is clicked, causing the test to sometimes miss the element.
Note: The popular Bootstrap framework enables smooth scrolling by default.
You can address this by disabling smooth scrolling in tests:
body, html {
scroll-behavior: auto !important;
}
If you have other scrolling elements with overflow-y: scroll or overflow-y: auto, add them to the CSS selector above.
You can also disable smooth scrolling at the driver level and skip the custom CSS with:
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--disable-smooth-scrolling')
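To use these options, register a Capybara driver that passes them to Chrome. A sketch, assuming Capybara with Selenium 4 (the driver name is arbitrary):

# Register a Chrome driver with smooth scrolling disabled
Capybara.register_driver :chrome_without_smooth_scrolling do |app|
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument('--disable-smooth-scrolling')
  Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
end

Capybara.javascript_driver = :chrome_without_smooth_scrolling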
Disable concurrent AJAX requests in tests
In applications that do a lot of AJAX, I've found it helpful to limit the number of concurrent requests to 1. This will implicitly serialize UI interactions that make AJAX requests.
You can configure Unpoly to only allow a single pending request at a time:
// Unpoly 2
up.network.config.concurrency = 1
// Unpoly 1
up.proxy.config.maxRequests = 1
Additional requests will be queued and sent once the pending request has completed.
Use the capybara-lockstep gem (Ruby only)
The capybara-lockstep gem synchronizes Capybara commands with client-side JavaScript and AJAX requests. This greatly improves the stability of an end-to-end ("E2E") test suite, even if that suite has timing issues.
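Setup is roughly: add the gem to your test group and include its JavaScript snippet in your application layout (check the gem's README for the exact, current instructions):

# Gemfile
group :test do
  gem 'capybara-lockstep'
end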
Use feature-flags to disable features with heavy side effects
See Using feature flags to stabilize flaky E2E tests.
Adjust your Capybara configuration (Ruby only)
Capybara offers some global variables that can be fine-tuned. If your flickering tests are failing when the fill_in method is used, change the way Capybara clears a field before filling in a new value (Selenium only):
default_clear_strategy = :backspace
Capybara.default_set_options = {
# Capybara defaults to clear input fields with native methods
# As we experienced flickering Selenium features with this setting,
# we switched to "send N backspaces".
# Use @native-field-clearing to change the behavior of single features.
clear: default_clear_strategy,
}
Before '@native-field-clearing or not @javascript' do
Capybara.default_set_options.delete(:clear)
end
After '@native-field-clearing or not @javascript' do
Capybara.default_set_options[:clear] = default_clear_strategy
end
Ensure no other tests are bleeding in
Sometimes a test is only flaky when it runs in a specific order. There is a lot of global state that needs to be reset after each scenario. Here are some examples (a reset sketch follows the list):
- Timecop
- Browser caches (Session Storage)
- Databases (Postgres, Redis, Elasticsearch)
- Folders e.g. for file uploads
- Singleton Classes that have state
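As an illustration, here is a Cucumber hook that resets some of this state after every scenario. The upload path is hypothetical, and your suite may already cover parts of this via database_cleaner or transactional fixtures:

# features/support/reset_global_state.rb
require 'fileutils'

After do
  Timecop.return if defined?(Timecop)              # undo frozen or travelled time
  FileUtils.rm_rf(Rails.root.join('tmp/uploads'))  # hypothetical upload folder
end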
To prove that your flaky test does not fail because of global state, you need to show that it both fails and succeeds when the scenarios run in the same order. Test runners identify this order by a seed.
With Cucumber you can run tests like this:
bundle exec cucumber --order random features
# The test output
Randomized with seed 19497
Each time you now call bundle exec cucumber --order random:19497 features, the scenarios are run in the same order.
In case you use ParallelTests it might be useful to get a final summary of the scenarios and the seed for each process:
DISPLAY=:17 bundle exec parallel_cucumber --serialize-stdout --verbose-process-command --test-options '--order random' features/
bin/cucumber --order random --profile parallel features/location/import.feature features/location/export.feature
Using the parallel profile...
.....................................................................................................................................................................................................................................................................................................................
22 scenarios (22 passed)
309 steps (309 passed)
1m22.599s
Randomized with seed 54619
bin/cucumber --order random --profile parallel features/bill/import.feature features/bill/export.feature
Using the parallel profile...
............................................
4 scenarios (4 passed)
44 steps (44 passed)
1m44.329s
Randomized with seed 54797
bin/cucumber --order random --profile parallel features/user/import.feature features/user/export.feature
Using the parallel profile...
.................................................................................................................................................................................................................................................................................................................................................................
17 scenarios (17 passed)
353 steps (353 passed)
1m45.641s
Randomized with seed 25304
Then you can rerun the input of one process in the same order, e.g. bundle exec cucumber --order random:25304 features/user/import.feature features/user/export.feature.
Note: There are cases where parallel tests share global state, too. If you use the same uploads directory for all processes, one process will overwrite the files of another. Seeds will not help you in this case; you need to figure out manually why a test passes when run sequentially but fails in parallel.
Reduce the number of parallel test processes
This is a last resort if you cannot fix your test suite.
See Too many parallel test processes may cause flaky tests.
Debugging flaky tests
For debugging flaky issues, it might be helpful to halt after the failed step. This gives you the chance to interact with the browser and better understand the root cause.
# Run an example with `DEBUG_CUCUMBER=true bundle exec bin/cucumber some.feature:10 --format=pretty` to stop at the
# end of a feature on a failure
After do |test_case|
binding.pry if ENV['DEBUG_CUCUMBER'] && test_case.failed?
end
As flaky tests are especially hard to observe consistently, you could run them in a loop:
for i in {1..30}; do bundle exec cucumber features/my_flaky_test.feature:123; done
If the test does not flicker locally but only in a CI environment, change your .gitlab-ci.yml to run only the specific test in your branch. Then create an MR and start multiple pipelines via the GitLab UI.