Too many parallel test processes may amplify flaky tests


By default parallel_tests will spawn as many test processes as your system reports CPUs. If you have issues with flaky tests, reducing the number of parallel processes may help.

Important

Flaky test suites can and should be fixed. This card is only relevant if you need to run a flaky test suite that you cannot fix for some reason. If you have no issues with flaky tests, you should run as many parallel test processes as possible.

Test case

In my case, halving the number of processes from 8 to 4 reduced test failures by about 80% while increasing test runtime by only about 10%:

Processes  Test runtime  Test runtime (%)  Failures  Failures (%)
8          308           100%              14        100%
8          304           99%               10        71%
6          315           102%              6         43%
4          343           111%              1         7%
4          346           112%              6         43%
4          333           108%              2         14%
4          340           110%              3         21%
3          378           123%              2         14%
3          370           120%              2         14%

Percentages are relative to the first run with 8 processes.
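To check the summary against the measurements, the following Ruby sketch averages the runs for each process count and expresses the averages relative to the first 8-process run, which the table uses as its 100% baseline. `RUNS` and `summarize` are names made up for this example; the numbers are copied from the table.

```ruby
# Measurements from the table: [processes, test runtime, failures]
RUNS = [
  [8, 308, 14], [8, 304, 10], [6, 315, 6],
  [4, 343, 1], [4, 346, 6], [4, 333, 2], [4, 340, 3],
  [3, 378, 2], [3, 370, 2]
].freeze

# The percentage columns use the first 8-process run as 100%
BASELINE_RUNTIME = 308.0
BASELINE_FAILURES = 14.0

# Groups the runs by process count and returns the average runtime and
# failure count of each group as a percentage of the baseline run
def summarize(runs)
  runs.group_by(&:first).transform_values do |group|
    avg_runtime = group.sum { |_, runtime, _| runtime } / group.size.to_f
    avg_failures = group.sum { |_, _, failures| failures } / group.size.to_f
    {
      runtime_pct: (avg_runtime / BASELINE_RUNTIME * 100).round,
      failures_pct: (avg_failures / BASELINE_FAILURES * 100).round
    }
  end
end

summarize(RUNS).each do |processes, stats|
  puts "#{processes} processes: runtime #{stats[:runtime_pct]}%, failures #{stats[:failures_pct]}%"
end
```

For 4 processes this yields an average runtime of 111% and an average failure count of 21% of the baseline, i.e. roughly 10% slower with roughly 80% fewer failures.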

How to start fewer processes

When you're using the parallel_tests gem, you can use the PARALLEL_TEST_PROCESSORS environment variable:

PARALLEL_TEST_PROCESSORS=4 geordi cucumber features
PARALLEL_TEST_PROCESSORS=4 parallel_cucumber features

To set a default process count you can add this to your ~/.bashrc or ~/.zshrc:

export PARALLEL_TEST_PROCESSORS=4
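The order in which the process count is chosen can be sketched in Ruby. `effective_process_count` is a hypothetical helper, not part of parallel_tests, and the precedence it encodes (an explicit count such as the `-n` option, then PARALLEL_TEST_PROCESSORS, then the CPU count reported by the OS) is an assumption about the gem's behavior rather than its actual code.

```ruby
require 'etc'

# Hypothetical helper approximating how parallel_tests picks its process count.
# Assumed precedence: explicit count, then PARALLEL_TEST_PROCESSORS, then the
# number of logical CPUs the OS reports (hyperthreads count as CPUs).
def effective_process_count(count = nil, env: ENV)
  # Ignore missing or empty values
  value = [count, env['PARALLEL_TEST_PROCESSORS']].find { |v| !v.nil? && v.to_s != '' }
  Integer(value || Etc.nprocessors)
end

puts effective_process_count(env: {})                                       # logical CPU count
puts effective_process_count(env: { 'PARALLEL_TEST_PROCESSORS' => '4' })    # 4
puts effective_process_count(2, env: { 'PARALLEL_TEST_PROCESSORS' => '4' }) # 2
```

Without the variable, the fallback is the logical CPU count, which is why a hyperthreaded machine gets more processes than it has physical cores.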

What's a good number of processes?

I'm going to experiment with 4, since that's the number of physical CPU cores on my PC. If your CPU has hyperthreading, Linux may report a higher number of CPUs (one per hardware thread). In my case Linux reports 8, and parallel_tests defaults to that:

$ lscpu

CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
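The relationship between these numbers can be worked out in Ruby from the output above. `lscpu_field` is a hypothetical helper, and the multiplication assumes every core has the same number of threads, which may not hold on CPUs that mix different core types.

```ruby
# lscpu output from above (only the relevant lines)
LSCPU_OUTPUT = <<~TEXT
  CPU(s):                          8
  On-line CPU(s) list:             0-7
  Thread(s) per core:              2
  Core(s) per socket:              4
  Socket(s):                       1
TEXT

# Reads an integer field such as "Socket(s)" from lscpu output
def lscpu_field(output, name)
  match = output.match(/^#{Regexp.escape(name)}:\s*(\d+)/)
  raise ArgumentError, "#{name} not found in lscpu output" unless match

  match[1].to_i
end

sockets = lscpu_field(LSCPU_OUTPUT, 'Socket(s)')
cores_per_socket = lscpu_field(LSCPU_OUTPUT, 'Core(s) per socket')
threads_per_core = lscpu_field(LSCPU_OUTPUT, 'Thread(s) per core')

physical_cores = sockets * cores_per_socket      # 1 * 4 = 4
logical_cpus = physical_cores * threads_per_core # 4 * 2 = 8, the "CPU(s)" value
puts "physical cores: #{physical_cores}, logical CPUs: #{logical_cpus}"
```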

Note that your system needs to handle many more processes than just your tests while your test suite is running:

  • The Rails server booted by each test process (in a separate process)
  • The Chrome browser started by each test process
  • Your IDE
  • Your desktop environment
  • Background services

Fixing flaky tests

Running fewer test processes is only a band-aid. Your test suite has uncontrolled timing issues, and reducing the number of test processes just makes any race conditions occur less frequently.

We have a separate card for fixing flaky integration tests.
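A minimal Ruby sketch of the underlying problem: a check that looks only once races against asynchronous work, and the more loaded the machine is, the more often it loses that race. A check that waits for the expected state no longer depends on timing. `wait_until` is a hypothetical helper written for this example; Capybara's matchers retry in a similar way.

```ruby
# Hypothetical helper: re-evaluates the block until it returns a truthy value
# or the timeout (in seconds) expires
def wait_until(timeout: 2, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    raise 'Timed out waiting for condition' if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline

    sleep interval
  end
end

# Simulates asynchronous work, e.g. JavaScript updating the page after a request
status = nil
worker = Thread.new do
  sleep 0.05
  status = 'saved'
end

one_shot = status              # checks once; the result depends on timing
waited = wait_until { status } # keeps checking until the work is done
worker.join
puts "one-shot check: #{one_shot.inspect}, waiting check: #{waited.inspect}"
```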

Informing other developers

If you cannot fix your test suite, you may suggest to your colleagues that they run fewer processes.

The following script will print a yellow message to the console if the user is running more test processes than physical CPUs:

You are running more test processes than your PC has physical CPUs (4). If you encounter flaky tests, consider running tests with PARALLEL_TEST_PROCESSORS=4.

Run the script while starting your test suite (e.g., in Cucumber, copy it to features/support/suggest_fewer_test_processes.rb):

require 'open3'

class SuggestFewerTestProcesses
  class CannotReadCPUCount < StandardError; end

  def check_process_count
    if process_count > physical_cpu_count
      warn("You are running more test processes than your PC has physical CPUs (#{physical_cpu_count}). If you encounter flaky tests, consider running tests with PARALLEL_TEST_PROCESSORS=#{physical_cpu_count}.")
    end
  rescue CannotReadCPUCount => e
    warn('Cannot read CPU count: ' + e.message)
  end

  private

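  # parallel_tests sets TEST_ENV_NUMBER to the index of the current process
  # ('' for the first process, then '2', '3', ...). Only processes whose index
  # exceeds the physical core count print the warning.
  def process_count
    if (env_value = ENV['TEST_ENV_NUMBER'])
      env_value.to_i
    else
      1
    end
  end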

  def physical_cpu_count
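    # LC_ALL=C forces untranslated lscpu labels, so the regexes below also match on localized systems
    stdout_str, error_str, status = Open3.capture3({ 'LC_ALL' => 'C' }, 'lscpu')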
    if status.success?
      # lscpu output looks like this:
      #
      #     ...
      #     Core(s) per socket:              4
      #     Socket(s):                       1
      #     ...
      if stdout_str =~ /Socket\(s\)\:\s*(\d+)/
        sockets = Regexp.last_match(1).to_i
      else
        raise CannotReadCPUCount, 'Cannot parse socket count from lscpu output'
      end
      if stdout_str =~ /Core\(s\) per socket\:\s*(\d+)/
        cores_per_socket = Regexp.last_match(1).to_i
      else
        raise CannotReadCPUCount, 'Cannot parse core count from lscpu output'
      end

      sockets * cores_per_socket
    else
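      raise CannotReadCPUCount, error_str
    end
  rescue Errno::ENOENT
    # lscpu is not installed (e.g. on macOS)
    raise CannotReadCPUCount, 'lscpu is not available on this system'
  end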

  def warn(message)
    puts yellow(message)
  end

  def yellow(string)
    "\e[33m#{string}\e[0m"
  end

end

SuggestFewerTestProcesses.new.check_process_count
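Because the script compares each process's own index with the core count, 8 processes on a 4-core machine print the warning four times (from processes 5 to 8). The following Ruby sketch is a hypothetical variant that warns only once. It assumes parallel_tests exports PARALLEL_TEST_GROUPS with the total number of processes and sets TEST_ENV_NUMBER to '' (or '1' with `--first-is-1`) for the first process; `suggest_fewer_processes?` is a name made up for this example.

```ruby
# Hypothetical variant: warn only from the first process, based on the total
# process count (assumed to be exported as PARALLEL_TEST_GROUPS)
def suggest_fewer_processes?(env, physical_cores)
  total_processes = env['PARALLEL_TEST_GROUPS'].to_i
  first_process = ['', '1'].include?(env['TEST_ENV_NUMBER'].to_s)
  first_process && total_processes > physical_cores
end

env = { 'PARALLEL_TEST_GROUPS' => '8', 'TEST_ENV_NUMBER' => '' }
puts suggest_fewer_processes?(env, 4) # true
```

In the script, the condition `process_count > physical_cpu_count` could then be replaced by `suggest_fewer_processes?(ENV, physical_cpu_count)`.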
Henning Koch
Last edit: Henning Koch
License
Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra dev (2021-02-24 21:26)