Ruby: The YAML safe_load method hides some pitfalls

November 08, 2019
Software engineer at makandra GmbH

The Ruby standard library ships with a YAML parser called Psych. Serializing and deserializing data with it is less straightforward than with JSON, though.

To safely write and read YAML files you should use Psych.dump (Object#to_yaml) and Psych.safe_load (YAML.safe_load):

data = {'key' => 'value'}.to_yaml
=> "---\nkey: value\n"

::YAML.safe_load(data)
=> {"key"=>"value"}

Unfortunately, you might run into a few pitfalls that are not obvious at first. All of them are a side effect of the fact that you cannot configure Psych.dump to write only safe data.
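Since the restriction only exists on the reading side, one option is to check at write time that the output can be read back. The following is a minimal sketch, assuming Psych 3.1 or newer (keyword arguments); `dump_checked` is a hypothetical helper name and not part of Psych:

```ruby
require 'yaml'

# Hypothetical helper: dumps an object and immediately verifies that
# safe_load accepts the result with the given list of permitted classes.
def dump_checked(object, permitted_classes: [])
  yaml = YAML.dump(object)
  # Raises Psych::DisallowedClass or Psych::BadAlias if the output is not "safe".
  YAML.safe_load(yaml, permitted_classes: permitted_classes)
  yaml
end

# String keys and values are fine with the default settings.
safe_yaml = dump_checked({ 'key' => 'value' })

# A Symbol key is written without complaint by YAML.dump,
# but the read-back check rejects it.
begin
  dump_checked({ key: 'value' })
rescue Psych::DisallowedClass => e
  error = e
end
```

This way the failure shows up where the data is written instead of later, when someone tries to load the file.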

Pitfall 1: Psych::DisallowedClass

Psych.safe_load only whitelists the following classes by default: TrueClass, FalseClass, NilClass, Numeric, String, Array and Hash. All other classes raise an exception unless you whitelist them explicitly. It may be a good idea to add Symbol, Date and Time to that list, and other classes can make sense as well.

data = {foo: 'bar'}.to_yaml

::YAML.safe_load(data)
Psych::DisallowedClass: Tried to load unspecified class: Symbol

::YAML.safe_load(data, [Symbol])
=> {:foo=>"bar"}
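The same happens with Date values, which look harmless in the YAML file. Below is a small sketch, assuming Psych 3.1 or newer so that the keyword form `permitted_classes:` is available; the variable names are only for illustration:

```ruby
require 'yaml'
require 'date'

data = { 'created_on' => Date.new(2019, 11, 8) }.to_yaml

# With the default settings the Date value is rejected.
begin
  YAML.safe_load(data)
rescue Psych::DisallowedClass => e
  rejected = e.message
end

# Explicitly permitting Date makes the round trip work.
loaded = YAML.safe_load(data, permitted_classes: [Date])
```

Keeping the list of permitted classes as short as possible limits what a manipulated file can instantiate.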

Pitfall 2: Psych::BadAlias

Psych.dump creates aliases if you reference the same object more than once. Psych.safe_load rejects aliases by default. Plain strings and numbers are never aliased, but as soon as the same Array, Hash or "more complex" object (e.g. Time) appears more than once, Psych.dump emits an anchor and an alias instead of repeating the value.

time = Time.now
data = {foo: time, bar: time}.to_yaml
=> "---\n:foo: &1 2019-11-08 11:28:34.834180510 +01:00\n:bar: *1\n"

::YAML.safe_load(data, [Symbol, Time])
Psych::BadAlias: Unknown alias: 1

::YAML.safe_load(data, [Symbol, Time], [], true) # the fourth argument enables aliases
=> {:foo=>2019-11-08 11:28:34 +0100, :bar=>2019-11-08 11:28:34 +0100}
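Aliases depend on object identity, not on equality. The sketch below, again assuming Psych 3.1 or newer for the keyword arguments, compares dumping the same Time object twice with dumping a copy of it:

```ruby
require 'yaml'

time = Time.now

# Same object referenced twice: Psych writes an anchor and an alias.
shared = { 'foo' => time, 'bar' => time }.to_yaml

# Equal but distinct objects: the value is simply written twice.
distinct = { 'foo' => time, 'bar' => time.dup }.to_yaml

uses_alias_shared   = shared.include?('*')
uses_alias_distinct = distinct.include?('*')

# Without aliases in the document, safe_load needs no alias support.
loaded = YAML.safe_load(distinct, permitted_classes: [Time])
```

If you control the code that writes the file, duplicating shared values before dumping can be an alternative to enabling aliases on the reading side.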

A note

Note that both restrictions exist for a reason:

Allowing symbols to be deserialized can expose an application to a DoS attack: before Ruby 2.2 symbols were never garbage-collected, so attacker-supplied symbols could exhaust memory.
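If you only expect a handful of symbols, you can restrict loading to exactly those. A minimal sketch, assuming Psych 3.1 or newer and its `permitted_symbols:` keyword; the symbol names are made up for this example:

```ruby
require 'yaml'

expected = { role: 'admin' }.to_yaml
surprise = { other: 'value' }.to_yaml

# Symbol must be permitted as a class, and the concrete symbol must be listed.
loaded = YAML.safe_load(expected, permitted_classes: [Symbol], permitted_symbols: [:role])

# Any symbol that is not on the list is rejected.
begin
  YAML.safe_load(surprise, permitted_classes: [Symbol], permitted_symbols: [:role])
rescue Psych::DisallowedClass => e
  rejected = e
end
```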

Parsing aliases allows "YAML bombs", which also constitute a DoS attack.
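To get a feeling for how such a bomb works, here is a deliberately tiny sketch (assuming Psych 3.1 or newer for `aliases: true`). Each level references the previous one twice, so the number of leaves doubles per line:

```ruby
require 'yaml'

doc = <<~DOC
  a: &a [x, x]
  b: &b [*a, *a]
  c: &c [*b, *b]
DOC

# Rejected by default, because the document contains aliases.
begin
  YAML.safe_load(doc)
rescue Psych::BadAlias => e
  rejected = e
end

# With aliases enabled, three short lines expand to 8 leaves.
loaded = YAML.safe_load(doc, aliases: true)
leaves = loaded['c'].flatten.size
```

A real attack uses more levels and more references per level. In Ruby the aliased nodes are shared references after loading, so the cost tends to show up once the structure is traversed or serialized again.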

You have to decide whether this is an acceptable risk for your use case.

New safe_load API

Starting with Psych 3.1.0, the safe_load API became more user-friendly by replacing positional arguments with keyword arguments:

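# One way to check the installed Psych version
if Gem::Version.new(Psych::VERSION) >= Gem::Version.new('3.1.0.pre1')
  ::YAML.safe_load(input, permitted_classes: PERMITTED_CLASSES, permitted_symbols: PERMITTED_SYMBOLS, aliases: true)
end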
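If your code has to run against both old and new Psych versions, the version check can be wrapped in a small helper that falls back to the positional form. This is only a sketch; `safe_load_compat` and its parameter names are hypothetical:

```ruby
require 'yaml'

# Hypothetical wrapper that picks the right safe_load signature
# for the installed Psych version.
def safe_load_compat(input, classes: [], symbols: [], aliases: false)
  if Gem::Version.new(Psych::VERSION) >= Gem::Version.new('3.1.0.pre1')
    # Keyword arguments, available since Psych 3.1.0
    YAML.safe_load(input, permitted_classes: classes, permitted_symbols: symbols, aliases: aliases)
  else
    # Positional arguments for older Psych versions
    YAML.safe_load(input, classes, symbols, aliases)
  end
end

result = safe_load_compat("---\n:foo: bar\n", classes: [Symbol])
```

To my knowledge, Psych 4 and later no longer accept the positional form at all, so the keyword branch is the one that matters on current Rubies.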

Posted by Emanuel to makandra dev (2019-11-08 11:03)