
Ruby: The YAML safe_load method hides some pitfalls

Emanuel
November 08, 2019
Software engineer at makandra GmbH

The Ruby standard library ships with a YAML parser called Psych. Serializing and deserializing data with it is not as straightforward as with JSON, though.

To safely write and read YAML files you should use Psych.dump (Object#to_yaml) and Psych.safe_load (YAML.safe_load):

data = {'key' => 'value'}.to_yaml
=> "---\nkey: value\n"
YAML.safe_load(data)
=> {"key"=>"value"}

Unfortunately you might encounter a few pitfalls that are not obvious at first. All of them stem from the fact that you cannot configure Psych.dump to write only data that Psych.safe_load accepts by default.

Pitfall 1: Psych::DisallowedClass

By default, Psych.safe_load only whitelists the following classes: TrueClass, FalseClass, NilClass, Numeric, String, Array and Hash. Loading any other class raises Psych::DisallowedClass unless you whitelist it explicitly. Adding Symbol, Date and Time to that list is often a good idea, and other classes may make sense for your data as well.

data = {foo: 'bar'}.to_yaml

::YAML.safe_load(data)
Psych::DisallowedClass: Tried to load unspecified class: Symbol

::YAML.safe_load(data, [Symbol])
=> {:foo=>"bar"}
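
As a rough sketch of the suggestion above, the following permits Symbol, Date and Time in one call. It assumes Psych 3.1 or newer, where the list is passed as the permitted_classes keyword (see the last section); the record hash is made-up sample data.

```ruby
require 'yaml'
require 'date'

# Made-up sample data: Symbol keys, a Date value and a Time value.
record = { created_on: Date.new(2019, 11, 8), updated_at: Time.at(0).utc }
data = record.to_yaml

# With the default whitelist the very first Symbol key is rejected.
default_error = nil
begin
  YAML.safe_load(data)
rescue Psych::DisallowedClass => e
  default_error = e.message
end

# Explicitly permitting the three classes lets the data round-trip.
loaded = YAML.safe_load(data, permitted_classes: [Symbol, Date, Time])
```

Keeping the list as short as your data allows limits what untrusted input can instantiate.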

Pitfall 2: Psych::BadAlias

Psych.dump creates an anchor and an alias when the same object is referenced more than once, and Psych.safe_load rejects aliases by default. Plain strings and numbers are simply written out twice, so you will not hit the issue with them. Other objects, such as a Time (and, as far as we can tell, a shared Array or Hash), are written once and then referenced by an alias.

time = Time.now
data = {foo: time, bar: time}.to_yaml
=> "---\n:foo: &1 2019-11-08 11:28:34.834180510 +01:00\n:bar: *1\n"


::YAML.safe_load(data, [Symbol, Time])
Psych::BadAlias: Unknown alias: 1

::YAML.safe_load(data, [Symbol, Time], [], true) # the fourth argument (true) enables aliases
=> {:foo=>2019-11-08 11:28:34 +0100, :bar=>2019-11-08 11:28:34 +0100}
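
The same behaviour can be reproduced with the keyword API, again assuming Psych 3.1 or newer. This is only a sketch: the fixed timestamp and the string keys are chosen here for illustration. Rescuing Psych::BadAlias should also cover newer Psych versions that raise a subclass of it.

```ruby
require 'yaml'

time = Time.at(0).utc                       # fixed timestamp for a stable example
data = { 'foo' => time, 'bar' => time }.to_yaml

# Without aliases: true the alias (*1) is rejected.
alias_rejected = begin
  YAML.safe_load(data, permitted_classes: [Time])
  false
rescue Psych::BadAlias
  true
end

# With aliases enabled both keys resolve to the anchored value.
loaded = YAML.safe_load(data, permitted_classes: [Time], aliases: true)
```

If you control the writing side, dumping copies (for example time.dup) instead of the same object is another way to keep aliases out of the file.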

A note

Note that both these options are there for a reason:

Allowing symbols to be deserialized can expose an application to a DoS attack (since symbols were not garbage-collected before Ruby 2.2).

Parsing aliases allows "YAML bombs", which are also a form of DoS attack.

You have to decide whether this is an acceptable risk for your use case.
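
To make the second point concrete, here is a deliberately tiny "bomb", written for illustration only and assuming Psych 3.1 or newer. Each level references the previous one twice, so the number of leaves doubles per level while the document grows by just one line.

```ruby
require 'yaml'

# Four levels only; a real attack would use many more.
bomb = <<~YAML
  a: &a [x, x]
  b: &b [*a, *a]
  c: &c [*b, *b]
  d: [*c, *c]
YAML

# The default (aliases disabled) refuses the document outright.
rejected = begin
  YAML.safe_load(bomb)
  false
rescue Psych::BadAlias
  true
end

# With aliases enabled it loads; walking the structure shows the growth.
loaded = YAML.safe_load(bomb, aliases: true)
leaves = loaded['d'].flatten.size
```

In this sketch the loaded aliases appear to point at shared objects, so the cost tends to show up once the structure is traversed, copied or serialized again rather than at load time.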

New safe_load API

Starting with Psych 3.1.0, the safe_load API became more user-friendly by replacing positional arguments with keyword arguments:

if Gem::Version.new(Psych::VERSION) >= Gem::Version.new('3.1.0.pre1')
  ::YAML.safe_load(input, permitted_classes: PERMITTED_CLASSES, permitted_symbols: PERMITTED_SYMBOLS, aliases: true)
else
  ::YAML.safe_load(input, PERMITTED_CLASSES, PERMITTED_SYMBOLS, true)
end
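
If you need this switch in more than one place, it can be wrapped once. The SafeYaml module and its keyword names below are hypothetical and not part of Psych; this is a minimal sketch built on the version check above.

```ruby
require 'yaml'

# Hypothetical helper (not part of Psych) that hides the version switch.
module SafeYaml
  KEYWORD_API = Gem::Version.new(Psych::VERSION) >= Gem::Version.new('3.1.0.pre1')

  def self.load(input, classes: [], symbols: [], aliases: false)
    if KEYWORD_API
      ::YAML.safe_load(input, permitted_classes: classes, permitted_symbols: symbols, aliases: aliases)
    else
      # Older Psych versions only understand positional arguments.
      ::YAML.safe_load(input, classes, symbols, aliases)
    end
  end
end

config = SafeYaml.load({ env: 'test' }.to_yaml, classes: [Symbol])
```

Callers then use one signature regardless of the installed Psych version.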

Posted by Emanuel to makandra dev (2019-11-08 11:03)