Why string sorting sucks in vanilla Ruby
Ruby's sort method doesn't work as expected with special characters (like German umlauts):
["Schwertner", "Schöler"].sort
# => ["Schwertner", "Schöler"] # you probably expected ["Schöler", "Schwertner"]
Numbers in strings are also sorted character by character, which you probably don't want:
["1", "2", "11"].sort
# => ["1", "11", "2"] # you probably expected ["1", "2", "11"]
Sorting is also case-sensitive:
["a", "B"].sort
# => ["B", "a"] # you probably expected ["a", "B"]
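All three surprises come from the same cause: String#<=> compares strings by their underlying bytes, not by any alphabet. A minimal illustration in plain Ruby, assuming UTF-8 encoded source and strings:

```ruby
# "ö" is encoded as two bytes in UTF-8, and the first one is larger than "w".
'w'.bytes    # => [119]
'ö'.bytes    # => [195, 182]

# Every uppercase ASCII letter has a smaller code than any lowercase one.
'B'.ord      # => 66
'a'.ord      # => 97

# "11" and "2" are compared by their first characters, "1" (49) and "2" (50).
'11' <=> '2' # => -1
```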
How to fix it
To fix all of this, copy the attached files to config/initializers. They give your strings a method #to_sort_atoms that returns an object that compares as expected.
You can now say:
["Schwertner", "Schöler"].sort_by(&:to_sort_atoms) # => ["Schöler", "Schwertner"]
["1", "2", "11"].sort_by(&:to_sort_atoms) # => ["1", "2", "11"]
["a", "B"].sort_by(&:to_sort_atoms) # => ["a", "B"]
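To give an idea of how such a method can work, here is a minimal sketch. It is not the attached implementation: the transliteration table, the tagging of atoms with 0 and 1, and the use of plain arrays as the comparable object are all assumptions made for illustration.

```ruby
# Hypothetical sketch of String#to_sort_atoms, NOT the attached initializer.
class String
  def to_sort_atoms
    # Assumed normalization: lowercase, then map umlauts to their base letters.
    transliterations = { 'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss' }
    normalized = downcase.gsub(/[äöüß]/, transliterations)

    # Split into alternating runs of digits and non-digits ("atoms").
    normalized.scan(/\d+|\D+/).map do |atom|
      # Tag each atom so that numbers sort before text and an Integer
      # is never compared directly with a String.
      atom.match?(/\A\d+\z/) ? [0, atom.to_i] : [1, atom]
    end
  end
end

'a10b'.to_sort_atoms # => [[1, "a"], [0, 10], [1, "b"]]
```

Arrays compare element by element with <=>, which is why the result can be passed straight to sort_by.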
There is also a shortcut #natural_sort that does roughly the same as sort_by(&:to_sort_atoms):
["Schwertner", "Schöler"].natural_sort # => ["Schöler", "Schwertner"]
["1", "2", "11"].natural_sort # => ["1", "2", "11"]
["a", "B"].natural_sort # => ["a", "B"]
In addition, natural_sort will look for a method #to_sort_atoms on non-strings, so you can define your own natural sort order.
There is also natural_sort_by, which works like Ruby's sort_by(&block).
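The following sketch shows how a custom class might opt into natural sorting, and how natural_sort_by could be used. So that it runs on its own, it starts with simplified stand-ins for the three methods from the attached files. Those stand-ins, the Release class, and the assumption that natural_sort_by calls #to_sort_atoms on the block's result are all hypothetical.

```ruby
# Simplified, hypothetical stand-ins for the attached initializer.
class String
  def to_sort_atoms
    downcase.scan(/\d+|\D+/).map do |atom|
      atom.match?(/\A\d+\z/) ? [0, atom.to_i] : [1, atom]
    end
  end
end

module Enumerable
  def natural_sort
    sort_by(&:to_sort_atoms)
  end

  # Assumption: the block's return value is converted with #to_sort_atoms.
  def natural_sort_by
    sort_by { |item| yield(item).to_sort_atoms }
  end
end

# Hypothetical non-string class that defines its own natural sort order.
Release = Struct.new(:name) do
  def to_sort_atoms
    name.to_sort_atoms
  end
end

releases = [Release.new('v10'), Release.new('v2'), Release.new('V1')]
releases.natural_sort.map(&:name)
# => ["V1", "v2", "v10"]

files = [{ name: 'file10' }, { name: 'File2' }, { name: 'file1' }]
files.natural_sort_by { |file| file[:name] }.map { |file| file[:name] }
# => ["file1", "File2", "file10"]
```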
Tweaking for weird requirements
You can configure the string normalization as described in "Normalize characters in Ruby".
Specs (for nerds)
Here are some specs that describe the behavior of #to_sort_atoms:
describe String do
  describe '#to_sort_atoms' do
    it 'should return an object that correctly compares German umlauts' do
      expect('Äa'.to_sort_atoms <=> 'Az'.to_sort_atoms).to eq(-1)
      expect('Äa'.to_sort_atoms <=> 'Ää'.to_sort_atoms).to eq(0)
      expect('Az'.to_sort_atoms <=> 'Ää'.to_sort_atoms).to eq(1)
    end

    it 'should return an object that compares case insensitively' do
      expect('A'.to_sort_atoms <=> 'b'.to_sort_atoms).to eq(-1)
      expect('A'.to_sort_atoms <=> 'a'.to_sort_atoms).to eq(0)
      expect('B'.to_sort_atoms <=> 'a'.to_sort_atoms).to eq(1)
    end

    it 'should return an object that compares naturally' do
      expect('2'.to_sort_atoms <=> '11'.to_sort_atoms).to eq(-1)
      expect('2'.to_sort_atoms <=> '2'.to_sort_atoms).to eq(0)
      expect('11'.to_sort_atoms <=> '2'.to_sort_atoms).to eq(1)
    end

    it 'should compare correctly when the left and right side have different number of atoms' do
      expect('a1b1c1'.to_sort_atoms <=> 'b1'.to_sort_atoms).to eq(-1)
      expect('a1b1c1'.to_sort_atoms <=> 'a1b1c1'.to_sort_atoms).to eq(0)
      expect('b1'.to_sort_atoms <=> 'a1b1c1'.to_sort_atoms).to eq(1)
    end

    it 'should compare correctly when the left side starts with a digit' do
      expect('1'.to_sort_atoms <=> 'a'.to_sort_atoms).to eq(-1)
      expect('1'.to_sort_atoms <=> '1'.to_sort_atoms).to eq(0)
      expect('a'.to_sort_atoms <=> '1'.to_sort_atoms).to eq(1)
    end
  end
end