Posted about 7 years ago.

Ruby: Natural sort strings with Umlauts and other funny characters

Why string sorting sucks in vanilla Ruby

Ruby's sort method doesn't work as expected with German umlauts:

["Schwertner", "Schöler"].sort # => ["Schwertner", "Schöler"] # you probably expected ["Schöler", "Schwertner"]

Also, numbers in strings are sorted character by character, which you probably don't want:

["1", "2", "11"].sort # => ["1", "11", "2"] # you probably expected ["1", "2", "11"]

Also the sorting is case sensitive:

["a", "B"].sort # => ["B", "a"] # you probably expected ["a", "B"]

How to fix it

To fix all of this, copy the attached files to config/initializers. This gives your strings a method #to_sort_atoms that returns an object that compares as expected.

You can now say:

["Schwertner", "Schöler"].sort_by(&:to_sort_atoms) # => ["Schöler", "Schwertner"]
["1", "2", "11"].sort_by(&:to_sort_atoms) # => ["1", "2", "11"]
["a", "B"].sort_by(&:to_sort_atoms) # => ["a", "B"]

There is also a shortcut #natural_sort that does roughly the same as sort_by(&:to_sort_atoms):

["Schwertner", "Schöler"].natural_sort # => ["Schöler", "Schwertner"]
["1", "2", "11"].natural_sort # => ["1", "2", "11"]
["a", "B"].natural_sort # => ["a", "B"]

In addition, natural_sort will look for a method #to_sort_atoms on non-strings, so you can define your own natural sort order.
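As an illustration of defining your own order, here is a minimal sketch in plain Ruby. The Version struct and its fields are hypothetical, and returning an array from #to_sort_atoms is an assumption: the card only says the method should return an object that compares as expected. The sketch uses sort_by(&:to_sort_atoms) so that it runs without the attached files.

```ruby
# Hypothetical value object, for illustration only.
Version = Struct.new(:major, :minor) do
  # Assumption: any object that responds to <=> is acceptable here.
  # Arrays compare element by element, so major wins over minor.
  def to_sort_atoms
    [major, minor]
  end
end

versions = [Version.new(1, 10), Version.new(1, 2), Version.new(0, 9)]
sorted = versions.sort_by(&:to_sort_atoms)
labels = sorted.map { |v| "#{v.major}.#{v.minor}" }
# labels => ["0.9", "1.2", "1.10"]
```

With the attached files loaded, versions.natural_sort should presumably pick up the same method.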

There is also natural_sort_by which works like Ruby's sort_by(&block).
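If you are curious how a sort key with these properties can be built, here is a minimal sketch. It is not the code from the attached files. The method name natural_sort_key is hypothetical, and stripping diacritics via Unicode decomposition is an assumed stand-in for the normalization the attached files actually use.

```ruby
# Hypothetical helper; NOT the implementation from the attached files.
def natural_sort_key(string)
  # Assumption: decompose characters (ö becomes o plus a combining mark),
  # drop the combining marks, then downcase for case-insensitive comparison.
  normalized = string.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').downcase

  # Split into runs of digits and runs of non-digits ("atoms").
  normalized.scan(/\d+|\D+/).map do |atom|
    if atom.match?(/\A\d+\z/)
      [0, atom.to_i] # numbers compare by value and sort before text
    else
      [1, atom]      # text compares as a normalized string
    end
  end
end

umlauts = ["Schwertner", "Schöler"].sort_by { |s| natural_sort_key(s) }
numbers = ["1", "2", "11"].sort_by { |s| natural_sort_key(s) }
letters = ["a", "B"].sort_by { |s| natural_sort_key(s) }
# umlauts => ["Schöler", "Schwertner"]
# numbers => ["1", "2", "11"]
# letters => ["a", "B"]
```

Tagging each atom with 0 or 1 keeps integers from ever being compared directly with strings, which would otherwise make Array#<=> return nil and break the sort.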

Tweaking for weird requirements

You can configure the string normalization as described in "Normalize characters in Ruby".

Specs (for nerds)

Here are some specs that describe the behavior of #to_sort_atoms:

require 'spec_helper'

describe String do

  describe '#to_sort_atoms' do

    it 'should return an object that correctly compares German umlauts' do
      ('Äa'.to_sort_atoms <=> 'Az'.to_sort_atoms).should == -1
      ('Äa'.to_sort_atoms <=> 'Ää'.to_sort_atoms).should == 0
      ('Az'.to_sort_atoms <=> 'Ää'.to_sort_atoms).should == 1
    end

    it 'should return an object that compares case insensitively' do
      ('A'.to_sort_atoms <=> 'b'.to_sort_atoms).should == -1
      ('A'.to_sort_atoms <=> 'a'.to_sort_atoms).should == 0
      ('B'.to_sort_atoms <=> 'a'.to_sort_atoms).should == 1
    end

    it 'should return an object that compares naturally' do
      ('2'.to_sort_atoms <=> '11'.to_sort_atoms).should == -1
      ('2'.to_sort_atoms <=> '2'.to_sort_atoms).should == 0
      ('11'.to_sort_atoms <=> '2'.to_sort_atoms).should == 1
    end

    it 'should compare correctly when the left and right side have different number of atoms' do
      ('a1b1c1'.to_sort_atoms <=> 'b1'.to_sort_atoms).should == -1
      ('a1b1c1'.to_sort_atoms <=> 'a1b1c1'.to_sort_atoms).should == 0
      ('b1'.to_sort_atoms <=> 'a1b1c1'.to_sort_atoms).should == 1
    end

    it 'should compare correctly when the left side starts with a digit' do
      ('1'.to_sort_atoms <=> 'a'.to_sort_atoms).should == -1
      ('1'.to_sort_atoms <=> '1'.to_sort_atoms).should == 0
      ('a'.to_sort_atoms <=> '1'.to_sort_atoms).should == 1
    end

  end

end

Owner of this card:

Henning Koch
Last edit:
over 2 years ago
enumerable_natural_sort.rb, string_to_sort_atoms.rb
Posted by Henning Koch to makandra dev