Posted over 6 years ago.

Ruby: Natural sort strings with Umlauts and other funny characters

Why string sorting sucks in vanilla Ruby

Ruby's sort method doesn't work as expected with German umlauts:

["Schwertner", "Schöler"].sort # => ["Schwertner", "Schöler"] # you probably expected ["Schöler", "Schwertner"]

Also, numbers in strings are sorted character by character, which you probably don't want:

["1", "2", "11"].sort # => ["1", "11", "2"] # you probably expected ["1", "2", "11"]

Sorting is also case-sensitive:

["a", "B"].sort # => ["B", "a"] # you probably expected ["a", "B"]
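All three surprises have the same cause: Ruby's String#<=> compares the raw bytes of both strings from left to right. The following sketch makes that visible. The helper name first_differing_bytes is made up for this illustration, and the byte values assume the source file is UTF-8 encoded (the default in current Rubies):

```ruby
# Hypothetical helper for illustration: returns the first pair of bytes
# at which the two strings differ, looking only as far as `left` is long.
# Returns nil if no difference is found within that range.
def first_differing_bytes(left, right)
  left.bytes.zip(right.bytes).find { |l, r| l != r }
end

# "w" is byte 119, "ö" starts with byte 195 in UTF-8, so "Schw..." sorts first.
p first_differing_bytes("Schwertner", "Schöler") # => [119, 195]

# "2" is byte 50, "1" is byte 49, so "11" sorts before "2".
p first_differing_bytes("2", "11") # => [50, 49]

# "a" is byte 97, "B" is byte 66, so "B" sorts before "a".
p first_differing_bytes("a", "B") # => [97, 66]
```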

How to fix it

To fix all of this, copy the attached files to config/initializers. They give your strings a method #to_sort_atoms that returns an object that compares as expected.
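The attached files are not reproduced here. To give a rough idea of how such a sort key can work, here is a minimal sketch. It is not the attached implementation: the module name NaturalSortSketch, the replacement table and the atom layout are all assumptions made for this example. It also assumes Ruby 2.4 or newer, where String#downcase handles non-ASCII letters.

```ruby
# Hypothetical sketch, NOT the code from the attached files.
module NaturalSortSketch
  # Assumed normalization table: fold German umlauts into plain letters.
  REPLACEMENTS = { 'ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss' }.freeze

  # Splits a string into "atoms": runs of digits become integers,
  # everything else stays a lowercased, normalized string.
  # Each atom is tagged with 0 (number) or 1 (text) so that Array#<=>
  # never has to compare an Integer with a String, and numbers sort
  # before text.
  def self.atoms(string)
    normalized = string.downcase.gsub(/[äöüß]/, REPLACEMENTS)
    normalized.scan(/\d+|\D+/).map do |chunk|
      chunk.match?(/\A\d+\z/) ? [0, chunk.to_i] : [1, chunk]
    end
  end
end

p ["Schwertner", "Schöler"].sort_by { |s| NaturalSortSketch.atoms(s) }
p ["1", "2", "11"].sort_by { |s| NaturalSortSketch.atoms(s) }
p ["a", "B"].sort_by { |s| NaturalSortSketch.atoms(s) }
```

The sketch relies on Array#<=>, which compares two arrays element by element. That is why a plain array of tagged atoms is enough as a sort key here.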

You can now say:

["Schwertner", "Schöler"].sort_by(&:to_sort_atoms) # => ["Schöler", "Schwertner"]
["1", "2", "11"].sort_by(&:to_sort_atoms) # => ["1", "2", "11"]
["a", "B"].sort_by(&:to_sort_atoms) # => ["a", "B"]

There is also a shortcut #natural_sort that does roughly the same as sort_by(&:to_sort_atoms):

["Schwertner", "Schöler"].natural_sort # => ["Schöler", "Schwertner"]
["1", "2", "11"].natural_sort # => ["1", "2", "11"]
["a", "B"].natural_sort # => ["a", "B"]

In addition, natural_sort will look for a method #to_sort_atoms on non-strings, so you can define your own natural sort order.
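As an illustration of a custom sort order, the sketch below defines #to_sort_atoms on a made-up SketchPerson struct. The struct, its fields and the returned key are assumptions for this example. To stay runnable without the attached files it sorts with plain sort_by(&:to_sort_atoms). With the initializers loaded, natural_sort should pick up the same method, assuming it only needs the returned keys to be comparable with each other.

```ruby
# Hypothetical record type for illustration only.
SketchPerson = Struct.new(:last_name, :first_name) do
  # Sort by last name, then first name, ignoring case.
  # With the attached initializers you would more likely delegate to
  # String#to_sort_atoms here; a plain array keeps this sketch standalone.
  def to_sort_atoms
    [last_name.downcase, first_name.downcase]
  end
end

PEOPLE = [
  SketchPerson.new("Meier", "Zoe"),
  SketchPerson.new("Abel", "Tom"),
  SketchPerson.new("meier", "Anna")
].freeze

SORTED_FIRST_NAMES = PEOPLE.sort_by(&:to_sort_atoms).map(&:first_name)
p SORTED_FIRST_NAMES # => ["Tom", "Anna", "Zoe"]
```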

There is also natural_sort_by which works like Ruby's sort_by(&block).

Tweaking for weird requirements

You can configure the string normalization as described in "Normalize characters in Ruby".

Specs (for nerds)

Here are some specs that describe the behavior of #to_sort_atoms:

require 'spec_helper'

describe String do

  describe '#to_sort_atoms' do

    it 'should return an object that correctly compares German umlauts' do
      ('Äa'.to_sort_atoms <=> 'Az'.to_sort_atoms).should == -1
      ('Äa'.to_sort_atoms <=> 'Ää'.to_sort_atoms).should == 0
      ('Az'.to_sort_atoms <=> 'Ää'.to_sort_atoms).should == 1
    end

    it 'should return an object that compares case insensitively' do
      ('A'.to_sort_atoms <=> 'b'.to_sort_atoms).should == -1
      ('A'.to_sort_atoms <=> 'a'.to_sort_atoms).should == 0
      ('B'.to_sort_atoms <=> 'a'.to_sort_atoms).should == 1
    end

    it 'should return an object that compares naturally' do
      ('2'.to_sort_atoms <=> '11'.to_sort_atoms).should == -1
      ('2'.to_sort_atoms <=> '2'.to_sort_atoms).should == 0
      ('11'.to_sort_atoms <=> '2'.to_sort_atoms).should == 1
    end

    it 'should compare correctly when the left and right side have different number of atoms' do
      ('a1b1c1'.to_sort_atoms <=> 'b1'.to_sort_atoms).should == -1
      ('a1b1c1'.to_sort_atoms <=> 'a1b1c1'.to_sort_atoms).should == 0
      ('b1'.to_sort_atoms <=> 'a1b1c1'.to_sort_atoms).should == 1
    end

    it 'should compare correctly when the left side starts with a digit' do
      ('1'.to_sort_atoms <=> 'a'.to_sort_atoms).should == -1
      ('1'.to_sort_atoms <=> '1'.to_sort_atoms).should == 0
      ('a'.to_sort_atoms <=> '1'.to_sort_atoms).should == 1
    end

  end

end


Owner of this card:
Henning Koch
Last edit:
about 2 years ago
Attachments:
enumerable_natural_sort.rb, string_to_sort_atoms.rb
Posted by Henning Koch to makandra dev