The Ruby Object Model


In Ruby (almost) everything is an object. While this enables a lot of powerful features, the concept might be confusing for developers who have been programming in more static languages, such as Java or C#. This card should help you understand the basic concepts of Ruby's object model and how things behave.


Usage of objects in Ruby

When working with objects in Ruby, you might think of a "container" that holds metadata, variables and methods. Metadata describes things like the object's class or its object_id which uniquely identifies it. From an external point of view, this might seem adequate. Let's say we have a User class and instantiate it:

class User
  def initialize(name)
    @name = name
  end

  def introduce
    puts "Hi, I am #{@name}!"
  end
end

dominic = User.new('Dominic')
dominic.object_id # => 180
dominic.introduce # => "Hi, I am Dominic!"
dominic.methods # => [:introduce, :dup, :itself, :yield_self, :then, ...]
dominic.instance_variables # => [:@name]

As you can see, we can directly access metadata, methods and instance variables of our concrete object. While this may seem simple, Ruby does a lot of work under the hood to enable this.

Basic mapping of classes and objects

Ruby's objects internally map to a C structure that roughly looks like the following:

struct RObject {
 struct RBasic basic;
 struct st_table *iv_tbl;
};

As you can see, it holds two members. One is iv_tbl which holds a pointer to a table of instance variables defined on that object. Think of st_table like a hash/map that holds key-value pairs, representing the name of the variable and its value, respectively.
The other member holds an RBasic structure. More on this one later.

Moving further on, there is a concept for classes that kind of looks like this:

struct RClass {
 struct RBasic basic;
 struct st_table *iv_tbl;
 struct st_table *m_tbl;
 VALUE super;
};

As we can see, we again have a variable of the type struct RBasic. We can also spot another pointer to a table holding instance variables. The super member refers to the superclass of this class. We will handle this one later.
Additionally, we do now see a table named m_tbl. What does that mean? It stands for a collection of methods defined on a specific class. When talking about collection, we again actually mean a structure like a hash/map with key-value-pairs. But why did we not have this declaration in the RObject structure? Here comes a first really important point:

Objects do not have methods, only classes do.

Remember this because it will be important to understand the upcoming concepts.
Is that all? Only two low-level concepts for the whole Ruby language? No, there are of course more structures involved. These are necessary as some types in Ruby also need a predefined representation in memory. How would you otherwise represent a string or a float using a regular RObject? Instance variables won't help here. For this reason, there is e.g. also an RString structure which holds a char pointer and a corresponding length integer. The same principle applies to arrays with RArray or floats with RFloat. Most of these structures have a member of type RBasic.

What does the RBasic struct do?

struct RBasic {
 unsigned long flags;
 VALUE klass;
};

It holds two members:

  • Some flags used to describe states (e.g. if the object is frozen, garbage collection-markings, ...) and an object type description (e.g. T_ARRAY).
  • A pointer klass which points to the corresponding class of this object (which the object is an instance of).

Keep in mind that the type VALUE is basically a typedef for unsigned long. Although it is also used for other purposes (and thus is not directly of type void*), you may just consider it as a simple pointer for understanding this card.

Now that we know the internal representations, let's have another look at our previously defined object dominic. How would that be represented?
We would probably have a RObject with iv_tbl pointing to something like | name | <struct RString>* | where the string structure would contain a pointer to the first element of this char array: ['D', 'o', 'm', 'i', 'n', 'i', 'c']
Also, it would have an RBasic member with klass pointing to an RClass that represents our User class and whose m_tbl includes, inter alia, our declared introduce method.

This means that declaring a class Foo internally creates a Foo (RClass) which contains the methods defined using def <method>; end. Instances of Foo then are RObjects that hold a pointer to Foo (RClass) as class reference (klass) and one to the instance variables with the values for the concrete object (iv_tbl). Again, the object itself has no methods defined on it.
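The claim that the object itself carries no methods can be checked from within Ruby. The following is a small sketch that re-declares the User class from above so that it runs on its own; Method#owner reports the class or module in which a method was found.

```ruby
class User
  def initialize(name)
    @name = name
  end

  def introduce
    puts "Hi, I am #{@name}!"
  end
end

dominic = User.new('Dominic')

# Instance variables live on the object ...
dominic.instance_variables        # => [:@name]
# ... but the method is stored in the class.
dominic.method(:introduce).owner  # => User
User.instance_methods(false)      # => [:introduce]
# Nothing has been defined on this particular object.
dominic.singleton_methods         # => []
```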

To make this a bit more clear, have a look at the following diagram:

Image

Connections between classes happen over the inheritance chain. The super member of a class refers to the corresponding superclass. Objects do not have a super reference as they have no superclass, just a class they are an instance of (klass).
However, as classes are also objects they do also have a klass reference to Class which they are an instance of.

That is what you usually know from Ruby:

class X
#...
end

is equivalent to

X = Class.new do
#...
end

which means X is an instance of Class. When you call X.class you will get Class.
With an instance of X, however, you will get the following:

x = X.new
x.class # => X

x is then an RObject which holds a reference klass to X.
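One way to convince yourself of this equivalence is to build a class with Class.new and inspect it. The names Point, origin and describe are made up for this sketch.

```ruby
# Assigning the result of Class.new to a constant gives the class
# its name, just like the class keyword would.
Point = Class.new do
  def describe
    "a point"
  end
end

origin = Point.new

Point.class       # => Class (the class is itself an object)
Point.name        # => "Point"
Point.superclass  # => Object (the default superclass)
origin.class      # => Point
origin.describe   # => "a point"
```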

Mapping of singleton methods

When declaring singleton methods, you extend a specific object with some methods that only this object knows. Let us make an example with our User class:

peter = User.new('Peter')
alice = User.new('Alice')

Let's say we want Peter to respond to a method age that returns his age. On the other hand, Alice should not respond to this method as you don't ask a woman for her age. We could do that by adding a method only to Peter:

def peter.age
  42
end

This will result in the following:

peter.age # => 42
alice.age # => NoMethodError (undefined method `age' for #<User:0x0000559ee6c4a1f0 @name="Alice">)

While Alice's instance does not know about age, Peter's instance does and can respond with 42. This is why such a method is called a singleton method. Only one specific instance is allowed to access it, but not all instances of User.

How does Ruby internally represent singleton methods? Where do they go? Obviously they cannot be part of the underlying User <RClass> itself or they would be available to any instance (RObject). But, on the other side, the objects themselves do not hold any method information.

The answer is: There are separate classes created to hold this information. They are called singleton classes. Sometimes they are also referred to as eigenclasses or metaclasses. (Almost) every object in Ruby has a singleton class which is hidden from you in the background; immediate values such as integers and symbols are an exception. Nevertheless, you are able to access it using the singleton_class method. When not empty (i.e. singleton methods are defined on an object), singleton classes find their place between the object itself and its corresponding class. The class reference (klass) of the concrete RObject then points to this singleton class instead of the class directly. In order to not lose the chain, the superclass (super) of the singleton class points to the original RClass. Let's see this for the example above:

Image

This makes sense. peter is an instance of his own singleton class that inherits the basic definitions from the User class. This means User is a superclass of peter's eigenclass. This way introduce is also available to peter. On the other hand alice is a direct instance of the User class. This does not mean that she has no singleton class. But as it is empty, it's just discarded.

Careful: In peter's case peter.class does not return his eigenclass now. It still returns User. This means the klass reference in the C-struct is not the same as Ruby's class method. It will discard singleton classes and move on to the actual class. To access the singleton class directly, you will have to call peter.singleton_class in Ruby.

As eigenclasses are also objects, they may again have an eigenclass (over the klass reference). This means you may create an endless chain of singleton classes. Practically, this is actually never a thing you might want to do.
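The relationships described in this section can be observed with Ruby's reflection methods. This is a minimal sketch that re-declares a reduced User class so it runs standalone; it only inspects the structure and does not show how MRI builds it internally.

```ruby
class User
  def initialize(name)
    @name = name
  end
end

peter = User.new('Peter')
alice = User.new('Alice')

def peter.age
  42
end

# The singleton method is stored in peter's singleton class ...
peter.singleton_methods                        # => [:age]
peter.singleton_class.instance_methods(false)  # => [:age]
# ... whose superclass is User.
peter.singleton_class.superclass               # => User
# Object#class skips the singleton class.
peter.class                                    # => User
# alice is unaffected.
alice.singleton_methods                        # => []
alice.respond_to?(:age)                        # => false
# A singleton class is an object too, so it has its own singleton class.
peter.singleton_class.singleton_class.singleton_class?  # => true
```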

Class methods

Singleton methods can also be used to create class methods which, in the way they work, roughly correspond to what "static methods" do in other programming languages. Classes themselves are also objects, namely instances of the class Class. This means we can also define methods on them directly (like we did on the object peter):

class User
  def User.foo
    puts "foo"
  end
end

As we are inside the scope of the class definition, we can avoid the repetition of User by writing self:

class User
  def self.foo
    puts "foo"
  end
end

This works because self is User.
It is important to note that in Ruby, there is no such concept as class or static methods in any way. Every method you encounter is an instance method. It just depends on where it is defined.
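That a "class method" is just a singleton method on the class object can be verified with the same reflection calls used for peter. A short sketch, with User re-declared so that it is self-contained:

```ruby
class User
  def self.foo
    "foo"
  end
end

# foo is a singleton method of the object User ...
User.singleton_methods                           # => [:foo]
# ... so it is an instance method of User's singleton class.
User.singleton_class.instance_methods(false)     # => [:foo]
User.method(:foo).owner == User.singleton_class  # => true
# User itself did not gain an instance method.
User.instance_methods(false)                     # => []
User.foo                                         # => "foo"
```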

Overriding method definitions in instances

By the way, this does also mean that you can overwrite methods for specific instances:

class User
  def greet
    puts "Hi"
  end
end

peter = User.new
alice = User.new

def peter.greet
  puts "Salut"
end

alice.greet # => "Hi"
peter.greet # => "Salut"

The reason why that works is that singleton classes are the first place Ruby checks for a method definition when you invoke a method on an object. Only if no definition can be found there or in the prepended modules will Ruby fall back on the method definition in the class itself. You can read more about that in Henning's card How Ruby method lookup works or in the last section of this card.
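To see which definition Ruby picks, you can ask for the owner of the resolved method. The sketch below returns strings instead of printing them so the results are easy to compare; Method#super_method (available in current Ruby versions) shows what the lookup would find next.

```ruby
class User
  def greet
    "Hi"
  end
end

peter = User.new
alice = User.new

def peter.greet
  "Salut"
end

alice.greet                                          # => "Hi"
peter.greet                                          # => "Salut"
# For peter, the lookup stops at his singleton class.
peter.method(:greet).owner == peter.singleton_class  # => true
alice.method(:greet).owner                           # => User
# The definition in User is still the next one in the chain.
peter.method(:greet).super_method.owner              # => User
```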

The default definee

You should ask yourself which object a plain method definition refers to (i.e. on which object a method is defined when omitting an explicit identifier). The answer is that this depends on the current scope.
When using def <method> inside of a class definition, the definition target is the class itself (not its singleton class!).
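A small illustration of this rule, using a made-up Account class: although self is the class inside the class body, a plain def creates an instance method and not a singleton method.

```ruby
class Account
  # self is Account here, but def does not define
  # a singleton method on it.
  def balance
    0
  end
end

Account.instance_methods(false)  # => [:balance]
Account.singleton_methods        # => []
Account.new.balance              # => 0
Account.respond_to?(:balance)    # => false
```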

instance_eval and class_eval

instance_eval sets the definition target to the singleton class of its receiver. This means that using def <method> inside the block passed to peter.instance_eval will define the method on the singleton class of peter:

peter = User.new
peter.instance_eval do
  def age
    42
  end
end
peter.age # => 42
User.age # => NoMethodError

As you can see, instance_eval is used for declaring singleton methods and consequently, class methods:

User.instance_eval do
  def shout!
    "Hey!"
  end
end

User.shout! # => "Hey!"
peter = User.new
peter.shout! # => NoMethodError

class_eval on the other hand can be used to declare instance methods. This may seem completely counterintuitive.
The reason is that class_eval behaves as if you opened the class manually and then wrote code into it. The following two pieces of code do exactly the same thing:

class User
  def shout!
    "Hey!"
  end
end

class User
end

User.class_eval do
  def shout!
    "Hey!"
  end
end
User.shout! # => NoMethodError
peter = User.new
peter.shout! # => "Hey!"

As you can see, User has an instance method shout! which we can call on our object peter (instance of User), but not User itself.

It is also important to mention that class_eval may also be used equivalently to instance_eval in order to define singleton/class methods:

class User
end

User.class_eval do
  def self.shout!
    "Hey!"
  end
end

is equivalent to

User.instance_eval do
  def shout!
    "Hey!"
  end
end

Inside class_eval the self object refers to User which means you may also define methods on its singleton class as how you would do it in a regular class definition.
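The difference between the two is easiest to see side by side. In this sketch (Robot, build and beep are illustrative names), self is the same object in both blocks while a plain def lands in different places:

```ruby
class Robot
end

# self is identical inside both blocks ...
Robot.instance_eval { self }  # => Robot
Robot.class_eval { self }     # => Robot

# ... but the definition target differs.
Robot.instance_eval do
  def build
    "built"
  end
end

Robot.class_eval do
  def beep
    "beep"
  end
end

Robot.singleton_methods        # => [:build]
Robot.instance_methods(false)  # => [:beep]
Robot.build                    # => "built"
Robot.new.beep                 # => "beep"
```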

Modules

Modules are useful for adding namespaces to your code.
Another common usage is multiple inheritance as you can include multiple modules with predefined functionality into your class. The same thing is not possible using classes only. The Child < Parent pattern only allows one base class. But how are these modules added to the class that includes them?

One possibility might be to copy the content of a module to the class, similar to how includes work in C. However, this would mean that we cannot define any methods on the module object itself as all singleton methods would jump over to the including class. To illustrate that, let's have a look at the instance method included of the Module class. This method can be overwritten for module instances and is called whenever the module is included somewhere:

module Greet
  def self.included(base)
    puts "Greet is included into #{base}."
  end

  def greet
    puts "Hi!"
  end
end

class User
  include Greet
end

If Ruby only copied the module code into the including class, we could as well define this:

class User
  def self.included(base)
    puts "Greet is included into #{base}."
  end

  def greet
    puts "Hi!"
  end
end

On the one hand, this would define our method greet inside User which is our goal. But, unfortunately, this would also result in a class method included on User which is not what we want at all.

Instead, we would like to know when Greet is included and then print the name of the including class, completely independent from the including class. This should thus not bleed into User in any way. For that to work, include has the possibility to call included(self) on Greet which is an instance of the Module class.
If this does not seem logical to you, keep in mind that these snippets are equivalent:

module Greet
end

Greet = Module.new do
end

included is now a regular instance method of Module which means it may be called on Greet. The default implementation does nothing and needs to be overwritten for a concrete module instance. In the section 'Overriding method definitions in instances' we've talked about this. The same applies here. We need a singleton method on Greet to overwrite included. If you look at the code example above, this is exactly what we've done by defining def self.included(base) which is the same as def Greet.included(base).

If you try this out, it will work as expected: User gets a method greet and included is not bleeding into User, but is only called upon inclusion.
That means methods are not copied and pasted into the class/module that calls include (User in this case). There is a different kind of magic going on.
The truth is that modules are added to the inheritance chain through so-called IClasses which act as a docking point between classes and their superclasses. For the example above, this would look like this:

Image
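IClasses are not visible from Ruby code, but their effect on the chain is. The sketch below is a variation of the Greet example: instead of printing, the included hook records the including class in a hypothetical included_into list so that the result can be inspected.

```ruby
module Greet
  def self.included(base)
    # Remember who included us (illustrative bookkeeping only).
    (@included_into ||= []) << base
  end

  def self.included_into
    @included_into
  end

  def greet
    "Hi!"
  end
end

class User
  include Greet
end

# Greet sits between User and its superclass in the lookup chain ...
User.ancestors.first(3)  # => [User, Greet, Object]
# ... while the superclass as seen from Ruby is still Object.
User.superclass          # => Object
User.new.greet           # => "Hi!"
# The hook ran on Greet and nothing leaked into User.
Greet.included_into      # => [User]
User.singleton_methods   # => []
```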

If you included more modules after Greet, each of them would get its own IClass, inserted between the class and the IClass of the previously included module, until every module is included. The order in the inheritance chain will thus be the reverse of the inclusion order (the last included module comes first). This allows you to overwrite a method of a previously included module inside another module. Still, you may not overwrite methods from the class itself with included modules (unless you prepend them, see last section).

Consider this:

module A
  def foo
    puts "A"
  end
end

module B
  def foo
    puts "B"
  end
end

class C
  include A
  include B
end

If you run C.new.foo you will see "B" because module B is included last and will thus be the first module encountered in the inheritance chain. The chain roughly looks like this:

Image

If you swapped the inclusion of A and B, this would print "A".
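Both orders can be compared directly with Module#ancestors. In this sketch the methods return their letter instead of printing it, and a second class D (not part of the example above) includes the modules in swapped order.

```ruby
module A
  def foo
    "A"
  end
end

module B
  def foo
    "B"
  end
end

class C
  include A
  include B
end

class D
  include B
  include A
end

# The module included last comes first in the chain.
C.ancestors.first(3)           # => [C, B, A]
C.new.foo                      # => "B"
C.instance_method(:foo).owner  # => B

D.ancestors.first(3)           # => [D, A, B]
D.new.foo                      # => "A"
```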

extend

What do we do if we wish to add a class method by including a module? Say we have our User class that should get a class method foo from a module. As we've learnt, we cannot write def self.foo inside of the module, as that would define the method on the module object itself. Instead, there is a useful trick that is often used in vanilla Ruby: You may use extend. extend adds the contents of a module to the receiver's singleton class by including it as a module in the inheritance chain of the singleton class:

module A
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def foo
      puts "Foo!"
    end
  end
end

class User
  include A
end

User.foo # => "Foo!"
User.singleton_class.ancestors # => [#<Class:User>, A::ClassMethods, #<Class:Object>, #<Class:BasicObject>, Class, Module, Object, Kernel, BasicObject]

As you gain control over the including class/module inside of the included method, you may add your class methods at this point over a separate ClassMethods module inside of your module. If you are using Rails (or only ActiveSupport), you may also use ActiveSupport::Concern which facilitates this for you.
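extend is not limited to classes. As a small sketch with a made-up Loud module, extending a single object puts the module into the chain of that object's singleton class only:

```ruby
module Loud
  def shout
    "HEY!"
  end
end

class User
end

peter = User.new
peter.extend(Loud)

peter.shout                           # => "HEY!"
# Loud sits between peter's singleton class and User.
peter.singleton_class.ancestors[1]    # => Loud
peter.singleton_class.ancestors[2]    # => User
# Other instances and the class itself are unaffected.
User.new.respond_to?(:shout)          # => false
User.include?(Loud)                   # => false
```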

extend vs. include

Both can also be used to achieve the same result. As extend adds the module content to the inheritance chain of the receiver's singleton class, we may also do this ourselves using include:

module A
  def foo
    puts "Foo!"
  end
end

class User
  singleton_class.include(A)
end

User.foo # => "Foo!"
User.singleton_class.ancestors # => [#<Class:User>, A, #<Class:Object>, #<Class:BasicObject>, Class, Module, Object, Kernel, BasicObject]

The reason you might not want to do this in a practical API is the fact that the user doesn't care about distinguishing between classes and singleton classes. They might just want to include the module and it should run as expected.

Ruby's method lookup

After you've seen all components of the inheritance chain and how they work together, you might now be able to better understand Ruby's way of searching for methods. What Ruby actually does is simply follow the inheritance chain all the way up until the end is reached.
We know that only classes have methods which means on our way we must only look at classes, not objects.

Image

When we call peter.greet, the first point in the chain is the singleton class of peter (of which peter is an instance). The method isn't there, so we move up to the superclass of peter's singleton class: User. The method isn't there either, so by following super again, we have a look at the included module(s). The method is not defined inside of any module, so last but not least we arrive at User's superclass which is Person. The method can be found there, so we are done. If it were not available there, we would follow the chain further up, visiting Object and BasicObject.
But, wait... Did we just say that User's superclass is Person although super of User <RClass> refers to A <IClass>? Yes. In Ruby you would have that structure:

class User < Person
  include A
end

Remember that the diagram above only shows the internal representation of the inheritance chain in which the module gets placed between the two. In your Ruby code this still appears as a direct connection between User and Person while Ruby does a lot of other things under the hood.

If you now put the steps we've taken in chronological order, you get this:

  1. Singleton class
  2. Class
  3. Included modules
  4. Superclass(es)

Ruby 2.0 introduced prepended modules (Module#prepend) which allow you to patch methods in a class in a much more convenient way than using alias_method_chain. You can read more about this here.

As prepended modules should thus overwrite the existing method definition in a class, we need to check them before the second step, finally resulting in:

  1. Singleton class
  2. Prepended modules
  3. Class
  4. Included modules
  5. Superclass(es)
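The complete order can be read off the ancestors of an object's singleton class. The following sketch uses hypothetical modules Audited (prepended) and Polite (included) together with the User < Person hierarchy from above; the singleton method nickname only exists to give peter a singleton class with content.

```ruby
module Audited
  def greet
    "audited " + super
  end
end

module Polite
end

class Person
  def greet
    "Hello"
  end
end

class User < Person
  prepend Audited
  include Polite
end

peter = User.new

def peter.nickname
  "Pete"
end

# Index 0 is peter's singleton class, followed by the rest of the chain:
peter.singleton_class.ancestors[1, 4]
# => [Audited, User, Polite, Person]

# Audited#greet is found first; its super call continues up the chain
# until Person#greet is reached.
peter.greet  # => "audited Hello"
```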

There you go. This is the explanation behind Henning's card.

Dominic Beger
Last edit
Felix Eschey
License
Source code in this card is licensed under the MIT License.
Posted by Dominic Beger to makandra dev (2021-03-09 14:20)