In Ruby, (almost) everything is an Object. While this enables a lot of powerful features, the concept can be confusing for developers who come from more static languages such as Java or C#. This card should help you understand the basic concepts of Ruby's object model and how things behave.
Usage of objects in Ruby
When working with objects in Ruby, you might think of a "container" that holds metadata, variables and methods. Metadata describes things like the object's class or its object_id, which uniquely identifies it. From an external point of view, this model seems adequate. Let's say we have a User class and instantiate it:
class User
  def initialize(name)
    @name = name
  end

  def introduce
    puts "Hi, I am #{@name}!"
  end
end

dominic = User.new('Dominic')
dominic.object_id # => 180
dominic.introduce # prints "Hi, I am Dominic!"
dominic.methods # => [:introduce, :dup, :itself, :yield_self, :then, ...]
dominic.instance_variables # => [:@name]
As you can see, we can directly access metadata, methods and instance variables of our concrete object. While this may seem simple, Ruby does a lot of work under the hood to enable this.
Basic mapping of classes and objects
Ruby's objects internally map to a C structure that roughly looks like the following:
struct RObject {
  struct RBasic basic;
  struct st_table *iv_tbl;
};
As you can see, it holds two members. One is iv_tbl, which holds a pointer to a table of instance variables defined on that object. Think of st_table as a hash/map that holds key-value pairs, representing the name of a variable and its value, respectively.
The other member holds an RBasic structure. More on this one later.
Moving further on, there is a concept for classes that kind of looks like this:
struct RClass {
  struct RBasic basic;
  struct st_table *iv_tbl;
  struct st_table *m_tbl;
  VALUE super;
};
As we can see, we again have a member of type struct RBasic. We can also spot another pointer to a table holding instance variables. The super member refers to the superclass of this class. We will handle this one later.
Additionally, we now see a table named m_tbl. What does that mean? It stands for the collection of methods defined on a specific class. By "collection" we again mean a structure like a hash/map with key-value pairs. But why did we not have this declaration in the RObject structure? Here comes a first really important point:
Objects do not have methods, only classes do.
Remember this because it will be important to understand the upcoming concepts.
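This split can be observed from plain Ruby. The following is a minimal sketch, assuming a User class like the one above (introduce returns the string here instead of printing it, purely for illustration): the method is reported as owned by the class, while the instance variable is reported on the object.

```ruby
# Illustrative copy of the User class from above.
class User
  def initialize(name)
    @name = name
  end

  def introduce
    "Hi, I am #{@name}!"
  end
end

dominic = User.new('Dominic')

# Methods are stored on the class ...
User.instance_methods(false)          # => [:introduce]
dominic.method(:introduce).owner      # => User

# ... while instance variables are stored per object.
dominic.instance_variables            # => [:@name]
dominic.instance_variable_get(:@name) # => "Dominic"

# The object itself has no methods of its own (yet).
dominic.singleton_methods             # => []
```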
Is that all? Only two low-level concepts for the whole Ruby language? No, there are of course more structures involved. These are necessary because some types in Ruby need a predefined representation in memory. How would you otherwise represent a string or a float using a regular RObject? Unfortunately, instance variables won't help here. For this reason, there is, for example, also an RString structure which holds a char pointer and a corresponding length integer. The same principle applies to arrays with RArray or floats with RFloat. Most of these structures have a member of type RBasic.
What does the RBasic struct do?
struct RBasic {
  unsigned long flags;
  VALUE klass;
};
It holds two members:
- Some flags used to describe states (e.g. whether the object is frozen, garbage collection markings, ...) and an object type description (e.g. T_ARRAY).
- A pointer klass which points to the corresponding class of this object (the class the object is an instance of).
Keep in mind that the type VALUE is basically a typedef for unsigned long. Although it is also used for other purposes (and thus is not directly of type void*), you may just consider it a simple pointer for understanding this card.
Now that we know the internal representations, let's have another look at our previously defined object dominic. How would that be represented?
We would probably have an RObject with iv_tbl pointing to something like | name | <struct RString>* | where the string structure would contain a pointer to the first element of this char array: ['D', 'o', 'm', 'i', 'n', 'i', 'c']
Also, it would have an RBasic member with klass pointing to an RClass that represents our User class and whose m_tbl includes, among others, our declared introduce method.
This means that declaring a class Foo internally creates a Foo (RClass) which contains the methods defined using def <method>; end. Instances of Foo are then RObjects that hold a pointer to Foo (RClass) as their class reference (klass) and a pointer to the table of instance variables with the values for the concrete object (iv_tbl). Again, the object itself has no methods defined on it.
To make this a bit more clear, have a look at the following diagram:
Connections between classes happen over the inheritance chain. The super reference of a class points to its superclass (in Ruby code, you get it by calling superclass). Objects do not have a super reference as they have no superclass, just a class they are an instance of (klass).
However, as classes are also objects, they also have a klass reference to Class, which they are an instance of.
That is what you usually know from Ruby:
class X
  #...
end
is equivalent to
X = Class.new do
  #...
end
which means X is an instance of Class. When you call X.class you will get Class.
With an instance of X, however, you will get the following:
x = X.new
x.class # => X
x is then an RObject which holds a reference klass to X.
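These relationships can be checked interactively. The snippet below is a small sketch; the class Y and its method hello are made up for illustration and show that a class built with Class.new behaves like one declared with the class keyword.

```ruby
class X
end

x = X.new

x.class          # => X
X.class          # => Class   (X is itself an object, an instance of Class)
X.superclass     # => Object  (the Ruby-level view of the super reference)
Class.class      # => Class
Class.superclass # => Module

# Hypothetical class Y, created without the class keyword.
# Assigning the new class object to a constant gives it its name.
Y = Class.new do
  def hello
    "hello"
  end
end

Y.name      # => "Y"
Y.new.hello # => "hello"
```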
Mapping of singleton methods
When declaring singleton methods, you extend a specific object with methods that only this object knows. Let's look at an example with our User class:
peter = User.new('Peter')
alice = User.new('Alice')
Let's say we want Peter to respond to a method age
that returns his age. On the other hand, Alice should not respond to this method as you don't ask a woman for her age. We could do that by adding a method only to Peter:
def peter.age
  42
end
This will result in the following:
peter.age # => 42
alice.age # => NoMethodError (undefined method `age' for #<User:0x0000559ee6c4a1f0 @name="Alice">)
Alice's instance does not know about age; only Peter's instance does and responds with 42. This is why such a method is called a singleton method. Only one specific instance can respond to it, not all instances of User.
How does Ruby internally represent singleton methods? Where do they go? Obviously they cannot be part of the underlying User <RClass> itself or they would be available to every instance (RObject). But, on the other hand, the objects themselves do not hold any method information.
The answer is: There are separate classes created to hold this information. They are called singleton classes. Sometimes they are also referred to as eigenclasses or metaclasses. (Almost) every object in Ruby can have a singleton class, which is hidden from you in the background. Nevertheless, you are able to access it using the object.singleton_class method. When not empty (i.e. singleton methods are defined on an object), a singleton class finds its place between the object itself and its corresponding class. The class reference (klass) of the concrete RObject then points to this singleton class instead of the class directly. In order to not lose the chain, the superclass (super) of the singleton class points to the original RClass. Let's see this for the example above:
This makes sense. peter is an instance of his own singleton class that inherits the basic definitions from the User class. This means User is a superclass of peter's eigenclass. This way introduce is also available to peter. On the other hand, alice is a direct instance of the User class. This does not mean that she has no singleton class. But as it is empty, it's just discarded.
Careful: In peter's case, peter.class does not return his eigenclass now. It still returns User. This means the klass reference in the C struct is not the same as Ruby's class method. The class method skips singleton classes and moves on to the actual class. To access the singleton class directly, you have to call peter.singleton_class in Ruby.
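The relationship between an object, its singleton class and its actual class can be inspected from Ruby. This is a minimal sketch that rebuilds the peter/alice example with a reduced User class:

```ruby
# Reduced, illustrative version of the User class.
class User
  def initialize(name)
    @name = name
  end
end

peter = User.new('Peter')
alice = User.new('Alice')

def peter.age
  42
end

peter.class                                   # => User (the singleton class is skipped)
peter.singleton_class.superclass              # => User
peter.singleton_methods                       # => [:age]
peter.singleton_class.instance_methods(false) # => [:age]

# alice has no singleton methods, and her singleton class is not peter's.
alice.singleton_methods                         # => []
peter.singleton_class == alice.singleton_class  # => false
```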
As eigenclasses are also objects, they may again have an eigenclass (via the klass reference). This means you could create an endless chain of singleton classes. In practice, this is hardly ever something you would want to do.
Class methods
Singleton methods can also be used to create class methods which, in the way they work, roughly correspond to what "static methods" do in other programming languages. Classes themselves are also objects, namely instances of the class Class. This means we can also define methods on them directly (like we did on the object peter):
class User
  def User.foo
    puts "foo"
  end
end
As we are inside the scope of the class definition, we can avoid the repetition of User by writing self:
class User
  def self.foo
    puts "foo"
  end
end
This works because self is User.
It is important to note that Ruby has no separate concept of class or static methods. Every method you encounter is an instance method. It just depends on where it is defined: a "class method" is an instance method of the class's singleton class.
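A short sketch makes this visible (foo returns its string here instead of printing it, purely for illustration): the "class method" shows up as a singleton method of User, and its owner is User's singleton class.

```ruby
class User
  def self.foo
    "foo"
  end
end

User.foo                                        # => "foo"
User.singleton_methods                          # => [:foo]
User.method(:foo).owner == User.singleton_class # => true

# foo is not an instance method of User itself ...
User.instance_methods(false)                    # => []
# ... so instances of User do not respond to it.
User.new.respond_to?(:foo)                      # => false
```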
Overriding method definitions in instances
By the way, this also means that you can override methods for specific instances:
class User
  def greet
    puts "Hi"
  end
end

peter = User.new
alice = User.new

def peter.greet
  puts "Salut"
end

alice.greet # prints "Hi"
peter.greet # prints "Salut"
This works because the singleton class is the first place Ruby checks for a method definition when you invoke a method on an object. Only if no definition can be found there or in the prepended modules does Ruby fall back to the method definition in the class itself. You can read more about that in Henning's card How Ruby method lookup works or in the last section of this card.
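To see which definition wins, you can ask Ruby for the owner of a method. The sketch below assumes a User class whose greet returns a string; carol is an additional, made-up instance that uses super to reach the class's definition from her singleton method.

```ruby
class User
  def greet
    "Hi"
  end
end

peter = User.new
alice = User.new

def peter.greet
  "Salut"
end

# peter's greet is found in his singleton class, alice's in User.
peter.method(:greet).owner == peter.singleton_class # => true
alice.method(:greet).owner                          # => User

# The singleton class comes first in the chain, directly followed by User.
peter.singleton_class.ancestors.first(2) == [peter.singleton_class, User] # => true

# Hypothetical third instance: super continues the lookup in User.
carol = User.new
def carol.greet
  "#{super}, I am Carol"
end
carol.greet # => "Hi, I am Carol"
```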
The default definee
You should ask yourself which object a plain method definition refers to (i.e. on which object a method is defined when you omit an explicit receiver). The answer is that this depends on the current scope.
When using def <method> inside of a class definition, the definition target is the class itself (not its singleton class!).
instance_eval and class_eval
instance_eval sets the definition target to the singleton class of its receiver. This means that using def <method> inside the block passed to peter.instance_eval defines the method on the singleton class of peter:
peter = User.new
peter.instance_eval do
  def age
    42
  end
end
peter.age # => 42
User.age # => NoMethodError
As you can see, instance_eval is used for declaring singleton methods and, consequently, class methods:
User.instance_eval do
  def shout!
    "Hey!"
  end
end
User.shout! # => "Hey!"
peter = User.new
peter.shout! # => NoMethodError
class_eval, on the other hand, can be used to declare instance methods. This may seem completely counterintuitive.
The reason is that class_eval behaves as if you opened the class manually and then wrote code into it. The following two pieces of code do exactly the same thing:
class User
  def shout!
    "Hey!"
  end
end

class User
end
User.class_eval do
  def shout!
    "Hey!"
  end
end

User.shout! # => NoMethodError
peter = User.new
peter.shout! # => "Hey!"
As you can see, User has an instance method shout! which we can call on our object peter (an instance of User), but not on User itself.
It is also worth mentioning that class_eval can be used equivalently to instance_eval in order to define singleton/class methods:
class User
end
User.class_eval do
  def self.shout!
    "Hey!"
  end
end
is equivalent to
User.instance_eval do
  def shout!
    "Hey!"
  end
end
Inside class_eval, self refers to User, which means you may also define methods on its singleton class just as you would in a regular class definition.
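The difference between the two can be summarized in a small sketch: self is User in both blocks, but a plain def ends up in different places. The method name whisper is made up for illustration.

```ruby
class User
end

# self is the same in both blocks ...
User.instance_eval { self } # => User
User.class_eval { self }    # => User

# ... but the definition target differs.
User.instance_eval do
  def shout!
    "Hey!"
  end
end

User.class_eval do
  def whisper
    "psst"
  end
end

User.method(:shout!).owner == User.singleton_class # => true  (class method)
User.instance_method(:whisper).owner               # => User  (instance method)

User.respond_to?(:whisper)    # => false
User.new.respond_to?(:shout!) # => false
```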
Modules
Modules are useful for adding namespaces to your code.
Another common usage is a form of multiple inheritance, as you can include multiple modules with predefined functionality into your class. The same thing is not possible using classes only. The Child < Parent pattern only allows one base class. But how are these modules added to the class that includes them?
One possibility might be to copy the content of a module to the class, similar to how includes work in C. However, this would mean that we could not define any methods on the module object itself, as all singleton methods would jump over to the including class. To illustrate that, let's have a look at the instance method included of the Module class. This method can be overridden for module instances and is called whenever the module is included somewhere:
module Greet
  def self.included(base)
    puts "Greet is included into #{base}."
  end

  def greet
    puts "Hi!"
  end
end

class User
  include Greet
end
If Ruby only copied the module code into the including class, we could just as well define this:
class User
  def self.included(base)
    puts "Greet is included into #{base}."
  end

  def greet
    puts "Hi!"
  end
end
On the one hand, this would define our method greet inside User, which is our goal. But, unfortunately, this would also result in a class method included on User, which is not what we want at all.
Instead, we would like to know when Greet is included and then print the name of the including class, completely independent of the including class. This should thus not bleed into User in any way. For that to work, include calls included(self) on Greet, which is an instance of the Module class.
If this does not seem logical to you, keep in mind that these snippets are equivalent:
module Greet
end

Greet = Module.new do
end
included is a regular instance method of Module, which means it may be called on Greet. The default implementation does nothing and needs to be overridden for a concrete module instance. In the section 'Overriding method definitions in instances' we've talked about this. The same applies here. We need a singleton method on Greet to override included. If you look at the code example above, this is exactly what we've done by defining def self.included(base), which is the same as def Greet.included(base).
If you try this out, it will work as expected: User gets a method greet, and included does not bleed into User but is only called upon inclusion.
That means methods are not copied and pasted into the class/module that calls include (User in this case). There is a different kind of magic going on.
The truth is that modules are added to the inheritance chain through so-called IClasses which act as a docking point between classes and their superclasses. For the example above, this would look like this:
If you included more modules after Greet, each of them would get its own IClass that is inserted in front of Greet's IClass (and so on), until every module is included. The order in the inheritance chain is thus the reverse of the inclusion order (the last included module comes first). This allows you to override a method of a previously included module inside another module. Still, you cannot override methods from the class itself with included modules (unless you prepend them, see last section).
Consider this:
module A
  def foo
    puts "A"
  end
end

module B
  def foo
    puts "B"
  end
end

class C
  include A
  include B
end
If you run C.new.foo you will see "B" because module B is included last and will thus be the first module encountered in the inheritance chain. The chain roughly looks like this:
If you swapped the inclusion of A and B, this would print "A".
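You can make this chain visible with ancestors. The following sketch repeats the A/B/C example, with foo returning its string instead of printing it so the result can be inspected:

```ruby
module A
  def foo
    "A"
  end
end

module B
  def foo
    "B"
  end
end

class C
  include A
  include B
end

# The module included last comes first after the class itself.
C.ancestors.take(3)           # => [C, B, A]
C.new.foo                     # => "B"
C.instance_method(:foo).owner # => B

# At the Ruby level, the superclass is unaffected by the included modules.
C.superclass                  # => Object
```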
extend
What do we do if we wish to add a class method by including a module? Say our User class should get a class method foo from a module. As we've learnt, we cannot simply write def self.foo inside of the module, since singleton methods of the module are not passed on to the including class. Instead, there is a useful trick that is often used in vanilla Ruby: You may use extend. extend adds the contents of a module to the receiver's singleton class by including it as a module in the inheritance chain of the singleton class:
module A
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def foo
      puts "Foo!"
    end
  end
end

class User
  include A
end

User.foo # prints "Foo!"
User.singleton_class.ancestors # => [#<Class:User>, A::ClassMethods, #<Class:Object>, #<Class:BasicObject>, Class, Module, Object, Kernel, BasicObject]
As you gain control over the including class/module inside of the included method, you may add your class methods at this point via a separate ClassMethods module inside of your module. If you are using Rails (or only ActiveSupport), you may also use ActiveSupport::Concern which facilitates this for you.
extend vs. include
Both may also be used equivalently. As extend adds the module to the inheritance chain of the receiver's singleton class, we may also do this ourselves using include:
module A
  def foo
    puts "Foo!"
  end
end

class User
  singleton_class.include(A)
end

User.foo # prints "Foo!"
User.singleton_class.ancestors # => [#<Class:User>, A, #<Class:Object>, #<Class:BasicObject>, Class, Module, Object, Kernel, BasicObject]
The reason you might not want to do this in a practical API is the fact that the user doesn't care about distinguishing between classes and singleton classes. They might just want to include the module and it should run as expected.
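For comparison, here is a minimal sketch of calling extend directly, without the included hook. It uses the same module A (foo returns its string for easier inspection) and also shows that extend works on any object, not only on classes:

```ruby
module A
  def foo
    "Foo!"
  end
end

class User
  extend A
end

User.foo                         # => "Foo!"
User.singleton_class.include?(A) # => true  (A sits in the singleton class's chain)
User.include?(A)                 # => false (instances of User are unaffected)

# extend on a single instance adds A to that instance's singleton class only.
peter = User.new
peter.extend(A)
peter.foo                        # => "Foo!"
User.new.respond_to?(:foo)       # => false
```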
Ruby's method lookup
Now that you've seen all components of the inheritance chain and how they work together, you should be able to better understand Ruby's way of searching for methods. What Ruby actually does is simply follow the inheritance chain all the way up until the end is reached.
We know that only classes have methods, which means that on our way we only have to look at classes, not objects.
When we call peter.greet, the first point in the chain is the singleton class of peter (of which peter is an instance). The method isn't there, so we move up to the superclass of peter's singleton class: User. The method isn't there either, so by following super again, we have a look at the included module(s). The method is not defined inside of any module, so last but not least we arrive at User's superclass, which is Person. The method can be found there, so we are done. If it weren't available there, we would follow the chain further up, visiting Object and BasicObject.
But, wait... Did we just say that User's superclass is Person although super of User <RClass> refers to A <IClass>? Yes. In Ruby you would have that structure:
class User < Person
  include A
end
Remember that the diagram above only shows the internal representation of the inheritance chain, in which the module gets placed between the two. In your Ruby code this still appears as a direct connection between User and Person, while Ruby does a lot of other things under the hood.
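Both views are available from Ruby. In this sketch (with empty, illustrative definitions of A and Person), superclass reports the direct connection, while ancestors reveals the module sitting in between:

```ruby
module A
end

class Person
end

class User < Person
  include A
end

User.superclass        # => Person
User.ancestors.take(3) # => [User, A, Person]
```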
If you now put the steps we've taken in chronological order, you get this:
- Singleton class
- Class
- Included modules
- Superclass(es)
Ruby 2.0 introduced prepended modules (Module#prepend), which allow you to patch methods in a class in a much more convenient way than using alias_method_chain (which Rails 5 deprecated)
. You can read more about this
here
.
As prepended modules should thus overwrite the existing method definition in a class, we need to check them before the second step, finally resulting in:
- Singleton class
- Prepended modules
- Class
- Included modules
- Superclass(es)
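The complete order can be traced with a chain of super calls. The following is a sketch with made-up module names (Prepended, Included); every definition of greet adds its own label and then passes control to the next one in the chain.

```ruby
module Prepended
  def greet
    "prepended > #{super}"
  end
end

module Included
  def greet
    "included > #{super}"
  end
end

class Person
  def greet
    "person"
  end
end

class User < Person
  prepend Prepended
  include Included

  def greet
    "class > #{super}"
  end
end

peter = User.new

def peter.greet
  "singleton > #{super}"
end

peter.greet
# => "singleton > prepended > class > included > person"

User.ancestors.take(4) # => [Prepended, User, Included, Person]
```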
There you go. This is the explanation behind Henning's card.