Advanced Python: Dot operator
Once again, I will write about something seemingly trivial: the “dot operator”. Most of you have already used this operator many times without knowing or questioning what happens behind the scenes. And compared to metaclasses, the topic of my previous post, this one is a bit more usable for daily tasks. Just kidding: you use it practically every time you write Python for anything more than a “hello world”.
So what is a “dot operator”? Here is an example:
```python
hello = 'Hello world!'
print(hello.upper())
# HELLO WORLD!
```
Well, this is surely a “hello world” example, although I can hardly imagine anyone starting to teach you Python exactly like this. Anyway, the “dot operator” is the dot between hello and upper() in hello.upper(). Let’s look at a more verbose example:
```python
class Person:
    num_of_persons = 0

    def __init__(self, name):
        self.name = name

    def shout(self):
        print(f"Hey! I'm {self.name}")


p = Person('John')
p.shout()
# Hey! I'm John
p.num_of_persons
# 0
p.name
# 'John'
```
There are a few places where you use the “dot operator”. To make it easier to see the bigger picture, let’s summarize the way you use it in two cases:
- use it to access attributes of an object or class,
- use it to access functions defined in the class definition.
Obviously, we have all of this in our example, and this seems intuitive and as expected. But there is more to this than meets the eye! Take a closer look at this example:
```python
p.shout
# <bound method Person.shout of <__main__.Person object at 0x1037d3a60>>
id(p.shout)
# 4363645248
Person.shout
# <function __main__.Person.shout(self)>
id(Person.shout)
# 4364388816
```
Somehow, p.shout is not referencing the same function as Person.shout, although it should. At least, you would expect it to, right? And p.shout is not even a function! Let’s go over the next example before we start discussing what is happening.
```python
class Person:
    num_of_persons = 0

    def __init__(self, name):
        self.name = name

    def shout(self):
        print(f"Hey! I'm {self.name}.")


p = Person('John')
vars(p)
# {'name': 'John'}


def shout_v2(self):
    print("Hey, what's up?")


p.shout_v2 = shout_v2
vars(p)
# {'name': 'John', 'shout_v2': <function __main__.shout_v2(self)>}
p.shout()
# Hey! I'm John.
p.shout_v2()
# TypeError: shout_v2() missing 1 required positional argument: 'self'
```
For those unaware of the vars function: vars(p) returns p.__dict__, the dictionary that holds the attributes of an instance. If you run vars(Person), you will get a somewhat different response (a read-only mappingproxy of the class namespace), but you will get the picture. It contains both class attributes with their values and the names bound to the functions defined in the class body, plus a few entries Python adds on its own, such as __module__. An instance of a class and the class object itself are different objects, so vars returns different things for the two.
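A quick way to see the difference is to compare what lives in each dictionary. This is a small sketch reusing the Person class from above (redefined so the snippet runs on its own); since the exact set of extra dunder keys in a class namespace can vary between Python versions, it only checks the entries we care about.

```python
class Person:
    num_of_persons = 0

    def __init__(self, name):
        self.name = name

    def shout(self):
        print(f"Hey! I'm {self.name}.")


p = Person('John')

# The instance dictionary holds only what __init__ assigned.
print(vars(p))                      # {'name': 'John'}

# The class namespace is a read-only mappingproxy.
class_ns = vars(Person)
print(type(class_ns).__name__)      # mappingproxy

# Class attributes and functions live in the class, not in the instance.
print('num_of_persons' in class_ns, 'shout' in class_ns)   # True True
print('shout' in vars(p))                                  # False
```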
Now, it is perfectly valid to attach a function to an object after the object is created. This is what the line p.shout_v2 = shout_v2 does, and it adds another key-value pair to the instance dictionary. Seemingly everything is fine, and we should be able to call shout_v2 as if it were defined in the class body. But alas! Something is truly wrong: we cannot call it the same way we called the shout method.
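The function we attached is stored and handed back exactly as it is: nothing binds it to p, so self never gets filled in. This sketch (classes redefined so it runs standalone; shout_v2 returns its message instead of printing it, just so the result is easy to check) confirms that and shows that passing the instance explicitly makes the call work.

```python
class Person:
    def __init__(self, name):
        self.name = name


def shout_v2(self):
    return "Hey, what's up?"


p = Person('John')
p.shout_v2 = shout_v2

# The instance attribute is the very same plain function object.
print(p.shout_v2 is shout_v2)       # True
print(type(p.shout_v2).__name__)    # function

# Nothing was bound, so we have to pass the instance ourselves.
print(p.shout_v2(p))                # Hey, what's up?
```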
Astute readers should have noticed by now how carefully I use the terms function and method. After all, Python prints them differently as well; take a look at the previous examples. From the perspective of the object p, shout is a method and shout_v2 is a function. From the perspective of the Person class, shout is a function, and shout_v2 doesn’t exist at all: it is defined only in the object’s dictionary (namespace). So if you are really going to rely on object-oriented paradigms and mechanisms like encapsulation, inheritance, abstraction, and polymorphism, you will not define functions on objects like p. You will make sure you define them in the class definition (body).
So why are these two different, and why do we get the error? The short answer is: because of how the “dot operator” works. The longer answer is that a mechanism behind the scenes does the (attribute) name resolution for you, and this mechanism consists of the __getattribute__ and __getattr__ dunder methods.
Getting the attributes
At first, this will probably sound unintuitive and unnecessarily complicated, but bear with me. Essentially, two scenarios can happen when you try to access an attribute of an object in Python: either the attribute exists or it does not. In both cases __getattribute__ is called; in other words, it is always called. This method either:
- returns the computed attribute value,
- explicitly calls __getattr__, or
- raises AttributeError, in which case __getattr__ is called automatically (if the class defines it).
If you want to intercept the mechanism that resolves attribute names, this is the place to hijack. You just have to be careful, because it is really easy to end up in an infinite loop or to mess up the whole mechanism of name resolution, especially in the scenario of object-oriented inheritance. It is not as simple as it may appear.
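A common way to end up in that infinite loop is to read one of the object's own attributes with self.something inside __getattribute__, because that access calls __getattribute__ again. The sketch below (AccessCounter and its _accesses attribute are hypothetical, made up for this illustration) counts attribute lookups and avoids re-entering itself by going through object.__getattribute__ and object.__setattr__ directly.

```python
class AccessCounter:
    def __init__(self):
        # object.__setattr__ writes straight into the instance dictionary.
        object.__setattr__(self, '_accesses', 0)
        self.value = 42

    def __getattribute__(self, name):
        if name != '_accesses':
            # Use the base implementation to read and write the counter;
            # a plain self._accesses here would re-enter this very method.
            count = object.__getattribute__(self, '_accesses')
            object.__setattr__(self, '_accesses', count + 1)
        # Delegate the actual lookup to the default implementation.
        return object.__getattribute__(self, name)


c = AccessCounter()
c.value
c.value
print(c._accesses)   # 2
```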
If you only want to handle cases where the attribute is missing from the object, you can implement the __getattr__ method directly. It gets called when __getattribute__ fails to find the attribute. If __getattr__ cannot find or deal with the missing attribute either, it should raise AttributeError as well. Here is how you can play around with these:
```python
class Person:
    num_of_persons = 0

    def __init__(self, name):
        self.name = name

    def shout(self):
        print(f"Hey! I'm {self.name}.")

    def __getattribute__(self, name):
        print(f'getting the attribute name: {name}')
        # Delegate the actual lookup to the default implementation.
        return super().__getattribute__(name)

    def __getattr__(self, name):
        # Called only after __getattribute__ raised AttributeError.
        print(f"this attribute doesn't exist: {name}")
        raise AttributeError(name)


p = Person('John')
p.name
# getting the attribute name: name
# 'John'
p.name1
# getting the attribute name: name1
# this attribute doesn't exist: name1
#
# ... exception stack trace
# AttributeError: name1
```
It is very important to call super().__getattribute__(...) in your implementation of __getattribute__ because, as I wrote earlier, there is a lot going on in Python’s default implementation. And this is exactly where the “dot operator” gets its magic from. Well, at least half of the magic. The other half lies in how a class object is created after the class definition is interpreted.
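When you only need to deal with missing attributes, defining __getattr__ alone is usually enough and much safer, because it is consulted only after the normal lookup has failed. A minimal sketch, assuming a hypothetical Config class that falls back to a dictionary of defaults:

```python
class Config:
    # Hypothetical defaults, used only when the normal lookup fails.
    _defaults = {'debug': False, 'retries': 3}

    def __init__(self, **overrides):
        self.__dict__.update(overrides)

    def __getattr__(self, name):
        # Only called after __getattribute__ raised AttributeError.
        try:
            return self._defaults[name]
        except KeyError:
            raise AttributeError(name) from None


cfg = Config(retries=5)
print(cfg.retries)   # 5: found in the instance dictionary, __getattr__ not called
print(cfg.debug)     # False: missing, so __getattr__ supplied the default
```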
Class functions
The term I use here is deliberate. A class contains only functions, and we saw this in one of the first examples:
```python
p.shout
# <bound method Person.shout of <__main__.Person object at 0x1037d3a60>>
Person.shout
# <function __main__.Person.shout(self)>
```
When looked at from the object’s perspective, these are called methods. The process of turning a class function into a method of an object is called binding, and the result is what you see in the previous example: a bound method. What makes it bound, and to what? Well, once you have an instance of a class and start calling its methods, you are, in essence, passing the object reference to each of them. Remember the self argument? So, how does this happen, and who does it?
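The bound method keeps both halves of that pairing as attributes: __func__ is the original function from the class, and __self__ is the instance it was bound to. This small sketch (Person redefined so it runs standalone) also hints at why id(p.shout) differed from id(Person.shout) earlier: every attribute access builds a fresh bound-method object around the same function.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def shout(self):
        return f"Hey! I'm {self.name}."


p = Person('John')
m = p.shout

print(m.__func__ is Person.shout)   # True: the same function from the class
print(m.__self__ is p)              # True: the instance that becomes self

# Each attribute access builds a new bound method object...
print(p.shout is p.shout)           # False
# ...but they compare equal because they wrap the same function and object.
print(p.shout == p.shout)           # True
```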
Well, the first part happens when the class body is being interpreted. Quite a few things happen in this process: a class namespace is created, attribute values are added to it, and (class) functions are defined and bound to their names. Conceptually, you can think of each of these functions as being wrapped in an object called a descriptor. (Strictly speaking, nothing extra is wrapped around them: every Python function is an instance of the built-in function class, and that class already implements the descriptor protocol.) This descriptor behavior is what enables the change in identity and behavior of class functions that we saw previously. I’ll make sure to write a separate blog post about descriptors, but for now, know that a descriptor is an instance of a class that implements a predefined set of dunder methods. Such a set is also called a protocol: once these methods are implemented, objects of this class are said to follow that protocol and therefore behave in the expected way. There is a difference between data and non-data descriptors. The former implement __set__ and/or __delete__ (usually alongside __get__), while the latter implement only the __get__ method. Anyway, each function in a class ends up acting as a so-called non-data descriptor.
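You can check which protocol an object follows by looking at its type. In this sketch, is_descriptor and is_data_descriptor are hypothetical helpers that mirror the definitions above; the plain function stored in the class namespace turns out to be a non-data descriptor, while a property, a built-in used here only for comparison, is a data descriptor.

```python
def is_descriptor(obj):
    # Any object whose type defines __get__ is a descriptor.
    return hasattr(type(obj), '__get__')


def is_data_descriptor(obj):
    # Defining __set__ or __delete__ makes it a data descriptor.
    tp = type(obj)
    return hasattr(tp, '__set__') or hasattr(tp, '__delete__')


class Person:
    def shout(self):
        print('Hey!')

    @property
    def loud(self):
        return True


shout_fn = vars(Person)['shout']   # raw objects from the class namespace
loud_prop = vars(Person)['loud']

print(is_descriptor(shout_fn), is_data_descriptor(shout_fn))     # True False
print(is_descriptor(loud_prop), is_data_descriptor(loud_prop))   # True True
```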
Once you start an attribute lookup with the “dot operator”, the __getattribute__ method is called, and the whole name resolution process begins. It stops as soon as resolution succeeds, and it goes roughly like this:
- if the class has a data descriptor with the desired name, return the result of its __get__ (class level), or
- return the instance attribute with the desired name (instance level), or
- if the class has a non-data descriptor with the desired name, return the result of its __get__ (class level), or
- return the class attribute with the desired name (class level), or
- raise AttributeError, which in turn triggers the __getattr__ method if the class defines one.
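The precedence in that list can be observed directly. The sketch below uses two hypothetical descriptor classes, DataDesc and NonDataDesc, and writes same-named entries straight into the instance dictionary through vars() to see which one wins:

```python
class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return 'non-data descriptor'


class DataDesc:
    def __get__(self, obj, objtype=None):
        return 'data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class Demo:
    a = DataDesc()
    b = NonDataDesc()
    c = 'class attribute'


d = Demo()
# Write straight into the instance dictionary, bypassing DataDesc.__set__.
vars(d).update(a='instance a', b='instance b')

print(d.a)          # data descriptor      (case 1 beats the instance dictionary)
print(d.b)          # instance b           (case 2 beats a non-data descriptor)
print(Demo().b)     # non-data descriptor  (case 3, no instance entry)
print(d.c)          # class attribute      (case 4, nothing else matched)
```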
My initial idea was to leave you with a reference to the official documentation, which includes a Python mockup of how this mechanism is implemented for learning purposes, but I have decided to help you out with that part as well. However, I highly advise you to go and read the whole page of the official documentation (the Descriptor HowTo Guide).
So, in the next code snippet, I’ll put some of the descriptions in the comments, so it is easier to read and understand the code. Here it is:
```python
def find_name_in_mro(cls, name, default):
    "Emulate _PyType_Lookup() in Objects/typeobject.c"
    # Walk the class and its parents in Method Resolution Order and
    # return the first matching entry found in a class namespace.
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return default


def object_getattribute(obj, name):
    "Emulate PyObject_GenericGetAttr() in Objects/object.c"
    # A unique sentinel object meaning "not found".
    null = object()

    # obj is an object instantiated from our custom class. Here we get
    # the class it was instantiated from.
    objtype = type(obj)

    # name is the name of a class function, an object's method, or any
    # attribute. Here we look it up in the class and keep a reference to
    # it. MRO is short for Method Resolution Order, and it has to do with
    # class inheritance. It is not that important at this point; let's
    # say this mechanism finds the name across all parent classes.
    cls_var = find_name_in_mro(objtype, name, null)

    # Check whether this class attribute is an object whose type
    # implements __get__. If it does, it is a descriptor (data or
    # non-data). This is important for the next steps.
    descr_get = getattr(type(cls_var), '__get__', null)

    # If the class attribute is a descriptor that also implements __set__
    # or __delete__, it is a data descriptor: call its __get__ and return
    # the result. Otherwise, go on to the next if block.
    if descr_get is not null:
        if (hasattr(type(cls_var), '__set__')
                or hasattr(type(cls_var), '__delete__')):
            return descr_get(cls_var, obj, objtype)     # data descriptor

    # If the name doesn't reference a data descriptor, check whether it is
    # in the object's dictionary, and if so, return its value.
    if hasattr(obj, '__dict__') and name in vars(obj):
        return vars(obj)[name]                          # instance variable

    # If the name is not in the object's dictionary, check whether it
    # references a non-data descriptor and return the result of its __get__.
    if descr_get is not null:
        return descr_get(cls_var, obj, objtype)         # non-data descriptor

    # If none of the above matched, check whether the name references a
    # plain class attribute and return its value.
    if cls_var is not null:
        return cls_var                                  # class variable

    # Name resolution was unsuccessful: raise AttributeError. The caller
    # then invokes __getattr__ if the class defines one.
    raise AttributeError(name)
```
Keep in mind that this Python implementation exists only to document and describe the logic implemented in the __getattribute__ method; in reality, it is implemented in C. Just by looking at it, you can imagine that it is better not to play around with re-implementing the whole thing. The best approach is to do part of the resolution yourself and then fall back on the CPython implementation with return super().__getattribute__(name), as shown in the example above.
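As an illustration of doing part of the resolution yourself and delegating the rest, here is a sketch with a hypothetical _aliases mapping: a few alternative names are translated before the lookup, and everything else goes through super().__getattribute__ unchanged, so descriptors, the instance dictionary, and class attributes keep working as described above.

```python
class Person:
    # Hypothetical alias table: alternative name -> real attribute name.
    _aliases = {'full_name': 'name', 'yell': 'shout'}

    def __init__(self, name):
        self.name = name

    def shout(self):
        return f"Hey! I'm {self.name}."

    def __getattribute__(self, name):
        # Read the alias table through the default machinery; writing
        # self._aliases here would recurse into this method forever.
        aliases = super().__getattribute__('_aliases')
        # Translate the name if it is an alias, then let CPython do the rest.
        return super().__getattribute__(aliases.get(name, name))


p = Person('John')
print(p.full_name)   # John
print(p.yell())      # Hey! I'm John.
```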
The important thing here is that each class function is an object, an instance of the function class, and that class makes it a non-data descriptor: it has the __get__ dunder method defined. This dunder method returns a new callable (think of it as a new function) whose first argument is bound to the object on which we used the “dot operator”. I said to think of it as a new function because it is callable; in essence, it is an instance of another class called MethodType. Check it out:
```python
type(p.shout)
# getting the attribute name: shout
# method
type(Person.shout)
# function
```
One interesting thing is certainly this function class. It is exactly the object that defines the __get__ method. When we access shout through the instance with the “dot operator”, __getattribute__ goes through the list above and stops at the third case (non-data descriptor), calling that __get__ method. This __get__ method takes the object’s reference and creates a MethodType instance that holds references to both the function and the object.
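You can perform that step by hand. This sketch (Person redefined so it runs on its own) fetches the raw function from the class namespace, calls its __get__ with the instance, and gets back the same kind of bound method that the dot operator produces:

```python
import types


class Person:
    def __init__(self, name):
        self.name = name

    def shout(self):
        return f"Hey! I'm {self.name}."


p = Person('John')
raw = vars(Person)['shout']        # the plain function object

bound = raw.__get__(p, Person)     # what the third case of the lookup does
print(isinstance(bound, types.MethodType))   # True
print(bound())                               # Hey! I'm John.
print(bound == p.shout)                      # True
```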
Here is the official documentation mockup:
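```python
from types import MethodType


class Function:
    ...

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the function itself.
            return self
        # Accessed on an instance: bind the instance as the first argument.
        return MethodType(self, obj)
```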
Disregard the difference in class name. I have been using function instead of Function to make it easier to grasp, but I’ll use the Function name from now on so it follows the explanation in the official documentation.
Anyway, just by looking at this mockup, it may already be clear how the function class fits the picture, but let me add a couple of missing lines of code, which will probably make things even clearer. I’ll add two more methods to this class, namely __init__ and __call__:
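```python
from types import MethodType


class Function:
    # ... (other parts omitted)

    def __init__(self, fun, *args, **kwargs):
        # ... (other parts omitted)
        self.fun = fun                   # keep a reference to the original function

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                  # accessed on the class: no binding
        return MethodType(self, obj)     # accessed on an instance: bind it

    def __call__(self, *args, **kwargs):
        # ... (other parts omitted)
        return self.fun(*args, **kwargs)  # behave like the wrapped function
```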
Why did I add these methods? Well, now you can easily imagine how the Function object plays its role in method binding. The Function object stores the original function as an attribute. It is also callable, which means we can invoke it like a function, and in that case it behaves just like the function it wraps. Remember, everything in Python is an object, even functions. MethodType, in turn, ‘wraps’ the Function object together with a reference to the object on which we are calling the method (shout, in our case).
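To see that a __get__ returning a MethodType is really all it takes, here is a runnable variant of the Function mockup used as a decorator. The class name Function and its fun attribute come from the mockup above, and types.MethodType is the real built-in bound-method type; this is a toy for experimentation, not how CPython actually implements functions.

```python
import types


class Function:
    """Toy stand-in for the built-in function type (illustration only)."""

    def __init__(self, fun):
        self.fun = fun

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the wrapper itself.
            return self
        # Accessed on an instance: bind the instance as the first argument.
        return types.MethodType(self, obj)

    def __call__(self, *args, **kwargs):
        # Calling the wrapper just calls the wrapped function.
        return self.fun(*args, **kwargs)


class Person:
    def __init__(self, name):
        self.name = name

    @Function
    def shout(self):
        return f"Hey! I'm {self.name}."


p = Person('John')
print(type(Person.shout).__name__)   # Function
print(type(p.shout).__name__)        # method
print(p.shout())                     # Hey! I'm John.
print(Person.shout(p))               # Hey! I'm John.
```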
How does MethodType do this? Well, it keeps these references and implements the callable protocol (__call__). Here is the official documentation mockup for the MethodType class:
```python
class MethodType:
    def __init__(self, func, obj):
        self.__func__ = func
        self.__self__ = obj

    def __call__(self, *args, **kwargs):
        func = self.__func__
        obj = self.__self__
        # Call the function with the bound object as the first argument.
        return func(obj, *args, **kwargs)
```
Again, for brevity’s sake: func ends up referencing our original class function (shout), obj references the instance (p), and then the positional and keyword arguments are passed along. The self parameter in the shout definition ends up referencing this obj, which is p in our example.
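This also suggests how the shout_v2 example from earlier could be repaired: instead of storing the plain function on the instance, store a bound method created with the real types.MethodType, so that p is supplied as self automatically. A sketch, with shout_v2 slightly modified to use self.name so the binding is visible (whether you should patch instances like this at all is a separate question, as discussed above):

```python
import types


class Person:
    def __init__(self, name):
        self.name = name


def shout_v2(self):
    return f"Hey, what's up? I'm {self.name}."


p = Person('John')

# Bind by hand: MethodType(function, instance) plays the role of __get__.
p.shout_v2 = types.MethodType(shout_v2, p)

print(p.shout_v2())                                              # Hey, what's up? I'm John.
print(p.shout_v2.__func__ is shout_v2, p.shout_v2.__self__ is p)  # True True
```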
In the end, it should be clear why we distinguish between functions and methods and how functions get bound once they are accessed through objects with the “dot operator”. If you think about it, we could just as well invoke class functions in the following way:
```python
class Person:
    num_of_persons = 0

    def __init__(self, name):
        self.name = name

    def shout(self):
        print(f"Hey! I'm {self.name}.")


p = Person('John')
Person.shout(p)
# Hey! I'm John.
```
Yet, this really is not the advised way and is just plain ugly. Usually, you will not have to do this in your code.
So, before I conclude, I want to go over a couple of examples of attribute resolution just to make this easier to grasp. Let’s use the previous example and figure out how the dot operator works.
```python
p.name
"""
1. __getattribute__ is invoked with p and "name" as arguments.
2. objtype is Person.
3. cls_var is null because neither Person nor its parents have "name"
   in their dictionary (namespace), so descr_get is null as well.
4. Since there is no descr_get at all, we skip the first if block.
5. "name" does exist in the object's dictionary, so we return its value.
"""
```
```python
p.shout()
"""
Before we go into the name resolution steps, keep in mind that
Person.shout is an instance of the function class, which is callable,
so you can invoke it with Person.shout(...). From a developer's
perspective, it is just a function defined in the class body, but in
the background it is also a non-data descriptor.

1. __getattribute__ is invoked with p and "shout" as arguments.
2. objtype is Person.
3. Person.shout is a non-data descriptor: it has the __get__ method
   implemented, and descr_get references it.
4. It is a non-data descriptor, so the first if block is skipped.
5. "shout" doesn't exist in the object's dictionary because it is part
   of the class definition, so the second if block is skipped.
6. "shout" is a non-data descriptor, so the third if block calls its
   __get__ method and returns the result.

So, accessing p.shout does not give us the __get__ method itself but
what it returns: a MethodType object. That is why p.shout() works: what
ends up being called is an instance of the MethodType class. This
object is essentially a wrapper around the `Function` object, and it
holds references to the `Function` object and to our object p. In the
end, when you invoke p.shout(), what ends up being invoked is the
`Function` object with p as the first positional argument, followed by
any arguments you passed.
"""
```
```python
Person.shout(p)
"""
As in the previous example, keep in mind that Person.shout is an
instance of the function class, which is callable. The following steps,
however, are different. Check it out.

1. Person is a class, so the lookup is handled by type.__getattribute__
   (the implementation provided by Person's metaclass, type) rather than
   by the object_getattribute mockup shown above. This relates to the
   mechanism described in my post on metaclasses.
2. The metaclass type has no "shout" attribute, so nothing is found at
   that level.
3. "shout" is then looked up in the namespace of Person (and its parent
   classes), where the function is found. It is a non-data descriptor.
4. Its __get__ method is called with None as the instance and Person as
   the owner class. Since obj is None, __get__ returns the function
   itself (see the Function mockup), so the plain "shout" function is
   returned.

When Person.shout(p) is invoked, what actually gets invoked is the
`Function` object itself, which is callable and calls the original
function defined in the class body. This way, the original function
gets called with all positional and keyword arguments, p being the
first one.
"""
```
And this concludes what I wanted to write in this article. Descriptors are next, since they are such an important concept.
Until then, best of luck paving your way into the land of Python.