Is Python pass by value or pass by reference?
This question comes up in nearly every programming language. Once you use your language of choice for more than simple programs, you have to deal with more intermediate and advanced concepts. One of these is how function arguments are passed: by value or by reference?
Why is this not that simple to grasp? Well, you have to know what happens in memory while your program is running, and you need to know the intricacies of the language's syntax and semantics. And if you have experience with lower-level languages like C or C++, you know that similar questions bring in a whole new level of advanced concepts: pointers, references, the differences between them, and best practices for using them. It is not for the faint-hearted, I can tell you that.
Let’s start gently.
As a beginner, you may decide to define a function that computes the circumference of a circle from its radius. You might define it like this:
def circumference(radius):
    # let's keep it simple and use this value for PI
    return 2 * radius * 3.14
Calling this function by passing the value for radius is easy and intuitive. You pass the value, and you get the result. What else is there to it?
Now, let's say you want to count how many times you have called the function by introducing a new variable, count, and passing it as the second argument of our function.
def circumference(radius, count):
    count += 1
    return 2 * radius * 3.14
What would be the result of the function call?
n_calls = 0
print(circumference(1, n_calls))
# 6.28
print(n_calls)
# 0
The circumference is correct, but what happened with n_calls? Why is it still 0, and not 1? Here the “pass by value or pass by reference?” question arises.
You probably already know that Python has an interesting built-in function called id that returns the identity of an object; in CPython, that is the object's memory address. For example, we can redefine our function to print the memory location of the object that count refers to:
def circumference(radius, count):
    print('Old count memory location:', id(count))
    count = count + 1
    print('New count memory location:', id(count))
    return 2 * radius * 3.14
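As a quick aside on what id reports, here is a minimal sketch that is separate from the circumference example. The names first, second, and third are made up for illustration. Two names bound to the same object report the same id, while a copy with equal contents reports a different one:

```python
first = [1, 2, 3]
second = first        # no copy: both names refer to one list object
third = list(first)   # a new list object with equal contents

same_object = id(first) == id(second)
copy_is_distinct = id(first) != id(third)

print(same_object)       # True
print(copy_is_distinct)  # True
print(first is second)   # True: `is` compares identities
print(first == third)    # True: `==` compares contents
```

The actual numbers returned by id differ from run to run, so only the comparisons are meaningful.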
Now, we can print the memory location of our variable n_calls before we pass it to the function call, and the function will print the memory location of the count variable inside of it.
Before you try this one out, let's do a thought experiment while introducing the concepts of pass by value and pass by reference.
Pass by value means that the value of each variable passed as an argument in a function call is copied to a new location in memory, and the parameter name becomes the name of that copy. This means that changing the variable inside the function (count in our case) changes only the copy and doesn't affect the original variable (n_calls). How can we check whether a new memory location is used for the copy? id comes to our rescue.
Pass by reference means that changing the value of a variable through the name defined in the function declaration (count) affects the original variable (n_calls). How can we check for this? Well, we already did, when we tried to modify n_calls by changing count inside our function. From that example, you can clearly see that Python doesn't pass function arguments by reference.
What is left for us is to prove that Python passes function arguments by value, and we are done.
n_calls = 0
print(id(n_calls))
# 140472497733840
print(circumference(1, n_calls))
# Old count memory location: 140472497733840
# New count memory location: 140472497733872
# 6.28
print(n_calls)
# 0
Alright, this is interesting. The memory location of count on entry is the same as the memory location of n_calls. This rules out the idea that Python passes by value because, in that case, the memory locations would be different. Yet, when we change count in the function, it doesn't update n_calls in the caller (outer scope). So it's not pass by reference either. What kind of black magic are we witnessing here?
This is called passing by assignment. Yes, it does seem a bit odd because it sidesteps our initial question, and that question is perfectly valid for most widely used programming languages. What this mouthful of a statement means is that Python passes a reference to the object, and the parameter name is bound to that same object, just as in an ordinary assignment. Yet, when you assign to that name inside the function, you rebind the local name to another object, and the caller's name is not touched. The same goes for names in any enclosing lexical scope: an assignment in the function's local scope creates a local name that hides them. So, the same variable name now references another object in memory.
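To make "by assignment" concrete, here is a minimal sketch that replays the parameter binding with plain assignments and no function at all. It reuses the names n_calls and count from above, and the helper variables same_before and same_after are made up for illustration:

```python
n_calls = 0

# Calling circumference(1, n_calls) binds the parameter much like this:
count = n_calls
same_before = count is n_calls
print(same_before)     # True: one object, two names

# Rebinding one name leaves the other name untouched.
count = count + 1
same_after = count is n_calls
print(count, n_calls)  # 1 0
print(same_after)      # False
```

Nothing was copied when count was bound, and nothing in the caller changed when count was rebound, which matches the id output seen earlier.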
How does this affect your options as a developer?
Well, if you want to achieve the effects of passing by reference, do not reassign the parameter name to a newly created object inside the function. What you can use instead is a mutable data structure, like a list, dict, or custom-defined object, and update its state while keeping the reference intact. Example:
def circumference(radius, count):
    print('Old count memory location:', id(count))
    count[0] += 1
    print('New count memory location:', id(count))
    return 2 * radius * 3.14
n_calls = [0]
print(id(n_calls))
# 140472430829632
print(circumference(1, n_calls))
# Old count memory location: 140472430829632
# New count memory location: 140472430829632
# 6.28
print(n_calls)
# [1]
This way, the name keeps referring to the object that was created by the caller, and you just update the object's state. Pretty much the same can be achieved with a dictionary or a custom object.
def circumference(radius, count):
    print('Old count memory location:', id(count))
    count['calls'] += 1
    print('New count memory location:', id(count))
    return 2 * radius * 3.14
n_calls = {'calls': 0}
print(id(n_calls))
# 140472206558080
print(circumference(1, n_calls))
# Old count memory location: 140472206558080
# New count memory location: 140472206558080
# 6.28
print(n_calls)
# {'calls': 1}
I’ll leave you to do the experiment with custom objects.
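If you want to compare notes afterwards, here is one possible version of that experiment. It is only a sketch: the CallCounter class and the id_before variable are hypothetical names introduced for illustration.

```python
class CallCounter:
    """Hypothetical helper that holds a mutable call count."""

    def __init__(self):
        self.calls = 0


def circumference(radius, counter):
    # Attribute assignment updates the object's state; the name `counter`
    # is never rebound, so the caller sees the change.
    counter.calls += 1
    return 2 * radius * 3.14


n_calls = CallCounter()
id_before = id(n_calls)
result = circumference(1, n_calls)

print(result)                    # 6.28
print(n_calls.calls)             # 1
print(id(n_calls) == id_before)  # True: still the same object
```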
Why was this so important? Well, I think that some concepts can be tricky to fully grasp, and not knowing them will eventually come back to bite you. In the best-case scenario, you will get the error at runtime, and you will know that something has to be fixed. The worst-case scenario is when everything runs smoothly but the error affects the business logic. These are unknown unknowns.
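As an illustration of the "runs smoothly but the logic is wrong" case, here is a small hypothetical sketch. The functions top_two and top_two_safe and the daily_scores data are invented for this example. The first function sorts its argument in place, so the caller's list is silently reordered; the second works on a new list.

```python
def top_two(scores):
    # Sorts in place, which also reorders the caller's list.
    scores.sort(reverse=True)
    return scores[:2]


def top_two_safe(scores):
    # sorted() builds a new list, leaving the caller's list as it was.
    return sorted(scores, reverse=True)[:2]


daily_scores = [3, 9, 1, 7]

best_safe = top_two_safe(daily_scores)
after_safe = list(daily_scores)
print(best_safe)     # [9, 7]
print(after_safe)    # [3, 9, 1, 7]

best = top_two(daily_scores)
print(best)          # [9, 7]
print(daily_scores)  # [9, 7, 3, 1]: no error, but the original order is gone
```

Both functions return the same result and neither raises an error, so the difference only shows up later, in code that relied on the original order.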
Check this out:
def test(var):
    print(id(var))
    var += 1
    print(id(var))
    return var
a, b = 5, 6
print(id(a), id(b))
# 140472497734000 140472497734032
test(a)
# 140472497734000
# 140472497734032
# 6
What is happening here? I'll probably write about this in one of my next blog posts.