I find the own, borrow, steal metaphor in the Python
documentation to be confusing. I prefer to restrict the use of
reference to the meaning pointer to an PyObject. If
Py_INCREF()
has been called for a reference, the reference is
protected. From the Python coder's point of view, a reference is an
object. Therefore it is reasonable to use object interchangeably with
reference.
Py_INCREF()
and decremented by Py_DECREF()
. If
the counter becomes zero, Python might delete the object.
Py_INCREF()
, there should eventually be a
call to Py_DECREF()
. Call Py_INCREF()
for
objects that you want to keep around for a while. Call
Py_DECREF()
when you are done with them. A pointer to an
object that has been INCREFed is said to be protected.
Py_DECREF()
. The calling function can, in turn,
pass responsibility for the DECREF to its caller.
PyTuple_GetItem()
.
PyTuple_SetItem()
and PyList_SetItem()
. These
functions take over responsibility of the item passed to them -- even if
they fail!
malloc()
and free()
. In
C++, the operators new
and delete
are used with
essentially the same meaning; they are actually implemented using
malloc()
and free()
, so we'll restrict the
following discussion to the latter.
{EE 1.10} Every block of memory allocated with malloc()
should eventually be returned to the pool of available memory by exactly one
call to free()
. It is important to call free()
at
the right time. If a block's address is forgotten but free()
is
not called for it, the memory it occupies cannot be reused until the program
terminates. This is called a memory leak. On the other hand, if a program
calls free()
for a block and then continues to use the block,
it creates a conflict with re-use of the block through another
malloc()
call. This is called using freed memory. It has the
same bad consequences as referencing uninitialized data -- core dumps, wrong
results, mysterious crashes.
{EE 1.10} Common causes of memory leaks are unusual paths through the code. For instance, a function may allocate a block of memory, do some calculation, and then free the block again. Now a change in the requirements for the function may add a test to the calculation that detects an error condition and can return prematurely from the function. It's easy to forget to free the allocated memory block when taking this premature exit, especially when it is added later to the code. Such leaks, once introduced, often go undetected for a long time: the error exit is taken only in a small fraction of all calls, and most modern machines have plenty of virtual memory, so the leak only becomes apparent in a long-running process that uses the leaking function frequently. Therefore, it's important to prevent leaks from happening by having a coding convention or strategy that minimizes this kind of errors.
{EE 1.10} Since Python makes heavy use of malloc()
and free()
, it needs
a strategy to avoid memory leaks as well as the use of freed memory.
The chosen method is called reference counting. The principle is
simple: every object contains a counter, which is incremented when a
pointer to the object is stored somewhere, and which is decremented
when the pointer is deleted. When the counter reaches zero, the last
pointer to the object has been deleted and the object is freed.
PyObject
. They are accessed through pointers of type
PyObject*
. Special rules apply to the use of objects to ensure
they are properly garbage-collected. Objects are never allocated statically
or on the stack; they must be accessed through special macros and functions
only. (Type objects are exceptions to the first rule; the standard types
are represented by statically initialized type objects.)
{object.h} An object has a reference count that is increased or decreased when a pointer to the object is copied or deleted; when the reference count reaches zero there are no references to the object left and it can be removed from the heap.
Every Python object has a reference count and a type (and possibly more).
Here is the relevant code from object.h
, one of the Python
header files. PyObject
is a C struct. Each instance of
PyObject
contains the variable ob_refcnt
where the
reference count is kept and ob_type
where the type is
kept.
#define PyObject_HEAD \ int ob_refcnt; \ struct _typeobject *ob_type; typedef struct _object { PyObject_HEAD } PyObject;{object.h} The type of an object determines what it represents and what kind of data it contains. An object's type is fixed when it is created. Types themselves are represented as objects; an object contains a pointer to the corresponding type object. The type itself has a type pointer pointing to the object representing the type 'type', which contains a pointer to itself!).
{object.h} Objects do not float around in memory; once allocated an object keeps the same size and address. Objects that must hold variable-size data can contain pointers to variable-size parts of the object. Not all objects of the same type have the same size; but the size cannot change after allocation. (These restrictions are made so a reference to an object can be simply a pointer -- moving an object would require updating all the pointers, and changing an object's size would require moving it if there was another object right next to it.)
{object.h} Objects are always accessed through pointers of the type
'PyObject *
'. The type 'PyObject
' is the structure
shown above that only contains the reference count and the type pointer. The
actual memory allocated for an object contains other data that can only be
accessed after casting the pointer to a pointer to a longer structure type.
This longer type must start with the reference count and type fields; the
macro PyObject_HEAD
above should be used for this (to
accommodate for future changes). The implementation of a particular object
type can cast the object pointer to the proper type and back.
{object.h} A standard interface also exists for objects that contain an array of items whose size is determined when the object is allocated. It looks like
#define PyObject_VAR_HEAD \ PyObject_HEAD \ int ob_size; /* Number of items in variable part */ typedef struct _object { PyObject_HEAD } PyObject;If the reference count for some object becomes zero, Python will usually reclaim the memory the object uses, making the object disappear. If an object is used in a piece of code, the code should make sure that the reference count cannot drop to zero, usually by adding one to it.
If the reference count never becomes zero, the memory is never reclaimed. If it was not intended for the object to be permanent, there is a memory leak.
Py_INCREF()
and Py_DECREF()
Py_INCREF(op)
and
Py_DECREF(op)
are used to increment or decrement reference
counts. Py_DECREF()
also calls the object's deallocator
function; for objects that don't contain references to other objects or heap
memory this can be the standard function free()
. Both macros
can be used wherever a void expression is allowed. The argument shouldn't be
a NIL pointer.
{object.h} We assume that the reference count field can never overflow; this can be proven when the size of the field is the same as the pointer size but even with a 16-bit reference count field it is pretty unlikely so we ignore the possibility.
To prevent memory leaks, corresponding to each call to
Py_INCREF()
, there must be a call to Py_DECREF()
:
for each call to Py_INCREF()
, there is a responsibility
to call Py_DECREF()
. This responsibility can be passed around
between functions. The last user of the reference must call
Py_DECREF()
.
Py_INCREF()
and Py_DECREF()
PyObject* Py_Something(arguments);These functions usually (but not always!) call
Py_INCREF()
before returning (a pointer to) the new object. Generally, the function that
called Py_Something
has the responsibility to call
Py_DECREF()
. If, in turn, the function returns the object to
its caller, it passes on the responsibility for the reference. Some things
are best understood by example pseudo-code:
void MyCode(arguments) { PyObject* pyo; ... pyo = Py_Something(args);
MyCode
has responsibility for the reference passed to it by
Py_Something
. When MyCode
is done with
pyo
, it must call:
Py_DECREF(pyo);On the other hand, if
MyCode
returns pyo
, there
must not be a call to Py_DECREF()
.
PyObject* MyCode(arguments) { PyObject* pyo; ... pyo = Py_Something(args); ... return pyo; }In this situation,
MyCode
has passed on the
responsibility for DECREFing the reference.
Note: if a function is to return None
, the C code should
look like:
Py_INCREF(Py_None); return Py_None;Remember to INCREF
Py_None
!
So far, only the most common case has been discussed, where
Py_Something
creates a reference and passes responsibility for
it to the caller. In the Python documentation, this is called a new
reference. For example the documentation says:
PyObject* PyList_New(int len) Return value: New reference. Returns a new list of length len on success, or NULL on failure.The documentation uses the word reference is two closely related ways: a pointer to a PyObject and the responsibility to DECREF the object. I will try to use reference in the first sense only. When a reference has been INCREFed, I prefer to say that the reference is protected. I will often get sloppy and say object when I mean reference. (After all, what a C programmer sees as a
PyObject*
the Python
programmer sees as an object.)
But sometimes the Python source code does not call
Py_DECREF()
:
PyObject * PyTuple_GetItem(register PyObject *op, register int i) { if (!PyTuple_Check(op)) { PyErr_BadInternalCall(); return NULL; } if (i < 0 || i >= ((PyTupleObject *)op) -> ob_size) { PyErr_SetString(PyExc_IndexError, "tuple index out of range"); return NULL; } return ((PyTupleObject *)op) -> ob_item[i]; }In the documentation, this is referred to as borrowing a reference:
PyObject* PyTuple_GetItem(PyObject *p, int pos) Return value: Borrowed reference. Returns the object at position pos in the tuple pointed to by p. If pos is out of bounds, returns NULL and sets an IndexError exception.I prefer to say that the the reference (in the sense I use it) is left unprotected.
The functions which return unprotected referencess (borrowing a reference) are:
PyTuple_GetItem()
,
PyList_GetItem()
,
PyList_GET_ITEM()
,
PyList_SET_ITEM()
,
PyDict_GetItem()
,
PyDict_GetItemString()
,
PyErr_Occurred()
,
PyFile_Name()
,
PyImport_GetModuleDict()
,
PyModule_GetDict()
,
PyImport_AddModule()
,
PyObject_Init()
,
Py_InitModule()
,
Py_InitModule3()
,
Py_InitModule4()
, and
PySequence_Fast_GET_ITEM().
{EE 10.1.2} The function PyImport_AddModule()
does not
INCREF the reference it returns even though it may actually create the
object the reference refers to: this is possible because the object is
INCREFed when it is stored in sys.modules.
See also PyArg_ParseTuple()
in Extending and
Embedding, 1.7. This function sometimes returns PyObjects back to the
caller through its arguments. An example from sysmodule.c
is:
static PyObject * sys_getrefcount(PyObject *self, PyObject *args) { PyObject *arg; if (!PyArg_ParseTuple(args, "O:getrefcount", &arg)) return NULL; return PyInt_FromLong(arg->ob_refcnt); }
arg
is an unprotected object. It should not be DECREFed before
leaving the function (because it was never INCREFed!).
{API 1.2.1.1} Here is an example of how you could write a function that
computes the sum of the items in a list of integers, here using
PySequence_GetItem()
and later using
PyList_GetItem()
.
long sum_sequence(PyObject *sequence) { int i, n; long total = 0; PyObject *item; n = PySequence_Length(sequence); if (n < 0) return -1; /* Has no length. */ /* Caller should use PyErr_Occurred() if a -1 is returned. */ for (i = 0; i < n; i++) { /* PySequence_GetItem INCREFs item. */ item = PySequence_GetItem(sequence, i); if (item == NULL) return -1; /* Not a sequence, or other failure */ if (PyInt_Check(item)) total += PyInt_AsLong(item); Py_DECREF(item); } return total; }
{API 1.2.1} An important situation where this arises is for objects that are passed as arguments to C functions in an extension module that are called from Python; the call mechanism guarantees to hold a reference to every argument for the duration of the call.
Here is the sum_list
example again, this time using
PyList_GetItem()
.
long sum_list(PyObject *list) { int i, n; long total = 0; PyObject *item; n = PyList_Size(list); if (n < 0) return -1; /* Not a list */ /* Caller should use PyErr_Occurred() if a -1 is returned. */ for (i = 0; i < n; i++) { /* PyList_GetItem does not INCREF "item". "item" is unprotected. */ item = PyList_GetItem(list, i); /* Can't fail */ if (PyInt_Check(item)) total += PyInt_AsLong(item); } return total; }
{API 1.2.1} Subtle bugs can occur when when a reference is unprotected. A
common example is to extract an object from a list and hold on to it for a
while without incrementing its reference count. Some other operation might
conceivably remove the object from the list, decrementing its reference
count and possible deallocating it. The real danger is that innocent-looking
operations may invoke arbitrary Python code which could do this; there is a
code path which allows control to flow back to the Python user from a
Py_DECREF()
, so almost any operation is potentially dangerous.
For example:
bug(PyObject *list) { PyObject *item = PyList_GetItem(list, 0); PyList_SetItem(list, 1, PyInt_FromLong(0L)); PyObject_Print(item, stdout, 0); /* BUG! */ }{EE 1.10.3} The PyObject
item
is gotten using
PyList_GetItem
and left unprotected. The code then replaces
list[1]
with the value 0
, and finally prints
item
. Looks harmless, right? But it's not!
{EE 1.10.3} Let's follow the control flow into
PyList_SetItem()
. The list has protected references to all its
items, so when item 1 is replaced, it has to dispose of the original item 1
by DECREFing it. Now let's suppose the original item 1 was an instance of a
user-defined class, and let's further suppose that the class defined a
__del__()
method. If this class instance has a reference count
of 1, disposing of it will call its __del__()
method.
{EE 1.10.3} Since it is written in Python, the __del__()
method can execute arbitrary Python code. Could it perhaps do something to
invalidate the reference to item in bug()
? You bet! Assuming
that the list passed into bug()
is accessible to the
__del__()
method, it could execute a statement to the effect of
del list[0]
, and assuming this was the last reference to that
object, it would free the memory associated with it, thereby invalidating
item
.
{EE 1.10.3} The solution, once you know the source of the problem, is easy: temporarily increment the reference count. The correct version of the function reads:
no_bug(PyObject *list) { PyObject *item = PyList_GetItem(list, 0); Py_INCREF(item); /* Protect item. */ PyList_SetItem(list, 1, PyInt_FromLong(0L)); PyObject_Print(item, stdout, 0); Py_DECREF(item); }{EE 1.10.3} This is a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his
__del__()
methods would fail...
{EE 1.10.3} The second case of problems with unprotected objects is a
variant involving threads. Normally, multiple threads in the Python
interpreter can't get in each other's way, because there is a global lock
protecting Python's entire object space. However, it is possible to
temporarily release this lock using the macro
Py_BEGIN_ALLOW_THREADS
, and to re-acquire it using
Py_END_ALLOW_THREADS
. This is common around blocking I/O calls,
to let other threads use the CPU while waiting for the I/O to complete.
Obviously, the following function has the same problem as the previous one:
bug(PyObject *list) { PyObject *item = PyList_GetItem(list, 0); Py_BEGIN_ALLOW_THREADS ...some blocking I/O call... Py_END_ALLOW_THREADS PyObject_Print(item, stdout, 0); /* BUG! */ }
int Caller(void) { PyObject* pyo; Function(pyo);Most functions assume that the arguments passed to them are already protected. Therefore
Py_INCREF()
is not called inside
Function
unless Function
wants the argument to
continue to exist after Caller
exits. In the documentation,
Function
is said to borrow a reference:
{EE 10.1.2} When you pass an object reference into another
function, in general, the function borrows the reference from you
-- if it needs to store it, it will use Py_INCREF()
to become an
independent owner.
PyDict_SetItem()
can serve as an example of normal behavior.
Putting something in a dictionary is "storing" it. Therefore
PyDict_SetItem()
INCREFs both the key and the value.
{EE 10.1.2} There are exactly two important functions that do not behave
in this normal way: PyTuple_SetItem()
and
PyList_SetItem()
. These functions take over responsibility of
the item passed to them -- even if they fail! The Python documentation uses
the phrase steal a reference to mean "takes over responsibility".
Here is what PyTuple_SetItem(atuple, i, item)
does: If
atuple[i]
currently contains a PyObject, that PyObject is
DECREFed. Then atuple[i]
is set to item
.
item
is not INCREFed.
If PyTuple_SetItem()
fails to insert item
, it
decrements the reference count for item
.
Similarly, PyTuple_GetItem()
does not increment the returned
item's reference count.
Metaphorically, PyTuple_SetItem()
grabs
responsibility for a reference to item
from you. If
item
is unprotected, PyTuple_SetItem()
might
DECREF it anyway which can crash Python.
Other exceptions are PyList_SET_ITEM()
,
PyModule_AddObject()
, and PyTuple_SET_ITEM()
.
Look at this piece of code:
PyObject *t; PyObject *x; x = PyInt_FromLong(1L);At this point x has a reference count of one. When you are done with it, you normally would call
Py_DECREF(x)
. But if if
PyTuple_SetItem
is called:
PyTuple_SetItem(t, 0, x);you must not call
Py_DECREF()
. PyTuple_SetItem()
will call it for you:
when the tuple is DECREFed, the items will be also.
{API 1.2.1.1} PyTuple_SetItem()
, et. al, were designed to
take over responsibility for a reference because of a common idiom for
populating a tuple or list with newly created objects; for example, the code
to create the tuple (1, 2, "three")
could look like
this (forgetting about error handling for the moment). It is better coding
practice to use the less confusing PySequence family of functions as below.
PyObject *t; t = PyTuple_New(3); PyTuple_SetItem(t, 0, PyInt_FromLong(1L)); PyTuple_SetItem(t, 1, PyInt_FromLong(2L)); PyTuple_SetItem(t, 2, PyString_FromString("three"));{API 1.2.1.1} Incidentally,
PyTuple_SetItem()
is the only way to
set tuple items; PySequence_SetItem()
and
PyObject_SetItem()
refuse to do this since tuples are an
immutable data type. You should only use PyTuple_SetItem()
for
tuples that you are creating yourself.
{API 1.2.1.1} Equivalent code for populating a list can be written using
PyList_New()
and PyList_SetItem()
. Such code can
also use PySequence_SetItem()
; this illustrates the difference
between the two (the extra Py_DECREF()
calls):
PyObject *l, *x; l = PyList_New(3); x = PyInt_FromLong(1L); PySequence_SetItem(l, 0, x); Py_DECREF(x); x = PyInt_FromLong(2L); PySequence_SetItem(l, 1, x); Py_DECREF(x); x = PyString_FromString("three"); PySequence_SetItem(l, 2, x); Py_DECREF(x);{API 1.2.1.1} You might find it strange that the 'better coding practice' takes more code. However, you are unlikely to often use these ways of creating and populating a tuple or list. There's a generic function,
Py_BuildValue()
, that can create most common objects from C
values, directed by a format string. For example, the above two blocks of
code could be replaced by the following (which also takes care of the error
checking):
PyObject *t, *l; t = Py_BuildValue("(iis)", 1, 2, "three"); l = Py_BuildValue("[iis]", 1, 2, "three");{API 1.2.1.1} It is much more common to use
PyObject_SetItem()
and friends with protected objects (ie, the reference count was incremented
before passing the item to you), a typical example being arguments that were
passed in to the function you are writing. In that case, their behaviour
regarding reference counts has a simpler appearance, since you don't have to
do anything at all with reference counts. For example, this function sets
all items of a list (actually, any mutable sequence) to a given item:
int set_all(PyObject *target, PyObject *item) { int i, n; n = PyObject_Length(target); if (n < 0) return -1; for (i = 0; i < n; i++) { if (PyObject_SetItem(target, i, item) < 0) return -1; } return 0; }
PyObject* MyFunction(void) { PyObject* temporary_list=NULL; PyObject* return_this=NULL; temporary_list = PyList_New(1); /* Note 1 */ if (temporary_list == NULL) return NULL; return_this = PyList_New(1); /* Note 1 */ if (return_this == NULL) Py_DECREF(temporary_list); /* Note 2 */ return NULL; } Py_DECREF(temporary_list); /* Note 2 */ return return_this; }Note 1: The object returned by
PyList_New
has a
reference count of 1.temporary_list
should disappear when
MyFunction
exits, it must be DECREFed before any return from
the function. If a return can be reached both before or after
temporary_list
is created, then initialize
temporary_list
to NULL
and use
Py_XDECREF()
.
PyTuple_GetItem()
is used.
PyObject* MyFunction(void) { PyObject* temporary=NULL; PyObject* return_this=NULL; PyObject* tup; PyObject* num; int err; tup = PyTuple_New(2); if (tup == NULL) return NULL; err = PyTuple_SetItem(tup, 0, PyInt_FromLong(222L)); /* Note 1 */ if (err) { Py_DECREF(tup); return NULL; } err = PyTuple_SetItem(tup, 1, PyInt_FromLong(333L)); /* Note 1 */ if (err) { Py_DECREF(tup); return NULL; } temporary = PyTuple_Getitem(tup, 0); /* Note 2 */ if (temporary == NULL) { Py_DECREF(tup); return NULL; } return_this = PyTuple_Getitem(tup, 1); /* Note 3 */ if (return_this == NULL) { Py_DECREF(tup); /* Note 3 */ return NULL; } /* Note 3 */ Py_DECREF(tup); return return_this; }Note 1: If
PyTuple_SetItem
fails or if the tuple it
created is DECREFed to 0, then the object returned by
PyInt_FromLong
is DECREFed.PyTuple_Getitem
does not increment the reference
count for the object it returns.temporary
.