Python Identity: Multiple Personality Disorder, Need Code Shrink

October 31, 2022

Possible Duplicate: Python “is” operator behaves unexpectedly with integers

I stumbled upon the following Python weirdity:

>>> two = 2
>>> ii = 2
>>

Solution 1:

In CPython, integers from -5 through 256, as well as many string literals, are interned. Each occurrence in the source actually refers to the same object.

In CPython, the result of id() is the address in the process space of the PyObject.
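As a rough illustration of the point about id(), the sketch below compares id() values for equal integers. The helper name same_object is made up for this example, and the identity results are CPython implementation details rather than anything the language promises.

```python
def same_object(a, b):
    """Report whether two references share one id(), i.e. one object."""
    return id(a) == id(b)

two = 2
ii = int("2")            # built at run time, so the compiler cannot merge it with the literal above
big_a = int("100000")
big_b = int("100000")

print(two == ii, big_a == big_b)            # True True: equality is guaranteed
print(same_object(two, ii))                 # implementation detail; True where small ints are cached
print(same_object(big_a, big_b))            # implementation detail; no guarantee either way
print(same_object(two, ii) == (two is ii))  # True: comparing id() is what "is" does
```

Only the first and last lines print something you can count on; the two in the middle depend on the interpreter.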

Solution 2:

Every implementation of Python is fully allowed to optimize to any extent (including.... none at all;-) the identity and allocation of immutable objects (such as numbers, tuples and strings) [[no such latitude exists for mutable objects, such as lists, dicts and sets]].

Between two immutable object references a and b, all the implementation must guarantee is:

id(a) == id(b), AKA a is b, must always imply a == b
and therefore a != b must always imply id(a) != id(b) AKA a is not b

Note in particular there is no constraint, even for immutable types, that a == b must imply a is b (i.e. that id(a) == id(b)). Only None makes that guarantee (so you can always test if x is None: rather than if x == None:).
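One way to see why x is None is the safer test: == can be overridden by a class, while is cannot. The class AlwaysEqual and the function describe below are hypothetical names used only for this sketch.

```python
class AlwaysEqual:
    """Hypothetical class whose == claims equality with anything."""
    def __eq__(self, other):
        return True

def describe(x):
    # Identity test: only the None singleton itself passes.
    if x is None:
        return "missing"
    return "present"

obj = AlwaysEqual()
print(obj == None)     # True: the overridden __eq__ answers, misleadingly
print(obj is None)     # False: obj is not the None object
print(describe(obj))   # present
print(describe(None))  # missing
```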

Current CPython implementations take advantage of these degrees of freedom by "merging" (having a single allocation, thus a single id, for) small integers in a certain range, and built-in immutable-type objects whose literals appear more than once within a given function (so for example if your function f has four occurrences of literal 'foobar' they will all refer to a single instance of string 'foobar' within the function's constants, saving a little space compared to the permissible implementation that would store four identical but separate copies of that constant).
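The 'foobar' case can be checked by looking at a function's constants. This is a small sketch that assumes CPython; co_consts is a CPython code-object attribute, and other implementations may store constants differently.

```python
def f():
    # Four occurrences of the same literal inside one function.
    a = 'foobar'
    b = 'foobar'
    c = 'foobar'
    d = 'foobar'
    return a, b, c, d

consts = f.__code__.co_consts
print(consts.count('foobar'))   # 1 on CPython: the compiler stores the constant once

a, b, c, d = f()
print(a is b is c is d)         # True on CPython: all four names share that one constant
```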

All of these implementation considerations are of pretty minor interest to Python coders (unless you're working on a Python implementation, or at least something that's tightly bound to a specific implementation, such as a debugging system).

Solution 3:

Your fourth question, "in the above example, are two and ii pointers to a memory cell holding the value 2? that would be extremely weird", is really the key to understanding the whole thing.

If you're familiar with languages like C, be aware that Python "variables" don't really work the same way. A C variable declaration like:

int j=1;
int k=2;
k += j;

says, "compiler, reserve for me two areas of memory, on the stack, each with enough space to hold an integer, and remember one as 'j' and the other as 'k'. Then fill j with the value '1' and k with the value '2'." At runtime, the code says "take the integer contents of k, add the integer contents of j, and store the result back to k."

The seemingly equivalent code in Python:

j = 1
k = 2
k += j

says something different: "Python, look up the object known as '1', and create a label called 'j' that points to it. Look up the object known as '2', and create a label called 'k' that points to it. Now look up the object 'k' points to ('2'), look up the object 'j' points to ('1'), and point 'k' to the object resulting from performing the 'add' operation on the two."

Disassembling this code (with the dis module) shows this nicely:

  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (j)

  3           6 LOAD_CONST               2 (2)
              9 STORE_FAST               1 (k)

  4          12 LOAD_FAST                1 (k)
             15 LOAD_FAST                0 (j)
             18 INPLACE_ADD
             19 STORE_FAST               1 (k)

So yes, Python "variables" are labels that point to objects, rather than containers that can be filled with data.
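The label behaviour can be observed directly. In the sketch below, the function name example and the extra name before are illustrative additions; before keeps a second label on the object that k pointed to, so you can see that k += j moves the label rather than changing the object.

```python
import dis

def example():
    j = 1
    k = 2
    before = k      # a second label on the object k currently points to
    k += j          # rebinds k to the result of the add; the object 2 is untouched
    return k, before

k, before = example()
print(k, before)      # 3 2
print(k is before)    # False: k now labels a different object

# Print the bytecode; opcode names and offsets vary between Python versions.
dis.dis(example)
```

On recent interpreters the listing will not match the one above instruction for instruction, but the load, add, store pattern is still visible.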

The other three questions are all variations on "when does Python create a new object from a piece of code, and when does it reuse one it already has?". The latter is called "interning"; it happens to smaller integers and strings that look (to Python) like they might be symbol names.
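Interning can also be requested explicitly with sys.intern from the standard library. The sketch below builds two equal strings at run time, so nothing merges them automatically, and then interns both.

```python
import sys

# Built at run time, so the compiler has no chance to merge them.
a = "".join(["hello", " ", "world"])
b = "".join(["hello", " ", "world"])

print(a == b)        # True: same characters
print(a is b)        # implementation detail; typically False on CPython

# sys.intern returns the single canonical copy for a given string value.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)      # True: both names now refer to the interned object
```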

Solution 4:

You should be very careful with these sorts of investigations. You are looking into the internals of the implementation of the language, and those are not guaranteed. The help on id is spot-on: the number will be different for two different objects that exist at the same time, and the same for the same object. As an implementation detail, in CPython it is the memory address of the object. CPython might decide to change this detail at any time.

The fact that small integers are interned, so that equal values share a single allocation, is also a detail that could change at any time.

Also, if you switch from CPython to Jython, or PyPy, or IronPython, all bets are off, other than the documentation on id().

Solution 5:

Not every number is a unique object, and which ones end up shared is an optimization detail of the CPython interpreter. Do not rely on this behavior. For that matter, never use is to test for equality. Only use is if you are absolutely sure you need the exact same object.
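A case where you really do need the exact same object is a sentinel default. The sketch below is one possible pattern, not the only one; _MISSING and lookup are hypothetical names invented for this example.

```python
_MISSING = object()   # hypothetical sentinel: a unique object nothing else can equal by identity

def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; fall back to default, or raise KeyError if none was given."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:            # identity test: was the key really absent?
        if default is _MISSING:      # identity test: did the caller pass a default?
            raise KeyError(key)
        return default
    return value

print(lookup({"a": None}, "a"))      # None: a stored None is a real value here
print(lookup({}, "a", 0))            # 0
```

Using == here would be wrong in principle, because a stored value could define its own equality; is asks only whether it is the sentinel itself.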
