
Assigning Dtype Value Using Array.dtype = In NumPy Arrays Gives Ambiguous Results

I am new to programming and NumPy. While reading tutorials and experimenting in a Jupyter notebook, I tried converting the dtype of a NumPy array as follows: import numpy as np
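The rest of the question's snippet got cut off here; judging from the answer below, the assignment was presumably of this shape (the array values are hypothetical):

>>> a = np.array([1.0, 2.0, 3.0])
>>> a.dtype = np.int64            # reinterprets the bits instead of converting the values
>>> a
array([4607182418800017408, 4611686018427387904, 4613937818241073152])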

Solution 1:

Floats and integers (numpy.float64 and numpy.int64) are represented differently in memory: the value 42 stored in each of these types corresponds to a different bit pattern.
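You can see the two bit patterns directly by dumping the raw bytes of a scalar (a quick sketch; the byte order shown assumes a little-endian machine such as x86):

>>> np.float64(42).tobytes().hex()    # IEEE 754 double: 0x4045000000000000
'0000000000004540'
>>> np.int64(42).tobytes().hex()      # plain two's-complement integer: 0x2a
'2a00000000000000'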

When you reassign the dtype attribute of an array, the underlying data is kept unchanged; you're merely telling numpy to interpret that pattern of bits in a new way. Since the interpretation no longer matches the original definition of the data, you end up with gibberish (meaningless numbers).

On the other hand, converting your array via .astype() will actually convert the data in memory:

>>> import numpy as np
>>> arr = np.random.rand(3)
>>> arr.dtype
dtype('float64')
>>> arr
array([ 0.7258989 ,  0.56473195,  0.20885672])
>>> arr.data
<memory at 0x7f10d7061288>
>>> arr.dtype = np.int64
>>> arr.data
<memory at 0x7f10d7061348>
>>> arr
array([4604713535589390862, 4603261872765946451, 4596692876638008676])

(Don't be fooled by the two different "memory at 0x..." addresses above: every access to arr.data creates a fresh memoryview object, so its repr changes even though the underlying buffer does not.)

Proper conversion:

>>> arr = np.random.rand(3)*10
>>> arr
array([ 3.59591191,  1.21786042,  6.42272461])
>>> arr.astype(np.int64)
array([3, 1, 6])

As you can see, astype meaningfully converts the original values of the array (in this case truncating them toward zero) and returns a new array with the corresponding values and dtype.
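Note that astype truncates rather than rounds; if rounding is what you want, round first (a small sketch, array values made up):

>>> arr = np.array([3.6, 1.2, 6.4])
>>> arr.astype(np.int64)              # truncates toward zero
array([3, 1, 6])
>>> np.round(arr).astype(np.int64)    # round first, then convert
array([4, 1, 6])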

Note that assigning a new dtype doesn't convert or validate the values in any way, so you can do very weird stuff with your array. In the example above, 64 bits of float were reinterpreted as 64 bits of integer. But you can also change the bit size:

>>> arr = np.random.rand(3)
>>> arr.shape
(3,)
>>> arr.dtype
dtype('float64')
>>> arr.dtype = np.float32
>>> arr.shape
(6,)
>>> arr
array([  4.00690371e+35,   1.87285304e+00,   8.62005305e+13,
         1.33751166e+00,   7.17894062e+30,   1.81315207e+00], dtype=float32)

By telling numpy that each element of your data occupies half the space it originally did, you make numpy deduce that your array has twice as many elements! Clearly not something you should ever want to do.
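The one sanity check numpy does perform is on size: if the bytes of the last axis don't divide evenly into the new element size, the assignment is refused (a quick sketch; the exact error message may differ across numpy versions):

>>> arr = np.random.rand(3)       # 3 * 8 = 24 bytes
>>> arr.dtype = np.complex128     # 24 bytes is not a multiple of 16
Traceback (most recent call last):
  ...
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.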


Another example: consider the 8-bit unsigned integer 255==2**8-1: it corresponds to 11111111 in binary. Now, try to reinterpret two of these numbers as a single 16-bit unsigned integer:

>>> arr = np.array([255,255],dtype=np.uint8)
>>> arr.dtype = np.uint16
>>> arr
array([65535], dtype=uint16)

As you can see, the result is the single number 65535. If that doesn't ring a bell, it's exactly 2**16-1, i.e. sixteen ones in binary. The two all-ones patterns were reinterpreted as a single 16-bit number, and the result changed accordingly. The reason you often see weirder numbers is that reinterpreting floats as ints or vice versa leads to a much stronger mangling of the data, due to how floating-point numbers are represented in memory.
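Two 0xFF bytes happen to give the same answer regardless of byte order; with an asymmetric pattern, the result also depends on your machine's endianness (the output below assumes a little-endian platform such as x86 or typical ARM):

>>> arr = np.array([1, 0], dtype=np.uint8)
>>> arr.dtype = np.uint16
>>> arr                           # bytes [0x01, 0x00] read as one uint16
array([1], dtype=uint16)          # would be array([256]) on a big-endian machine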


As hpaulj noted, you can perform the same reinterpretation without mutating the original array, by constructing a new view with a modified dtype. This is usually preferable to reassigning the dtype of an existing array, but then again, reinterpreting raw bits like this is only useful in fairly rare, very specific use cases.
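A minimal sketch of the view-based approach (the array here is made up):

>>> arr = np.random.rand(3)
>>> ints = arr.view(np.int64)     # same buffer, bits reinterpreted as int64
>>> ints.dtype, arr.dtype         # the original array keeps its dtype
(dtype('int64'), dtype('float64'))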

