
Pandas astype Does Not Recognize Fixed-Length Bytestring Format

Consider the following example:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['int', 'str'])
df.astype({'int': np.int8, 'str': np.dtype('|S2')})
arr = df.to_records(index=False)

Solution 1:

The 'O' in the dtype means type object. From the docs:

'O' (Python) objects
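To make the difference between the two type codes concrete, here is a minimal NumPy-only sketch. The variable names are illustrative and not from the original question; it only shows how the 'O' and 'S2' dtypes describe themselves.

```python
import numpy as np

obj_dtype = np.dtype('O')      # generic Python objects
bytes_dtype = np.dtype('|S2')  # fixed-length byte strings, 2 bytes per item

# Each dtype exposes a one-character kind code.
print(obj_dtype.kind, bytes_dtype.kind)  # O S
print(bytes_dtype.itemsize)              # 2

# A plain NumPy array accepts the fixed-length dtype directly.
fixed = np.array(['a', 'b'], dtype=bytes_dtype)
print(fixed.dtype)
```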

When you create your DataFrame, the strings are stored with dtype object, regardless of the types you intend to use later:

df.dtypes

int     int64
str    object
dtype: object

astype is not an in-place operation: it returns a new DataFrame, so your command currently has no effect on df. You need to reassign the result:

df = df.astype({"int":np.int8, "str": np.dtype('|S2')})
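The effect of discarding versus keeping the return value can be checked with a short sketch like the one below. It reuses the DataFrame from the question and converts only the int column; the names original_dtype, unchanged, and converted are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['int', 'str'])
original_dtype = df['int'].dtype

# The result is discarded here, so df keeps its original dtype.
df.astype({'int': np.int8})
unchanged = df['int'].dtype == original_dtype

# Keeping the returned DataFrame is what actually applies the conversion.
converted = df.astype({'int': np.int8})
print(unchanged, converted['int'].dtype)
```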

Even after reassigning, however, the str column is still not converted from object:

df.dtypes

int      int8
str    object
dtype: object

So when you use to_records, object is used instead of your designated type.

A fix would be to create your string series separately, and assign it to your DataFrame:

s = pd.Series(['a', 'b'], dtype=np.dtype('|S2'))
df['d'] = s

df.dtypes

int      int8
str    object
d         |S2
dtype: object

And using to_records:

df.to_records(index=False)

rec.array([(1, b'a', b'a'), (2, b'b', b'b')],
          dtype=[('int', 'i1'), ('str', 'O'), ('d', 'S2')])
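Another route, which is not part of the original answer, is to leave the DataFrame alone and request the field types when building the record array. The sketch below assumes pandas 0.24 or later, where to_records accepts a column_dtypes argument, and assumes the strings are ASCII so they fit the byte-string type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['int', 'str'])

# Ask to_records for the target field types directly; df itself is unchanged.
rec = df.to_records(
    index=False,
    column_dtypes={'int': np.int8, 'str': np.dtype('|S2')},
)

print(rec.dtype)
```

This avoids adding a duplicate column, at the cost of doing the conversion only at export time rather than in the DataFrame.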
