Pandas Astype Does Not Recognize Fixed-Length Bytestring Format
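Consider the following example:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['int', 'str'])
df.astype({'int': np.int8, 'str': np.dtype('|S2')})
arr = df.to_records(index=False)
The str field of arr ends up with dtype 'O' instead of the requested 'S2'.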
Solution 1:
The 'O' in the record dtype means type object. From the NumPy docs:
'O' (Python) objects
When you create your DataFrame, the strings are stored as type object:
df.dtypes
int     int64
str    object
dtype: object
astype is not an in-place operation, so your command currently has no effect; you need to reassign the result:
df = df.astype({"int": np.int8, "str": np.dtype('|S2')})
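A quick way to see that astype returns a new DataFrame instead of modifying the original is to compare the dtypes of both objects. This is a minimal sketch that converts only the int column; the name converted is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['int', 'str'])

# astype returns a new DataFrame; df itself is left unchanged
converted = df.astype({'int': np.int8})

print(df['int'].dtype)         # int64
print(converted['int'].dtype)  # int8
```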
This still does not convert the strings from object, however:
df.dtypes
int      int8
str    object
dtype: object
So when you call to_records, object is used instead of your designated type.
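If you only need the fixed-length bytes in the record array, another option is to request the field types at export time, so the DataFrame column can stay object. This sketch assumes pandas 0.24 or later, where DataFrame.to_records accepts a column_dtypes mapping from column name to output dtype, and it assumes the strings are ASCII so NumPy can encode them to bytes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['int', 'str'])

# column_dtypes sets the dtype of each record field; the column of Python str
# values is encoded to fixed-length bytes ('S2') while building the records
arr = df.to_records(index=False, column_dtypes={'int': np.int8, 'str': 'S2'})

# the 'str' field should now be 'S2' rather than 'O'
print(arr.dtype)
```

This avoids adding an extra column to the DataFrame just to get the right record dtype.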
A fix is to create your string Series separately and assign it to your DataFrame:
s = pd.Series(['a', 'b'], dtype=np.dtype('|S2'))
df['d'] = s
df.dtypes
int      int8
str    object
d         |S2
dtype: object
And using to_records:
df.to_records(index=False)
rec.array([(1, b'a', b'a'), (2, b'b', b'b')],
dtype=[('int', 'i1'), ('str', 'O'), ('d', 'S2')])
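Alternatively, the record array can be assembled in NumPy directly with np.rec.fromarrays, converting each column to the exact dtype you want first. This is a swapped-in NumPy technique rather than a pandas feature, and like the other approaches it assumes ASCII strings, since NumPy encodes str values to bytes as ASCII when casting to 'S2'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['int', 'str'])

# Convert each column to its target NumPy dtype before building the records
int_col = df['int'].to_numpy().astype(np.int8)
str_col = np.array(df['str'].tolist(), dtype='S2')  # ASCII-encoded bytes

# Field formats are taken from the input arrays: 'i1' and 'S2'
arr = np.rec.fromarrays([int_col, str_col], names=['int', 'str'])
print(arr)
```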