Pandas DataFrame: Replace Characters Conditionally
I have a DataFrame with a column named 'Size'. This column has values containing the sizes of a list of Android applications:
Size
8.7M
68M
2M
I need to replace these values t…
Solution 1:
A general solution that handles multiple unit suffixes:
import pandas as pd

df = pd.DataFrame({'Size': ['8.7M', '68M', '2M']})

#dict of unit suffix -> multiplier
_prefix = {'k': 1e3,  # kilo
           'M': 1e6,  # mega
           'G': 1e9,  # giga
           }
#all keys of dict separated by | (or)
k = '|'.join(_prefix.keys())
#extract the number and the optional suffix into a new df
df1 = df['Size'].str.extract('(?P<a>[0-9.]*)(?P<b>' + k + ')?', expand=True)
#convert numeric column to float
df1['a'] = df1['a'].astype(float)
#map suffixes by dictionary, replace NaN (no suffix) with 1
df1['b'] = df1['b'].map(_prefix).fillna(1)
#multiply columns together, round away float error, then cast to int
df['Size'] = df1['a'].mul(df1['b']).round().astype(int)
print(df)
Size
0 8700000
1 68000000
2 2000000
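To reuse the general approach on other columns, it can be wrapped in a function. This is a minimal sketch, assuming decimal multipliers (1k = 1000, not 1024); the helper name `to_bytes` and the extra sample values ('201k', '1.5G', '512') are hypothetical, chosen so that every branch of the mapping runs, including a value with no suffix.

```python
import pandas as pd

# Multipliers per unit suffix (assumption: decimal units, not 1024-based).
UNITS = {'k': 1e3, 'M': 1e6, 'G': 1e9}


def to_bytes(sizes: pd.Series, units: dict = UNITS) -> pd.Series:
    """Hypothetical helper: convert strings like '8.7M' or '201k' to integers."""
    # Number, then an optional suffix built from the dict keys ('k|M|G').
    pattern = r'(?P<num>[0-9.]+)(?P<unit>' + '|'.join(units) + r')?'
    parts = sizes.str.extract(pattern, expand=True)
    num = parts['num'].astype(float)
    # A missing suffix extracts as NaN, which means a multiplier of 1.
    mult = parts['unit'].map(units).fillna(1)
    # Round before casting so a result like 8699999.999... is not truncated.
    return num.mul(mult).round().astype('int64')


df = pd.DataFrame({'Size': ['8.7M', '68M', '201k', '1.5G', '512']})
df['Size'] = to_bytes(df['Size'])
print(df)
```

Values that do not start with a number (for example free-text placeholders) would extract as NaN and fail the final integer cast, so such rows would need to be filtered or filled first.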
If you only need to replace M, the solution can be simplified:
df['Size'] = df['Size'].str.replace('M', '').astype(float).mul(1e6).astype(int)
print (df)
Size
0 8700000
1 68000000
2 2000000
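The simplified one-liner assumes every value ends in M; a value such as '201k' would make `astype(float)` raise a `ValueError`. If only the M rows should be converted and the others left for separate handling, a boolean mask can restrict the replacement. This is a sketch under that assumption; the sample data, including '201k', is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Size': ['8.7M', '68M', '201k', '2M']})

# Boolean mask: True only for values that end with the 'M' suffix.
is_mega = df['Size'].str.endswith('M')

# Convert only the masked rows; '201k' stays as the original string.
df.loc[is_mega, 'Size'] = (
    df.loc[is_mega, 'Size']
    .str.replace('M', '', regex=False)
    .astype(float)
    .mul(1e6)
    .round()
    .astype(int)
)
print(df)
```

Because the column still mixes numbers and strings, it keeps the `object` dtype until the remaining rows are converted too.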