Getting Unique Values From Pandas Column Of 2d Array Cells
I have a pandas DataFrame where each cell in a column is a 2d array of items. For example, observation 1 has a column items with values ['Baseball', 'Glove', 'Snack']. When I use .unique on the
Solution 1:
I would use the chain function from itertools together with a set to solve the problem, as follows.
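# You have a DataFrame called data with a column named items.
from itertools import chain
# Use data['items'] rather than data.items: the attribute form resolves to the
# DataFrame.items() method, and Series.unique() cannot hash list-valued cells.
set_of_items = set(chain.from_iterable(data['items']))
set_of_items
is what you want.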
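A minimal, self-contained sketch of this approach follows. The sample rows are made up for illustration, and the column name items is taken from the question. It flattens the lists with chain.from_iterable and collects the result in a set.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data; only the column name "items" comes from the question.
data = pd.DataFrame({
    "items": [
        ["Baseball", "Glove", "Snack"],
        ["Glove", "Bat"],
        ["Snack"],
    ]
})

# Bracket access is required here: data.items is the DataFrame.items() method.
# chain.from_iterable flattens the per-row lists; set() removes duplicates.
set_of_items = set(chain.from_iterable(data["items"]))

# Sets are unordered, so sort before printing for a stable display.
print(sorted(set_of_items))
```

A set does not preserve order, so sort it (or use dict.fromkeys over the chained values) if first-seen order matters.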
Solution 2:
You can use np.unique and np.concatenate on the column of interest. I have made an example below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'fruits': (np.array(['banana', 'apple']), np.array(['cherry', 'apple']))})
#             fruits
# 0  [banana, apple]
# 1  [cherry, apple]

# .values accesses the numpy array representation of the column
np.concatenate(df.fruits.values)
# array(['banana', 'apple', 'cherry', 'apple'], dtype='<U6')

# unique items
np.unique(np.concatenate(df.fruits.values))
# array(['apple', 'banana', 'cherry'], dtype='<U6')

# unique items with their counts
np.unique(np.concatenate(df.fruits.values), return_counts=True)
# (array(['apple', 'banana', 'cherry'], dtype='<U6'), array([2, 1, 1]))

subset = df.fruits.dropna()  # get rid of NaNs
subset.loc[subset.map(len) != 0]  # get rid of zero-length arrays
# 0    [banana, apple]
# 1    [cherry, apple]
# Name: fruits, dtype: object

# This works as desired
np.unique(np.concatenate(subset.loc[subset.map(len) != 0].values), return_counts=True)
# (array(['apple', 'banana', 'cherry'], dtype='<U6'), array([2, 1, 1]))
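As an alternative to the approach above (not part of the original answer), Series.explode can do the flattening, and it lets missing and empty cells be dropped in one step. The sketch below assumes a pandas version that provides Series.explode (0.25 or later). The sample rows, including the None and empty-list cells, are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing cell and one empty list.
df = pd.DataFrame({
    "fruits": [
        ["banana", "apple"],
        ["cherry", "apple"],
        None,
        [],
    ]
})

# explode() puts each list element on its own row; missing cells and
# empty lists both become missing values, which dropna() then removes.
flat = df["fruits"].explode().dropna()

unique_fruits = flat.unique()   # unique items, in order of first appearance
counts = flat.value_counts()    # occurrences per item

print(unique_fruits)
print(counts)
```

Unlike np.unique, Series.unique does not sort its result, so wrap it in sorted() if sorted output is needed.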