Getting Unique Values From Pandas Column Of 2d Array Cells

I have a pandas DataFrame where each cell in a column is a 2d array of items. EX: Observation 1 has column items with values ['Baseball', 'Glove','Snack'] When I use .unique on the

Solution 1:

I would use chain from itertools together with a set to solve the problem, as follows.

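# Assumes a DataFrame called data with a column named items.
from itertools import chain
# Bracket access: data.items is the DataFrame.items method, not the column.
# No .unique() here: it raises TypeError on unhashable list cells.
lists_in_items = data['items'].tolist()
set_of_items = set(chain(*lists_in_items))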

set_of_items is what you want.
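For reference, here is a minimal runnable sketch of the same idea. The DataFrame named data and its sample rows are invented for illustration; only the column name items and the first row come from the question.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data; the real DataFrame comes from the question.
data = pd.DataFrame({
    "items": [
        ["Baseball", "Glove", "Snack"],
        ["Glove", "Bat"],
        ["Snack"],
    ]
})

# Bracket access is used because data.items refers to the DataFrame.items method.
# chain.from_iterable flattens the per-row lists; set() removes duplicates.
set_of_items = set(chain.from_iterable(data["items"]))

print(sorted(set_of_items))
```

A set has no defined order, so sorted() is used here only to make the printed result stable.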

Solution 2:

You can use np.unique and np.concatenate on the column of interest. I have made an example below:

import pandas as pd
import numpy as np

df = pd.DataFrame({'fruits': (np.array(['banana', 'apple']), np.array(['cherry', 'apple']))})
#             fruits
# 0  [banana, apple]
# 1  [cherry, apple]
# .values accesses the NumPy array representation of the column
np.concatenate(df.fruits.values)
# array(['banana', 'apple', 'cherry', 'apple'], dtype='<U6')
np.unique(np.concatenate(df.fruits.values))  # unique items
# array(['apple', 'banana', 'cherry'], dtype='<U6')
np.unique(np.concatenate(df.fruits.values), return_counts=True)  # counts
# (array(['apple', 'banana', 'cherry'], dtype='<U6'), array([2, 1, 1]))

subset = df.fruits.dropna()  # getting rid of NaNs
subset.loc[subset.map(len) != 0]  # get rid of zero-length arrays
# 0    [banana, apple]
# 1    [cherry, apple]
# Name: fruits, dtype: object
np.unique(np.concatenate(subset.loc[subset.map(len) != 0].values), return_counts=True)  # this works as desired
# (array(['apple', 'banana', 'cherry'], dtype='<U6'), array([2, 1, 1]))
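The cleanup and counting steps above could be wrapped in one helper. This is only a sketch: the function name unique_with_counts and the sample rows (including the NaN and the empty array) are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def unique_with_counts(column: pd.Series) -> pd.Series:
    """Count each distinct element across array-valued cells (hypothetical helper)."""
    cells = column.dropna()                # drop missing cells (NaN/None)
    cells = cells[cells.map(len) != 0]     # drop zero-length arrays
    if cells.empty:
        # Nothing left to concatenate; return an empty result.
        return pd.Series(dtype="int64")
    values, counts = np.unique(np.concatenate(cells.values), return_counts=True)
    return pd.Series(counts, index=values)


# Hypothetical sample column with a missing cell and an empty array.
df = pd.DataFrame({
    "fruits": [
        np.array(["banana", "apple"]),
        np.nan,
        np.array([], dtype=str),
        np.array(["cherry", "apple"]),
    ]
})

result = unique_with_counts(df["fruits"])
print(result)
```

Returning a Series keeps each item next to its count, so result["apple"] looks up a single count directly.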
