
Python 2-d Array Get The Function As Np.unique Or Union1d

I have 2-D lists/arrays as follows: list1 = [[1,2],[3,4]] and list2 = [[3,4],[5,6]]. How can I use a function like union1d(x,y) to combine list1 and list2 into one list, list3 = [[1,2],[3,4],[

Solution 1:

union1d just does:

unique(np.concatenate((ar1, ar2)))

so if you have a method of finding unique rows, you have the solution.
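As a minimal sketch of that idea, and assuming a NumPy version that supports the axis argument of np.unique (added in NumPy 1.13, so possibly newer than what this answer was written against), the unique-rows step can be done directly. The names stacked and list3 are only illustrative:

```python
import numpy as np

list1 = [[1, 2], [3, 4]]
list2 = [[3, 4], [5, 6]]

# Same recipe as union1d, but keeping rows intact:
# concatenate along axis 0, then take the unique rows.
stacked = np.concatenate((list1, list2))
list3 = np.unique(stacked, axis=0)

print(list3.tolist())  # [[1, 2], [3, 4], [5, 6]]
```

The result comes back sorted row-wise, just as union1d returns sorted values.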

As described in the suggested link, and elsewhere, you can do this by converting the array to a 1d structured array. Here is a simple version.

If arr is:

arr=np.array([[1,2],[3,4],[3,4],[5,6]])

the structured equivalent (a view, same data):

In [4]: arr.view('i,i')
Out[4]: 
array([[(1, 2)],
       [(3, 4)],
       [(3, 4)],
       [(5, 6)]], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])

In [5]: np.unique(arr.view('i,i'))
Out[5]: 
array([(1, 2), (3, 4), (5, 6)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])

and back to 2d int:

In [7]: np.unique(arr.view('i,i')).view('2int')
Out[7]: 
array([[1, 2],
       [3, 4],
       [5, 6]])

This solution does require a certain familiarity with compound dtypes.
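The 'i,i' string above matches only because arr holds 32-bit integers on that machine (the output shows '<i4'). A hedged sketch that builds the compound dtype from the array itself, so it should also work for int64 data and for any number of columns, might look like this. The helper name unique_rows_structured is hypothetical, not a NumPy function:

```python
import numpy as np

def unique_rows_structured(arr):
    """Unique rows of a 2-d array via a 1-d structured view (illustrative helper)."""
    arr = np.ascontiguousarray(arr)  # a view needs contiguous memory
    ncols = arr.shape[1]
    # One field per column, each with the array's own dtype.
    row_dtype = np.dtype([(f"f{i}", arr.dtype) for i in range(ncols)])
    records = arr.view(row_dtype).ravel()  # shape (nrows,)
    uniq = np.unique(records)
    # View back as plain numbers and restore the 2-d shape.
    return uniq.view(arr.dtype).reshape(-1, ncols)

arr = np.array([[1, 2], [3, 4], [3, 4], [5, 6]])
print(unique_rows_structured(arr).tolist())  # [[1, 2], [3, 4], [5, 6]]
```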

Using return_index avoids converting back with a second view. We can index arr directly with the returned indices:

In [54]: idx=np.unique(arr.view('i,i'),return_index=True)[1]

In [55]: arr[idx,:]
Out[55]: 
array([[1, 2],
       [3, 4],
       [5, 6]])

For what it's worth, unique does a sort and then uses a mask approach to remove adjacent duplicates.

It's the sort that requires a 1d array; the rest works in 2d.

Here arr is already sorted. Two adjacent rows differ when any of their columns differ, so the test uses any rather than all:

In [42]: flag=np.concatenate([[True],(arr[1:,:]!=arr[:-1,:]).any(axis=1)])

In [43]: flag
Out[43]: array([ True,  True, False,  True], dtype=bool)

In [44]: arr[flag,:]
Out[44]: 
array([[1, 2],
       [3, 4],
       [5, 6]])

https://stackoverflow.com/a/16971324/901925 shows this working with lexsort.
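For an array that is not already sorted, the sort-then-mask idea can be sketched as follows. This is an illustration under the assumption that the input has at least one row; the helper name unique_rows_lexsort is made up for this example:

```python
import numpy as np

def unique_rows_lexsort(arr):
    """Sort rows lexicographically, then drop adjacent duplicates (illustrative helper)."""
    arr = np.asarray(arr)
    # lexsort treats the LAST key as the primary one, so reverse the
    # columns to sort by column 0 first, then column 1, and so on.
    order = np.lexsort(arr.T[::-1])
    sorted_rows = arr[order]
    # A row is new when any of its columns differs from the previous row.
    differs = (sorted_rows[1:] != sorted_rows[:-1]).any(axis=1)
    keep = np.concatenate(([True], differs))
    return sorted_rows[keep]

rows = np.array([[3, 4], [1, 2], [5, 6], [3, 4], [1, 3]])
print(unique_rows_lexsort(rows).tolist())  # [[1, 2], [1, 3], [3, 4], [5, 6]]
```

Rows [1, 2] and [1, 3] share their first column but are both kept, which is why the comparison reduces with any over the columns.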

================

The mention of np.union1d set me and Divakar to focus on numpy methods. But since the question starts with lists (of lists), it is likely to be faster to use Python set methods.

For example, using list and set comprehensions:

In [99]: [list(x) for x in {tuple(x) for x in list1+list2}]
Out[99]: [[1, 2], [3, 4], [5, 6]]

You could also take the set for each list, and do a set union.

The tuple conversion is needed because a list isn't hashable.
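The per-list set union could be sketched like this. Since sets are unordered, the sketch sorts the result to get a predictable order; that sorting step is an addition here, not something the approach requires:

```python
list1 = [[1, 2], [3, 4]]
list2 = [[3, 4], [5, 6]]

# Convert each row to a hashable tuple, build one set per list, then union them.
rows = {tuple(r) for r in list1} | {tuple(r) for r in list2}

# Back to a list of lists, sorted for a deterministic order.
list3 = sorted(list(r) for r in rows)
print(list3)  # [[1, 2], [3, 4], [5, 6]]
```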

Solution 2:

One approach would be to stack the two input arrays vertically with np.vstack and then find the unique rows in the result. It would be memory intensive, since we build the full stacked array only to discard rows from it afterwards.

Another approach would be to find the rows in the first array that are exclusive to it, i.e. not present in the second array, and then stack just those exclusive rows along with the second array. Of course, this assumes that the rows within each input array are already unique.

The crux of such a memory-saving implementation is getting those exclusive rows from the first array. To do so, we convert each row into a linear index equivalent, treating each row as an indexing tuple on an n-dimensional grid, with n being the number of columns in the input arrays (this requires non-negative integer entries). Thus, assuming the input arrays are arr1 and arr2, we would have an implementation like so -

# Get dim of ndim-grid on which linear index equivalents are to be mapped
dims = np.maximum(arr1.max(0),arr2.max(0)) + 1

# Get linear index equivalents for arr1, arr2
idx1 = np.ravel_multi_index(arr1.T,dims)
idx2 = np.ravel_multi_index(arr2.T,dims)

# Finally get the exclusive rows and stack with arr2 for desired o/p
out = np.vstack((arr1[~np.in1d(idx1,idx2)],arr2))

Sample run -

In [93]: arr1
Out[93]: 
array([[1, 2],
       [3, 4],
       [5, 3]])

In [94]: arr2
Out[94]: 
array([[3, 4],
       [5, 6]])

In [95]: out
Out[95]: 
array([[1, 2],
       [5, 3],
       [3, 4],
       [5, 6]])

For more info on setting up those linear index equivalents, please refer to this post.
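For convenience, the steps above can be wrapped into one self-contained function. This is only a sketch: the name union_rows is hypothetical, the non-negativity check is an added safeguard, and np.isin is used in place of np.in1d on the assumption of a reasonably recent NumPy:

```python
import numpy as np

def union_rows(arr1, arr2):
    """Rows of arr1 that are absent from arr2, stacked on top of arr2 (illustrative helper)."""
    arr1 = np.asarray(arr1)
    arr2 = np.asarray(arr2)
    # ravel_multi_index only accepts non-negative integer coordinates.
    if arr1.min() < 0 or arr2.min() < 0:
        raise ValueError("rows must contain non-negative integers")
    # Grid extents: one more than the largest value seen in each column.
    dims = tuple(int(d) for d in np.maximum(arr1.max(axis=0), arr2.max(axis=0)) + 1)
    # Each row becomes a single linear index on that grid.
    idx1 = np.ravel_multi_index(tuple(arr1.T), dims)
    idx2 = np.ravel_multi_index(tuple(arr2.T), dims)
    exclusive = ~np.isin(idx1, idx2)
    return np.vstack((arr1[exclusive], arr2))

arr1 = np.array([[1, 2], [3, 4], [5, 3]])
arr2 = np.array([[3, 4], [5, 6]])
print(union_rows(arr1, arr2).tolist())  # [[1, 2], [5, 3], [3, 4], [5, 6]]
```

As in the sample run, the output is not sorted, and duplicates inside a single input array are not removed.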
