Memory Growth With Broadcast Operations In Numpy
Solution 1:
@rth's suggestion to do the operation in smaller batches is a good one. You could also try using the function np.subtract and give it a destination array, to avoid creating an additional temporary array. I also think you don't need to index c as c[np.newaxis, :, :], because it is already a 3-d array.
So instead of
a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :]  # memory explodes here
try
np.subtract(b[:, :, np.newaxis], c, a)
The third argument of np.subtract is the destination array.
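A minimal sketch of this approach, using small stand-in arrays (the shapes here are illustrative, not the question's):

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random((1000, 192))
c = rng.random((192, 32))

# Preallocate the output once, then write the broadcast difference into it.
a = np.empty((1000, 192, 32))
np.subtract(b[:, :, np.newaxis], c, a)  # no large temporary for the result
```

Passing the destination positionally, as here, is the same as passing it via the out keyword of the ufunc.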
Solution 2:
Well, your array a already takes 1192953 * 192 * 32 * 8 bytes / 1e9 = 58 GB of memory.
The broadcasting does not make additional memory allocations for the initial arrays, but the result of
b[:, :, np.newaxis] - c[np.newaxis, :, :]
is still saved in a temporary array. Therefore at this line you have allocated at least 2 arrays with the shape of a, for a total memory use of more than 116 GB.
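The arithmetic checks out; a short sketch using the shape from the question (float64 elements are 8 bytes each):

```python
import math

# Shape of a from the question; float64 elements are 8 bytes each.
shape = (1192953, 192, 32)
gigabytes = math.prod(shape) * 8 / 1e9
# One such array is about 58.6 GB; two of them (result plus the
# broadcast temporary) push the total past 116 GB.
```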
You can avoid this issue by operating on a smaller subset of your array at a time,
CHUNK_SIZE = 100000
# step by CHUNK_SIZE so the final, shorter chunk is also covered
for start in range(0, b.shape[0], CHUNK_SIZE):
    sl = slice(start, start + CHUNK_SIZE)
    a[sl] = b[sl, :, np.newaxis] - c[np.newaxis, :, :]
this will be marginally slower, but uses much less memory.
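Down-scaled, the chunked loop can be checked against the one-shot broadcast; the shapes below are illustrative, not the question's:

```python
import numpy as np

CHUNK_SIZE = 100            # small chunk for the demonstration
rng = np.random.default_rng(0)
b = rng.random((1050, 8))   # deliberately not a multiple of CHUNK_SIZE
c = rng.random((8, 4))
a = np.empty((1050, 8, 4))

# range() with a step visits every chunk, including the final partial one;
# slicing past the end of b is safe and simply yields the remaining rows.
for start in range(0, b.shape[0], CHUNK_SIZE):
    sl = slice(start, start + CHUNK_SIZE)
    a[sl] = b[sl, :, np.newaxis] - c[np.newaxis, :, :]
```

Each iteration's temporary is only CHUNK_SIZE rows tall, so peak memory stays close to the size of a itself.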