
Creating Large Lmdbs For Caffe With Numpy Arrays

I have two 60 x 80921 matrices, one filled with data, one with reference. I would like to store the values as key/value pairs in two different LMDBs, one for training (say I'll sli

Solution 1:

It's not 100% clear what you are trying to do: are you treating each entry as a separate data sample, or are you trying to train on 60K 1-D vectors of dim=60?

Assuming you have 60K training samples of dim 60 (one sample per column of X_train and Y_train), you can write the training LMDBs like this:

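```python
import lmdb
import caffe

# X_train, Y_train: 60 x N_train matrices, one sample per column.
# A float32 value costs ~5 bytes in a serialized Datum, so 10x nbytes leaves ample headroom.
map_size = X_train.nbytes * 10

env_x = lmdb.open('sensormatrix_train_x_lmdb', map_size=map_size)
env_y = lmdb.open('sensormatrix_train_y_lmdb', map_size=map_size)
with env_x.begin(write=True) as txn_x, env_y.begin(write=True) as txn_y:
    for i in range(X_train.shape[1]):
        x = X_train[:, i]
        y = Y_train[:, i]
        # array_to_datum needs a 3-D (channels, height, width) array;
        # label=i stores the sample index, used by the sanity check below
        datum_x = caffe.io.array_to_datum(arr=x.reshape((60, 1, 1)), label=i)
        datum_y = caffe.io.array_to_datum(arr=y.reshape((60, 1, 1)), label=i)
        keystr = '{:0>10d}'.format(i)  # zero-padded lmdb key for this entry
        # actual write to lmdb; in Python 3, keys must be bytes
        txn_x.put(keystr.encode('ascii'), datum_x.SerializeToString())
        txn_y.put(keystr.encode('ascii'), datum_y.SerializeToString())
env_x.close()
env_y.close()
```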
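
The keys are zero-padded because LMDB sorts keys by raw byte comparison and a "Data" layer reads them in that order. A minimal pure-Python sketch (no caffe or lmdb needed; `make_key` and `unpadded_key` are hypothetical helpers) that sorts byte keys the way LMDB's default comparator does:

```python
def make_key(i, width=10):
    """Zero-padded key, same format as '{:0>10d}'.format(i) in the answer."""
    return '{:0>{w}d}'.format(i, w=width).encode('ascii')

def unpadded_key(i):
    """Naive key without padding (hypothetical, for comparison only)."""
    return str(i).encode('ascii')

indices = [0, 1, 2, 9, 10, 11, 100]

# Sorting bytes in Python is lexicographic, like LMDB's default key order
padded_order = [int(k) for k in sorted(make_key(i) for i in indices)]
naive_order = [int(k) for k in sorted(unpadded_key(i) for i in indices)]

print(padded_order)  # [0, 1, 2, 9, 10, 11, 100]  numeric order preserved
print(naive_order)   # [0, 1, 10, 100, 11, 2, 9]  '10' and '100' sort before '2'
```

With padding, the read order matches the column order of X_train, which keeps the two LMDBs in step.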

Now you have two LMDBs for training. In your 'prototxt' you need two corresponding "Data" layers, one per LMDB; each outputs the stored vector and its index label:

```protobuf
layer {
  name: "input_x"
  type: "Data"
  top: "x"
  top: "idx_x"
  data_param {
    source: "sensormatrix_train_x_lmdb"
    backend: LMDB  # the default backend is LEVELDB
    batch_size: 32
  }
  include { phase: TRAIN }
}
layer {
  name: "input_y"
  type: "Data"
  top: "y"
  top: "idx_y"
  data_param {
    source: "sensormatrix_train_y_lmdb"
    backend: LMDB
    batch_size: 32
  }
  include { phase: TRAIN }
}
```
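
Pairing works because both layers walk identical keys in the same order with the same batch_size. The toy model below is an assumption-laden sketch, not Caffe's actual reader: `data_layer_batches` and its `skip` argument are hypothetical, and it only mimics sequential reading with wrap-around at the end of the database.

```python
import itertools

def data_layer_batches(keys, batch_size, skip=0):
    """Toy model of a Caffe "Data" layer reading an LMDB: walk the keys in sorted
    order, wrap around at the end of the database, and emit fixed-size batches.
    `skip` (hypothetical) mimics an initial offset such as rand_skip can introduce."""
    cursor = itertools.cycle(sorted(keys))
    for _ in range(skip):
        next(cursor)
    while True:
        yield [next(cursor) for _ in range(batch_size)]

# Both LMDBs were written with the same zero-padded keys 0..N-1.
n_samples = 100  # small stand-in; deliberately not a multiple of 32
keys = ['{:0>10d}'.format(i) for i in range(n_samples)]

# Same keys + same batch_size: batch k of input_x pairs with batch k of input_y,
# even after both readers wrap around to the first key.
reader_x = data_layer_batches(keys, batch_size=32)
reader_y = data_layer_batches(keys, batch_size=32)
aligned = all(next(reader_x) == next(reader_y) for _ in range(10))

# If one stream starts at a different offset, every pair is shifted.
reader_x = data_layer_batches(keys, batch_size=32)
reader_y = data_layer_batches(keys, batch_size=32, skip=5)
shifted = next(reader_x) != next(reader_y)

print(aligned, shifted)  # True True
```

By the same reasoning, leaving data_param.rand_skip unset on both layers seems prudent, since a random initial skip would offset one stream relative to the other.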

To make sure you read corresponding x/y pairs, you can add a sanity check that compares the two index labels (its value should stay at zero):

```protobuf
layer {
  name: "sanity"
  type: "EuclideanLoss"
  bottom: "idx_x"
  bottom: "idx_y"
  top: "sanity"
  loss_weight: 0
  propagate_down: false
  propagate_down: false
}
```
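
Caffe documents EuclideanLoss as E = 1/(2N) Σ ||a_n − b_n||², so with the index labels as inputs it is exactly zero when every pair matches and positive otherwise. A small NumPy sketch of that formula (`euclidean_loss` is a hypothetical re-implementation for illustration, not Caffe's code):

```python
import numpy as np

def euclidean_loss(a, b):
    """Formula Caffe documents for EuclideanLoss: sum of squared differences / (2 * N)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    n = a.shape[0]
    return float(np.sum((a - b) ** 2) / (2.0 * n))

idx_x = np.arange(32)          # index labels from input_x for one batch of 32
idx_y = np.arange(32)          # index labels from input_y, correctly paired
print(euclidean_loss(idx_x, idx_y))          # 0.0

idx_y_shifted = np.arange(5, 37)  # input_y stream off by 5 samples
print(euclidean_loss(idx_x, idx_y_shifted))  # 12.5 = 32 * 25 / (2 * 32)
```

Since Caffe holds labels in float blobs, integer indices up to 2^24 are represented exactly, which comfortably covers 60K samples, so any nonzero "sanity" value in the training log points to a real misalignment.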
