Is There Any Way To Save And Read Multi-dimension Data With Efficiency?
Solution 1:
I think you can use a MultiIndex or a Panel, and then, if necessary, save the data to HDF5.
Also, the concat function has a keys parameter which creates a MultiIndex from a list of DataFrames.
Sample:
df1 = pd.DataFrame({'A':[1,2,3],
                    'B':[4,5,6],
                    'C':[7,8,9],
                    'D':[1,3,5]})
print (df1)
   A  B  C  D
0  1  4  7  1
1  2  5  8  3
2  3  6  9  5
df2 = df1 * 10
dfs = [df1, df2]
df3 = pd.concat(dfs, keys=['a','b'])
print (df3)
      A   B   C   D
a 0   1   4   7   1
  1   2   5   8   3
  2   3   6   9   5
b 0  10  40  70  10
  1  20  50  80  30
  2  30  60  90  50

print (df3.index)
MultiIndex(levels=[['a', 'b'], [0, 1, 2]],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
wp = pd.Panel({'a' : df1, 'b' : df2})
print (wp)
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 3 (major_axis) x 4 (minor_axis)
Items axis: a to b
Major_axis axis: 0 to 2
Minor_axis axis: A to D
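For the "save to HDF5" part of this answer, a minimal sketch could look like the following; the file name 'data.h5' and key 'stations' are placeholders, and pandas' to_hdf / read_hdf require the PyTables package to be installed:

import pandas as pd

df1 = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9], 'D':[1,3,5]})
df3 = pd.concat([df1, df1 * 10], keys=['a','b'])

# write the MultiIndex DataFrame to an HDF5 file (needs PyTables)
df3.to_hdf('data.h5', key='stations', mode='w', format='table')

# read it back later; the MultiIndex is preserved
df4 = pd.read_hdf('data.h5', key='stations')
print (df4)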
Solution 2:
You may want to use HDF, which has been specifically designed to handle huge arrays of multidimensional data.
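One hedged sketch of this with pandas' HDFStore, assuming data arrives as one hourly batch of 1000 station rows at a time (the file name, key, and column names below are made up for illustration):

import pandas as pd
import numpy as np

# a hypothetical hourly batch: one row per station with four features
batch = pd.DataFrame(np.random.rand(1000, 4),
                     columns=['f1', 'f2', 'f3', 'f4'])
batch['station'] = range(1000)
batch['timestamp'] = pd.Timestamp('2017-01-01 00:00')

# append each hourly batch to the same on-disk table
with pd.HDFStore('measurements.h5') as store:
    store.append('readings', batch, data_columns=['station', 'timestamp'])

# later, query only the slice you need instead of loading everything
with pd.HDFStore('measurements.h5') as store:
    subset = store.select('readings', where='station < 10')

Because the appended table is queryable on the indexed data_columns, you can keep adding hours of data without ever reading the whole file back into memory.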
Solution 3:
The simplest answer may be just to create a sqlite3 database.
It sounds like you have 6 pieces of data per station per hour (station, timestamp, feature1..feature4), times 1000 stations, times however many hours.
So that's 6,000 data items per hour (at, say, 4 bytes each, about 24 KB), times 24 hours/day times 365 days/year (8,760 hours), or roughly 200 MB per year. Depending on how far back you're going, that's not too bad for a database file. (If you're going to cover more than 10 years, then yes, go to something bigger, or maybe compress the data or break it up by year.)
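A minimal sketch of that approach with the standard-library sqlite3 module, assuming a single table with station, timestamp, and four feature columns (the table, file, and column names are placeholders):

import sqlite3
import pandas as pd

# hypothetical schema: one row per station per hour
conn = sqlite3.connect('measurements.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        station   INTEGER,
        timestamp TEXT,
        f1 REAL, f2 REAL, f3 REAL, f4 REAL
    )
""")

# insert one hour's worth of data for all 1000 stations in one transaction
rows = [(station, '2017-01-01 00:00', 0.1, 0.2, 0.3, 0.4)
        for station in range(1000)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# read back a slice, e.g. one station's full history, straight into pandas
df = pd.read_sql_query(
    "SELECT * FROM readings WHERE station = 42 ORDER BY timestamp", conn)
conn.close()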