
Is There Any Way To Save And Read Multi-dimension Data With Efficiency?

Introduction

I have a bunch of data series from 1000 stations, and each station has 4 features (e.g. temperature, wind, CO2 concentration, solar radiation). All the features ...

Solution 1:

I think you can use a MultiIndex or a Panel, and then, if necessary, save the data to HDF5.

The concat function also has a keys parameter, which creates a MultiIndex from a list of DataFrames.

Sample:

df1 = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5]})

print (df1)
   A  B  C  D
0  1  4  7  1
1  2  5  8  3
2  3  6  9  5

df2 = df1 * 10

dfs = [df1, df2]

df3 = pd.concat(dfs, keys=['a','b'])
print (df3)
      A   B   C   D
a 0   1   4   7   1
  1   2   5   8   3
  2   3   6   9   5
b 0  10  40  70  10
  1  20  50  80  30
  2  30  60  90  50

print (df3.index)
MultiIndex(levels=[['a', 'b'], [0, 1, 2]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])

wp = pd.Panel({'a' : df1, 'b' : df2})
print (wp)
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 3 (major_axis) x 4 (minor_axis)
Items axis: a to b
Major_axis axis: 0 to 2
Minor_axis axis: A to D
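
If you then want to persist the MultiIndex frame, a minimal sketch with pandas' to_hdf/read_hdf could look like this (assuming the PyTables package is installed; the file name data.h5 and the key 'stations' are just placeholders):

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3],
                    'B': [4, 5, 6],
                    'C': [7, 8, 9],
                    'D': [1, 3, 5]})
df2 = df1 * 10

# build the MultiIndex frame as above
df3 = pd.concat([df1, df2], keys=['a', 'b'])

# write to HDF5 (requires PyTables); 'stations' is an arbitrary key
df3.to_hdf('data.h5', key='stations', mode='w')

# read it back; the MultiIndex is preserved
restored = pd.read_hdf('data.h5', key='stations')
print (restored.loc['a'])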

Solution 2:

You may want to use HDF, which has been specifically designed to handle huge arrays of multidimensional data.
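
For example, the whole dataset could be stored as one 3-D array (stations x hours x features) with the h5py library. This is only a sketch under assumed file and dataset names and shapes:

import numpy as np
import h5py

n_stations, n_hours, n_features = 1000, 24 * 365, 4

# random data standing in for temperature, wind, CO2, solar radiation
data = np.random.rand(n_stations, n_hours, n_features).astype('float32')

# write one chunked, compressed 3-D dataset
with h5py.File('stations.h5', 'w') as f:
    f.create_dataset('measurements', data=data,
                     chunks=(1, n_hours, n_features), compression='gzip')

# read back only what is needed, e.g. all features for station 42
with h5py.File('stations.h5', 'r') as f:
    station_42 = f['measurements'][42, :, :]
print (station_42.shape)   # (8760, 4)

Chunking by station means that reading one station's full series touches only that station's chunks rather than the whole file.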

Solution 3:

The simplest answer may be just to create a sqlite3 database.

It sounds like you have 6 pieces of data per hour (station, timestamp, feature1..feature4) times 1000 stations, times however-many hours.

So that's 6,000 data items per hour (at, say, 4 bytes each, about 24 KB), times 24 hours/day times 365 days/year (8,760 hours), or roughly 200 MB per year. Depending on how far back you're going, that's not too bad for a db file. (If you're going to store more than 10 years, then yes, go to something bigger, or compress the data, or break it up by year.)
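
A minimal sketch of such a database with Python's built-in sqlite3 module (the file, table, and column names are only illustrative) might be:

import sqlite3

conn = sqlite3.connect('stations.db')

# one row per station per hour, with the four features as columns
conn.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        station_id  INTEGER,
        timestamp   TEXT,
        temperature REAL,
        wind        REAL,
        co2         REAL,
        solar       REAL,
        PRIMARY KEY (station_id, timestamp)
    )
""")

# insert one hourly reading
conn.execute("INSERT OR REPLACE INTO readings VALUES (?, ?, ?, ?, ?, ?)",
             (42, '2017-01-01 00:00', 21.5, 3.2, 410.0, 0.0))
conn.commit()

# read back everything for one station
rows = conn.execute(
    "SELECT * FROM readings WHERE station_id = ?", (42,)).fetchall()
print (rows)
conn.close()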
