Showing posts with the label Bigdata

Python - Parsing A Text Onto Columns By The Position Of Each Item

The Bovespa (Brazilian stock exchange) offers a file with all the quotes in a timeframe. The file is…

Incremental PCA On Big Data

I just tried using the IncrementalPCA from sklearn.decomposition, but it threw a MemoryError just l…

Correct Way Of Writing Two Floats Into A Regular Txt

I am running a big job in cluster mode. However, I am only interested in two float numbers, which…

Get A List Of Subdirectories

I know I can do this: data = sc.textFile('/hadoop_foo/a') data.count() 240 data = sc.textFi…

How Can I Reduce A Key Value Pair To Key And List Of Values?

Let us assume I have a key-value pair in Spark, such as the following. [ (Key1, Value1), (Key1, Va…

NumPy: 3-byte, 6-byte Types (aka Uint24, Uint48)

NumPy seems to lack built-in support for 3-byte and 6-byte types, aka uint24 and uint48. I have a l…