How To Share Pandas Dataframe Object Between Processes?
This question makes the same point as one I linked before (Is there a good way to avoid memory deep copy or to reduce time spent in multiprocessing?), but I'm getting nowhere.
Solution 1:
You can use a Namespace from a Manager; the following code works as you would expect.
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
from multiprocessing import Process, Manager

def add_new_derived_column(ns):
    # pull a local copy of the DataFrame from the namespace proxy
    dataframe2 = ns.df
    # note: operator precedence means this computes A + (B / 2)
    dataframe2['new_column'] = dataframe2['A'] + dataframe2['B'] / 2
    print(dataframe2.head())
    # reassign so the change propagates back through the proxy
    ns.df = dataframe2

if __name__ == "__main__":
    mgr = Manager()
    ns = mgr.Namespace()
    dataframe = pd.DataFrame(np.random.randn(100000, 2), columns=['A', 'B'])
    ns.df = dataframe
    print(dataframe.head())
    # pass the shared namespace (ns) to the Process object
    process = Process(target=add_new_derived_column, args=(ns,))
    process.start()
    process.join()
    print(ns.df.head())
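A caveat worth knowing: a Manager Namespace is not true shared memory. The DataFrame is pickled when you assign it to ns.df and pickled again every time a process reads it, which is why the function must reassign ns.df at the end, and why every access transfers a full copy through the manager process. If the goal is to actually avoid copying, Python 3.8+ ships multiprocessing.shared_memory, which lets both processes map the same buffer. The following is a minimal sketch of that approach, not the answer's method: it assumes a purely numeric float64 frame, reuses the column names and the A + (B / 2) formula from the code above, and rebuilds the DataFrame as a zero-copy view over the shared NumPy array on each side.

import numpy as np
import pandas as pd
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def add_new_derived_column(shm_name, shape, dtype):
    # attach to the existing block by name; nothing is copied
    shm = SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    # same formula as above, written straight into the shared buffer
    arr[:, 2] = arr[:, 0] + arr[:, 1] / 2
    shm.close()

if __name__ == "__main__":
    rows = 100000
    # three float64 columns: A, B, and space for new_column
    shm = SharedMemory(create=True, size=rows * 3 * 8)
    arr = np.ndarray((rows, 3), dtype=np.float64, buffer=shm.buf)
    arr[:, :2] = np.random.randn(rows, 2)   # fill A and B once
    process = Process(target=add_new_derived_column,
                      args=(shm.name, (rows, 3), np.float64))
    process.start()
    process.join()
    # a zero-copy DataFrame view over the shared array
    print(pd.DataFrame(arr, columns=['A', 'B', 'new_column'], copy=False).head())
    shm.close()
    shm.unlink()

The trade-off is that the shared block is one homogeneous array, so this works cleanly only when every column has the same dtype; a mixed-dtype frame would need one shared block per dtype, or a different tool altogether.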