What Is The Most Efficient Way With Python To Merge Rows In A Csv Which Have A Single Duplicate Field?

I have found somewhat similar questions; however, the answers that I think could work are too complex for me to morph into what I need. I could use some help figuring out how to acco

Solution 1:

Now's as good a time as any to learn about itertools.groupby:

import csv
from itertools import groupby

# assuming Python 2
with open("source.csv", "rb") as fp_in, open("final.csv", "wb") as fp_out:
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out)
    grouped = groupby(reader, lambda x: x[0])
    for key, group in grouped:
        rows = list(group)
        rows = [rows[0], rows[-1]]
        columns = zip(*(r[1:] for r in rows))
        use_values = [max(c) for c in columns]
        new_row = [key] + use_values
        writer.writerow(new_row)

produces

$ cat final.csv 
wp.xyz03.def02.01195.1,wp03.xyz03-c01_lc08_m00,wp02.def02-c02_lc14_m00
wp.atl21.lmn01.01193.2,wp03.atl21-c06_lc14_m00,wp03.lmn01
tp.ghi03.ghi05.02194.65,tp05.ghi03-c06_lc11_m00,tp05.ghi05:1
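
For readers on Python 3, here is a minimal sketch of the same idea. It assumes, as the code above does, that rows sharing a key sit next to each other in the file. The function names merge_rows and merge_csv and the sample rows are made up for illustration and are not from the question.

```python
import csv
from itertools import groupby


def merge_rows(rows):
    """Yield one merged row per run of consecutive rows sharing row[0]."""
    for key, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        # Same rule as above: compare only the first and last row of the run.
        first, last = group[0], group[-1]
        # max() of two strings picks the non-empty one when the other is "".
        merged = [max(pair) for pair in zip(first[1:], last[1:])]
        yield [key] + merged


def merge_csv(source_path, final_path):
    # Python 3: text mode with newline="" replaces the "rb"/"wb" modes.
    with open(source_path, newline="") as fp_in, \
            open(final_path, "w", newline="") as fp_out:
        csv.writer(fp_out).writerows(merge_rows(csv.reader(fp_in)))


# Hypothetical sample data, not taken from the question.
sample = [
    ["k1", "a", ""],
    ["k1", "", "b"],
    ["k2", "c", "d"],
]
print(list(merge_rows(sample)))
```

groupby only merges adjacent rows, so if rows with the same key may be scattered through the file, sorting them first with sorted(rows, key=lambda row: row[0]) is one option, at the cost of changing the output order.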

Solution 2:

If I understand what you want to do, have some pseudocode:

Read each line:
Split by comma
Add each section to a large list

Next

Until list is empty:

Foreach value in the list:
Write value to file, then write a comma
Search a list, and remove duplicate values

Does that seem like it? I can write you a Python program if this is what you're intending.

Edit:

I wrote a program. As far as I can see, it turns the example inputs you gave me into the example outputs.

FileInput = open("Input.txt")  # Open the input file
EntireFile = FileInput.read()  # Read to the end of the file
FileInput.close()

# Turn line breaks into commas so fields on adjacent lines stay separate
EntireFile = EntireFile.replace("\r", "").replace("\n", ",")

# Split into a list
SplittedByComma = EntireFile.split(",")

# Keep the first occurrence of each value; skip empty fields and later duplicates
Unique = []
for X in SplittedByComma:
    if X != "" and X not in Unique:
        Unique.append(X)

FileOutput = open("Output.txt", "w")  # The output file

# Joining with commas means the output doesn't end on a comma
FileOutput.write(",".join(Unique))
FileOutput.close()
# Close the file so it saves
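
Since the question asks about efficiency, the duplicate-removal step can also be sketched with a dict, which avoids rescanning the list for every value. This is only an illustration: the helper name unique_in_order and the sample values are made up, and it relies on dicts keeping insertion order (guaranteed from Python 3.7).

```python
def unique_in_order(values):
    # dict keys keep insertion order, so this keeps the first
    # occurrence of each value and drops later duplicates.
    return list(dict.fromkeys(values))


# Hypothetical sample values, not taken from the question.
fields = ["k1", "a", "k1", "b", "a"]
print(",".join(unique_in_order(fields)))  # prints "k1,a,b"
```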

Feel free to ask if you have any questions
