What Is The Most Efficient Way With Python To Merge Rows In A Csv Which Have A Single Duplicate Field?
I have found somewhat similar questions however the answers that I think could work are too complex for me to morph into what I need. I could use some help figuring out how to acco
Solution 1:
Now's as good a time as any to learn about itertools.groupby:
import csv
from itertools import groupby

# assuming Python 2
with open("source.csv", "rb") as fp_in, open("final.csv", "wb") as fp_out:
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out)
    grouped = groupby(reader, lambda x: x[0])
    for key, group in grouped:
        rows = list(group)
        rows = [rows[0], rows[-1]]
        columns = zip(*(r[1:] for r in rows))
        use_values = [max(c) for c in columns]
        new_row = [key] + use_values
        writer.writerow(new_row)
produces
$ cat final.csv
wp.xyz03.def02.01195.1,wp03.xyz03-c01_lc08_m00,wp02.def02-c02_lc14_m00
wp.atl21.lmn01.01193.2,wp03.atl21-c06_lc14_m00,wp03.lmn01
tp.ghi03.ghi05.02194.65,tp05.ghi03-c06_lc11_m00,tp05.ghi05:1
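For readers on Python 3, the same idea can be sketched as below. This is only a sketch under stated assumptions: the function names merge_rows and merge_csv and the sample rows are made up for illustration, and the sample is not the asker's data. Like the code above, it compares only the first and last row of each group, and it relies on groupby grouping only rows that are next to each other, so duplicates that are not adjacent stay unmerged unless the input is sorted by the first field beforehand.

```python
import csv
from itertools import groupby


def merge_rows(rows):
    """Merge consecutive rows that share the same first field."""
    merged = []
    for key, group in groupby(rows, key=lambda row: row[0]):
        group_rows = list(group)
        # As in the answer above, only the first and last row of a group are used.
        pair = [group_rows[0], group_rows[-1]]
        # zip(*...) pairs the values up column by column; max() on a pair of
        # strings picks a non-empty value over an empty one.
        columns = zip(*(row[1:] for row in pair))
        merged.append([key] + [max(column) for column in columns])
    return merged


def merge_csv(source_path, target_path):
    """Read source_path, merge its rows, and write the result to target_path."""
    # In Python 3 the csv module expects text-mode files opened with newline="".
    with open(source_path, newline="") as fp_in, \
            open(target_path, "w", newline="") as fp_out:
        csv.writer(fp_out).writerows(merge_rows(csv.reader(fp_in)))


# Hypothetical sample rows (not the asker's data).
sample = [
    ["k1", "a", ""],
    ["k1", "", "b"],
    ["k2", "c", "d"],
]
print(merge_rows(sample))  # [['k1', 'a', 'b'], ['k2', 'c', 'd']]
```

Calling merge_csv("source.csv", "final.csv") would apply the same merge to the files named in the answer.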
Solution 2:
If I understand what you want to do, have some pseudocode:
Read each line:
    Split by comma
    Add each section to a large list
Next
Until list is empty:
    Foreach value in the list:
        Write value to file, then write a comma
        Search a list, and remove duplicate values
Does that seem like it? I can write you a Python program if this is what you intend.
Edit:
I wrote a program. As far as I can see, it turns the example inputs you gave me into the example outputs:
FileInput = open("Input.txt")  # Open an input file
EntireFile = FileInput.read()  # Read to the end of the file
FileInput.close()
# Remove newline characters (fields on either side of a line break are
# joined together unless the line ends with a comma)
EntireFile = EntireFile.replace("\n", "").replace("\r", "")
# Split into a list
SplittedByComma = EntireFile.split(",")
FileOutput = open("Output.txt", "w")  # The output file
# Go through the list and keep only the first occurrence of each value.
# (Popping from the list while looping over it also removed values that
# had no duplicate, so a second list is built instead.)
UniqueValues = []
for X in SplittedByComma:
    if X not in UniqueValues:
        UniqueValues.append(X)
Output = ""  # This will eventually get written to the file
for X in UniqueValues:
    Output += X + ","
# Write output, but don't write the last character (so it doesn't end on a comma)
FileOutput.write(Output[:-1])
FileOutput.close()  # Close the file so it saves
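The duplicate-removal step can also be written more compactly. The following is a minimal sketch, assuming Python 3.7 or later (where dict keys keep insertion order); the function name unique_fields and the sample string are hypothetical and only there for illustration.

```python
def unique_fields(text):
    """Return the comma-separated fields of text, keeping first occurrences only."""
    # Same preprocessing as the program above: drop line breaks, split on commas.
    fields = text.replace("\r", "").replace("\n", "").split(",")
    # dict keys are unique and remember insertion order, so this removes
    # later duplicates while preserving the original order.
    return list(dict.fromkeys(fields))


# Hypothetical input; the first line ends with a comma so fields do not fuse.
sample = "a,b,c,\nb,d,a"
print(",".join(unique_fields(sample)))  # a,b,c,d
```

This avoids the repeated list scans of the loop-based version, which may matter for large files.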
Feel free to ask if you have any questions