
Python3 - Getting The Sum Of A Particular Row From All The Files

I have many files in my directory in the following format:

name,sex,count
xyz,M,231
abc,F,654
...

I am trying to get the sum of count (3rd column) for all files and store them in a list…

Solution 1:

Here's code that works with the absolute bare minimum of modifications (that is, no style fixes were made):

import os

# direc is the directory that holds your .txt files
total = []
for filename in os.listdir(direc):
    result = 0
    if filename.endswith('.txt'):
        file = open(direc + '/' + filename, 'r')
        for line in file:
            line = line.strip()
            try:
                name, sex, count = line.split(',')
            except ValueError:
                continue
            if sex == 'F':
                result += int(count)
        total.append(result)

The following had to be fixed:

  1. The result variable was set to zero only once, not once per file, so each file's counts were added on top of the previous file's total. As I understand it, you want each file's result added to the total list, so I moved this line inside the loop so that result is reset for every file.
  2. The line name, sex, count = line.split(',') is very fragile: whenever a line does not contain exactly two commas (for example, an empty line at the end of a file), the unpacking raises a ValueError. I wrapped it in a try…except block that catches this error and moves on to the next line.
  3. The result was appended to the total list on every line read, instead of once per file.

If I misinterpreted your intentions and you just wanted to keep a running total in the total variable for reference, you only need to make modification #2.
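To make the difference between the two interpretations concrete, here is a minimal sketch that wraps the fixed logic in a function and returns both the per-file totals and the running (cumulative) totals. The function name sum_female_counts and the sample files are hypothetical, made up only for illustration; the files are sorted so the order is predictable.

```python
import os
import tempfile

def sum_female_counts(direc):
    """Return (per_file, running) lists of 'F' counts for each .txt file in direc.

    sum_female_counts is a hypothetical helper used only for illustration.
    """
    per_file = []      # one total per file (result is reset for every file)
    running = []       # cumulative total across files (never reset)
    running_total = 0
    for filename in sorted(os.listdir(direc)):  # sorted for a stable order
        if not filename.endswith('.txt'):
            continue
        result = 0
        with open(os.path.join(direc, filename)) as f:
            for line in f:
                try:
                    name, sex, count = line.strip().split(',')
                except ValueError:
                    continue  # blank or malformed line
                if sex == 'F':
                    result += int(count)
        per_file.append(result)
        running_total += result
        running.append(running_total)
    return per_file, running

# Demo with two small made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        'a.txt': 'name,sex,count\nxyz,M,231\nabc,F,654\n',
        'b.txt': 'name,sex,count\ndef,F,100\nghi,F,50\n\n',
    }
    for fname, text in samples.items():
        with open(os.path.join(tmp, fname), 'w') as f:
            f.write(text)
    per_file, running = sum_female_counts(tmp)

print(per_file)  # [654, 150]
print(running)   # [654, 804]
```

The per_file list corresponds to the fixed code above; running is what you get if you only apply modification #2.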


Solution 2:

First, you should always make sure you close your files (this is not related to the problem, it's a matter of good style); a with block does that for you automatically. Also, you should use os.path.join to build paths, because it is more portable (same remark). Here is what that gives:

import os

total = []
result = 0
for filename in os.listdir(direc):
    if filename.endswith('.txt'):
        with open(os.path.join(direc, filename), 'r') as file:
            for line in file:
                line = line.strip()
                name, sex, count = line.split(',')
                if sex == 'F':
                    result += int(count)
                    total.append(result)
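To illustrate the two style points above, here is a minimal sketch: os.path.join builds the path without you writing the separator yourself, and leaving the with block closes the file even though close() is never called. The file name sample.txt and its contents are made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as direc:
    # os.path.join inserts the correct separator for the current OS,
    # and does not double it if direc already ends with one.
    path = os.path.join(direc, 'sample.txt')
    with open(path, 'w') as f:
        f.write('name,sex,count\nabc,F,654\n')

    with open(path) as file:
        header = file.readline().strip()
    # Leaving the with block closed the file automatically.
    was_closed = file.closed

print(header)      # name,sex,count
print(was_closed)  # True
```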

Now, back to the problem: you are not selecting the third row, but every row whose second column is F.

Here is how you would do it to get the third line:

import os

total = []
result = 0
for filename in os.listdir(direc):
    if filename.endswith('.txt'):
        with open(os.path.join(direc, filename), 'r') as file:
            file.readline()  # Read (and discard) the first line
            file.readline()  # Read (and discard) the second line
            line = file.readline()  # Read the third line
            line = line.strip()
            name, sex, count = line.split(',')
            result += int(count)
            total.append(result)
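Rather than calling readline() three times, itertools.islice can jump straight to the third line and also copes with files that are too short. This is a minimal sketch, not the answer's own code; third_line_count is a hypothetical helper, and io.StringIO stands in for a file opened with open().

```python
import io
from itertools import islice

def third_line_count(file):
    """Return the count column of the third line of an open file, or None if missing.

    third_line_count is a hypothetical helper used only for illustration.
    """
    # islice(file, 2, 3) skips lines 0 and 1 and yields only line 2 (the third line).
    third = next(islice(file, 2, 3), None)
    if third is None:
        return None  # the file has fewer than three lines
    name, sex, count = third.strip().split(',')
    return int(count)

# io.StringIO stands in for a real file opened with open().
sample = io.StringIO('name,sex,count\nxyz,M,231\nabc,F,654\n')
print(third_line_count(sample))  # 654
```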

Solution 3:

You should skip the header row of the text files so you aren't trying to parse "count" as an int.

Also, I'm guessing you only want result appended once per file, after the inner loop?

import os

total = []
result = 0
for filename in os.listdir(direc):
    if filename.endswith('.txt'):
        with open(direc + '/' + filename, 'r') as file:
            next(file, None)  # skip the header row (does nothing if the file is empty)
            for line in file:
                line = line.strip()
                name, sex, count = line.split(',')
                if sex == 'F':
                    result += int(count)
            total.append(result)
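As an alternative to skipping the header with next(file), the standard csv module's DictReader reads the first row as column names and lets you refer to columns by name. A minimal sketch, assuming every file starts with the name,sex,count header; female_total is a hypothetical helper, and io.StringIO stands in for a real file.

```python
import csv
import io

def female_total(file):
    """Sum the count column over rows whose sex column is 'F'.

    female_total is a hypothetical helper; it expects a name,sex,count header.
    """
    total = 0
    # DictReader consumes the header row and maps each later row to a dict.
    for row in csv.DictReader(file):
        if row['sex'] == 'F':
            total += int(row['count'])
    return total

sample = io.StringIO('name,sex,count\nxyz,M,231\nabc,F,654\ndef,F,46\n')
print(female_total(sample))  # 700
```

With a real file, the csv documentation recommends opening it with newline='', for example open(path, newline='').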

Solution 4:

I would recommend a few changes:

import glob
from collections import defaultdict

result = defaultdict(int)
for fname in glob.glob('/path/to/dir/*.txt'):
    with open(fname) as f:
        for line in f:
            try:
                name, sex, count = line.split(',')
                if sex == 'F':
                    result[fname] += int(count)
            except ValueError:  # unpack error, integer conversion error
                print("ValueError with '{}', continuing".format(line))

print(result)

A defaultdict is handy: it behaves like a normal dictionary, but when you access a key that does not exist, it is created with a default value, in this case zero (because int() returns 0). This way you can simply add to the value, and if there is no total for that file yet, it starts at 0.
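A tiny sketch of that behavior next to a plain dictionary; the keys and numbers are made up for illustration.

```python
from collections import defaultdict

totals = defaultdict(int)  # missing keys start at int() == 0
totals['a.txt'] += 654     # no KeyError: 'a.txt' is created with 0, then incremented
totals['a.txt'] += 46
totals['b.txt'] += 100

print(dict(totals))  # {'a.txt': 700, 'b.txt': 100}

plain = {}
try:
    plain['a.txt'] += 654  # a normal dict has no default value
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'a.txt'
```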

Using glob lets you avoid the listdir plus filename.endswith combination, and the paths it returns already include the directory, so no joining is needed; take a look if you like.
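A minimal sketch comparing the two approaches on a throwaway directory; the file names are made up for the demo.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as direc:
    for name in ('a.txt', 'b.txt', 'notes.csv'):
        open(os.path.join(direc, name), 'w').close()  # create empty files

    # listdir + endswith: bare names, so the directory must be joined back on.
    via_listdir = sorted(
        os.path.join(direc, f) for f in os.listdir(direc) if f.endswith('.txt')
    )
    # glob: the pattern filters by extension and the results already include direc.
    via_glob = sorted(glob.glob(os.path.join(direc, '*.txt')))

print(via_listdir == via_glob)                  # True
print([os.path.basename(p) for p in via_glob])  # ['a.txt', 'b.txt']
```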

Finally, integer conversion and tuple unpacking can be troublesome, especially when your input varies: both raise a ValueError on bad data. To keep your script from crashing, I'd recommend a try...except block.
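To show that one except ValueError clause covers both failure modes, here is a minimal sketch; parse_line is a hypothetical helper, and the sample lines are made up.

```python
def parse_line(line):
    """Return (name, sex, count) or None if the line cannot be parsed.

    parse_line is a hypothetical helper used only for illustration.
    """
    try:
        name, sex, count = line.strip().split(',')  # unpacking needs exactly 3 fields
        return name, sex, int(count)                 # int() needs a numeric string
    except ValueError:
        # Both failure modes raise ValueError, so one except clause covers them.
        return None

print(parse_line('abc,F,654\n'))      # ('abc', 'F', 654)
print(parse_line('\n'))               # None (not enough values to unpack)
print(parse_line('abc,F,654,extra'))  # None (too many values to unpack)
print(parse_line('name,sex,count'))   # None (int('count') fails)
```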

