Skip to content Skip to sidebar Skip to footer

How To Compare Two Csv Files In Python

I have two csv files. One is called 'Standard reg.csv', the other is 'Driver Details.csv' In 'Standard reg.csv' the first two lines are: ['Day', 'Month', 'Year', 'Reg Plate', 'Hour

Solution 1:

I suggest you first load all of the details from the Driver Details.csv into a dictionary, using the registration number as the key. This would then allow you to easily look up a given entry without having to keep reading all of the lines from the file again:

import csv

driver_details = {}

withopen('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the headerfor row in csv_driver_details:
        driver_details[row[7]] = row

withopen('Standard Reg.csv') as f_standard_reg:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the headerfor row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - {} {}'.format(driver[0], driver[1]))
        except KeyError as e:
            print('No match found')

The code as you have it will iterate through file_2 and leave the file pointer either at the end (if no match is found) or at the location of a match (potentially missing matches earlier on for the next entry). For your approach to work you would have to start reading the file from the start for each loop, which would be very slow.


To add an output csv and display the full address you could do something like the following:

import csv

speed = 74.3
fine = 35

driver_details = {}

withopen('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the headerfor row in csv_driver_details:
        driver_details[row[7]] = row

withopen('Standard Reg.csv') as f_standard_reg, open('Output log.csv', 'w', newline='') as f_output:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header
    csv_output = csv.writer(f_output)

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - Fine {}, Speed {}\n{} {}\n{}'.format(fine, speed, driver[0], driver[1], '\n'.join(driver[2:7])))
            csv_output.writerow(driver[0:7] + [speed, fine])
        except KeyError as e:
            print('No match found')

This would print the following:

Match found - Fine 35, Speed 74.3
Violet Kirby
585-4073 Convallis Street
Balfour
Orkney
United Kingdom
OC1X 6QE

And produce an output file containing:

Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,74.3,35

Solution 2:

The testing condition if reg == reg2 appears outside both loops (for file_1 and for file_2). That is why the testing is only done with the last line from each file.

Another problem is that you use the same loop variable row in both for loops.

Solution 3:

Try csv.DictReader to eliminate most of your lines of code:

import csv
Violations = defaultdict(list)

# Read in the violations, there are probably less violations than drivers (I hope!)withopen('Standard reg.csv') as violations:
    for v in csv.DictReader(violations):
        Violations[v['Reg Plate']] = v

withopen('Driver Details.csv') as drivers:
    for d in csv.DictReader(drivers):
        fullname = "{driver.FirstName} {driver.LastName}".format(driver=d)
        if d['Registration'] in Violations:
            count = len(Violations[d['Registration']])
            print("{fullname} has {count} violations.".format(fullname=fullname, count=count))
        else:
            print("{fullname} is too fast to catch!".format(fullname=fullname))

Post a Comment for "How To Compare Two Csv Files In Python"