Find Missing Filenames In Sequence Of Numbers Stored In A List

I have a list of strings holding timestamp-based filenames (date_millisecondtime.csv) like these: [..., file_20181105_110001.csv, file_20181105_120002.csv, file_20181105_130002.csv,

Solution 1:

One solution is to use a set comprehension to extract the hours that are present. If I understand your requirement, you can then calculate the min and max hours and subtract the present hours from a set built from the range between them:

L = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
     'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
     'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv']

present = {int(i.rsplit('_', 1)[-1][:2]) for i in L}

min_time, max_time = min(present), max(present)

res = set(range(min_time, max_time)) - present  # {18, 19}

You can then build your filenames from the missing times. I'll leave this as an exercise [hint: list comprehension].
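
One possible way to finish the exercise is sketched below. The helper name `missing_prefixes` and the regular expression are assumptions for illustration, not part of the original answer. Since the minutes and seconds of a missing file cannot be known, the sketch only builds the `file_<date>_<HH>` prefix of each missing name, and it assumes all files share the same date.

```python
import re

# Assumption: every name looks like file_<YYYYMMDD>_<HHMMSS>.csv
NAME_RE = re.compile(r"file_(\d{8})_(\d{2})\d{4}\.csv")

def missing_prefixes(filenames):
    """Return 'file_<date>_<HH>' prefixes for hours absent between the first and last hour seen."""
    matches = [NAME_RE.fullmatch(name) for name in filenames]
    date = matches[0].group(1)  # assumes all files share one date
    present = {int(m.group(2)) for m in matches}
    # max(present) is present by definition, so the exclusive end of range is fine
    missing_hours = sorted(set(range(min(present), max(present))) - present)
    return [f"file_{date}_{hour:02d}" for hour in missing_hours]

L = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_130002.csv',
     'file_20181105_140002.csv', 'file_20181105_150003.csv', 'file_20181105_160002.csv',
     'file_20181105_170002.csv', 'file_20181105_200002.csv', 'file_20181105_210002.csv']

print(missing_prefixes(L))  # ['file_20181105_18', 'file_20181105_19']
```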

Solution 2:

Another solution, in case you also need to detect files missing at the beginning or end of the list (e.g. hours 0-10, 22 and 23):

filenames = ['file_20181105_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']
pos = 0
# Assumes filenames is sorted and holds at most one file per hour.
for h in range(24):  # hours 0 through 23
    n = "file_20181105_" + str(h).zfill(2)
    if pos < len(filenames) and filenames[pos].startswith(n):
        print("Found", h)
        pos += 1
    else:
        print("Not found", h)

Of course, you can build the prefix n for whichever day you want to check in several different ways. If needed, you can add an outer loop to go through multiple days.
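
As one illustration of building the prefix from a real date rather than a hard-coded string, here is a minimal sketch using the standard `datetime` module. The helper names `hourly_prefixes` and `date_range` are hypothetical, and the `file_YYYYMMDD_HH` layout is assumed from the sample filenames.

```python
from datetime import date, timedelta

def hourly_prefixes(day):
    """Return the 24 'file_YYYYMMDD_HH' prefixes expected for one day."""
    return [f"file_{day:%Y%m%d}_{hour:02d}" for hour in range(24)]

def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)

for day in date_range(date(2018, 11, 4), date(2018, 11, 5)):
    prefixes = hourly_prefixes(day)
    print(prefixes[0], "...", prefixes[-1])
```

Working with `date` objects means month and year boundaries are handled by the library instead of by string arithmetic.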

Edit:

If we want to check more than one day, we can loop through the days and check each day's hours.

IMHO, I would suggest many changes to the following code depending on the use case, the number of days, the number of filenames, personal preference, code style, etc.

filenames = ['file_20181104_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']
pos = 0
missing = []
# Assumes filenames is sorted and holds at most one file per hour.
for d in (4, 5):
    for h in range(24):  # hours 0 through 23
        n = "file_201811" + str(d).zfill(2) + "_" + str(h).zfill(2)
        if pos < len(filenames) and filenames[pos].startswith(n):
            pos += 1
            print("Found", d, h)
        else:
            missing.append((d, h))  # record the gap
            print("Not Found", d, h)
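
If the list might not be sorted, a set-based variant avoids the position counter altogether. This is only a sketch under the assumption that every name starts with a 16-character `file_YYYYMMDD_HH` prefix. The function name `find_missing_hours` is hypothetical.

```python
from datetime import date, timedelta

PREFIX_LEN = len("file_YYYYMMDD_HH")  # 16 characters, assumed from the sample names

def find_missing_hours(filenames, start, end):
    """Return (date, hour) pairs between start and end (inclusive) with no matching file."""
    present = {name[:PREFIX_LEN] for name in filenames}
    missing = []
    day = start
    while day <= end:
        for hour in range(24):
            if f"file_{day:%Y%m%d}_{hour:02d}" not in present:
                missing.append((day, hour))
        day += timedelta(days=1)
    return missing

filenames = ['file_20181104_110001.csv', 'file_20181105_120002.csv', 'file_20181105_150003.csv']
missing = find_missing_hours(filenames, date(2018, 11, 4), date(2018, 11, 5))
print(len(missing))  # 45 of the 48 expected hours have no file
```

Because membership is tested against a set, the order of the input list and several files falling within the same hour do not affect the result.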
