How To Find Frequency Of The Keys In A Dictionary Across Multiple Text Files?
Solution 1:
Well, I'm not exactly sure what you mean by all the files in the document "X", but I assume it's analogous to pages in a book. With that interpretation, I would store the data in whatever form is easiest to work with. Putting the data in an easily manipulable structure adds efficiency later, because you can always add a method for producing any type of output you want.
Since the main key you're looking at is the keyword, I would create a nested Python dictionary with this structure (the code below also keeps a running "total" count under each keyword):
{keyword: {filename: count}}
Once it's in this form, you can do any type of manipulation on the data really easily.
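For example, after scanning two files the dictionary might look like this (the counts here are purely illustrative):
word_count_dict = {
    "Britain": {"total": 5, "74.txt": 2, "75.txt": 3},
    "France": {"total": 1, "74.txt": 1},
}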
To build this dictionary:
import os

# Yield each whitespace-separated word in the file.
def words_generator(fileobj):
    for line in fileobj:
        for word in line.split():
            yield word

word_count_dict = {}
for dirpath, dnames, fnames in os.walk("./"):
    for fname in fnames:
        # Join the directory path so files in subdirectories open correctly.
        with open(os.path.join(dirpath, fname), "r") as f:
            for word in words_generator(f):
                if word not in word_count_dict:
                    word_count_dict[word] = {"total": 0}
                if fname not in word_count_dict[word]:
                    word_count_dict[word][fname] = 0
                word_count_dict[word][fname] += 1
                word_count_dict[word]["total"] += 1
This will create an easily parsable dictionary.
Want the total number of times Britain appears across all files?
word_count_dict["Britain"]["total"]
Want the number of times Britain appears in files 74.txt and 75.txt?
sum(word_count_dict["Britain"].get(fname, 0) for fname in ["74.txt", "75.txt"])
Want to see all files that the word Britain shows up in?
[fname for fname in word_count_dict["Britain"] if fname != "total"]
You can of course write functions that perform these operations with a simple call.
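For instance, a minimal sketch of such helpers (the function names are just suggestions):
def total_count(word_count_dict, word):
    # Total occurrences of word across all files (0 if never seen).
    return word_count_dict.get(word, {}).get("total", 0)

def count_in_files(word_count_dict, word, files):
    # Occurrences of word summed over the given filenames.
    counts = word_count_dict.get(word, {})
    return sum(counts.get(fname, 0) for fname in files)

def files_containing(word_count_dict, word):
    # All filenames containing word (excluding the bookkeeping "total" key).
    return [fname for fname in word_count_dict.get(word, {}) if fname != "total"]

# Usage, e.g.: count_in_files(word_count_dict, "Britain", ["74.txt", "75.txt"])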