Python Finds A String In Multiple Files Recursively And Returns The File Path

I'm learning Python and would like to search for a keyword in multiple files recursively. I have an example function which should find the *.doc extension in a directory. Then, t

Solution 1:

import os


def find_word(extension, word):
    for root, dirs, files in os.walk('/DOC'):
        # filter files for given extension:
        files = [fi for fi in files if fi.endswith(".{ext}".format(ext=extension))]
        for filename in files:
            path = os.path.join(root, filename)
            # open each file and read it
            with open(path) as f:
                # split() will create a list of words and set() will
                # reduce it to the unique words
                words = set(f.read().split())
                if word in words:
                    print(path)
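
Since the question asks for the file path to be returned rather than printed, a variant of the function above can collect the matches instead. This is only a minimal sketch: the name `find_files_with_word` and the `top` parameter are illustrative and not part of the original answer, and it assumes the files are plain text.

```python
import os


def find_files_with_word(top, extension, word):
    """Return sorted paths under `top` whose text contains `word`.

    A match means `word` appears as a whitespace-separated token.
    """
    matches = []
    suffix = "." + extension
    for root, dirs, files in os.walk(top):
        for filename in files:
            if not filename.endswith(suffix):
                continue
            path = os.path.join(root, filename)
            # errors="ignore" drops undecodable bytes instead of raising
            with open(path, errors="ignore") as f:
                if word in f.read().split():
                    matches.append(path)
    return sorted(matches)
```

Because the comparison is against whitespace-separated tokens, a word followed directly by punctuation (for example `keyword,`) will not match `keyword`.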

Solution 2:

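.doc files are binary Word documents, not plain text, so they won't open properly in a simple text editor or with Python's built-in open(). In this case, you can use other Python modules such as python-docx (note that python-docx reads the newer .docx format, not legacy .doc files).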

Update

For .doc files (the format used before Word 2007) you can also use external tools such as catdoc or antiword. Try the following:

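import subprocess


def doc_to_text(filename):
    # Run catdoc (-w disables line wrapping) and return its output as text.
    # Passing the arguments as a list avoids shell quoting problems.
    return subprocess.check_output(
        ['catdoc', '-w', filename],
        universal_newlines=True
    )

print(doc_to_text('fixtures/doc.doc'))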

Solution 3:

If you are trying to read a .doc file in your code, then this won't work. You will have to change the part where you read the file.
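
One way to isolate "the part where you read the file" is to pass the reading step in as a function, so the directory walk stays the same whatever the file format. The sketch below is an assumption about how that could look, not code from the thread: `find_word_with_readers`, `read_plain_text`, and the `readers` mapping are hypothetical names.

```python
import os


def read_plain_text(path):
    """Reader for files that really are plain text."""
    with open(path, errors="ignore") as f:
        return f.read()


def find_word_with_readers(top, word, readers):
    """Yield paths under `top` whose extracted text contains `word`.

    `readers` maps a lowercase file extension (e.g. ".txt") to a function
    that takes a path and returns the file's text as a str.
    """
    for root, dirs, files in os.walk(top):
        for filename in files:
            ext = os.path.splitext(filename)[1].lower()
            reader = readers.get(ext)
            if reader is None:
                continue  # no reader registered for this file type
            path = os.path.join(root, filename)
            if word in reader(path).split():
                yield path
```

A reader for ".doc" would be any function that turns a Word file into a str, for example a wrapper around catdoc or antiword, registered as `readers = {".txt": read_plain_text, ".doc": some_doc_reader}`.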

Here are some links for reading a .doc file in python.

extracting text from MS word files in python

Reading/Writing MS Word files in Python
