Python - Extract Text From Pdf Page-wise To List
I am trying to extract text page-wise from a PDF and store each page's text as its own list of words inside an outer list, like [['This', 'is', 'one', 'page'] , ['I', 'am', 'page', 'TWO'] , ['Three', 'that\'s',
Solution 1:
Well, you could try this:
import PyPDF2

pages = []
pdf_file = "<Enter your file path>"  # replace with the path to your PDF
# PdfFileReader, getNumPages, getPage and extractText are the PyPDF2 1.x/2.x names
read_pdf = PyPDF2.PdfFileReader(pdf_file)
number_of_pages = read_pdf.getNumPages()
for page_number in range(number_of_pages):  # use xrange in Py2
    # Extract the text of one page, then split it on spaces as you asked
    page = read_pdf.getPage(page_number).extractText().split(" ")
    pages.append(page)
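If you are on PyPDF2 3.x, the camel-case names above are no longer available, so here is a rough sketch of the same idea with the newer names (PdfReader, reader.pages, extract_text). The helper names words_per_page and extract_pages are made up for illustration, and the sketch assumes PyPDF2 3.x is installed. It also swaps split(" ") for split() with no argument, so newlines and repeated spaces don't leave empty strings in the lists. Treat it as a starting point, not a definitive version:

```python
def words_per_page(page_texts):
    """Turn an iterable of page strings into one list of words per page."""
    # split() with no argument splits on any run of whitespace, so newlines
    # and double spaces do not produce empty strings.
    return [(text or "").split() for text in page_texts]


def extract_pages(pdf_path):
    """Read a PDF and return a list of word lists, one list per page."""
    # Assumption: PyPDF2 3.x is installed. The import lives inside the
    # function so words_per_page() can still be used without the library.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return words_per_page(page.extract_text() for page in reader.pages)


# Quick check of the splitting step on plain strings (no PDF needed)
print(words_per_page(["This is one page", "I am\npage  TWO"]))
# [['This', 'is', 'one', 'page'], ['I', 'am', 'page', 'TWO']]
```

With a real file you would call pages = extract_pages(pdf_file), where pdf_file is your own path. How cleanly the words come out depends on how the PDF was produced, so expect some odd tokens on scanned or heavily formatted pages.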