Beautifulsoup Different Parsers
Could anyone elaborate on the difference between parsers like html.parser and html5lib? I've stumbled across weird behavior where, when using html.parser, it ignores all th
Solution 1:
You can use lxml, which is very fast, together with find_all or select to get all tags.
from bs4 import BeautifulSoup
html = """
<html><head></head><body><!--[if lte IE 8]> <!-- data-module-name="test"--> <![endif]-->
<![endif]-->
<a href="test"></a><a href="test"></a><a href="test"></a><a href="test"></a><!--[if lte IE 8]>
<![endif]--></body></html>
"""
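# Parse with lxml (the lxml package must be installed)
soup = BeautifulSoup(html, 'lxml')
# find_all returns every <a> tag in document order
tags = soup.find_all('a')
print(tags)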
Or, using select instead of find_all:
from bs4 import BeautifulSoup
html = """
<html><head></head><body><!--[if lte IE 8]> <!-- data-module-name="test"--> <![endif]-->
<![endif]-->
<a href="test"></a><a href="test"></a><a href="test"></a><a href="test"></a><!--[if lte IE 8]>
<![endif]--></body></html>
"""
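# Parse with lxml (the lxml package must be installed)
soup = BeautifulSoup(html, 'lxml')
# select takes a CSS selector; 'a' matches every <a> tag
tags = soup.select('a')
print(tags)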
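On the original question of how the parsers differ: each one recovers from malformed markup in its own way, so stray pieces such as the unmatched <![endif]--> above can leave different trees, and different numbers of tags, depending on the parser. A quick way to see this on your own page is to run the same markup through each parser and compare. The sketch below is only an illustration: count_links is a made-up helper name, it assumes lxml and html5lib may or may not be installed (missing ones are reported as None), and the counts you get can vary with the Python, lxml, and html5lib versions in use.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Markup from the answer above, with conditional comments around the links.
HTML = """
<html><head></head><body><!--[if lte IE 8]> <!-- data-module-name="test"--> <![endif]-->
<![endif]-->
<a href="test"></a><a href="test"></a><a href="test"></a><a href="test"></a><!--[if lte IE 8]>
<![endif]--></body></html>
"""


def count_links(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Hypothetical helper: map each parser name to the number of <a> tags
    it finds, or to None when that parser is not installed."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises FeatureNotFound when the requested parser is missing.
            counts[name] = None
            continue
        counts[name] = len(soup.find_all("a"))
    return counts


# Print one line per parser so the differences are easy to compare.
for name, n in count_links(HTML).items():
    print(name, "not installed" if n is None else f"{n} <a> tags")
```

If one parser reports fewer tags than the others, printing soup.prettify() for that parser usually shows where it folded the missing tags into a comment or declaration.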