Parsing A Website With Beautifulsoup And Selenium
Solution 1:
First of all, align_right and temperature_red are two different classes - you've joined them into one name and added that table_data_td for some reason. Also, the elements carrying these two classes are td elements, not table elements.
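If you do want the cells carrying both classes, a CSS selector chains the class names. A minimal sketch with BeautifulSoup, using a made-up HTML fragment standing in for the page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the page's markup
html = """
<table>
  <tr>
    <td class="align_right temperature_red">78</td>
    <td class="align_right">45</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# "td.align_right.temperature_red" matches td elements that have BOTH classes
cells = soup.select("td.align_right.temperature_red")
print([td.get_text() for td in cells])
```

Only the first cell matches here, since the second one lacks the temperature_red class.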
In any case, to get the climate table, it looks like you should be looking for the div element having id="climate_table":
climate_table = soup.find(id="climate_table")
Another important thing to note is that there is a potential for "timing" issues here - at the moment you read the driver.page_source value, the climate information might not be loaded yet. This is usually handled by adding an Explicit Wait after navigating to the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
url = "https://usclimatedata.com/climate/binghamton/new-york/unitedstates/usny0124"
browser = webdriver.Chrome()
try:
    browser.get(url)

    # wait for the climate data to be loaded
    WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "climate_table")))

    soup = BeautifulSoup(browser.page_source, "lxml")

    climate_table = soup.find(id="climate_table")
    print(climate_table.prettify())
finally:
    browser.quit()
Note the addition of the try/finally block, which safely closes the browser in case of an error - this also helps to avoid "hanging" browser windows.
And, look into pandas.read_html(), which can read your climate information table into a DataFrame auto-magically.
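A minimal sketch of that approach, using a made-up table standing in for the climate table's HTML - in practice you would pass str(climate_table) (wrapped in StringIO) from the BeautifulSoup result:

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML standing in for the climate table (values are made up)
html = """
<table id="climate_table">
  <tr><th>Month</th><th>High (F)</th><th>Low (F)</th></tr>
  <tr><td>Jan</td><td>28</td><td>15</td></tr>
  <tr><td>Feb</td><td>31</td><td>17</td></tr>
</table>
"""

# read_html() returns a list of DataFrames, one per <table> found;
# the <th> row is picked up as the column header automatically
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```

From there you get the usual DataFrame tooling - column selection, type conversion, CSV export - without writing any row-by-row parsing code yourself.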