
Parsing A Website With Beautifulsoup And Selenium

Trying to compare avg. temperatures to actual temperatures by scraping them from: https://usclimatedata.com/climate/binghamton/new-york/united-states/usny0124. I can successfully ga…

Solution 1:

First of all, these are two different classes - align_right and temperature_red - you've joined them into one and also added table_data_td for some reason. Besides, the elements carrying these two classes are td elements, not table elements.

In any case, to get the climate table, it looks like you should be looking for the div element having id="climate_table":

climate_table = soup.find(id="climate_table")
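
If you then need the individual temperature cells, you can select the td elements by their classes from within that container. A minimal sketch, assuming the cells carry both the align_right and temperature_red classes mentioned in the original attempt:

# select the temperature cells by their CSS classes
# (class names assumed from the question, not verified against the live page)
for cell in climate_table.select("td.align_right.temperature_red"):
    print(cell.get_text(strip=True))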

Another important thing to note is that there is a potential for "timing" issues here - when you read the driver.page_source value, the climate information might not be there yet. This is usually addressed by adding an Explicit Wait after navigating to the page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup


url = "https://usclimatedata.com/climate/binghamton/new-york/united-states/usny0124"
browser = webdriver.Chrome()

try:
    browser.get(url)

    # wait for the climate data to be loaded
    WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "climate_table")))

    soup = BeautifulSoup(browser.page_source, "lxml")
    climate_table = soup.find(id="climate_table")

    print(climate_table.prettify())
finally:
    browser.quit()

Note the addition of the try/finally, which safely closes the browser in case of an error - that also helps to avoid "hanging" browser windows.

Also, look into pandas.read_html(), which can read your climate information table into a DataFrame auto-magically.
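
A minimal sketch of that route, assuming it runs inside the try block above (before the browser is quit) and that the climate data is laid out as regular table elements in the HTML:

import io
import pandas as pd

# read_html() parses every <table> in the page into a list of DataFrames;
# run this after the explicit wait so the climate data has loaded
tables = pd.read_html(io.StringIO(browser.page_source))
print(tables[0].head())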
