
Python Beautifulsoup Get Titles From Date Range

I am trying to get titles, links and dates from a date range, such as four days ago to today. First I choose Current Month in the dropdown select option, and then choose dates which bet…
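The date-window part of the question can be checked on its own, without any scraping. This is a minimal sketch. It assumes listing dates are rendered in English like "May 11, 2021", which is the format the answer below parses. `in_date_window` is a hypothetical helper name, and the window is taken as inclusive at both ends.

```python
from datetime import date, datetime, timedelta


def in_date_window(date_text: str, days_back: int, today: date) -> bool:
    """Return True if a date such as 'May 11, 2021' falls within the
    inclusive window [today - days_back, today]."""
    listed = datetime.strptime(date_text.strip(), "%B %d, %Y").date()
    start = today - timedelta(days=days_back)
    return start <= listed <= today


# With "today" fixed at 2021-05-14 and a four-day window,
# only dates from 2021-05-10 onward qualify.
today = date(2021, 5, 14)
print(in_date_window("May 11, 2021", 4, today))  # True
print(in_date_window("May 6, 2021", 4, today))   # False
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.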

Solution 1:

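Rather than driving the Current Month dropdown, this code downloads the listing page once and keeps only the entries whose displayed date falls inside the window. It can be refined, but it should solve your use case. If you have any issues, please let me know and I will try to address them.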

import requests
from bs4 import BeautifulSoup
from datetime import date, datetime, timedelta

# Date window: from `days_back` days ago up to and including today.
days_back = 10  # window length in days; use 4 for "four days ago to today"
end_date = date.today()
start_date = end_date - timedelta(days=days_back)

html_link = 'https://www.ksei.co.id/publications/new-securities-registration?setLocale=en-US'
response = requests.get(html_link, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
soup = BeautifulSoup(response.text, 'html.parser')

for ultag in soup.find_all('ul', {'class': 'list-nostyle'}):
    for litag in ultag.find_all('li'):
        for dates in litag.find_all('small', {'class': 'muted'}):
            # Listing dates look like "May 11, 2021"; compare them as date objects.
            clean_date = datetime.strptime(dates.text.strip(), "%B %d, %Y").date()
            if start_date <= clean_date <= end_date:
                title = litag.find('h2', {'class': 'h4 no-margin'})
                document_link = litag.find('a', href=True)
                print(clean_date)
                print(title.text.strip())
                print(f"https://www.ksei.co.id{document_link['href']}")

# Sample output (truncated):
# 2021-05-11
# KSEI-3629/DIR/0521
# https://www.ksei.co.id/Announcement/Files/127505_ksei_3629_dir_0521_202105140513.pdf
# 2021-05-06
# KSEI-3512/DIR/0521
# https://www.ksei.co.id/Announcement/Files/127181_ksei_3512_dir_0521_202105070825.pdf
# 2021-05-05
# KSEI-3482/DIR/0521
# https://www.ksei.co.id/Announcement/Files/127076_ksei_3482_dir_0521_202105051506.pdf
# ...
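To test the filtering logic without hitting the live site, the same selectors can be wrapped in a function that takes an HTML string. This is a sketch that assumes the page markup matches what the answer's selectors expect. `filter_listings`, `BASE_URL` and the `SAMPLE_HTML` snippet are hypothetical, and the sample is not the real KSEI page. The sketch also uses `urljoin` instead of string concatenation, so absolute hrefs are handled as well.

```python
from datetime import date, datetime
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.ksei.co.id"  # base for relative document links


def filter_listings(html: str, start: date, end: date) -> list[tuple[date, str, str]]:
    """Return (date, title, absolute URL) for listings dated within [start, end].

    Assumes each <li> under <ul class="list-nostyle"> holds a
    <small class="muted"> date, an <h2 class="h4 no-margin"> title
    and an <a href> link, mirroring the selectors in the answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for li in soup.select("ul.list-nostyle li"):
        small = li.select_one("small.muted")
        title = li.select_one("h2.h4.no-margin")
        link = li.find("a", href=True)
        if small is None or title is None or link is None:
            continue  # skip items missing a date, title or link
        listed = datetime.strptime(small.get_text(strip=True), "%B %d, %Y").date()
        if start <= listed <= end:
            results.append((listed, title.get_text(strip=True), urljoin(BASE_URL, link["href"])))
    return results


# Hypothetical markup for offline testing; not copied from the real page.
SAMPLE_HTML = """
<ul class="list-nostyle">
  <li><h2 class="h4 no-margin">DOC-A</h2><small class="muted">May 11, 2021</small>
      <a href="/Announcement/Files/a.pdf">PDF</a></li>
  <li><h2 class="h4 no-margin">DOC-B</h2><small class="muted">April 20, 2021</small>
      <a href="/Announcement/Files/b.pdf">PDF</a></li>
</ul>
"""

for listed, title, url in filter_listings(SAMPLE_HTML, date(2021, 5, 1), date(2021, 5, 14)):
    print(listed, title, url)
# prints: 2021-05-11 DOC-A https://www.ksei.co.id/Announcement/Files/a.pdf
```

In the answer's loop, `title.text` would raise an `AttributeError` if an `<li>` had no matching `<h2>`. Skipping items that lack a date, title or link avoids that error.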
