BeautifulSoup Python YouTube Scrape Not Working

May 30, 2024

I'm trying to scrape video URLs and titles from YouTube accounts, whose video pages are formatted like https://www.youtube.com/c/%s/videos % accountName (for example, Apple). The class given to the

Solution 1:

Most of the content you see in the browser is loaded by JavaScript. A simple GET request only returns the initial HTML the server sends, so it does not include the content that JavaScript adds to the page afterwards.

Looking at users' pages on YouTube, the response contains very little usable HTML markup. Instead, most of the data arrives as JSON embedded in the <body> tag.

To answer your question: in the future, when you want to scrape something from a website, first make sure the content is actually present in the response from requests.get, rather than assuming you receive the same content a browser does.
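A minimal sketch of that check, assuming the standard library's html.parser is an acceptable stand-in for BeautifulSoup here (so it runs without extra packages). It counts elements carrying a given id attribute in whatever HTML string you pass it. The helper names are hypothetical, and the id `video-title` in the example is an assumption about how the browser-rendered page marks video links, not something confirmed in the question.

```python
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Hypothetical helper: counts start tags whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag
        if dict(attrs).get("id") == self.target_id:
            self.count += 1


def count_elements_with_id(html, target_id):
    """Return how many elements in the given HTML carry id=target_id."""
    parser = IdCounter(target_id)
    parser.feed(html)
    parser.close()
    return parser.count


# Illustrative markup only: roughly what a browser shows after JavaScript ran
rendered = '<div><a id="video-title" href="/watch?v=abc">Title</a></div>'
# Illustrative markup only: what a plain GET may return, data hidden in a <script>
raw = '<body><script>var ytInitialData = {"a": 1};</script></body>'

print(count_elements_with_id(rendered, "video-title"))  # 1
print(count_elements_with_id(raw, "video-title"))       # 0
```

On a real page you would pass requests.get(url).text instead of the sample strings. A count of 0 means the elements you are targeting are not in the raw response, so BeautifulSoup will not find them either.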

Now, specifically for the YouTube problem: if you save req.text to a file, open it in a text editor and look inside the <body> tag, you will see that in the second <script> tag the variable window["ytInitialData"] is set to a very long JSON object.
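A minimal sketch of pulling that JSON out of the saved text with the standard library. It assumes the assignment looks like window["ytInitialData"] = {...}; or var ytInitialData = {...}; (the regex accepts both forms), and YouTube may change this markup at any time. The function name is hypothetical.

```python
import json
import re

# Assumed forms: window["ytInitialData"] = {...};  or  var ytInitialData = {...};
_YT_INITIAL_DATA = re.compile(r'ytInitialData"?\]?\s*=\s*')


def extract_yt_initial_data(html):
    """Return the ytInitialData object embedded in the page HTML."""
    match = _YT_INITIAL_DATA.search(html)
    if match is None:
        raise ValueError("ytInitialData assignment not found in the HTML")
    # raw_decode parses one JSON value from the start of the string and
    # returns (value, end_index), so the trailing ';</script>...' is ignored.
    data, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return data


sample = (
    '<body><script>var other = 1;</script>'
    '<script>window["ytInitialData"] = {"contents": {"items": [1, 2]}};</script>'
    '</body>'
)
print(extract_yt_initial_data(sample))  # {'contents': {'items': [1, 2]}}
```

With the real page, pass req.text (the response body from requests.get) instead of the sample string. Using raw_decode rather than slicing up to the closing </script> avoids guessing where the JSON ends.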

That JSON contains all the available information for every video (title, duration, video ID, etc.). I suggest you parse it and see whether that solves your problem.
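One way to parse it without hard-coding the full path through the JSON is to walk the structure recursively and collect every per-video object. The key names gridVideoRenderer / videoRenderer, videoId and title (holding either runs or simpleText) are assumptions about how YouTube has structured this data and may differ or change; the helper names are hypothetical.

```python
def iter_dicts_with_key(node, key):
    """Yield every dict stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, dict):
                yield v
            yield from iter_dicts_with_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dicts_with_key(item, key)


def title_text(title):
    # Assumed shapes: {"simpleText": "..."} or {"runs": [{"text": "..."}, ...]}
    if "simpleText" in title:
        return title["simpleText"]
    return "".join(run.get("text", "") for run in title.get("runs", []))


def extract_videos(data, renderer_keys=("gridVideoRenderer", "videoRenderer")):
    """Collect {"title", "url"} dicts for every video renderer found in data."""
    videos = []
    for key in renderer_keys:
        for renderer in iter_dicts_with_key(data, key):
            video_id = renderer.get("videoId")
            if video_id is None:
                continue
            videos.append({
                "title": title_text(renderer.get("title", {})),
                "url": f"https://www.youtube.com/watch?v={video_id}",
            })
    return videos


# Made-up data mimicking the assumed structure
data = {"contents": [
    {"gridVideoRenderer": {"videoId": "abc", "title": {"runs": [{"text": "Hello"}]}}},
    {"videoRenderer": {"videoId": "xyz", "title": {"simpleText": "World"}}},
]}
for video in extract_videos(data):
    print(video["title"], video["url"])
# Hello https://www.youtube.com/watch?v=abc
# World https://www.youtube.com/watch?v=xyz
```

Here `data` stands for the parsed ytInitialData object (a dict). Searching by key rather than by a fixed path keeps the code working if YouTube moves the video list to a different depth, although renamed keys would still break it.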
