Beautifulsoup Python Youtube Scrape Not Working
Solution 1:
The content you see in the browser is mostly loaded by JavaScript. A plain GET request does not return the page's dynamic content.
Looking at user pages on YouTube, I can see that the response contains little usable HTML; most of the data arrives as JSON inside the body tag.
To answer your question: in the future, when you want to scrape something from a website, first check that the response from requests.get actually contains the content you need, rather than assuming that you get the same content a browser gets.
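One way to run that check is to search the raw response for a piece of text you can see in the browser. The following is only a minimal sketch: the helper names has_static_content and fetch_html are hypothetical, and a simple substring test is an assumption about what counts as "having the content".

```python
def has_static_content(html: str, expected_text: str) -> bool:
    """Return True if expected_text appears in the raw HTML (case-insensitive)."""
    return expected_text.lower() in html.lower()


def fetch_html(url: str) -> str:
    """Download a page with a plain GET request and return its body as text."""
    import requests  # imported here so the check above works without requests installed

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


# Hypothetical usage: pick a title you can see in the browser and test for it.
# html = fetch_html("https://www.youtube.com/...")
# if not has_static_content(html, "Some video title"):
#     print("Not in the static HTML; it is probably rendered by JavaScript.")
```

If the text is missing from the raw response, BeautifulSoup will not find it either, because it only parses what requests.get returned.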
Now, specifically for the YouTube problem: if you save req.text to a file, open it in a text editor, and look inside the <body> tag, you will see that in the second <script> tag the variable window["ytInitialData"] is set to a very long JSON object.
It contains all the information you need for every video (title, duration, video ID, etc.). I suggest you parse that JSON and see if it solves your problem.
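A rough sketch of that parsing step might look like the code below. It assumes the assignment appears in the page as window["ytInitialData"] = {...} as described above, and it also accepts var ytInitialData = {...} in case the markup differs. The function names extract_initial_data and find_key are hypothetical. The layout of the JSON is not documented here, so the sketch searches for a key by name instead of relying on a fixed path.

```python
import json
import re

# Matches the start of the assignment; the JSON object begins right after it.
_ASSIGNMENT = re.compile(r'(?:window\["ytInitialData"\]|var\s+ytInitialData)\s*=\s*')


def extract_initial_data(html: str):
    """Return the ytInitialData object as a dict, or None if it is not found."""
    match = _ASSIGNMENT.search(html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever follows it (the trailing ";" and markup).
        data, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return data


def find_key(obj, key):
    """Yield every value stored under `key`, at any depth of nested dicts and lists."""
    if isinstance(obj, dict):
        for name, value in obj.items():
            if name == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


# Hypothetical usage with the text returned by requests.get:
# data = extract_initial_data(req.text)
# if data is not None:
#     video_ids = list(find_key(data, "videoId"))
```

Searching by key name is slower than indexing into a known path, but it keeps working if the nesting around the video entries changes.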