
Pagination Giving The First Page In Every Iteration

I'm trying to scrape a paginated website, but I get the first page on every iteration. When I click through the pages in the browser, the content is different. url = 'http://www.x.y/z/a-b#/page-%

Solution 1:

You are adding text to the "fragment identifier" (i.e. the part of the URL after the #); see https://www.w3.org/DesignIssues/Fragment.html:

The fragment identifier is a string after URI, after the hash, which identifies something specific as a function of the document. For a user interface Web document such as HTML page, it typically identifies a part or view. For example in the object

RFC 3986 says:

the fragment identifier is separated from the rest of the URI prior to a dereference, and thus the identifying information within the fragment itself is dereferenced solely by the user agent, regardless of the URI scheme. Although this separate handling is often perceived to be a loss of information, particularly for accurate redirection of references as resources move over time, it also serves to prevent information providers from denying reference authors the right to refer to information within a resource selectively. Indirect referencing also provides additional flexibility and extensibility to systems that use URIs, as new media types are easier to define and deploy than new schemes of identification.

So you are adding your index to a part of the URL that is not sent to the server. It is for client-side use only ("dereferenced solely by the user agent"), so the server sees the same URL on every iteration.
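A minimal sketch of this effect using only the standard library. The host and path are the placeholders from the question; the question's URL is cut off, so appending the page number after `page-` is an assumption, and `url_sent_to_server` is a hypothetical helper name.

```python
from urllib.parse import urldefrag

# Placeholder URL from the question. Putting the page number right after
# "page-" is an assumption, since the original URL is truncated.
BASE_URL = "http://www.x.y/z/a-b#/page-"


def url_sent_to_server(url):
    """Return the URL with its fragment removed, as an HTTP client requests it."""
    without_fragment, _fragment = urldefrag(url)
    return without_fragment


# Build three "different" page URLs and strip the fragment from each.
requested = [url_sent_to_server(BASE_URL + str(page)) for page in range(1, 4)]

for page, url in enumerate(requested, start=1):
    print(page, url)
```

All three entries in `requested` come out identical, which is why the response never changes between iterations.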

The way the page is most likely rendered is that there is some JavaScript reading the fragment identifier and making another request to get the data or determining which part of the data to display.

I suggest examining all the requests the page makes, using Live HTTP Headers or a similar tool, to see whether there is a second request you can use directly. Alternatively, use a tool that executes the page's JavaScript, such as Selenium, dryscrape or PyQt5; see my answer to Scraping Google Finance (BeautifulSoup) for details.
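If inspecting the traffic does reveal a second request, the page number has to go into a part of the URL that the server receives, such as the query string. The sketch below only builds such URLs. The endpoint `http://www.x.y/api/items` and the parameter name `page` are purely hypothetical stand-ins, so substitute whatever request you actually observe for your site.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace it with the request you see the page make.
DATA_ENDPOINT = "http://www.x.y/api/items"


def data_url(page, endpoint=DATA_ENDPOINT, param="page"):
    """Build a URL that carries the page number in the query string."""
    # The parameter name "page" is an assumption for illustration only.
    return endpoint + "?" + urlencode({param: page})


# Unlike the fragment version, each of these URLs differs on the server side.
urls = [data_url(page) for page in range(1, 4)]

for url in urls:
    print(url)
```

Each of these URLs can then be fetched with the HTTP client you already use, and the server can tell them apart because the query string is part of the request.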
