Python 3 Web Scraping Error 403

I am trying to get the market cost from this website, but I am unable to get the price from this specific site. I read in other topics that this could happen because I

Solution 1:

Well, this case seems similar to the one in this thread: HTTP error 403 in Python 3 Web Scraping

Stefano states that "This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like python urllib/3.3.0, it's easily detected). Try setting a known browser user agent with:"

Here is code for your example:

import re
from urllib.request import Request, urlopen

# Send a browser-like User-Agent; the default Python-urllib one is probably what gets blocked.
req = Request('http://xiv-market.com/item_details.php?id=2727', headers={'User-Agent': 'Mozilla/5.0'})
htmltext = urlopen(req).read()  # read() returns bytes, so the regex below is a bytes pattern

regex = b'<h2 class="details">Market Cost: <img src="images/gil.png" width="24px" height="23px" style="margin-bottom:-5px;" alt="Gil"/>(.+?)</h2>\n'
pattern = re.compile(regex)

# findall returns a list with the captured price(s) as bytes
price = pattern.findall(htmltext)

print(price)

This seems to work. I also changed the regex slightly to get a result. Hope this helps.
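
The regex above depends on the exact attributes of the img tag, so it would stop matching if the markup changed even slightly. As a rough sketch only, the version below splits fetching from parsing and anchors on the "Market Cost:" label instead. The names fetch and extract_prices, the looser pattern, and the sample markup with its 1,234 value are all made up for illustration; they are not taken from the site.

```python
import re
from urllib.request import Request, urlopen

# Hypothetical, looser pattern: anchor on the label and the closing </h2>,
# and skip an optional <img ...> tag in between, whatever its attributes are.
PRICE_PATTERN = re.compile(rb'Market Cost:\s*(?:<img[^>]*>)?\s*(.+?)\s*</h2>')


def fetch(url, user_agent='Mozilla/5.0'):
    """Download a page as bytes, sending a browser-like User-Agent."""
    req = Request(url, headers={'User-Agent': user_agent})
    with urlopen(req) as resp:
        return resp.read()


def extract_prices(html):
    """Return every captured price in the given HTML bytes, as a list of bytes."""
    return PRICE_PATTERN.findall(html)


# Made-up sample markup shaped like the line the original regex targets.
sample = (b'<h2 class="details">Market Cost: <img src="images/gil.png" '
          b'width="24px" height="23px" style="margin-bottom:-5px;" '
          b'alt="Gil"/>1,234</h2>\n')
print(extract_prices(sample))  # prints [b'1,234']
```

With this split, extract_prices(fetch(url)) would do the same job as the code above, and the parsing step can be checked offline against saved HTML.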

Solution 2:

You would need to know the exact reason you are receiving a 403 error page in order to find a reliable workaround. Many different causes can produce that error. If you want to try to circumvent it by providing user agent data, you need to build a full Request object and include the user agent in its headers.

Example:

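import urllib.request

url = 'http://xiv-market.com/item_details.php?id=2727'  # the URL from the question

# Build a full request so that custom headers can be attached.
req = urllib.request.Request(
    url,
    data=None,
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)

f = urllib.request.urlopen(req)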

Python Documentation
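
Since the real cause of a 403 is not always the user agent, it may help to look at what the server actually sends back before guessing. The sketch below catches urllib.error.HTTPError and prints the status, reason, and Server header. The helper names describe_http_error and try_fetch are hypothetical, and the Server header is only one possible hint; it is not guaranteed to be present or to be meaningful.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Summarise an HTTPError so the cause of the block is easier to guess."""
    headers = err.headers
    server = headers.get('Server', 'unknown') if headers is not None else 'unknown'
    return f'HTTP {err.code} {err.reason} (Server: {server})'


def try_fetch(url, headers=None):
    """Return the page body as bytes, or None after printing the HTTP error."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code, reason phrase and response headers.
        print(describe_http_error(err))
        return None
```

Calling try_fetch(url) once without headers and once with headers={'User-Agent': 'Mozilla/5.0'} would show whether the user agent is what makes the difference for a given site.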
