Python 3 Web Scraping Error 403
Solution 1:
Well, your case seems similar to the one in this thread: HTTP error 403 in Python 3 Web Scraping
Stefano states that "This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like python urllib/3.3.0, it's easily detected). Try setting a known browser user agent with:"
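To see what the quote means, you can print the User-Agent that urllib sends when you don't set one. This sketch relies only on the standard library: `build_opener()` returns the same kind of opener that `urlopen()` uses by default, and its `addheaders` list holds the default header. The real string has the form `Python-urllib/3.X` (major.minor of your interpreter), which is trivial for a server to match.

```python
import urllib.request

# build_opener() returns the same kind of OpenerDirector that urlopen() uses by default
opener = urllib.request.build_opener()

# addheaders lists headers added to every request unless the Request sets them itself
default_ua = dict(opener.addheaders)["User-agent"]
print(default_ua)  # e.g. Python-urllib/3.12, depending on your interpreter
```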
Here is the code adapted to your example:
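from urllib.request import Request, urlopen
import re

# Send a browser-like User-Agent; urllib's default "Python-urllib/3.x" agent is easy to block
htmlfile = Request('http://xiv-market.com/item_details.php?id=2727', headers={'User-Agent': 'Mozilla/5.0'})
htmltext = urlopen(htmlfile).read()  # the response body comes back as bytes

# The page is bytes, so the pattern must be a bytes literal as well
regex = b'<h2 class="details">Market Cost: <img src="images/gil.png" width="24px" height="23px" style="margin-bottom:-5px;" alt="Gil"/>(.+?)</h2>\n'
pattern = re.compile(regex)
price = pattern.findall(htmltext)  # list of captured price strings (bytes)
print(price)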
This appears to work. I also adjusted the regex slightly so that it returns a result. Hope this helps.
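If you want to try the idea without contacting the real site, here is an offline sketch. It assumes, as the quoted answer suggests, that the block is based purely on the User-Agent string: a hypothetical local server (`UserAgentFilter`, standing in for mod_security) refuses any agent starting with `Python-urllib` and otherwise serves a made-up page whose markup matches the regex above (the `1,234` price is invented). The same URL is fetched twice, once with urllib's default agent and once with the `Mozilla/5.0` header, and the regex is then applied to the page.

```python
import re
import threading
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical page body; the markup mirrors the regex above and the price is invented.
SAMPLE_PAGE = (
    b'<h2 class="details">Market Cost: <img src="images/gil.png" width="24px" '
    b'height="23px" style="margin-bottom:-5px;" alt="Gil"/>1,234</h2>\n'
)


class UserAgentFilter(BaseHTTPRequestHandler):
    """Toy stand-in for mod_security: refuse urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib"):
            self.send_error(403, "Forbidden")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(SAMPLE_PAGE)))
        self.end_headers()
        self.wfile.write(SAMPLE_PAGE)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), UserAgentFilter)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/item_details.php?id=2727"

# Ignore any configured HTTP proxy so the request reaches the local server directly.
opener = build_opener(ProxyHandler({}))

# 1) No custom header: urllib sends "Python-urllib/3.x" and the filter answers 403.
try:
    with opener.open(url) as resp:
        default_status = resp.status
except urllib.error.HTTPError as err:
    default_status = err.code

# 2) Same URL with a browser-like User-Agent: the page is served.
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
with opener.open(req) as resp:
    htmltext = resp.read()

regex = (
    b'<h2 class="details">Market Cost: <img src="images/gil.png" width="24px" '
    b'height="23px" style="margin-bottom:-5px;" alt="Gil"/>(.+?)</h2>\n'
)
price = re.findall(regex, htmltext)

server.shutdown()
server.server_close()
print(default_status, price)  # 403 [b'1,234']
```

A real site may check much more than the User-Agent, so a 200 here only shows the mechanism, not that the header is enough everywhere.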
Solution 2:
To find a reliable workaround, you would need to know exactly why the server returns a 403 error page; many different causes can produce that error. If you want to try to get around it by sending user-agent data, build a full Request object and include a User-Agent header in it.
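One way to start finding that reason is to look at what the server sends back with the 403. The sketch below catches `urllib.error.HTTPError` and collects its status, reason, `Server` header and the start of the error page body, which often explains the refusal. `summarize_http_error` and `fetch` are hypothetical helper names; the demo uses a hand-built `HTTPError` with invented header and body values so it runs offline.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def summarize_http_error(err, max_body=200):
    """Collect what the server sent back with an error status such as 403."""
    body = err.read()[:max_body]  # error pages often say why the request was refused
    return {
        "status": err.code,
        "reason": err.reason,
        "server": err.headers.get("Server"),  # may hint at the layer doing the blocking
        "body": body.decode("utf-8", errors="replace"),
    }


def fetch(url, headers=None):
    """Fetch url, turning an HTTPError into a readable diagnostic."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(f"request refused: {summarize_http_error(err)}") from err


# Offline demo with a hand-built 403; the header and body values are made up.
hdrs = Message()
hdrs["Server"] = "Apache"
fake = urllib.error.HTTPError(
    "http://example.com/", 403, "Forbidden", hdrs, io.BytesIO(b"Access denied")
)
info = summarize_http_error(fake)
print(info)
```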
Example:
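import urllib.request

url = 'http://xiv-market.com/item_details.php?id=2727'  # the page from the question

# Build the request explicitly so a browser-like User-Agent header can be attached
req = urllib.request.Request(
    url,
    data=None,  # no request body, so this is sent as a GET
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
f = urllib.request.urlopen(req)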
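If many URLs need the same header, a possible alternative to building a Request each time is to set it once on an opener. This is a sketch using the standard `build_opener`, `addheaders` and `install_opener` APIs; the User-Agent string is the one from the example above, and installing the opener is optional because it changes `urlopen()` for the whole process.

```python
import urllib.request

# Browser-like string from the example above; any realistic browser UA works the same way.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
)

opener = urllib.request.build_opener()
# Replace the default ("User-agent", "Python-urllib/3.x") pair for requests made by this opener
opener.addheaders = [("User-Agent", BROWSER_UA)]

# Optional: make plain urllib.request.urlopen(url) use this opener too (process-wide)
urllib.request.install_opener(opener)

# Later calls then need no explicit Request object, e.g.:
# html = urllib.request.urlopen("http://xiv-market.com/item_details.php?id=2727").read()
```

A Request that sets its own User-Agent still takes precedence, since urllib only adds `addheaders` entries the request does not already have.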