Screenscraping ASPX With Python Mechanize - JavaScript Form Submission
I'm trying to scrape UK food ratings agency data from ASPX search results pages (e.g. http://ratings.food.gov.uk/QuickSearch.aspx?q=po30 ) using Mechanize/Python on ScraperWiki.
Solution 1:
Mechanize doesn't handle JavaScript, but for this particular case it isn't needed.
First we open the result page with mechanize:
import mechanize

url = 'http://ratings.food.gov.uk/QuickSearch.aspx?q=po30'
br = mechanize.Browser()
br.set_handle_robots(False)  # don't let robots.txt block the request
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open(url)
response = br.response().read()
Then we select the ASP.NET form:
br.select_form(nr=0) #Select the first (and only) form - it has no name so we reference by number
The form has 5 submit buttons - we want to submit the one that takes us to the next result page:
response = br.submit(name='ctl00$ContentPlaceHolder1$uxResults$uxNext').read() #"Press" the next submit button
The other submit buttons in the form are:
ctl00$uxLanguageSwitch # Switch language to Welsh
ctl00$ContentPlaceHolder1$uxResults$Button1 # Search submit button
ctl00$ContentPlaceHolder1$uxResults$uxFirst # First result page
ctl00$ContentPlaceHolder1$uxResults$uxPrevious # Previous result page
ctl00$ContentPlaceHolder1$uxResults$uxLast # Last result page
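If you'd rather discover those button names than hard-code them, the standard library's html.parser can list every submit input in the page source. A minimal sketch - the sample_html fragment below is an invented stand-in for the real QuickSearch.aspx markup, not a copy of it:

```python
from html.parser import HTMLParser

class SubmitNameCollector(HTMLParser):
    """Collect the name attribute of every <input type="submit">."""
    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'submit':
            self.names.append(attrs.get('name'))

# Invented fragment mimicking the ASP.NET pager controls:
sample_html = """
<form method="post" action="QuickSearch.aspx?q=po30">
  <input type="submit" name="ctl00$ContentPlaceHolder1$uxResults$uxNext" value="Next" />
  <input type="submit" name="ctl00$ContentPlaceHolder1$uxResults$uxLast" value="Last" />
</form>
"""

collector = SubmitNameCollector()
collector.feed(sample_html)
print(collector.names)
```

Feeding the real response body to the collector instead of sample_html would list the six submit buttons above.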
In mechanize we can get form info like this:
for form in br.forms():
    print(form)
Solution 2:
Mechanize does not handle JavaScript.
There are many ways to handle this, however, including QtWebKit, python-spidermonkey, HtmlUnit (using Jython), or SeleniumRC.
Here is how it might be done with SeleniumRC:
import selenium  # Selenium RC (Selenium 1) Python client

sel = selenium.selenium("localhost", 4444, "*firefox", "http://ratings.food.gov.uk")
sel.start()
sel.open("QuickSearch.aspx?q=po30")
sel.click('ctl00$ContentPlaceHolder1$uxResults$uxNext')
sel.wait_for_page_to_load("30000")  # wait for the postback to finish
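To walk every result page you can keep clicking "next" until the page stops changing. A hedged sketch of that loop, with the Selenium calls abstracted behind two callables so the control flow is testable - get_source and click_next are hypothetical stand-ins for sel.get_html_source and sel.click plus sel.wait_for_page_to_load:

```python
def scrape_all_pages(get_source, click_next, max_pages=100):
    """Collect the HTML of each result page until no new page appears."""
    pages = []
    for _ in range(max_pages):  # hard cap so a broken "next" button can't loop forever
        html = get_source()
        if pages and html == pages[-1]:  # page didn't change: we were on the last one
            break
        pages.append(html)
        click_next()
    return pages

# Demo with a fake three-page result set (the last page repeats when
# "next" no longer advances, as the real pager would):
fake_pages = iter(['page1', 'page2', 'page3', 'page3'])
current = {'html': next(fake_pages)}
pages = scrape_all_pages(
    get_source=lambda: current['html'],
    click_next=lambda: current.update(html=next(fake_pages)),
)
print(pages)  # ['page1', 'page2', 'page3']
```

The same loop body is where you would parse each page's HTML before moving on.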