How To Follow A Redirect With Urllib?

I'm writing a Python 3 script that accesses a page like example.com/daora/zz.asp?x=qqrzzt using urllib.request.urlopen('example.com/daora/zz.asp?x=qqrzzt'), but this cod

Solution 1:

urllib.request follows redirects automatically; you don't need to do anything.
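If you want to check this for yourself, here's a minimal sketch. The function name is made up for illustration, and it assumes you pass a full URL including the scheme (urlopen raises ValueError for a bare 'example.com/...' string). After the call, geturl() reports the URL that was actually retrieved, so if a real HTTP redirect happened, it will differ from the URL you asked for.

```python
import urllib.request


def fetch_following_redirects(url):
    """Fetch url and return (final_url, status, body).

    urlopen follows HTTP 3xx redirects on its own, so final_url is the
    address of the last response in the chain, not necessarily the one
    that was requested.
    """
    with urllib.request.urlopen(url) as response:
        return response.geturl(), response.status, response.read()
```

If final_url comes back identical to the URL you requested, the server never sent an HTTP redirect, and whatever the browser does next is happening client-side.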

The problem here is that there is no redirect to follow. The web page uses JavaScript to fake a form submission as soon as it's loaded. urllib just fetches the page; it doesn't implement a browser DOM or run JavaScript code.

Depending on how general you need your script to be, the simplest solution may be something hacky. For example, if you're just trying to spider 500 pages that all have a similar structure but different details, just find the action of the first form and navigate to that.
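A rough sketch of that hacky approach, using only the standard library. The class and function names are invented for illustration, and it assumes the first form's action attribute on its own is enough to reach the target page; if the form POSTs hidden fields, you'd have to collect and send those too.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstFormParser(HTMLParser):
    """Record the action attribute of the first <form> tag encountered."""

    def __init__(self):
        super().__init__()
        self.seen_form = False
        self.action = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "form" and not self.seen_form:
            self.seen_form = True
            self.action = dict(attrs).get("action")


def first_form_target(page_url, html):
    """Return the absolute URL the first form submits to, or None if no form."""
    parser = FirstFormParser()
    parser.feed(html)
    if not parser.seen_form:
        return None
    # A missing or empty action means the form submits to the page itself.
    return urljoin(page_url, parser.action or "")
```

You would feed it the decoded body of the page you already fetched with urlopen, then call urlopen again on the URL it returns.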

Also, if fetching the pages and processing them are two distinct steps, you may want to write a fetcher in super-simple JavaScript/Greasemonkey (running in the browser, so it already has a working DOM implementation, etc.) and a separate fancy processing script in Python (which just operates on the finally-fetched/generated HTML pages).

If you need to be fully general, the simplest solution is probably to use the Selenium browser automation framework. (Or, maybe, PyWin32 or PyObjC to automate IE or WebKit directly.)
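For the Selenium route, something along these lines might do, assuming the selenium package is installed along with Firefox and its driver (any other supported browser would work the same way). The function name and the 10-second default are arbitrary choices for illustration. The wait is a guess at what "the script has navigated" looks like: it simply watches for the URL to change after the initial load, and it can miss a navigation that completes before the first check.

```python
def fetch_rendered_html(url, timeout=10):
    """Load url in a real browser and return (final_url, html) after scripts run."""
    # Imported here so this sketch can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        loaded_url = driver.current_url
        try:
            # Give the page's script a chance to submit the form and navigate.
            WebDriverWait(driver, timeout).until(
                lambda d: d.current_url != loaded_url
            )
        except TimeoutException:
            pass  # no navigation observed; return the page as it stands
        return driver.current_url, driver.page_source
    finally:
        driver.quit()
```

This is far slower than urllib, since it launches a whole browser per call, so for a large crawl you'd want to reuse one driver across pages.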

If you want the best possible solution, and have infinite resources… write your own implementation of the DOM and hook up your favorite JavaScript interpreter (probably SpiderMonkey or V8). That's only about two-thirds as much work as writing a new browser. (And you may be able to find pieces that get you 80% of the way there. For example, if you're willing to use Jython instead of CPython as your Python interpreter, HtmlUnit is pretty slick.)
