
Given A Big List Of Urls, What Is A Way To Check Which Are Active/inactive?

Suppose I was given this list of URLs: website.com/thispage website.com/thatpage website.com/thispageagain website.com/thatpageagain website.com/morepages ... could possibly be

Solution 1:

Perform a HEAD request on each of them.

Use this library: http://docs.python-requests.org/en/latest/user/quickstart/#make-a-request

requests.head('http://httpbin.org/get').status_code

Solution 2:

Here is an example in Python

import httplib2

h = httplib2.Http()
listUrls = ['http://www.google.com', 'http://www.xkcd.com', 'http://somebadurl.com']

for url in listUrls:
    try:
        # h.request() returns a (response, content) tuple
        response, content = h.request(url)
        if response.status == 200:
            print("UP")
    except httplib2.ServerNotFoundError:
        # Raised when the host name cannot be resolved
        print("DOWN")

Solution 3:

There's an SO answer showing how to perform HEAD requests in Python:

How do you send a HEAD HTTP request in Python 2?
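On Python 3, a HEAD request can also be sent with the standard library alone. The following is a minimal sketch of one way to do that, not the code from the linked answer; the helper name head_status and the 5-second default timeout are assumptions chosen for illustration. It returns the HTTP status code, or None when no HTTP response arrives at all.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=5.0):
    """Send a HEAD request; return the status code, or None if unreachable.

    head_status is a hypothetical helper name, and the timeout is an
    arbitrary illustrative default.
    """
    try:
        # Request() raises ValueError for a URL without a scheme,
        # so it is built inside the try block.
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None
```

URLs need a scheme here: a bare website.com/thispage is treated as malformed and comes back as None, so prefix http:// or https:// first. Some servers reject HEAD (for example with a 405), in which case falling back to a GET may be necessary.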

Solution 4:

Open a pool of threads, have each thread open one URL, and wait for a 200 or a 404. Rinse and repeat.
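The thread-pool idea could be sketched roughly as follows, using concurrent.futures from the Python 3 standard library. The names probe and check_all, the pool size of 16, and the 5-second timeout are all assumptions made for this example; the check function is passed in as a parameter so a different probe (for instance one built on requests or httplib2) can be swapped in.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def probe(url, timeout=5.0):
    """Hypothetical default check: HEAD the URL, return status or None."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404: the server answered, the page is missing
    except (urllib.error.URLError, OSError, ValueError):
        return None  # no HTTP response at all


def check_all(urls, check=probe, max_workers=16):
    """Run check(url) for every URL on a thread pool.

    Returns a dict mapping each URL to whatever check returned.
    max_workers=16 is an arbitrary illustrative pool size.
    """
    urls = list(urls)  # allow generators; the list is iterated twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        statuses = list(pool.map(check, urls))
    return dict(zip(urls, statuses))


if __name__ == "__main__":
    results = check_all(["http://www.google.com", "http://somebadurl.com"])
    for url, status in results.items():
        print(url, "active" if status == 200 else "inactive", status)
```

Threads suit this job because each check spends nearly all its time waiting on the network, so many requests can be in flight at once. Here only a 200 counts as active; whether redirects or other 2xx/3xx codes should count too depends on what "active" means for the list at hand.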
