Use Scrapy + Splash Return Html
Solution 1:
The Splash response does contain some hints:
{'description': 'Error happened while executing Lua script',
'error': 400,
'info': {'error': "bad argument #2 to 'assert' (string expected, got table)",
'line_number': 8,
'message': 'Lua error: [string "..."]:8: bad argument #2 to \'assert\' (string expected, got table)',
'source': '[string "..."]',
'type': 'LUA_ERROR'},
'type': 'ScriptError'}
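If you get this payload back in Python (for example in a Scrapy callback), it can help to reduce it to the few fields that matter. The following is only a sketch: summarize_splash_error is a hypothetical helper name, and the dictionary is copied from the response above.

```python
# Error payload copied from the Splash response above.
splash_error = {
    "description": "Error happened while executing Lua script",
    "error": 400,
    "info": {
        "error": "bad argument #2 to 'assert' (string expected, got table)",
        "line_number": 8,
        "message": "Lua error: [string \"...\"]:8: bad argument #2 to "
                   "'assert' (string expected, got table)",
        "source": '[string "..."]',
        "type": "LUA_ERROR",
    },
    "type": "ScriptError",
}


def summarize_splash_error(payload):
    """Hypothetical helper: condense a Splash error payload into one line."""
    info = payload.get("info") or {}
    return "{} (HTTP {}), Lua line {}: {}".format(
        payload.get("type", "UnknownError"),
        payload.get("error", "?"),
        info.get("line_number", "?"),
        # Prefer the short error text, fall back to the longer fields.
        info.get("error") or info.get("message") or payload.get("description", ""),
    )


summary = summarize_splash_error(splash_error)
print(summary)
```

The line number in the summary refers to the Lua script, so here it points at line 8 of the script that was sent to Splash.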
If you try your script in Splash's web interface (it is your friend!), you get the same error, coming from this line:
assert(splash:runjs("$('#title.play-ball > a:first-child').click()"))
If you change that Lua script a bit to catch the error (by the way, I believe you meant .title.play-ball > a:first-child, because there's no element with id="title"):
function main(splash)
    local url = splash.args.url
    assert(splash:go(url))
    assert(splash:wait(1))
    -- go back 1 month in time and wait a little (1 second)
    -- capture the error from runjs instead of asserting on it
    local ok, err = splash:runjs("$('.title.play-ball > a:first-child').click()")
    assert(splash:wait(1))
    -- return result as a JSON object
    return {
        html = splash:html(),
        error = err
        -- we don't need screenshot or network activity
        --png = splash:png(),
        --har = splash:har(),
    }
end
and run it in the web interface, you get an "error" object in the response (a table, not a string, which is why assert complained about its second argument). It shows:
error: Object
js_error: "ReferenceError: Can't find variable: $"
js_error_message: "Can't find variable: $"
js_error_type: "ReferenceError"
message: "JS error: \"ReferenceError: Can't find variable: $\""
splash_method: "runjs"
type: "JS_ERROR"
It appears that $ is not defined on that website. It works in the Chrome console, for example (which provides its own $ helper), but with Splash you apparently need to load jQuery (or something similar) yourself, usually with splash:autoload. For example:
function main(splash)
    -- load jQuery so that $ is defined on the page
    assert(splash:autoload("https://code.jquery.com/jquery-3.1.1.min.js"))
    local url = splash.args.url
    assert(splash:go(url))
    assert(splash:wait(1))
    -- go back 1 month in time and wait a little (1 second)
    local ok, err = splash:runjs("$('.title.play-ball > a:first-child').click()")
    assert(splash:wait(1))
    -- return result as a JSON object
    return {
        html = splash:html(),
        error = err
        -- we don't need screenshot or network activity
        --png = splash:png(),
        --har = splash:har(),
    }
end
Note that this JavaScript click still did not work for me with Splash (the screenshot did not show the "History" content).
However, the following script did work in the web interface: the "History" content showed up in the PNG screenshot (which is commented out here):
function main(splash)
    -- no need to load jQuery when you use splash:select
    --assert(splash:autoload("https://code.jquery.com/jquery-3.1.1.min.js"))
    local url = splash.args.url
    assert(splash:go(url))
    assert(splash:wait(15))
    local element = splash:select('.title.play-ball > a:first-child')
    local bounds = element:bounds()
    assert(element:mouse_click{x=bounds.width/2, y=bounds.height/2})
    assert(splash:wait(5))
    -- return result as a JSON object
    return {
        html = splash:html(),
        -- we don't need screenshot or network activity
        --png = splash:png(),
        --har = splash:har(),
    }
end
Indeed, Splash 2.3 has helpers for that kind of interaction (e.g. clicking on an element). See for example splash:select and element:mouse_click.
Also note that I increased the wait() values.
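To run the working script outside the web interface, you can send it to Splash's execute endpoint as a JSON body. The sketch below only builds the request with the standard library; it assumes Splash is listening on http://localhost:8050, and build_execute_request is a hypothetical helper name.

```python
import json
from urllib.request import Request

# Assumption: a local Splash instance on its default port.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"

# Same approach as the script above: click via splash:select, no jQuery.
LUA_SCRIPT = """
function main(splash)
    assert(splash:go(splash.args.url))
    assert(splash:wait(15))
    local element = splash:select('.title.play-ball > a:first-child')
    local bounds = element:bounds()
    assert(element:mouse_click{x=bounds.width/2, y=bounds.height/2})
    assert(splash:wait(5))
    return {html = splash:html()}
end
"""


def build_execute_request(page_url, lua_source=LUA_SCRIPT,
                          endpoint=SPLASH_EXECUTE_URL):
    """Hypothetical helper: build a POST request for the execute endpoint."""
    # "url" becomes splash.args.url inside the Lua script.
    body = json.dumps({"lua_source": lua_source, "url": page_url})
    return Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request("http://example.com/")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen and decoding the JSON response should give you a dictionary with an "html" key, matching the table the Lua script returns. In a Scrapy project, scrapy-splash's SplashRequest with endpoint='execute' and args={'lua_source': ...} plays the same role.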
Solution 2:
You need to "quote" your script before you pass it to Splash:
from urllib.parse import quote

script = """Your script"""
script = quote(script)
# 'Your%20script'
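The answer does not say how the script is passed, so the following sketch assumes the common case where quoting matters: sending lua_source as a GET query parameter to the execute endpoint. The host, port, and the build_execute_url helper are assumptions for illustration.

```python
from urllib.parse import quote, unquote

# Assumption: a local Splash instance on its default port.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"


def build_execute_url(lua_source, page_url, endpoint=SPLASH_EXECUTE_URL):
    """Hypothetical helper: put a quoted Lua script into a GET URL."""
    # safe="" also escapes "/", so nothing in the script or the target
    # URL can be mistaken for part of the query-string structure.
    return "{}?lua_source={}&url={}".format(
        endpoint,
        quote(lua_source, safe=""),
        quote(page_url, safe=""),
    )


script = "function main(splash)\n  return 'ok'\nend"
execute_url = build_execute_url(script, "http://example.com/")
print(execute_url)
```

Without quoting, the spaces, newlines, and characters such as & or = inside the Lua source would break the query string. If you POST the script as JSON instead, this quoting step should not be needed.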