Is It OK for Scrapy's request_fingerprint Method to Return None?
I'd like to override Scrapy's default RFPDupeFilter class as follows:

    from scrapy.dupefilters import RFPDupeFilter

    class URLDupefilter(RFPDupeFilter):
        def request_fingerprint
Solution 1:
If you look at the request_seen() method of the RFPDupeFilter class, you can see how Scrapy compares fingerprints:
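    def request_seen(self, request):
        fp = self.request_fingerprint(request)
        # A fingerprint already in the set marks the request as a duplicate.
        if fp in self.fingerprints:
            return True
        # Otherwise remember the fingerprint for later requests.
        self.fingerprints.add(fp)
        # Optionally persist the fingerprint to a file.
        if self.file:
            self.file.write(fp + os.linesep)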
The key expression is fp in self.fingerprints. In your case this resolves to None in {None}, since your fingerprint is None and self.fingerprints is a set object. This is valid Python and evaluates correctly.
So yes, you can return None.
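The membership check can be tried in isolation. The following is a minimal sketch, not Scrapy's code: the module-level fingerprints set and the simplified request_seen function are stand-ins that mirror only the set logic quoted above, with the file logging left out.

```python
# Stand-in for self.fingerprints in the dupefilter (an assumption for this sketch).
fingerprints = set()


def request_seen(fp):
    """Mirror the set-membership logic of the quoted method, without file logging."""
    if fp in fingerprints:
        return True
    fingerprints.add(fp)
    return False


first = request_seen(None)   # None is not in the set yet
second = request_seen(None)  # None was added by the first call
print(first, second)  # False True
```

None is hashable, so it behaves like any other set member: the first call stores it and every later None fingerprint is reported as already seen.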
Edit: However, this will let the first XML response through, since the fingerprints set will not contain the None fingerprint yet. Ideally, you should also override the request_seen method in your dupefilter so that it simply returns False if the fingerprint is None.
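One way to apply that suggestion might look like the sketch below. Everything in it is an assumption made for illustration: QuotedDupeFilter is a hypothetical stand-in that reproduces the logic quoted above (in a real project the base class would be Scrapy's RFPDupeFilter), requests are represented by plain URL strings, and the ".xml" rule in request_fingerprint is a placeholder rather than the asker's actual method.

```python
import os


class QuotedDupeFilter:
    """Hypothetical stand-in for the base class, reproducing the quoted logic."""

    def __init__(self):
        self.fingerprints = set()
        self.file = None  # no persistence in this sketch

    def request_fingerprint(self, url):
        return url

    def request_seen(self, url):
        fp = self.request_fingerprint(url)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        if self.file:
            self.file.write(fp + os.linesep)
        return False


class NoneAwareDupeFilter(QuotedDupeFilter):
    def request_fingerprint(self, url):
        # Placeholder rule (an assumption): URLs ending in ".xml" get no fingerprint.
        if url.endswith(".xml"):
            return None
        return url

    def request_seen(self, url):
        # A None fingerprint means "never treat this request as a duplicate".
        if self.request_fingerprint(url) is None:
            return False
        return super().request_seen(url)


dupefilter = NoneAwareDupeFilter()
results = [dupefilter.request_seen(u) for u in ["a.xml", "a.xml", "page", "page"]]
print(results)  # [False, False, False, True]
```

Returning early also keeps None away from the fp + os.linesep line in the quoted method, which would raise a TypeError for a None fingerprint whenever self.file is set.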