How To Access All Scraped Items In Scrapy Item Pipeline?
Solution 1:
I think signals might help. I did something similar here:
https://github.com/dm03514/CraigslistGigs/blob/master/craigslist_gigs/pipelines.py
It seems kind of hacky, but in your spider you can create a property that stores all your scraped items. In your pipeline you can register a method to be called on the spider_closed signal. That method receives the spider instance as a parameter, so you can then access the spider property that contains all your scraped items.
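A minimal sketch of that approach, assuming the spider stores items in a list attribute (here called items_scraped, which is just an example name):

from scrapy import signals

class ItemsCollectorPipeline:
    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_closed, signal=signals.spider_closed)
        return pipeline

    def process_item(self, item, spider):
        # Accumulate every processed item on the spider itself
        spider.items_scraped = getattr(spider, 'items_scraped', [])
        spider.items_scraped.append(item)
        return item

    def spider_closed(self, spider):
        # Runs once when the spider finishes; every collected item is available here
        print(len(spider.items_scraped), "items collected")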
Solution 2:
This pipeline will make sure all Items have a rank.
class MyPipeline(object):
    def process_item(self, item, spider):
        item['rank'] = item.get('rank') or '1'
        return item
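Like any pipeline, it only runs if it is enabled in settings.py; a minimal sketch, assuming the class lives in myproject/pipelines.py:

ITEM_PIPELINES = {
    'myproject.pipelines.MyPipeline': 300,
}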
Solution 3:
You can collect all scraped items using Extensions and Signals.
from scrapy import signals

class ItemCollectorExtension:
    def __init__(self):
        self.items = []

    @classmethod
    def from_crawler(cls, crawler):
        extension = cls()
        crawler.signals.connect(extension.add_item, signal=signals.item_scraped)
        crawler.signals.connect(extension.spider_closed, signal=signals.spider_closed)
        return extension

    def spider_closed(self):
        print(self.items)  # Replace with your code

    def add_item(self, item):
        self.items.append(item)
Now, every time a new item is successfully scraped, it is added to self.items. When all items have been collected and the spider is closing, the spider_closed function is called; here you can access all the collected items. Don't forget to enable the Extension in settings.py.
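For example (a sketch, assuming the extension is defined in myproject/extensions.py):

EXTENSIONS = {
    'myproject.extensions.ItemCollectorExtension': 500,
}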