Scrapy: command to overwrite previous export file
Set-up
I export my data to a .csv file with the standard command in Terminal (macOS), e.g.
scrapy crawl spider -o spider_output.csv
Problem
When exporting a new spider_output.csv, Scrapy appends the new data to the existing spider_output.csv.
I can think of two solutions:
- Command Scrapy to overwrite instead of append
- Command Terminal to remove the existing spider_output.csv prior to crawling
I've read that (to my surprise) Scrapy currently isn't able to do option 1. Some people have proposed workarounds, but I can't seem to get them to work.
I've found an answer for solution 2, but can't get it to work either.
Can somebody help me? Perhaps there is a third solution I haven't thought of?
Solution 1:[1]
There is an open issue with scrapy for this feature: https://github.com/scrapy/scrapy/issues/547
There are some solutions proposed in the issue thread:
scrapy runspider spider.py -t json --nolog -o - > out.json
Or just delete the output file before running the spider:
rm data.jl; scrapy crawl myspider -o data.jl
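One caveat with the rm-then-crawl one-liner: plain rm exits with an error on the very first run, when the output file does not exist yet. rm -f suppresses that error and succeeds either way. A minimal sketch (file name borrowed from the command above):

```shell
#!/bin/sh
# Plain "rm data.jl" errors out when data.jl does not exist yet
# (e.g. on the very first crawl). "rm -f" deletes the file if present
# and exits with status 0 either way.
touch data.jl      # simulate a previous export
rm -f data.jl      # file existed: removed
rm -f data.jl      # file absent: still succeeds
echo "removed: $?" # prints "removed: 0"
```

With -f the one-liner becomes rm -f data.jl; scrapy crawl myspider -o data.jl and works on the first run too.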
Solution 2:[2]
- option -t defines the file format (json, csv, ...)
- option -o FILE dumps scraped items into FILE (use - for stdout)
- > filename pipes the output into filename

Altogether, to replace the previous export file instead of appending:
scrapy crawl spider -t csv -o - >spider.csv
or for json format:
scrapy crawl spider -t json -o - >spider.json
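The overwrite behaviour here comes from the shell, not from Scrapy: the > operator truncates the target file before writing, so each run replaces the previous contents. A small demonstration, with printf standing in for the scrapy crawl ... -o - command (the CSV rows are made up):

```shell
#!/bin/sh
# ">" truncates spider.csv before each write, so the second "run"
# replaces the first rather than appending to it.
printf 'title,price\nfoo,1\n' > spider.csv   # first run
printf 'title,price\nbar,2\n' > spider.csv   # second run: old rows are gone
cat spider.csv                               # only the second run's rows remain
```

Using >> instead would append, which is effectively what Scrapy's own -o flag does.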
Solution 3:[3]
Use the capital -O flag (available since Scrapy 2.4), which overwrites the output file instead of appending:
scrapy crawl spider -O spider_output.csv
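The -O flag corresponds to the per-feed overwrite option of the FEEDS setting (also Scrapy 2.4+), so the same behaviour can be made the default in settings.py instead of passing a flag each time. A sketch, with the file name and format assumed:

```python
# settings.py sketch (Scrapy 2.4+). The per-feed "overwrite" flag is what
# the capital -O switch sets; the file name and format here are assumptions.
FEEDS = {
    "spider_output.csv": {
        "format": "csv",
        "overwrite": True,  # replace the file on each run instead of appending
    },
}
```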
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Granitosaurus |
| Solution 2 | |
| Solution 3 | Suraj Rao |