neo@MacBook-Pro ~/Documents/crawler % scrapy
Scrapy 1.4.0 - project: crawler
Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command
neo@MacBook-Pro ~/Documents % scrapy startproject crawler
New Scrapy project 'crawler', using template directory '/usr/local/lib/python3.6/site-packages/scrapy/templates/project', created in:
    /Users/neo/Documents/crawler

You can start your first spider with:
    cd crawler
    scrapy genspider example example.com
neo@MacBook-Pro ~/Documents/crawler % scrapy genspider netkiller netkiller.cn
Created spider 'netkiller' using template 'basic' in module:
crawler.spiders.netkiller
neo@MacBook-Pro ~/Documents/crawler % scrapy list
bing
book
example
netkiller
neo@MacBook-Pro ~/Documents/crawler % scrapy crawl netkiller
Write the crawl results to a JSON file:
neo@MacBook-Pro ~/Documents/crawler % scrapy crawl netkiller -o output.json
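Once the crawl has finished, the exported file can be checked from Python. The following is a minimal sketch, not part of Scrapy itself: the helper name `load_items` is hypothetical, and it assumes the feed is a single JSON array of items, as the `.json` export normally produces. It also assumes that a crawl yielding no items may leave the file empty, and treats that case as an empty list.

```python
import json
from pathlib import Path


def load_items(path):
    """Return the items stored in a JSON feed file (hypothetical helper).

    Assumes the file holds one JSON array. An empty file is treated as
    "no items" instead of a parse error (an assumption for this sketch).
    """
    text = Path(path).read_text(encoding="utf-8")
    if not text.strip():
        return []
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items


if __name__ == "__main__":
    feed = Path("output.json")  # file name taken from the command above
    if feed.exists():
        items = load_items(feed)
        print(f"{len(items)} items scraped")
        for item in items[:3]:  # show the first few items only
            print(item)
    else:
        print("output.json not found; run the crawl first")
```

Note that `-o` appends to an existing file in this Scrapy version, so running the crawl twice against the same `output.json` can leave invalid JSON that `json.loads` rejects. Deleting the file before re-running avoids this.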