HTML resource sniffing with Scrapy
Before we use the Scrapy shell, it is best to install IPython first. It lets us use Tab to complete commands inside the Python shell.
pip install ipython
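Now let's start scraping a website.
First, change into our project directory, then launch the Scrapy shell against the target URL: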
root@uliweb:~/spider/boge# pwd
/root/spider/boge
root@uliweb:~/spider/boge# scrapy shell [external URL blocked]
2014-06-04 08:22:37+0800 [scrapy] INFO: Scrapy 0.22.2 started (bot: boge)
2014-06-04 08:22:37+0800 [scrapy] INFO: Optional features available: ssl, http11
2014-06-04 08:22:37+0800 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'boge.spiders', 'SPIDER_MODULES': ['boge.spiders'], 'LOGSTATS_INTERVAL': 0, 'BOT_NAME': 'boge'}
2014-06-04 08:22:37+0800 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-06-04 08:22:37+0800 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware,
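Once the shell has fetched the page, the "sniffing" step comes down to pulling resource URLs out of the HTML. The sketch below illustrates that idea using only the Python standard library (html.parser), not Scrapy's own selector API, so it can be run without a project. The names ResourceSniffer, sniff_resources, RESOURCE_ATTRS and the example.com URLs are all hypothetical and chosen only for illustration; inside the Scrapy shell we would normally run XPath queries against the fetched page instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute that usually carries a resource URL.
# Illustrative only; a real crawl may care about more tags.
RESOURCE_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}


class ResourceSniffer(HTMLParser):
    """Collects (tag, absolute_url) pairs from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL.
                self.resources.append((tag, urljoin(self.base_url, value)))


def sniff_resources(html, base_url):
    """Return every resource URL found in html, in document order."""
    parser = ResourceSniffer(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


if __name__ == "__main__":
    # Hypothetical page content and URL, standing in for a fetched response.
    sample = '<a href="/list">list</a><img src="pic/1.jpg"/>'
    for tag, url in sniff_resources(sample, "http://example.com/dir/"):
        print(tag, url)
```

Resolving each link with urljoin matters because most pages mix absolute and relative paths, and a downloader needs the full URL.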