Python Web Scraping: Installing Scrapy

I. Introduction to Scrapy

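Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python,
used to crawl websites and extract structured data from their pages.
It has a wide range of uses, including data mining, monitoring, and automated testing.
It also has an adorable nickname, 小抓抓 ("Little Grabby").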

The rest of this post walks through the steps for installing Scrapy and the pitfalls I ran into along the way.

II. Installation Steps

First install pip, Python's package tool. It plays roughly the same role that CocoaPods plays in iOS development.

1. $ sudo easy_install pip

Once pip is installed, run the following command:

2. $ sudo pip install Scrapy
    DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling six-1.4.1:
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 778, in install
    requirement.uninstall(auto_confirm=True)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 754, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 131, in copy2
    copystat(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 103, in copystat
    os.chflags(dst, st.st_flags)
OSError: [Errno 1] Operation not permitted: '/tmp/pip-QfQY7O-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'

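The error shows that pip could not uninstall the six 1.4.1 package bundled with the Python 2.7 that ships with macOS: moving its files out of /System failed with "Operation not permitted", even under sudo. Rather than fight the system Python, install a separate Python: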
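To tell whether a given interpreter is the one bundled with macOS, a small check along these lines may help. It is only a sketch: the helper name `is_system_python` is made up for illustration, and it assumes the bundled Python lives under /System/Library/Frameworks, as the paths in the traceback above suggest.

```python
import sys


def is_system_python(prefix):
    """Return True if `prefix` looks like the Python bundled with macOS.

    Assumption: the bundled interpreter is installed under
    /System/Library/Frameworks/Python.framework, as in the traceback above.
    """
    return prefix.startswith("/System/Library/Frameworks/Python.framework")


if __name__ == "__main__":
    # sys.prefix is the install root of the interpreter running this script.
    if is_system_python(sys.prefix):
        print("System Python detected: install a separate Python first")
    else:
        print("Not the system Python: %s" % sys.prefix)
```

Running it with each `python` on your PATH shows which one pip would be modifying.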

3. $ brew install python

(If you have not installed brew yet, install it first with the following command.)

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" 

After Python installs successfully, you should see:

==> **Summary**
🍺  /usr/local/Cellar/python/2.7.13: 6,337 files, 86.7M, built in 3 minutes 19 seconds

4. $ sudo pip install Scrapy

If you see output like the following, lxml failed to install:

Command "/usr/local/opt/python/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-build-keKznw/lxml/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-FEkrEy-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-build-keKznw/lxml/

Reinstall lxml with the following three steps:

1. xcode-select --install
2. export C_INCLUDE_PATH=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2/libxml:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include
3. sudo pip install lxml

5. After that succeeds, run the install again: $ sudo pip install Scrapy
Successfully built Twisted PyDispatcher lxml pycparser
Installing collected packages: cssselect, six, w3lib, queuelib, zope.interface, constantly, incremental, Twisted, PyDispatcher, lxml, enum34, ipaddress, idna, pycparser, cffi, cryptography, pyOpenSSL, parsel, Scrapy
Successfully installed PyDispatcher-2.0.5 Scrapy-1.3.0 Twisted-16.6.0 cffi-1.9.1 constantly-15.1.0 cryptography-1.7.1 cssselect-1.0.0 enum34-1.1.6 idna-2.2 incremental-16.10.1 ipaddress-1.0.17 lxml-3.7.1 parsel-1.1.0 pyOpenSSL-16.2.0 pycparser-2.17 queuelib-1.4.2 six-1.10.0 w3lib-1.16.0 zope.interface-4.3.3
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
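The C_INCLUDE_PATH value in the lxml step above is tied to MacOSX10.11.sdk, so it will not match machines with a different SDK. As a rough sketch, the same three entries can be built from any SDK root. The function name `libxml2_include_path` and the placeholder default path are hypothetical; `xcrun --show-sdk-path` should print the real SDK root on a Mac with the command line tools installed.

```python
import os
import posixpath


def libxml2_include_path(sdk_root):
    """Build a C_INCLUDE_PATH value for the libxml2 headers inside an SDK.

    The three entries mirror the ones used in the lxml step; only the SDK
    root is a parameter instead of being fixed to MacOSX10.11.sdk.
    """
    include = posixpath.join(sdk_root, "usr", "include")
    entries = [
        posixpath.join(include, "libxml2"),
        posixpath.join(include, "libxml2", "libxml"),
        include,
    ]
    return ":".join(entries)


if __name__ == "__main__":
    # Placeholder default (an assumption); pass the real SDK root via SDKROOT.
    sdk = os.environ.get("SDKROOT", "/path/to/MacOSX.sdk")
    print("export C_INCLUDE_PATH=" + libxml2_include_path(sdk))
```

The printed line can be pasted into the shell in place of the hardcoded path.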

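At this point Scrapy is installed. (The sources Scrapy is installed from are blocked by the firewall in mainland China, so it is best to use a VPN during installation.)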
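A quick way to double-check the result is to try importing the main packages from the pip log. The snippet below is a minimal sketch, not part of Scrapy itself: `check_imports` is a hypothetical helper, and it assumes a package that exposes a version does so through a `__version__` attribute.

```python
import importlib


def check_imports(names):
    """Try to import each module; map its name to a version string or None."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # None marks a module that could not be imported.
            results[name] = None
            continue
        # Not every package defines __version__, so fall back to "unknown".
        results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    for name, version in sorted(check_imports(["scrapy", "lxml", "twisted"]).items()):
        print("%s: %s" % (name, version if version is not None else "NOT INSTALLED"))
```

Run it with the same Python that pip installed into; any line reporting NOT INSTALLED points at the package to reinstall.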
