Python Web Scraping: Installing Scrapy

I. Introduction to Scrapy

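Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python,
used to crawl websites and extract structured data from their pages.
It has a wide range of uses, including data mining, monitoring, and automated testing.
It also has an irresistibly cute nickname in Chinese: 小抓抓 ("Little Grabby").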

The rest of this post walks through installing Scrapy on a Mac and the pitfalls I ran into along the way.

II. Installation Steps

First install pip, Python's package manager. It plays roughly the same role as CocoaPods does in iOS development.

1. $ sudo easy_install pip

Once that succeeds, run the following command:

2. $ sudo pip install Scrapy

    DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling six-1.4.1:
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 778, in install
    requirement.uninstall(auto_confirm=True)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 754, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 131, in copy2
    copystat(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 103, in copystat
    os.chflags(dst, st.st_flags)
OSError: [Errno 1] Operation not permitted: '/tmp/pip-QfQY7O-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'

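The traceback shows pip failing with "Operation not permitted" while it tries to uninstall the six 1.4.1 that ships with the Mac's built-in Python 2.7 under /System/Library. pip cannot replace that copy even with sudo, most likely because of System Integrity Protection, so the simplest fix is to install a separate Python: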
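
To tell which interpreter a given `pip` or `python` belongs to, a small check like the one below can help. It is only a sketch: the helper name `is_system_python` and the list of prefixes are assumptions made for illustration, based on the paths that appear in the traceback above, and are not part of pip or Scrapy.

```python
import os
import sys

# Assumed prefixes for the Python bundled with macOS (taken from the
# traceback above); adjust them if your system is laid out differently.
SYSTEM_PREFIXES = ("/System/Library/", "/usr/bin/")


def is_system_python(path):
    """Return True if the path looks like it belongs to the bundled Python."""
    return path.startswith(SYSTEM_PREFIXES)


if __name__ == "__main__":
    # Resolve symlinks so that e.g. /usr/local/bin/python maps to its real target.
    executable = os.path.realpath(sys.executable)
    kind = "system" if is_system_python(executable) else "non-system"
    print("%s looks like a %s Python" % (executable, kind))
```

If this reports a system Python, packages such as six under /System/Library are the ones pip will try, and fail, to replace.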

3. $ brew install python

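(If you have not installed Homebrew yet, install it first. The command I used is shown below; the installer changes over time, so check the Homebrew site for the current one.)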

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
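
Before running the installer, it may be worth checking whether `brew` is already on your PATH. The sketch below assumes Python 3.3 or later (for `shutil.which`); `find_tool` is a hypothetical helper name used only for this example.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    for tool in ("brew", "pip"):
        location = find_tool(tool)
        if location is None:
            print("%s: not found on PATH" % tool)
        else:
            print("%s: %s" % (tool, location))
```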

After Python is installed successfully, Homebrew prints:

==> **Summary**
🍺  /usr/local/Cellar/python/2.7.13: 6,337 files, 86.7M, built in 3 minutes 19 seconds

4. $ sudo pip install Scrapy

If you see the message below, the lxml installation has failed:

Command "/usr/local/opt/python/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-build-keKznw/lxml/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-FEkrEy-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-build-keKznw/lxml/

Reinstall lxml with the following three steps:

1.xcode-select --install  
2.C_INCLUDE_PATH=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2//libxml:/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include  
3.sudo pip install lxml
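
The path in step 2 hard-codes MacOSX10.11.sdk, which will differ on other macOS versions. As a rough sketch, the same value can be derived from whatever SDK Xcode reports through `xcrun --show-sdk-path`. The function names here are made up for this example. Depending on your shell and sudo configuration, the variable may also need to be exported, or passed on the same line as the pip command, before the lxml build can see it.

```python
import posixpath
import subprocess


def libxml2_include_path(sdk_root):
    """Build the C_INCLUDE_PATH value from step 2 for a given SDK root."""
    include = posixpath.join(sdk_root, "usr", "include")
    parts = [
        posixpath.join(include, "libxml2"),
        posixpath.join(include, "libxml2", "libxml"),
        include,
    ]
    return ":".join(parts)


def current_sdk_root():
    """Ask xcrun for the active SDK path (only works on macOS with Xcode tools)."""
    output = subprocess.check_output(["xcrun", "--show-sdk-path"])
    return output.decode("utf-8").strip()


if __name__ == "__main__":
    try:
        print("export C_INCLUDE_PATH=" + libxml2_include_path(current_sdk_root()))
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not locate an SDK via xcrun: %s" % exc)
```

The printed line can be pasted into the shell before running the lxml install in step 3.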

5. Once that succeeds, run $ sudo pip install Scrapy again

Successfully built Twisted PyDispatcher lxml pycparser
Installing collected packages: cssselect, six, w3lib, queuelib, zope.interface, constantly, incremental, Twisted, PyDispatcher, lxml, enum34, ipaddress, idna, pycparser, cffi, cryptography, pyOpenSSL, parsel, Scrapy
Successfully installed PyDispatcher-2.0.5 Scrapy-1.3.0 Twisted-16.6.0 cffi-1.9.1 constantly-15.1.0 cryptography-1.7.1 cssselect-1.0.0 enum34-1.1.6 idna-2.2 incremental-16.10.1 ipaddress-1.0.17 lxml-3.7.1 parsel-1.1.0 pyOpenSSL-16.2.0 pycparser-2.17 queuelib-1.4.2 six-1.10.0 w3lib-1.16.0 zope.interface-4.3.3
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

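At this point the installation is complete. (The package source that Scrapy is installed from is blocked by the firewall, so it is best to use a VPN during installation.)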
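
To confirm the install from Python itself, a quick import check is usually enough. This is a minimal sketch, not an official verification step; `module_version` is a helper name invented for this example, and it relies on the module exposing a `__version__` attribute, which Scrapy does.

```python
import importlib


def module_version(name):
    """Return the module's __version__, "unknown" if it has none, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = module_version("scrapy")
    if version is None:
        print("Scrapy is not importable from this interpreter")
    else:
        print("Scrapy version: %s" % version)
```

Run it with the same interpreter that pip installed into (here, the Homebrew Python); a "not importable" result usually means pip and python point at different installations.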