Architecture overview

<h3>Architecture overview</h3>
This document describes the architecture of Scrapy and how its components interact.


<h3>Data flow</h3>

The data flow in Scrapy is controlled by the execution engine and goes like this:

  1. The Engine gets the initial requests to crawl from the Spider.
  2. The Engine schedules the requests in the Scheduler and asks for the next requests to crawl.
  3. The Scheduler returns the next requests to the Engine.
  4. The Engine sends the requests to the Downloader, passing through the Downloader Middlewares (see process_request()).
  5. Once the page finishes downloading, the Downloader generates a response (with that page) and sends it to the Engine, passing through the Downloader Middlewares (see process_response()).
  6. The Engine receives the response from the Downloader and sends it to the Spider for processing, passing through the Spider Middleware (see process_spider_input()).
  7. The Spider processes the response and returns the scraped items and new requests to the Engine, passing through the Spider Middleware (see process_spider_output()).
  8. The Engine sends the processed items to the Item Pipelines, then sends the processed requests to the Scheduler and asks for the next requests to crawl.
  9. The process repeats (from step 1) until there are no more requests from the Scheduler.
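
The steps above can be sketched as a small loop in plain Python. This is a toy model for illustration only, not Scrapy's implementation: the names `run_crawl`, `fake_download`, `parse`, `pipeline` and the `FAKE_SITE` data are all hypothetical, the middlewares are left out, and the `seen` set is an added assumption so the loop terminates when pages link back to each other.

```python
from collections import deque

# Hypothetical in-memory "website" so the sketch runs without network access.
FAKE_SITE = {
    "page1": {"title": "First", "links": ["page2", "page3"]},
    "page2": {"title": "Second", "links": ["page1"]},
    "page3": {"title": "Third", "links": []},
}


def fake_download(url):
    """Stand-in for the Downloader: returns the 'response' for a URL."""
    return FAKE_SITE[url]


def parse(url, response):
    """Stand-in for a Spider callback: yields one item, then new requests."""
    yield {"url": url, "title": response["title"]}
    for link in response["links"]:
        yield link  # a plain string plays the role of a new request


def pipeline(item):
    """Stand-in for an Item Pipeline: post-processes each scraped item."""
    return {**item, "title": item["title"].upper()}


def run_crawl(start_urls, download, parse, pipeline):
    """Toy engine loop that follows steps 1 to 9 of the data flow."""
    scheduler = deque(start_urls)   # steps 1-2: initial requests are scheduled
    seen = set(start_urls)          # assumption: skip requests already queued
    items = []
    while scheduler:                # step 9: repeat until no requests remain
        url = scheduler.popleft()   # step 3: the scheduler hands over a request
        response = download(url)    # steps 4-5: the downloader returns a response
        for result in parse(url, response):      # steps 6-7: the spider runs
            if isinstance(result, dict):
                items.append(pipeline(result))   # step 8: items go to the pipeline
            elif result not in seen:
                seen.add(result)
                scheduler.append(result)         # step 8: requests go to the scheduler
    return items


if __name__ == "__main__":
    for item in run_crawl(["page1"], fake_download, parse, pipeline):
        print(item)
```

In this sketch the engine is just the `while` loop: it owns no crawling logic of its own and only moves requests, responses and items between the other parts, which is the role the data flow describes.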

<h3>Components</h3>
<h4>Scrapy Engine</h4>
The engine is responsible for controlling the data flow between all components of the system and for triggering events when certain actions occur. See the data flow above for more details.
<h4>Scheduler</h4>
The Scheduler receives requests from the engine and enqueues them, then feeds them back to the engine later when the engine asks for them.
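
As a rough illustration of that enqueue-then-feed-back contract, here is a minimal in-memory sketch. `ToyScheduler` is a hypothetical class, not Scrapy's scheduler; the method names are chosen to resemble the ones Scrapy's scheduler exposes, and the first-in, first-out ordering is a simplifying assumption (real queueing and prioritisation are more involved).

```python
from collections import deque


class ToyScheduler:
    """Hypothetical in-memory scheduler: stores requests until asked for them."""

    def __init__(self):
        self._queue = deque()

    def enqueue_request(self, request):
        """Called by the engine to hand over a request for later."""
        self._queue.append(request)

    def next_request(self):
        """Called by the engine; returns the oldest request, or None if empty."""
        return self._queue.popleft() if self._queue else None

    def has_pending_requests(self):
        """True while at least one request is still waiting."""
        return bool(self._queue)


if __name__ == "__main__":
    scheduler = ToyScheduler()
    scheduler.enqueue_request("https://example.com/a")
    scheduler.enqueue_request("https://example.com/b")
    while scheduler.has_pending_requests():
        print(scheduler.next_request())
```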
<h4>Downloader</h4>
The Downloader is responsible for fetching web pages and feeding them to the engine, which in turn feeds them to the spiders.
<h4>Spiders</h4>
Spiders are custom classes written by Scrapy users to parse responses and extract items from them or additional requests to follow.
<h4>Item Pipeline</h4>
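The Item Pipeline processes the items once the spiders have extracted them. Typical tasks include cleansing, validation and persistence.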
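
A pipeline component in Scrapy is a plain class with a `process_item(self, item, spider)` method. The sketch below shows the three tasks in one place, under stated assumptions: `CleanValidateStorePipeline` and the `title` field are hypothetical, the `stored` list stands in for a real database, and `DropItem` is a local stand-in for `scrapy.exceptions.DropItem` so that the example runs without Scrapy installed.

```python
class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem (keeps the sketch self-contained)."""


class CleanValidateStorePipeline:
    """Hypothetical pipeline: cleans, validates, then stores each item."""

    def __init__(self):
        self.stored = []  # assumption: a list stands in for a database

    def process_item(self, item, spider):
        # Cleansing: strip surrounding whitespace from the title.
        title = (item.get("title") or "").strip()
        # Validation: discard items that have no usable title.
        if not title:
            raise DropItem("missing title")
        cleaned = {**item, "title": title}
        # Persistence: keep the cleaned item.
        self.stored.append(cleaned)
        return cleaned


if __name__ == "__main__":
    pipeline = CleanValidateStorePipeline()
    print(pipeline.process_item({"title": "  Hello  "}, spider=None))
    try:
        pipeline.process_item({"title": "   "}, spider=None)
    except DropItem as exc:
        print("dropped:", exc)
```

Returning the item lets a later pipeline stage continue with it, while raising the drop exception stops processing for that item.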
