1 vote · 1 answer · 94 views
My version of Scrapy is 2.11.0. I am learning Scrapy, and the example code they give to try is this: from pathlib import Path import scrapy class QuotesSpider(scrapy.Spider): name = "...
asked by crawlingdev
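
The truncated snippet matches the first-steps spider from the official Scrapy tutorial; for context, a reconstruction of roughly what that example looks like in full (a sketch based on the Scrapy 2.11 docs, not the asker's exact file):

    from pathlib import Path
    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com/page/1/"]

        def parse(self, response):
            # Save each fetched page to a local HTML file.
            page = response.url.split("/")[-2]
            Path(f"quotes-{page}.html").write_bytes(response.body)

Run it with scrapy crawl quotes from inside the project directory.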

3 votes · 1 answer · 141 views
I'm making a tutorial on how to scrape with Scrapy. For that, I use Quarto/RStudio and the website https://quotes.toscrape.com/. For pedagogical purposes, I need to run a first crawl on the first page, ...
asked by Didier mac cormick
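
If the goal is a demo crawl that stops after the first page, one option (an assumption about the asker's setup) is the page budget of Scrapy's built-in CloseSpider extension:

    # settings.py, or per-spider via custom_settings
    CLOSESPIDER_PAGECOUNT = 1  # close the spider after one downloaded page

The same setting works one-off on the command line: scrapy crawl quotes -s CLOSESPIDER_PAGECOUNT=1.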

Advice · 0 votes · 0 replies · 62 views
I’m building a Scrapy-based crawler and facing Cloudflare protection on some sites. Here’s my current setup: I have a separate API service that can bypass Cloudflare by simulating a real browser (e.g....
posted by Muhammad Sameer
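
One way to wire such an API service into Scrapy is a downloader middleware that re-fetches challenged responses through it; the endpoint, its JSON contract, and the challenge heuristic below are hypothetical placeholders:

    import requests
    from scrapy.http import HtmlResponse

    class CloudflareFallbackMiddleware:
        API_URL = "http://localhost:8000/render"  # hypothetical bypass service

        def process_response(self, request, response, spider):
            # Crude challenge heuristic; replace with real ban signals.
            if response.status in (403, 503):
                # Blocking call: acceptable for a sketch, not for throughput.
                r = requests.post(self.API_URL, json={"url": request.url}, timeout=60)
                return HtmlResponse(url=request.url, body=r.content,
                                    encoding="utf-8", request=request)
            return response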

0 votes · 1 answer · 79 views
I'm quite new to web scraping, and in particular to using Scrapy's spiders, pipelines... I'm getting a 202 status from some spider requests' responses, so the page content is not available yet ...
asked by Manu310
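
Since 202 typically means the content is not ready yet, one common approach (a sketch, not necessarily what the posted answer proposed) is to let Scrapy's retry middleware treat 202 as retryable:

    # settings.py: retry 202 responses in addition to the defaults
    RETRY_ENABLED = True
    RETRY_TIMES = 5
    RETRY_HTTP_CODES = [202, 500, 502, 503, 504, 522, 524, 408, 429]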

2 votes · 1 answer · 67 views
After starting the spider, it freezes at the stage where the pipelines should be enabled. There are no errors; the scrapy-playwright script just stops at the beginning, before it even starts ...
asked by Max Plakushko
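
A frequent cause of scrapy-playwright appearing to hang at startup (one possibility, not necessarily this asker's) is missing handler/reactor settings; the minimal configuration from the library's README is:

    # settings.py
    DOWNLOAD_HANDLERS = {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    }
    TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

Requests then opt in with meta={"playwright": True}.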

0 votes · 1 answer · 98 views
I'm far from a Python expert and this is my first Scrapy project. I installed Scrapy using Brew. I've been able to do some basics with Scrapy and am making progress. I need to add Beautiful Soup to clean ...
asked by JReekes
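
A natural place to hang Beautiful Soup cleanup is an item pipeline; a minimal sketch, assuming a hypothetical description field that holds raw HTML:

    from bs4 import BeautifulSoup

    class CleanHtmlPipeline:
        def process_item(self, item, spider):
            raw = item.get("description") or ""
            # Strip tags and collapse the markup to plain text.
            item["description"] = BeautifulSoup(raw, "html.parser").get_text(" ", strip=True)
            return item

Enable it with ITEM_PIPELINES = {"myproject.pipelines.CleanHtmlPipeline": 300} (module path assumed).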

0 votes · 0 answers · 108 views
I’m writing a sitemap XML parser using lxml.etree.iterparse: class Sitemap: """Class to parse Sitemap (type=urlset) and Sitemap Index (type=sitemapindex) files""" ...
asked by abebus
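
For streaming large sitemaps, lxml's iterparse can yield <loc> entries without loading the whole document; a sketch (the function name and shape are illustrative, not the asker's class):

    from lxml import etree

    SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

    def iter_sitemap_locs(source):
        # Works for both urlset and sitemapindex files: yield every <loc> text.
        for _, elem in etree.iterparse(source, events=("end",), tag=SITEMAP_NS + "loc"):
            yield elem.text
            elem.clear()  # release parsed nodes to keep memory flat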

0 votes · 1 answer · 78 views
My Scrapy logic is as follows: get all rows from child_page_table where parent_page_id is null; for each row, if parent_page_id is (still) null, yield a Request with callback scrape_page; [scrape_page] ...
asked by user1713450
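
A sketch of that flow in a spider, with dont_filter so re-yielded URLs are not dropped by the duplicate filter (the db helper is hypothetical):

    import scrapy

    class ChildPageSpider(scrapy.Spider):
        name = "child_pages"

        def start_requests(self):
            # db is a hypothetical wrapper around child_page_table
            for row in db.rows_where_parent_is_null():
                yield scrapy.Request(row["url"], callback=self.scrape_page,
                                     cb_kwargs={"row_id": row["id"]},
                                     dont_filter=True)

        def scrape_page(self, response, row_id):
            ...  # re-check parent_page_id here before writing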

2 votes · 1 answer · 75 views
I am trying to make a web crawler with Scrapy which fetches some HTML pages and saves them via the default Request callback, i.e. parse(). The thing is, I want the spider to stop crawling pending or ...
asked by srajan0149
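
To stop a crawl from inside a callback, Scrapy provides the CloseSpider exception; note that it stops scheduling new requests but lets in-flight ones finish. A sketch with a hypothetical page budget:

    import scrapy
    from scrapy.exceptions import CloseSpider

    class PageSpider(scrapy.Spider):
        name = "pages"
        limit = 100  # hypothetical budget
        saved = 0

        def parse(self, response):
            self.saved += 1
            # ... save response.body here ...
            if self.saved >= self.limit:
                raise CloseSpider("page limit reached")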

-1 votes · 2 answers · 99 views
I'm trying to build a scraper that extracts nutritional information from each product page on Sainsbury's (e.g., scraping energy values out of https://www.sainsburys.co.uk/gol-ui/product/sainsburys-...
asked by Siddharth Gianchandani

0 votes · 1 answer · 74 views
I have run the command scrapy or scrapy crawl bookspider -o bookdata.csv and the error looks like this: Traceback (most recent call last): File "C:\Users\Tunansh Vatsa\AppData\Local\...
asked by noob_coder123

-4 votes · 1 answer · 68 views
I had a small web crawler written using Scrapy, and since I didn't want to run it against the real site during development, I used a local mirror. The mirror was served with python -m http.server 8000 ...
asked by Anton
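
For reference, a spider pointed at such a mirror differs from the live one only in its start URLs; a sketch (names illustrative):

    import scrapy

    class MirrorSpider(scrapy.Spider):
        name = "mirror"
        allowed_domains = ["localhost"]
        start_urls = ["http://localhost:8000/"]  # served by python -m http.server 8000

        def parse(self, response):
            ...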

2 votes · 1 answer · 79 views
I have a Scrapy CrawlSpider that parses reviews, using scrapy-rotating-proxies. But when I tried to connect to the site I got the 507 status code. In ...
asked by CollonelDain
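
scrapy-rotating-proxies decides when to rotate via a ban detection policy; a sketch that treats 507 as a ban so the proxy gets swapped (the settings module path is an assumption about project layout):

    from rotating_proxies.policy import BanDetectionPolicy

    class Treat507AsBan(BanDetectionPolicy):
        def response_is_ban(self, request, response):
            # Rotate the proxy on 507 in addition to the default signals.
            if response.status == 507:
                return True
            return super().response_is_ban(request, response)

    # settings.py (assumed path):
    # ROTATING_PROXY_BAN_POLICY = "myproject.policies.Treat507AsBan"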

0 votes · 1 answer · 76 views
I'm using Scrapy v2.12.0 and Item Loader. My spider returns None for some item fields in certain items. For instance, the field 'exterior_color' is processed as follows: in the spider, in ...
asked by Dmitry Borisoglebsky
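
For context, a typical Item Loader setup for such a field; with TakeFirst as the output processor, a selector that matches nothing leaves the field unset, which callers then read as None (the field name is from the excerpt, the selector is hypothetical):

    import scrapy
    from itemloaders.processors import MapCompose, TakeFirst
    from scrapy.loader import ItemLoader

    class CarItem(scrapy.Item):
        exterior_color = scrapy.Field(
            input_processor=MapCompose(str.strip),
            output_processor=TakeFirst(),
        )

    # In the spider callback:
    # loader = ItemLoader(item=CarItem(), response=response)
    # loader.add_css("exterior_color", ".specs .color::text")  # hypothetical selector
    # yield loader.load_item()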

1 vote · 0 answers · 69 views
This code is supposed to download some documents it must locate within a series of given links. While it does seemingly locate the link to the PDF file, it's failing to download it. What might be the ...
asked by 42WaysToAnswerThat
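
A common way to make such downloads actually happen is Scrapy's built-in FilesPipeline, which fetches everything listed under file_urls; a minimal sketch:

    # settings.py
    ITEM_PIPELINES = {"scrapy.pipelines.files.FilesPipeline": 1}
    FILES_STORE = "./downloads"

    # In the spider callback, yield absolute URLs for the pipeline to fetch:
    # yield {"file_urls": [response.urljoin(pdf_href)]}  # pdf_href: hypothetical extracted link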
