
Search Page returns empty through scrapyrt only #116

Open
keyiyek opened this issue Dec 9, 2020 · 3 comments
Labels: more info needed (original poster should provide more details to allow us to identify the problem)

Comments


keyiyek commented Dec 9, 2020

(Sorry, I can't find how to label this.)
I hope this is the right place to ask this.

I created a spider that scrapes a page on an e-commerce site and gathers data on the different items.
The spider works fine with specific pages of the site (www.sitedomain/123-item-category), as well as with the search page (www.sitedomain/searchpage?controller?search=keywords+item+to+be+found).

But when I run it through scrapyrt, the specific page works fine while the search page returns 0 items. No errors, just 0 items. This occurs on 2 different sites with 2 different spiders.

Is there something specific to search pages that has to be taken into account when using scrapyrt?

@pawelmhm
Member

Can you post your spider code? I don't see a way to reproduce this without it. Try to pinpoint the problem so that there is a small spider sample running in raw ScrapyRT (without any middlewares, pipelines or other stuff from your project interfering). This way we can see whether the problem is on the ScrapyRT side.

pawelmhm added the "more info needed" label on Jan 29, 2021

keyiyek commented Jan 29, 2021

Yes, sure.

So, my spider, stripped of all other stuff, looks like this:

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "minimal"

    def start_requests(self):
        urls = [
            "https://www.dungeondice.it/ricerca?controller=search&s=ticket+to+ride",
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        print("Found", len(response.css("article")), "items")
        for article in response.css("article"):
            print("Item:", article.css("img::attr(title)").get())
```

and I set `ROBOTSTXT_OBEY = False`.

when I do

`scrapy crawl minimal`

I get 20 items in the response, but if I go

curl "http://localhost:9081/crawl.json?spider_name=minimal&url=https://www.dungeondice.it/ricerca?controller=search&s=ticket+to+ride"

I get 0 items, no error, just 0 items.
I wonder if, in some way, it returns the results before the page gets completely loaded?

(sorry couldn't get the markup to work correctly)
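
One detail worth checking here (a general observation about query-string parsing, not something stated in the comment above) is how that curl URL is split on the server side: the un-encoded `&` characters inside the target URL also act as separators for crawl.json's own parameters, so the `url` argument ScrapyRT receives may stop at the first `&`. A minimal sketch of that parsing, using only the standard library:

```python
from urllib.parse import parse_qs

# The query part of the curl command above, as a server-side parser would see it.
raw = ("spider_name=minimal"
       "&url=https://www.dungeondice.it/ricerca?controller=search"
       "&s=ticket+to+ride")

print(parse_qs(raw)["url"])
# ['https://www.dungeondice.it/ricerca?controller=search']
# 's=ticket+to+ride' ends up as a separate parameter, so the spider would crawl
# the search page with no search terms -- which could plausibly return 0 items.
```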


Yansuko commented Feb 3, 2022

It seems to happen when there is an '&' in the URL:
scrapyrt splits it right before the '&'.
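
If that is indeed the cause, percent-encoding the target URL before embedding it in the crawl.json query string should avoid the split. A rough sketch, assuming the `requests` library, the same localhost:9081 endpoint used above, and ScrapyRT's usual JSON response containing an `items` list:

```python
import requests                      # assumed available; any HTTP client works
from urllib.parse import urlencode

# Target URL and endpoint taken from the comments above.
target = "https://www.dungeondice.it/ricerca?controller=search&s=ticket+to+ride"

# requests percent-encodes the parameter value, so '&' becomes '%26' and the
# full search URL reaches ScrapyRT as a single 'url' argument.
resp = requests.get(
    "http://localhost:9081/crawl.json",
    params={"spider_name": "minimal", "url": target},
)
print(len(resp.json().get("items", [])))

# The same encoding applied to a hand-built URL (usable with curl):
print("http://localhost:9081/crawl.json?" +
      urlencode({"spider_name": "minimal", "url": target}))
```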
