composer require dezento/crawlify
Crawlify is a lightweight crawler for manipulating HTML, XML, and JSON using DomCrawler.
It uses GuzzleHttp\Pool to make concurrent requests, which means all of Guzzle's Request Options are available.
Results come back wrapped in Laravel Collections.
use Dezento\Crawlify;

$links = [];
for ($i = 1; $i <= 100; $i++) {
    $links[] = 'https://jsonplaceholder.typicode.com/posts/' . $i;
}

$json = (new Crawlify(collect($links))) // you can pass an array or a Collection of links
    ->settings([
        'type' => 'JSON' // this is a Crawlify option
    ])
    ->fetch()
    ->get('fulfilled')
    ->map(fn ($p) => collect(json_decode($p->response)))
    ->dd();
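As a standalone sketch of the json_decode step above, here is what decoding one response body looks like in plain PHP, without Crawlify. The payload is hypothetical, shaped like a jsonplaceholder post:

```php
<?php
// Hypothetical response body, mimicking a jsonplaceholder post.
$body = '{"userId": 1, "id": 1, "title": "sample title", "body": "sample body"}';

// Decode into an associative array, as the map() callback does before
// wrapping the result in a Collection.
$post = json_decode($body, true);

echo $post['title']; // sample title
```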
For traversing XML, refer to the DomCrawler documentation.
$xml = (new Crawlify([
    'https://www.nytimes.com/svc/collections/v1/publish/https://www.nytimes.com/section/world/rss.xml',
]))
    ->fetch()
    ->get('fulfilled')
    ->map(fn ($item) => collect($item->response->filter('item')->children())
        ->map(fn ($data) => $data->textContent)
    )
    ->dd();
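To see what filtering `item` nodes out of an RSS feed amounts to, here is a standalone sketch using PHP's built-in SimpleXML rather than DomCrawler; the feed snippet is made up for illustration:

```php
<?php
// Minimal RSS-like document, standing in for a real feed.
$feed = <<<XML
<rss>
  <channel>
    <item><title>First story</title><link>https://example.com/1</link></item>
    <item><title>Second story</title><link>https://example.com/2</link></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);

// Iterate the <item> elements and read each child's text content.
foreach ($xml->channel->item as $item) {
    echo (string) $item->title, "\n";
}
```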
For traversing HTML, refer to the DomCrawler documentation.
$html = (new Crawlify([
    'https://en.wikipedia.org/wiki/Category:Lists_of_spider_species_by_family'
]))
    ->settings([
        #'proxy' => 'http://username:[email protected]:10',
        'concurrency' => 5,
        'delay' => 0
    ])
    ->fetch()
    ->get('fulfilled')
    ->map(fn ($item) => collect($item->response->filter('a')->links())
        ->map(fn ($el) => $el->getUri())
    )
    ->reject(fn ($a) => $a->isEmpty())
    ->dd();
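The reject step above drops pages that yielded no links. A standalone sketch of the same filtering with plain arrays instead of Laravel Collections (the link lists are hypothetical):

```php
<?php
// Hypothetical per-page link lists; empty ones should be dropped.
$linksPerPage = [
    ['https://example.com/a', 'https://example.com/b'],
    [],                          // a page with no matching <a> elements
    ['https://example.com/c'],
];

// Keep only non-empty lists, like Collection::reject with isEmpty().
$nonEmpty = array_filter($linksPerPage, fn (array $links) => count($links) > 0);

echo count($nonEmpty); // 2
```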
->settings([
    'proxy' => 'http://username:[email protected]:10',
    'concurrency' => 5,
    'delay' => 0,
    ...
])
For the available options, refer to the Guzzle Request Options documentation.
The only Crawlify-specific option is 'type' => 'JSON'.
Before using the dd() helper, you must install symfony/var-dumper:
composer require symfony/var-dumper