This repository has been archived by the owner on Jun 27, 2022. It is now read-only.

Updated docs and cli code
Chris Watson committed Jul 1, 2019
1 parent 2eae3a2 commit 2ac33f6
Showing 4 changed files with 46 additions and 55 deletions.
42 changes: 42 additions & 0 deletions README.md
@@ -4,6 +4,9 @@ Arachnid is a fast and powerful web scraping framework for Crystal. It provides

- [Arachnid](#Arachnid)
- [Installation](#Installation)
- [The CLI](#The-CLI)
- [Summarize](#Summarize)
- [Sitemap](#Sitemap)
- [Examples](#Examples)
- [Usage](#Usage)
- [Configuration](#Configuration)
@@ -65,6 +68,45 @@ Arachnid is a fast and powerful web scraping framework for Crystal. It provides
2. Run `shards install`

To build the CLI:

1. Run `shards build --release`

2. Add the `./bin` directory to your `PATH`, or symlink `./bin/arachnid` with `sudo ln -s /home/path/to/arachnid /usr/local/bin`.

## The CLI

Arachnid provides a CLI for basic scanning tasks. Here is what you can do with it so far:

### Summarize

The `summarize` subcommand allows you to generate a report for a website. It can give you the number of pages, the internal and external links for every page, and a list of pages and their status codes (helpful for finding broken pages).

You can use it like this:

```
arachnid summarize https://crystal-lang.org --ilinks --elinks -c 404 503
```

This generates a report for crystal-lang.org that includes every page with its internal and external links, plus a list of every page that returned a 404 or 503 status. For complete help, run `arachnid summarize --help`.
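
To make the status-code part of such a report concrete, here is a minimal Crystal sketch of how pages could be grouped by the codes requested with `-c`. It is not Arachnid's implementation: the `pages` data, the `wanted` list, and the `"pages"` key are assumptions made up for illustration.

```crystal
require "json"

# Hypothetical crawl results: URL => HTTP status code (made-up data).
pages = {
  "https://example.com/"        => 200,
  "https://example.com/missing" => 404,
  "https://example.com/busy"    => 503,
  "https://example.com/about"   => 200,
}

# Status codes to report on, mirroring `-c 404 503`.
wanted = [404, 503]

# Group page URLs by status code, keeping only the requested codes.
codes = Hash(Int32, Array(String)).new { |hash, key| hash[key] = [] of String }
pages.each do |url, status|
  next unless wanted.includes?(status)
  codes[status] << url
end

# Assemble the report and print it as JSON.
report = {"pages" => pages.size, "codes" => codes}
puts report.to_json
```

Pages with any other status (here, the two `200` responses) are counted in the total but left out of the `codes` listing.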

### Sitemap

Arachnid can also generate an XML or JSON sitemap for a website by scanning the entire site and following internal links. To do so, use the `arachnid sitemap` subcommand.

```
# XML sitemap
arachnid sitemap https://crystal-lang.org --xml

# JSON sitemap
arachnid sitemap https://crystal-lang.org --json

# Custom output file
arachnid sitemap https://crystal-lang.org --xml -o ~/Desktop/crystal-lang.org-sitemap.xml
```

Full help is available with `arachnid sitemap --help`.
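
For readers unfamiliar with the XML sitemap format, the sketch below builds a small one with Crystal's standard `XML.build`, following the sitemaps.org protocol (`urlset` > `url` > `loc`). The `urls` list is made up, and the exact output Arachnid writes may differ; this only illustrates the general shape.

```crystal
require "xml"

# Hypothetical list of internal pages discovered during a crawl.
urls = ["https://example.com/", "https://example.com/about"]

# Build a sitemap document: one <url><loc>...</loc></url> entry per page.
sitemap = XML.build(indent: "  ") do |xml|
  xml.element("urlset", xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9") do
    urls.each do |url|
      xml.element("url") do
        xml.element("loc") { xml.text url }
      end
    end
  end
end

puts sitemap
```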

## Examples

Arachnid provides an easy-to-use, powerful DSL for scraping websites.
4 changes: 2 additions & 2 deletions src/arachnid/cli.cr
@@ -44,8 +44,8 @@ module Arachnid
if args.empty?
STDERR.puts "At least one site is required"
else
- count = Arachnid::Cli::Count.new
- count.run(opts, args)
+ summarize = Arachnid::Cli::Summarize.new
+ summarize.run(opts, args)
end
end
end
51 changes: 0 additions & 51 deletions src/arachnid/cli/forum.crystal-lang.org.xml

This file was deleted.

4 changes: 2 additions & 2 deletions src/arachnid/cli/count.cr → src/arachnid/cli/summarize.cr
@@ -5,7 +5,7 @@ require "json"

module Arachnid
class Cli < Clim
- class Count < Cli::Action
+ class Summarize < Cli::Action

def run(opts, urls)
spinner = Spinner::Spinner.new("Wait...")
@@ -65,7 +65,7 @@ module Arachnid
report["codes"] = codes if codes

if outfile
- File.write(outfile.to_s, report.to_json, mode: "w+")
+ File.write(File.expand_path(outfile.to_s, __DIR__), report.to_json, mode: "w+")
puts "Report saved to #{outfile}"
else
pp report
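
A note on the `File.expand_path(outfile.to_s, __DIR__)` change in this hunk: the second argument is the base directory used for relative paths, and `__DIR__` is the directory of the source file being compiled. A relative `outfile` would therefore appear to resolve against the source directory rather than the directory the command is run from. The sketch below, with a made-up `outfile` value, shows the difference. It is an illustration of `File.expand_path`, not a statement about the author's intent.

```crystal
# Hypothetical output file name passed on the command line.
outfile = "report.json"

# Resolved against the directory of this source file (fixed at compile time).
from_source_dir = File.expand_path(outfile, __DIR__)

# Resolved against the directory the program is run from.
from_cwd = File.expand_path(outfile, Dir.current)

# An absolute path ignores the base directory entirely.
absolute = File.expand_path("/tmp/report.json", __DIR__)

puts from_source_dir
puts from_cwd
puts absolute
```

If the goal is to write relative to where the user invoked the CLI, `Dir.current` (or no base argument at all) is the usual choice.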
