All the Places

A project to generate point of interest (POI) data sourced from websites with 'store location' pages. The project uses Scrapy, a popular Python-based web scraping framework, to execute individual site spiders that retrieve POI data and publish the results in a standard format. There are various Scrapy tutorials on the Internet, and this series on YouTube is reasonable.
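
To make "a standard format" concrete, here is a minimal sketch that turns one store record into a GeoJSON-style Feature. It assumes a GeoJSON-like output; the project's actual schema is described in DATA_FORMAT.md in the repository and may differ. The `to_feature` helper and the sample store values are hypothetical and are not part of the project.

```python
import json


def to_feature(ref, lat, lon, **tags):
    """Build a GeoJSON-style Feature for one point of interest (hypothetical helper)."""
    return {
        "type": "Feature",
        "id": ref,
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # Drop empty values so missing fields are omitted rather than null.
        "properties": {"ref": ref, **{k: v for k, v in tags.items() if v}},
    }


# Hypothetical store record, as a store locator page might expose it.
feature = to_feature("1234", 51.5072, -0.1276, name="Example Store", phone=None)
print(json.dumps(feature, indent=2))
```

A real spider would typically yield one such record per store and leave serialization to the framework's export pipeline.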

Getting started

Development setup

Windows users may need to follow some extra steps; please follow the Scrapy docs for up-to-date details.

Ubuntu

These instructions were tested with Ubuntu 22.04.1 LTS on 2024-02-21.

Install Python 3 and pip:

```
$ sudo apt-get update
$ sudo apt-get install -y python3 python3-pip python-is-python3
```

Install pyenv and ensure the correct version of Python is available. The following is a summary of the steps; please refer to the pyenv documentation for the most up-to-date instructions.

```
$ sudo apt-get install -y build-essential libssl-dev zlib1g-dev \
      libbz2-dev libreadline-dev libsqlite3-dev curl git \
      libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
      libffi-dev liblzma-dev
$ curl https://pyenv.run | bash
$ echo 'export PATH="$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
$ echo 'eval "$(pyenv init --path)"' >> ~/.bashrc
$ echo 'eval "$(pyenv init -)"' >> ~/.bashrc
$ exec "$SHELL"
$ pyenv install 3.11
```

Install pipenv and check that it runs:

```
$ pip install --user pipenv
$ pipenv --version
pipenv, version 2023.12.1
```

Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):
```
$ git clone git@github.com:alltheplaces/alltheplaces.git
```
Use pipenv to install the project dependencies:
```
$ cd alltheplaces
$ pipenv sync
```
Test for successful project installation:
```
$ pipenv run scrapy
```
If the above runs without complaint, then you have a functional installation and are ready to run and write spiders.
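
If the test does not run cleanly, a quick prerequisite check can narrow down the cause. The following is a rough sketch, not part of the project: the `check_environment` helper is hypothetical, and the minimum Python version of 3.11 is an assumption taken from the `pyenv install 3.11` step above.

```python
import shutil
import sys


def check_environment(version=None, which=shutil.which, minimum=(3, 11)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    # Default to the running interpreter; the 3.11 minimum is an assumption.
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < minimum:
        problems.append(
            f"Python {version[0]}.{version[1]} found, {minimum[0]}.{minimum[1]}+ expected"
        )
    # Both tools are used by the setup steps and must be on PATH.
    for tool in ("git", "pipenv"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real environment.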

macOS

These instructions were tested with macOS 14.3.1 on 2024-02-21.

Install Python 3 and pip:
```
$ brew install python@3
```
Install pyenv and ensure the correct version of Python is available. The following is a summary of the steps; please refer to the pyenv documentation for the most up-to-date instructions.
```
$ brew install pyenv
$ echo 'eval "$(pyenv init --path)"' >> ~/.zshrc
$ echo 'eval "$(pyenv init -)"' >> ~/.zshrc
$ exec "$SHELL"
$ pyenv install 3.11
```

Install pipenv and check that it runs:

```
$ brew install pipenv
$ pipenv --version
pipenv, version 2023.12.1
```

Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):
```
$ git clone git@github.com:alltheplaces/alltheplaces.git
```
Use pipenv to install the project dependencies:
```
$ cd alltheplaces
$ pipenv sync
```
Test for successful project installation:
```
$ pipenv run scrapy
```
If the above runs without complaint, then you have a functional installation and are ready to run and write spiders.

Codespaces

You can use GitHub Codespaces to run the project. This is a cloud-based development environment that is created from the project's repository and includes a pre-configured environment with all the tools you need to develop the project. To use Codespaces, create a codespace from the project's repository page on GitHub.

Docker

You can use Docker to run the project. This is a container-based development environment that is created from the project's repository and includes a pre-configured environment with all the tools you need to develop the project.

Clone a copy of the project from the All the Places repo (or your own fork if you are considering contributing to the project):
```
$ git clone git@github.com:alltheplaces/alltheplaces.git
```

Build the Docker image:

```
$ cd alltheplaces
$ docker build -t alltheplaces .
```

Run the Docker container:
```
$ docker run -it alltheplaces
```

Contributing code

Many of the sites provide their data in a standard format. Others export their data via simple APIs. We have a number of guides to help you develop spiders; see the docs directory of the repository.
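
As an illustration of the "standard format" case, the sketch below pulls schema.org structured data (JSON-LD) out of a store page using only the Python standard library. That JSON-LD is one of the formats meant here is an assumption, since the text above does not name the formats. The `JsonLdExtractor` class and the sample page are hypothetical, and the sketch does not use any helpers the project itself may provide.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # None while outside a JSON-LD script tag

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.items.append(json.loads("".join(self._buffer)))
            self._buffer = None


# Hypothetical store page markup, for illustration only.
PAGE = """
<html><head><script type="application/ld+json">
{"@type": "Store", "name": "Example Store",
 "geo": {"latitude": 51.5072, "longitude": -0.1276}}
</script></head></html>
"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
store = extractor.items[0]
print(store["name"], store["geo"]["latitude"], store["geo"]["longitude"])
```

When a site embeds data this way, a spider needs little site-specific parsing: it only has to locate the store pages and map the structured fields onto POI attributes.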

The weekly run

The output from running the project is published on a regular cadence to our website: alltheplaces.xyz. Please do not run all the spiders yourself just to obtain the output: the less the project "bothers" a website, the more likely we are to be tolerated.

Contact us

Communication is primarily through tickets on the project GitHub issue tracker. Many contributors are also present on OSM US Slack; in particular, we watch the #poi channel.

License

The data generated by our spiders is provided on our website and released under Creative Commons’ CC-0 waiver.

The spider software that produces this data (this repository) is licensed under the MIT license.
