This repository was archived by the owner on Jan 10, 2025. It is now read-only.

Commit 0964c48

Merge pull request #159 from elastic/pip-and-https-restaurant

Pip and https fix for restaurant

Author: Dale McDiarmid
Parents: 14fec98 + 0a23ce5

3 files changed, 69 additions, 18 deletions
Lines changed: 55 additions & 15 deletions
````diff
@@ -1,33 +1,73 @@
 ## Ingest Data using Python scripts

-If you want to ingest data into Elasticsearch starting with the raw CSV data files follow the instructions below:
+If you want to ingest data into Elasticsearch starting with the raw CSV data
+files, follow the instructions below:

-##### 1. Download the following files: <br>
-- `ingestRestaurantData.py` - Python script to process and ingest. This script downloads the required dataset.
+#### 1. Download the following files:
+
+- `ingestRestaurantData.py` - Python script to process and ingest the data. Note that this script downloads the required dataset.
 - `inspection_mapping.json` contains the mapping for the Elasticsearch index

 #### 2. Install and Configure Python

 Requires Python 3.
-Install Dependencies using pip i.e. `pip install -r requirements.txt`
+Install the dependencies using pip:
+```shell
+pip install -r requirements.txt
+```
+
+Note that macOS users may need to `brew install python3`, in which case the
+pip command becomes:
+```shell
+pip3 install -r requirements.txt
+```
+#### 3. Optionally, configure the Python script for SSL
+
+If your Elasticsearch instance is not running locally, requires SSL, or both,
+edit the client settings in the script accordingly.
+
+Inside the script you will find the Elasticsearch client configuration:
+
+```python
+es = elasticsearch.Elasticsearch(
+    # ['host1'],
+    # http_auth=('myuser', 'mypassword'),
+    # port=443,
+    # use_ssl=True
+)
+```

+Uncomment what you need and replace `'host1'` with your Elasticsearch endpoint
+(for more than one endpoint, list them all, e.g. `['host1', 'host2']`). For
+additional arguments, see the Elasticsearch Python client documentation
+(https://elasticsearch-py.readthedocs.io/en/master/api.html).
+
+#### 4. Run the Python script to process, join, and index the data
+
+Run `ingestRestaurantData.py` (requires Python 3). When the script is done
+running, you will have a `nyc_restaurants` index in your Elasticsearch instance:

-##### 2. Run Python script to process, join data and index data<br>
-Run `ingestRestaurantData.py` (requires Python 3). When the script is done running, you will have a `nyc_restaurants` index in your Elasticsearch instance
 ```
-python3 ingestRestaurantData.py
+python3 ingestRestaurantData.py
 ```
+
 NOTE:
 - The script calls the Google Geocoding API to get lat/lon coordinates for restaurant addresses. (a) You might need to sign up for an API key to avoid hitting usage limits. (b) Depending on your internet connection and the size of the inspection dataset, this step might take anywhere from 30 minutes to a few hours to complete.
 - We have also included an IPython Notebook version of the script, `ingestRestaurantData.ipynb`, in case you prefer running it cell by cell.

-##### 3. Check if data is available in Elasticsearch
+#### 5. Check if the data is available in Elasticsearch
+
 Check to see if all the data is available in Elasticsearch. If all goes well, you should get a `count` response of `473039` when you run the following command.

-```shell
-curl -H "Content-Type: application/json" -XGET localhost:9200/nyc_restaurants/_count -d '{
-"query": {
-"match_all": {}
-}
-}'
-```
+```shell
+curl -H "Content-Type: application/json" -XGET localhost:9200/nyc_restaurants/_count -d '{
+"query": {
+"match_all": {}
+}
+}'
+```
+
+NOTE:
+
+If you are using https, you will likely also need to pass the
+`--user username:password` option to curl and use an `https://` URL.
````
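If you would rather run the count check from Python than with curl, the following is a minimal standard-library sketch that mirrors the `_count` curl command in the README, including the HTTP Basic auth that curl's `--user username:password` option sends. `build_count_request` and `count_documents` are hypothetical helpers, not part of the repository. The sketch sends a POST with the JSON body instead of a GET with a body (Elasticsearch's `_count` endpoint accepts both), and the base URL is an assumption you would replace with your own endpoint.

```python
import base64
import json
import urllib.request


def build_count_request(base_url, index="nyc_restaurants", user=None, password=None):
    """Build (without sending) a _count request equivalent to the curl command.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Same match_all query as the curl example.
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    req = urllib.request.Request(
        url="{}/{}/_count".format(base_url.rstrip("/"), index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if user is not None:
        # Equivalent of curl's --user username:password (HTTP Basic auth).
        token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
        req.add_header("Authorization", "Basic " + token.decode("ascii"))
    return req


def count_documents(req, timeout=30):
    """Send the request and return the `count` field of the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["count"]
```

For example, `count_documents(build_count_request("http://localhost:9200"))` should return `473039` once ingestion has finished, according to the README; for an https cluster, pass an `https://` base URL plus `user` and `password`.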

Exploring Public Datasets/nyc_restaurants/scripts/ingestRestaurantData.py

Lines changed: 11 additions & 2 deletions
```diff
@@ -6,8 +6,17 @@
 import elasticsearch
 import json
 import re
-
-es = elasticsearch.Elasticsearch()
+import certifi
+
+# If you are using Elastic Cloud, or need HTTPS/SSL, uncomment the settings
+# below. Note that Elastic Cloud may be using port 9243.
+#
+es = elasticsearch.Elasticsearch(
+    # ['host1'],
+    # http_auth=('myuser', 'mypassword'),
+    # port=443,
+    # use_ssl=True
+)

 # In this example, we use the [Google Geocoding API](https://developers.google.com/maps/documentation/geocoding/) to translate addresses into geo-coordinates. Google imposes usage limits on the API. If you are using this script to index data, you may need to sign up for an API key to overcome those limits.

```
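Both the README note and the script comment say that restaurant addresses are geocoded through the Google Geocoding API. This diff does not show how the script builds those requests (the requirements file pins `geopy`), so the following is only an illustrative standard-library sketch of what a single request and response look like. It assumes Google's public JSON endpoint `https://maps.googleapis.com/maps/api/geocode/json` with its `address` and `key` query parameters; `geocode_url` and `parse_location` are hypothetical names, and this is not the script's own code.

```python
from urllib.parse import urlencode

# Public JSON endpoint of the Google Geocoding API (assumed, see lead-in).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key=None):
    """Build a Geocoding API request URL for one address (hypothetical helper)."""
    params = {"address": address}
    if api_key:
        # An API key raises the usage limits mentioned in the README.
        params["key"] = api_key
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


def parse_location(payload):
    """Return (lat, lng) from a decoded JSON response, or None on failure."""
    # Rate-limited or failed lookups come back with a non-"OK" status.
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    loc = payload["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]
```

Checking `status` before reading `results` matters here because a rate-limited request (for example `OVER_QUERY_LIMIT`) still comes back as a JSON body, which is exactly the usage-limit situation the README warns about.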

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,8 +1,10 @@
 elasticsearch==5.0
+cython==0.26
 geopy==1.11.0
 numpy==1.11.2
 pandas==0.19.0
 python-dateutil==2.5.3
 pytz==2016.7
 six==1.10.0
-urllib3==1.18
+urllib3==1.18
+certifi==2017.7.27.1
```
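The requirements now pin `certifi`, and the script imports it, but in the lines shown in this diff its CA bundle is not passed to the client. The following is a minimal sketch, assuming you want certificate verification against certifi's bundle. The hypothetical helper `es_client_kwargs` only collects keyword arguments for `elasticsearch.Elasticsearch(...)`: `http_auth`, `port`, and `use_ssl` come from the commented block in the script, while `verify_certs` and `ca_certs` are additional connection options of the Python client that you should double-check against the documentation for your client version.

```python
def es_client_kwargs(hosts, user=None, password=None, port=None,
                     use_ssl=False, ca_certs=None):
    """Collect keyword arguments for elasticsearch.Elasticsearch(**kwargs).

    Hypothetical helper for illustration; options left unset are omitted
    so the client's own defaults apply.
    """
    if isinstance(hosts, str):
        # A single host name: wrap it so the client receives a list of hosts.
        hosts = [hosts]
    kwargs = {"hosts": list(hosts)}
    if user is not None:
        # Same shape as the commented-out http_auth=('myuser', 'mypassword').
        kwargs["http_auth"] = (user, password)
    if port is not None:
        kwargs["port"] = port
    if use_ssl:
        kwargs["use_ssl"] = True
        # Ask the client to verify the server certificate.
        kwargs["verify_certs"] = True
        if ca_certs is not None:
            # Path to a CA bundle, typically certifi.where().
            kwargs["ca_certs"] = ca_certs
    return kwargs


# Example usage against a real cluster (requires the elasticsearch and
# certifi packages; host, credentials, and port are placeholders):
#
#   import certifi
#   import elasticsearch
#   es = elasticsearch.Elasticsearch(**es_client_kwargs(
#       ['host1'], 'myuser', 'mypassword', port=443,
#       use_ssl=True, ca_certs=certifi.where()))
```

Keeping the settings in one place like this means the local default (no arguments, so the client falls back to its own defaults) and a cloud or SSL setup differ only in the values passed in, rather than in which lines of the script are commented out.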
