Commit

Merge pull request #161 from DrifterKaru/master
Add tor proxy and onion basic spider configurations
sajithaliyanage authored Jul 19, 2022
2 parents bddecde + 81bef55 commit 6c81be0
Showing 22 changed files with 13,084 additions and 19,961 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -106,7 +106,7 @@ celerybeat.pid
*.sage.py

# Environments
.env
*.env
.venv
env/
venv/
1 change: 1 addition & 0 deletions README.md
@@ -1,3 +1,4 @@

# CrawlerX - Develop Extensible, Distributed, Scalable Crawler System

The CrawlerX is a platform which we can use for crawl web URLs in different kind of protocols in a distributed way. Web crawling often called web scraping is a method of programmatically going over a collection of web pages and extracting data which useful for data analysis with web-based data. With a web scraper, you can mine data about a set of products, get a large corpus of text or quantitative data to play around with, get data from a site without an official API, or just satisfy your own personal curiosity.
19 changes: 11 additions & 8 deletions crawlerx_app/.env
@@ -1,8 +1,11 @@
VUE_APP_FIREBASE_API_KEY = "<your-api-key>"
VUE_APP_FIREBASE_AUTH_DOMAIN = "<your-auth-domain>"
VUE_APP_FIREBASE_DB_DOMAIN= "<your-db-domain>"
VUE_APP_FIREBASE_PROJECT_ID = "<your-project-id>"
VUE_APP_FIREBASE_STORAGE_BUCKET = "<your-storage-bucket>"
VUE_APP_FIREBASE_MESSAGING_SENDER_ID= "<your-messaging-sender-id>"
VUE_APP_FIREBASE_APP_ID = "<your-app-id>"
VUE_APP_FIREBASE_MEASURMENT_ID = "<your-measurementId>"
VUE_APP_FIREBASE_API_KEY = "AIzaSyBz5zJU8nWCwpB4N60b1pyGyW88g5CdBpY"
VUE_APP_FIREBASE_AUTH_DOMAIN = "crawlerx-e4a5d.firebaseapp.com"
VUE_APP_FIREBASE_DB_DOMAIN= "https://crawlerx-e4a5d.firebaseapp.com"
VUE_APP_FIREBASE_PROJECT_ID = "crawlerx-e4a5d"
VUE_APP_FIREBASE_STORAGE_BUCKET = "crawlerx-e4a5d.appspot.com"
VUE_APP_FIREBASE_MESSAGING_SENDER_ID= "352593421105"
VUE_APP_FIREBASE_APP_ID = "1:352593421105:web:5b82330e1c74538a418610"
VUE_APP_FIREBASE_MEASURMENT_ID = ""
VUE_APP_DJANGO_PROTOCOL = "http"
VUE_APP_DJANGO_HOSTNAME = "django"
VUE_APP_DJANGO_PORT = "8000"
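
Note on the three new VUE_APP_DJANGO_* entries: Vue CLI exposes variables whose names start with VUE_APP_ to client code through process.env. This commit view does not show how the app consumes them, so the following is only a sketch of one way they could be combined into a backend base URL. The helper name buildDjangoBaseUrl and the fallback defaults are hypothetical and not part of this commit.

```javascript
// Hypothetical helper (not from this commit): builds a backend base URL
// from the VUE_APP_DJANGO_* variables added to crawlerx_app/.env.
function buildDjangoBaseUrl(env) {
  // The fallbacks below are assumptions for illustration only.
  const protocol = env.VUE_APP_DJANGO_PROTOCOL || "http";
  const hostname = env.VUE_APP_DJANGO_HOSTNAME || "localhost";
  const port = env.VUE_APP_DJANGO_PORT;

  // Append the port only when one is configured.
  return port
    ? `${protocol}://${hostname}:${port}`
    : `${protocol}://${hostname}`;
}
```

In a Vue CLI project this could be called as buildDjangoBaseUrl(process.env). With the values in this diff the result would be http://django:8000; note that the hostname "django" only resolves where a service of that name exists, for example on a shared Docker network.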
9 changes: 2 additions & 7 deletions crawlerx_app/Dockerfile
@@ -1,8 +1,6 @@
# Choose the Image which has Node installed already
FROM node:lts-alpine

# install simple http server for serving static content
RUN npm install -g http-server

# make the 'app' folder the current working directory
WORKDIR /app

@@ -15,8 +13,5 @@ RUN npm install
# copy project files and folders to the current working directory (i.e. 'app' folder)
COPY . .

# build app for production with minification
RUN npm run build

EXPOSE 8080
CMD [ "http-server", "dist" ]
CMD [ "npm", "run", "serve" ]
28 changes: 28 additions & 0 deletions crawlerx_app/nginx/nginx.conf
@@ -0,0 +1,28 @@
server {
listen 8080;
server_name _;
server_tokens off;
client_max_body_size 20M;

location / {
root /usr/share/nginx/html;
index index.html index.htm;
try_files $uri $uri/ /index.html;
}

location /api {
try_files $uri @proxy_api;
}


location @proxy_api {
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Url-Scheme $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_redirect off;
proxy_pass http://backend:8000;
}


}