Docker revamp #19

Open
wants to merge 58 commits into
base: master
b292b6d
Add docker compose for kibana and elasticsearch
heyqule Aug 22, 2019
4aef0a3
Merge remote-tracking branch 'upstream/master'
heyqule Aug 22, 2019
8ee20b8
Add support for multiple set of nltk tokens. Controls by --index
heyqule Aug 23, 2019
9dca234
Fequency adjustment
heyqule Aug 23, 2019
74e6b36
Fully automate the build with docker
heyqule Aug 25, 2019
53408c9
Add support to bypass fetching stock price outside of regular hours.
heyqule Aug 26, 2019
d5393e7
Fix time display
heyqule Aug 26, 2019
4dc0238
Optimization
heyqule Aug 26, 2019
b7cda28
Fix hour() error
heyqule Aug 26, 2019
a8bf22a
Fix Cache Cleaning issue
heyqule Aug 27, 2019
aea6a59
Change startup.sh to startup.sample.sh
heyqule Aug 27, 2019
64ddc35
Add Curl to python instance for cleaning purposes.
heyqule Aug 27, 2019
004c17c
Clean cache
heyqule Aug 28, 2019
3cd6de0
Change Kibana template
heyqule Aug 28, 2019
fde9181
Move news out of original sentiment script
heyqule Aug 29, 2019
92a9447
Break down News SA
heyqule Aug 29, 2019
4205c8c
remove exposed ports
heyqule Aug 29, 2019
ff5a8cf
Elasticsearch / Kibana 7.3 change
heyqule Aug 31, 2019
e310886
Add ndjson importer
heyqule Aug 31, 2019
10502c8
Add ndjson importer
heyqule Aug 31, 2019
7675154
Remove kibana 5.6 export
heyqule Aug 31, 2019
c2a7010
Fix kibana importer
heyqule Sep 2, 2019
cb42d10
Update Copyright
heyqule Sep 2, 2019
fd9fe56
Change to wt
heyqule Sep 2, 2019
3a3b452
Change Mapping to 7.3 format
heyqule Sep 2, 2019
f3c1895
Disable twitter sentiment stream in start.sh
heyqule Sep 2, 2019
e6c9f1b
Rename original py to og.py
heyqule Sep 2, 2019
3fc49a6
Change config handling
heyqule Sep 8, 2019
e86efe7
Fix twitter
heyqule Sep 9, 2019
5f1d87f
Since it's single node insance, disable replica
heyqule Sep 9, 2019
3e862ec
Refactors
heyqule Sep 9, 2019
84c6324
Minor Import script adjustment
heyqule Sep 9, 2019
4cd6af4
Index structure change
heyqule Sep 9, 2019
622eae1
Fix message body
heyqule Sep 9, 2019
baa9d5f
Optimiaztion
heyqule Sep 10, 2019
040887b
Add delay before fetching from elasticsearch .
heyqule Sep 10, 2019
56901dc
Kibana change
heyqule Sep 10, 2019
abbc740
Kibana - remove legend
heyqule Sep 10, 2019
a6002ac
Add kibana listener
heyqule Sep 10, 2019
c6cf17b
Revert ndjson
heyqule Sep 10, 2019
bda22a4
Attempt to fix stock price operant error
heyqule Sep 10, 2019
b7226d4
Fix elastic mapping
heyqule Sep 11, 2019
5cde9c9
Add delay for Seek Alpha
heyqule Sep 11, 2019
c3431c4
Add delay for Seek Alpha
heyqule Sep 11, 2019
d086bc6
- Separate sentiment for message and title
heyqule Sep 14, 2019
f84a379
- Kibana adjustment
heyqule Sep 14, 2019
60e06fc
- Config adjustment
heyqule Sep 14, 2019
0d7c7a4
- Improve Kibana dashboard
heyqule Sep 17, 2019
fb6bea1
- Improve Kibana Dashboard
heyqule Sep 22, 2019
097c774
- Additonal Readme change
heyqule Sep 22, 2019
efc7387
- Fix kibana tmp folder issue
heyqule Sep 22, 2019
175dd61
- Minor change to spawn timers
heyqule Sep 22, 2019
a178733
Minor Refactor
heyqule Sep 24, 2019
9c55d3d
Merge branch 'master' into master
shirosaidev Oct 11, 2019
6f38025
Fix issue found by shaggy63
heyqule Oct 12, 2019
5d17a6c
Merge remote-tracking branch 'origin/master'
heyqule Oct 12, 2019
646b0d9
Disable unnecessary exposed ports
heyqule Oct 12, 2019
b985410
Add copyright blocks to non-py files
heyqule Oct 16, 2019
7 changes: 7 additions & 0 deletions README.md
@@ -139,3 +139,10 @@ optional arguments:
-q, --quiet Run quiet with no message output
-V, --version Prints version and exits
```

### HOWTO DOCKER
- Change config.py
- Change startup.sh to include your tickers
- run docker-compose up
- ???
- Profit
3 changes: 3 additions & 0 deletions docker-compose.yml
@@ -21,6 +21,9 @@ services:
nproc:
soft: 2048
hard: 2048
#expose this for local dev only!

What happens when this is exposed permanently?

#ports:
# - "9200:9200"
kibana:
image: docker.elastic.co/kibana/kibana:5.6.16
depends_on:
1 change: 1 addition & 0 deletions requirements.txt
@@ -5,3 +5,4 @@ tweepy
beautifulsoup4
textblob
vaderSentiment
pytz
21 changes: 16 additions & 5 deletions src/config.py.sample → src/config.sample.py
@@ -1,15 +1,18 @@
#Global Config
elasticsearch_host = "elasticsearch"
elasticsearch_port = 9200
elasticsearch_user = ""
elasticsearch_password = ""

#Sentiment Analyizers config
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
nltk_tokens_required = {
'default': ("Tesla", "@Tesla", "#Tesla", "tesla", "TSLA", "tsla", "#TSLA", "#tsla", "elonmusk", "Elon", "Musk"),
'tsla': ("Tesla", "@Tesla", "#Tesla", "tesla", "TSLA", "tsla", "#TSLA", "#tsla", "elonmusk", "Elon", "Musk"),
'amd': ('amd','ryzen','epyc','radeon','server','data','center','crossfire','threadripper')
'default': ("increase","decrease","buying","sold","buy","selling","winning","losing"),
'tsla': ("tesla", "@tesla", "#tesla", "tsla", "#tsla", "elonmusk", "elon", "musk"),
'amd': ('amd','ryzen','epyc','radeon','crossfire','threadripper')
}
nltk_tokens_ignored = ("win", "Win", "giveaway", "Giveaway")
twitter_feeds = ["@elonmusk", "@cnbc", "@benzinga", "@stockwits",
@@ -19,5 +22,13 @@
"@Carl_C_Icahn", "@ReformedBroker", "@bespokeinvest", "@stlouisfed",
"@muddywatersre", "@mcuban", "@AswathDamodaran", "@elerianm",
"@MorganStanley", "@ianbremmer", "@GoldmanSachs", "@Wu_Tang_Finance",
"@Schuldensuehner", "@NorthmanTrader", "@Frances_Coppola", "@bySamRo",
"@BuzzFeed","@nytimes"]
"@Schuldensuehner", "@NorthmanTrader", "@Frances_Coppola", "@BuzzFeed","@nytimes"]
sentiment_frequency = 3600

#Stock Price fetcher config
price_frequency = 900
weekday_start = 1
weekday_end = 5
hour_start = 9
hour_end = 18
timezone_str = 'America/Toronto'
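The per-index token sets in `nltk_tokens_required` above could be consumed roughly as follows. This is only a sketch: `tokens_for_index` and `mentions_required_token` are hypothetical helpers that do not appear in the PR, and the case-insensitive whole-word matching is an assumption suggested by the new token lists being all lowercase.

```python
# Token sets copied from the new side of the config diff.
nltk_tokens_required = {
    'default': ("increase", "decrease", "buying", "sold", "buy", "selling", "winning", "losing"),
    'tsla': ("tesla", "@tesla", "#tesla", "tsla", "#tsla", "elonmusk", "elon", "musk"),
    'amd': ('amd', 'ryzen', 'epyc', 'radeon', 'crossfire', 'threadripper'),
}


def tokens_for_index(index):
    """Hypothetical helper: pick the token set for an index, else 'default'."""
    return nltk_tokens_required.get(index, nltk_tokens_required['default'])


def mentions_required_token(text, index):
    """Hypothetical helper: True if any required token occurs as a whole word.

    Assumes matching is case-insensitive, which the PR does not confirm.
    """
    words = text.lower().split()
    return any(token in words for token in tokens_for_index(index))


print(mentions_required_token("Elon says TSLA is fine", "tsla"))
print(mentions_required_token("Ryzen launch", "tsla"))
```

Falling back to `'default'` for an unknown index is a design choice of this sketch; the actual script may instead require an exact key.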
24 changes: 21 additions & 3 deletions src/sentiment.py
@@ -200,15 +200,18 @@ def on_timeout(self):


class NewsHeadlineListener:
def __init__(self, url=None, frequency=3600):
def __init__(self, url=None, frequency=sentiment_frequency):
self.url = url
self.headlines = []
self.followedlinks = []
self.frequency = frequency
self.max_cache = 1000;

while True:
new_headlines = self.get_news_headlines(self.url)

self.cleanup()

# add any new headlines
for htext, htext_url in new_headlines:
if htext not in self.headlines:
@@ -265,6 +268,21 @@ def __init__(self, url=None, frequency=3600):

logger.info("Will get news headlines again in %s sec..." % self.frequency)
time.sleep(self.frequency)
def cleanup(self):
new_headline = []
new_followlink = []
if len(self.headlines) > self.max_cache:
for i in range(self.max_cache / 2, len(self.headlines) - 1):
new_headline.append(self.headlines[i])

self.headlines = new_headline

if len(self.followedlinks) > self.max_cache:

Could the loop be replaced with an array slice, like: self.headlines = self.headlines[self.max_cache / 2:]?

Author

This file is not applicable in the latest commit.

for i in range(self.max_cache / 2, len(self.followedlinks) - 1):

This also looks like it can be replaced with an array slice. Consider using a descriptive variable for the lower bound.

new_followlink.append(self.followedlinks[i])

self.followedlinks = new_followlink
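
The reviewers' slice suggestion for `cleanup` can be sketched as below. `trim_cache` and `keep_from` are hypothetical names, not part of the PR. Two details are worth noting: under Python 3, `self.max_cache / 2` is a float and cannot be used as a `range` or slice bound, so the sketch uses `//`; and the loop in the diff stops at `len(...) - 1`, which appears to drop the newest entry, whereas a slice keeps it.

```python
def trim_cache(items, max_cache=1000):
    """Return items unchanged, or without the oldest max_cache // 2 entries
    once the list has grown past max_cache."""
    if len(items) <= max_cache:
        return items
    # Descriptive lower bound, as the review asks for. Integer division is
    # required because slice bounds must be ints in Python 3.
    keep_from = max_cache // 2
    return items[keep_from:]


headlines = [f"headline {i}" for i in range(1200)]
headlines = trim_cache(headlines)
print(len(headlines), headlines[0], headlines[-1])
# prints: 700 headline 500 headline 1199
```

The same helper would serve both `self.headlines` and `self.followedlinks`, which removes the duplicated loop.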


def get_news_headlines(self, url):

@@ -494,8 +512,8 @@ def get_twitter_users_from_file(file):
help="Use twitter user ids from file")
parser.add_argument("-n", "--newsheadlines", metavar="SYMBOL",
help="Get news headlines instead of Twitter using stock symbol, example: TSLA")
parser.add_argument("--frequency", metavar="FREQUENCY", default=3600, type=int,
help="How often in seconds to retrieve news headlines (default: 3600 sec)")
parser.add_argument("--frequency", metavar="FREQUENCY", default=sentiment_frequency, type=int,
help="How often in seconds to retrieve news headlines (default: %d sec)" % sentiment_frequency)

Consider using an f-string instead of old-school % formatting, like so: f"How often in seconds to retrieve news headlines (default: {sentiment_frequency} sec)"

Author

This file is no longer applicable in the latest commit.

parser.add_argument("--followlinks", action="store_true",
help="Follow links on news headlines and scrape relevant text from landing page")
parser.add_argument("-v", "--verbose", action="store_true",
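
For reference, the f-string form suggested in the review, next to argparse's own `%(default)s` substitution, might look like this. It is a standalone sketch: `sentiment_frequency` and `price_frequency` are hard-coded stand-ins for the values imported from `config.py`, and `--price-frequency` is a hypothetical flag used only to show the second style.

```python
import argparse

# Stand-ins for the values the PR imports from config.py.
sentiment_frequency = 3600
price_frequency = 900

parser = argparse.ArgumentParser()

# Reviewer's suggestion: interpolate the configured default with an f-string.
parser.add_argument("--frequency", metavar="FREQUENCY",
                    default=sentiment_frequency, type=int,
                    help=f"How often in seconds to retrieve news headlines "
                         f"(default: {sentiment_frequency} sec)")

# Alternative: argparse expands %(default)s in help text itself, so the
# number shown can never drift from the actual default.
parser.add_argument("--price-frequency", metavar="FREQUENCY",
                    default=price_frequency, type=int,
                    help="How often in seconds to retrieve stock data "
                         "(default: %(default)s sec)")

args = parser.parse_args([])
print(args.frequency, args.price_frequency)
```

Either style avoids the mismatch visible in the old `stockprice.py` line, where the default was 600 but the help text said 120.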
2 changes: 1 addition & 1 deletion src/startup.sh
@@ -1,6 +1,6 @@
#!/bin/bash

sleep 15
sleep 30
python sentiment.py -n TSLA --followlinks -i tsla &
sleep 1
python stockprice.py -s TSLA -i tsla &
38 changes: 32 additions & 6 deletions src/stockprice.py
@@ -14,8 +14,10 @@
import logging
import sys
import time

import datetime
import re
import requests
from pytz import timezone

try:
from elasticsearch5 import Elasticsearch
@@ -24,7 +26,8 @@
from random import randint

# import elasticsearch host
from config import elasticsearch_host, elasticsearch_port, elasticsearch_user, elasticsearch_password
from config import elasticsearch_host, elasticsearch_port, elasticsearch_user, elasticsearch_password, \
price_frequency, weekday_start, weekday_end, hour_start, hour_end, timezone_str


STOCKSIGHT_VERSION = '0.1-b.5'
@@ -37,19 +40,31 @@
es = Elasticsearch(hosts=[{'host': elasticsearch_host, 'port': elasticsearch_port}],
http_auth=(elasticsearch_user, elasticsearch_password))

regex = re

class GetStock:

def get_price(self, url, symbol):
import re

eastern_timezone = timezone(timezone_str)

while True:

if self.isNotLive(eastern_timezone):
#logger.info("Stock market is not live. Current time: %s" % datetime.datetime.now(timezone).strftime("%Y-%m-%d %H:%M"))

currentTime = datetime.datetime.now(timezone).strftime("%Y-%m-%d %H:%M")
logger.info(f"Stock market is not live. Current time: {currentTime}")

Author

That's from an outdated commit.

today = datetime.datetime.now(eastern_timezone)
logger.info("Stock market is not live. Current time: %s" % today.strftime('%H'))
logger.info("Will get stock data again in %s sec..." % args.frequency)
time.sleep(args.frequency)
continue
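
A self-contained version of the "not live" message from this thread is sketched below. One caveat about the snippets above: inside `get_price`, the name `timezone` refers to the pytz factory function imported at the top of the file, and `datetime.datetime.now()` only accepts a tzinfo instance, so the object to pass is `eastern_timezone`. In this sketch `not_live_message` is a hypothetical helper, and a fixed UTC-4 offset from the standard library stands in for the pytz zone so the example runs without extra packages.

```python
from datetime import datetime, timedelta, timezone


def not_live_message(now):
    """Build the log line from an already timezone-aware datetime."""
    current_time = now.strftime("%Y-%m-%d %H:%M")
    return f"Stock market is not live. Current time: {current_time}"


# Fixed-offset stand-in for timezone(timezone_str); it ignores DST rules.
fixed_eastern = timezone(timedelta(hours=-4))

print(not_live_message(datetime(2019, 8, 26, 20, 15, tzinfo=fixed_eastern)))
# prints: Stock market is not live. Current time: 2019-08-26 20:15
```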


logger.info("Grabbing stock data for symbol %s..." % symbol)

try:

# add stock symbol to url
url = re.sub("SYMBOL", symbol, url)
url = regex.sub("SYMBOL", symbol, url)

Nice touch, improves readability.

# get stock data (json) from url
try:
r = requests.get(url)
@@ -113,6 +128,17 @@ def get_price(self, url, symbol):
logger.info("Will get stock data again in %s sec..." % args.frequency)
time.sleep(args.frequency)

def isNotLive(self, timezone):
today = datetime.datetime.now(timezone);
if today.weekday() >= weekday_start and \
today.weekday() <= weekday_end and \
today.hour() >= hour_start and \
today.hour() <= hour_end:
return False;

return True;
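
The market-hours check above can be sketched in a form that runs as written. Two points about the hunk motivate it. First, `hour` is an attribute of `datetime`, so `today.hour()` raises a `TypeError` (the commit list includes a later "Fix hour() error"). Second, `weekday()` numbers Monday as 0, so `weekday_start = 1` and `weekday_end = 5` would cover Tuesday through Saturday; if Monday through Friday is intended, `isoweekday()` (Monday is 1) matches the config values. The function name `is_market_live` and the fixed-offset timezone are assumptions of this sketch, not part of the PR.

```python
from datetime import datetime, timedelta, timezone


def is_market_live(now, weekday_start=1, weekday_end=5, hour_start=9, hour_end=18):
    """Return True when `now` falls inside the configured trading window.

    Defaults mirror the sample config. isoweekday() is used so that
    1..5 means Monday..Friday; both hour bounds are inclusive, as in the diff.
    """
    return (weekday_start <= now.isoweekday() <= weekday_end
            and hour_start <= now.hour <= hour_end)


# Fixed UTC-4 offset as a stand-in for timezone('America/Toronto').
fixed_eastern = timezone(timedelta(hours=-4))

monday_10am = datetime(2019, 8, 26, 10, 0, tzinfo=fixed_eastern)
sunday_10am = datetime(2019, 8, 25, 10, 0, tzinfo=fixed_eastern)

print(is_market_live(monday_10am))  # True
print(is_market_live(sunday_10am))  # False
```

Passing `now` in as a parameter, rather than calling `datetime.now()` inside, keeps the check easy to test. Because the upper bound is inclusive, `hour_end = 18` keeps the window open until 18:59.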



if __name__ == '__main__':

@@ -124,8 +150,8 @@ def get_price(self, url, symbol):
help="Delete existing Elasticsearch index first")
parser.add_argument("-s", "--symbol", metavar="SYMBOL",
help="Stock symbol to use, example: TSLA")
parser.add_argument("-f", "--frequency", metavar="FREQUENCY", default=600, type=int,
help="How often in seconds to retrieve stock data (default: 120 sec)")
parser.add_argument("-f", "--frequency", metavar="FREQUENCY", default=price_frequency, type=int,
help="How often in seconds to retrieve stock data (default: %d sec)" % price_frequency)

help=f"How often in seconds to retrieve stock data (default: {price_frequency} sec)"

Author

This file is no longer applicable in the latest commit.

parser.add_argument("-v", "--verbose", action="store_true",
help="Increase output verbosity")
parser.add_argument("--debug", action="store_true",