Welcome to ANYWAY's Data Science Team!
Our current challenge is building an automatic infographic generator to empower and serve journalists, bloggers, public opinion leaders, community leaders, etc. in the era of data journalism. The generated infographics will enhance reporting and news writing with the use of statistics. Each infographic will be created for a real-time news flash about a road accident and will provide deeper insight into the story based on historical data. This, we believe, will increase both the quantity and quality of articles dealing with road accidents, and will raise public awareness and put pressure on decision makers to initiate infrastructure improvements in light of Vision Zero.
As volunteers in the Data Science team, we are responsible for coming up with the most relevant infographics in response to different road accident related news flashes. As a start, we will propose infographics based on statistical analysis and on a deep understanding of existing media coverage of the topic of road accidents.
Later on we intend to extend our research tools, analyze the data from more perspectives, and obtain more data resources. For example, we would like to use our road accidents clustering PoC to enrich our data analysis; obtain a great new data source for Computer Vision analysis by applying our privacy-preserving license plate blurring algorithm to videos from Shomreiy Haderech (שומרי הדרך); and use Natural Language Processing to create automatic descriptions for infographics.
We encourage you to come up with your own research ideas, as well as to share any comments, suggestions, corrections, etc. :-)
ANYWAY is a volunteer-based project acting under the umbrella of Data For Change that aims to reduce road accidents by:
- Raising public awareness of existing road hazards, thereby leading towards safer road behaviour.
- Collaborating with authorities in order to assist and drive them to find solutions in light of Vision Zero. Such solutions will improve road infrastructure and address behavioural problems in an attempt to prevent road fatalities.
Please see ANYWAY's main repository to read more about our vision, collaborations and previous projects :-)
We communicate via Slack. Ask any member for an invite by email.
Please join channels #anyway, #anyway_data_science, #anyway_news_report and #general.
There are several teams in the Anyway project:
Front End, Back End, Data Engineering, Data Science, etc.
Similarly, there are several “anyway” GitHub repositories:
https://github.com/data-for-change?q=anyway
The repo that we, the Data Scientists, work with is:
https://github.com/data-for-change/anyway-data-science
We mainly add Jupyter notebooks to it, for two purposes:
- Share useful Data Explorations
- Perform task PoCs (Proofs of Concept), to later be integrated into production by the DS and/or BE team
At the moment it mainly contains these three folders:
- /analysis_notebooks: Mostly used for PoC’s and other things worth considering
- /learning_notebooks: Mostly used for Data Explorations
- /tasks: specific tasks related notebooks
ANYWAY's Google Drive also contains useful information that is difficult to show on git. You will need to request access to view the drive.
First-timers are advised to browse through (or play with) some notebooks in the repo.
They contain pieces of code and data you may find useful.
When you feel ready to take on a task, assign yourself one. (see “Tasks:” for more info)
Task owners shouldn’t deal with production-related overhead. When possible, keep it simple.
To that end we’ve gathered here a few tips:
- Notebook: Use JupyterLab, or Colab if you have trouble installing packages
- Database: Use the offline CSVs (rather than connecting to the DB)
- Location: Find “close” records using Euclidean distance
Skipping the square root is ok too, since it does not change which records are closest: (x_db - x_news)**2 + (y_db - y_news)**2
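The database and location tips above can be sketched as follows. This is only an illustration under stated assumptions: the inline CSV text stands in for one of the offline CSVs, the column names id, x and y and the helper names are hypothetical, and the coordinates are assumed to be on a projected grid where x and y share the same units (the formula is not meaningful on raw latitude/longitude).

```python
import csv
import io

# Stand-in for an offline CSV. The columns "id", "x", "y" are hypothetical;
# check the Data Index doc for the real column names.
SAMPLE_CSV = """id,x,y
1,180000,660000
2,180300,660400
3,185000,670000
4,179900,659950
"""


def squared_distance(x_db, y_db, x_news, y_news):
    """Squared Euclidean distance; skipping the root keeps the ordering."""
    return (x_db - x_news) ** 2 + (y_db - y_news) ** 2


def find_close_records(rows, x_news, y_news, radius):
    """Return the rows within `radius` of the news location, nearest first."""
    limit = radius ** 2  # compare squared values, so no sqrt is needed
    scored = []
    for row in rows:
        d2 = squared_distance(float(row["x"]), float(row["y"]), x_news, y_news)
        if d2 <= limit:
            scored.append((d2, row))
    scored.sort(key=lambda pair: pair[0])
    return [row for _, row in scored]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
close = find_close_records(rows, x_news=180000, y_news=660000, radius=1000)
print([row["id"] for row in close])
```

In a notebook you would open the real CSV file instead of the inline sample; the distance expression itself stays the same.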
If you haven't already, first send a permission request to view the Google Drive.
Then, make sure you go over the ./Data/Data Index doc file.
Redash (an online platform to query our tables using SQL)
Get some inspiration from the world - Accident data from different places in the world and data analysis
Tasks are added as GitHub Issues in our repo:
https://github.com/data-for-change/anyway-data-science/issues
Feel free to assign yourself a task to let the others know someone’s working on it.
Feel free to also create an issue yourself if you think of a new task, or want to report a problem (You may need repo permissions to do so).
If you haven’t already, install Git.
Enter our repo
If you haven’t already, click “Fork” on the top right to create your own copy of the code.
On the right, click “Clone or download” and then the “copy” icon.
Find a path to create a folder called “src”.
Open a terminal (or “command prompt”) and change to that directory:
cd my_folder
cd src
Paste the link after these two words and run it:
git clone <the-link-that-you-copied>
Enter the cloned repo:
cd anyway-data-science
Create your own branch:
git checkout -b task_1234
For each file you want to add, run this:
git add my_notebook.ipynb
Wrap it all up in a commit:
git commit -m "Add my awesome notebook for task #1234"
Push it to your repo on GitHub’s website:
git push
The first push of a new branch might print an error message with a suggested command (typically git push --set-upstream origin task_1234); copy it and run it.
On your fork (copy) on GitHub, go to “branches” and select “New pull request” for your branch.
Set the remote repository as your “upstream” repo:
git remote add upstream https://github.com/data-for-change/anyway-data-science.git
Fetch the upstream:
git fetch upstream
Check out your fork’s local master branch:
git checkout master
Merge the changes from upstream’s master to your local master:
git merge upstream/master
Push the changes from your local master to your fork’s master on GitHub:
git push