Data science final project for DATA201. This project consisted of two tasks, both carried out in Python.
The purpose of this project is to work through two very different sets of data. For Task 1, the purpose is to carry out exploratory data analysis on a fisheries bycatch dataset, finding out how frequently each value occurs and how much data is missing.
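A minimal sketch of the "how frequent is each value, how much is missing" step follows. It is not the project's actual code: the column names (`species`, `status`, `area`) and the tiny CSV are made up for illustration, and an empty field is assumed to mean "missing". In pandas the same summary is usually done with `Series.value_counts()` and `DataFrame.isna().sum()`; this version sticks to the standard library.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the real dataset file; column names are assumptions.
SAMPLE_CSV = """species,status,area
albatross,dead,A1
petrel,alive,
albatross,alive,A2
,decomposed,A1
"""


def summarise(rows, columns):
    """Return per-column value counts and per-column missing-value counts.

    An empty string (what csv.DictReader yields for a blank field) or None
    (what it yields for a short row) is treated as missing.
    """
    counts = {}
    missing = {}
    for col in columns:
        values = [row[col] for row in rows]
        missing[col] = sum(1 for v in values if v in ("", None))
        counts[col] = Counter(v for v in values if v not in ("", None))
    return counts, missing


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts, missing = summarise(rows, ["species", "status", "area"])
for col in counts:
    print(col, counts[col].most_common(), "missing:", missing[col])
```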
Some of the things I decided to explore were:
- How many different species were killed? (Here 'killed' means having the status 'dead', 'killed' or 'decomposed'.)
- What is the most common combination of capture method and target?
- Which fishery area is the worst in terms of bycatch, i.e. which area has the highest number of bycatch events?
- Draw a scatter plot of the latitude and longitude of all fisheries, marking the area found in the previous question in a different colour.
- For each fishing year, compute the ratio of 'dead'/'killed'/'decomposed' to 'alive'.
- Fit a linear regression model to see how this ratio changes over time. Can we say with confidence that the bycatch situation is getting better or worse?
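The questions above (apart from the scatter plot) can be sketched roughly as follows. This is a hedged illustration, not the project's implementation: the field names (`species`, `status`, `method`, `target`, `area`, `year`) and the twelve toy records are hypothetical, and each record is assumed to be one bycatch event. For the "with confidence" question it fits ordinary least squares by hand and reports the t-statistic of the slope, which would be compared against a t distribution with n - 2 degrees of freedom; `scipy.stats.linregress` returns a p-value for the same test directly.

```python
import math
from collections import Counter, defaultdict

# Statuses the project counts as "killed".
KILLED = {"dead", "killed", "decomposed"}

# Hypothetical toy records; the real dataset's fields and values may differ.
FIELDS = ("species", "status", "method", "target", "area", "year")
RAW = [
    ("albatross", "dead", "longline", "tuna", "A1", 2010),
    ("petrel", "killed", "longline", "tuna", "A1", 2010),
    ("petrel", "alive", "trawl", "hoki", "A2", 2010),
    ("albatross", "decomposed", "longline", "tuna", "A1", 2011),
    ("shearwater", "alive", "trawl", "hoki", "A2", 2011),
    ("albatross", "alive", "longline", "tuna", "A1", 2011),
    ("petrel", "dead", "trawl", "squid", "A3", 2012),
    ("shearwater", "alive", "trawl", "hoki", "A2", 2012),
    ("albatross", "alive", "longline", "tuna", "A1", 2012),
    ("petrel", "alive", "trawl", "hoki", "A3", 2013),
    ("albatross", "alive", "longline", "tuna", "A1", 2013),
    ("shearwater", "dead", "trawl", "hoki", "A2", 2013),
]
records = [dict(zip(FIELDS, r)) for r in RAW]

# Distinct species that appear with a "killed" status.
killed_species = {r["species"] for r in records if r["status"] in KILLED}

# Most common (capture method, target) pair.
pair, pair_count = Counter((r["method"], r["target"]) for r in records).most_common(1)[0]

# Area with the most bycatch events (assumes one record = one event).
worst_area, worst_count = Counter(r["area"] for r in records).most_common(1)[0]

# Killed/alive ratio per fishing year; years with no "alive" records are skipped
# to avoid dividing by zero.
killed_by_year = defaultdict(int)
alive_by_year = defaultdict(int)
for r in records:
    if r["status"] in KILLED:
        killed_by_year[r["year"]] += 1
    elif r["status"] == "alive":
        alive_by_year[r["year"]] += 1
years = sorted(alive_by_year)
ratios = [killed_by_year[y] / alive_by_year[y] for y in years]


def fit_line(xs, ys):
    """Ordinary least squares y = a + b*x; returns (b, a, t-statistic of b).

    Needs at least 3 points and a non-perfect fit (otherwise the slope's
    standard error is zero).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / se_slope


slope, intercept, t_stat = fit_line(years, ratios)
print("species killed:", len(killed_species))
print("most common method/target:", pair, pair_count)
print("worst area:", worst_area, worst_count)
print(f"slope per year: {slope:.3f}, t = {t_stat:.2f} on {len(years) - 2} df")
```

On this toy data the slope is negative, but |t| is about 1.73 on 2 degrees of freedom, well below the two-sided 5% critical value of roughly 4.30, so a downward-looking trend from only four yearly points would not justify a confident "getting better" claim.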
For Task 2, by contrast, the purpose is to experiment with PCA techniques on the shape of a hand, using data read from a text file.
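A hedged sketch of PCA on shape data is shown below. The layout of the hand text file is not described here, so this assumes each row is one hand outline flattened into a vector of landmark coordinates; the synthetic data (40 shapes, 56 points each) stands in for the real file, which might instead be loaded with something like `np.loadtxt("hands.txt")` (hypothetical filename). PCA is done via the SVD of the mean-centred data, and each shape is then reconstructed from its first three principal-component scores.

```python
import numpy as np


def pca(X, k):
    """PCA via SVD of the mean-centred data.

    X: (n_shapes, n_features) array, one flattened shape per row.
    Returns the mean shape, the top-k principal directions as a
    (k, n_features) array, and the fraction of variance each explains.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    return mean, Vt[:k], var[:k] / var.sum()


# Hypothetical stand-in for the hand-shape file: a made-up "mean hand" plus
# one dominant mode of variation and a little noise.
rng = np.random.default_rng(0)
n_shapes, n_points = 40, 56
base = rng.normal(size=2 * n_points)
mode = rng.normal(size=2 * n_points)
X = (base
     + np.outer(rng.normal(size=n_shapes), mode)
     + 0.01 * rng.normal(size=(n_shapes, 2 * n_points)))

mean, components, explained = pca(X, k=3)

# Project onto the first 3 components and map back to shape space.
scores = (X - mean) @ components.T
X_hat = mean + scores @ components
print("variance explained:", np.round(explained, 3))
```

Plotting `mean + c * components[0]` for a few values of `c` is a common way to visualise what the first mode of variation does to the shape.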