v1.1.0
thomasthaddeus committed Jan 3, 2024
1 parent d8dab3b commit eefb4c4
Showing 31 changed files with 1,437 additions and 105 deletions.
79 changes: 0 additions & 79 deletions docs/PyPI_Upload.md

This file was deleted.

2 changes: 2 additions & 0 deletions docs/TODO/part10.md
@@ -1,3 +1,5 @@
# Part 10: User-friendly GUI

Developing a user-friendly Graphical User Interface (GUI) for the DataAnalysisToolkit, to make it accessible to users who are not comfortable with coding, involves a series of detailed steps. Here's a comprehensive TODO list for this development:

1. **Research and User Experience Design**:
4 changes: 3 additions & 1 deletion docs/TODO/part2.md
@@ -1,3 +1,5 @@
# Part 2: Advanced Statistical Analysis

To implement the "Advanced Statistical Analysis" feature in the DataAnalysisToolkit, encompassing methods such as regression analysis, ANOVA, time series analysis, and hypothesis testing, work through the following TODO list:

1. **Research and Conceptualization**:
@@ -60,4 +62,4 @@ To implement the "Advanced Statistical Analysis" feature in the DataAnalysisTool
- Regularly update the statistical analysis modules to incorporate new methods and improvements.
- Monitor and fix any issues that arise post-deployment.

By completing these tasks, the DataAnalysisToolkit will be significantly enhanced with advanced statistical analysis capabilities, catering to a wider range of data analysis requirements and providing deeper insights from data.
By completing these tasks, the DataAnalysisToolkit will be significantly enhanced with advanced statistical analysis capabilities, catering to a wider range of data analysis requirements and providing deeper insights from data.
4 changes: 3 additions & 1 deletion docs/TODO/part3.md
@@ -1,3 +1,5 @@
# Part 3: Machine Learning Integration

To integrate basic machine learning algorithms for classification, regression, and clustering, along with features for hyperparameter tuning and model evaluation into the DataAnalysisToolkit, you would need to complete the following tasks:

1. **Research and Planning**:
@@ -64,4 +66,4 @@ To integrate basic machine learning algorithms for classification, regression, a
- Regularly update the machine learning modules to incorporate new algorithms, methodologies, and improvements.
- Address any issues or bugs that emerge after deployment.

Completing these tasks would effectively integrate basic machine learning functionalities into the DataAnalysisToolkit, enhancing its capabilities and making it a more versatile tool for data analysts and scientists.
Completing these tasks would effectively integrate basic machine learning functionalities into the DataAnalysisToolkit, enhancing its capabilities and making it a more versatile tool for data analysts and scientists.
2 changes: 2 additions & 0 deletions docs/TODO/part4.md
@@ -1,3 +1,5 @@
# Part 4: Natural Language Processing (NLP) Capabilities

To add Natural Language Processing (NLP) capabilities to the DataAnalysisToolkit, focusing on sentiment analysis, topic modeling, and text classification, the following tasks should be undertaken:

1. **Research and Feasibility Study**:
2 changes: 2 additions & 0 deletions docs/TODO/part5.md
@@ -1,3 +1,5 @@
# Part 5: Automated Data Quality Checks

Implementing features for automated data quality checks, specifically focusing on detecting inconsistencies, anomalies, and biases in datasets, involves a series of methodical steps. Here's a detailed TODO list to guide the development of this feature in the DataAnalysisToolkit:

1. **Research and Conceptual Framework**:
2 changes: 2 additions & 0 deletions docs/TODO/part6.md
@@ -1,3 +1,5 @@
# Part 6: Interactive Dashboards and Reporting

To implement functionalities for creating interactive dashboards and automated reports in the DataAnalysisToolkit, which are crucial for visualizing data insights and sharing them with non-technical stakeholders, the following TODO list should be followed:

1. **Research and Requirements Analysis**:
2 changes: 2 additions & 0 deletions docs/TODO/part7.md
@@ -1,3 +1,5 @@
# Part 7: Real-time Data Analysis

Implementing real-time data analysis capabilities in the DataAnalysisToolkit, particularly for handling streaming data relevant to monitoring systems, financial markets, and IoT devices, involves a series of strategic and technical steps. Here's a detailed TODO list for this implementation:

1. **Research and Conceptualization**:
2 changes: 2 additions & 0 deletions docs/TODO/part8.md
@@ -1,3 +1,5 @@
# Part 8: Customizable Data Transformation Pipelines

To implement customizable data transformation pipelines in the DataAnalysisToolkit, enabling users to create, save, and reuse these pipelines across different projects, the following tasks should be undertaken:

1. **Research and Conceptual Planning**:
4 changes: 3 additions & 1 deletion docs/TODO/part9.md
@@ -1,3 +1,5 @@
# Part 9: Parallel Processing and Optimization

To optimize the DataAnalysisToolkit for performance by enabling parallel processing, especially beneficial for handling large datasets, the following TODO list should be completed:

1. **Research and Analysis**:
@@ -60,4 +62,4 @@ To optimize the DataAnalysisToolkit for performance by enabling parallel process
- Regularly maintain and update the parallel processing features to adapt to new technological advancements and user needs.
- Address any performance issues or bugs that emerge post-deployment.

Completing these tasks will significantly enhance the DataAnalysisToolkit's performance, making it more capable and efficient in handling large datasets and complex data analysis tasks.
Completing these tasks will significantly enhance the DataAnalysisToolkit's performance, making it more capable and efficient in handling large datasets and complex data analysis tasks.
85 changes: 85 additions & 0 deletions docs/data_import_documentation.md
@@ -0,0 +1,85 @@
# Data Import Documentation

## Overview

The Data Import module of the DataAnalysisToolkit provides functionalities to import data from various sources such as Excel files, SQL databases, and APIs. It is designed to simplify the process of data collection and integration for analysis and machine learning tasks.

## Features

- **Excel Connector**: Import data from Excel files (.xlsx, .xls).
- **SQL Connector**: Connect and import data from SQL databases like MySQL, PostgreSQL, etc.
- **API Connector**: Fetch data from various APIs with handling for authentication and rate-limiting.
- **Data Integrator**: Merge or concatenate data from different sources into a unified DataFrame.
- **Data Formatter**: Standardize and transform the imported data into a consistent format.

## Getting Started

### Excel Connector

To import data from Excel files:

```python
from data_sources.excel_connector import ExcelConnector

connector = ExcelConnector('path/to/excel/file.xlsx')
data = connector.load_data(sheet_name='Sheet1')
```

### SQL Connector

For SQL databases:

```python
from data_sources.sql_connector import SQLConnector

connector = SQLConnector('database_URI')
data = connector.query_data('SELECT * FROM table_name')
```
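
The connection string is presumably a SQLAlchemy-style database URI, since SQLAlchemy is listed in `requirements.txt`. A sketch under that assumption, using a local PostgreSQL database; the credentials, database name, and table below are placeholders, not part of the toolkit:

```python
from data_sources.sql_connector import SQLConnector

# Hypothetical PostgreSQL URI -- substitute your own driver, credentials, host, and database
connector = SQLConnector('postgresql://analyst:secret@localhost:5432/sales_db')
data = connector.query_data('SELECT id, amount, created_at FROM orders')
```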

### API Connector

To fetch data from an API:

```python
from data_sources.api_connector import APIConnector

connector = APIConnector('https://api.example.com', auth=('username', 'password'))
response = connector.get('endpoint')
```
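
Beyond GET, the connector also provides `post`, `put`, `delete`, and `patch` methods (see `src/data_sources/api_connector.py` in this commit). A brief sketch of sending data; the endpoint and payload here are illustrative only:

```python
# POST a JSON payload to a hypothetical endpoint; raises requests.HTTPError on 4xx/5xx responses
created = connector.post('records', json={'name': 'example', 'value': 42})
print(created.json())
```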

### Data Integrator

Merge or concatenate data from multiple sources:

```python
from integrators.data_integrator import DataIntegrator

integrator = DataIntegrator()
integrator.add_data(data_from_excel)
integrator.add_data(data_from_sql)
combined_data = integrator.concatenate_data()
```
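
`concatenate_data()` is assumed to return a regular pandas DataFrame (the integrator's stated purpose is producing "a unified DataFrame"), so key-based joins can also be done directly with pandas when simple concatenation is not enough. The `customer_id` key below is purely illustrative:

```python
import pandas as pd

# Left-join the combined data against another source on a shared, hypothetical key
enriched_data = pd.merge(combined_data, data_from_sql, on='customer_id', how='left')
```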

### Data Formatter

Standardize or transform the data:

```python
from formatters.data_formatter import DataFormatter

formatter = DataFormatter(combined_data)
formatter.standardize_dates('date_column')
formatter.normalize_numeric(['numeric_column'])
```
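
For context, here is a rough pandas equivalent of what these two calls are expected to do, assuming date parsing and min-max scaling; this is an illustrative approximation, not the toolkit's actual implementation:

```python
import pandas as pd

# Assumed behaviour: parse the date column, then min-max scale the numeric column
df = combined_data.copy()
df['date_column'] = pd.to_datetime(df['date_column'])
col = df['numeric_column']
df['numeric_column'] = (col - col.min()) / (col.max() - col.min())
```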

## Error Handling

The toolkit includes error handling for common issues encountered during data import, such as missing files, invalid formats, and connection failures. Be sure to catch and handle these exceptions in your own code to keep your import workflows robust.
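
A minimal sketch of defensive usage with the API connector, assuming the underlying `requests` exceptions propagate to the caller (the connector calls `raise_for_status()`, so 4xx/5xx responses raise `requests.HTTPError`):

```python
import requests

try:
    response = connector.get('endpoint', params={'key': 'value'})
    data = response.json()
except requests.HTTPError as err:
    # Unsuccessful status codes surface here because of raise_for_status()
    print(f"API request failed: {err}")
except requests.ConnectionError as err:
    print(f"Could not reach the API: {err}")
```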

## Examples

Refer to the `examples` directory for detailed examples of using each connector and integrating data from multiple sources.

## Contribution

Contributions to enhance the data import module, such as adding new connectors or improving existing functionalities, are welcome. Please refer to the contribution guidelines for more details.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "dataanalysistoolkit"
version = "1.0.1"
version = "1.1.0"
description = "The `DataAnalysisToolkit` project is a Python-based data analysis tool designed to streamline various data analysis tasks. It allows users to load data from CSV files and perform operations such as statistical calculations, outlier detection, data cleaning, and visualization."
authors = [
{ name = "Thaddeus Thomas", email = "[email protected]" }
6 changes: 6 additions & 0 deletions requirements.txt
@@ -3,4 +3,10 @@ scipy
scikit-learn
pandas
matplotlib
seaborn
pytest
nltk
requests
sqlalchemy
pytest-mock
openpyxl
Empty file added src/data_sources/__init__.py
Empty file.
122 changes: 122 additions & 0 deletions src/data_sources/api_connector.py
@@ -0,0 +1,122 @@
"""api_connector.py
Provides the APIConnector class, a thin wrapper around ``requests.Session``
for talking to REST APIs, with optional authentication and convenience
methods for GET, POST, PUT, DELETE, and PATCH requests.

Example usage:
    connector = APIConnector('https://api.example.com', auth=('username', 'password'))
    response = connector.get('endpoint', params={'key': 'value'})
    update_response = connector.put('endpoint', json={'key': 'updated_value'})
    delete_response = connector.delete('endpoint', params={'key': 'value'})
    patch_response = connector.patch('endpoint', json={'key': 'new_value'})
    print(response.json())
"""

import requests


class APIConnector:
    """
    Connector for REST APIs built on a persistent ``requests.Session``,
    supporting optional authentication and the common HTTP verbs
    (GET, POST, PUT, DELETE, PATCH).
    """

    def __init__(self, base_url, auth=None):
        """
        Initialize the APIConnector with the base URL and optional authentication.

        Args:
            base_url (str): The base URL for the API.
            auth (tuple, optional): A tuple for authentication, typically
                (username, password) or an API token.
        """
        self.base_url = base_url
        self.auth = auth
        self.session = requests.Session()
        if auth:
            self.session.auth = auth

    def get(self, endpoint, params=None):
        """
        Send a GET request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            params (dict, optional): A dictionary of parameters to send with the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.get(url, params=params)
        response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
        return response

    def post(self, endpoint, data=None, json=None):
        """
        Send a POST request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            data (dict, optional): A dictionary of data to send in the body of the request.
            json (dict, optional): A JSON serializable object to send in the body of the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.post(url, data=data, json=json)
        response.raise_for_status()
        return response

    def put(self, endpoint, data=None, json=None):
        """
        Send a PUT request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            data (dict, optional): A dictionary of data to send in the body of the request.
            json (dict, optional): A JSON serializable object to send in the body of the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.put(url, data=data, json=json)
        response.raise_for_status()
        return response

    def delete(self, endpoint, params=None):
        """
        Send a DELETE request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            params (dict, optional): A dictionary of parameters to send with the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.delete(url, params=params)
        response.raise_for_status()
        return response

    def patch(self, endpoint, data=None, json=None):
        """
        Send a PATCH request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            data (dict, optional): A dictionary of data to send in the body of the request.
            json (dict, optional): A JSON serializable object to send in the body of the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.patch(url, data=data, json=json)
        response.raise_for_status()
        return response