v1.1.0
thomasthaddeus committed Jan 3, 2024
1 parent d8dab3b commit eefb4c4
Showing 31 changed files with 1,437 additions and 105 deletions.
79 changes: 0 additions & 79 deletions docs/PyPI_Upload.md

This file was deleted.

2 changes: 2 additions & 0 deletions docs/TODO/part10.md
@@ -1,3 +1,5 @@
# Part 10: User-friendly GUI

Developing a user-friendly Graphical User Interface (GUI) for the DataAnalysisToolkit, to make it accessible to users who are not comfortable with coding, involves a series of detailed steps. Here's a comprehensive TODO list for this development:

1. **Research and User Experience Design**:
4 changes: 3 additions & 1 deletion docs/TODO/part2.md
@@ -1,3 +1,5 @@
# Part 2: Advanced Statistical Analysis

To implement the "Advanced Statistical Analysis" feature in the DataAnalysisToolkit, encompassing methods such as regression analysis, ANOVA, time series analysis, and hypothesis testing, work through the following TODO list:

1. **Research and Conceptualization**:
@@ -60,4 +62,4 @@ To implement the "Advanced Statistical Analysis" feature in the DataAnalysisTool
- Regularly update the statistical analysis modules to incorporate new methods and improvements.
- Monitor and fix any issues that arise post-deployment.

By completing these tasks, the DataAnalysisToolkit will be significantly enhanced with advanced statistical analysis capabilities, catering to a wider range of data analysis requirements and providing deeper insights from data.
By completing these tasks, the DataAnalysisToolkit will be significantly enhanced with advanced statistical analysis capabilities, catering to a wider range of data analysis requirements and providing deeper insights from data.
4 changes: 3 additions & 1 deletion docs/TODO/part3.md
@@ -1,3 +1,5 @@
# Part 3: Machine Learning Integration

To integrate basic machine learning algorithms for classification, regression, and clustering, along with features for hyperparameter tuning and model evaluation into the DataAnalysisToolkit, you would need to complete the following tasks:

1. **Research and Planning**:
@@ -64,4 +66,4 @@ To integrate basic machine learning algorithms for classification, regression, a
- Regularly update the machine learning modules to incorporate new algorithms, methodologies, and improvements.
- Address any issues or bugs that emerge after deployment.

Completing these tasks would effectively integrate basic machine learning functionalities into the DataAnalysisToolkit, enhancing its capabilities and making it a more versatile tool for data analysts and scientists.
Completing these tasks would effectively integrate basic machine learning functionalities into the DataAnalysisToolkit, enhancing its capabilities and making it a more versatile tool for data analysts and scientists.
2 changes: 2 additions & 0 deletions docs/TODO/part4.md
@@ -1,3 +1,5 @@
# Part 4: Natural Language Processing (NLP) Capabilities

To add Natural Language Processing (NLP) capabilities to the DataAnalysisToolkit, focusing on sentiment analysis, topic modeling, and text classification, the following tasks should be undertaken:

1. **Research and Feasibility Study**:
2 changes: 2 additions & 0 deletions docs/TODO/part5.md
@@ -1,3 +1,5 @@
# Part 5: Automated Data Quality Checks

Implementing features for automated data quality checks, specifically focusing on detecting inconsistencies, anomalies, and biases in datasets, involves a series of methodical steps. Here's a detailed TODO list to guide the development of this feature in the DataAnalysisToolkit:

1. **Research and Conceptual Framework**:
2 changes: 2 additions & 0 deletions docs/TODO/part6.md
@@ -1,3 +1,5 @@
# Part 6: Interactive Dashboards and Reporting

To implement functionalities for creating interactive dashboards and automated reports in the DataAnalysisToolkit, which are crucial for visualizing data insights and sharing them with non-technical stakeholders, the following TODO list should be followed:

1. **Research and Requirements Analysis**:
2 changes: 2 additions & 0 deletions docs/TODO/part7.md
@@ -1,3 +1,5 @@
# Part 7: Real-time Data Analysis

Implementing real-time data analysis capabilities in the DataAnalysisToolkit, particularly for handling streaming data relevant to monitoring systems, financial markets, and IoT devices, involves a series of strategic and technical steps. Here's a detailed TODO list for this implementation:

1. **Research and Conceptualization**:
2 changes: 2 additions & 0 deletions docs/TODO/part8.md
@@ -1,3 +1,5 @@
# Part 8: Customizable Data Transformation Pipelines

To implement customizable data transformation pipelines in the DataAnalysisToolkit, enabling users to create, save, and reuse these pipelines across different projects, the following tasks should be undertaken:

1. **Research and Conceptual Planning**:
4 changes: 3 additions & 1 deletion docs/TODO/part9.md
@@ -1,3 +1,5 @@
# Part 9: Parallel Processing and Optimization

To optimize the DataAnalysisToolkit for performance by enabling parallel processing, especially beneficial for handling large datasets, the following TODO list should be completed:

1. **Research and Analysis**:
@@ -60,4 +62,4 @@ To optimize the DataAnalysisToolkit for performance by enabling parallel process
- Regularly maintain and update the parallel processing features to adapt to new technological advancements and user needs.
- Address any performance issues or bugs that emerge post-deployment.

Completing these tasks will significantly enhance the DataAnalysisToolkit's performance, making it more capable and efficient in handling large datasets and complex data analysis tasks.
Completing these tasks will significantly enhance the DataAnalysisToolkit's performance, making it more capable and efficient in handling large datasets and complex data analysis tasks.
85 changes: 85 additions & 0 deletions docs/data_import_documentation.md
@@ -0,0 +1,85 @@
# Data Import Documentation

## Overview

The Data Import module of the DataAnalysisToolkit provides functionalities to import data from various sources such as Excel files, SQL databases, and APIs. It is designed to simplify the process of data collection and integration for analysis and machine learning tasks.

## Features

- **Excel Connector**: Import data from Excel files (.xlsx, .xls).
- **SQL Connector**: Connect and import data from SQL databases like MySQL, PostgreSQL, etc.
- **API Connector**: Fetch data from various APIs with handling for authentication and rate-limiting.
- **Data Integrator**: Merge or concatenate data from different sources into a unified DataFrame.
- **Data Formatter**: Standardize and transform the imported data into a consistent format.

## Getting Started

### Excel Connector

To import data from Excel files:

```python
from data_sources.excel_connector import ExcelConnector

connector = ExcelConnector('path/to/excel/file.xlsx')
data = connector.load_data(sheet_name='Sheet1')
```

### SQL Connector

For SQL databases:

```python
from data_sources.sql_connector import SQLConnector

connector = SQLConnector('database_URI')
data = connector.query_data('SELECT * FROM table_name')
```
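
The connection string is presumably a SQLAlchemy-style database URI, since SQLAlchemy is listed in `requirements.txt`. A sketch under that assumption, using a local PostgreSQL database; the credentials, database name, and table below are placeholders, not part of the toolkit:

```python
from data_sources.sql_connector import SQLConnector

# Hypothetical PostgreSQL URI -- substitute your own driver, credentials, host, and database
connector = SQLConnector('postgresql://analyst:secret@localhost:5432/sales_db')
data = connector.query_data('SELECT id, amount, created_at FROM orders')
```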

### API Connector

To fetch data from an API:

```python
from data_sources.api_connector import APIConnector

connector = APIConnector('https://api.example.com', auth=('username', 'password'))
response = connector.get('endpoint')
```
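
Beyond GET, the connector also provides `post`, `put`, `delete`, and `patch` methods (see `src/data_sources/api_connector.py` in this commit). A brief sketch of sending data; the endpoint and payload here are illustrative only:

```python
# POST a JSON payload to a hypothetical endpoint; raises requests.HTTPError on 4xx/5xx responses
created = connector.post('records', json={'name': 'example', 'value': 42})
print(created.json())
```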

### Data Integrator

Merge or concatenate data from multiple sources:

```python
from integrators.data_integrator import DataIntegrator

integrator = DataIntegrator()
integrator.add_data(data_from_excel)
integrator.add_data(data_from_sql)
combined_data = integrator.concatenate_data()
```
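
`concatenate_data()` is assumed to return a regular pandas DataFrame (the integrator's stated purpose is producing "a unified DataFrame"), so key-based joins can also be done directly with pandas when simple concatenation is not enough. The `customer_id` key below is purely illustrative:

```python
import pandas as pd

# Left-join the combined data against another source on a shared, hypothetical key
enriched_data = pd.merge(combined_data, data_from_sql, on='customer_id', how='left')
```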

### Data Formatter

Standardize or transform the data:

```python
from formatters.data_formatter import DataFormatter

formatter = DataFormatter(combined_data)
formatter.standardize_dates('date_column')
formatter.normalize_numeric(['numeric_column'])
```
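
For context, here is a rough pandas equivalent of what these two calls are expected to do, assuming date parsing and min-max scaling; this is an illustrative approximation, not the toolkit's actual implementation:

```python
import pandas as pd

# Assumed behaviour: parse the date column, then min-max scale the numeric column
df = combined_data.copy()
df['date_column'] = pd.to_datetime(df['date_column'])
col = df['numeric_column']
df['numeric_column'] = (col - col.min()) / (col.max() - col.min())
```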

## Error Handling

The toolkit includes error handling for common issues encountered during data import, such as missing files, invalid formats, and connection failures. Be sure to catch and handle these exceptions in your own code to keep your import workflows robust.
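
A minimal sketch of defensive usage with the API connector, assuming the underlying `requests` exceptions propagate to the caller (the connector calls `raise_for_status()`, so 4xx/5xx responses raise `requests.HTTPError`):

```python
import requests

try:
    response = connector.get('endpoint', params={'key': 'value'})
    data = response.json()
except requests.HTTPError as err:
    # Unsuccessful status codes surface here because of raise_for_status()
    print(f"API request failed: {err}")
except requests.ConnectionError as err:
    print(f"Could not reach the API: {err}")
```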

## Examples

Refer to the `examples` directory for detailed examples of using each connector and integrating data from multiple sources.

## Contribution

Contributions to enhance the data import module, such as adding new connectors or improving existing functionalities, are welcome. Please refer to the contribution guidelines for more details.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "dataanalysistoolkit"
version = "1.0.1"
version = "1.1.0"
description = "The `DataAnalysisToolkit` project is a Python-based data analysis tool designed to streamline various data analysis tasks. It allows users to load data from CSV files and perform operations such as statistical calculations, outlier detection, data cleaning, and visualization."
authors = [
{ name = "Thaddeus Thomas", email = "[email protected]" }
6 changes: 6 additions & 0 deletions requirements.txt
@@ -3,4 +3,10 @@ scipy
scikit-learn
pandas
matplotlib
seaborn
pytest
nltk
requests
sqlalchemy
pytest-mock
openpyxl
Empty file added src/data_sources/__init__.py
Empty file.
122 changes: 122 additions & 0 deletions src/data_sources/api_connector.py
@@ -0,0 +1,122 @@
"""api_connector.py
Provides the APIConnector class, a thin wrapper around ``requests.Session``
for talking to REST APIs, with optional authentication and convenience
methods for GET, POST, PUT, DELETE, and PATCH requests.

Example usage:
    connector = APIConnector('https://api.example.com', auth=('username', 'password'))
    response = connector.get('endpoint', params={'key': 'value'})
    update_response = connector.put('endpoint', json={'key': 'updated_value'})
    delete_response = connector.delete('endpoint', params={'key': 'value'})
    patch_response = connector.patch('endpoint', json={'key': 'new_value'})
    print(response.json())
"""

import requests


class APIConnector:
    """
    Connector for REST APIs built on a persistent ``requests.Session``,
    supporting optional authentication and the common HTTP verbs
    (GET, POST, PUT, DELETE, PATCH).
    """

    def __init__(self, base_url, auth=None):
        """
        Initialize the APIConnector with the base URL and optional authentication.

        Args:
            base_url (str): The base URL for the API.
            auth (tuple, optional): A tuple for authentication, typically
                (username, password) or an API token.
        """
        self.base_url = base_url
        self.auth = auth
        self.session = requests.Session()
        if auth:
            self.session.auth = auth

    def get(self, endpoint, params=None):
        """
        Send a GET request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            params (dict, optional): A dictionary of parameters to send with the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.get(url, params=params)
        response.raise_for_status()  # Will raise an HTTPError if the HTTP request returned an unsuccessful status code
        return response

    def post(self, endpoint, data=None, json=None):
        """
        Send a POST request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            data (dict, optional): A dictionary of data to send in the body of the request.
            json (dict, optional): A JSON serializable object to send in the body of the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.post(url, data=data, json=json)
        response.raise_for_status()
        return response

    def put(self, endpoint, data=None, json=None):
        """
        Send a PUT request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            data (dict, optional): A dictionary of data to send in the body of the request.
            json (dict, optional): A JSON serializable object to send in the body of the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.put(url, data=data, json=json)
        response.raise_for_status()
        return response

    def delete(self, endpoint, params=None):
        """
        Send a DELETE request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            params (dict, optional): A dictionary of parameters to send with the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.delete(url, params=params)
        response.raise_for_status()
        return response

    def patch(self, endpoint, data=None, json=None):
        """
        Send a PATCH request to the API.

        Args:
            endpoint (str): The API endpoint to send the request to.
            data (dict, optional): A dictionary of data to send in the body of the request.
            json (dict, optional): A JSON serializable object to send in the body of the request.

        Returns:
            Response: The response from the API.
        """
        url = f"{self.base_url}/{endpoint}"
        response = self.session.patch(url, data=data, json=json)
        response.raise_for_status()
        return response