Rudaev, Alexander, 22303397

Rahman, Kazi Shafwanur, 22305619

Assistance Systems Project

https://mygit.th-deg.de/ar08397/Assistant-Systems-Project

https://mygit.th-deg.de/ar08397/Assistant-Systems-Project/-/wikis/home

Project Description

Assistance Systems Project is a data-driven web application that helps users analyze health-related data and receive personalized recommendations aimed at reducing the risk of stroke. The system integrates Streamlit for the frontend, Rasa for a conversational chatbot, and scikit-learn for machine learning.

Key functionalities include:

  • Data Analysis and Visualization: Users can upload or filter existing health data, detect outliers, and visualize correlations between features such as age, BMI, and glucose levels.
  • Machine Learning for Risk Assessment: Multiple models (e.g., Logistic Regression, SVM, Random Forest) are trained on both real and augmented datasets to predict stroke risk. The project supports data augmentation using Faker and SMOTE to address class imbalance.
  • Personalized Health Recommendations: By inputting personal health metrics (age, gender, BMI, etc.), users receive customized advice to mitigate stroke risk. These recommendations are generated by the trained machine learning models and displayed in a user-friendly interface.
  • Rasa-Powered Chatbot: A built-in chatbot answers user queries, provides health insights, and offers real-time recommendations. It leverages a Rasa model for intent classification, entity extraction, and dialogue management.

Together, these components let users explore their health data, review model-based risk insights, and ask the chatbot for targeted recommendations, all within a single containerized application.

Installation

Prerequisites

  • Docker: Ensure Docker is installed on your system.
  • Git: For cloning the repository and managing submodules.

Steps

  1. Clone the Repository:
    git clone https://mygit.th-deg.de/ar08397/Assistant-Systems-Project.git
    cd Assistant-Systems-Project
  2. Build and Start Services:

    docker-compose up --build
  3. Access the Streamlit Application: Open your browser and navigate to http://localhost:8501 to access the interactive web interface.

IMPORTANT NOTE ℹ️

Before using the chatbot for the first time, make sure the Rasa model has been trained. Without a trained model, the chatbot may return errors or behave unexpectedly. To train the model, follow the instructions in the Training the Rasa Chatbot section below.

Data

Dataset

We utilize the Stroke Prediction Dataset from Kaggle for training and evaluation.

Data Handling

  • Outlier Detection: Implemented using the Z-score method to identify and handle anomalies.
  • Data Augmentation: Added realistic synthetic records amounting to 30% of the original dataset to make the trained models more robust.
  • Data Transformation: Normalized numerical features following best practices outlined in Google's Machine Learning Crash Course.
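The outlier detection and normalization steps listed above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the project's own code: the function names, the threshold of 3 standard deviations, the choice of min-max scaling, and the sample BMI values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the Z-score of each value: (x - mean) / population std. deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def flag_outliers(values, threshold=3.0):
    """Return the indices whose absolute Z-score exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


def min_max_normalize(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical BMI readings with one implausible entry (assumed sample data).
bmi = [22.1, 24.5, 27.3, 23.8, 25.0, 26.2, 21.9, 24.1, 28.0, 23.3, 25.7, 97.6]

outlier_idx = flag_outliers(bmi, threshold=3.0)
cleaned = [x for i, x in enumerate(bmi) if i not in outlier_idx]
scaled = min_max_normalize(cleaned)
```

Removing the flagged rows before scaling matters here: a single extreme value would otherwise compress all other readings into a narrow band near 0.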

Basic Usage

Running the Application with Docker

  1. Ensure Docker Engine is Installed: Make sure Docker is installed and running on your system.

  2. Build and Start Services: Navigate to the project root directory and execute:

    docker-compose up --build

    This command builds the Docker images, trains the Rasa model, and starts all services as defined in docker-compose.yml.

  3. Access the Streamlit Application: Open your browser and navigate to http://localhost:8501 to access the interactive web interface.

Training the Rasa Chatbot

If a Rasa model has not been trained in the models/chatbot/ directory, follow these steps:

  1. Access the Rasa Server Container:

    docker exec -it rasa_server bash
  2. Train the Rasa Model: Inside the container, execute:

    rasa train
  3. Restart Services: Exit the container and restart the Docker services:

    exit
    docker-compose down
    docker-compose up --build -d
  4. Finalize Setup:

    • Navigate to the Data Analysis page in the Streamlit app and wait for the evaluation models to finish training.
    • Once evaluations are complete, models will be available for use within the Chatbot.
    • Ensure that data filters are applied as needed and that session management maintains these filters when switching between Data Analysis and Chatbot sections.
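Whether the training steps above are needed can be checked up front by looking for a model archive in models/chatbot/. The sketch below assumes that trained models are stored there as `.tar.gz` archives, which is the format `rasa train` produces; the helper name `find_trained_models` is hypothetical.

```python
from pathlib import Path


def find_trained_models(model_dir="models/chatbot"):
    """Return trained Rasa model archives (*.tar.gz) in model_dir, newest first."""
    directory = Path(model_dir)
    if not directory.is_dir():
        return []
    archives = directory.glob("*.tar.gz")
    # Sort by modification time so the most recent model comes first.
    return sorted(archives, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    models = find_trained_models()
    if models:
        print(f"Found trained model: {models[0].name}")
    else:
        print("No trained model found; run `rasa train` in the rasa_server container.")
```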

Demonstration Video

A screencast demonstrates the key functionalities of the Assistance Systems Project. The demo includes:

  • Application Workflow Without Rasa Model:

    • Showcases the application's basic functionalities when the Rasa model is not trained, highlighting data analysis and recommendation features.
  • Training the Rasa Model:

    • Demonstrates the steps to train the Rasa model, ensuring the chatbot is operational.
  • Data Analysis Workflow:

    • Walks through the data analysis process, including loading data, preprocessing, outlier detection, and visualization.
  • Generating Recommendations:

    • Illustrates how personalized health recommendations are generated based on user input and model predictions.
  • Chatbot Functionalities:

    • Exhibits interactions with the Rasa chatbot, including handling user queries, providing recommendations, and managing conversations.

Watch the Demo Video

Implementation of the Requests

The Assistance Systems Project combines a data-driven Streamlit web application with a Rasa chatbot for personalized health recommendations. The overview below shows how each project request has been implemented:

  1. Multi-page Web App with Streamlit:

    • Home Page:
      • File: src/web/home.py
      • Function: run()
      • Description: Implements the user interface for collecting personal health information through interactive forms using Streamlit's form functionalities.
    • Data Analysis Page:
      • File: src/web/data_analysis_page.py
      • Function: run()
      • Description: Handles data analysis operations, including loading, preprocessing, filtering, visualization, and model training.
    • Personalized Recommendations Page:
      • File: src/web/recommendations.py
      • Function: run()
      • Description: Displays personalized health recommendations based on user input and predictive modeling.
    • Chatbot Page:
      • File: src/web/chatbot_page.py
      • Function: run()
      • Description: Integrates the Rasa-based chatbot within the Streamlit application, enabling real-time user interactions and assistance.
  2. Data Handling and Augmentation:

    • Data Import:
      • File: src/data/data_loader.py
      • Function: load_data(filepath=None)
      • Description: Loads the dataset from a predefined CSV file, ensuring seamless data ingestion into the application.
    • Outlier Handling:
      • File: src/data/data_analysis.py
      • Function: preprocess_data(data)
      • Description: Implements statistical methods to identify and manage outliers, enhancing data integrity and reliability.
    • Fake Data Generation:
      • File: src/data/data_augmentation.py
      • Function: augment_data(X, y, augmentation_factor=0.3)
      • Description: Utilizes the Faker library to generate synthetic data, augmenting the original dataset by 30% to improve model robustness.
  3. Machine Learning Integration with Scikit-Learn:

    • Model Training:

      • File: src/data/data_analysis.py
      • Function: DataAnalysis.train_models(status_text, progress_bar)
      • Description: Trains multiple machine learning models, including Logistic Regression, Support Vector Machines, and Random Forest classifiers, on both real and augmented datasets.
    • Model Evaluation:

      • File: src/data/data_analysis.py
      • Function: DataAnalysis.evaluate_model(model, model_name, X_test, y_test, data_type="")
      • Description: Assesses model performance using metrics such as Accuracy, Precision, Recall, F1 Score, and ROC AUC, with results visualized within the application.
    • Recommendation System:

      • File: src/web/recommendations.py
      • Function: run()
      • Description: Generates personalized health recommendations based on user-provided data and predictive modeling outputs.
    • Chat Recommendation System:

      • File: actions/actions.py
      • Function: ActionGenerateRecommendation.run(dispatcher, tracker, domain)
      • Description: Generates personalized health recommendations inside the chatbot conversation, implemented as a Rasa custom action that uses user-provided data and predictive modeling outputs.
  4. Chatbot Development with Rasa:

    • Intent Recognition and Entity Extraction:
      • Files: data/nlu.yml, data/domain.yml
      • Description: Defined intents and entities within Rasa's configuration files to enable accurate understanding of user inputs.
    • Custom Actions:
      • File: actions/actions.py
      • Functions: ActionShowDataAnalysis.run(), ActionGenerateRecommendation.run(), ActionProvideStrokeRiskReductionAdvice.run(), ActionFallback.run()
      • Description: Implements custom actions to handle data analysis summaries, generate recommendations, provide stroke risk reduction advice, and manage fallback responses.
    • Integration with Streamlit:
      • File: src/web/chatbot_page.py
      • Function: run()
      • Description: Ensures seamless communication between the Streamlit application and the Rasa chatbot through REST APIs.
  5. Documentation and Version Control:

    • MyGit Repository and Wiki:
      • Files: All project files are maintained within the Git repository, with detailed documentation in the Wiki.
      • Description: Organizes source code, documentation, and model files in a structured manner, facilitating collaboration and version control.
    • README Structure:
      • File: README.md
      • Description: Adheres to the specified structure, providing clear instructions, project details, and comprehensive information on setup and usage.

Each component has been meticulously developed to ensure a cohesive and user-friendly application that leverages data analysis and machine learning to deliver meaningful health recommendations.
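To illustrate what an `augmentation_factor` of 0.3 means for the augmentation step listed above, here is a simplified stand-in. It is not the project's Faker-based `augment_data`: instead of Faker it interpolates between two existing rows of the same class, in the spirit of SMOTE but without the nearest-neighbour search. The function name, the seed parameter, and the sample rows are assumptions.

```python
import random


def augment_by_interpolation(X, y, augmentation_factor=0.3, seed=42):
    """Append round(len(X) * augmentation_factor) synthetic rows.

    Each synthetic row lies on the line segment between two randomly chosen
    rows that share a label, so new values stay within the observed range.
    """
    rng = random.Random(seed)
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append(row)

    n_new = round(len(X) * augmentation_factor)
    X_aug, y_aug = list(X), list(y)
    labels = list(by_label)
    for _ in range(n_new):
        # Classes are drawn uniformly, which favours the minority class
        # relative to its share of the original data.
        label = rng.choice(labels)
        a = rng.choice(by_label[label])
        b = rng.choice(by_label[label])
        t = rng.random()
        X_aug.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_aug.append(label)
    return X_aug, y_aug


# Hypothetical rows: [age, bmi, avg_glucose_level]; label 1 = stroke.
X = [[67, 36.6, 228.7], [80, 32.5, 105.9], [49, 34.4, 171.2],
     [35, 24.1, 88.0], [52, 27.9, 95.3], [41, 22.8, 79.6],
     [29, 21.5, 84.2], [60, 30.2, 110.4], [45, 26.7, 92.1],
     [38, 23.9, 86.5]]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

X_aug, y_aug = augment_by_interpolation(X, y, augmentation_factor=0.3)
```

With 10 input rows and a factor of 0.3, three synthetic rows are appended and the original rows are left unchanged at the front of the result.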

Work Done

The Assistance Systems Project was developed collaboratively by two team members, each contributing distinct components to ensure a comprehensive and robust application.

Student 1: Alexander Rudaev, Mat-No: 22303397

  1. Graphical User Interface (GUI) / Visualization:

    • Developed the Streamlit-based multi-page web application interface.
    • Implemented interactive forms for data collection and dynamic visualizations using Altair.
    • Designed the layout and navigation structure to enhance user experience.
  2. General Data Analysis:

    • Conducted exploratory data analysis using Pandas to uncover key insights and correlations.
    • Implemented statistical methods for outlier detection and data cleaning.
    • Integrated data visualization tools to present analysis results within the application.
  3. Sample Dialogs:

    • Created and documented sample interaction scenarios for the Rasa chatbot.
    • Ensured that dialogues effectively cover use cases such as data analysis requests and health recommendations.
    • Collaborated on refining chatbot responses to align with user intents.

Student 2: Kazi Shafwanur Rahman, Mat-No: 22305619

  1. Strategies for Outliers and Fake Data:

    • Developed algorithms for identifying and managing outliers within the dataset.
    • Utilized the Faker library to generate realistic synthetic data, augmenting the original dataset by 30%.
    • Documented the approaches and their impact on model training and performance.
  2. Scikit-Learn Integration:

    • Trained multiple machine learning models including Logistic Regression, Support Vector Machines, and Random Forest classifiers.
    • Performed model evaluation using metrics such as Accuracy, Precision, Recall, F1 Score, and ROC AUC.
    • Selected the best-performing model based on evaluation results and integrated it into the recommendation system.
  3. Dialog Flow:

    • Designed and implemented the dialog flow within the Rasa framework to handle various user intents.
    • Configured intents, entities, and actions to support seamless interactions and accurate intent recognition.
    • Ensured that the chatbot effectively manages conversation states and provides relevant responses.

Both Members: Documentation and Programming

  • Documentation:

    • Maintained comprehensive project documentation within the MyGit Wiki, covering project setup, data handling, model training, and usage instructions.
    • Structured the README.md file according to the specified guidelines, ensuring clarity and completeness.
  • Programming:

    • Collaborated on integrating different components of the application, including the web interface, data analysis modules, machine learning models, and chatbot functionalities.
    • Ensured code quality through consistent coding standards, thorough testing, and effective version control using Git.

This collaborative effort resulted in a well-rounded and functional application that meets the project’s objectives and provides valuable health recommendations through an intuitive user interface and intelligent chatbot assistance.

Features

  • Interactive Web Interface: Built with Streamlit, offering a seamless and responsive user experience.
  • Personalized Recommendations: Utilizes Scikit-Learn algorithms to provide tailored suggestions.
  • Data Analysis & Visualization: Employs Pandas and Altair for insightful data analysis and visualization.
  • Chatbot Support: Integrates a Rasa-powered chatbot to assist users and enhance interaction.
  • Robust Data Handling: Implements strategies for outlier detection and augmentation with realistic fake data.

Chatbot Integration

Overview

The chatbot is built using the Rasa framework and is designed to interact contextually with the data analysis results presented on the Streamlit app. It can assist users in navigating the application, provide recommendations, and answer queries related to the data insights.

Features

  • Context-Aware Conversations: Understands the context from user interactions and provides relevant responses.
  • Data-Driven Responses: Fetches and presents data analysis results upon user requests.
  • Seamless Integration: Embedded within the Streamlit app for a unified user experience.

Configuration

  • Rasa Server: Runs on port 5005.
  • Rasa Action Server: Runs on port 5055.
  • Streamlit App: Communicates with the Rasa server via the Docker network.
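A message can be sent to the Rasa server from Python roughly as follows. This sketch assumes that Rasa's REST channel is enabled (a `rest:` entry in credentials.yml) and that the server is reachable on the Docker network under the hostname `rasa_server`; the function names and the default sender ID are hypothetical, and the project's own client code may differ.

```python
import json
import urllib.request


def build_rasa_request(message, sender="streamlit_user",
                       base_url="http://rasa_server:5005"):
    """Build a POST request for Rasa's REST channel webhook."""
    payload = json.dumps({"sender": sender, "message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/webhooks/rest/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_message(message, sender="streamlit_user",
                 base_url="http://rasa_server:5005", timeout=10):
    """Send a message to the bot and return its parsed JSON response."""
    request = build_rasa_request(message, sender, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The REST channel replies with a JSON list of messages, so a caller would typically iterate over the result and display each entry's `text` field. Keeping the same `sender` value across calls lets Rasa track the conversation state for that user.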

Custom Actions

Custom actions are implemented in actions/actions.py to enable the chatbot to fetch and present data analysis results. These actions interact with the Streamlit app's data processing modules to retrieve relevant insights based on user queries.

Modeling

Algorithms

  • Random Forest Classifier: Chosen for its robustness and ability to handle feature interactions.
  • Support Vector Machine (SVM): Utilized for its effectiveness in high-dimensional spaces.

Model Training

Models are trained using Scikit-Learn, with performance evaluated based on accuracy, precision, recall, and F1-score. The best-performing model is integrated into the Streamlit app for generating personalized recommendations.
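For reference, the four metrics named above can be computed directly from true and predicted labels. The sketch below uses plain Python to make the definitions explicit; in practice scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities. The function names and the sample labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = stroke, 0 = no stroke.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

On an imbalanced dataset such as stroke prediction, accuracy alone can look high even when most positive cases are missed, which is why recall and F1 are reported alongside it.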

Docker Setup

The project is containerized using Docker and orchestrated with Docker Compose to ensure consistent environments across development and production.

Services

  • Rasa Server (rasa_server): Handles natural language understanding (NLU) and dialogue management.
  • Rasa Action Server (rasa_action_server): Executes custom actions defined in the project.
  • Streamlit App (streamlit_app): Provides the interactive frontend for users.
  • Duckling (duckling): (Optional) Extracts entities like dates, times, and numbers from user inputs.

Running the Services

Ensure Docker and Docker Compose are installed, then execute:

docker-compose up --build

Accessing Services

  • Streamlit App: http://localhost:8501
  • Rasa Server: port 5005
  • Rasa Action Server: port 5055
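Once the containers are running, a quick reachability check can confirm that each service is listening. The sketch below assumes that the Streamlit app (8501), the Rasa server (5005), and the Rasa action server (5055) are all published on localhost; only the Streamlit URL is confirmed by this README, so adjust the host and ports to match docker-compose.yml. The helper name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed host-side ports; check docker-compose.yml for the actual mappings.
SERVICES = {
    "Streamlit app": 8501,
    "Rasa server": 5005,
    "Rasa action server": 5055,
}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

An open port only shows that something is listening; it does not prove that the Rasa model has finished loading.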

Project Structure

assistance-systems-project/
├── actions/
│   ├── actions.py
│   ├── Dockerfile
│   ├── requirements-actions.txt
│   └── __init__.py
├── data/
│   ├── data_analysis.py
│   ├── data_augmentation.py
│   ├── data_loader.py
│   ├── data_preprocessor.py
│   ├── data_visualization.py
│   ├── nlu.yml
│   ├── processed/
│   │   └── ...
│   ├── raw/
│   │   └── healthcare-dataset-stroke-data.csv
│   ├── stories.yml
│   └── user_data/
│       └── ...
├── models/
│   ├── chatbot/
│   │   └── ...
│   └── data_analysis/
│       └── ...
├── src/
│   ├── app.py
│   ├── chatbot/
│   │   ├── rasa_chatbot.py
│   │   └── __init__.py
│   ├── web/
│   │   ├── home.py
│   │   ├── recommendations.py
│   │   ├── chatbot_page.py
│   │   ├── data_analysis_page.py
│   │   └── __init__.py
│   └── __init__.py
├── .dockerignore
├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── README.md
└── docs/
    └── ...

License

This project is licensed under the GNU General Public License.

Contact

For any inquiries or support, please open an issue in the MyGit Repository.