Skip to content

Latest commit

Β 

History

History
395 lines (322 loc) Β· 23.7 KB

README.md

File metadata and controls

395 lines (322 loc) Β· 23.7 KB

KNOWLEDGE-EXTRACT

Unlocking Knowledge, Powering Conversations.

MIT License last-commit repo-top-language repo-language-count


πŸ”— Table of Contents


πŸ“ Overview

The Knowledge-Extract project revolutionizes the way we interact with digital content by transforming real-time conversations into structured, readable markdown documents. Leveraging advanced APIs for transcription and speech synthesis, this tool is ideal for educators, journalists, and content creators who need to capture and document dynamic interactions effortlessly. Its seamless integration and user-friendly interface ensure a smooth, efficient workflow, enhancing productivity and content accessibility.


πŸ‘Ύ Features

Feature Summary
βš™οΈ Architecture
  • Utilizes a multi-language stack including Python, JavaScript, and HTML.
  • Employs both frontend and backend components, likely with a web interface.
  • Containerized with Docker for easy deployment and isolation.
πŸ”© Code Quality
  • Includes a variety of dependencies indicating a complex but potentially feature-rich application.
  • Uses modern libraries like Pydantic for data validation and FastAPI or similar frameworks.
  • Codebase likely follows modern coding practices given the use of tools like poetry for dependency management.
πŸ“„ Documentation
  • Presence of 'readmeai' suggests automated documentation generation or enhancement.
  • Use of 'robots.txt' implies web crawling management, important for SEO and web visibility.
  • Documentation might be extensive to cover diverse dependencies and their integration.
πŸ”Œ Integrations
  • Integrates with various data handling and visualization libraries like pandas, matplotlib, and scikit-learn.
  • Web technologies integration (HTML, CSS, JavaScript) for frontend development.
  • API development suggested by dependencies on libraries like 'fastjsonschema' and 'httpcore'.
🧩 Modularity
  • Use of Docker and poetry implies a modular approach to dependency and environment management.
  • Project likely structured in a way that supports scalability and maintenance.
  • Modular codebase can be inferred from the diverse set of dependencies catering to different functionalities.
πŸ§ͺ Testing
  • Dependencies like 'pytest', 'pluggy', and 'mock' suggest a strong emphasis on testing.
  • Use of continuous integration tools or practices could be inferred.
  • Testing likely covers multiple layers of the application, from backend logic to frontend integration.
⚑️ Performance
  • Use of performance optimization libraries like 'numpy' and 'pandas' for data handling.
  • 'psutil' and 'matplotlib' for resource monitoring and visualization.
  • Performance considerations are likely a priority given the computational nature of tasks.
πŸ›‘οΈ Security
  • Security libraries like 'cryptography', 'argon2-cffi-bindings', and 'pydantic' for data validation and secure data handling.
  • Use of 'docker' also suggests an isolated environment which enhances security.
  • Web security features indicated by 'urllib3' and 'requests' for safe HTTP requests.

πŸ“ Project Structure

└── knowledge-extract/
    β”œβ”€β”€ README.md
    β”œβ”€β”€ client_backend
    β”‚   β”œβ”€β”€ Dockerfile
    β”‚   β”œβ”€β”€ README.md
    β”‚   β”œβ”€β”€ __pycache__
    β”‚   β”œβ”€β”€ conversation_custom.py
    β”‚   β”œβ”€β”€ main.py
    β”‚   β”œβ”€β”€ poetry.lock
    β”‚   β”œβ”€β”€ prompts.py
    β”‚   β”œβ”€β”€ pyproject.toml
    β”‚   └── test_ai.py
    β”œβ”€β”€ python_direct
    β”‚   └── run.py
    β”œβ”€β”€ requirements.txt
    └── vocode-web
        β”œβ”€β”€ README.md
        β”œβ”€β”€ config-overrides.js
        β”œβ”€β”€ package-lock.json
        β”œβ”€β”€ package.json
        β”œβ”€β”€ public
        └── src

πŸ“‚ Project Index

KNOWLEDGE-EXTRACT/
__root__
requirements.txt - The `requirements.txt` file serves as a critical component within the broader architecture of the project, primarily focusing on managing and documenting the dependencies required for the project's environment
- This file lists all the necessary Python packages and their specific versions to ensure that the project runs consistently across different setups by avoiding discrepancies caused by varying package versions
- It includes a variety of libraries, such as `aiohttp` for asynchronous HTTP networking, `anyio` which abstracts asynchronous features across different libraries, and specific dependencies that are tailored for performance and security like `argon2-cffi` for password hashing.

In essence, requirements.txt is pivotal for maintaining a stable development, testing, and production environment, facilitating reproducibility and compatibility in the project's lifecycle
- This file is typically used in conjunction with tools like pip to install the specified versions of packages, thereby aligning the project's software stack across different machines and deployments.

python_direct
run.py - Establishes a streaming conversation environment using external APIs for transcription, dialogue management, and speech synthesis
- It configures and initiates a real-time interactive session that integrates microphone input and speaker output, handling user interruptions and maintaining the flow of conversation until manually terminated.
vocode-web
package-lock.json - The file vocode-web/package-lock.json is a crucial component within the vocode-web project, primarily serving as a manifest for managing exact versions of npm dependencies and ensuring consistent installations across different environments
- This file locks down the versions of all packages and their dependencies, which is essential for maintaining the reliability and stability of the application across multiple development and production setups
- It includes dependencies critical for the project's front-end development, such as React and various testing libraries, which are integral for building and testing the user interface
- This setup supports the project's overall architecture by ensuring that all team members and deployment processes utilize precisely the same versions of each package, thereby avoiding discrepancies that can arise from version mismatches in a dynamically evolving ecosystem like npm.
config-overrides.js - Config-overrides.js customizes the Webpack configuration for the vocode-web project, ensuring compatibility and functionality enhancements
- It modifies the resolution fallbacks and integrates new plugins to support Buffer globally within the application
- This adjustment facilitates smoother integration of Node.js modules that rely on Buffer, enhancing the project's overall build process and runtime execution.
package.json - Defines the configuration and dependencies for the vocode-web project, setting up the environment for a React application
- It includes libraries for testing, document object model rendering, and markdown processing, ensuring compatibility across major browsers
- The scripts section provides commands for starting, building, testing, and ejecting the application.
src
ConversationComponent.js - ConversationComponent serves as the interactive interface for capturing and processing user conversations in real-time
- It utilizes web sockets for live communication, converts speech to text, and sends this data for server-side processing to generate readable markdown content, which is then displayed to the user.
index.css - Establishes the foundational styling for the web interface of the vocode-web project, setting universal font properties and smoothing for text and code elements across the application
- This CSS file ensures a consistent visual experience by standardizing typography and eliminating default browser styling.
App.css - App.css establishes the visual styling for the main application interface in the vocode-web project
- It centers text alignment, animates the logo, and sets the app header's background color, size, and font
- Additionally, it ensures accessibility with motion preferences and styles links for better visibility and interaction.
App.test.js - App.test.js serves as a unit test for the main application component in the vocode-web project
- It verifies the presence of a specific link within the App component, ensuring that the application's user interface renders expected elements correctly, thereby supporting the reliability and stability of the user interface throughout the development lifecycle.
setupTests.js - Enhances testing capabilities within the vocode-web project by integrating custom Jest matchers from jest-dom
- This setup facilitates more expressive assertions on DOM nodes during testing, improving the clarity and efficiency of tests
- It supports developers in verifying UI components against expected text contents and other DOM properties, crucial for ensuring the reliability of the user interface.
App.js - App.js serves as the root component in the vocode-web project, orchestrating the user interface
- It integrates the ConversationComponent within the application's main layout, defined under a styled header, facilitating the primary interactive element of the web application
- This setup centralizes user interactions and visual structure in the project's architecture.
reportWebVitals.js - ReportWebVitals.js enhances the performance monitoring of the vocode-web application by dynamically importing and utilizing web vitals metrics such as CLS, FID, FCP, LCP, and TTFB
- These metrics help in assessing the user experience quality by measuring visible and interactive aspects of the application, contributing to its overall responsiveness and efficiency.
index.js - Serves as the entry point for the vocode-web application, initializing the React framework and rendering the main App component within a strict mode context
- It also includes styling through index.css and sets up performance monitoring with reportWebVitals, facilitating potential enhancements based on user interaction metrics.
public
index.html - Serves as the entry point for the web application, initializing the user interface by loading essential resources like icons, metadata, and the manifest for app installation
- It sets up the environment for the React application to mount, ensuring compatibility and responsiveness across various devices.
manifest.json - Manifest.json configures the web application's appearance on mobile devices, defining essential properties such as app names, icons, and theme colors
- It ensures the app can be installed on home screens with specific icons and colors, enhancing user experience and brand consistency across devices
- This setup supports the project's broader goal of providing a seamless, standalone user interface.
robots.txt - Manages web crawler access for the vocode-web project, specifically within the public directory
- By setting no restrictions in the robots.txt, it allows all search engines to index all content, enhancing the site's visibility and searchability on the internet
- This approach supports optimal SEO practices for the website.
client_backend
conversation_custom.py - Manages real-time conversations through a WebSocket interface, integrating transcription and synthesis services
- It initializes conversation components, handles incoming audio, and manages events related to transcripts, ensuring seamless interaction between the user and the system via audio streams.
test_ai.py - Test_summarize_conversation in client_backend/test_ai.py validates the functionality of summarizing conversations within the application
- It uses a series of mock messages between a user and an assistant to ensure the summarize_conversation function accurately condenses and reflects the interaction's essence, crucial for enhancing user experience and system reliability.
main.py - Centralizes backend functionalities for a FastAPI application, handling conversation processing and transcript summarization into blog posts
- It integrates with OpenAI's GPT models and Azure's text-to-speech services, facilitating dynamic conversation handling and audio synthesis, while ensuring cross-origin resource sharing via middleware configuration.
pyproject.toml - Defines the configuration for the client-backend component, specifying its dependencies and build system
- It sets the Python version, integrates external libraries like python-dotenv, vocode, and elevenlabs, and utilizes Poetry for package management
- This setup ensures the backend aligns with the required libraries and Python standards for seamless integration and deployment within the project's architecture.
prompts.py - Client_backend/prompts.py defines interactive templates for generating blog content and conducting interviews within the application
- It includes a blog generator for creating Markdown-formatted articles from transcripts and prompts for guiding deep, insightful interviews, enhancing the user engagement and content creation process in the project.
Dockerfile - Establishes the environment for the client backend by setting up a Docker container with Python and necessary audio libraries
- It installs dependencies, configures a non-virtual environment for Python packages, and prepares the main application to run on a server using Uvicorn, exposing it on port 3000.


πŸš€ Getting Started

β˜‘οΈ Prerequisites

Before getting started with knowledge-extract, ensure your runtime environment meets the following requirements:

  • Programming Language: JavaScript
  • Package Manager: Pip, Npm, Poetry
  • Container Runtime: Docker

βš™οΈ Installation

Install knowledge-extract using one of the following methods:

Build from source:

  1. Clone the knowledge-extract repository:
❯ git clone https://github.com/sandeepsalwan1/knowledge-extract
  1. Navigate to the project directory:
❯ cd knowledge-extract
  1. Install the project dependencies:

Using pip Β 

❯ echo 'INSERT-INSTALL-COMMAND-HERE'

Using npm Β 

❯ npm install

Using poetry Β 

❯ echo 'INSERT-INSTALL-COMMAND-HERE'

Using docker Β 

❯ docker build -t sandeepsalwan1/knowledge-extract .

πŸ€– Usage

Run knowledge-extract using the following command: Using pip Β 

❯ echo 'INSERT-RUN-COMMAND-HERE'

Using npm Β 

❯ npm start

Using poetry Β 

❯ echo 'INSERT-RUN-COMMAND-HERE'

Using docker Β 

❯ docker run -it {image_name}

πŸ§ͺ Testing

Run the test suite using the following command: Using pip Β 

❯ echo 'INSERT-TEST-COMMAND-HERE'

Using npm Β 

❯ npm test

Using poetry Β 

❯ echo 'INSERT-TEST-COMMAND-HERE'

πŸ“Œ Project Roadmap

  • Task 1: Implement login.
  • Task 2: Implement More agents.
  • Task 3: Make more userfriendly.

πŸ”° Contributing

Contributing Guidelines
  1. Fork the Repository: Start by forking the project repository to your github account.
  2. Clone Locally: Clone the forked repository to your local machine using a git client.
    git clone https://github.com/sandeepsalwan1/knowledge-extract
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to github: Push the changes to your forked repository.
    git push origin new-feature-x
  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
  8. Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
Contributor Graph


πŸŽ— License

This project is released under the MIT License. For more details, please refer to the LICENSE file.