FinLove — Intelligent Portfolio Construction Platform

Authors: Nguyen Van Duy Anh · Pham Dinh Hieu · Cao Pham Minh Dang · Tran Anh Chuong · Ngo Dinh Khanh
GitHub: https://github.com/dinhieufam/FinLove


Overview

FinLove is a comprehensive portfolio construction and analysis platform that integrates advanced quantitative finance methodologies with modern machine learning techniques. The system provides intelligent investment planning, in-depth risk analysis, forward-looking predictions, and AI-powered explanations to support informed investment decisions.

The platform is built as a full-stack web application with a Next.js frontend and FastAPI backend, providing a modern, responsive user interface. It combines sophisticated risk models, multiple optimization strategies, realistic backtesting capabilities, and state-of-the-art forecasting methods to deliver actionable portfolio insights. FinLove is designed for both individual investors seeking professional-grade portfolio management tools and financial professionals requiring robust analytical capabilities.

Additionally, a Streamlit dashboard is available as an alternative interface for quick prototyping and analysis.


Key Features

💰 Investment Plan

The Investment Plan feature transforms optimized portfolio weights into actionable dollar-based allocation strategies. Users can specify their total investment capital, and the system automatically calculates precise dollar allocations for each asset based on the optimized weights generated by the portfolio construction engine.

Capabilities:

  • Convert optimized portfolio weights to dollar allocations
  • Support for customizable investment amounts
  • Real-time allocation updates based on portfolio optimization results
  • Visual representation of capital deployment across assets
  • Detailed allocation tables with percentage and dollar breakdowns

🔍 Analyze

The Analyze module provides comprehensive portfolio performance and risk assessment through advanced statistical analysis and visualization. It leverages multiple risk models and performance metrics to deliver deep insights into portfolio behavior.

Capabilities:

  • Performance Analysis: Cumulative returns, rolling Sharpe ratios, drawdown analysis, and benchmark comparisons
  • Risk Assessment: Value at Risk (VaR), Conditional VaR (CVaR), volatility analysis, and correlation matrices
  • Portfolio Composition: Current allocation visualization, weight evolution over time, and concentration analysis
  • Statistical Metrics: Annualized returns, volatility, Sharpe ratio, maximum drawdown, turnover, and weight stability
  • Risk Model Diagnostics: Comprehensive analysis using multiple risk estimation methods (Ledoit-Wolf, GLASSO, GARCH, DCC)
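
Most of these quantities can be computed directly from a daily returns series. A minimal illustrative sketch (the helper name and details here are ours, not the actual API of the platform's metrics.py):

import numpy as np
import pandas as pd

def basic_risk_metrics(returns: pd.Series, alpha: float = 0.05) -> dict:
    """Historical VaR/CVaR, annualized Sharpe ratio, and max drawdown."""
    var = returns.quantile(alpha)                  # historical Value at Risk
    cvar = returns[returns <= var].mean()          # average loss beyond VaR
    sharpe = np.sqrt(252) * returns.mean() / returns.std()
    wealth = (1 + returns).cumprod()
    max_dd = (wealth / wealth.cummax() - 1).min()
    return {"VaR": var, "CVaR": cvar, "sharpe": sharpe, "max_drawdown": max_dd}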

🔮 Prediction

The Prediction system employs ensemble forecasting methodologies to project future portfolio performance. The system evaluates multiple model combinations, selects top-performing strategies, and aggregates predictions to generate robust forward-looking estimates.

Capabilities:

  • Model Selection: Automatic evaluation of 20 model combinations (5 optimization methods × 4 risk models)
  • Ensemble Forecasting: Combines predictions from top-performing models using multiple time series methods
  • Forecasting Methods: Supports ARIMA, Prophet, LSTM, Exponential Smoothing, and Moving Average approaches
  • Future Returns Projection: Forecasts portfolio returns over user-specified horizons (e.g., 30 days, 90 days)
  • Confidence Intervals: Provides uncertainty estimates for predictions
  • Performance-Based Selection: Ranks and selects models based on historical performance metrics
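
Conceptually, the aggregation step averages the forecasts of the highest-ranked model combinations. A hedged sketch (ranking by a generic score is an assumption; the actual selection logic lives in src/model_collector.py and may differ):

import pandas as pd

def aggregate_top_forecasts(forecasts, scores, top_k=5):
    """Average forecast series (dict of pd.Series) from the top_k models by score."""
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return pd.concat([forecasts[m] for m in top], axis=1).mean(axis=1)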

🤖 LLM to Explain

The LLM Explanation feature leverages large language models to provide natural language interpretations of portfolio analysis results. The system uses Retrieval-Augmented Generation (RAG) to ground explanations in actual portfolio data, ensuring accurate and contextually relevant insights.

Capabilities:

  • AI-Powered Insights: Natural language explanations of portfolio performance, risk metrics, and allocation decisions
  • Context-Aware Responses: RAG system retrieves relevant portfolio data to provide accurate, data-grounded explanations
  • Interactive Q&A: Users can ask questions about their portfolio and receive detailed, contextual answers
  • Chart Explanations: Automatic generation of explanations for visualizations and performance charts
  • Educational Guidance: Provides investment education and interpretation guidance without making specific investment recommendations
  • Multi-Model Support: Supports multiple LLM providers (OpenAI GPT models, Google Gemini) with automatic fallback mechanisms
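
At its core, the retrieval step embeds portfolio facts and ranks them against the user's question. A minimal sketch using sentence-transformers with the all-MiniLM-L6-v2 model named in the environment configuration (the facts below are invented placeholders, not real output):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

facts = [  # placeholder portfolio facts; the real system embeds actual results
    "Annualized Sharpe ratio over the backtest period: 1.2",
    "Maximum drawdown: -18%",
    "Largest allocation: XLK at 27% of capital",
]
fact_embeddings = model.encode(facts, convert_to_tensor=True)

def retrieve_context(question, top_k=2):
    """Return the top_k facts most similar to the question by cosine similarity."""
    query = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query, fact_embeddings, top_k=top_k)[0]
    return [facts[hit["corpus_id"]] for hit in hits]

The retrieved facts are then prepended to the LLM prompt so answers stay grounded in the user's actual data.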

Installation

Prerequisites

  • Python 3.8 or higher
  • Node.js 18+ and npm
  • pip package manager

Step 1: Install Backend Dependencies

# Install Python dependencies
pip install -r requirements.txt

# Install backend-specific dependencies
cd web/backend
pip install -r requirements.txt
cd ../..

Step 2: Set Up API Keys and Environment Variables

FinLove requires API keys for LLM-powered features (such as Gemini) and uses environment variables for configuration.

  1. Set up environment variables for the backend API:

    • Copy the example environment file:

      cp web/backend/env.example.txt web/backend/.env
    • Edit web/backend/.env to add your own credentials and API keys.
      Example contents (see web/backend/env.example.txt for full list):

      # Multiple API keys (comma-separated, for rotation)
      GEMINI_API_KEYS=your-gemini-api-key-here
      
      # Multiple models (comma-separated)
      GEMINI_MODELS=gemini-2.5-pro
      
      # Embedding model for vector search
      HUGGINGFACE_EMBEDDING_MODEL=all-MiniLM-L6-v2
      
    • Note: You must provide at least one valid GEMINI API key for LLM explanations and chatbot features.
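
The comma-separated convention implies the backend splits these variables into lists for rotation. A minimal sketch of reading them with python-dotenv (illustrative only, not the backend's actual code):

import os
from dotenv import load_dotenv

load_dotenv("web/backend/.env")

# Comma-separated values become lists, enabling key/model rotation on quota errors.
api_keys = [k.strip() for k in os.getenv("GEMINI_API_KEYS", "").split(",") if k.strip()]
models = [m.strip() for m in os.getenv("GEMINI_MODELS", "").split(",") if m.strip()]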

Step 3: Install Frontend Dependencies

cd web/frontend
npm install
cd ../..

Step 4: (Optional) Pre-download Data

For improved performance, pre-download financial datasets:

python scripts/download_data.py

See DATA.md for detailed information about data download and caching mechanisms.

Step 5: Run the Web Application

The main FinLove platform is a full-stack web application with Next.js frontend and FastAPI backend:

Terminal 1 - Start Backend:

cd web/backend
python app.py

The backend API will run on http://localhost:8000

Terminal 2 - Start Frontend:

cd web/frontend
npm run dev

The web application will be available at http://localhost:3000

The frontend automatically proxies API requests to the backend, so you only need to access http://localhost:3000 in your browser.


Quick Start Guide

Using the Web Application

  1. Start the Application

    # Terminal 1: Start backend
    cd web/backend
    python app.py
    
    # Terminal 2: Start frontend
    cd web/frontend
    npm run dev
  2. Access the Application

    • Open http://localhost:3000 in your web browser
    • The frontend will automatically connect to the backend API
  3. Select Assets

    • Enter company tickers separated by commas (e.g., AAPL,MSFT,GOOGL)
    • Or use the default sector ETFs option
  4. Configure Analysis Parameters

    • Date Range: Select start and end dates for historical analysis
    • Portfolio Objective: Choose optimization method (Markowitz, Sharpe, CVaR, etc.)
    • Risk Engine: Select risk model (Ledoit-Wolf recommended for stability)
    • Investment Capital: Specify total capital amount for investment plan
    • Risk Appetite: Adjust risk aversion parameter
    • Testing Style: Choose between simple or walk-forward backtesting
  5. Run Analysis

    • Click "Run Analysis" to execute portfolio optimization
    • Explore results across the main features:
      • 💰 Investment Plan: Dollar-based allocation strategy
      • 🔍 Analyze: Comprehensive performance and risk analysis
      • 🔮 Prediction: Future portfolio performance forecasts
      • 🤖 LLM Explanations: AI-powered Q&A about your portfolio
  6. Use LLM Explanations

    • The web application includes an integrated chatbot for asking questions about your portfolio
    • The RAG system provides context-aware answers based on your portfolio data

Alternative: Using the Streamlit Dashboard

For a simpler interface, you can use the Streamlit dashboard:

  1. Launch the Dashboard

    streamlit run dashboard/app.py
  2. Follow similar steps as above, but using the Streamlit interface at http://localhost:8501

Using the Prediction System Programmatically

from src.predict import predict_future_performance

# Run complete prediction pipeline
results = predict_future_performance(
    tickers=['AAPL', 'MSFT', 'GOOGL'],
    start_date="2015-01-01",
    end_date="2024-01-01",
    forecast_horizon=30,  # Forecast next 30 days
    forecast_method='ensemble',  # Use ensemble forecasting
    use_top_models=5  # Use top 5 performing models
)

# Access aggregated prediction
future_returns = results['aggregated_prediction']
print(f"Expected daily return: {future_returns.mean()*100:.4f}%")

Project Structure

FinLove/
├── src/                    # Core business logic and shared modules
│   ├── data.py            # Data acquisition and preprocessing
│   ├── risk.py            # Risk models (Ledoit-Wolf, GLASSO, GARCH, DCC)
│   ├── optimize.py        # Optimization methods (Markowitz, BL, CVaR, etc.)
│   ├── backtest.py        # Backtesting engine
│   ├── metrics.py         # Performance metrics calculation
│   ├── forecast.py        # Time series forecasting methods
│   ├── predict.py         # Prediction pipeline orchestration
│   └── model_collector.py # Model evaluation and selection
├── web/                   # Full-stack web application
│   ├── backend/          # FastAPI backend
│   │   ├── app.py        # API server entry point
│   │   ├── routers/      # API route definitions
│   │   └── src/          # Backend-specific modules (RAG system)
│   └── frontend/         # Next.js frontend
│       ├── app/          # Application pages and routing
│       └── components/   # Reusable UI components
├── scripts/               # Utility scripts
│   ├── download_data.py  # Data pre-download script
│   ├── example_prediction.py  # Example prediction usage
│   └── train_all_models.py  # Model training script
├── report_generation/     # Report generation utilities
│   ├── convert.py        # Markdown to PDF converter
│   └── report.md         # Project report (markdown)
├── models/                # Saved machine learning models
├── data/                  # Raw and processed data storage
├── data_cache/            # Cached financial data
├── evaluation/            # Model evaluation notebooks
├── requirements.txt       # Python dependencies
└── README.md             # This file

Technical Architecture

Risk Models

  • Ledoit-Wolf Shrinkage: Reduces estimation error by shrinking sample covariance matrix toward a structured target, improving stability in high-dimensional settings
  • Graphical LASSO (GLASSO): Estimates sparse precision matrix using L1 regularization, useful for identifying conditional independence relationships
  • GARCH(1,1): Models time-varying volatility per asset, capturing volatility clustering and heteroskedasticity
  • DCC (Dynamic Conditional Correlation): Estimates time-varying correlation structure, allowing for dynamic relationships between assets
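
As an illustration, Ledoit-Wolf shrinkage is available off the shelf in scikit-learn (a sketch with synthetic data; src/risk.py may implement these estimators differently):

import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(500, 11))  # 500 days x 11 assets, synthetic

lw = LedoitWolf().fit(returns)
sigma = lw.covariance_                           # shrunk covariance estimate
print(f"shrinkage intensity: {lw.shrinkage_:.3f}")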

Optimization Methods

  • Markowitz Mean-Variance: Maximizes expected return minus risk penalty: μ'w - (λ/2) * w'Σw
  • Minimum Variance: Minimizes portfolio variance subject to constraints
  • Sharpe Maximization: Maximizes risk-adjusted returns: (μ'w - rf) / sqrt(w'Σw)
  • Black-Litterman: Combines market equilibrium returns with investor views for more stable optimization
  • CVaR Optimization: Minimizes Conditional Value at Risk, focusing on tail risk management
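
The Markowitz objective above translates directly into a convex program. A minimal long-only sketch with cvxpy, which is already in the dependency list (inputs are illustrative; the platform's optimize_portfolio wraps these methods with richer constraints):

import cvxpy as cp
import numpy as np

mu = np.array([0.08, 0.06, 0.07])          # expected returns (illustrative)
Sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.08, 0.02],
                  [0.01, 0.02, 0.09]])     # covariance matrix (illustrative)
lam = 1.0                                  # risk aversion

w = cp.Variable(3)
objective = cp.Maximize(mu @ w - (lam / 2) * cp.quad_form(w, Sigma))
constraints = [cp.sum(w) == 1, w >= 0]     # fully invested, long-only
cp.Problem(objective, constraints).solve()
print(np.round(w.value, 3))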

Backtesting Framework

  • Simple Backtest: One-time optimization using all available historical data
  • Walk-Forward Backtest: Rolling window approach with training and testing periods for realistic performance evaluation
  • Transaction Costs: Incorporates proportional transaction costs per rebalancing event
  • Rebalance Bands: Implements drift-based rebalancing to reduce unnecessary turnover
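
The drift-band idea can be sketched in a few lines: trade back to target only when some weight has drifted beyond the band, charging proportional costs on the traded amount (illustrative logic, not src/backtest.py itself):

import numpy as np

def apply_rebalance_band(current_w, target_w, band=0.05, cost_rate=0.001):
    """Rebalance only if any weight drifted beyond `band`; return (weights, cost)."""
    if np.abs(current_w - target_w).max() <= band:
        return current_w, 0.0                       # within band: no trade
    turnover = np.abs(target_w - current_w).sum()   # total traded fraction
    return target_w, cost_rate * turnover           # proportional transaction cost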

Forecasting System

  • ARIMA/SARIMA: Statistical time series models for trend and seasonality
  • Prophet: Additive trend/seasonality model with changepoint handling
  • LSTM: Recurrent neural network for non-linear temporal patterns
  • Exponential Smoothing: Forecasts from exponentially decaying weighted averages
  • Moving Average: Simple baseline forecast from recent observations
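
For instance, a 30-day ARIMA forecast on a returns series takes only a few lines with statsmodels (synthetic data shown; the platform's src/forecast.py orchestrates these models):

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0004, 0.01, 500))  # synthetic daily returns

fit = ARIMA(returns, order=(1, 0, 1)).fit()
forecast = fit.forecast(steps=30)                   # 30-day point forecast
print(forecast.head())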

LLM Integration

  • RAG Architecture: Retrieval-Augmented Generation system for context-aware responses
  • Multi-Provider Support: Compatible with OpenAI GPT models and Google Gemini
  • Automatic Fallback: Handles API quota limits and errors with graceful degradation
  • Portfolio Context: Embeds portfolio-specific data for accurate, grounded explanations
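
The fallback behavior can be pictured as trying each configured provider in turn until one succeeds (a generic sketch; the backend's actual error handling is more involved):

def ask_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; `call` wraps a provider SDK,
    takes a prompt string, and returns generated text."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:   # quota, auth, or network failure
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All LLM providers failed: " + "; ".join(errors))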

Usage Examples

Example 1: Investment Plan Generation

from src.data import prepare_portfolio_data
from src.risk import get_covariance_matrix
from src.optimize import optimize_portfolio

# Prepare data
tickers = ['AAPL', 'MSFT', 'GOOGL']
returns, prices = prepare_portfolio_data(tickers, start_date="2020-01-01")

# Estimate covariance
covariance = get_covariance_matrix(returns, method='ledoit_wolf')

# Optimize portfolio
weights = optimize_portfolio(
    returns,
    covariance,
    method='markowitz',
    constraints={'long_only': True},
    risk_aversion=1.0
)

# Convert to dollar allocation
investment_amount = 100000  # $100,000
dollar_allocation = weights * investment_amount
print(dollar_allocation)

Example 2: Comprehensive Analysis

from src.backtest import walk_forward_backtest
from src.metrics import calculate_all_metrics

# Run walk-forward backtest
portfolio_returns, weights_history, metrics = walk_forward_backtest(
    returns,
    train_window=36,  # 36 months training
    test_window=1,    # 1 month testing
    optimization_method='markowitz',
    risk_model='ledoit_wolf',
    transaction_cost=0.001,  # 0.1% transaction cost
    rebalance_band=0.05  # 5% rebalance band
)

# Calculate comprehensive metrics
all_metrics = calculate_all_metrics(portfolio_returns, weights_history)
print(f"Sharpe Ratio: {all_metrics['sharpe_ratio']:.3f}")
print(f"Annualized Return: {all_metrics['annualized_return']*100:.2f}%")
print(f"Max Drawdown: {all_metrics['max_drawdown']*100:.2f}%")

Example 3: Future Performance Prediction

from src.predict import predict_future_performance

# Generate predictions
results = predict_future_performance(
    tickers=['AAPL', 'MSFT', 'GOOGL'],
    start_date="2015-01-01",
    end_date="2024-01-01",
    forecast_horizon=30,
    forecast_method='ensemble',
    use_top_models=5
)

# Access results
future_returns = results['aggregated_prediction']
top_models = results['top_models']

print(f"Expected 30-day return: {(1 + future_returns).prod() - 1:.2%}")
print(f"Top models used: {list(top_models['model_id'])}")

Dependencies

Core Dependencies

  • Data Processing: numpy>=1.24.0, pandas>=2.0.0, scipy>=1.10.0
  • Financial Data: yfinance>=0.2.28
  • Optimization: cvxpy>=1.3.0, scikit-learn>=1.3.0
  • Risk Models: arch>=6.2.0 (for GARCH/DCC)

Visualization

  • matplotlib>=3.7.0
  • seaborn>=0.12.0
  • plotly>=5.14.0

Web Application

  • Backend: FastAPI (included in requirements)
  • Frontend: Next.js, React, TypeScript (see web/frontend/package.json)

Alternative Dashboard

  • streamlit>=1.28.0 (for Streamlit dashboard alternative)

AI/ML (Optional)

  • openai>=1.12.0 (for LLM explanations)
  • statsmodels>=0.14.0 (for ARIMA forecasting)
  • prophet>=1.1.4 (for Prophet forecasting)
  • tensorflow>=2.13.0 (for LSTM models)
  • xgboost>=2.0.0 (for XGBoost forecasting)

Report Generation (Optional)

  • markdown>=3.4.0 (for markdown to HTML conversion)
  • weasyprint>=60.0 (for HTML to PDF conversion)
  • pypandoc>=1.12 (optional, for better LaTeX support in PDFs - requires pandoc binary)

See requirements.txt for the complete list of dependencies.


Data Sources

  • Primary Source: Yahoo Finance via yfinance library
  • Default Universe: 11 liquid Sector ETFs (XLK, XLF, XLV, XLY, XLP, XLE, XLI, XLB, XLU, XLRE, XLC)
  • Data Types: Historical prices (OHLCV), company fundamentals, market data
  • Frequency: Daily data
  • Caching: Automatic 24-hour cache for improved performance

For detailed information about data management, see DATA.md.
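
A 24-hour file cache around yfinance downloads can be sketched as follows (illustrative only; the project's own caching mechanism is described in DATA.md and may differ):

import time
from pathlib import Path

import pandas as pd
import yfinance as yf

CACHE_DIR = Path("data_cache")
MAX_AGE = 24 * 3600  # seconds; matches the 24-hour policy above

def cached_prices(tickers, start, end):
    """Download daily close prices, reusing an on-disk copy younger than 24 hours."""
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / ("_".join(sorted(tickers)) + f"_{start}_{end}.parquet")
    if path.exists() and time.time() - path.stat().st_mtime < MAX_AGE:
        return pd.read_parquet(path)
    prices = yf.download(tickers, start=start, end=end)["Close"]
    prices.to_parquet(path)
    return prices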


Contributors

This project is the result of collaborative effort by the following team members:

  • Cao Pham Minh Dang: Prediction models and forecasting system
  • Tran Anh Chuong: Data cleaning, exploratory data analysis (EDA), and dashboard development
  • Pham Dinh Hieu: LLM integration, RAG system, and dashboard features
  • Nguyen Van Duy Anh: Risk models and optimization algorithms
  • Ngo Dinh Khanh: Dashboard development and landing page design

Project Advisors

  • Nguyen Huy Hung: Project Advisor
  • Dr. Mo El-Haj: Project Advisor

Best Practices

  1. Data Quality: Ensure at least 2-3 years of historical data for reliable analysis
  2. Risk Models: Ledoit-Wolf is recommended for most use cases due to its stability
  3. Transaction Costs: Include realistic transaction costs (0.1-0.5% for stocks) in backtesting
  4. Walk-Forward Testing: Use walk-forward backtesting for more realistic performance estimates
  5. Model Selection: The prediction system automatically selects top-performing models, but review the selection criteria
  6. LLM API Keys: Store API keys securely and be aware of usage quotas when using LLM explanations

Troubleshooting

Common Issues

"Insufficient data" Error

  • Verify that tickers are valid and have sufficient historical data for the selected date range
  • Try adjusting the date range or selecting different tickers

"No valid data after cleaning" Error

  • Some tickers may have excessive missing values
  • Remove problematic tickers or use a shorter date range

Slow Performance

  • Reduce the number of tickers in the portfolio
  • Use shorter date ranges for analysis
  • Prefer simpler risk models (sample or ledoit_wolf) for faster computation
  • Pre-download data using scripts/download_data.py

Web Application Setup Issues

  • Ensure both backend and frontend are running in separate terminals
  • Verify backend is accessible at http://localhost:8000 (check /health endpoint)
  • Check that frontend can connect to backend (check browser console for API errors)
  • Ensure Node.js 18+ is installed for the frontend
  • Run npm install in web/frontend if dependencies are missing

LLM API Errors

  • Verify API key is valid and has sufficient quota
  • Check network connectivity
  • The system includes automatic fallback mechanisms for quota errors

Project Status

Production Ready - All core features implemented and tested

Completed Features:

  • ✅ Multiple risk models (Ledoit-Wolf, GLASSO, GARCH, DCC)
  • ✅ Various optimization methods (Markowitz, Black-Litterman, CVaR, Minimum Variance, Sharpe)
  • ✅ Realistic backtesting with transaction costs and rebalance bands
  • ✅ Investment plan generation with dollar allocations
  • ✅ Comprehensive analysis and visualization
  • ✅ Prediction system with ensemble forecasting
  • ✅ LLM-powered explanations with RAG architecture
  • ✅ Full-stack web application (Next.js frontend + FastAPI backend)
  • ✅ Alternative Streamlit dashboard for quick prototyping
  • ✅ Data caching system for performance optimization
  • ✅ Comprehensive documentation

License

See LICENSE file for details.


Support and Contributions

For issues, questions, or contributions, please open an issue or submit a pull request at https://github.com/dinhieufam/FinLove.


Disclaimer

This software is provided for educational and research purposes. The predictions, analyses, and recommendations generated by this system should not be considered as financial advice. Users should conduct their own due diligence and consult with qualified financial professionals before making investment decisions. Past performance does not guarantee future results.
