A modular Python project that demonstrates an end-to-end machine learning pipeline for detecting phishing/malicious network entries and exposes a minimal HTTP API for running training and making batch predictions.
This README gives safe, high-level instructions for setup, running, and contribution without including any secrets or sensitive information.
- Modular training pipeline: data ingestion → validation → transformation → model training → evaluation (illustrated in the sketch after this list)
- Training uses scikit-learn models and utilities
- Lightweight HTTP API (FastAPI) with endpoints for:
  - triggering training
  - uploading a CSV and receiving predictions
- Artifacts (models, preprocessors, reports) are saved locally to configurable artifact directories
- Dockerfile included for containerized execution (development-friendly)
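To make the flow concrete, here is a toy end-to-end sketch on synthetic data; the project's real pipeline is modular, but the stages mirror this sequence (none of the names below are the project's actual classes):

```python
# Toy end-to-end sketch on synthetic data; the numbered comments map to
# the pipeline stages above. Not the project's actual code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data ingestion (synthetic stand-in for the real data source)
rng = np.random.default_rng(42)
X = rng.random((200, 5))
y = (X[:, 0] > 0.5).astype(int)

# 2. Data validation (toy check: no missing values)
assert not np.isnan(X).any()

# 3-4. Transformation + model training as a single sklearn pipeline
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = Pipeline([("scale", StandardScaler()), ("model", RandomForestClassifier())])
clf.fit(X_tr, y_tr)

# 5. Evaluation
print(classification_report(y_te, clf.predict(X_te)))
```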
This repository must never contain plaintext credentials, connection strings, API keys, or other secrets. Always keep secrets out of source control:
- Use a `.env` file (listed in `.gitignore`) or a secrets manager to store credentials.
- Add a `.env.example` with placeholder variables (no real secrets).
- If a secret was accidentally committed, rotate/revoke it immediately and remove it from the repository history (see "Removing secrets from history" below).
Do NOT commit any files containing real credentials. Audit the repository and working tree before pushing.
- Python 3.9 or newer (project tested with 3.10)
- pip
- Optional: Docker (for containerized runs)
- Optional: MongoDB or other data source if ingestion reads from a database (set connection string via environment variables)
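If ingestion reads from MongoDB, here is a minimal connection sketch, assuming `pymongo` is installed and the `MONGODB_URL` variable from the `.env` example further below is set:

```python
# Sketch: connect to MongoDB with a connection string from the environment.
# Never hard-code the URL; MONGODB_URL matches the .env example below.
import os

from pymongo import MongoClient

client = MongoClient(os.environ["MONGODB_URL"])
client.admin.command("ping")  # fail fast if the connection is misconfigured
print("MongoDB connection OK")
```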
- Clone the project and change into the directory:

  ```bash
  git clone <your-repo-url>
  cd <project-root>
  ```
- Create and activate a virtual environment:

  macOS / Linux:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```

  Windows (PowerShell):

  ```powershell
  python -m venv .venv
  .\.venv\Scripts\Activate.ps1
  ```

- Install requirements:

  ```bash
  pip install -r requirements.txt
  ```
- Create a `.env` file in the project root with required variables (an example `.env.example` is provided with placeholders). Example variables (placeholders only):

  ```
  # .env (example)
  MONGODB_URL="mongodb+srv://<username>:<password>@<host>/"
  OTHER_SECRET=""
  ```

  - Never paste real credentials into a file you will commit.
  - Add `.env` to `.gitignore`.
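  For reference, a minimal sketch of reading these variables at application startup, assuming the `python-dotenv` package:

  ```python
  # Sketch: load variables from .env into the process environment
  # (assumes python-dotenv is installed: pip install python-dotenv).
  import os

  from dotenv import load_dotenv

  load_dotenv()  # reads .env from the current working directory

  MONGODB_URL = os.getenv("MONGODB_URL")
  if not MONGODB_URL:
      raise RuntimeError("MONGODB_URL is not set; check your .env file")
  ```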
- Run the API (development):

  ```bash
  python app.py
  ```

  or (recommended) run with uvicorn:

  ```bash
  uvicorn app:app --host 0.0.0.0 --port 8001 --reload
  ```
- Visit the API docs (Swagger UI) at:

  http://localhost:8001/docs
- `GET /train`
  - Triggers the training pipeline end-to-end.
  - Useful for ad-hoc retraining in development.
- `POST /predict`
  - Accepts a CSV file upload and returns predictions (or a viewable table).
  - The CSV must use the same columns/features as the training data. See "Input data format" below.
The exact endpoint behavior and accepted file schema can be inspected in the API docs at /docs when the server is running.
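As an illustration, here is a minimal client sketch using `requests`; the upload field name `file` and the local URL are assumptions, so verify both against /docs:

```python
# Client sketch (assumes the server is on localhost:8001 and that the
# upload field is named "file"; confirm both in the Swagger UI at /docs).
import requests

BASE_URL = "http://localhost:8001"

# Trigger end-to-end training.
resp = requests.get(f"{BASE_URL}/train", timeout=600)
print(resp.status_code, resp.text[:200])

# Upload a CSV for batch prediction.
with open("sample.csv", "rb") as f:
    resp = requests.post(f"{BASE_URL}/predict", files={"file": f}, timeout=120)
print(resp.status_code, resp.text[:500])
```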
Provide a CSV with the same feature columns used at training time. If you are not sure which features are required:
- Inspect the training data schema in `Artifacts/` or the data ingestion code to learn expected column names.
- Alternatively, run a small pipeline locally on a known sample dataset to check expected columns.
Example CSV structure (placeholder names; replace with the actual feature names used in your project):

```csv
feature_1,feature_2,feature_3,...
value,value,value,...
```
Do not include the target/label column when sending data for prediction unless the endpoint explicitly expects it.
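A small pre-flight sketch for checking a CSV against the expected columns before uploading (the column names here are placeholders, as above):

```python
# Sketch: verify a CSV matches the expected feature columns before
# sending it to /predict (column names here are placeholders).
import pandas as pd

EXPECTED_COLUMNS = ["feature_1", "feature_2", "feature_3"]  # replace with real schema

df = pd.read_csv("sample.csv")
missing = set(EXPECTED_COLUMNS) - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {sorted(missing)}")
extra = set(df.columns) - set(EXPECTED_COLUMNS)
if extra:
    print(f"Warning: unexpected columns: {sorted(extra)}")
```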
- Trained models and preprocessors are saved to configured artifact directories (e.g., `Artifacts/`, `final_model/`, or `saved_models/`, depending on config).
- Ensure these directories are writable by the process running training or prediction.
Consider adopting a model registry or experiment tracking (e.g., MLflow) for production-grade workflows and model versioning.
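For local workflows, here is a minimal save/load sketch using joblib; the filenames and directory are illustrative, not the project's actual utilities:

```python
# Sketch: persist and restore a model + preprocessor with joblib.
# Paths are illustrative; use the directories from your config.
from pathlib import Path

import joblib

ARTIFACT_DIR = Path("Artifacts")
ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)

def save_artifacts(model, preprocessor):
    joblib.dump(model, ARTIFACT_DIR / "model.pkl")
    joblib.dump(preprocessor, ARTIFACT_DIR / "preprocessor.pkl")

def load_artifacts():
    return (
        joblib.load(ARTIFACT_DIR / "model.pkl"),
        joblib.load(ARTIFACT_DIR / "preprocessor.pkl"),
    )
```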
- Build the image:

  ```bash
  docker build -t network_security:dev .
  ```

- Run the container (pass environment variables at runtime; do not bake secrets into the image):

  ```bash
  docker run -e MONGODB_URL="your_connection_string_here" -p 8001:8001 network_security:dev
  ```
Notes:
- For production, run uvicorn with multiple workers behind a process manager/reverse proxy (see the example command after these notes).
- Prefer multi-stage builds and a non-root user for production images.
- Do not store secrets in images.
- Add structured logging and rotate logs for long-running services.
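For example, a typical production-style invocation (the worker count is illustrative; tune it to your CPUs):

```bash
# Illustrative only: multiple workers, and no --reload in production
uvicorn app:app --host 0.0.0.0 --port 8001 --workers 4
```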
- Add automated tests (pytest) for key components (a starter sketch follows this list):
  - unit tests for utilities (save/load, evaluation)
  - integration tests for data ingestion and the model trainer
- Add CI (e.g., GitHub Actions) to run linters, tests, and build images on push.
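A starter pytest sketch for a save/load round-trip; the test name and persistence approach are hypothetical, so adapt them to the project's actual helpers:

```python
# test_utils.py -- run with `pytest`. A hypothetical round-trip test for
# model persistence; adapt to the project's real save/load utilities.
import joblib
from sklearn.linear_model import LogisticRegression


def test_model_save_load_roundtrip(tmp_path):
    model = LogisticRegression()
    path = tmp_path / "model.pkl"
    joblib.dump(model, path)
    loaded = joblib.load(path)
    assert isinstance(loaded, LogisticRegression)
```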
If a credential is accidentally committed, take these steps:
- Revoke/rotate the leaked credential immediately.
- Remove the secret from all commits using a history-rewriting tool such as:
  - BFG Repo-Cleaner
  - git-filter-repo
Example (BFG, local use — do not publish credentials anywhere):
```bash
# Example only — DO NOT run with real credentials printed here
bfg --delete-files YOUR_FILE_CONTAINING_SECRET
git reflog expire --expire=now --all && git gc --prune=now --aggressive
```

Contributions are welcome. Good first steps:
- Open issues for bugs and enhancements.
- Add tests for any new feature or bugfix.
- Follow the repository coding style and add relevant documentation for any change.