Bilingual Abusive Text Detection Engine

A content moderation engine that helps maintain professional standards on our tutor-finding platform. It supports text in both English and Indonesian.

Overview

This engine provides enterprise-grade content moderation capabilities:

Real-time detection of inappropriate content
Multilingual support (English and Indonesian)
High-precision text classification
Scalable architecture for production deployment
Production-ready API endpoints
Comprehensive batch processing capabilities

Key Features

Content Analysis

Multi-language support (EN/ID)
Character substitution detection
Context-aware classification
Pattern recognition for evasion attempts
Real-time content validation
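
To illustrate what character substitution detection can look like, here is a minimal sketch. It is not the engine's actual implementation: the substitution table, the `normalize` and `find_matches` helpers, and the sample word list are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical look-alike table (an assumption, not the engine's real mapping).
SUBSTITUTIONS = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize(text: str) -> str:
    """Lowercase, undo look-alike characters, and collapse stretched letters."""
    lowered = text.lower().translate(SUBSTITUTIONS)
    # Runs of three or more identical characters become one: "stuuuupid" -> "stupid".
    return re.sub(r"(.)\1{2,}", r"\1", lowered)


def find_matches(text: str, blocklist: set[str]) -> list[str]:
    """Return the blocklisted words found among the normalized tokens."""
    tokens = set(re.findall(r"[a-z]+", normalize(text)))
    return sorted(tokens & blocklist)


if __name__ == "__main__":
    sample_blocklist = {"idiot", "bodoh"}  # placeholder EN/ID entries
    print(find_matches("You are an 1d10t", sample_blocklist))
```

A table like this also rewrites legitimate digits (a price or a phone number, for instance), so a real system would likely pair it with context-aware classification rather than rely on it alone.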

Technical Capabilities

Low-latency response times (<100ms)
High-throughput batch processing
Scalable worker configuration
Configurable confidence thresholds
Comprehensive logging and monitoring
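
As a rough sketch of how a configurable confidence threshold might turn a model probability into a decision: the `decide` function, the default of 0.5, and the confidence formula are assumptions. The README does not document how the engine derives these values, although the formula is consistent with the example response (probability 0.12, confidence 0.88).

```python
def decide(probability: float, threshold: float = 0.5) -> dict:
    """Map an abuse probability to a decision (hypothetical helper).

    Assumption: text is flagged when probability >= threshold, and
    confidence is the probability assigned to the chosen label.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0, 1]")

    is_abusive = probability >= threshold
    confidence = probability if is_abusive else 1.0 - probability
    return {
        "probability": probability,
        "is_abusive": is_abusive,
        "confidence": confidence,
    }


if __name__ == "__main__":
    print(decide(0.12))                  # default threshold
    print(decide(0.12, threshold=0.1))   # stricter threshold flags the same text
```

Lowering the threshold flags more content at the cost of more false positives, and raising it does the opposite.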

Technical Stack

Runtime: Python 3.11+
Framework: FastAPI
Server: Granian (High-performance ASGI server)
ML Framework: TensorFlow 2.x
Package Management: uv

Installation

Using Docker (Recommended)

# Build the image
docker build -t abusive-detection:latest .

# Run the container
docker run -d -p 8000:8000 abusive-detection:latest

Manual Installation

# Install dependencies
uv sync

# Start the server
granian web.main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --interface asgi \
    --workers $(nproc)

Configuration

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `GRANIAN_HOST` | Server host | `0.0.0.0` |
| `GRANIAN_PORT` | Server port | `8000` |
| `GRANIAN_WORKERS_PER_CORE` | Workers per CPU core | `2` |
| `GRANIAN_MAX_WORKERS` | Maximum worker limit | `32` |
| `GRANIAN_LOG_LEVEL` | Logging verbosity | `info` |
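
The README does not spell out how `GRANIAN_WORKERS_PER_CORE` and `GRANIAN_MAX_WORKERS` combine. One plausible reading, sketched here as an assumption, is that the worker count is the number of cores times the per-core value, capped at the maximum. The `resolve_workers` helper is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_workers(
    env: Optional[Mapping[str, str]] = None,
    cpu_count: Optional[int] = None,
) -> int:
    """Compute a worker count from the documented variables (assumed formula)."""
    env = os.environ if env is None else env
    cores = cpu_count or os.cpu_count() or 1

    # Defaults mirror the values listed in the table above.
    per_core = int(env.get("GRANIAN_WORKERS_PER_CORE", "2"))
    max_workers = int(env.get("GRANIAN_MAX_WORKERS", "32"))

    # Always run at least one worker and never exceed the configured cap.
    return max(1, min(cores * per_core, max_workers))


if __name__ == "__main__":
    print(resolve_workers())
```

Under this reading, a 4-core machine with the defaults would run 8 workers, and a 64-core machine would be capped at 32.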

API Reference

Single Text Analysis

POST /predict
Content-Type: application/json

{
    "text": "Content to analyze"
}
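
A minimal client sketch for this endpoint, using only the Python standard library. The base URL assumes the server is running locally on the default port, and the `build_predict_request` and `predict` helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local server on the default port


def build_predict_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /predict request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running server; see Installation.
    print(predict("Content to analyze"))
```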

Batch Analysis

POST /predict_batch
Content-Type: application/json

{
    "texts": [
        "First content to analyze",
        "Second content to analyze"
    ]
}
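
When there are many texts to moderate, it can help to split them into several batch requests. The sketch below only builds the JSON payloads. The batch size of 32 is a hypothetical choice, since the README does not state a maximum batch size, and the helper names are illustrative.

```python
import json
from typing import Iterator, Sequence


def chunked(texts: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` texts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


def batch_payloads(texts: Sequence[str], size: int = 32) -> list[str]:
    """Build one JSON body per chunk, in the shape POST /predict_batch expects."""
    return [json.dumps({"texts": chunk}) for chunk in chunked(texts, size)]


if __name__ == "__main__":
    for payload in batch_payloads(["First content", "Second content", "Third content"], size=2):
        print(payload)
```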

Response Schema

interface PredictionResponse {
  text: string;
  probability: number; // Range: 0-1
  is_abusive: boolean;
  confidence: number; // Range: 0-1
  early_detection: boolean;
  matched_words: string[];
}

Example Response

{
  "text": "Sample text for analysis",
  "probability": 0.12,
  "is_abusive": false,
  "confidence": 0.88,
  "early_detection": false,
  "matched_words": []
}
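
On the client side, a response like the one above can be parsed into a typed object. This is a sketch rather than code shipped with the service: the `PredictionResponse` dataclass mirrors the documented field names, while the `from_dict` helper and its range checks are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionResponse:
    text: str
    probability: float
    is_abusive: bool
    confidence: float
    early_detection: bool
    matched_words: list[str]

    @classmethod
    def from_dict(cls, data: dict) -> "PredictionResponse":
        """Build an instance from decoded JSON, checking the documented 0-1 ranges."""
        result = cls(
            text=str(data["text"]),
            probability=float(data["probability"]),
            is_abusive=bool(data["is_abusive"]),
            confidence=float(data["confidence"]),
            early_detection=bool(data["early_detection"]),
            matched_words=[str(word) for word in data["matched_words"]],
        )
        for name in ("probability", "confidence"):
            value = getattr(result, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} out of range: {value}")
        return result


if __name__ == "__main__":
    example = {
        "text": "Sample text for analysis",
        "probability": 0.12,
        "is_abusive": False,
        "confidence": 0.88,
        "early_detection": False,
        "matched_words": [],
    }
    print(PredictionResponse.from_dict(example))
```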

Model Training

Datasets

English: Hate Speech and Offensive Language Detection
Indonesian: Indonesian Abusive and Hate Speech Twitter Text

Health Monitoring

GET /health

Returns service health status and basic metrics.
