Mailtrace is a command-line tool for tracing emails via SSH or OpenSearch.
For development within the mail-analyzer project:
# Enter Nix development shell from project root
cd /path/to/mail-analyzer
nix develop .#mailtrace
# Or with direnv (automatic activation)
cd rca-agent/external-tools/mailtrace
direnv allow
# Install dependencies
uv sync
For standalone installation:
$ pip install mailtrace
You can copy the example configuration file from the repository:
$ cp config.yaml.sample ~/.config/mailtrace.yaml
mailtrace run \
-c ~/.config/mailtrace.yaml \
-h mail.example.com \
-k user@example.com \
--time "2025-07-21 10:00:00" \
--time-range 10h
You can specify the following parameters on the command line:
- -c: Path to the configuration file.
- -h: Hostname of the mail server to begin tracing.
- -k: Keyword to search for, such as an email address.
- --time: The central time for the trace.
- --time-range: The duration to search before and after the central time. For example, if --time is "10:00" and --time-range is "1h", the search will cover from 9:00 to 11:00.
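The window arithmetic behind --time and --time-range can be sketched as follows. This is a minimal illustration, not mailtrace's own parser: the helper names and the unit table are assumptions made for the example, and the set of units the tool really accepts may differ.

```python
from datetime import datetime, timedelta

# Hypothetical unit table; the units mailtrace actually accepts may differ.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_range(text: str) -> timedelta:
    """Turn a string such as '10h' into a timedelta."""
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{_UNITS[unit]: value})


def search_window(center: str, time_range: str) -> tuple[datetime, datetime]:
    """Return the (start, end) pair that lies time_range either side of center."""
    mid = datetime.strptime(center, "%Y-%m-%d %H:%M:%S")
    delta = parse_range(time_range)
    return mid - delta, mid + delta


start, end = search_window("2025-07-21 10:00:00", "1h")
print(start, "->", end)  # 2025-07-21 09:00:00 -> 2025-07-21 11:00:00
```

With the values from the run example above (--time "2025-07-21 10:00:00" and --time-range 10h), the same calculation gives a window from 00:00 to 20:00 on 2025-07-21.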
Password-related options are also available:
- --login-pass: Password for SSH login authentication.
- --sudo-pass: Password for sudo authentication.
- --opensearch-pass: Password for OpenSearch authentication.
To help prevent password leakage, you can use the following flags to enter passwords interactively at the prompt: --ask-login-pass, --ask-sudo-pass, --ask-opensearch-pass.
The tracing command continuously queries logs from OpenSearch and generates distributed traces sent to an OpenTelemetry endpoint:
mailtrace tracing \
-c ~/.config/mailtrace.yaml \
--otel-endpoint http://localhost:4317 \
--interval 60 \
--ask-opensearch-pass
- Continuous monitoring: Automatically queries logs at regular intervals
- Message ID tracking: Uses message ID as trace ID to maintain email chains across multiple query cycles
  - The message ID is hashed to generate a consistent 128-bit trace ID
  - This ensures that logs from the same email, even if fetched at different times, belong to the same trace
  - For example, if an email has 3 hops and logs from hops 1-2 are fetched in one cycle, and hop 3 in the next cycle, they will still be part of the same trace
- Queue ID tracking: Uses queue ID as span ID for consistent span identification
  - Each queue ID is hashed to generate a consistent 64-bit span ID
  - This allows spans to be properly correlated across different fetch cycles
- Multi-host support: Queries all hosts defined in the clusters configuration
- Automatic trace generation: Groups logs by message ID and generates OpenTelemetry traces
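The ID derivation described above can be illustrated with a short sketch. The text does not say which hash function mailtrace uses, so SHA-256 truncated to the required width is an assumption here, and both function names are hypothetical.

```python
import hashlib


def trace_id_from_message_id(message_id: str) -> int:
    """Derive a stable 128-bit trace ID from a message ID.

    Assumption: first 16 bytes of a SHA-256 digest; the real hash may differ.
    """
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")


def span_id_from_queue_id(queue_id: str) -> int:
    """Derive a stable 64-bit span ID from a queue ID (same assumption)."""
    digest = hashlib.sha256(queue_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# The same message ID yields the same trace ID in every fetch cycle,
# so hops fetched later still land in the same trace.
cycle_1 = trace_id_from_message_id("<20250721100000.abc@example.com>")
cycle_2 = trace_id_from_message_id("<20250721100000.abc@example.com>")
print(cycle_1 == cycle_2)
print(format(cycle_1, "032x"), format(span_id_from_queue_id("8DCB211F769"), "016x"))
```

Because the IDs depend only on the input strings, no state has to be kept between query cycles to keep spans in the right trace.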
- -c, --config-path: Path to configuration file
- --otel-endpoint: OpenTelemetry OTLP endpoint (default: http://localhost:4317)
- --interval: Interval in seconds between log queries (default: 60)
- --opensearch-pass: OpenSearch password
- --ask-opensearch-pass: Prompt for OpenSearch password interactively

- Requires OpenSearch configuration in the config file
  - If method is not set to opensearch, a warning will be logged but the command will still use the OpenSearch configuration
  - Make sure the opensearch_config section is properly configured with host, credentials, index, and field mappings
- Requires hosts to be configured in the clusters section

- Install tracing dependencies:
uv sync --group tracing
- Set up your configuration with OpenSearch and clusters:
method: opensearch
clusters:
  mail-cluster:
    - smtp-relay-1.example.com
    - smtp-relay-2.example.com
    - mail-delivery.example.com
opensearch_config:
  host: localhost
  port: 9200
  username: admin
  index: mail-logs-*
  # ... other opensearch settings
- Start a Jaeger instance to receive traces:
docker run -d --name jaeger \
-p 4317:4317 \
-p 16686:16686 \
jaegertracing/all-in-one:latest
- Run continuous tracing:
mailtrace tracing \
-c ~/.config/mailtrace.yaml \
--otel-endpoint http://localhost:4317 \
--interval 60 \
--ask-opensearch-pass
- View traces in Jaeger UI at http://localhost:16686
The tracer will continuously fetch logs every 60 seconds, group them by message ID to maintain complete email chains even if logs are split across multiple fetches, and send the generated traces to Jaeger.
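The grouping step can be pictured with the sketch below. It only illustrates how records from separate fetches end up in one chain; the record shape, the function name, and the sample log lines are made up for the example and are not mailtrace's internal data model.

```python
from collections import defaultdict

# Hypothetical record shape: (message_id, host, log_line).
LogRecord = tuple[str, str, str]


def merge_cycle(chains: dict[str, list[LogRecord]], fetched: list[LogRecord]) -> None:
    """Append the records of one fetch cycle to the chain of their message ID."""
    for record in fetched:
        chains[record[0]].append(record)


chains: dict[str, list[LogRecord]] = defaultdict(list)

# Cycle 1 sees hops 1 and 2; cycle 2 sees hop 3 of the same email.
merge_cycle(chains, [
    ("<m1@example.com>", "smtp-relay-1.example.com", "hop 1"),
    ("<m1@example.com>", "smtp-relay-2.example.com", "hop 2"),
])
merge_cycle(chains, [
    ("<m1@example.com>", "mail-delivery.example.com", "hop 3"),
])

print(len(chains["<m1@example.com>"]))  # 3
```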
The graph command automatically traces the complete mail flow and generates a Graphviz graph showing the routing path.
The generated graph uses a clean, topology-focused format:
- Nodes: Actual hostnames where mail was processed (e.g., smtp-relay-1.example.com)
- Edges: Mail queue IDs showing the flow between hosts (e.g., ABC123)
- Clusters: When you specify a cluster name with -h, the graph starts from the actual physical host where the mail was first found
Generate a .dot file:
mailtrace graph \
-c ~/.config/mailtrace.yaml \
-h mail.example.com \
-k user@example.com \
--time "2025-07-21 10:00:00" \
--time-range 10h \
-o mail_trace.dot
Then visualize with Graphviz:
dot -Tpng mail_trace.dot -o mail_trace.png
Omit the -o option to output the graph directly to stdout:
mailtrace graph \
-c ~/.config/mailtrace.yaml \
-h mail.example.com \
-k user@example.com \
--time "2025-07-21 10:00:00" \
--time-range 10h
Or explicitly use -o - for stdout:
mailtrace graph \
-c ~/.config/mailtrace.yaml \
-h mail.example.com \
-k user@example.com \
--time "2025-07-21 10:00:00" \
--time-range 10h \
-o -
You can pipe the output directly to Graphviz for instant visualization:
mailtrace graph \
-c ~/.config/mailtrace.yaml \
-h mail.example.com \
-k user@example.com \
--time "2025-07-21 10:00:00" \
--time-range 10h | dot -Tpng > mail_trace.png
Or to SVG for scalable graphics:
mailtrace graph \
-c ~/.config/mailtrace.yaml \
-h mail.example.com \
-k user@example.com \
--time "2025-07-21 10:00:00" \
--time-range 10h | dot -Tsvg > mail_trace.svg
digraph {
    smtp-relay-1.example.com;
    smtp-relay-2.example.com;
    mail-delivery.example.com;
    smtp-relay-1.example.com -> smtp-relay-2.example.com [key=0, label=8DCB211F769];
    smtp-relay-2.example.com -> mail-delivery.example.com [key=1, label=9EF8A12BC3D];
}
This shows mail with queue ID 8DCB211F769 flowing from smtp-relay-1 to smtp-relay-2, where it received new queue ID 9EF8A12BC3D before final delivery.
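To make the node and edge layout concrete, here is a small standalone sketch that renders a list of hops as DOT text. It is not mailtrace's own serializer, and the function name is hypothetical. The sketch quotes hostnames and labels, because DOT identifiers containing '-' or '.' are only valid when written as quoted strings.

```python
def hops_to_dot(hops: list[tuple[str, str, str]]) -> str:
    """Render (from_host, to_host, queue_id) hops as a DOT digraph."""
    # Collect each host once, in order of first appearance.
    nodes: list[str] = []
    for src, dst, _ in hops:
        for host in (src, dst):
            if host not in nodes:
                nodes.append(host)

    lines = ["digraph {"]
    # One node statement per host, quoted so that '-' and '.' are legal.
    lines += [f'    "{host}";' for host in nodes]
    # One edge per hop, labelled with the queue ID used on that hop.
    lines += [
        f'    "{src}" -> "{dst}" [label="{queue_id}"];'
        for src, dst, queue_id in hops
    ]
    lines.append("}")
    return "\n".join(lines)


print(hops_to_dot([
    ("smtp-relay-1.example.com", "smtp-relay-2.example.com", "8DCB211F769"),
    ("smtp-relay-2.example.com", "mail-delivery.example.com", "9EF8A12BC3D"),
]))
```

The printed text can be piped to dot -Tpng in the same way as the output of mailtrace graph.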
Mailtrace can also be used as a Python library in your own scripts:
#!/usr/bin/env python3
"""
Example: Using mailtrace as a library
"""
from mailtrace import (
load_config,
select_aggregator,
trace_mail_flow_to_file,
query_logs_by_keywords,
)
# Load configuration
config = load_config('config.yaml')
# Select the appropriate aggregator (SSHHost or OpenSearch)
aggregator_class = select_aggregator(config)
# Example 1: Trace mail flow and save to file
trace_mail_flow_to_file(
config=config,
aggregator_class=aggregator_class,
start_host='mail.example.com',
keywords=['user@example.com'],
time='2025-07-21 10:00:00',
time_range='10h',
output_file='mail_trace.dot' # Optional: omit or use None for stdout
)
# Example 2: Query logs by keywords only
logs_by_id = query_logs_by_keywords(
config=config,
aggregator_class=aggregator_class,
start_host='mail.example.com',
keywords=['user@example.com'],
time='2025-07-21 10:00:00',
time_range='10h'
)
# Process the results
for mail_id, (host, log_entries) in logs_by_id.items():
    print(f"Mail ID: {mail_id} (from {host})")
    for entry in log_entries:
        print(f" {entry}")
- load_config(config_path) - Load configuration from a YAML file
- select_aggregator(config) - Select the appropriate aggregator class (SSHHost or OpenSearch) based on config
- trace_mail_flow_to_file(config, aggregator_class, start_host, keywords, time, time_range, output_file=None) - Trace mail flow and output as Graphviz dot format (file or stdout)
- query_logs_by_keywords(config, aggregator_class, start_host, keywords, time, time_range) - Query logs and return mail IDs with their log entries
- trace_mail_flow(trace_id, aggregator_class, config, host, graph) - Trace a specific mail ID and build a MailGraph
- MailGraph() - Create and manipulate mail flow graphs
  - add_hop(from_host, to_host, queue_id) - Add a mail hop between hosts
  - to_dot(path=None) - Write graph to DOT format (file path or stdout if None)
The configuration file supports these parameters:
- method: Tracing method, either "ssh" or "opensearch".
- log_level: Logging level, one of "DEBUG", "INFO", "WARNING", "ERROR", or "CRITICAL".
- ssh_config: Configuration for SSH tracing.
- opensearch_config: Configuration for OpenSearch tracing.
- clusters: Named groups of hosts for high availability scenarios.
- domain: Domain name for hostname resolution (optional).
Example ssh_config section:
ssh_config:
  username: username
  password: ""
  private_key: /path/to/private.key
  sudo_pass: ""
  sudo: true
  timeout: 10
  ssh_config_file: ~/.ssh/config
  host_config:
    log_files:
      - /var/log/mail.log
    # log_parser: SyslogParser  # Optional - auto-detects format by default
    time_format: "%Y-%m-%dT%H:%M:%S"
  hosts:
    another.mailserver.example.com:
      log_parser: Rfc3164Parser  # Force BSD syslog format if needed
      time_format: "%b %d %H:%M:%S"
- username: SSH username for authentication. Required.
- password: SSH password for authentication. Optional if private_key is provided. For security, it's recommended to provide this via the CLI using the --ask-login-pass flag or the MAILTRACE_SSH_PASSWORD environment variable.
- private_key: Path to the SSH private key file. Optional if password is provided. Supports ~ expansion for home directory.
- sudo_pass: Password for sudo authentication when accessing logs. For security, it's recommended to provide this via the CLI using the --ask-sudo-pass flag or the MAILTRACE_SUDO_PASSWORD environment variable.
- sudo: Whether to use sudo for reading log files (default: true).
- timeout: SSH connection timeout in seconds (default: 10).
- ssh_config_file: Path to an SSH config file (e.g., ~/.ssh/config or a custom config file). Optional. When specified, paramiko will merge settings from this file with the above parameters. SSH config settings take precedence for hostname, user, port, and identityfile. This is similar to using the ssh -F ./my_ssh_config command.
- host_config: Default settings applied to all hosts.
  - log_files: List of log file paths to read (required).
  - log_parser: Log parser for processing log files (default: SyslogParser). Available parsers:
    - SyslogParser: Auto-detects RFC 3164 vs RFC 5424 format (recommended)
    - Rfc5424Parser: Force RFC 5424 format (ISO 8601 timestamp: 2025-01-01T10:00:00+08:00)
    - Rfc3164Parser: Force RFC 3164 format (BSD syslog: Feb 1 10:00:00)
  - time_format: Time format string for parsing timestamps (default: "%Y-%m-%d %H:%M:%S"). Used for time-based filtering.
- hosts: Host-specific configurations, overriding host_config for particular hosts. Uses the same format as host_config.
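The relationship between host_config and hosts can be sketched as a per-key override. This is an illustration under the assumption that a key set for a host replaces the default of the same name and everything else is inherited; the function name is hypothetical and mailtrace's own merge logic may differ in detail.

```python
def effective_host_config(ssh_config: dict, host: str) -> dict:
    """Combine host_config defaults with the overrides for one host.

    Assumption: a shallow, per-key override.
    """
    merged = dict(ssh_config.get("host_config") or {})
    merged.update((ssh_config.get("hosts") or {}).get(host) or {})
    return merged


ssh_config = {
    "host_config": {
        "log_files": ["/var/log/mail.log"],
        "time_format": "%Y-%m-%dT%H:%M:%S",
    },
    "hosts": {
        "another.mailserver.example.com": {
            "log_parser": "Rfc3164Parser",
            "time_format": "%b %d %H:%M:%S",
        },
    },
}

# Overridden host: keeps log_files from the defaults, replaces time_format.
print(effective_host_config(ssh_config, "another.mailserver.example.com"))
# Any other host falls back to host_config unchanged.
print(effective_host_config(ssh_config, "mail.example.com"))
```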
If you use an SSH config file, you can centralize your SSH settings there. For example, in ~/.ssh/config:
Host mail1.example.com
User mailuser
Port 2222
IdentityFile ~/.ssh/id_rsa_mail
Host mail2.example.com
User mailuser
IdentityFile ~/.ssh/id_rsa_mail
Host jumphost
HostName jump.example.com
User jumpuser
Then in your mailtrace config.yaml:
ssh_config:
  username: default_user
  private_key: ~/.ssh/id_rsa
  ssh_config_file: ~/.ssh/config
  sudo_pass: "mypassword"
  # ... rest of config
When connecting to mail1.example.com, mailtrace uses the User (mailuser), Port (2222), and IdentityFile from the SSH config file, so you don't need to duplicate these settings in config.yaml.
A loghost is a centralized logging server that collects logs from multiple servers and gives administrators a single place to read them.
To use Mailtrace with a loghost, use SSH as the log source and configure it as follows:
- Set up an SSH config file that redirects SSH connections for the mail servers to the loghost. For example, in ~/.ssh/config:
Host loghost
HostName logs.example.com
User loguser
IdentityFile ~/.ssh/id_rsa
Host mx.example.com
HostName loghost
Host mailer.example.com
HostName loghost
Host mailpolicy.example.com
HostName loghost
Host mailbox.example.com
HostName loghost
- In your config.yaml, configure the log_files fields for each mail server:
ssh_config:
  username: default_user
  private_key: ~/.ssh/id_rsa
  ssh_config_file: ~/.ssh/config
  sudo_pass: "mypassword"
  host_config:
    log_files:
      - /var/log/mail.log
    # log_parser: SyslogParser  # Optional - auto-detects format by default
    time_format: "%Y-%m-%dT%H:%M:%S"
  hosts:
    mx.example.com:
      log_files:
        - /var/log/mx/mail.log
      # log_parser and time_format inherited from host_config
    mailer.example.com:
      log_files:
        - /var/log/mailer/mail.log
      log_parser: Rfc3164Parser  # Override if this host uses different format
      time_format: "%b %d %H:%M:%S"
    mailpolicy.example.com:
      log_files:
        - /var/log/mailpolicy/mail.log
    mailbox.example.com:
      log_files:
        - /var/log/mailbox/mail.log
Example opensearch_config section:
opensearch_config:
  host: "localhost"
  port: 9200
  username: "admin"
  password: ""
  index: "mailtrace-logs-*"
  use_ssl: true
  verify_certs: false
  time_zone: "+00:00"
  timeout: 10
  mapping:
    facility: "log.syslog.facility.name"
    hostname: "host.name"
    message: "message"
    timestamp: "@timestamp"
    service: "log.syslog.appname"
- host: Hostname or IP address of the OpenSearch server. Required.
- port: Port number for OpenSearch (default: 9200).
- username: OpenSearch username for authentication. Required.
- password: OpenSearch password for authentication. For security, it's recommended to provide this via the CLI using the --ask-opensearch-pass flag or the MAILTRACE_OPENSEARCH_PASSWORD environment variable.
- index: Name of the OpenSearch index or index pattern for storing/querying logs (e.g., mailtrace-logs-*). Required.
- use_ssl: Whether to use SSL/TLS for communication (default: false).
- verify_certs: Whether to verify SSL certificates (default: false). Set to true in production for security.
- time_zone: Timezone offset for log timestamps (default: "+00:00"). Used for time-based filtering.
- timeout: Connection timeout in seconds (default: 10).
The mapping section allows you to specify how application fields map to OpenSearch fields:
- facility: OpenSearch field for log facility (default: "log.syslog.facility.name").
- hostname: OpenSearch field for hostname (default: "host.name").
- message: OpenSearch field for log message (default: "message").
- timestamp: OpenSearch field for log timestamp (default: "@timestamp").
- service: OpenSearch field for service name (default: "log.syslog.appname").
This allows mailtrace to work with different OpenSearch index schemas. Customize these mappings based on your actual field names in OpenSearch.
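As an illustration of what such a mapping does, the sketch below pulls the application fields out of a document using the default field names listed above. The helper names and the sample document are made up for the example, and the lookup strategy (flat key first, then nested objects) is an assumption about how the indexed documents are shaped, not a description of mailtrace's code.

```python
from typing import Any, Optional

DEFAULT_MAPPING = {
    "facility": "log.syslog.facility.name",
    "hostname": "host.name",
    "message": "message",
    "timestamp": "@timestamp",
    "service": "log.syslog.appname",
}


def get_path(doc: dict, dotted: str) -> Optional[Any]:
    """Look up a dotted field name, accepting flat or nested documents."""
    if dotted in doc:  # flat key such as {"host.name": "..."}
        return doc[dotted]
    current: Any = doc
    for part in dotted.split("."):  # nested form such as {"host": {"name": "..."}}
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def to_app_fields(source: dict, mapping: dict = DEFAULT_MAPPING) -> dict:
    """Translate one document into the application's field names."""
    return {name: get_path(source, path) for name, path in mapping.items()}


# Made-up sample document for illustration only.
sample = {
    "@timestamp": "2025-07-21T10:00:00Z",
    "message": "8DCB211F769: removed",
    "host": {"name": "smtp-relay-1.example.com"},
    "log": {"syslog": {"facility": {"name": "mail"}, "appname": "postfix/qmgr"}},
}
print(to_app_fields(sample))
```

If an index stores the hostname under a different field, only the mapping entry changes, for example {"hostname": "agent.hostname"}, while the rest of the lookup stays the same.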
You can define named clusters for high availability scenarios:
clusters:
  mx-cluster-us:
    - mx1.us.example.com
    - mx2.us.example.com
    - mx3.us.example.com
  mx-cluster-eu:
    - mx1.eu.example.com
    - mx2.eu.example.com
Then you can trace across a cluster by specifying the cluster name instead of individual hostnames.
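One way to picture the cluster lookup: a name passed with -h is expanded to its member hosts when it matches a cluster, and otherwise treated as a single hostname. The function below is a hypothetical sketch of that idea, not mailtrace's own resolution code.

```python
def resolve_hosts(name: str, clusters: dict[str, list[str]]) -> list[str]:
    """Expand a cluster name to its hosts; pass plain hostnames through."""
    return list(clusters.get(name, [name]))


clusters = {
    "mx-cluster-us": ["mx1.us.example.com", "mx2.us.example.com", "mx3.us.example.com"],
    "mx-cluster-eu": ["mx1.eu.example.com", "mx2.eu.example.com"],
}

print(resolve_hosts("mx-cluster-eu", clusters))     # ['mx1.eu.example.com', 'mx2.eu.example.com']
print(resolve_hosts("mail.example.com", clusters))  # ['mail.example.com']
```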
pip install mailtrace
To use the continuous tracing feature, install with the tracing dependency group:
# Using uv
uv sync --group tracing
# Using pip (after cloning the repository)
pip install -e ".[tracing]"
The tracing dependencies include:
- opentelemetry-api: OpenTelemetry API
- opentelemetry-sdk: OpenTelemetry SDK
- opentelemetry-exporter-otlp-proto-grpc: OTLP gRPC exporter for sending traces
The tracing command is organized into separate modules for maintainability:
- models.py: Data models for email traces (EmailTrace class)
- otel.py: OpenTelemetry setup and trace generation functions
  - OTLP exporter setup
  - Tracer creation for hosts
  - Trace ID and span ID generation from message/queue IDs
  - Distributed trace span creation
- query.py: Log querying and grouping logic
  - Query logs from all hosts in clusters
  - Group logs by message ID to maintain email chains
- continuous.py: Main continuous monitoring loop
  - Periodic log querying at configured intervals
  - Trace generation and sending to OTLP endpoint
- __init__.py: Module re-exports for public API
For security, sensitive information can be provided via environment variables instead of hardcoding in the config file:
- MAILTRACE_CONFIG: Path to the configuration file (default: config.yaml).
- MAILTRACE_SSH_PASSWORD: SSH login password.
- MAILTRACE_SUDO_PASSWORD: Sudo authentication password.
- MAILTRACE_OPENSEARCH_PASSWORD: OpenSearch authentication password.
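A wrapper script can apply the same idea before calling the library. The sketch below is hypothetical: the helper names are invented, and the precedence it implements (environment variable first, then the value from the config file) is an assumption, since the text does not state which source wins when both are set.

```python
import os
from typing import Mapping, Optional


def resolve_secret(
    env_name: str,
    config_value: str = "",
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the secret from the environment, else from the config, else None.

    Assumption: the environment variable takes precedence over the config file.
    """
    env = os.environ if environ is None else environ
    return env.get(env_name) or config_value or None


def resolve_config_path(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return MAILTRACE_CONFIG if set, otherwise the documented default."""
    env = os.environ if environ is None else environ
    return env.get("MAILTRACE_CONFIG", "config.yaml")


# A stand-in environment keeps the example independent of the real one.
fake_env = {"MAILTRACE_OPENSEARCH_PASSWORD": "s3cret"}
print(resolve_secret("MAILTRACE_OPENSEARCH_PASSWORD", "", fake_env))  # s3cret
print(resolve_secret("MAILTRACE_SUDO_PASSWORD", "", fake_env))        # None
print(resolve_config_path(fake_env))                                  # config.yaml
```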
An aggregator reads the logs and identifies the entries related to a given mail. It then extracts information from those entries, including the hostname and mail_id.
Using the extracted information, it determines the next hop in the mail flow. Tracing is performed by the do_trace function in aggregator/__init__.py, which is the core of this tool.
Refer to the demo directory for a sample configuration and demo video.