A complete AlphaFold2/ColabFold offline workflow demonstration for protein structure prediction and visualization.
Supports both single-sequence and MSA inputs, integrating 3D PDB visualization, PAE heatmap analysis, and static figure export.
This project is the code component of Team 16's Ai4Science work for the deep learning course at HFUT (Hefei University of Technology). Team members: SiSi Liu, Xiao Liu, Xuelin Wang, Yichao He, and Mingxin Liu.
The code implementation was primarily completed by Yichao He.
The project builds a lightweight workflow for protein structure prediction and visualization on top of ColabFold (an open-source implementation of AlphaFold2). It includes:
- ✅ Offline execution scripts: run locally or on a server, no Google Colab required.
- ✅ Full prediction pipeline: FASTA input → MSA search → structure prediction → result output.
- ✅ Interactive 3D visualization: view predicted structures in real time using nglview in Jupyter Notebook.
- ✅ Automatic PNG export: generate high-quality static images of the predicted structure (PDB) and the PAE heatmap for reports or papers.
| Module | Description | Core Technologies |
|---|---|---|
| run_single_sequence.sh | Predict structure from a single sequence without MSA | ColabFold + AlphaFold2-ptm |
| run_single_sequence_with_msa.sh | Predict structure using pre-generated MSA | MMseqs2 / A3M input |
| colabfold_visualization.ipynb | Core visualization notebook | nglview, seaborn, matplotlib |
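The contents of the two shell scripts are not shown in this README, so the following is only a sketch of the kind of `colabfold_batch` invocation each mode might wrap. It assumes the standard `colabfold_batch` command-line tool with its `--msa-mode` and `--model-type` options; the function name `build_colabfold_command` and the example paths are hypothetical. The command is only assembled and printed, not executed.

```python
import shlex


def build_colabfold_command(input_path, output_dir, single_sequence=True):
    """Return an argv list for colabfold_batch (assumed CLI; nothing is run here)."""
    cmd = ["colabfold_batch", "--model-type", "alphafold2_ptm"]
    if single_sequence:
        # Skip the MSA search and predict from the query sequence alone.
        cmd += ["--msa-mode", "single_sequence"]
    # In MSA mode the input is assumed to be a pre-generated alignment file,
    # which colabfold_batch reads directly instead of searching.
    cmd += [str(input_path), str(output_dir)]
    return cmd


# Hypothetical paths, chosen to mirror the directory layout in this README.
print(shlex.join(build_colabfold_command(
    "results_single/protein.fasta", "results_single/output")))
print(shlex.join(build_colabfold_command(
    "results_single/protein_msa.fasta", "results_single/output",
    single_sequence=False)))
```

To actually run a prediction, the returned list could be passed to `subprocess.run(cmd, check=True)` inside the activated environment.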
# Create virtual environment
conda create -n colabfold python=3.9
conda activate colabfold
# Install essential packages
pip install colabfold nglview matplotlib seaborn numpy jupyterlab
# Download AlphaFold2 model parameters (adjust the path below to your own setup)
cd /home/u2024170925/Ai4Science-Demo/ColabFold_demo/databases
wget https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar
tar -xvf alphafold_params_2022-12-06.tar
# Download UniRef50 database (for MSA search)
wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz
gunzip uniref50.fasta.gz
Ensure the following files exist:
results_single/
├── protein.fasta # Single protein sequence (FASTA format)
└── protein_msa.fasta # Optional: pre-generated MSA for higher accuracy
Example of protein.fasta:
>query1
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVA
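Before starting a prediction it can help to confirm that the input really is well-formed FASTA. The sketch below is not part of the project scripts; the helper names `parse_fasta` and `invalid_residues` are hypothetical, and it assumes only the 20 standard amino-acid letters are expected in the query.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not standard residues."""
    return sorted(set(sequence) - VALID_RESIDUES)


example = """>query1
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVA
"""
for name, seq in parse_fasta(example):
    bad = invalid_residues(seq)
    status = "OK" if not bad else f"unexpected characters: {bad}"
    print(f"{name}: {len(seq)} residues, {status}")
```

For a file on disk, the same check can be run on `open("results_single/protein.fasta").read()`.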
# Single-sequence mode (no MSA)
bash run_single_sequence.sh
# MSA mode (uses the pre-generated MSA)
bash run_single_sequence_with_msa.sh
✅ The output is saved in the results_single/output/ directory, including:
- query1_unrelaxed_rank_001_alphafold2_ptm_model_1_seed_000.pdb
- query1_predicted_aligned_error_v1.json
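The two output files can also be inspected without the notebook. The sketch below uses only the standard library and rests on two assumptions: that the per-residue confidence (pLDDT) is stored in the B-factor column of the PDB file, as AlphaFold-style outputs do, and that the PAE JSON holds a square matrix under the key `predicted_aligned_error`. The function names are hypothetical and this is not the notebook's own code.

```python
import json


def plddt_per_residue(pdb_text):
    """Collect one pLDDT value per residue from the CA atoms' B-factor field."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name in 13-16, B-factor in 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def load_pae(json_text):
    """Return the PAE matrix as a list of rows (assumed key name, see above)."""
    data = json.loads(json_text)
    if isinstance(data, list):  # some exports wrap the record in a list
        data = data[0]
    matrix = data["predicted_aligned_error"]
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("PAE matrix is not square")
    return matrix


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Tiny synthetic PAE record, only to show the call; not real prediction data.
demo_json = '{"predicted_aligned_error": [[0.5, 3.0], [2.5, 0.4]]}'
pae = load_pae(demo_json)
print("PAE size:", len(pae), "x", len(pae[0]))
print("mean PAE:", mean([value for row in pae for value in row]))
```

With real results, pass the contents of the `.pdb` and `.json` files listed above (for example via `pathlib.Path(...).read_text()`); the matrix returned by `load_pae` can then be drawn with `seaborn.heatmap` or `matplotlib.pyplot.imshow`.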
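Launch Jupyter and open the visualization notebook:
jupyter notebook
Then open colabfold_visualization.ipynb in the Jupyter file browser.
An example prediction is shown in visual.png.
The repository is organized as follows: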
ColabFold_Ai4Science_Demo/
├── ColabFold/ # Core model and code dependencies
├── databases/ # Databases for MSA/model parameters
├── results_single/ # Input/output directory for single-sequence predictions
│ ├── output/ # Output files (PDB, PAE, JSON, etc.)
│ └── … # Input files before running (FASTA / MSA)
├── .gitignore
├── README.md # Project documentation
├── colabfold_visualization.ipynb # Jupyter Notebook visualization script
├── run_single_sequence.sh # Script for single-sequence prediction
├── run_single_sequence_with_msa.sh # Script for MSA-based prediction
└── visual.png # Example protein visualization
This project is built upon the open-source framework ColabFold under the MIT License.
If used in academic publications, please cite the following works:
- Jumper et al., Nature (2021): "Highly accurate protein structure prediction with AlphaFold"
- Mirdita et al., Nature Methods (2022): "ColabFold: making protein folding accessible to all"
Special thanks to Sokrypton for providing the ColabFold toolkit, and to the AlphaFold team for advancing structural biology into the AI era.
