AI-Powered Prediction

Antioxidant Peptide Prediction Tool

Enter a peptide sequence and let our ensemble of nine ML/DL models predict its antioxidant activity.

Peptide Sequence (up to 30 characters)
Select Models

Enter a peptide sequence and select one or more models, then click Predict.
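Before submitting, it can help to check a sequence locally. The following is a minimal sketch, not the tool's actual validation logic: the function name `normalize_peptide` is hypothetical, the 30-character cap is inferred from the input counter on the form, and restricting input to the 20 standard one-letter amino acid codes is an assumption.

```python
# Hypothetical pre-submission check. The length cap mirrors the form's counter;
# the allowed alphabet (20 standard amino acids) is an assumption.
MAX_LENGTH = 30
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def normalize_peptide(raw: str) -> str:
    """Return an upper-case one-letter sequence, or raise ValueError."""
    # Drop all whitespace (spaces, tabs, newlines) and upper-case the rest.
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("sequence is empty")
    if len(sequence) > MAX_LENGTH:
        raise ValueError(
            f"sequence has {len(sequence)} residues; the limit is {MAX_LENGTH}"
        )
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"non-standard residue codes: {', '.join(invalid)}")
    return sequence


print(normalize_peptide(" gpa gpp "))  # GPAGPP
```

Rejecting unknown characters up front avoids sending sequences that an embedding model might silently map to an unknown-token placeholder.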

If you use this tool, please cite: Wang, Yali, et al. "An AI-driven multilayer strategy and curated dataset for mining antioxidant peptides from yak bone collagen hydrolysates." Food Research International (2026): 118689.

Metadata (Excel)

Comprehensive metadata for all peptides including sequences, activities, and physicochemical properties.

⬇ Download XLSX

Sequences (FASTA)

All peptide sequences in standard FASTA format, ready for embedding or alignment tools.

⬇ Download FASTA
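To feed the FASTA file into a custom pipeline without extra dependencies, a small reader is enough. This is a sketch under assumptions: `read_fasta` is a hypothetical helper, and the record names and sequences in the example are made up for illustration, not taken from AoXpDb.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may wrap; collect them until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Illustrative input only; these are not real AoXpDb records.
example = io.StringIO(">pep_1\nGPAGPP\n>pep_2\nGLPGPA\nGER\n")
records = list(read_fasta(example))
print(records)  # [('pep_1', 'GPAGPP'), ('pep_2', 'GLPGPAGER')]
```

For the downloaded file, replace the `io.StringIO` object with `open(path)` on wherever the FASTA file was saved.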

Dataset (CSV)

Complete dataset in CSV format with binary labels for use in machine learning pipelines.

⬇ Download CSV
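Loading the CSV for a machine learning pipeline might look like the sketch below. The column names `sequence` and `label` are assumptions, since the actual header of the file is not stated here; adjust them to match the download. `load_labeled_peptides` is a hypothetical helper.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_labeled_peptides(
    handle: TextIO,
    sequence_column: str = "sequence",  # assumed column name
    label_column: str = "label",        # assumed column name
) -> Tuple[List[str], List[int]]:
    """Read sequences and binary labels (0 or 1) from a CSV stream."""
    sequences, labels = [], []
    for row in csv.DictReader(handle):
        label = int(row[label_column])
        if label not in (0, 1):
            raise ValueError(f"expected a binary label, got {label}")
        sequences.append(row[sequence_column].strip().upper())
        labels.append(label)
    return sequences, labels


# Illustrative rows only; these are not real AoXpDb entries.
example = io.StringIO("sequence,label\nGPAGPP,1\nAAAA,0\n")
sequences, labels = load_labeled_peptides(example)
print(sequences, labels)  # ['GPAGPP', 'AAAA'] [1, 0]
```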

We developed an AI-driven multilayer strategy combined with a curated dataset for mining antioxidant peptides from yak bone collagen hydrolysates. This tool provides access to the trained prediction models and the AoXpDb database introduced in the study.


When should you cite us?

1. When using the AoXpDb online predictor to screen peptides for antioxidant activity.
2. When downloading and using the AoXpDb dataset (FASTA, CSV, or metadata).
3. When benchmarking our models (LR, RF, KNN, SVM, MLP, XGB, LGBM, CNN, LSTM) in your own work.

🔬 ESM-2 Embeddings

All sequences are encoded with Meta's ESM-2 protein language model (esm2_t6_8M_UR50D, 320-dimensional embeddings). Mean pooling over the residues turns each sequence into a fixed-length 320-dimensional feature vector for classification.
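The pooling step can be illustrated in isolation. The sketch below assumes the per-residue embeddings have already been computed and that special tokens (such as start and end markers) have been dropped; random arrays stand in for real ESM-2 output, and `mean_pool` is a hypothetical helper rather than the tool's code.

```python
import numpy as np

EMBEDDING_DIM = 320  # embedding width of esm2_t6_8M_UR50D


def mean_pool(residue_embeddings: np.ndarray) -> np.ndarray:
    """Average a (length, dim) matrix over residues into a (dim,) vector."""
    if residue_embeddings.ndim != 2 or residue_embeddings.shape[0] == 0:
        raise ValueError("expected a non-empty (length, dim) matrix")
    return residue_embeddings.mean(axis=0)


# Random stand-ins for per-residue embeddings of a 5-mer and a 22-mer.
rng = np.random.default_rng(0)
short_peptide = rng.normal(size=(5, EMBEDDING_DIM))
long_peptide = rng.normal(size=(22, EMBEDDING_DIM))

# Peptides of different lengths map to vectors of the same size,
# so they can be stacked into one feature matrix for a classifier.
features = np.stack([mean_pool(short_peptide), mean_pool(long_peptide)])
print(features.shape)  # (2, 320)
```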

📐 Training Protocol

Traditional ML models were tuned with GridSearchCV using 10-fold cross-validation and accuracy as the scoring metric. The CNN and LSTM models were tuned with 10-fold cross-validation over a grid of learning rates and hidden dimensions, using the Adam optimizer.
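A 10-fold, accuracy-scored grid search with scikit-learn can be sketched as follows. The data is synthetic, and the estimator, the parameter grid, and the use of a shuffled `StratifiedKFold` are assumptions chosen for illustration; the grids used in the study are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for pooled peptide embeddings and binary labels.
X, y = make_classification(
    n_samples=200, n_features=32, n_informative=8, random_state=0
)

# Illustrative grid only; not the values used by the authors.
param_grid = {"C": [0.1, 1.0, 10.0]}

# Ten stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="accuracy",  # select the configuration with the best mean accuracy
    cv=cv,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the model refit on all of the data with the selected hyperparameters.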

🗂️ Dataset

AoXpDb contains curated antioxidant and non-antioxidant peptide sequences from yak bone collagen hydrolysates. Training used balanced embeddings to address class imbalance.
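How the embeddings were balanced is not described here. One common option is random undersampling of the majority class, sketched below purely as an illustration; it is not necessarily the method used in the study, and `undersample` is a hypothetical helper operating on made-up data.

```python
import numpy as np


def undersample(features: np.ndarray, labels: np.ndarray, seed: int = 0):
    """Randomly drop samples so every class keeps as many as the rarest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate([
        # Sample n_keep row indices of each class without replacement.
        rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return features[keep], labels[keep]


# Toy data: 3 positives and 7 negatives with 2-dimensional features.
labels = np.array([1] * 3 + [0] * 7)
features = np.arange(20, dtype=float).reshape(10, 2)

balanced_features, balanced_labels = undersample(features, labels)
print(np.bincount(balanced_labels))  # [3 3]
```

Undersampling discards data, so oversampling or class weighting may be preferable when the minority class is small.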

Supported Models
Model | Full Name | Type | Hyperparameter Search