Improving the generalization of protein expression models with mechanistic sequence information

Excited to share our latest work on improving the generalization of protein expression models, which has just been published online in Nucleic Acids Research!

Check out the paper or the repo.

🧬 We show that integrating mechanistic sequence features—such as mRNA stability, codon usage, and peptide properties—can enhance model generalization, improving predictive accuracy for novel sequences.

🖥️ We explore multiple strategies to combine mechanistic features with standard encodings, including feature stacking, ensemble stacking, and geometric stacking with graph neural networks.
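
For readers new to the idea, the simplest of these strategies, feature stacking, can be sketched in a few lines. This is an illustrative toy and not the code from our repo: GC content stands in for a real mechanistic feature, one-hot encoding stands in for the standard encoding, and the function names (`one_hot`, `gc_content`, `stack_features`) are made up for this example.

```python
# Toy sketch of feature stacking (assumed names, not the paper's implementation).
ALPHABET = "ACGT"


def one_hot(seq):
    """Flattened one-hot encoding: four values per nucleotide."""
    encoded = []
    for base in seq:
        encoded.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return encoded


def gc_content(seq):
    """Fraction of G and C bases; a crude stand-in for a mechanistic feature."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def stack_features(seq):
    """Feature stacking: append mechanistic features to the sequence encoding."""
    mechanistic = [gc_content(seq)]
    return one_hot(seq) + mechanistic


features = stack_features("ATGGCC")
print(len(features))  # 6 bases * 4 + 1 mechanistic feature = 25
```

The stacked vector can then be passed to any regressor in place of the plain encoding; richer features such as mRNA stability or codon usage would simply be appended in the same way.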

💡 This work highlights the importance of domain knowledge and feature engineering in ML-driven sequence design, offering new insights for applications in synthetic biology and strain engineering.

Huge thanks to my supervisors, Dr Diego Oyarzún and Prof Grzegorz Kudla, for their guidance and support throughout this project!