EvoFlow-RNA: Generating and optimizing non-coding RNAs with masked discrete diffusion
We've posted EvoFlow-RNA, a masked discrete diffusion framework for generating and optimizing non-coding RNAs, to bioRxiv. Non-coding RNAs span an enormous design space: tRNAs, riboswitches, ribozymes, regulatory elements, and engineered scaffolds. Existing methods either generate sequences autoregressively, which makes conditioning on mid-sequence context difficult, or rely on continuous-relaxation diffusion, which is lossy on a discrete alphabet. EvoFlow-RNA instead frames the problem natively in the discrete domain.
The model trains on a curated corpus of evolved ncRNAs and learns to in-fill, extend, or generate sequences from scratch, conditioned on structural and functional priors. In our experiments, EvoFlow-RNA produces designs that recover wild-type fitness on held-out tasks, and it improves over autoregressive baselines on guided generation.
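To make the in-fill, extend, and generate modes concrete, here is a minimal sketch of how sampling from a masked discrete model can work in general. It is not EvoFlow-RNA's actual sampler: `toy_denoiser` is a hypothetical stand-in that guesses at random where a trained network would go, the `_` mask symbol and the `infill` function are names invented for this illustration, and the confidence-ordered unmasking schedule is a common heuristic for masked models that we assume here purely for illustration.

```python
import random

ALPHABET = "ACGU"
MASK = "_"  # hypothetical mask symbol, chosen for this sketch


def toy_denoiser(seq, rng):
    """Hypothetical stand-in for a trained network.

    For every masked position, return a (nucleotide, confidence) guess.
    Here both are random; a real model would predict them from context.
    """
    return {
        i: (rng.choice(ALPHABET), rng.random())
        for i, c in enumerate(seq)
        if c == MASK
    }


def infill(template, steps=4, seed=0):
    """Fill masked positions over several rounds, most confident first.

    Unmasked characters in `template` are never changed, so the same
    routine covers in-filling, extension, and generation from scratch.
    """
    rng = random.Random(seed)
    seq = list(template)
    for step in range(steps):
        masked = [i for i, c in enumerate(seq) if c == MASK]
        if not masked:
            break
        preds = toy_denoiser(seq, rng)
        # Spread the remaining masks evenly over the remaining rounds
        # (ceiling division), so the final round fills everything left.
        k = -(-len(masked) // (steps - step))
        by_confidence = sorted(preds, key=lambda i: preds[i][1], reverse=True)
        for i in by_confidence[:k]:
            seq[i] = preds[i][0]
    return "".join(seq)


if __name__ == "__main__":
    print(infill("GGC____AUC__", steps=3, seed=1))  # in-fill and extend
    print(infill(MASK * 12, steps=4, seed=1))       # generate from scratch
```

In this sketch the three modes differ only in where the masks sit in the template: interior masks give in-filling, trailing masks give extension, and a fully masked template gives unconditional generation. Because fixed positions are visible at every round, mid-sequence context is available to the denoiser throughout, which is the property that left-to-right decoding makes awkward.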
The work is part of our broader research program on programmable nucleic acid design. The preprint is available now on bioRxiv; we'll be presenting follow-on results at upcoming venues.