Chinese scholars make progress in the intelligent design of gene regulatory sequences for synthetic biology


  

  Figure 2. Modular modeling and intelligent design of enhancers based on TFBUs

  With support from National Natural Science Foundation of China projects (grant numbers 62250007 and 62225307) and other grants, Professor Wang Xiaowei's team from the Department of Automation at Tsinghua University has made progress in the intelligent design of gene regulatory sequences in synthetic biology. The results were published in two successive papers in the journal Nature Communications: (1) "Systematic representation and optimization enable the inverse design of cross-species regulatory sequences in bacteria", published on February 19, 2025. Paper link: https://doi.org/10.1038/s41467-025-57031-1 (2) "Modeling and designing enhancers by introducing and utilizing transcription factor binding units", published on February 8, 2025. Paper link: https://www.nature.com/articles/s41467-025-56749-2 .

  Gene circuits often adapt poorly when moved between hosts, which is a bottleneck in biomanufacturing. To address it, the research team took an information-encoding perspective and characterized functional regulatory sequences as conditional probability distributions over the DNA sequence space. They found that cross-species regulatory rules lie in the regions where the conditional probability distributions of different species overlap. By integrating millions of functional sequences from thousands of species, the team built a high-dimensional semantic representation space and an intelligent generative model for DNA that span species boundaries, overcoming the species barrier of natural regulatory components. In experiments, the model achieved 93.3% accuracy in cross-host sequence adaptation between Escherichia coli and Pseudomonas aeruginosa (Figure 1).

  In addition, the team proposed a new transcription factor binding unit (TFBU) model to address the challenge of quantitatively modeling enhancers in mammalian cells. The model treats the core binding site of a transcription factor and its surrounding context sequence as a single functional unit. This overcomes a limitation of traditional methods, which consider only local combinations of binding sites and ignore the global effects of sequence context. The model quantifies the impact of context sequences on transcription factor binding and enhancer activity, providing a new tool for developing novel treatments such as gene therapy (Figure 2).

  The series of studies combines digital evolution driven by intelligent models with synthetic biology experiments driven by active learning. Through collaborative exploration and closed-loop iterative optimization between the "virtual world" and the "physical world", it significantly improves the efficiency of sequence design in synthetic biology.