Background
Relay catalysis is a strategy that tightly couples multiple catalytic reactions, significantly improving synthesis efficiency and selectivity while reducing energy consumption and raw material waste. However, designing a workable relay catalytic pathway is difficult. Researchers must review large volumes of scattered literature, compare reaction conditions, and ensure that each step connects smoothly to the next. This process is time-consuming, depends heavily on experience, and carries considerable uncertainty. In addition, because reaction data are dispersed across sources without systematic integration, researchers struggle to obtain comprehensive and reliable information quickly, which makes pathway design a considerable challenge.
Research detail
To address the long-standing difficulty of efficiently designing relay catalytic pathways, Professor Jun Cheng’s team at Xiamen University and collaborators proposed an innovative approach that integrates knowledge graphs (KG) with large language models (LLMs) for intelligent relay catalysis pathway recommendation.
The research team first identified five major categories and 29 key types of information relevant to relay catalysis research (Fig. 1). Based on this, they designed a workflow for automated data extraction and knowledge graph updating (Fig. 2). In this workflow, LLMs play a central role by identifying and extracting core data (reactants, products, catalysts, reaction conditions, and performance metrics) from more than 15,000 catalysis-related papers. Using this structured and standardized information, the team built a traceable catalysis knowledge graph (Cat-KG). The graph not only integrates scattered literature data but also provides bidirectional links to the original publications, so that every entry can be traced to its source and every publication can be mapped to the entries extracted from it.
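The idea of a traceable graph with bidirectional links can be illustrated with a minimal Python sketch. This is not the Cat-KG schema or implementation: the class names `ReactionRecord` and `TinyCatKG`, the chosen fields, the placeholder species, and the placeholder DOI are all assumptions made for illustration. It shows only that each record carries its source, and that the store can be queried both from a species to its reactions and from a publication to the records extracted from it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionRecord:
    """One extracted reaction (hypothetical fields, not the Cat-KG schema)."""
    reactants: tuple
    products: tuple
    catalyst: str
    temperature_c: float
    source_doi: str  # provenance: link from the record back to the paper


class TinyCatKG:
    """Toy in-memory store indexed by reactant and by source publication."""

    def __init__(self):
        self.by_reactant = defaultdict(list)
        self.by_doi = defaultdict(list)

    def add(self, record):
        # Index the record under every reactant it consumes.
        for species in record.reactants:
            self.by_reactant[species].append(record)
        # Index it under its source, giving the paper-to-record direction.
        self.by_doi[record.source_doi].append(record)

    def reactions_from(self, species):
        """All recorded reactions that consume the given species."""
        return list(self.by_reactant.get(species, []))

    def reactions_in(self, doi):
        """All records extracted from the given publication."""
        return list(self.by_doi.get(doi, []))


# Placeholder data: species, catalyst, and DOI are invented for illustration.
kg = TinyCatKG()
record = ReactionRecord(("A",), ("B",), "cat-1", 250.0, "10.0000/example-1")
kg.add(record)
```

In this sketch, `kg.reactions_from("A")` returns the record together with its `source_doi`, and `kg.reactions_in("10.0000/example-1")` returns every record taken from that placeholder publication.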


For pathway recommendation, the team combined graph-based search methods with expert-designed chemical rules to screen theoretically reasonable and experimentally feasible multi-step relay catalytic pathways from Cat-KG (Fig. 3). They particularly emphasized the compatibility of reaction conditions between adjacent steps, avoiding mismatches in temperature, atmosphere, or additives. LLMs were then used to transform the filtered pathways into intuitive chemical equations and concise explanations, enabling researchers to quickly understand and evaluate them.
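To make the screening idea concrete, the following Python sketch performs a depth-first search over single-step reactions and keeps only multi-step pathways whose adjacent steps have compatible conditions. It is an illustration under stated assumptions, not the team's algorithm: the `Step` fields, the function names, the 50 °C temperature tolerance, and the requirement of identical atmospheres are hypothetical stand-ins for the expert-designed chemical rules, which are not specified here, and additives are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A single catalytic step (hypothetical, simplified fields)."""
    reactant: str
    product: str
    temperature_c: float
    atmosphere: str


def compatible(prev, nxt, max_temp_gap=50.0):
    """Assumed rule: similar temperature and the same atmosphere."""
    close_in_temp = abs(prev.temperature_c - nxt.temperature_c) <= max_temp_gap
    return close_in_temp and prev.atmosphere == nxt.atmosphere


def find_relay_pathways(steps, start, target, min_steps=2, max_steps=3,
                        max_temp_gap=50.0):
    """Depth-first search for start -> target pathways with compatible steps."""
    by_reactant = {}
    for step in steps:
        by_reactant.setdefault(step.reactant, []).append(step)

    results = []

    def extend(path, current, visited):
        if current == target and path:
            if len(path) >= min_steps:
                results.append(list(path))
            return
        if len(path) == max_steps:
            return
        for step in by_reactant.get(current, []):
            if step.product in visited:
                continue  # avoid revisiting a species (no cycles)
            if path and not compatible(path[-1], step, max_temp_gap):
                continue  # adjacent conditions mismatch, prune this branch
            path.append(step)
            extend(path, step.product, visited | {step.product})
            path.pop()

    extend([], start, {start})
    return results


# Placeholder reactions with invented conditions, for illustration only.
steps = [
    Step("A", "B", 200.0, "H2"),
    Step("B", "C", 220.0, "H2"),  # compatible with the A -> B step
    Step("B", "C", 400.0, "H2"),  # rejected: temperature gap too large
    Step("B", "C", 210.0, "O2"),  # rejected: atmosphere mismatch
    Step("A", "C", 300.0, "H2"),  # single step, excluded when min_steps=2
]
paths = find_relay_pathways(steps, "A", "C")
```

With this toy data, only the two-step pathway through `B` at 200 °C and 220 °C under `H2` survives the screen. Pruning during the search, rather than filtering complete pathways afterwards, avoids exploring branches that already violate a compatibility rule.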

This method is both efficient and innovative: it generates pathway recommendations within seconds to minutes and has successfully reproduced multiple classic relay catalytic pathways reported in the literature. Moreover, it identified 20 previously unreported potential pathways, providing theoretical references and new directions for experimental exploration.
Significance
This work develops a pathway recommendation approach distinct from traditional “black-box” AI systems, offering transparency, interpretability, and traceability. Each recommended pathway is accompanied by supporting data and literature links, assisting chemists in evaluation and decision-making before experiments. The system is highly flexible and scalable: it can be seamlessly upgraded with more advanced LLMs and expanded to applications such as photocatalysis and electrocatalysis. The team also plans to introduce expert feedback in future versions to continuously optimize the recommendation model.
Cat-KG currently supports catalytic reaction queries and is open to the public (https://ai4ec.ac.cn/apps/chembrain). Additional applications, such as pathway queries, will be rolled out gradually.
Outlook
At present, the system primarily evaluates each reaction step independently. Future research will focus on more complex interactions between steps, such as catalyst coupling effects, as well as on catalyst stability under real conditions and on economic and practical feasibility. These improvements aim to make the overall catalytic process more effective under realistic experimental settings.
**DOI:** https://doi.org/10.1093/nsr/nwaf271
