Synergizing a knowledge graph and large language model for relay catalysis pathway recommendation
Relay catalysis couples multiple catalytic reactions so that intermediates are transformed efficiently from one step to the next, enhancing overall conversion and selectivity. However, designing such pathways and the multifunctional catalysts they require is often slow and costly. It also relies heavily on in-depth literature analysis by experienced researchers. To address this, we developed an approach that combines a knowledge graph (KG) with large language models (LLMs) to automatically recommend multistep catalytic reaction pathways. The method first uses an LLM-assisted workflow to acquire and organize data, from which a detailed catalysis knowledge graph (Cat-KG) is constructed. Candidate pathways retrieved by querying the Cat-KG are then ranked with scoring rules informed by relay catalysis expertise, and the most promising ones are selected. Finally, the LLM converts the structured pathway and reaction-condition data into readable chemical equations and descriptions for chemists. Because this step draws on reliable catalysis knowledge from the Cat-KG, it helps avoid LLM hallucinations. The method recommended relay catalysis pathways for ethylene, ethanol, 2,5-furandicarboxylate, and other targets within minutes. The recommended pathways were consistent with reported ones but used different reaction conditions, which validates the method's effectiveness. This strategy can therefore both recover known relay catalysis pathways and propose novel ones, showing its potential for pathway selection.
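The query-and-score step can be illustrated with a minimal sketch. It does not reproduce the Cat-KG schema or the authors' scoring rules, which are not specified here. It assumes a toy reaction graph in which each edge is one catalytic step with a hypothetical catalyst label and temperature. The species names (`S`, `I1`, `T`, ...), catalysts, temperatures, and penalty weights are all invented for illustration. The hypothetical score favors fewer steps and smaller temperature gaps between consecutive steps, used here as a crude proxy for whether the steps can be relayed. The `describe` helper is a plain string template standing in for the LLM's conversion to readable text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One catalytic step (an edge in a toy reaction graph; all fields hypothetical)."""
    reactant: str
    product: str
    catalyst: str
    temperature_c: float


# Hypothetical toy graph: placeholder species and invented conditions, not real data.
STEPS = [
    Step("S", "I1", "cat-a", 250.0),
    Step("I1", "T", "cat-b", 300.0),
    Step("S", "I2", "cat-c", 200.0),
    Step("I2", "I3", "cat-d", 400.0),
    Step("I3", "T", "cat-e", 410.0),
    Step("I1", "I2", "cat-f", 260.0),
]


def enumerate_pathways(steps, source, target, max_steps=3):
    """Depth-first search for simple paths (no repeated species) of at most max_steps."""
    by_reactant = {}
    for s in steps:
        by_reactant.setdefault(s.reactant, []).append(s)
    results = []

    def dfs(species, path, visited):
        if species == target and path:
            results.append(list(path))
            return
        if len(path) >= max_steps:
            return
        for step in by_reactant.get(species, []):
            if step.product in visited:
                continue
            path.append(step)
            visited.add(step.product)
            dfs(step.product, path, visited)
            path.pop()
            visited.remove(step.product)

    dfs(source, [], {source})
    return results


def score(path, step_penalty=1.0, temp_penalty_per_100c=1.0):
    """Hypothetical rule (higher is better): penalize step count and temperature gaps."""
    gaps = [abs(a.temperature_c - b.temperature_c) for a, b in zip(path, path[1:])]
    return -(step_penalty * len(path) + temp_penalty_per_100c * sum(gaps) / 100.0)


def rank(steps, source, target, max_steps=3):
    """Enumerate candidate pathways and sort them by the hypothetical score."""
    return sorted(enumerate_pathways(steps, source, target, max_steps), key=score, reverse=True)


def describe(path):
    """Template-based text rendering (a stand-in for the LLM step)."""
    parts = [path[0].reactant]
    for s in path:
        parts.append(f"-[{s.catalyst}, {s.temperature_c:.0f} C]-> {s.product}")
    return " ".join(parts)


if __name__ == "__main__":
    for p in rank(STEPS, "S", "T"):
        print(f"{score(p):.2f}  {describe(p)}")
```

With these invented values, the two-step route through `I1` ranks above the three-step route through `I2` and `I3`. The four-step route `S -> I1 -> I2 -> I3 -> T` is excluded by `max_steps=3`. In practice, the pathway query would run against the KG store and the scoring rules would encode domain knowledge, but the enumerate-then-rank structure is the same.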