01 Core Concepts & Foundations
Definition & Scope
Drug-target interaction (DTI) prediction is the computational task of predicting binding interactions between drug candidates and biological targets. Current research treats binary interaction classification and continuous drug-target affinity (DTA) regression as interconnected representation-learning problems.
Generalization Priority
Beyond random-split scores, reviews from 2025 onward focus on cold-drug, cold-target, and out-of-distribution (OOD) performance, that is, performance on drugs or targets absent from the training data.
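A cold split can be sketched in a few lines. The example below is a minimal illustration, not a protocol taken from any specific benchmark: the function name `cold_split`, the `(drug, target, label)` triple layout, and the toy identifiers are all assumptions made for this sketch.

```python
import random

def cold_split(pairs, entity_index, test_fraction=0.2, seed=0):
    """Split (drug, target, label) triples so held-out entities never appear in training.

    entity_index=0 holds out drugs (cold-drug split);
    entity_index=1 holds out targets (cold-target split).
    """
    entities = sorted({p[entity_index] for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(entities)
    n_test = max(1, int(len(entities) * test_fraction))
    held_out = set(entities[:n_test])
    train = [p for p in pairs if p[entity_index] not in held_out]
    test = [p for p in pairs if p[entity_index] in held_out]
    return train, test

# Hypothetical toy data: five drugs, three targets.
pairs = [
    ("d1", "t1", 1), ("d1", "t2", 0),
    ("d2", "t1", 0), ("d2", "t3", 1),
    ("d3", "t2", 1), ("d4", "t3", 0),
    ("d5", "t1", 1),
]
train, test = cold_split(pairs, entity_index=0, test_fraction=0.4)
train_drugs = {d for d, _, _ in train}
test_drugs = {d for d, _, _ in test}
```

Splitting on entities rather than on pairs is what makes the test set "cold": every test drug is entirely unseen during training, which a random pair-level split does not guarantee.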
Representation
SMILES strings, molecular graphs, 3D structures, and protein language model (PLM) sequence embeddings.
Relational Context
Integrating protein-protein interaction (PPI) networks, pathways, and knowledge graphs (KGs) for holistic modeling.
02 Critical Challenges
Inductive Generalization
Performance drops sharply on new drugs and targets. Existing models struggle with unseen chemical space and with entities that have few known interactions.
Evaluation Bias
Random splits alone are insufficient. Trends in 2025 call for scaffold splits and more rigorous negative-sampling protocols.
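As a rough sketch of a scaffold split, the example below assigns whole scaffold groups to either train or test, largest groups first. It assumes scaffold labels are already available for each drug; in practice these are commonly Bemis-Murcko scaffolds computed from SMILES with a cheminformatics toolkit, which is not done here. The function name `scaffold_split` and the toy scaffold labels are hypothetical.

```python
from collections import defaultdict

def scaffold_split(drug_to_scaffold, train_fraction=0.8):
    """Assign whole scaffold groups to train or test so no scaffold spans both."""
    groups = defaultdict(list)
    for drug, scaffold in drug_to_scaffold.items():
        groups[scaffold].append(drug)
    # Largest groups first; ties broken by scaffold label for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    train_budget = train_fraction * len(drug_to_scaffold)
    train, test = [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test

# Hypothetical precomputed scaffold labels (assumption: one label per drug).
drug_to_scaffold = {
    "d1": "benzene", "d2": "benzene", "d3": "benzene",
    "d4": "pyridine", "d5": "pyridine",
    "d6": "indole",
}
scaffold_train, scaffold_test = scaffold_split(drug_to_scaffold, train_fraction=0.7)
```

Because groups are never divided, the realized train fraction can differ from the requested one; that is the price of keeping structurally similar molecules on one side of the split.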
Multimodal Noise
Integrating structure, sequence, and network data is difficult because the modalities differ in scale and the data are sparse.
Biological Credibility
Bridging the gap between attention visualizations and pharmacological mechanisms confirmed in the wet lab.
Evolution of Methodologies
GNN-based Methods
Mainstream from 2025 onward, with an emphasis on graph transformers and heterogeneous GNNs.
Multimodal Fusion (MFCADTI)
Cross-attention mechanisms to fuse sequence and network features.
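To make the fusion idea concrete, here is a minimal sketch of scaled dot-product cross-attention in which sequence-derived features act as queries over network-derived features. It is a generic illustration and not the MFCADTI implementation: the learned query/key/value projections used in real models are omitted, and the function names and toy feature vectors are assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Each query row attends over all key rows and returns a weighted sum of value rows."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused

# Hypothetical toy features: 2 sequence-derived vectors, 3 network-derived vectors.
seq_feats = [[1.0, 0.0], [0.0, 1.0]]
net_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(seq_feats, net_feats, net_feats)
```

Each output row is a convex combination of the network features, weighted by how well they align with the corresponding sequence feature, so the fused representation keeps the query's length while drawing its content from the other modality.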
Foundation & Generative Models
ChemBERTa, ESM2, and VGAN-DTI frameworks are reshaping representation learning.
Open Problems
- Discrepancy between high benchmark scores and real-world robustness.
- Negative label reliability: "not observed" ≠ "no interaction".
- The 3D vs sequence-based PLM performance/availability trade-off.
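The negative-label problem shows up directly in how training sets are built. The sketch below draws unobserved drug-target pairs as presumed negatives; it is one simple scheme, assumed here for illustration, and the function name `sample_presumed_negatives` and toy identifiers are hypothetical. The sampled pairs are unlabeled, not confirmed non-binders.

```python
import random

def sample_presumed_negatives(positives, drugs, targets, ratio=1, seed=0):
    """Sample unobserved (drug, target) pairs to use as presumed negatives.

    "Unobserved" only means the pair is absent from the known positives;
    some sampled pairs may be true interactions that were never assayed.
    """
    known = set(positives)
    candidates = [(d, t) for d in drugs for t in targets if (d, t) not in known]
    rng = random.Random(seed)
    k = min(len(candidates), ratio * len(positives))
    return rng.sample(candidates, k)

# Hypothetical toy data.
drugs = ["d1", "d2", "d3"]
targets = ["t1", "t2"]
positives = [("d1", "t1"), ("d2", "t2")]
negatives = sample_presumed_negatives(positives, drugs, targets, ratio=1)
```

Keeping these pairs flagged as presumed rather than confirmed makes it easier to later swap in experimentally validated negatives or to down-weight them during training.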
Future Directions
- Unified foundation models + graphs + knowledge.
- Standardization of OOD-centric evaluation protocols.
- Transition from static prediction to generative molecule design.
One-Liner Summary
"Creating practical models that are robust in OOD and cold-start environments through multimodal representations and knowledge infusion, rather than focusing solely on more complex models."