From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Key Idea: GraphRAG targets global, query-focused QA by extracting an entity–relation graph from the corpus, clustering it into communities, and generating hierarchical community summaries that are combined in a map-reduce answering step. Evaluated on podcast and news corpora with LLM-based rubrics (e.g., comprehensiveness, diversity), it shows large context-token reductions at higher summary levels while outperforming a vector-RAG baseline on those global QA metrics in the authors' setup.
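The map-reduce answering step can be sketched as follows. This is a minimal illustration, not the paper's implementation: `community_summaries` and the `llm` callable are hypothetical stand-ins for the pre-built community summaries and an LLM call.

```python
def map_reduce_answer(question, community_summaries, llm):
    """Sketch of GraphRAG-style map-reduce QA over community summaries.

    `llm` is a hypothetical stand-in: a callable mapping a prompt
    string to a response string.
    """
    # Map: answer the question independently against each community summary.
    partial_answers = [
        llm(f"Answer '{question}' using only this summary:\n{summary}")
        for summary in community_summaries
    ]
    # Reduce: synthesize the partial answers into one global answer.
    combined = "\n".join(partial_answers)
    return llm(f"Synthesize a final answer to '{question}' from:\n{combined}")
```

The choice of summary level (root vs. lower communities) trades context-token cost against detail, which is where the paper's reported token reductions come from.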
Video ReCap: Recursive Captioning of Hour-Long Videos
Key Idea: Introduces a hierarchical approach to video captioning via recursive temporal summarization.
Relevance: Shows the power of hierarchical modeling for long-form video understanding.
Limitation: Designed for caption generation, not structured reasoning or graph-based representation.
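The recursive scheme above can be sketched as repeated windowed summarization: clip-level captions are summarized in windows, and those summaries are summarized again until one video-level caption remains. The `summarize` callable and `window` size are illustrative assumptions, not Video ReCap's actual model interface.

```python
def recursive_caption(clip_captions, summarize, window=4):
    """Hedged sketch of hierarchical captioning via recursive summarization.

    `summarize` is a hypothetical stand-in: a callable mapping a list of
    captions to a single higher-level caption.
    """
    levels = [clip_captions]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # Summarize each window of captions into one coarser caption.
        next_level = [
            summarize(current[i:i + window])
            for i in range(0, len(current), window)
        ]
        levels.append(next_level)
    # levels[0]: clip captions; levels[-1]: single whole-video summary.
    return levels
```

Each pass shrinks the sequence by roughly the window factor, so the hierarchy has logarithmic depth in the number of clips, which is what makes hour-long inputs tractable.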
GraphVQA: Language-Guided Graph Neural Networks for Scene Graph Question Answering
Key Idea: Uses question-guided graph neural networks to perform reasoning over scene graphs for visual question answering.
Relevance: Highlights the benefits of structured, interpretable reasoning using scene graphs aligned with language.
Limitation: Focuses on static graphs without modeling temporal changes or multi-scale event structures.
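A toy sketch of the question-guided message passing idea: scene-graph nodes carry feature vectors, and a question vector gates how strongly each neighbor's message contributes. All names, the gating rule, and the vector representations here are illustrative assumptions, not GraphVQA's architecture.

```python
import math

def guided_message_passing(node_feats, edges, question_vec, steps=2):
    """Illustrative language-guided message passing over a scene graph.

    node_feats: {node_id: [float, ...]}  -- per-node feature vectors
    edges: [(src, dst), ...]             -- directed scene-graph edges
    question_vec: [float, ...]           -- hypothetical question embedding
    """
    feats = dict(node_feats)
    for _ in range(steps):
        updated = {}
        for node, vec in feats.items():
            # Gather messages from in-neighbors (synchronous update:
            # all reads use the previous step's features).
            msgs = [feats[src] for (src, dst) in edges if dst == node]
            agg = [0.0] * len(vec)
            for msg in msgs:
                # Gate each message by its similarity to the question.
                score = sum(a * b for a, b in zip(msg, question_vec))
                gate = 1.0 / (1.0 + math.exp(-score))  # sigmoid gate
                agg = [a + gate * x for a, x in zip(agg, msg)]
            updated[node] = [v + a for v, a in zip(vec, agg)]
        feats = updated
    return feats
```

The gate is what makes the reasoning "language-guided": messages irrelevant to the question are attenuated, so the same graph supports different reasoning paths for different questions.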
References
[1] Edge, Darren, et al. "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." arXiv preprint arXiv:2404.16130 (2024). https://arxiv.org/abs/2404.16130
[2] Islam, Md Mohaiminul, et al. "Video ReCap: Recursive Captioning of Hour-Long Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
[3] Liang, Weixin, Yanhao Jiang, and Zixuan Liu. "GraphVQA: Language-Guided Graph Neural Networks for Graph-Based Visual Question Answering." arXiv preprint arXiv:2104.10283 (2021).
[6] Nag, Sayak, et al. "Unbiased Scene Graph Generation in Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
