G-Retriever for Obsidian Chat

G-Retriever for Obsidian Chat

Posted on December 19, 2025 by Mikel Bahn
G-Retriever for Obsidian Vault - Documentation

G-Retriever for Obsidian

Transform your Obsidian Vault into an intelligent, searchable knowledge graph

? Overview

This system transforms your Obsidian Vault into a searchable knowledge graph using Graph Neural Networks and Large Language Models. It combines modern RAG (Retrieval-Augmented Generation) techniques with G-Retriever to provide precise answers based on your personal notes.

? What does the system do?

  • Graph Conversion: Converts Markdown notes into a NetworkX graph
  • QA Generation: Automatically creates question-answer pairs using Ollama
  • Smart Retrieval: Finds relevant notes using embeddings and graph algorithms
  • Contextual Answers: Uses your local LLM for precise answers
  • Optional: GNN Training: Trains a specialized neural network on your data

System Architecture

Obsidian Vault
Graph Builder
Training Data
PyG Dataset
GNN Training
Chat Interface

?️ Technical Architecture

Two variants available:

G-Retriever Light (Untrained)

  • Ready to use immediately
  • No GPU required
  • Fast responses
  • Embedding-based retrieval
  • PCST subgraph construction
  • Ollama for answer generation
Recommendation: Start with this! It works very well without training.

G-Retriever Full (Trained)

  • Requires training (1-3h)
  • GPU recommended
  • Specialized for your data
  • GNN-based retrieval
  • Graph Attention Networks
  • 5-10% better results
Note: Only necessary for enthusiasts or large vaults (>5000 notes).

Core components:

1. Graph Neural Network (GAT)

Uses Graph Attention Networks to learn relationships between notes. With 3 layers and 4 attention heads, the model can recognize complex connection patterns.

2. Sentence Transformers

< p>Creates semantic embeddings for all notes. The model all-MiniLM-L6-v2 is fast and efficient with 384-dimensional vectors.

3. PCST Algorithm

Prize-Collecting Steiner Tree finds the optimally connected subgraph from relevant nodes – essential for coherent answers.

4. Ollama LLM

Your local Llama3 model generates the final answers based on the retrieved context. Complete privacy, no cloud!

⚙️ Installation

Requirements:

  • Python 3.9 or higher
  • CUDA (optional, for GPU acceleration)
  • Ollama installed with llama3:8b model
  • Approx. 10 GB free storage space

Step 1: Virtual Environment

python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

Step 2: Install PyTorch

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Step 3: PyTorch Geometric

pip install torch-geometric
pip install pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.0.0+cu118.html

Step 4: Additional Dependencies

pip install sentence-transformers networkx pcst-fast requests tqdm numpy pandas

Step 5: Set up Ollama

# Check if Ollama is running
curl http://localhost:11434/api/version

# Pull Llama3 model
ollama pull llama3:8b
✓ Installation complete! You are ready to get started.

? Modules

1. obsidian_to_graph.py

Function: Converts Obsidian Vault into a NetworkX graph

Input: Path to the vault

Output: graph.gpickle, graph.json, stats.json

Features:

  • Parses Markdown files
  • Extracts Wiki-links [[link]] and Markdown links
  • Automatically removes images
  • Extracts #tags
  • Creates a directed graph with edges for links

2. generate_training_data.py

Function: Generates QA pairs using Ollama

Input: graph.gpickle

Output: train.json, val.json, qa_pairs.json

Question types:

  • Factual: Precise factual questions
  • Connection: Questions about relationships
  • Summary: Summary questions
  • Multi-Node: Questions spanning multiple connected notes

Performance: ~500 QA pairs in 1-2 hours

3. pyg_dataset.py

Function: Creates PyTorch Geometric datasets

Input: Graph + QA JSONs

Output: train_data.pt, val_data.pt

Features:

  • Node embeddings with Sentence Transformers
  • Question embeddings
  • Edge index for GNN
  • 80/20 train/val split

4. gretriever_inference.py

Function: Chat interface (untrained)

Pipeline:

  1. Retrieval: k-NN with cosine similarity
  2. Subgraph Construction: PCST for optimal subgraph
  3. Answer Generation: Ollama with context

Advantage: Ready to use immediately, no training needed!

5. train_gretriever.py

Function: Trains GNN on QA pairs

Model: GAT (Graph Attention Network)

Loss: Binary Cross Entropy (relevant vs. irrelevant nodes)

Optimizer: Adam with learning rate 0.001

Training: 20 epochs, ~1-3 hours

6. gretriever_inference_trained.py

Function: Chat interface with trained GNN

Difference: Uses trained model for retrieval instead of embeddings

Performance: 5-10% better relevance for large vaults

7. pipeline.py

Function: Runs the complete pipeline automatically

Options: Skip individual steps with --skip

Perfect for: Initial setup or restart

? Workflow

Quick Start (Untrained Variant):

1

Create graph

python obsidian_to_graph.py

Converts your notes into a graph. Takes: ~1-5 minutes for 1100 notes.

2

Generate training data

python generate_training_data.py

Creates 500 QA pairs using Ollama. Takes: 1-2 hours.

Tip: Start with 200 QA pairs for testing (num_samples=200), then expand to 500-1000.
3

Start chat

python gretriever_inference.py

Interactive chat interface opens. Ask questions about your notes!

Advanced (Trained Variant):

4

Create PyG dataset

python pyg_dataset.py

Converts data into PyTorch Geometric format. Takes: 5-10 minutes.

5

GNN Training

python train_gretriever.py

Trains the Graph Neural Network. Takes: 1-3 hours depending on hardware.

GPU Tip: With GPU 3-5x faster. CPU works too!
6

Chat with trained model

python gretriever_inference_trained.py

Uses the trained model for better retrieval.

? Training Details

How many training data do you need?

Vault Size Recommended QA Pairs Duration Purpose
< 500 notes 200-300 30-60 min Quick Test
500-1500 notes 500-800 1-2 h Standard (recommended)
1500-3000 notes 1000-1500 3-4 h Good coverage
> 3000 notes 2000+ 6+ h Very good

Training Hyperparameters:

Model Architecture

  • Node Embed Dim: 384 (from Sentence Transformer)
  • Hidden Dim: 256
  • Num Layers: 3
  • Attention Heads: 4
  • Total Parameters: ~2.5M

Training Setup

  • Optimizer: Adam
  • Learning Rate: 0.001
  • Loss Function: BCE with Logits
  • Epochs: 20 (default)
  • Batch Size: 1 (full graph per sample)

? Training Tips:

  • Start with fewer epochs (10) for testing
  • Monitor validation loss – stop early if overfitting
  • Best model is automatically saved
  • Training history is exported as JSON

? Usage

Example Chat Session:

$ python gretriever_inference.py

============================================================
G-Retriever Chat Interface for Obsidian Vault
Type 'quit' or 'exit' to end
============================================================

Your question: What are the most important concepts in my ML notes?
Query: What are the most important concepts in my ML notes?
Retrieving relevant nodes...
Constructing subgraph...
Generating answer...
Answer: Based on your notes, the most important Machine Learning
concepts are: Neural Networks with Backpropagation, Gradient Descent for
optimization, various Loss Functions (MSE, Cross-Entropy), and
regularization via L1/L2. You also have detailed notes on
Convolutional Neural Networks and their application in Computer Vision.
Used notes: Neural Networks, Backpropagation, Gradient Descent,
Loss Functions, Regularization

Example Queries:

? Factual Questions

  • "What is the difference between L1 and L2 regularization?"
  • "Which Python libraries do I use for Data Science?"
  • "What does my note about Transformers say?"

? Relationship Questions

  • "How are my notes on GraphQL and REST APIs connected?"
  • "Which projects use React?"
  • "What are the connections between my psychology notes?"

? Summaries

  • "Summarize my notes on Quantum Computing"
  • "What have I learned about productivity?"
  • "Overview of my travel notes to Japan"

Code Adjustments:

Adjust paths in the modules:

# In obsidian_to_graph.py
vault_path = "/path/to/your/vault"
output_path = "./graph_output"
In generate_training_data.py
graph_path = "./graph_output/graph.gpickle"
output_path = "./training_data"
num_samples = 500  # Number of QA pairs
In gretriever_inference.py
graph_path = "./graph_output/graph.gpickle"
ollama_model = "llama3:8b"

⚖️ Untrained vs. Trained

Performance Comparison:

Aspect Untrained (Light) Trained (Full)
Setup Time 1-2 hours 3-5 hours
GPU required? ❌ No ⚠️ Recommended
Retrieval Quality 85-90% 90-95%
Response Speed 2-5 seconds 3-6 seconds
Vault Size Recommendation < 2000 notes > 2000 notes
Maintenance None Re-training for major changes
Memory Requirement ~2 GB RAM ~4 GB RAM + 2 GB VRAM

✨ Recommendation:

Start with the untrained variant! It is quick to set up, works excellently, and you can start right away. Only train if:

  • You have more than 2000-3000 notes
  • You need the absolute best retrieval quality
  • You enjoy experimenting

The quality improvement from training is marginal (5-10%), but the effort is significantly higher.

? Troubleshooting

Problem: Ollama Connection Error

Solution:

# Check if Ollama is running
curl http://localhost:11434/api/version
Start Ollama
ollama serve

Problem: CUDA Out of Memory

Solution:

# In gretriever_inference.py or train_gretriever.py
device = "cpu"  # Instead of "cuda"

Problem: Too few QA pairs generated

Causes:

  • Many notes are too short (< 100 characters)
  • JSON parsing fails
  • Ollama timeouts

Solution: Increase num_samples by 20-30% more than desired.

Problem: Import Errors

Solution:

# Reinstall dependencies
pip install --force-reinstall torch-geometric
pip install pyg-lib torch-scatter torch-sparse

Problem: Training very slow

Optimizations:

  • Use GPU instead of CPU
  • Reduce Hidden Dim to 128
  • Reduce Num Layers to 2
  • Use fewer QA pairs for first test

? Advanced Configuration

Change Embedding Models:

# Better quality (slower)
embedding_model = "all-mpnet-base-v2"
Multilingual
embedding_model = "paraphrase-multilingual-MiniLM-L12-v2"
Specialized for code
embedding_model = "microsoft/codebert-base"

Tune GNN Architecture:

# More capacity
hidden_dim = 512
num_layers = 5
num_heads = 8
Faster, less capacity
hidden_dim = 128
num_layers = 2
num_heads = 2

Retrieval Parameters:

# In gretriever_inference.py
More context
k_retrieve = 30  # Instead of 20
Larger subgraph
max_subgraph_size = 20  # In construct_subgraph_pcst
More notes in LLM context
max_context_nodes = 15  # In generate_answer

Switch Ollama Model:

# Larger model (better quality)
ollama_model = "llama3:70b"
Faster model
ollama_model = "phi3:mini"
Specialized
ollama_model = "codellama:13b"  # For code-heavy vaults

⚠️ PCST Behavior: Selection, not Expansion

The Prize-Collecting Steiner Tree (PCST) step does not expand the retrieved node set. It performs a global optimization and selects a structurally optimal subset of nodes.

Key Point:
A retrieved node is never guaranteed to appear in the final subgraph. Retrieval provides candidates — PCST decides which ones are worth keeping.

In the current implementation, PCST is called as:

vertices, _ = pcst_fast(
    edges,
    prizes,
    costs,
    root,
    1,
    1,
    'strong'
)

How PCST makes decisions

1. Node Prizes

prizes[relevant_nodes] = similarities[relevant_nodes]
  • Only retrieved nodes receive a prize > 0
  • All other nodes start with prize = 0
  • A retrieved node is optional, not mandatory

2. Edge Costs

costs = np.ones(edges.shape[0])
  • Each edge has uniform cost = 1
  • Long or weakly connected paths are expensive

3. Optimization Criterion

keep node if:  prize(node) ≥ sum(edge costs to connect it)
  • High similarity + short distance → kept
  • Medium similarity + many hops → dropped
  • Low similarity + strong connectivity → often kept
Formal Property:
subgraph ⊆ retrieved_nodes ∪ connector_nodes
PCST never guarantees that all retrieved nodes survive.

Why the subgraph is usually smaller than retrieval

  • Retrieved nodes may be thematically scattered
  • Connection costs can outweigh semantic relevance
  • Highly connected hubs are often preferred

This explains why, for example, well-connected authors or concepts may remain in the subgraph while isolated but semantically relevant notes are removed.

How to influence PCST behavior

You can actively steer how selective PCST is:

# Option A: Increase prizes (keep more retrieved nodes)
prizes[relevant_nodes] = similarities[relevant_nodes] * 100

# Option B: Reduce edge costs (favor larger connected subgraphs)
costs = np.full(edges.shape[0], 0.01)

# Option C: Disable PCST entirely (pure Top-K retrieval)
subgraph_nodes = relevant_nodes
Summary:
PCST is a filtering mechanism that extracts the most structurally coherent core — not an expansion step. Differences between retrieval output and final context are expected and indicate correct behavior.

? Performance Optimization

For large vaults (>5000 notes):

1. Node Embedding Caching

Pre-compute and store embeddings separately:

import pickle
After first run
with open('node_embeddings.pkl', 'wb') as f:
    pickle.dump(self.node_embeddings, f)
In subsequent runs load
with open('node_embeddings.pkl', 'rb') as f:
    self.node_embeddings = pickle.load(f)

2. Batch Processing for QA Generation

Use larger batches:

# In generate_training_data.py
batch_size = 10  # Multiple prompts in parallel

3. Graph Pruning

Remove isolated nodes:

# After graph.build()
isolated = list(nx.isolates(self.graph))
self.graph.remove_nodes_from(isolated)

Benchmark (1100 nodes):

Operation CPU (M2) GPU (A100)
Graph Building tbd tbd
Node Embeddings tbd tbd
500 QA pairs tbd tbd
PyG Dataset tbd tbd
Training (tbd epochs) tbd tbd
Query Inference tbd tbd

❓ FAQ

Can I use other LLMs instead of Ollama?

Yes! You can modify generate_answer() to use OpenAI, Anthropic, or other APIs. Ollama is just the privacy-friendly default option.

Does it also work with other note-taking apps?

In principle, yes! You just need to adapt obsidian_to_graph.py to parse the specific format (e.g., Notion, Roam Research).

How do I keep the system up to date when I add new notes?

Simply run the pipeline again. For incremental updates, you could write a script that processes only new/changed notes.

Can I use multiple vaults at the same time?

Yes! Create a separate output folder for each vault. You can even combine multiple graphs in the same chat interface.

Are my data uploaded anywhere?

No! Everything runs locally. Ollama is local, embeddings are local, training is local. Complete privacy.

Does the system work in other languages?

Yes! Use multilingual embedding models and ensure your Ollama model supports the language. Llama3 works well with German, French, Spanish, etc.

? Resources & Links

Community

  • PyTorch Geometric Discord
  • Obsidian Community Forum
  • r/LocalLLaMA on Reddit

? Conclusion

You now have a complete graph-based RAG system!

This system combines state-of-the-art technologies:

  • ✅ Graph Neural Networks for structured knowledge
  • ✅ Semantic Search with embeddings
  • ✅ Intelligent subgraph construction (PCST)
  • ✅ Local LLMs for privacy
  • ✅ Modular, extensible code

Next Steps:

  1. Start with the untrained variant
  2. Test different questions
  3. Generate more QA pairs if needed
  4. Optional: Train for better results
  5. Experiment with different models and parameters