G-Retriever for Obsidian
Transform your Obsidian Vault into an intelligent, searchable knowledge graph
Overview
This system transforms your Obsidian Vault into a searchable knowledge graph using Graph Neural Networks and Large Language Models. It combines modern RAG (Retrieval-Augmented Generation) techniques with G-Retriever to provide precise answers based on your personal notes.
What does the system do?
- Graph Conversion: Converts Markdown notes into a NetworkX graph
- QA Generation: Automatically creates question-answer pairs using Ollama
- Smart Retrieval: Finds relevant notes using embeddings and graph algorithms
- Contextual Answers: Uses your local LLM for precise answers
- Optional GNN Training: Trains a specialized graph neural network on your data
Technical Architecture
Two variants available:
G-Retriever Light (Untrained)
- Ready to use immediately
- No GPU required
- Fast responses
- Embedding-based retrieval
- PCST subgraph construction
- Ollama for answer generation
G-Retriever Full (Trained)
- Requires training (1-3h)
- GPU recommended
- Specialized for your data
- GNN-based retrieval
- Graph Attention Networks
- 5-10% better results
Core components:
1. Graph Neural Network (GAT)
Uses Graph Attention Networks to learn relationships between notes. With 3 layers and 4 attention heads, the model can recognize complex connection patterns.
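A minimal sketch of such a node scorer in PyTorch Geometric, assuming the question embedding is concatenated onto each node state before scoring (class and variable names here are illustrative, not the actual module's API):
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class NodeScorer(nn.Module):
    def __init__(self, node_dim=384, hidden_dim=256, num_heads=4):
        super().__init__()
        # Three GAT layers with 4 attention heads each (head outputs averaged)
        self.conv1 = GATConv(node_dim, hidden_dim, heads=num_heads, concat=False)
        self.conv2 = GATConv(hidden_dim, hidden_dim, heads=num_heads, concat=False)
        self.conv3 = GATConv(hidden_dim, hidden_dim, heads=num_heads, concat=False)
        # Maps (node state, question embedding) to a relevance logit
        self.score = nn.Linear(hidden_dim + node_dim, 1)

    def forward(self, x, edge_index, question_emb):
        x = torch.relu(self.conv1(x, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        x = self.conv3(x, edge_index)
        q = question_emb.expand(x.size(0), -1)  # broadcast question to every node
        return self.score(torch.cat([x, q], dim=-1)).squeeze(-1)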
2. Sentence Transformers
Creates semantic embeddings for all notes. The model all-MiniLM-L6-v2 is fast and efficient, producing 384-dimensional vectors.
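For illustration, embedding every note takes only a few lines with the sentence-transformers package (the note list here stands in for the vault's parsed contents):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
note_texts = ["Neural networks learn via backpropagation...", "Gradient descent minimizes a loss..."]
embeddings = model.encode(note_texts, show_progress_bar=True)  # shape: (n_notes, 384)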
3. PCST Algorithm
Prize-Collecting Steiner Tree finds the optimally connected subgraph from relevant nodes – essential for coherent answers.
4. Ollama LLM
Your local Llama3 model generates the final answers based on the retrieved context. Complete privacy, no cloud!
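A minimal sketch of this call against the local Ollama HTTP API (the prompt layout is illustrative; the module's actual prompt may differ):
import requests

def generate_answer(question, context):
    # Single non-streaming completion from the local Ollama server
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3:8b",
            "prompt": f"Context:\n{context}\n\nQuestion: {question}\nAnswer:",
            "stream": False,
        },
        timeout=120,
    )
    return resp.json()["response"]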
⚙️ Installation
Requirements:
- Python 3.9 or higher
- CUDA (optional, for GPU acceleration)
- Ollama installed with the llama3:8b model
- Approx. 10 GB of free storage space
Step 1: Virtual Environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
Step 2: Install PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Step 3: PyTorch Geometric
pip install torch-geometric
pip install pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
Step 4: Additional Dependencies
pip install sentence-transformers networkx pcst-fast requests tqdm numpy pandas
Step 5: Set up Ollama
# Check if Ollama is running
curl http://localhost:11434/api/version
# Pull Llama3 model
ollama pull llama3:8b
Modules
1. obsidian_to_graph.py
Function: Converts Obsidian Vault into a NetworkX graph
Input: Path to the vault
Output: graph.gpickle, graph.json, stats.json
Features:
- Parses Markdown files
- Extracts Wiki-links [[link]] and Markdown links
- Automatically removes images
- Extracts #tags
- Creates a directed graph with edges for links
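The link extraction at the heart of this module can be sketched roughly as follows (the regexes are simplified and illustrative; the real parser may handle more edge cases):
import re
import networkx as nx

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")            # [[Note]], [[Note|alias]], [[Note#heading]]
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)]+?\.md)\)")  # [text](note.md)

def add_note(graph: nx.DiGraph, title: str, text: str):
    graph.add_node(title, text=text)
    for target in WIKI_LINK.findall(text) + MD_LINK.findall(text):
        graph.add_edge(title, target.strip().removesuffix(".md"))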
2. generate_training_data.py
Function: Generates QA pairs using Ollama
Input: graph.gpickle
Output: train.json, val.json, qa_pairs.json
Question types:
- Factual: Precise factual questions
- Connection: Questions about relationships
- Summary: Summary questions
- Multi-Node: Questions spanning multiple connected notes
Performance: ~500 QA pairs in 1-2 hours
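As an illustration, a single factual QA pair could be requested from Ollama like this (the module's real prompt wording and JSON handling may differ):
import json
import requests

def qa_from_note(title, text):
    prompt = (
        f"Read this note titled '{title}':\n{text[:2000]}\n\n"
        'Return JSON like {"question": "...", "answer": "..."} '
        "containing one factual question answerable from the note."
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3:8b", "prompt": prompt, "stream": False, "format": "json"},
        timeout=180,
    )
    return json.loads(resp.json()["response"])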
3. pyg_dataset.py
Function: Creates PyTorch Geometric datasets
Input: Graph + QA JSONs
Output: train_data.pt, val_data.pt
Features:
- Node embeddings with Sentence Transformers
- Question embeddings
- Edge index for GNN
- 80/20 train/val split
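A minimal sketch of the conversion, assuming graph (NetworkX) and embeddings (from the Sentence Transformer step) already exist:
import torch
from torch_geometric.data import Data

node_ids = list(graph.nodes)
index = {n: i for i, n in enumerate(node_ids)}
src = [index[u] for u, v in graph.edges]
dst = [index[v] for u, v in graph.edges]
edge_index = torch.tensor([src, dst], dtype=torch.long)  # shape: (2, n_edges)
x = torch.tensor(embeddings, dtype=torch.float)          # shape: (n_nodes, 384)
data = Data(x=x, edge_index=edge_index)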
4. gretriever_inference.py
Function: Chat interface (untrained)
Pipeline:
- Retrieval: k-NN with cosine similarity
- Subgraph Construction: PCST for optimal subgraph
- Answer Generation: Ollama with context
Advantage: Ready to use immediately, no training needed!
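Step 1 of the pipeline can be sketched as plain cosine similarity over the cached note embeddings (function and variable names are illustrative):
import numpy as np

def retrieve(question_emb, node_embs, k=20):
    # Cosine similarity between the question and every note embedding
    sims = node_embs @ question_emb / (
        np.linalg.norm(node_embs, axis=1) * np.linalg.norm(question_emb) + 1e-8
    )
    top_k = np.argsort(-sims)[:k]  # indices of the k most similar notes
    return top_k, sims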
5. train_gretriever.py
Function: Trains GNN on QA pairs
Model: GAT (Graph Attention Network)
Loss: Binary Cross Entropy (relevant vs. irrelevant nodes)
Optimizer: Adam with learning rate 0.001
Training: 20 epochs, ~1-3 hours
6. gretriever_inference_trained.py
Function: Chat interface with trained GNN
Difference: Uses trained model for retrieval instead of embeddings
Performance: 5-10% better relevance for large vaults
7. pipeline.py
Function: Runs the complete pipeline automatically
Options: Skip individual steps with --skip
Perfect for: Initial setup or restart
Workflow
Quick Start (Untrained Variant):
Create graph
python obsidian_to_graph.py
Converts your notes into a graph. Takes: ~1-5 minutes for 1100 notes.
Generate training data
python generate_training_data.py
Creates 500 QA pairs using Ollama. Takes: 1-2 hours.
Tip: Start with fewer pairs (num_samples=200), then expand to 500-1000.
Start chat
python gretriever_inference.py
Interactive chat interface opens. Ask questions about your notes!
Advanced (Trained Variant):
Create PyG dataset
python pyg_dataset.py
Converts data into PyTorch Geometric format. Takes: 5-10 minutes.
GNN Training
python train_gretriever.py
Trains the Graph Neural Network. Takes: 1-3 hours depending on hardware.
Chat with trained model
python gretriever_inference_trained.py
Uses the trained model for better retrieval.
Training Details
How much training data do you need?
| Vault Size | Recommended QA Pairs | Duration | Purpose |
|---|---|---|---|
| < 500 notes | 200-300 | 30-60 min | Quick Test |
| 500-1500 notes | 500-800 | 1-2 h | Standard (recommended) |
| 1500-3000 notes | 1000-1500 | 3-4 h | Good coverage |
| > 3000 notes | 2000+ | 6+ h | Very good coverage |
Training Hyperparameters:
Model Architecture
- Node Embed Dim: 384 (from Sentence Transformer)
- Hidden Dim: 256
- Num Layers: 3
- Attention Heads: 4
- Total Parameters: ~2.5M
Training Setup
- Optimizer: Adam
- Learning Rate: 0.001
- Loss Function: BCE with Logits
- Epochs: 20 (default)
- Batch Size: 1 (full graph per sample)
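Put together, a training loop matching this setup could look like the sketch below (it assumes a node-scoring model like the GAT sketch in the architecture section and PyG samples carrying a question embedding plus binary node labels; attribute names are illustrative):
import torch

# train_samples: list of torch_geometric.data.Data with x, edge_index,
# question_emb, and y (binary node-relevance labels)
model = NodeScorer()  # the GAT sketch from the architecture section
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = torch.nn.BCEWithLogitsLoss()

for epoch in range(20):
    for data in train_samples:  # batch size 1: one full graph per QA pair
        logits = model(data.x, data.edge_index, data.question_emb)
        loss = loss_fn(logits, data.y.float())  # y[i] = 1 if node i is relevant
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()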
Training Tips:
- Start with fewer epochs (10) for testing
- Monitor validation loss – stop early if overfitting
- Best model is automatically saved
- Training history is exported as JSON
Usage
Example Chat Session:
$ python gretriever_inference.py
============================================================
G-Retriever Chat Interface for Obsidian Vault
Type 'quit' or 'exit' to end
============================================================
Your question: What are the most important concepts in my ML notes?
Query: What are the most important concepts in my ML notes?
Retrieving relevant nodes...
Constructing subgraph...
Generating answer...
Answer: Based on your notes, the most important Machine Learning
concepts are: Neural Networks with Backpropagation, Gradient Descent for
optimization, various Loss Functions (MSE, Cross-Entropy), and
regularization via L1/L2. You also have detailed notes on
Convolutional Neural Networks and their application in Computer Vision.
Used notes: Neural Networks, Backpropagation, Gradient Descent,
Loss Functions, Regularization
Example Queries:
Factual Questions
- "What is the difference between L1 and L2 regularization?"
- "Which Python libraries do I use for Data Science?"
- "What does my note about Transformers say?"
Relationship Questions
- "How are my notes on GraphQL and REST APIs connected?"
- "Which projects use React?"
- "What are the connections between my psychology notes?"
Summaries
- "Summarize my notes on Quantum Computing"
- "What have I learned about productivity?"
- "Overview of my travel notes to Japan"
Code Adjustments:
Adjust paths in the modules:
# In obsidian_to_graph.py
vault_path = "/path/to/your/vault"
output_path = "./graph_output"
# In generate_training_data.py
graph_path = "./graph_output/graph.gpickle"
output_path = "./training_data"
num_samples = 500 # Number of QA pairs
# In gretriever_inference.py
graph_path = "./graph_output/graph.gpickle"
ollama_model = "llama3:8b"
⚖️ Untrained vs. Trained
Performance Comparison:
| Aspect | Untrained (Light) | Trained (Full) |
|---|---|---|
| Setup Time | 1-2 hours | 3-5 hours |
| GPU required? | ❌ No | ⚠️ Recommended |
| Retrieval Quality | 85-90% | 90-95% |
| Response Speed | 2-5 seconds | 3-6 seconds |
| Vault Size Recommendation | < 2000 notes | > 2000 notes |
| Maintenance | None | Re-training for major changes |
| Memory Requirement | ~2 GB RAM | ~4 GB RAM + 2 GB VRAM |
✨ Recommendation:
Start with the untrained variant! It is quick to set up, works excellently, and you can start right away. Only train if:
- You have more than 2000-3000 notes
- You need the absolute best retrieval quality
- You enjoy experimenting
The quality improvement from training is marginal (5-10%), but the effort is significantly higher.
Troubleshooting
Problem: Ollama Connection Error
Solution:
# Check if Ollama is running
curl http://localhost:11434/api/version
# Start Ollama
ollama serve
Problem: CUDA Out of Memory
Solution:
# In gretriever_inference.py or train_gretriever.py
device = "cpu" # Instead of "cuda"
Problem: Too few QA pairs generated
Causes:
- Many notes are too short (< 100 characters)
- JSON parsing fails
- Ollama timeouts
Solution: Set num_samples 20-30% higher than the number of pairs you actually need.
Problem: Import Errors
Solution:
# Reinstall dependencies
pip install --force-reinstall torch-geometric
pip install pyg-lib torch-scatter torch-sparse
Problem: Training very slow
Optimizations:
- Use GPU instead of CPU
- Reduce Hidden Dim to 128
- Reduce Num Layers to 2
- Use fewer QA pairs for first test
Advanced Configuration
Change Embedding Models:
# Better quality (slower)
embedding_model = "all-mpnet-base-v2"
# Multilingual
embedding_model = "paraphrase-multilingual-MiniLM-L12-v2"
# Specialized for code
embedding_model = "microsoft/codebert-base"
Tune GNN Architecture:
# More capacity
hidden_dim = 512
num_layers = 5
num_heads = 8
# Faster, less capacity
hidden_dim = 128
num_layers = 2
num_heads = 2
Retrieval Parameters:
# In gretriever_inference.py
# More context
k_retrieve = 30  # Instead of 20
# Larger subgraph
max_subgraph_size = 20  # In construct_subgraph_pcst
# More notes in LLM context
max_context_nodes = 15  # In generate_answer
Switch Ollama Model:
# Larger model (better quality)
ollama_model = "llama3:70b"
# Faster model
ollama_model = "phi3:mini"
# Specialized
ollama_model = "codellama:13b"  # For code-heavy vaults
⚠️ PCST Behavior: Selection, not Expansion
The Prize-Collecting Steiner Tree (PCST) step does not expand the retrieved node set. It performs a global optimization and selects a structurally optimal subset of nodes.
A retrieved node is never guaranteed to appear in the final subgraph. Retrieval provides candidates — PCST decides which ones are worth keeping.
In the current implementation, PCST is called as:
vertices, _ = pcst_fast(
    edges,      # (num_edges, 2) array of node index pairs
    prizes,     # per-node prizes (0 for non-retrieved nodes)
    costs,      # per-edge costs
    root,       # root node index (-1 for unrooted)
    1,          # num_clusters
    'strong',   # pruning method
    0           # verbosity_level
)
How PCST makes decisions
1. Node Prizes
prizes[relevant_nodes] = similarities[relevant_nodes]
- Only retrieved nodes receive a prize > 0
- All other nodes start with prize = 0
- A retrieved node is optional, not mandatory
2. Edge Costs
costs = np.ones(edges.shape[0])
- Each edge has uniform cost = 1
- Long or weakly connected paths are expensive
3. Optimization Criterion
keep node if: prize(node) ≥ sum(edge costs to connect it)
- High similarity + short distance → kept
- Medium similarity + many hops → dropped
- Low similarity + strong connectivity → often kept as a connector
subgraph ⊆ retrieved_nodes ∪ connector_nodes
PCST never guarantees that all retrieved nodes survive.
Why the subgraph is usually smaller than retrieval
- Retrieved nodes may be thematically scattered
- Connection costs can outweigh semantic relevance
- Highly connected hubs are often preferred
This explains why, for example, well-connected authors or concepts may remain in the subgraph while isolated but semantically relevant notes are removed.
How to influence PCST behavior
You can actively steer how selective PCST is:
# Option A: Increase prizes (keep more retrieved nodes)
prizes[relevant_nodes] = similarities[relevant_nodes] * 100
# Option B: Reduce edge costs (favor larger connected subgraphs)
costs = np.full(edges.shape[0], 0.01)
# Option C: Disable PCST entirely (pure Top-K retrieval)
subgraph_nodes = relevant_nodes
PCST is a filtering mechanism that extracts the most structurally coherent core — not an expansion step. Differences between retrieval output and final context are expected and indicate correct behavior.
Performance Optimization
For large vaults (>5000 notes):
1. Node Embedding Caching
Pre-compute and store embeddings separately:
import pickle
# After first run
with open('node_embeddings.pkl', 'wb') as f:
    pickle.dump(self.node_embeddings, f)
# In subsequent runs, load
with open('node_embeddings.pkl', 'rb') as f:
    self.node_embeddings = pickle.load(f)
2. Batch Processing for QA Generation
Use larger batches:
# In generate_training_data.py
batch_size = 10 # Multiple prompts in parallel
3. Graph Pruning
Remove isolated nodes:
# After graph.build()
isolated = list(nx.isolates(self.graph))
self.graph.remove_nodes_from(isolated)
Benchmark (1100 nodes):
| Operation | CPU (M2) | GPU (A100) |
|---|---|---|
| Graph Building | tbd | tbd |
| Node Embeddings | tbd | tbd |
| 500 QA pairs | tbd | tbd |
| PyG Dataset | tbd | tbd |
| Training (tbd epochs) | tbd | tbd |
| Query Inference | tbd | tbd |
❓ FAQ
Can I use other LLMs instead of Ollama?
Yes! You can modify generate_answer() to use OpenAI, Anthropic, or other APIs. Ollama is just the privacy-friendly default option.
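For example, an OpenAI-backed generate_answer() could look roughly like this (untested sketch; the model name is a placeholder):
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_answer(question, context):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content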
Does it also work with other note-taking apps?
In principle, yes! You just need to adapt obsidian_to_graph.py to parse the specific format (e.g., Notion, Roam Research).
How do I keep the system up to date when I add new notes?
Simply run the pipeline again. For incremental updates, you could write a script that processes only new/changed notes.
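A hypothetical helper for that could compare file modification times against a cached snapshot and return only the changed notes (paths and cache layout are illustrative):
import os
import pickle

CACHE = "./graph_output/mtimes.pkl"

def changed_notes(vault_path):
    old = {}
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            old = pickle.load(f)
    new, changed = {}, []
    for root, _, files in os.walk(vault_path):
        for name in files:
            if name.endswith(".md"):
                path = os.path.join(root, name)
                new[path] = os.path.getmtime(path)
                if old.get(path) != new[path]:
                    changed.append(path)
    with open(CACHE, "wb") as f:
        pickle.dump(new, f)
    return changed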
Can I use multiple vaults at the same time?
Yes! Create a separate output folder for each vault. You can even combine multiple graphs in the same chat interface.
Is my data uploaded anywhere?
No! Everything runs locally. Ollama is local, embeddings are local, training is local. Complete privacy.
Does the system work in other languages?
Yes! Use multilingual embedding models and ensure your Ollama model supports the language. Llama3 works well with German, French, Spanish, etc.
Resources & Links
Community
- PyTorch Geometric Discord
- Obsidian Community Forum
- r/LocalLLaMA on Reddit
Conclusion
You now have a complete graph-based RAG system!
This system combines state-of-the-art technologies:
- ✅ Graph Neural Networks for structured knowledge
- ✅ Semantic Search with embeddings
- ✅ Intelligent subgraph construction (PCST)
- ✅ Local LLMs for privacy
- ✅ Modular, extensible code
Next Steps:
- Start with the untrained variant
- Test different questions
- Generate more QA pairs if needed
- Optional: Train for better results
- Experiment with different models and parameters