18 commits (all by claude, Jun 28, 2026):

- c2513c0 Port graphrag-rs core pipeline to Swift
- 71b04cb Address Gemini review: BM25 tf, offset accuracy, async URLSession, ne…
- 33d42c8 Update BM25 docstring to match raw-tf scoring
- af6d547 Address Codex review: 10 correctness/robustness fixes
- 7bd410f Add GitHub Actions CI to build and test on every commit
- c845d99 CI: target stable Swift 6.1 toolchain
- 159b116 Address Codex review (round 3): honor config knobs + graph edge cases
- 5cf5576 Address Codex review (round 4): config unification, guards, extractor…
- 74524ba Address Codex review (round 5): cap scoping, sync, tokenization, vali…
- 3bfd74f Address Codex review (round 6): tokenizer, corpus count, guards, rela…
- b3de614 Address Codex review (round 7): sentence abbreviations, clamps, clear…
- f6bdaf4 Address Codex review (round 8): sentence-end context, minConfidence, …
- 420f1cc Address Codex review (round 9): newline boundaries, best relation pai…
- 38c78b7 Address Codex review (round 10): org-name commas, LLM keys, host port…
- f872b93 Fix build: make LLM parse structs Decodable (CI compile error)
- ab37e9a Address Codex review (round 11): path guard, trailing cues, newlines,…
- b6dfa55 Address Codex review (round 12): drop evidence-less edges, dead graph…
- 8f51562 Address Codex review (round 13): drop orphaned entities, skip keyword…
30 changes: 30 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,30 @@
name: CI

on:
  push:
    branches: ["**"]
  pull_request:

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  linux:
    name: Build & Test (Linux / Swift)
    runs-on: ubuntu-latest
    # Official Swift toolchain image. Must be >= the package's
    # swift-tools-version (6.1); bump this tag when raising the manifest.
    container: swift:6.1
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Swift version
        run: swift --version

      - name: Build
        run: swift build --build-tests

      - name: Test
        run: swift test --skip-build
2 changes: 1 addition & 1 deletion Package.swift
@@ -1,4 +1,4 @@
-// swift-tools-version: 6.3
+// swift-tools-version: 6.1
// The swift-tools-version declares the minimum version of Swift required to build this package.

import PackageDescription
100 changes: 100 additions & 0 deletions README.md
@@ -0,0 +1,100 @@
# GraphRAG (Swift)

[![CI](https://github.com/PicoMLX/GraphRAG/actions/workflows/ci.yml/badge.svg)](https://github.com/PicoMLX/GraphRAG/actions/workflows/ci.yml)

A Swift port of the Rust crate [`graphrag-rs`](https://github.com/automataIA/graphrag-rs):
Graph-based Retrieval Augmented Generation. It builds a knowledge graph from
documents and answers natural-language questions using graph-based context
retrieval.

This package ports the **core library** (`graphrag-core`), the parts that make
GraphRAG work end to end, into idiomatic, dependency-free Swift 6 code. It runs
fully offline out of the box and can optionally talk to a local
[Ollama](https://ollama.com) server for LLM-backed extraction and answer
generation.

## Installation

Add the package to your `Package.swift`:

```swift
.package(url: "https://github.com/PicoMLX/GraphRAG.git", branch: "main")
```

and depend on the `GraphRAG` product.

## Quick start

```swift
import GraphRAG

// Offline pipeline: hash embeddings + pattern-based entity extraction.
let rag = try GraphRAGBuilder()
    .withChunkSize(800)
    .withChunkOverlap(100)
    .withTopK(5)
    .build()

await rag.addDocument(text: """
Ada Lovelace collaborated with Charles Babbage on the Analytical Engine,
an early mechanical general-purpose computer.
""")

try await rag.build() // chunk → extract → embed → index
let answer = try await rag.ask("Who worked on the Analytical Engine?")
print(answer.text)
print(answer.sources) // grounding chunk ids
```

### Using a local LLM (Ollama)

```swift
let rag = try GraphRAGBuilder()
    .withLocalDefaults() // Ollama chat + embeddings
    .build()
```

With Ollama enabled, entity/relationship extraction uses the LLM extraction
prompt, and `ask` synthesizes a natural-language answer from the retrieved
context. Without it, extraction is pattern-based and `ask` returns an extractive
summary of the top chunks.

## What's included

| Area | Types |
| --- | --- |
| Core model | `Document`, `TextChunk`, `Entity`, `Relationship`, `EntityMention`, typed IDs, `GraphRAGError` |
| Abstractions | `LanguageModel`, `EmbeddingModel`, `EntityExtracting`, `ChunkingStrategy` |
| Text | `HierarchicalChunker`, `TextProcessor`, `TfIdfKeywordExtractor` |
| Graph | `KnowledgeGraph`, `PageRank`, `GraphTraversal` (BFS/DFS/ego/paths), `GraphAnalytics` (degree/closeness/betweenness/components) |
| Retrieval | `BM25Retriever`, `InMemoryVectorStore` (cosine), `HybridRetriever` (RRF / weighted / CombSUM / MaxScore fusion) |
| Extraction | `PatternEntityExtractor`, `LLMEntityExtractor`, `Prompts` |
| Embeddings | `HashEmbedder` (offline, deterministic), `OllamaEmbedder` |
| LLM | `OllamaClient` |
| Orchestration | `GraphRAG` (actor), `GraphRAGBuilder`, `Config` |
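The table names `BM25Retriever` and `HybridRetriever` without showing how keyword and vector rankings combine. As a rough, self-contained illustration (not the port's code), the sketch below scores pre-tokenized documents with BM25 using the defaults listed under the design notes (`k1 = 1.2`, `b = 0.75`, raw term frequency) and then fuses two rankings with Reciprocal Rank Fusion at `k = 60`. The Lucene-style IDF formula, the names `BM25Sketch` and `reciprocalRankFusion`, and the hard-coded "semantic" ranking are assumptions made for the example.

```swift
import Foundation

/// Hypothetical BM25 ranker over pre-tokenized documents, scoring raw term frequency.
struct BM25Sketch {
    let k1 = 1.2
    let b = 0.75
    let docs: [[String]]
    let avgLength: Double
    let docFreq: [String: Int]  // number of documents containing each term

    init(docs: [[String]]) {
        self.docs = docs
        let totalTokens = docs.reduce(0) { $0 + $1.count }
        avgLength = docs.isEmpty ? 0 : Double(totalTokens) / Double(docs.count)
        var df: [String: Int] = [:]
        for doc in docs {
            for term in Set(doc) { df[term, default: 0] += 1 }
        }
        docFreq = df
    }

    func bm25Score(_ query: [String], doc index: Int) -> Double {
        let doc = docs[index]
        let n = Double(docs.count)
        let lengthRatio = avgLength > 0 ? Double(doc.count) / avgLength : 0
        var total = 0.0
        for term in query {
            let tf = Double(doc.filter { $0 == term }.count)  // raw term frequency
            guard tf > 0, let count = docFreq[term] else { continue }
            let df = Double(count)
            // Assumed IDF variant (Lucene-style, always positive).
            let idf = log(1 + (n - df + 0.5) / (df + 0.5))
            total += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * lengthRatio))
        }
        return total
    }

    /// Indices of matching documents, best first (ties broken by index).
    func rank(_ query: [String]) -> [Int] {
        docs.indices
            .map { (index: $0, score: bm25Score(query, doc: $0)) }
            .filter { $0.score > 0 }
            .sorted { $0.score > $1.score || ($0.score == $1.score && $0.index < $1.index) }
            .map { $0.index }
    }
}

/// Reciprocal Rank Fusion: each ranked list adds 1 / (k + rank), with rank starting at 1.
func reciprocalRankFusion(_ rankings: [[Int]], k: Double = 60) -> [Int] {
    var fused: [Int: Double] = [:]
    for ranking in rankings {
        for (position, id) in ranking.enumerated() {
            fused[id, default: 0] += 1 / (k + Double(position + 1))
        }
    }
    return fused
        .sorted { $0.value > $1.value || ($0.value == $1.value && $0.key < $1.key) }
        .map { $0.key }
}

let corpus = [
    ["ada", "lovelace", "analytical", "engine"],
    ["charles", "babbage", "analytical", "engine", "design"],
    ["mechanical", "computer", "history"],
]
let bm25 = BM25Sketch(docs: corpus)
let keywordRanking = bm25.rank(["analytical", "engine"])  // [0, 1]
let semanticRanking = [2, 0, 1]  // hypothetical vector-store order
print(reciprocalRankFusion([keywordRanking, semanticRanking]))  // [0, 1, 2]
```

RRF only looks at positions, so it needs no score normalization between BM25 and cosine similarity; document 2 still makes the fused list even though BM25 never matched it.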

## Design notes / port fidelity

- **Defaults match the Rust crate**: PageRank damping `0.85` / tolerance `1e-6`,
BM25 `k1 = 1.2`, `b = 0.75`, hybrid `RRF k = 60`, semantic/keyword weights
`0.7 / 0.3`, traversal `maxDepth = 3`, min relationship strength `0.5`, etc.
- **Concurrency**: `GraphRAG` is an `actor`; backends are `Sendable` existentials
(`any EmbeddingModel`, `any LanguageModel`, `any EntityExtracting`). Builds
cleanly under Swift 6 strict concurrency.
- **Unicode safety**: the Rust chunker works on UTF-8 byte offsets guarded by
`is_char_boundary`. This port operates on `Character` (grapheme) arrays, which
are always valid boundaries; sizes and offsets are measured in characters.
- **Scope**: this is the portable core pipeline. The Rust workspace's
server/WASM/CLI crates and heavier optional subsystems (LightRAG, ROGRAG,
Leiden communities, distributed caching, persistence backends) are out of
scope for this port.
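To make the character-based measurement concrete, here is a minimal fixed-window chunker that slices an `Array<Character>`. It is a hypothetical helper (`characterChunks`), not the port's `HierarchicalChunker`. The Rust code uses `is_char_boundary` to avoid splitting a UTF-8 scalar; slicing `Character`s additionally keeps multi-scalar grapheme clusters, such as a ZWJ emoji sequence, in one piece.

```swift
/// Fixed-size chunking measured in Characters (extended grapheme clusters).
/// Hypothetical helper for illustration only.
func characterChunks(_ text: String, size: Int, overlap: Int) -> [String] {
    precondition(size > 0 && overlap >= 0 && overlap < size, "require 0 <= overlap < size")
    let characters = Array(text)  // every index here is a grapheme boundary
    var chunks: [String] = []
    var start = 0
    while start < characters.count {
        let end = min(start + size, characters.count)
        chunks.append(String(characters[start..<end]))
        if end == characters.count { break }
        start = end - overlap  // step back so consecutive chunks share `overlap` characters
    }
    return chunks
}

// "Café", a ZWJ emoji sequence (woman technologist), then "Ada".
let text = "Caf\u{E9} \u{1F469}\u{200D}\u{1F4BB} Ada"
print(text.count, text.utf8.count)  // 10 21
for chunk in characterChunks(text, size: 4, overlap: 1) {
    print("[\(chunk)]")
}
```

The trade-off is visible in the first `print`: sizes count characters, so a chunk of 800 characters can occupy considerably more than 800 UTF-8 bytes.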

## Testing

```bash
swift test
```

The suite covers chunking, keyword extraction, BM25 ranking, cosine/vector
search, the knowledge graph, PageRank, traversal, analytics, pattern extraction,
and the end-to-end offline build/ask pipeline.
85 changes: 85 additions & 0 deletions Sources/GraphRAG/Core/Error.swift
@@ -0,0 +1,85 @@
// Error.swift
// Ported from graphrag-rs `core::error::GraphRAGError`.

import Foundation

/// The unified error type for every fallible GraphRAG operation.
///
/// Mirrors the variants of the Rust `GraphRAGError` enum. Each case carries a
/// human-readable message (and, where relevant, structured fields) so callers
/// can pattern-match or surface a description.
public enum GraphRAGError: Error, Sendable, CustomStringConvertible {
    case config(message: String)
    case notInitialized
    case noDocuments
    case io(message: String)
    case http(message: String)
    case json(message: String)
    case textProcessing(message: String)
    case graphConstruction(message: String)
    case vectorSearch(message: String)
    case entityExtraction(message: String)
    case retrieval(message: String)
    case generation(message: String)
    case functionCall(message: String)
    case storage(message: String)
    case embedding(message: String)
    case languageModel(message: String)
    case parallel(message: String)
    case serialization(message: String)
    case validation(message: String)
    case network(message: String)
    case auth(message: String)
    case notFound(resource: String, id: String)
    case alreadyExists(resource: String, id: String)
    case timeout(operation: String, seconds: Double)
    case resourceLimit(resource: String, limit: Int)
    case dataCorruption(message: String)
    case unsupported(operation: String, reason: String)
    case rateLimit(message: String)
    case conflictResolution(message: String)
    case incrementalUpdate(message: String)

    public var description: String {
        switch self {
        case .config(let m): return "Configuration error: \(m)"
        case .notInitialized: return "GraphRAG system is not initialized"
        case .noDocuments: return "No documents have been added"
        case .io(let m): return "I/O error: \(m)"
        case .http(let m): return "HTTP error: \(m)"
        case .json(let m): return "JSON error: \(m)"
        case .textProcessing(let m): return "Text processing error: \(m)"
        case .graphConstruction(let m): return "Graph construction error: \(m)"
        case .vectorSearch(let m): return "Vector search error: \(m)"
        case .entityExtraction(let m): return "Entity extraction error: \(m)"
        case .retrieval(let m): return "Retrieval error: \(m)"
        case .generation(let m): return "Generation error: \(m)"
        case .functionCall(let m): return "Function call error: \(m)"
        case .storage(let m): return "Storage error: \(m)"
        case .embedding(let m): return "Embedding error: \(m)"
        case .languageModel(let m): return "Language model error: \(m)"
        case .parallel(let m): return "Parallel processing error: \(m)"
        case .serialization(let m): return "Serialization error: \(m)"
        case .validation(let m): return "Validation error: \(m)"
        case .network(let m): return "Network error: \(m)"
        case .auth(let m): return "Authentication error: \(m)"
        case .notFound(let resource, let id):
            return "\(resource) not found: \(id)"
        case .alreadyExists(let resource, let id):
            return "\(resource) already exists: \(id)"
        case .timeout(let operation, let seconds):
            return "Operation '\(operation)' timed out after \(seconds)s"
        case .resourceLimit(let resource, let limit):
            return "Resource limit exceeded for \(resource): \(limit)"
        case .dataCorruption(let m): return "Data corruption: \(m)"
        case .unsupported(let operation, let reason):
            return "Unsupported operation '\(operation)': \(reason)"
        case .rateLimit(let m): return "Rate limit exceeded: \(m)"
        case .conflictResolution(let m): return "Conflict resolution error: \(m)"
        case .incrementalUpdate(let m): return "Incremental update error: \(m)"
        }
    }
}

/// Convenience matching the Rust `pub type Result<T> = ...` alias.
public typealias GraphRAGResult<T> = Swift.Result<T, GraphRAGError>
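`GraphRAGResult` is useful when an outcome has to be stored or handed along rather than thrown on the spot. A minimal sketch of both directions follows. It redeclares a two-case stand-in for `GraphRAGError` so it compiles on its own, and `entityName(for:in:)` is a hypothetical helper, not part of the package.

```swift
// Trimmed-down stand-in for GraphRAGError (two cases only) so this compiles alone.
enum GraphRAGError: Error, Sendable, CustomStringConvertible {
    case notFound(resource: String, id: String)
    case validation(message: String)

    var description: String {
        switch self {
        case .notFound(let resource, let id): return "\(resource) not found: \(id)"
        case .validation(let m): return "Validation error: \(m)"
        }
    }
}

typealias GraphRAGResult<T> = Swift.Result<T, GraphRAGError>

// Hypothetical lookup that returns the result alias instead of throwing.
func entityName(for id: String, in names: [String: String]) -> GraphRAGResult<String> {
    guard !id.isEmpty else { return .failure(.validation(message: "empty entity id")) }
    guard let name = names[id] else { return .failure(.notFound(resource: "Entity", id: id)) }
    return .success(name)
}

let names = ["e1": "Ada Lovelace"]

// Pattern-match on a specific error case, falling back to the description.
switch entityName(for: "e2", in: names) {
case .success(let name):
    print("found \(name)")
case .failure(.notFound(let resource, let id)):
    print("missing \(resource) \(id)")
case .failure(let other):
    print(other)
}

// `get()` turns the stored result back into a throwing call.
do {
    print(try entityName(for: "e1", in: names).get())
} catch {
    print(error)
}
```

Because `Result.get()` rethrows the stored error, code that prefers Swift's `throws` style and code that passes `GraphRAGResult` values around can interoperate without extra wrapping.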
39 changes: 39 additions & 0 deletions Sources/GraphRAG/Core/Identifiers.swift
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
// Identifiers.swift
// Strongly-typed identifier wrappers, ported from graphrag-rs `core::DocumentId`,
// `core::EntityId` and `core::ChunkId`.

/// Stable identifier for a `Document`.
public struct DocumentID: Hashable, Codable, Sendable, CustomStringConvertible,
    ExpressibleByStringLiteral
{
    public var raw: String

    public init(_ raw: String) { self.raw = raw }
    public init(stringLiteral value: String) { self.raw = value }

    public var description: String { raw }
}

/// Stable identifier for an `Entity`.
public struct EntityID: Hashable, Codable, Sendable, CustomStringConvertible,
    ExpressibleByStringLiteral
{
    public var raw: String

    public init(_ raw: String) { self.raw = raw }
    public init(stringLiteral value: String) { self.raw = value }

    public var description: String { raw }
}

/// Stable identifier for a `TextChunk`.
public struct ChunkID: Hashable, Codable, Sendable, CustomStringConvertible,
    ExpressibleByStringLiteral
{
    public var raw: String

    public init(_ raw: String) { self.raw = raw }
    public init(stringLiteral value: String) { self.raw = value }

    public var description: String { raw }
}
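One consequence of the synthesized `Codable` conformance is worth knowing before persisting these IDs: each wrapper encodes as a keyed object (`{"raw": …}`), not as a bare string. serde's derive, by contrast, writes a newtype struct such as `struct DocumentId(String)` to JSON as its inner value. If JSON ever needs to match the Rust side, a single-value container is one option. The sketch copies `DocumentID` from above and adds a hypothetical `BareDocumentID` for comparison; whether graphrag-rs serializes its IDs as newtypes is an assumption this diff does not show.

```swift
import Foundation

// Copy of the DocumentID declaration above, so the snippet compiles alone.
struct DocumentID: Hashable, Codable, Sendable, CustomStringConvertible, ExpressibleByStringLiteral {
    var raw: String
    init(_ raw: String) { self.raw = raw }
    init(stringLiteral value: String) { self.raw = value }
    var description: String { raw }
}

// Hypothetical variant that encodes as a bare JSON string via a single-value container.
struct BareDocumentID: Hashable, Codable, Sendable {
    var raw: String
    init(_ raw: String) { self.raw = raw }

    init(from decoder: any Decoder) throws {
        raw = try decoder.singleValueContainer().decode(String.self)
    }

    func encode(to encoder: any Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(raw)
    }
}

let doc: DocumentID = "doc-1"  // built through ExpressibleByStringLiteral
let encoder = JSONEncoder()
let synthesized = String(decoding: try encoder.encode([doc]), as: UTF8.self)
let bare = String(decoding: try encoder.encode([BareDocumentID("doc-1")]), as: UTF8.self)
print(synthesized)  // [{"raw":"doc-1"}]
print(bare)         // ["doc-1"]
```

Either shape round-trips within Swift; the choice only matters when another system reads or writes the same JSON.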