
🎶 Using the Discogs database export for local graph exploration. 🎶


SimplicityGuy/discogsography


🎵 Discogsography


A modern Python 3.13+ microservices platform for transforming the complete Discogs music database into powerful, queryable knowledge graphs and analytics engines.

🚀 Quick Start | 📖 Documentation | 🎯 Features | 💬 Community

🎯 What is Discogsography?

Discogsography transforms monthly Discogs data dumps (50GB+ compressed XML) into:

  • 🔗 Neo4j Graph Database: Navigate complex music industry relationships
  • 🐘 PostgreSQL Database: High-performance queries and full-text search
  • 🔍 Interactive Explorer: Graph visualisation, trends, and path discovery
  • 📊 Real-time Dashboard: Monitor system health and processing metrics

Perfect for music researchers, data scientists, developers, and music enthusiasts who want to explore the world's largest music database.

🏛️ Architecture Overview

⚙️ Core Services

| Service | Purpose | Key Technologies |
|---|---|---|
| 🔐 API | User accounts and JWT authentication | FastAPI, psycopg3, redis, Discogs OAuth 1.0 |
| 🗂️ Curator | Background collection & wantlist sync | FastAPI, psycopg3, neo4j-driver |
| 📊 Dashboard | Real-time system monitoring | FastAPI, WebSocket, reactive UI |
| 🔍 Explore | Serves graph exploration frontend (static files) | FastAPI, D3.js, Plotly.js |
| ⚡ Extractor | High-performance Rust-based extractor | tokio, quick-xml, lapin |
| 🔗 Graphinator | Builds Neo4j knowledge graphs | neo4j-driver, graph algorithms |
| 🔧 Schema-Init | One-shot database schema initializer | neo4j-driver, psycopg3 |
| 🐘 Tableinator | Creates PostgreSQL analytics tables | psycopg3, JSONB, full-text search |
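
The Extractor itself is written in Rust (quick-xml), so the following Python is only a rough illustration of the streaming idea behind it: read one record at a time instead of loading a multi-gigabyte file into memory. The sample XML is made up and heavily simplified, and `iter_records` is a hypothetical helper, not part of this project.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator

# Made-up, heavily simplified sample; real Discogs dumps carry many more fields.
SAMPLE_XML = b"""<artists>
  <artist><id>1</id><name>Example Artist</name></artist>
  <artist><id>2</id><name>Another Artist</name></artist>
</artists>"""


def iter_records(stream: BinaryIO, tag: str) -> Iterator[dict[str, str]]:
    """Yield one dict per <tag> element without building the whole tree first."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            # Flatten the direct children into a {child tag: text} mapping.
            yield {child.tag: child.text or "" for child in elem}
            elem.clear()  # drop the element's children to limit memory growth


records = list(iter_records(io.BytesIO(SAMPLE_XML), "artist"))
print(records)
```

In a real pipeline each yielded record would be published to the message broker rather than collected into a list.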

📐 System Architecture

graph TD
    S3[("🌐 Discogs S3<br/>Monthly Data Dumps<br/>~50GB XML")]
    SCHEMA[["🔧 Schema-Init<br/>One-shot DDL<br/>Initialiser"]]
    EXT[["⚡ Extractor<br/>High-Performance<br/>XML Processing"]]
    RMQ{{"🐰 RabbitMQ 4.x<br/>Message Broker<br/>8 Queues + DLQs"}}
    NEO4J[("🔗 Neo4j 2026<br/>Graph Database<br/>Relationships")]
    PG[("🐘 PostgreSQL 18<br/>Analytics DB<br/>Full-text Search")]
    REDIS[("🔴 Redis<br/>Cache Layer<br/>Query Cache")]
    GRAPH[["🔗 Graphinator<br/>Graph Builder"]]
    TABLE[["🐘 Tableinator<br/>Table Builder"]]
    DASH[["📊 Dashboard<br/>Real-time Monitor<br/>WebSocket"]]
    EXPLORE[["🔍 Explore<br/>Graph Explorer<br/>Trends & Paths"]]
    API[["🔐 API<br/>User Auth<br/>JWT & OAuth"]]
    CURATOR[["🗂️ Curator<br/>Collection<br/>Sync"]]

    SCHEMA -->|0. Create Indexes & Constraints| NEO4J
    SCHEMA -->|0. Create Tables & Indexes| PG
    S3 -->|1. Download & Parse| EXT
    EXT -->|2. Publish Messages| RMQ
    RMQ -->|3a. Artists/Labels/Releases/Masters| GRAPH
    RMQ -->|3b. Artists/Labels/Releases/Masters| TABLE
    GRAPH -->|4a. Build Graph| NEO4J
    TABLE -->|4b. Store Data| PG

    EXPLORE -.->|Health Check| NEO4J

    API -.->|User Accounts| PG
    API -.->|Graph Queries| NEO4J
    API -.->|OAuth State| REDIS

    CURATOR -.->|Sync Collections| NEO4J
    CURATOR -.->|Sync History| PG

    DASH -.->|Monitor| EXT
    DASH -.->|Monitor| GRAPH
    DASH -.->|Monitor| TABLE
    DASH -.->|Monitor| EXPLORE
    DASH -.->|Cache| REDIS
    DASH -.->|Stats| RMQ
    DASH -.->|Stats| NEO4J
    DASH -.->|Stats| PG

    style S3 fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style SCHEMA fill:#f9fbe7,stroke:#827717,stroke-width:2px
    style EXT fill:#ffccbc,stroke:#d84315,stroke-width:2px
    style RMQ fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style NEO4J fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style PG fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
    style REDIS fill:#ffebee,stroke:#b71c1c,stroke-width:2px
    style GRAPH fill:#e0f2f1,stroke:#004d40,stroke-width:2px
    style TABLE fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style DASH fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style EXPLORE fill:#e8eaf6,stroke:#283593,stroke-width:2px
    style API fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px
    style CURATOR fill:#fff8e1,stroke:#f57f17,stroke-width:2px
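
One way to read the "8 Queues + DLQs" label in the diagram: four data types (artists, labels, releases, masters) are delivered to two consumers (Graphinator and Tableinator), giving 4 × 2 = 8 work queues. The sketch below just enumerates that layout. The queue names and the one-DLQ-per-queue pairing are assumptions for illustration, not the project's actual naming.

```python
from itertools import product

DATA_TYPES = ("artists", "labels", "releases", "masters")
CONSUMERS = ("graphinator", "tableinator")


def queue_topology() -> dict[str, str]:
    """Map each work queue to a dead letter queue (all names are illustrative)."""
    return {
        f"{consumer}.{data_type}": f"{consumer}.{data_type}.dlq"
        for consumer, data_type in product(CONSUMERS, DATA_TYPES)
    }


topology = queue_topology()
for queue, dlq in topology.items():
    print(f"{queue} -> {dlq}")
print(len(topology), "work queues")
```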

🌟 Key Features

  • ⚡ High-Speed Processing: 5,000–10,000 records/second XML parsing with Rust-based extractor
  • 🔄 Smart Deduplication: SHA256 hash-based change detection prevents reprocessing
  • 📈 Handles Big Data: Processes 15M+ releases, 2M+ artists across ~50GB compressed XML
  • 🔁 Auto-Recovery: Automatic retries with exponential backoff and dead letter queues
  • 🐋 Container Security: Non-root users, read-only filesystems, dropped capabilities
  • 📝 Type Safety: Full type hints with strict mypy validation and Bandit security scanning
  • ✅ Comprehensive Testing: Unit, integration, and E2E tests with Playwright
🚀 Quick Start

# Clone and start all services
git clone https://github.com/SimplicityGuy/discogsography.git
cd discogsography
docker-compose up -d

# Access the dashboard
open http://localhost:8003

| Service | URL | Default Credentials |
|---|---|---|
| 🔐 API | http://localhost:8004 | Register via /api/auth/register |
| 📊 Dashboard | http://localhost:8003 | None |
| 🔗 Neo4j | http://localhost:7474 | neo4j / discogsography |
| 🐘 PostgreSQL | localhost:5433 | discogsography / discogsography |
| 🐰 RabbitMQ | http://localhost:15672 | discogsography / discogsography |
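
After starting the stack, a quick way to see which services are reachable is to try a TCP connection to each host port listed above. The script below is a small sketch of that check, not a tool shipped with the project. It only tests that something is listening, not that the service is healthy, and `is_listening` and `check_services` are hypothetical helper names.

```python
import socket

# Host ports copied from the service table above.
SERVICE_PORTS = {
    "API": 8004,
    "Dashboard": 8003,
    "Neo4j": 7474,
    "PostgreSQL": 5433,
    "RabbitMQ": 15672,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Probe every known service port and report which ones accept connections."""
    return {name: is_listening(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:<12} {'up' if up else 'down'}")
```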

See the Quick Start Guide for prerequisites, local development setup, and environment configuration.

📖 Documentation

🚀 Getting Started

| Document | Purpose |
|---|---|
| Quick Start Guide | ⚡ Get Discogsography running in minutes |
| Configuration Guide | ⚙️ Complete environment variable and settings reference |
| Architecture Overview | 🏛️ System architecture, components, data flow, and scale |
| CLAUDE.md | 🤖 Claude Code integration guide & development standards |

💡 Usage & Data

| Document | Purpose |
|---|---|
| Usage Examples | 💡 Neo4j Cypher and PostgreSQL query examples |
| Database Schema | 🗄️ Complete Neo4j graph model and PostgreSQL schema |
| Monitoring Guide | 📊 Real-time dashboard, metrics, and debug utilities |

👨‍💻 Development

| Document | Purpose |
|---|---|
| Development Guide | 💻 Project structure, tooling, and developer workflow |
| Testing Guide | 🧪 Unit, integration, and E2E testing with Playwright |
| Logging Guide | 📊 Structured logging standards and emoji conventions |
| Contributing Guide | 🤝 How to contribute: process, standards, and PR flow |
| Python Version Management | 🐍 Managing Python 3.13+ across the project |

🔧 Operations

| Document | Purpose |
|---|---|
| Troubleshooting Guide | 🔧 Common issues, solutions, and debugging steps |
| Maintenance Guide | 🔄 Package upgrades, dependency management |
| Performance Guide | ⚡ Database tuning, hardware specs, optimization |
| Performance Benchmarks | 📈 Processing rates and tuning results |
| Database Resilience | 💾 Database connection patterns & error handling |

🐋 Infrastructure & CI/CD

| Document | Purpose |
|---|---|
| Dockerfile Standards | 🐋 Best practices for writing Dockerfiles |
| Docker Security | 🔒 Container hardening & security practices |
| GitHub Actions Guide | 🚀 CI/CD workflows, automation & best practices |
| Task Automation | ⚙️ Complete just and uv run task command reference |
| Monorepo Guide | 📦 Managing Python monorepo with shared dependencies |

📋 Reference

| Document | Purpose |
|---|---|
| State Marker System | 📋 Extraction progress tracking & safe restart system |
| State Marker Periodic Updates | 💾 Periodic state saves and crash recovery |
| Consumer Cancellation | 🔄 File completion and consumer lifecycle management |
| File Completion Tracking | 📊 Intelligent completion tracking and stall detection |
| Neo4j Indexing | 🔗 Advanced Neo4j indexing strategies |
| Platform Targeting | 🎯 Cross-platform compatibility guidelines |
| Emoji Guide | 📋 Standardized emoji usage across the project |
| Recent Improvements | 🚀 Latest platform enhancements and changelog |

💬 Support & Community

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

  • 🎵 Discogs for providing the monthly data dumps
  • 🚀 uv for blazing-fast package management
  • 🔥 Ruff for lightning-fast linting
  • 🐍 The Python community for excellent libraries and tools

Made with ❤️ by the Discogsography community
