@dimitri-yatsenko
Member

Summary

Improve clarity of the builtin_codecs.py module docstring by explicitly listing dual-mode codecs with both their inline and external forms.

Changes

File: src/datajoint/builtin_codecs.py (lines 8-16)

Updated the module-level docstring to:

  • List <blob> and <blob@> separately (was: combined as one entry)
  • List <attach> and <attach@> separately (was: combined as one entry)
  • Fix <object> to <object@> (external-only, no inline mode)
  • Fix <hash> to <hash@> (external-only, no inline mode)
  • Correct hash algorithm from SHA256 to MD5 (matches implementation)
  • Add clear storage mode indicators: "(in-table storage)", "(external only)", etc.
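Here is a minimal sketch of the hash-addressed deduplication that the new docstring attributes to `<hash@>` and to the `@` forms of `<blob>` and `<attach>`. For illustration only, it assumes the storage key is the MD5 hex digest of the stored bytes and that objects live in an in-memory dict. `HashAddressedStore` is a hypothetical class, not DataJoint's implementation.

```python
import hashlib


class HashAddressedStore:
    """Hypothetical in-memory store illustrating hash-addressed dedup.

    Assumption: the key is the MD5 hex digest of the raw bytes. This is
    not DataJoint's storage layout, only a sketch of the idea.
    """

    def __init__(self):
        self._objects = {}  # digest -> stored bytes

    def put(self, data: bytes) -> str:
        # Identical content always hashes to the same key, so storing the
        # same bytes twice finds the key already present and adds nothing.
        key = hashlib.md5(data).hexdigest()
        if key not in self._objects:
            self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


store = HashAddressedStore()
k1 = store.put(b"spike train")
k2 = store.put(b"spike train")  # duplicate content, same key
k3 = store.put(b"lfp trace")
print(k1 == k2, len(store))  # True 2
```

Under this scheme, a table row would hold only the short key. That is why identical payloads inserted into different rows would occupy storage once.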

Before:

Built-in Codecs:
    - ``<blob>``: Serialize Python objects (internal) or external with dedup
    - ``<hash>``: Hash-addressed storage with SHA256 deduplication
    - ``<object>``: Schema-addressed storage for files/folders (Zarr, HDF5)
    - ``<attach>``: File attachment (internal) or external with dedup
    - ``<filepath@store>``: Reference to existing file in store
    - ``<npy@>``: Store numpy arrays as portable .npy files (external only)

After:

Built-in Codecs:
    - ``<blob>``: Serialize Python objects (in-table storage)
    - ``<blob@>``: Serialize Python objects (external with hash-addressed dedup)
    - ``<attach>``: File attachment (in-table storage)
    - ``<attach@>``: File attachment (external with hash-addressed dedup)
    - ``<hash@>``: Hash-addressed storage with MD5 deduplication (external only)
    - ``<object@>``: Schema-addressed storage for files/folders (external only)
    - ``<npy@>``: Store numpy arrays as portable .npy files (external only)
    - ``<filepath@store>``: Reference to existing file in store (external only)

Motivation

The original docstring was ambiguous about which codecs support both inline and external storage modes. This caused confusion when:

  • Users tried to use <object> without @ (not supported)
  • Developers creating custom codecs (like Davis's ZarrCodec) weren't sure whether ObjectCodec was meant to be dual-mode

By explicitly listing both forms, it's now immediately clear that:

  • <blob> and <attach> support both inline and external storage
  • <hash@>, <object@>, <npy@>, <filepath@store> are external-only
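The distinction between dual-mode and external-only codecs can be sketched as a small lookup over three spec shapes: `<name>`, `<name@>`, and `<name@store>`. The codec sets below are copied from the updated docstring. `parse_codec_spec` and `storage_mode` are hypothetical helpers, not DataJoint API. Per-codec rules, such as whether a store name is mandatory, are not modeled.

```python
import re

# Codec names copied from the updated module docstring. The helpers below are
# hypothetical illustrations, not part of DataJoint's API.
DUAL_MODE = {"blob", "attach"}
EXTERNAL_ONLY = {"hash", "object", "npy", "filepath"}

# Matches '<name>', '<name@>' and '<name@store>'.
_SPEC = re.compile(r"^<(?P<name>\w+)(?P<at>@(?P<store>[\w-]*))?>$")


def parse_codec_spec(spec: str):
    """Return (name, in_store, store) for a codec spec string."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"not a codec spec: {spec!r}")
    in_store = m.group("at") is not None
    # A bare '@' leaves the store group empty; treat it as the default store.
    store = m.group("store") or None
    return m.group("name"), in_store, store


def storage_mode(spec: str) -> str:
    """Classify a spec as in-table or in-store, rejecting unsupported forms."""
    name, in_store, store = parse_codec_spec(spec)
    if name not in DUAL_MODE | EXTERNAL_ONLY:
        raise ValueError(f"unknown codec: {name}")
    if name in EXTERNAL_ONLY and not in_store:
        raise ValueError(f"<{name}> is external-only; write <{name}@...>")
    if not in_store:
        return "in-table"
    return f"in-store ({store or 'default store'})"


for spec in ["<blob>", "<blob@>", "<attach@archive>", "<npy@>"]:
    print(spec, "->", storage_mode(spec))
```

Calling `storage_mode("<object>")` raises `ValueError`, which mirrors the "<object> without @ (not supported)" case listed under Motivation.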

Related

Replace deprecated 'external storage' terminology with canonical terms:
- 'object storage' for general concept
- 'in-store storage' for @ modifier specifics
- 'in-table storage' for database storage

Changes:
- builtin_codecs.py: Update BlobCodec, AttachCodec, HashCodec docstrings
  * 'internal/external' → 'in-table/in-store'
  * Update examples and get_dtype() docstrings
- settings.py: Update StoresSettings docstrings
- gc.py: Update module docstring and format_stats()
- expression.py: Update to_dicts() docstring
- heading.py, codecs.py, declare.py: Update internal comments
- migrate.py: Add note explaining use of legacy terminology

Ref: TERMINOLOGY.md, DOCSTRING_TERMINOLOGY_REPORT.md

Replace deprecated SQL-derived terms with accurate DataJoint terminology:
- 'semijoin/antijoin' → 'restriction/anti-restriction'
- Clarify that A & B restricts A (does not join attributes)

Changes in source code comments:
- expression.py:1081: 'antijoin' → 'anti-restriction'
- condition.py:296: '(semijoin/antijoin)' → 'for restriction'
- condition.py:401: '(aka semijoin and antijoin)' → removed

Rationale: In relational algebra, joins combine attributes from both operands.
DataJoint's A & B restricts A to matching entities—no attributes from B appear
in the result. This is fundamentally restriction, not a join operation.
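The following plain-Python sketch makes the rationale concrete using lists of dicts. The helpers `restrict`, `anti_restrict`, and `join` are hypothetical stand-ins for `A & B`, `A - B`, and `A * B`. Only `A & B` appears above, so the mapping of the other two operators is an assumption. Rows are matched on the attributes that both operands share.

```python
def _shared(a, b):
    # Attributes common to both operands (assumes uniform rows per operand).
    return set(a[0]) & set(b[0]) if a and b else set()


def _matches(a_row, b_row, attrs):
    return all(a_row[k] == b_row[k] for k in attrs)


def restrict(a, b):
    """A & B: rows of A that match some row of B; heading of A only."""
    attrs = _shared(a, b)
    return [r for r in a if any(_matches(r, s, attrs) for s in b)]


def anti_restrict(a, b):
    """A - B: rows of A that match no row of B; heading of A only."""
    attrs = _shared(a, b)
    return [r for r in a if not any(_matches(r, s, attrs) for s in b)]


def join(a, b):
    """A * B: matching row pairs merged, so attributes of B appear too."""
    attrs = _shared(a, b)
    return [{**r, **s} for r in a for s in b if _matches(r, s, attrs)]


session = [
    {"subject": 1, "session": 1},
    {"subject": 1, "session": 2},
    {"subject": 2, "session": 1},
]
scan = [{"subject": 1, "session": 2, "depth": 150}]

print(restrict(session, scan))       # [{'subject': 1, 'session': 2}]
print(anti_restrict(session, scan))  # [{'subject': 1, 'session': 1}, {'subject': 2, 'session': 1}]
print(join(session, scan))           # [{'subject': 1, 'session': 2, 'depth': 150}]
```

Only the join output carries `depth`. Both restrictions return rows with exactly the heading of `session`, which is the distinction the renaming is meant to convey.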

- List <blob> and <blob@> separately to show both inline and external modes
- List <attach> and <attach@> separately to show both modes
- Change <hash> to <hash@> (external only)
- Change <object> to <object@> (external only)
- Clarify storage mode for each codec variant
- Also corrected hash algorithm from SHA256 to MD5

This makes it clear which codecs support dual modes vs external-only.

github-actions bot added the enhancement and feature labels on Jan 16, 2026
dimitri-yatsenko merged commit 984a9be into pre/v2.0 on Jan 16, 2026
7 of 8 checks passed
dimitri-yatsenko deleted the feature/unified-stores-config branch on January 16, 2026, 23:04