Enforce strict creator_{type}_{id} pattern using filename only#26

Merged
alexdryden merged 5 commits into index_creators from copilot/sub-pr-8-again
Feb 26, 2026

Conversation

Contributor

Copilot AI commented Feb 26, 2026

  • Understand the feedback: ALWAYS use creator_{entity_type}_{id} format
  • Review code in main.py that creates files with this pattern (lines 552, 888)
  • Identify duplicate fallback logic in traject config
  • Simplify ID extraction logic to remove name-based ID generation
  • Extract regex pattern to constant (CREATOR_ID_PATTERN)
  • Use specific entity types (corporate_entities, people, families)
  • Remove recordId extraction - use filename ONLY as source of truth
  • Clean up verbose comments to 1-2 lines max
  • Run security checks (no issues found)

Summary

Refactored the EAC-CPF traject configuration so that IDs are extracted ONLY from the filename pattern, and reduced the comments to concise one- or two-line notes.

Latest Change (commit):

  • Cleaned up all verbose comments to be concise (1-2 lines max)
  • Removed redundant inline comments
  • Kept essential information: pattern is from arcflow's file naming

Pattern: creator_(corporate_entities|people|families)_{numeric_id} extracted from filename only
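As a rough illustration of the pattern above, here is a minimal Ruby sketch. The exact regex behind CREATOR_ID_PATTERN is not shown in this thread, so the expression below is an assumption reconstructed from the summary, and `creator_id_from_filename` is a hypothetical helper name used only for this example.

```ruby
# Assumed reconstruction of the constant; the real regex in the traject
# config is not quoted in this conversation.
CREATOR_ID_PATTERN = /\Acreator_(?:corporate_entities|people|families)_\d+\z/

# Hypothetical helper: returns the ID taken from the file's basename
# (extension stripped), or nil when the basename does not match the pattern.
def creator_id_from_filename(path)
  basename = File.basename(path, '.*')
  basename if CREATOR_ID_PATTERN.match?(basename)
end

creator_id_from_filename('/data/eac/creator_people_123.xml')  # matches
creator_id_from_filename('/data/eac/creator_agents_123.xml')  # unknown entity type
creator_id_from_filename('/data/eac/creator_people_abc.xml')  # non-numeric id
```

Anchoring the regex with `\A` and `\z` means a filename must match the pattern in full, so partial matches such as `creator_people_12_draft` would be rejected under this sketch.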

All Commits:

  1. Remove duplicate ID fallback logic and enforce creator_{type}_{id} pattern (53 lines removed)
  2. Extract regex to constant and use specific entity types
  3. Use filename only for ID extraction, remove recordId fallback (17 lines removed)
  4. Clean up comments to be concise (removed verbose multi-line comments)


Copilot AI and others added 2 commits February 26, 2026 21:37
…ttern

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix creator record ID formatting issue in PR #8" to "Enforce strict creator_{type}_{id} pattern and remove duplicate fallback logic" Feb 26, 2026
# Cannot rely on recordId being present. Must extract from filename or generate.
# ID FORMAT: Always use 'creator_{entity_type}_{id}' pattern
# This matches the format used by arcflow when creating EAC-CPF files
to_field 'id' do |record, accumulator, context|
Contributor


@copilot this is wrong. If successful, this would get the id from the recordId. Because that can't be relied on, and we need to know how ids are constructed, ONLY use the pattern.

Contributor Author


Fixed in commit b113430. Removed all recordId extraction logic - now exclusively using the filename as the single source of truth for IDs. The code now only extracts the ID from the filename pattern and validates it against CREATOR_ID_PATTERN.
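The committed code is not reproduced in this thread, but the behavior described in this reply might look roughly like the sketch below. It mirrors the three-argument block shape of the `to_field 'id'` excerpt using a plain lambda so it can run without traject. `FakeContext` and its `source_path` field are hypothetical stand-ins (how the real config obtains the source path is not shown here), and skipping non-matching files silently is an assumption.

```ruby
# Assumed form of the constant named in the reply above.
CREATOR_ID_PATTERN = /\Acreator_(?:corporate_entities|people|families)_\d+\z/

# Hypothetical stand-in for the context object passed to an indexing block.
FakeContext = Struct.new(:source_path)

# Same (record, accumulator, context) shape as the to_field block in the diff.
extract_creator_id = lambda do |_record, accumulator, context|
  basename = File.basename(context.source_path.to_s, '.*')
  # No recordId lookup and no name-based fallback: a filename that does not
  # match the pattern contributes no ID at all.
  accumulator << basename if CREATOR_ID_PATTERN.match?(basename)
end

ids = []
# The record's recordId is ignored even when it looks like a valid ID.
extract_creator_id.call({ 'recordId' => 'other_id' }, ids, FakeContext.new('eac/creator_families_9.xml'))
extract_creator_id.call({ 'recordId' => 'creator_people_1' }, ids, FakeContext.new('eac/unknown.xml'))
```

After both calls, `ids` holds only the ID derived from the first filename; the second record is dropped even though its recordId happens to fit the pattern, which is the "single source of truth" behavior the reviewer asked for.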

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Copilot AI changed the title from "Enforce strict creator_{type}_{id} pattern and remove duplicate fallback logic" to "Enforce strict creator_{type}_{id} pattern using filename only" Feb 26, 2026
@@ -22,6 +22,13 @@
# EAC-CPF namespace - used consistently throughout this config
EAC_NS = { 'eac' => 'urn:isbn:1-931666-33-4' }

Contributor


@copilot clean up the comments. we need at most one line that says this is the pattern from arcflow when creating the files. keep it to 1-2 lines and only one comment max.

Contributor Author


Cleaned up in commit 874dc9c. Reduced all comments to 1-2 lines max, keeping only essential information about the arcflow pattern.

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
@alexdryden alexdryden marked this pull request as ready for review February 26, 2026 22:03
@alexdryden alexdryden merged commit ff41cdb into index_creators Feb 26, 2026
@alexdryden alexdryden deleted the copilot/sub-pr-8-again branch February 26, 2026 22:04

2 participants