Remove dead exception handler in index_collections subprocess call #28

Draft
Copilot wants to merge 14 commits into index_creators from copilot/sub-pr-8

Copilot AI commented Feb 26, 2026

Addresses feedback on #8 regarding the unreachable exception handler in the index_collections method.

Changes

  • Removed dead except subprocess.CalledProcessError handler: The handler was unreachable because subprocess.run() was called without check=True. Error handling already exists via result.returncode checking.

  • Fixed subprocess invocation: The command is now passed to subprocess.run() as a list rather than as a joined string. The string form stopped working when shell=True was removed in commit 6c0942d, and the list form is also more secure.

Before:

cmd_string = ' '.join(cmd)
result = subprocess.run(
    cmd_string,  # String without shell=True fails
    cwd=self.arclight_dir,
    ...
)

After:

result = subprocess.run(
    cmd,  # List form, no shell needed
    cwd=self.arclight_dir,
    ...
)
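
Both changes can be checked in isolation. The following is a minimal sketch, not the project's code: the command is a hypothetical stand-in (a Python child process that exits with status 3), and the helper names run_without_check and string_form_raises are made up for illustration. It shows that subprocess.run() without check=True reports failure only through returncode, so an except subprocess.CalledProcessError branch never fires, and that a joined string without shell=True cannot be launched on POSIX systems.

```python
import subprocess
import sys

# Hypothetical stand-in for the real indexing command: a child that exits with status 3.
cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]


def run_without_check(command):
    """Run a list-form command without check=True; return (returncode, handler_fired)."""
    handler_fired = False
    returncode = None
    try:
        result = subprocess.run(command, capture_output=True, text=True)
        returncode = result.returncode
    except subprocess.CalledProcessError:
        # Only raised when check=True is passed, so this branch is dead code here.
        handler_fired = True
    return returncode, handler_fired


def string_form_raises(command):
    """Return True if the joined string, run without shell=True, cannot be launched."""
    try:
        subprocess.run(" ".join(command), capture_output=True, text=True)
    except OSError:
        # On POSIX the whole string is treated as the program name, which does not exist.
        return True
    return False


if __name__ == "__main__":
    print(run_without_check(cmd))
    if sys.platform != "win32":
        print(string_form_raises(cmd))
```

On Windows a command string is accepted without shell=True, so the second check is skipped there. The list form behaves the same on both platforms, which is another reason to prefer it.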

alexdryden and others added 12 commits February 26, 2026 14:09
Implement complete ETL pipeline for ArchivesSpace agents:
- Extract all agent records via ArchivesSpace API
- Generate EAC-CPF XML documents for each agent
- Auto-discover and configure traject indexing
- Batch index to Solr (100 files per call for performance)
- Support multiple processing modes (agents-only, collections-only, both)
- Add 11 new Solr fields for agent metadata
- Include 271-line traject config for EAC-CPF → Solr mapping

Key features:
- Parallel to existing collection record indexing
- Dynamic Solr field mapping for ArcLight compatibility
- Robust error handling and logging
- Configurable traject config discovery paths

This allows ArcLight to provide dedicated agent/creator pages with
full biographical information, related collections, and authority control.
Replace per-agent API calls with single Solr query for better performance:
- Query ArchivesSpace Solr to filter agents in bulk
- Exclude system users (publish=false)
- Exclude donors (linked_agent_role includes "dnr")
- Exclude software agents (agent_type="agent_software")
- Use consistent EAC namespace prefixes in XPath queries
- Refactor dates extraction for improved readability

Performance improvement: O(n) API calls → O(1) Solr query
Reduces processing time from minutes to seconds for large repositories.
…C-CPF indexing (#13)

* Skip indexing records without valid IDs instead of generating non-deterministic fallbacks

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Co-authored-by: Alex Dryden <adryden3@illinois.edu>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
to reflect the required command line arguments

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI and others added 2 commits February 26, 2026 22:16
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update creator record generation and indexing based on feedback" to "Remove dead exception handler in index_collections subprocess call" on Feb 26, 2026
@alexdryden alexdryden force-pushed the index_creators branch 2 times, most recently from f23fe83 to 89057a9 Compare March 4, 2026 23:52