Enhance deferred fields for production ETL reliability#168
- Fix deferred-fields matching to handle both 'field' and 'field/id' formats
- Add XML-ID resolution for non-self-referencing deferred fields (e.g., responsible_id)
- Support binary field deferral for image imports (e.g., image_1920)
- Fix batch rejection to not inherit the same error message for all records
- Extract per-row errors from Odoo's response when available
- Fall back to individual processing when a batch has multiple failures
- Add --company-id CLI parameter for multicompany imports
  - Sets allowed_company_ids and force_company in context
- Add _extract_per_row_errors helper for parsing Odoo's error messages
- Add _resolve_external_id_for_pass2 helper for XML-ID resolution

These changes address critical issues with:
- Deferred fields not working in fail mode
- All batch records inheriting the same failure reason
- Cross-company field references causing import failures
- Large image imports overwhelming the server

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Convert the deferred_fields CLI parameter from a comma-separated string to a list
- Fix ignore_list filtering in Pass 1 to handle both 'field' and 'field/id' formats
- Normalize ignore_set to strip the '/id' suffix before matching column names

Verified working with a local Odoo 18 instance:
- Pass 1 correctly excludes deferred fields from the initial import
- Pass 2 successfully resolves XML-IDs and updates records

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary of Changes

Hello @bosd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the deferred-fields feature within the Odoo data flow tool, making it more robust and reliable for production ETL operations. The changes focus on improving data integrity, error handling, and usability, particularly for complex scenarios involving cross-model references, binary data, and multicompany setups.
Code Review
This pull request significantly enhances the "deferred-fields" feature, improving its reliability for production ETL workloads through better field matching, external XML-ID resolution, granular error handling, and binary field support. However, a medium-severity vulnerability was identified in the error handling of the new XML-ID resolution logic, which could lead to incorrect data relationships and potential access control risks under specific failure conditions. Additionally, feedback includes suggestions for moving an import to the top level for style consistency, simplifying a complex connection probing block, and refining exception handling for better specificity and robustness.
```python
except Exception:  # noqa: S112
    continue
```
The broad `except Exception:` in the ID resolution loop is a medium-severity vulnerability. By silently catching transient errors (e.g., network issues, DB locks) and proceeding with less specific variations, it can cause an external ID to resolve to the wrong database record, linking data to an incorrect owner or parent, which is an access-control risk as well as a correctness bug. Broad exception handling also hides unexpected bugs and makes debugging difficult. It is recommended to catch more specific exceptions, such as OdooError, KeyError, ValueError, or IndexError, swallowing only 'not found' errors and letting everything else fail loudly to prevent incorrect data mapping and improve debuggability.
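A minimal sketch of the narrowed handling the review recommends; `variations` and `resolve` are hypothetical stand-ins for the real external-ID lookup loop:

```python
# Sketch: swallow only "not found"-style failures while resolving an
# external ID; let transient errors (network, DB locks) propagate.

def resolve_first_match(variations, resolve):
    """Try each external-ID variation, swallowing only lookup misses."""
    for candidate in variations:
        try:
            result = resolve(candidate)
        except (KeyError, ValueError, IndexError):
            continue  # "not found"-style failure: try the next variation
        if result:
            return result
    # Transient errors were NOT caught above, so they fail loudly
    # instead of silently mis-mapping the ID to a later variation.
    return None
```

With this shape, a dropped connection aborts the resolution instead of quietly falling through to a less specific, possibly wrong, match.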
```python
    Returns:
        A dictionary mapping row indices (0-based) to error messages.
    """
    import re
```
```python
    conn = None
    for attr in ["connection", "client", "_connection", "_client"]:
        try:
            val = getattr(model_obj, attr, None)
            if val and not callable(val):
                conn = val
                break
            elif val and callable(val) and hasattr(val, "get_model"):
                conn = val
                break
        except Exception:  # noqa: S112
            continue

    if conn:
        for method_name in ["model", "get_model"]:
            if hasattr(conn, method_name):
                try:
                    method = getattr(conn, method_name)
                    ir_model_data_proxy = method("ir.model.data")
                    if ir_model_data_proxy:
                        break
                except Exception:  # noqa: S112
                    continue
```
This block for retrieving the ir.model.data proxy is overly complex and relies on probing several private attributes, which is fragile and can break with library updates. Since odoolib model objects typically store a reference to their connection, you can simplify this logic significantly.
A more direct approach is to access the connection object and call get_model on it. This is more readable, maintainable, and robust.
```python
conn = getattr(model_obj, "_connection", getattr(model_obj, "connection", None))
if conn and hasattr(conn, "get_model"):
    try:
        ir_model_data_proxy = conn.get_model("ir.model.data")
    except Exception:  # noqa: S112
        pass
```

Adds an --auto-defer CLI flag that automatically defers all non-required many2one fields to Pass 2. This enables progressive import: records are created first and relational fields are populated afterwards. Required many2one fields are NOT deferred, as they must succeed in Pass 1.

Usage: odoo-data-flow import --auto-defer --file data.csv --model res.partner
When records are created using the create() method (in fail mode, or when load() falls back to create()), XML IDs were not being persisted to ir.model.data, so they were missing after import.

Added a _create_xmlid_entry() helper function that:
- Parses module and name from the XML ID (uses __import__ for IDs without a prefix)
- Creates or updates the ir.model.data entry for each created record
- Handles edge cases such as existing entries with a different res_id

This ensures XML IDs are properly persisted regardless of whether records are created via load() or create().
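The module/name split described above can be sketched as follows; the helper name is illustrative, and the `__import__` fallback module mirrors the commit message rather than the tool's exact implementation:

```python
# Sketch: split an XML ID into (module, name) for the ir.model.data
# entry, defaulting to the "__import__" module when no prefix is given.

def split_xmlid(xmlid: str) -> tuple[str, str]:
    """Split 'module.name' into (module, name)."""
    if "." in xmlid:
        module, name = xmlid.split(".", 1)
        return module, name
    return "__import__", xmlid
```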
…acks

Added new CLI options for better control over import behavior:
- --on-missing-ref: handle missing references per field
  - create: auto-create via name_create
  - skip: skip the row (default)
  - empty: set the field to False
- --auto-create-refs: auto-create all missing m2o references
- --set-empty-on-missing: set fields to empty on missing refs
- --fallback-values: default values for invalid selection/boolean fields
- --tracking-disable/--tracking-enable: control mail tracking (default: disabled)
- --defer-parent-store: defer parent store computation for hierarchies

These options map to Odoo's native import context parameters:
- name_create_enabled_fields
- import_set_empty_fields
- fallback_values
- defer_parent_store_computation
Performance optimizations:
- Remove the hard-coded 4-thread connection cap in RpcThread; users can now specify higher --worker values based on server capacity
- Add an LRU cache (100k entries) to the to_xmlid() function, significantly speeding up repeated XML ID sanitizations
- Pre-calculate column filter indices before the batch loop; the ignore set and indices are now computed once per batch, not per chunk

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
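The to_xmlid() memoization can be sketched with functools.lru_cache; the 100k cap matches the commit message, while the sanitization regex here is illustrative, the caching pattern being the point:

```python
# Sketch: bounded memoization of XML-ID sanitization. Repeated inputs
# hit the cache instead of re-running the regex.
import functools
import re


@functools.lru_cache(maxsize=100_000)
def to_xmlid(value: str) -> str:
    """Sanitize an arbitrary string into an XML-ID-safe identifier."""
    return re.sub(r"[^A-Za-z0-9_]", "_", value.strip())
```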
Add protocol selection to the import and export commands:
- --protocol option: xmlrpc, xmlrpcs, jsonrpc, jsonrpcs, json2, json2s
- The protocol can also be set in the connection config file
- JSON-RPC recommended for Odoo 10-18 (~30% faster than XML-RPC)
- JSON-2 supported for Odoo 19+ (requires an API key)

The protocol is passed through odoolib, which handles the actual connection.
The --ignore CLI option was not being converted from a comma-separated string to a list before being passed to run_import(), causing a TypeError when concatenating with the deferred_fields list.
Configuration guide:
- Document all protocol options (xmlrpc, jsonrpc, json2)
- Add a JSON-RPC performance recommendation for Odoo 10+
- Document the JSON-2 API for Odoo 19+ with API key requirements
- Add a CLI --protocol override example

Performance tuning guide:
- Add a new "Choosing the Right Protocol" section
- Add a protocol comparison table
- Add a worker tuning section with the db_maxconn formula
- Add warnings about connection pool exhaustion
Verify that import correctly preserves:
- Unicode characters (Japanese, Chinese, Korean, emojis)
- Multiline values in text fields
- Tab characters
- Quoted strings
Add a batch_delay parameter to control the pause between batch submissions during imports. This helps prevent server overload and 503 errors when importing large datasets.
- Add --delay CLI option (default: 0; recommended: 0.5-2.0 for busy servers)
- Propagate batch_delay through import_data and _orchestrate_pass_1
- Add the delay between batch submissions in _run_threaded_pass
- Fix Python 3.14 compatibility for the ValueError message format in a test
When the server returns 502/503 errors indicating overload, the importer now automatically:
- Detects server overload conditions (502, 503, service unavailable)
- Adds increasing delays (up to 10 seconds) between batch submissions
- Gradually reduces the delay after successful batches
- Combines with the user-specified --delay for total throttling

This helps avoid overwhelming busy servers and allows imports to complete even under high load.
The progress bar was shifting because the RichHandler and the Progress bar use separate Console instances that compete for stdout. Added a context manager `suppress_console_handler()` that temporarily disables the RichHandler while a Progress bar is active.

Applied to all Progress bars in:
- import_threaded.py
- export_threaded.py
- write_threaded.py
- importer.py
- Exclude mapper.py (callable objects break introspection)
- Add write_threaded.py and tools.py to compilation
- Add usage documentation to the setup.py docstring
- Add *.so to .gitignore

To build with mypyc:

    ODF_COMPILE_MYPYC=1 python setup.py build_ext --inplace
- Add comprehensive tests for the _extract_per_row_errors function
- Add tests for _filter_ignored_columns edge cases
- Add tests for _execute_write_batch success and failure paths
- Add tests for _execute_load_batch force_create, timeout, and pool errors
- Add tests for _format_odoo_error dict extraction
- Add tests for _create_batch_individually error handling
- Add tests for import_data with a dict config
- Add tests for relational_import derivation and query functions
- Add tests for O2M tuple import edge cases
- Add tests for write tuple import edge cases

Coverage improved from 80.65% to 85.28%.
Implements streaming CSV processing that reads and processes data in batches without loading the entire file into memory:
- Add a _stream_csv_batches() generator that yields batches directly from the file
- Add _count_csv_rows() for progress bar initialization
- Add _orchestrate_streaming_pass_1() for streaming import orchestration
- Add a --stream CLI flag for enabling streaming mode
- Automatic fallback to standard mode when incompatible options are used (o2m, groupby, deferred_fields, force_create)

Streaming mode is ideal for very large CSV files where memory is a concern. When enabled, the importer processes batches as they are read from disk, significantly reducing peak memory usage.

Usage: odoo-data-flow import conn.conf data.csv --model res.partner --stream
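A minimal sketch of a _stream_csv_batches()-style generator; UTF-8 encoding and the stdlib csv reader are assumptions, and the real implementation may differ:

```python
# Sketch: yield rows in fixed-size batches so peak memory stays
# proportional to one batch rather than the whole file.
import csv
from collections.abc import Iterator


def stream_csv_batches(path: str, batch_size: int) -> Iterator[list]:
    """Yield lists of rows, reading lazily from the CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                yield batch
                batch = []
        if batch:  # final partial batch
            yield batch
```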
Checkpoint/Resume Support:
- Add a checkpoint module for saving/restoring import progress
- Save a checkpoint after Pass 1 completes, including the id_map
- Resume from the checkpoint if Pass 1 already completed
- Delete the checkpoint on successful completion
- A file hash check prevents resuming if the data file changed
- CLI options: --resume/--no-resume, --no-checkpoint

Multi-Company Support:
- Add an --all-companies flag to auto-set allowed_company_ids
- Fetches the user's company_ids and sets the context automatically
- Mimics Odoo web UI behavior for cross-company imports

Bug Fixes:
- Fix Pass 2 failures not being written to the fail file
- Use sanitized IDs in source_data_map to match id_map keys
Add a --dry-run option to validate CSV data before importing:
- Checks that required fields are populated
- Validates selection field values against the allowed values
- Verifies that relational references exist in Odoo
- Displays formatted validation results with an error summary

New validation module:
- ValidationError and ValidationResult dataclasses
- Reference checking for both external IDs and database IDs
- Caching of reference lookups for performance
- Formatted output with rich panels

Usage: odoo-data-flow import --dry-run --file data.csv --model res.partner
Add a --check-refs option to verify relational references before import:
- Scans the CSV for all many2one/many2many references
- Batch-checks external IDs and database IDs against Odoo
- Reports missing references with examples

Options:
- --check-refs=fail: abort the import if references are missing (strict mode)
- --check-refs=warn: show a warning but continue (default)
- --check-refs=skip: skip the reference check entirely

This helps catch missing reference data early, avoiding partial imports that fail midway through processing.
Add intelligent error categorization and retry strategies.

Error categories:
- Transient: timeouts, 502/503, deadlocks, connection pool exhaustion (will retry)
- Permanent: constraint violations, access denied (fail immediately)
- Recoverable: missing references, company issues (suggest alternatives)

Features:
- Exponential backoff with configurable base delay and max delay
- Jitter to prevent a thundering-herd effect
- Retry statistics tracking
- Helper functions for retry decisions
- Recommendations for error handling

Usage:
- categorize_error(error) -> (ErrorCategory, pattern)
- retry_with_backoff(func, config, stats) -> (result, error)
- get_retry_recommendation(error) -> dict with action/message
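A simplified sketch of the categorization idea, with a reduced pattern list and a signature that returns only the category (the commit message describes the real helper as also returning the matched pattern):

```python
# Sketch: classify errors by substring matching against known patterns.
# The real pattern lists in the tool are longer than shown here.
import enum


class ErrorCategory(enum.Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"
    RECOVERABLE = "recoverable"


_TRANSIENT = ("timeout", "502", "503", "deadlock", "connection pool")
_RECOVERABLE = ("missing reference", "company")


def categorize_error(error: Exception) -> ErrorCategory:
    """Map an exception to a retry category based on its message."""
    msg = str(error).lower()
    if any(p in msg for p in _TRANSIENT):
        return ErrorCategory.TRANSIENT
    if any(p in msg for p in _RECOVERABLE):
        return ErrorCategory.RECOVERABLE
    return ErrorCategory.PERMANENT
```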
Add functionality for skip-unchanged record detection.

Features:
- Normalize values for comparison (handles False, empty strings, m2o tuples)
- Compare source values with existing Odoo records
- Filter out unchanged rows before import
- Track statistics (new, changed, unchanged, skip rate)

Key functions:
- get_existing_records(): fetch records from Odoo by external ID
- find_unchanged_records(): identify unchanged records from dict data
- filter_unchanged_rows(): filter unchanged rows from list data
- display_idempotent_stats(): show import statistics

This module enables imports to be run multiple times safely, only importing records that have actually changed.
Add adaptive throttling based on server response times.

Server health levels:
- HEALTHY: normal operation, no throttling
- DEGRADED: slight slowdown, add small delays
- STRESSED: significant load, reduce batch sizes
- OVERLOADED: critical, aggressive throttling

Features:
- Rolling-average response time monitoring
- Automatic delay adjustment between requests
- Dynamic batch size scaling based on health
- Hysteresis for health recovery (prevents flapping)
- Error recording for server errors (5xx)
- Comprehensive statistics tracking

Configuration:
- Customizable thresholds for each health level
- Configurable delays and batch multipliers
- Aggressive mode for sensitive servers
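The rolling-average health check can be sketched as below; the class name, window size, and thresholds are illustrative, not the tool's actual configuration:

```python
# Sketch: a fixed window of response times feeds a rolling average,
# and the average maps to one of the four health levels above.
from collections import deque


class ThrottleMonitor:
    def __init__(self, window: int = 10) -> None:
        self.samples = deque(maxlen=window)

    def record(self, response_time: float) -> None:
        """Record one request's response time in seconds."""
        self.samples.append(response_time)

    def health(self) -> str:
        if not self.samples:
            return "HEALTHY"
        avg = sum(self.samples) / len(self.samples)
        if avg < 1.0:
            return "HEALTHY"
        if avg < 3.0:
            return "DEGRADED"
        if avg < 8.0:
            return "STRESSED"
        return "OVERLOADED"
```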
Complete integration of the remaining 3 stability features:

1. **Smarter Retry Logic**, integrated into error handling:
   - Uses the ErrorCategory enum to classify errors as transient/permanent
   - Exponential backoff with jitter for server overload (502/503)
   - Database serialization conflict handling with backoff

2. **Idempotent Import Mode** (`--skip-unchanged`):
   - Fetches existing records from Odoo before import
   - Compares field values to detect unchanged records
   - Skips records that haven't changed, making imports idempotent
   - Reports skip statistics in the final output

3. **Health-Aware Throttling** (`--adaptive-throttle`):
   - ThrottleController monitors server response times
   - Automatically adjusts delays based on server health
   - Records timing after each batch load operation
   - Reports throttle statistics at the end of the import

All 597 tests passing.
This adds a comprehensive workflow for managing VAT validation during contact imports, addressing VIES API timeouts in large imports.

Features:
- Local VAT format validation with regex patterns for all EU countries
- Checksum validation for BE, DE, NL
- Support for custom validators (e.g., Rust-based via PyO3)
- Save/restore VAT validation settings across companies
- Disable both VIES (online) and stdnum (local) validation
- Batch VIES validation with user notifications

CLI commands:
- vat get-settings: display current VAT validation settings
- vat disable: disable VAT validation and save settings to JSON
- vat restore: restore settings from a JSON file
- vat validate: batch VIES validation with notifications
- Add VIES/VAT Manager to the API reference (autodoc)
- Add Module Manager to the API reference (autodoc)
- Add a comprehensive VAT Validation Management guide section
- Include CLI usage examples, programmatic usage, and custom validators
- Add return type annotations to test functions
- Fix S110: add logging to try-except-pass blocks
- Fix C901: add noqa comments for complex functions
- Fix D417: add missing docstring parameter descriptions
- Fix E501: break long lines
- Fix RUF059: remove/rename unused variables
- Use Optional[str] instead of str | None for Python 3.9 compatibility
- Replace assert-based type narrowing with conditional checks
- validation.py: cast search_count comparisons to bool explicitly
- idempotent.py: rename a loop variable to avoid redefinition
- preflight.py: cast check_refs comparisons to bool explicitly
Enables parsing date/datetime columns with custom formats using Polars' vectorized str.to_date() and str.to_datetime() for efficient conversion.

Example usage:

```python
processor = Processor(
    mapping={},
    dataframe=df,
    date_formats={"birth_date": "%d/%m/%Y"},
    datetime_formats={"created_at": "%d/%m/%Y %H:%M:%S"},
)
```

This provides an alternative to Polars' automatic date detection (try_parse_dates=True) for cases where explicit format control is needed, such as ambiguous date formats (DD/MM vs MM/DD).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
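Why explicit formats matter for the ambiguous case mentioned above: the same string parses to different dates under DD/MM vs MM/DD. Shown with stdlib strptime, since the format directives are the same ones Polars' str.to_date() accepts:

```python
# "03/04/2024" is 3 April under a day-first format but 4 March under a
# month-first one; an explicit format pins one interpretation.
from datetime import datetime

raw = "03/04/2024"
as_dmy = datetime.strptime(raw, "%d/%m/%Y")  # day-first reading
as_mdy = datetime.strptime(raw, "%m/%d/%Y")  # month-first reading
```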
Addresses timeout issues when using --move-date with large inventory imports on production databases:

1. Longer timeout for the post-action (10 minutes)
   - Uses socket.setdefaulttimeout() for RPC calls
   - Handles timeout/connection errors gracefully
   - Returns success even on timeout (the server may have completed)

2. Extract product IDs before the post-action
   - New _get_product_ids_from_quants() helper function
   - Product IDs are captured while the connection is reliable
   - Allows move identification even after a timeout

3. Time window fallback (2 hours)
   - Replaces exact timestamp filtering
   - Finds moves by product + inventory location + recent create_date
   - Handles cases where the server completes after a client timeout

4. Diagnostic logging
   - Logs when product IDs are extracted and how many
   - Warns when the move date update is skipped (no products or failed post-action)
   - Helps troubleshoot issues in production

5. Comprehensive test coverage
   - Tests for timeout handling, product ID extraction, the move date update flow, and edge cases (empty products, failed post-action)

6. Updated documentation
   - Explains timeout handling behavior
   - Adds troubleshooting entries for timeout scenarios

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When reading CSV files with Polars, the library infers column types by examining the first N rows. If a column like 'default_code' has numeric values in early rows and alphanumeric values later (e.g., "eWB0071-ASSY-11"), Polars would infer it as integer and fail. This was causing errors in fail-mode imports:

    Could not read csv header: could not parse: "eWB0071-ASSY-11" as dtype `i64` at column `default_code`

Fixed by adding `infer_schema_length=0` to all pl.read_csv calls in preflight.py and importer.py. This forces Polars to read all columns as strings, which is the correct behavior for a data import tool where type inference is not needed.

Files fixed:
- src/odoo_data_flow/lib/preflight.py (3 occurrences)
- src/odoo_data_flow/importer.py (1 occurrence)

Note: sort.py already had this fix.
Addresses stability issues when importing to remote Odoo servers with limited workers (e.g., single-worker hosting):

1. New transient error patterns for server crash detection:
   - JSONDecodeError / "expecting value" (empty response)
   - "empty response", "incomplete read", "eof occurred"
   - "broken pipe", "connection aborted", "remotedisconnected"
   - "500" internal server error
   - "server closed connection"

2. Enhanced server overload detection in import_threaded.py:
   - Expanded pattern matching for crash indicators
   - Longer backoff for likely crashes (5s base, up to 120s max)
   - Standard backoff for overload (1s base, up to 60s max)
   - Clear messaging: "Server crash/empty response" vs "Server overload"

3. Tests for the new error patterns:
   - test_categorize_transient_json_decode_error
   - test_categorize_transient_empty_response
   - test_categorize_transient_connection_reset
   - test_categorize_transient_broken_pipe
   - test_categorize_transient_500_error

These changes help the tool recover automatically when the Odoo server crashes or restarts during large imports, which is common with single-worker configurations.
Changed the adaptive_throttle default from False to True across:
- the CLI (--adaptive-throttle/--no-adaptive-throttle)
- import_threaded.py
- importer.py

Since adaptive throttling only adds delays when server response times degrade, there is minimal overhead for fast servers. For production imports to remote servers (especially with limited workers), this provides automatic protection against server overload. Users who want maximum speed on local/powerful servers can disable it with --no-adaptive-throttle.
- Fix E501 line-length issues throughout the codebase
- Add noqa: C901 comments for complex functions
- Add a missing docstring argument for the connection parameter
- Fix test type annotations (Optional[dict] for the context param)
- Fix test formatting issues
- Sort __all__ exports alphabetically

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The vies_manager module was incorrectly trying to parse INI-style config files as YAML. This fix:
- Uses configparser (stdlib) to match conf_lib.py's approach
- Removes the unnecessary pyyaml dependency
- Updates the test to use INI format

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add an explicit `return None` for early returns in run_import()
- Update _get_env_from_config() to accept an Optional config parameter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive tests across multiple modules to reach the 85% coverage threshold. Key areas covered include checkpoint cleanup, phone normalization, config file handling, throttle controller, retry logic, validation edge cases, and various expression functions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Windows defaults to cp1252 encoding which cannot handle Cyrillic characters in geonames test data. Explicitly specifying UTF-8 encoding in all write_text() calls fixes the UnicodeEncodeError on Windows CI. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When importing related models like res.partner.bank, tracking_disable alone doesn't prevent chatter messages on the parent res.partner record. Added additional Odoo context keys:
- mail_create_nolog: don't log record creation
- mail_notrack: don't track field changes
- mail_activity_automation_skip: skip activity automation

These flags are now set automatically when tracking_disable is True, ensuring complete suppression of mail/chatter messages during imports.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, post-actions like action_apply_inventory were only executed when the import was fully successful, so stock.quant inventory adjustments remained in draft state when any records failed.

Changes:
- run_import now returns the id_map on partial failure instead of None, preserving the successfully imported record IDs
- import_cmd now runs the post-action whenever import_result is not None (i.e., whenever the import process ran, even with partial failures)
- Only critical failures (a process crash) skip the post-action
- Added an "Import Partially Complete" panel showing success/failure counts
The export was incorrectly handling many2many fields with the /.id format, returning only the first ID instead of all IDs, because both many2one (id, name) tuples and many2many [id1, id2, ...] lists were treated identically.

Now properly differentiated:
- many2one: extracts the single ID from the (id, name) tuple
- many2many/one2many: joins all IDs with a comma separator

Also fixes the field type inference to use 'char' for many2many /.id fields (comma-separated string) vs 'integer' for many2one.
Odoo returns [id, name] lists (not tuples) for many2one fields. The fix now properly distinguishes:
- many2one: [id, display_name] -> extract just the ID
- many2many/one2many: [id1, id2, ...] -> join with a comma
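The shape handling can be sketched as below. The relation type must come from fields_get(), since it cannot be reliably guessed from the value shape alone, so this hypothetical helper takes it as a parameter:

```python
# Sketch: render a relational read() value as the exported id string.

def ids_for_export(value, relation_type: str) -> str:
    """Format a m2o [id, name] pair or an x2many id list for export."""
    if not value:
        return ""
    if relation_type == "many2one":
        return str(value[0])                # [id, display_name] -> "id"
    return ",".join(str(i) for i in value)  # [id1, id2] -> "id1,id2"
```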
The hybrid export mode was only handling many2one fields for XML ID enrichment. Many2many and one2many fields returned incorrect results because the code assumed an (id, name) tuple format instead of a list of IDs.

Changes:
- Store relation_type (many2one/many2many/one2many) in fields_info
- Pass relation_type to enrichment tasks
- Rewrite _enrich_with_xml_ids to handle both field kinds:
  - many2one: single XML ID from the (id, name) tuple
  - many2many/one2many: comma-separated XML IDs from the [id1, id2, ...] list
  - Records without XML IDs are excluded from the output (no null placeholders)

Added tests:
- test_export_hybrid_mode_many2many_xml_ids: basic many2many /id export
- test_export_hybrid_mode_many2many_partial_xml_ids: some records lack XML IDs
- test_export_hybrid_mode_many2many_empty: an empty many2many returns None
- test_export_many2many_xml_ids_to_file: e2e test with file output
- test_export_one2many_xml_ids: one2many field handling
The 'force_company' context key is deprecated in Odoo 18 and causes warnings in the server logs. The modern approach is to use only 'allowed_company_ids' which is supported in Odoo 13+. Note: .with_company(ID) is a Python ORM method that cannot be called via RPC - it internally sets context keys. For RPC calls, allowed_company_ids is the correct approach.
Implement intelligent batch splitting based on estimated payload size to prevent server timeouts when importing records with large binary fields such as images.

Changes:
- Add _estimate_payload_size() and _estimate_row_size() helper functions
- Add a DEFAULT_MAX_BATCH_BYTES constant (5 MB default)
- Update _stream_csv_batches() to split batches when the size limit is exceeded
- Update _orchestrate_pass_2() to use size-based super-batch aggregation
- Add a --max-batch-bytes CLI option to the import command

Both Pass 1 (load) and Pass 2 (write deferred fields) now respect the size limit. Batches are split when either the record count or the payload size exceeds the configured limits. This fixes timeouts during product template imports with large images, where a batch of 10 records could produce 50 MB+ payloads.
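A sketch of the dual-limit splitting logic; the byte estimate (sum of UTF-8 encoded cell lengths) is an assumption about how _estimate_row_size() might work:

```python
# Sketch: accumulate rows until either the record count or the
# estimated payload size would exceed its limit, then start a new batch.

DEFAULT_MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB, per the commit message


def estimate_row_size(row) -> int:
    """Rough payload size of one row in bytes."""
    return sum(len(str(cell).encode("utf-8")) for cell in row)


def split_batches(rows, max_records, max_bytes=DEFAULT_MAX_BATCH_BYTES):
    batch, batch_bytes = [], 0
    for row in rows:
        size = estimate_row_size(row)
        if batch and (len(batch) >= max_records or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += size
    if batch:
        yield batch
```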
- Add a max_batch_bytes parameter to _orchestrate_streaming_pass_1
- Document max_batch_bytes in the import_data docstring
- Fix unused-variable warnings in tests (prefix with underscore)
- Shorten long docstrings to comply with the line-length limit
- Add noqa: C901 to complex functions in export_threaded
- Add type annotations to nested test functions
- Add 'assert error is not None' before using error in string operations
- Fix the MockColumn dtype annotation to use type[pl.DataType]
- Add a type annotation for the rows list in test_idempotent
- Change the output parameter in run_export to Optional[str]
- Fix a typeguard issue by using an intermediate Any-typed variable for json.loads
- Import Any in test_vies_manager
The Pass 2 deferred-field update was passing single integer IDs for many2many fields, causing an Odoo ValueError. Odoo requires the list format [id] or the command format [(6, 0, [ids])] for many2many writes.

Changes:
- Added field type detection using model.fields_get() to identify m2m fields
- Implemented proper value wrapping with the [(6, 0, [ids])] command format
- Added handling for comma-separated multiple values
- Added comprehensive unit tests for m2m Pass 2 handling

This fixes the ValueError: "Wrong value for product.template.accessory_product_ids" error during product template imports with accessory/optional product relations.
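The value wrapping can be sketched as follows, with the fields_get() lookup reduced to a `field_type` string parameter:

```python
# Sketch: many2many/one2many writes need the (6, 0, ids) replace
# command; many2one keeps a bare integer id.

def wrap_for_write(field_type: str, resolved_ids):
    """Shape resolved database ids for an Odoo write() call."""
    if field_type in ("many2many", "one2many"):
        return [(6, 0, list(resolved_ids))]  # replace the whole relation
    return resolved_ids[0]                   # many2one: single integer id
```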
The grouping logic needed to convert nested lists inside tuples to tuples recursively to make them hashable. The reverse conversion was also improved to properly restore the Odoo many2many command format [(6, 0, [ids])].
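A minimal sketch of this round-trip, with hypothetical helper names (the actual functions in the codebase may differ):

```python
def to_hashable(value):
    # Recursively convert lists (and tuples) to tuples so a value such
    # as [(6, 0, [1, 2])] can be used as a dict key when grouping writes.
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(v) for v in value)
    return value

def from_hashable(value):
    # Reverse conversion: restore the Odoo m2m command format.
    # A 3-tuple starting with 6 is treated as a (6, 0, [ids]) command.
    if isinstance(value, tuple) and len(value) == 3 and value[0] == 6:
        return (6, 0, list(value[2]))
    if isinstance(value, tuple):
        return [from_hashable(v) for v in value]
    return value
```

Without the recursive step, the inner `[ids]` list keeps the outer tuple unhashable and the grouping dict lookup fails.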
- Track serialization errors in failed_lines instead of silently dropping them
- Add logging for malformed rows in streaming mode
- Add a reconciliation check comparing total vs (created + failed)
- Display a warning panel when records are unaccounted for
- Add failed_records and unaccounted_records to import stats

Also fixes:
- Python 3.9 compatibility in test_geonames.py (Path | None -> Optional)
- Remove a broken test file with non-existent function imports
- Update a test for the serialization-error behavior change

Closes #178

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
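The reconciliation check amounts to a simple invariant: every row read must end up either created or failed. A sketch, with illustrative names rather than the actual stats structure:

```python
def reconcile_counts(total, created, failed):
    # Every record read must be either created or failed; anything left
    # over was silently dropped somewhere in the pipeline.
    unaccounted = total - (created + failed)
    stats = {
        "total_records": total,
        "created_records": created,
        "failed_records": failed,
        "unaccounted_records": unaccounted,
    }
    if unaccounted > 0:
        print(f"WARNING: {unaccounted} record(s) unaccounted for")
    return stats
```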
Add an explicit dict[str, Any] type annotation to fix a mypy error where update_vals holds both int values (many2one) and list[tuple] values (many2many commands).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When deferred-field values are not in id_map (which only contains records from the current model import), the code now checks whether the value looks like an XML ID (contains a dot separator, like module.name) and tries to resolve it via ir.model.data.

This fixes the issue where cross-model references such as:
- user_id referencing res.users
- state_id referencing res.country.state
- property_purchase_currency_id referencing res.currency

were not being resolved, because they were not in the id_map built during Pass 1 (which only contains res.partner records in this case).

The fix applies to both many2one and many2many deferred fields.

Closes #179

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
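The fallback order can be sketched as follows; `xmlid_lookup` is a stand-in for the ir.model.data search the real code performs, and the dot heuristic mirrors the "module.name" check described above:

```python
def resolve_deferred_value(value, id_map, xmlid_lookup):
    """Resolve a deferred-field cell to a database id.

    ``id_map`` holds ids from the current model's Pass 1 import;
    ``xmlid_lookup`` stands in for an ir.model.data lookup
    taking (module, name) and returning a res_id or None.
    """
    if value in id_map:
        return id_map[value]
    if isinstance(value, str) and "." in value:
        # Looks like an XML ID such as "base.user_admin": try a
        # cross-model resolution instead of failing outright.
        module, name = value.split(".", 1)
        return xmlid_lookup(module, name)
    return None
```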
Add a TestPreparePass2DataCrossModelResolution class with 4 tests:
- many2one cross-model reference resolution via ir.model.data
- XML ID resolution for columns without the /id suffix
- many2many cross-model reference resolution
- verification that non-XML-ID values are used directly

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
#180 - Fix nested fail file directory
When the source file is in a directory matching the env_name (e.g., data/prod/file.csv with prod_connection.conf), no longer create a nested data/prod/prod/ directory.

#181 - Better error messages for existing records
Added detection for "already exists" patterns (duplicate key, unique constraint, circular references). Error messages now suggest using the --skip-existing flag.

#182 - Stop accumulating timestamped fail files
Fail files now always use the same name (model_fail.csv) and get overwritten instead of creating timestamped copies.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
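The #180 and #182 fixes together amount to a path rule like the following sketch; the function name and signature are hypothetical, not the actual helper in the codebase:

```python
from pathlib import Path

def fail_file_path(source_file, env_name, model):
    # Keep fail files next to the source, but avoid nesting
    # data/prod/prod/ when the source directory already matches
    # the environment name (#180).
    parent = Path(source_file).parent
    out_dir = parent if parent.name == env_name else parent / env_name
    # Fixed name: the file is overwritten on each run instead of
    # accumulating timestamped copies (#182).
    return out_dir / f"{model}_fail.csv"
```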
Odoo returns datetime strings in the format '2026-02-27 05:38:37' (space separator), but Polars cast(Datetime, strict=False) cannot parse this format and silently returns null.

Changed ODOO_TO_POLARS_MAP to keep date/datetime fields as strings, preserving the values throughout the export process.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
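The failure hinges on the space-separated format. Since the exporter now keeps these fields as strings, a downstream consumer can parse them with an explicit format; a stdlib sketch (the exporter itself works through Polars, not `datetime.strptime`):

```python
from datetime import datetime

# Odoo's server datetime format: space separator, no 'T', no timezone.
ODOO_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_odoo_datetime(value):
    # An explicit format string always works, where a lenient
    # ISO-8601-oriented cast can fail on the space separator.
    return datetime.strptime(value, ODOO_DATETIME_FORMAT)
```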
…187)

Added a --sanitize-newlines flag to the export command that optionally replaces embedded newlines in text/char/html fields with a configurable delimiter. This prevents CSV corruption when text fields contain embedded newlines.

Default behavior: newlines are preserved (no sanitization)
With the flag: newlines are replaced with the specified string (e.g., " | ")

Changes:
- Added a sanitize_newlines() function to clean_expr.py
- Added a sanitize_newlines parameter to _clean_and_transform_batch()
- Added the --sanitize-newlines CLI flag to the export command
- Added 15 unit tests for newline sanitization

Usage: odoo-data-flow export --sanitize-newlines " | " ...
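The replacement itself is straightforward string work; a sketch of the idea behind `sanitize_newlines()` (the real version in clean_expr.py operates on Polars expressions, not plain strings):

```python
def sanitize_newlines(text, replacement=" | "):
    """Replace embedded newlines in a text cell with a delimiter."""
    if text is None:
        return None
    # Normalize CRLF first so it is replaced once, then bare LF and CR.
    return (
        text.replace("\r\n", replacement)
        .replace("\n", replacement)
        .replace("\r", replacement)
    )
```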
- Add explicit type annotations for dict[str, pl.DataType] in test files to fix mypy covariance issues with Polars DataType classes
- Remove unused imports (datetime) from importer.py and writer.py
- Format test assertions to comply with line-length limits
Summary
This PR addresses critical issues with the deferred-fields feature to make odoo-data-flow production-ready for ETL operations.
Key Fixes
- **Fix deferred-fields matching** - Handle both `field` and `field/id` formats correctly
- **Add XML-ID resolution for non-self-referencing fields** - Support fields like `responsible_id` that reference other models (e.g., `res.users`) via the `_resolve_external_id_for_pass2()` helper function
- **Fix batch rejection error handling** - Records no longer inherit the same error message; `_extract_per_row_errors()` parses per-row errors from Odoo's response
- **Add binary field deferral support** - Allow deferring image fields like `image_1920`
- **Add `--company-id` CLI parameter** - Simplify multicompany imports by setting `allowed_company_ids` and `force_company` in context
- **Fix CLI deferred-fields parsing** - Convert comma-separated string to list
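A helper in the spirit of `_extract_per_row_errors()` might look like the following sketch. The shape of the `messages` list in Odoo's `load()` response (the `rows`/`record`/`message` keys) is assumed here, not verified against a specific Odoo version:

```python
def extract_per_row_errors(messages):
    """Map row index -> error message from an Odoo load() response."""
    errors = {}
    for msg in messages or []:
        # Entries may carry a "rows" range {"from": i, "to": j},
        # or a single "record" index.
        rows = msg.get("rows") or {}
        start = rows.get("from", msg.get("record"))
        if start is None:
            continue
        end = rows.get("to", start)
        for idx in range(start, end + 1):
            errors.setdefault(idx, msg.get("message", "Unknown error"))
    return errors
```

With a per-row map like this, only the rows Odoo actually flagged get that message, instead of the whole batch inheriting one failure reason.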
Tested With
Test plan
- Import with `--deferred-fields`
- Import with `--company-id`
- Import images with `--deferred-fields image_1920`