Flag tracebacks in observation summary and verify extensions after build#848
Flag tracebacks in observation summary and verify extensions after build#848Cybis320 wants to merge 3 commits intoprereleasefrom
Conversation
g7gpr
left a comment
There was a problem hiding this comment.
Can you break the traceback counting work into it's own function? A big rewrite is in progress on ObservationSummary, splitting this into its own function will make the merge easier. Please add the traceback_count even when it is zero, otherwise folks will complain that it is 'missing'.
g7gpr
left a comment
There was a problem hiding this comment.
I think this discovers if the extensions can be imported, but not if they include the specific names in the namespace.
If a problem is found, what does this do? Just send a message to the console?
|
Right, if running RMS_Update attended, the user sees the error and can investigate. If unattended, the tracebacks are logged and the weblog can be changed to display a specific alert. Retries don't make sense here - if it fails to build, there is something wrong with the host. |
|
OK - I understand the intended functionality now; and it's safe, and I'll approve the merge and deal with the conflicts down the track. I don't agree the retries don't make sense. Something was wrong with the host at that moment; but 30 seconds later after the file system has had chance to flush buffers and calm down, maybe it would work? |
|
This one is still set as draft, is it ready now? |
There was a problem hiding this comment.
Pull request overview
This PR tightens post-run validation by (1) surfacing Python tracebacks in the observation summary and (2) adding a post-build verification step to ensure compiled extensions are actually importable after RMS_Update.sh completes.
Changes:
- Add
traceback_countto observation summary output and populate it by scanning session log files for Python tracebacks. - Add a post-build step in
RMS_Update.shthat parses extension module names fromsetup.pyand attempts to import each one, failing the update on missing imports.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Scripts/RMS_Update.sh | Adds a compiled-extension import verification step after pip install -e . to detect silent build/import failures. |
| RMS/Formats/ObservationSummary.py | Adds traceback_count to the summary ordering and records it during finalization by scanning recent log files. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
|
|
||
| if [ $? -ne 0 ]; then |
There was a problem hiding this comment.
Because this script runs with set -Eeuo pipefail (errexit), a non-zero exit from the python command inside verify_output=$(...) will cause the script to abort immediately (triggering the ERR trap) before the subsequent if [ $? -ne 0 ] block can print the list of failed modules. Wrap the command substitution in a conditional (e.g., if ! verify_output=$(python ...); then ... fi) or temporarily disable errexit around it so the intended error reporting executes.
| ) | |
| if [ $? -ne 0 ]; then | |
| ) || verify_status=$? | |
| if [ "${verify_status-0}" -ne 0 ]; then |
| except ImportError: | ||
| failed.append(name) |
There was a problem hiding this comment.
The verification helper only catches ImportError, but compiled extensions often fail with OSError/ModuleNotFoundError for missing shared libs or symbol errors. In those cases the script may exit without emitting a FAILED: line (and stderr isn’t captured), so the bash-side parsing can’t report what went wrong. Consider catching a broader exception (or printing the exception details) and capturing stderr (e.g., redirect 2>&1) so failures are actionable.
| except ImportError: | |
| failed.append(name) | |
| except (ImportError, ModuleNotFoundError, OSError) as e: | |
| # Record the module name along with basic error details so the caller can see why it failed. | |
| failed.append(f"{name} ({e.__class__.__name__}: {e})") | |
| except Exception as e: | |
| # Catch any other unexpected exceptions to ensure we still emit a FAILED: line. | |
| failed.append(f"{name} (Unexpected {e.__class__.__name__}: {e})") |
|
|
||
| # Scan log files for tracebacks during this session | ||
| addObsParam(obs_db_conn, "traceback_count", countTracebacksInLogs(config)) | ||
|
|
There was a problem hiding this comment.
The PR description says to add traceback_count to the observation summary when > 0, but the current implementation always writes the key (including when the count is 0). If you want the field omitted unless tracebacks exist, compute the count and only call addObsParam when it’s > 0.
| # Parse the start time — handle both with and without fractional seconds | ||
| for fmt in ("%Y-%m-%d %H:%M:%S%z", "%Y-%m-%d %H:%M:%S"): | ||
| try: | ||
| session_start = datetime.datetime.strptime(session_start_str.replace("+00:00", ""), fmt) | ||
| break | ||
| except ValueError: | ||
| continue | ||
| else: | ||
| return 0 |
There was a problem hiding this comment.
countTracebacksInLogs claims to handle start times “with and without fractional seconds”, but the parsing formats don’t include %f. Also, session_start_str.replace("+00:00", "") guarantees the "%Y-%m-%d %H:%M:%S%z" parse attempt can never succeed, making that branch dead code. It would be more robust to parse the stored value with datetime.fromisoformat(...) (and strip tzinfo) or explicitly support both %Y-%m-%d %H:%M:%S.%f%z and %Y-%m-%d %H:%M:%S%z without removing the offset first.
| # Parse the start time — handle both with and without fractional seconds | |
| for fmt in ("%Y-%m-%d %H:%M:%S%z", "%Y-%m-%d %H:%M:%S"): | |
| try: | |
| session_start = datetime.datetime.strptime(session_start_str.replace("+00:00", ""), fmt) | |
| break | |
| except ValueError: | |
| continue | |
| else: | |
| return 0 | |
| # Parse the start time — handle both with and without fractional seconds and optional timezone | |
| try: | |
| # Prefer ISO 8601 parsing when available (Python 3.7+) | |
| session_dt = datetime.datetime.fromisoformat(session_start_str) | |
| except AttributeError: | |
| # Fallback for older Python versions without fromisoformat | |
| session_dt = None | |
| for fmt in ( | |
| "%Y-%m-%d %H:%M:%S.%f%z", | |
| "%Y-%m-%d %H:%M:%S%z", | |
| "%Y-%m-%d %H:%M:%S.%f", | |
| "%Y-%m-%d %H:%M:%S", | |
| ): | |
| try: | |
| session_dt = datetime.datetime.strptime(session_start_str, fmt) | |
| break | |
| except ValueError: | |
| continue | |
| if session_dt is None: | |
| return 0 | |
| except ValueError: | |
| # Malformed timestamp | |
| return 0 | |
| # Normalize to naive UTC datetime for comparison with file modification times | |
| if session_dt.tzinfo is not None: | |
| try: | |
| session_dt = session_dt.astimezone(datetime.timezone.utc).replace(tzinfo=None) | |
| except AttributeError: | |
| # datetime.timezone may not be available on very old Python; | |
| # assume stored time is already in UTC and just strip tzinfo. | |
| session_dt = session_dt.replace(tzinfo=None) | |
| session_start = session_dt |
| try: | ||
| with open(filepath, 'r', errors='replace') as f: | ||
| for line in f: | ||
| if 'Traceback (most recent call last)' in line: | ||
| traceback_count += 1 |
There was a problem hiding this comment.
This file still contains Python 2 compatibility branches, but open(..., errors='replace') is Python 3-only. On Python 2 this will raise TypeError and be swallowed by the except, causing traceback scanning to silently do nothing. If Python 2 support is still required here, use io.open(..., errors='replace') (or conditional handling) to keep the feature working across versions.
traceback_countto the observation summary when > 0