Skip to content

Flag tracebacks in observation summary and verify extensions after build#848

Draft
Cybis320 wants to merge 3 commits intoprereleasefrom
feature-flag-tracebacks
Draft

Flag tracebacks in observation summary and verify extensions after build#848
Cybis320 wants to merge 3 commits intoprereleasefrom
feature-flag-tracebacks

Conversation

@Cybis320
Copy link
Copy Markdown
Contributor

  • Scan session log files for tracebacks during finalization and add traceback_count to the observation summary when > 0
  • Verify all compiled extensions from setup.py are importable after RMS_Update build, fail the update if any are missing

@Cybis320 Cybis320 requested a review from g7gpr March 16, 2026 16:21
Copy link
Copy Markdown
Contributor

@g7gpr g7gpr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you break the traceback counting work into it's own function? A big rewrite is in progress on ObservationSummary, splitting this into its own function will make the merge easier. Please add the traceback_count even when it is zero, otherwise folks will complain that it is 'missing'.

Copy link
Copy Markdown
Contributor

@g7gpr g7gpr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this discovers if the extensions can be imported, but not if they include the specific names in the namespace.

If a problem is found, what does this do? Just send a message to the console?

@Cybis320
Copy link
Copy Markdown
Contributor Author

Right, if running RMS_Update attended, the user sees the error and can investigate. If unattended, the tracebacks are logged and the weblog can be changed to display a specific alert. Retries don't make sense here - if it fails to build, there is something wrong with the host.

@g7gpr
Copy link
Copy Markdown
Contributor

g7gpr commented Mar 17, 2026

OK - I understand the intended functionality now; and it's safe, and I'll approve the merge and deal with the conflicts down the track. I don't agree the retries don't make sense. Something was wrong with the host at that moment; but 30 seconds later after the file system has had chance to flush buffers and calm down, maybe it would work?

@dvida dvida requested a review from Copilot March 20, 2026 14:21
@dvida
Copy link
Copy Markdown
Contributor

dvida commented Mar 20, 2026

This one is still set as draft, is it ready now?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens post-run validation by (1) surfacing Python tracebacks in the observation summary and (2) adding a post-build verification step to ensure compiled extensions are actually importable after RMS_Update.sh completes.

Changes:

  • Add traceback_count to observation summary output and populate it by scanning session log files for Python tracebacks.
  • Add a post-build step in RMS_Update.sh that parses extension module names from setup.py and attempts to import each one, failing the update on missing imports.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
Scripts/RMS_Update.sh Adds a compiled-extension import verification step after pip install -e . to detect silent build/import failures.
RMS/Formats/ObservationSummary.py Adds traceback_count to the summary ordering and records it during finalization by scanning recent log files.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Scripts/RMS_Update.sh
Comment on lines +1576 to +1578
)

if [ $? -ne 0 ]; then
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this script runs with set -Eeuo pipefail (errexit), a non-zero exit from the python command inside verify_output=$(...) will cause the script to abort immediately (triggering the ERR trap) before the subsequent if [ $? -ne 0 ] block can print the list of failed modules. Wrap the command substitution in a conditional (e.g., if ! verify_output=$(python ...); then ... fi) or temporarily disable errexit around it so the intended error reporting executes.

Suggested change
)
if [ $? -ne 0 ]; then
) || verify_status=$?
if [ "${verify_status-0}" -ne 0 ]; then

Copilot uses AI. Check for mistakes.
Comment thread Scripts/RMS_Update.sh
Comment on lines +1569 to +1570
except ImportError:
failed.append(name)
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verification helper only catches ImportError, but compiled extensions often fail with OSError/ModuleNotFoundError for missing shared libs or symbol errors. In those cases the script may exit without emitting a FAILED: line (and stderr isn’t captured), so the bash-side parsing can’t report what went wrong. Consider catching a broader exception (or printing the exception details) and capturing stderr (e.g., redirect 2>&1) so failures are actionable.

Suggested change
except ImportError:
failed.append(name)
except (ImportError, ModuleNotFoundError, OSError) as e:
# Record the module name along with basic error details so the caller can see why it failed.
failed.append(f"{name} ({e.__class__.__name__}: {e})")
except Exception as e:
# Catch any other unexpected exceptions to ensure we still emit a FAILED: line.
failed.append(f"{name} (Unexpected {e.__class__.__name__}: {e})")

Copilot uses AI. Check for mistakes.
Comment on lines +1341 to +1344

# Scan log files for tracebacks during this session
addObsParam(obs_db_conn, "traceback_count", countTracebacksInLogs(config))

Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says to add traceback_count to the observation summary when > 0, but the current implementation always writes the key (including when the count is 0). If you want the field omitted unless tracebacks exist, compute the count and only call addObsParam when it’s > 0.

Copilot uses AI. Check for mistakes.
Comment on lines +1238 to +1246
# Parse the start time — handle both with and without fractional seconds
for fmt in ("%Y-%m-%d %H:%M:%S%z", "%Y-%m-%d %H:%M:%S"):
try:
session_start = datetime.datetime.strptime(session_start_str.replace("+00:00", ""), fmt)
break
except ValueError:
continue
else:
return 0
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

countTracebacksInLogs claims to handle start times “with and without fractional seconds”, but the parsing formats don’t include %f. Also, session_start_str.replace("+00:00", "") guarantees the "%Y-%m-%d %H:%M:%S%z" parse attempt can never succeed, making that branch dead code. It would be more robust to parse the stored value with datetime.fromisoformat(...) (and strip tzinfo) or explicitly support both %Y-%m-%d %H:%M:%S.%f%z and %Y-%m-%d %H:%M:%S%z without removing the offset first.

Suggested change
# Parse the start time — handle both with and without fractional seconds
for fmt in ("%Y-%m-%d %H:%M:%S%z", "%Y-%m-%d %H:%M:%S"):
try:
session_start = datetime.datetime.strptime(session_start_str.replace("+00:00", ""), fmt)
break
except ValueError:
continue
else:
return 0
# Parse the start time — handle both with and without fractional seconds and optional timezone
try:
# Prefer ISO 8601 parsing when available (Python 3.7+)
session_dt = datetime.datetime.fromisoformat(session_start_str)
except AttributeError:
# Fallback for older Python versions without fromisoformat
session_dt = None
for fmt in (
"%Y-%m-%d %H:%M:%S.%f%z",
"%Y-%m-%d %H:%M:%S%z",
"%Y-%m-%d %H:%M:%S.%f",
"%Y-%m-%d %H:%M:%S",
):
try:
session_dt = datetime.datetime.strptime(session_start_str, fmt)
break
except ValueError:
continue
if session_dt is None:
return 0
except ValueError:
# Malformed timestamp
return 0
# Normalize to naive UTC datetime for comparison with file modification times
if session_dt.tzinfo is not None:
try:
session_dt = session_dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)
except AttributeError:
# datetime.timezone may not be available on very old Python;
# assume stored time is already in UTC and just strip tzinfo.
session_dt = session_dt.replace(tzinfo=None)
session_start = session_dt

Copilot uses AI. Check for mistakes.
Comment on lines +1265 to +1269
try:
with open(filepath, 'r', errors='replace') as f:
for line in f:
if 'Traceback (most recent call last)' in line:
traceback_count += 1
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file still contains Python 2 compatibility branches, but open(..., errors='replace') is Python 3-only. On Python 2 this will raise TypeError and be swallowed by the except, causing traceback scanning to silently do nothing. If Python 2 support is still required here, use io.open(..., errors='replace') (or conditional handling) to keep the feature working across versions.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants