
[PyTorch Debug] Skip logging stats if unsupported #2652

Open

pggPL wants to merge 1 commit into NVIDIA:main from pggPL:skip_not_necessary_tensors

Conversation

pggPL (Collaborator) commented on Feb 5, 2026

Description

If a layer runs in high precision and LogFp8TensorStats is invoked, it raises an error. This is very inconvenient, so this PR changes the behavior to skip the stat collection and emit a warning instead.
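A minimal sketch of the new pattern, assuming simplified names (log_fp8_stats and its arguments are illustrative stand-ins, not the actual Transformer Engine signatures):

```python
import warnings

def log_fp8_stats(layer_name, tensor_name, tensor, quantizer=None):
    # Previously this path effectively asserted that a quantizer exists,
    # which made high-precision layers fail hard. Now such layers are
    # skipped with a warning and an early return.
    if quantizer is None:
        warnings.warn(
            f"Skipping FP8 stats for {layer_name}.{tensor_name}: "
            "layer runs in high precision."
        )
        return
    # ... compute and log FP8 stats as before ...
```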

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Replaced assertions with warnings and early returns in LogFp8TensorStats and LogNvfp4TensorStats when the layer runs in high precision or the quantizer/quantized tensor type is incompatible
  • Added the warnings import to both files

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
greptile-apps bot (Contributor) commented on Feb 5, 2026

Greptile Overview

Greptile Summary

Replaces hard errors with graceful warnings when debug logging features encounter unsupported precision modes (high precision or incompatible quantizers).

Changes:

  • Converts assertions to conditional checks with warnings.warn() and early returns in both LogFp8TensorStats and LogNvfp4TensorStats
  • Adds import warnings to both files
  • Handles three scenarios: missing quantizer (high precision), wrong quantized tensor type, and incompatible quantizer type (see the sketch after this list)
  • Warning messages include layer name, tensor name, and clear explanation of why logging was skipped
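
A condensed sketch of those three scenarios, assuming the type names from the sequence diagram below; the helper should_skip_stats and the stand-in classes are hypothetical, since the real checks live inside each feature's inspect_tensor:

```python
import warnings

# Stand-ins for the Transformer Engine types named in this review;
# the real classes are imported from transformer_engine.pytorch.
class QuantizedTensor: ...
class NVFP4Quantizer: ...

def should_skip_stats(layer_name, tensor_name, tensor, quantizer):
    """Hypothetical helper mirroring the three skip-and-warn scenarios."""
    # 1. Missing quantizer: the layer runs in high precision.
    if quantizer is None:
        warnings.warn(
            f"Skipping stats for {layer_name}.{tensor_name}: "
            "layer runs in high precision."
        )
        return True
    # 2. Wrong quantized tensor type (LogFp8TensorStats path).
    if not isinstance(tensor, QuantizedTensor):
        warnings.warn(
            f"Skipping stats for {layer_name}.{tensor_name}: "
            "tensor is not a QuantizedTensor."
        )
        return True
    # 3. Incompatible quantizer type (LogNvfp4TensorStats path).
    if not isinstance(quantizer, NVFP4Quantizer):
        warnings.warn(
            f"Skipping stats for {layer_name}.{tensor_name}: "
            "quantizer type is incompatible with this feature."
        )
        return True
    return False
```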

Impact:
This allows mixed-precision training workflows where some layers run in high precision while debug logging is enabled globally, improving developer experience.

Confidence Score: 5/5

  • Safe to merge - well-designed defensive change that improves usability
  • Changes are minimal, well-scoped, and improve error handling by replacing hard failures with graceful degradation. The logic is straightforward and follows consistent patterns across both files.
  • No files require special attention

Important Files Changed

  • transformer_engine/debug/features/log_fp8_tensor_stats.py: converted assertions to graceful warnings with early returns for unsupported precision modes
  • transformer_engine/debug/features/log_nvfp4_tensor_stats.py: converted assertions and ValueError to graceful warnings with early returns for unsupported precision modes

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant LogFp8Stats as LogFp8TensorStats.inspect_tensor()
    participant LogNvfp4Stats as LogNvfp4TensorStats.inspect_tensor()
    
    Note over Caller,LogNvfp4Stats: High Precision Layer (No Quantizer)
    
    Caller->>LogFp8Stats: inspect_tensor(quantizer=None)
    LogFp8Stats->>LogFp8Stats: Check if quantizer is None
    LogFp8Stats->>LogFp8Stats: warnings.warn("layer runs in high precision")
    LogFp8Stats-->>Caller: return (skip logging)
    
    Caller->>LogNvfp4Stats: inspect_tensor(quantizer=None)
    LogNvfp4Stats->>LogNvfp4Stats: Check if quantizer is None
    LogNvfp4Stats->>LogNvfp4Stats: warnings.warn("layer runs in high precision")
    LogNvfp4Stats-->>Caller: return (skip logging)
    
    Note over Caller,LogNvfp4Stats: Incompatible Precision Type
    
    Caller->>LogFp8Stats: inspect_tensor(quantized_tensor=WrongType)
    LogFp8Stats->>LogFp8Stats: Check if isinstance(quantized_tensor, QuantizedTensor)
    LogFp8Stats->>LogFp8Stats: warnings.warn("incompatible precision")
    LogFp8Stats-->>Caller: return (skip logging)
    
    Caller->>LogNvfp4Stats: inspect_tensor(quantizer=WrongType)
    LogNvfp4Stats->>LogNvfp4Stats: Check if isinstance(quantizer, NVFP4Quantizer)
    LogNvfp4Stats->>LogNvfp4Stats: warnings.warn("incompatible precision")
    LogNvfp4Stats-->>Caller: return (skip logging)
```

greptile-apps bot (Contributor) left a comment

2 files reviewed, no comments

pggPL (Collaborator, Author) commented on Feb 5, 2026

/te-ci pytorch

timmoon10 (Collaborator) left a comment

LGTM

