Skip to content

feat(storage): log additional bytes received from GCS in read path#17423

Open
nidhiii-27 wants to merge 7 commits into
mainfrom
feat/log-additional-bytes
Open

feat(storage): log additional bytes received from GCS in read path#17423
nidhiii-27 wants to merge 7 commits into
mainfrom
feat/log-additional-bytes

Conversation

@nidhiii-27

Copy link
Copy Markdown
Contributor

It has been found that GCS can occasionally send additional bytes while reading from stream. This scenario should be logged properly for debugging and tracking purposes.

Fixes: 475824752

@nidhiii-27 nidhiii-27 requested a review from a team as a code owner June 11, 2026 09:40

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces warning logs when more bytes are downloaded than requested from GCS, implemented in both _download.py and blob.py, along with a new unit test. The review feedback highlights several critical and medium-severity issues: an undefined headers variable in _download.py that will cause a NameError, a missing function call to _get_headers in blob.py that will fail at runtime, and a flawed unit test that does not actually execute the download loop due to incorrect mocking of download.finished and consume_next_chunk. Additionally, suggestions are provided to ensure fallback values are correctly applied when bucket or object names are None, and to align the range-handling logic in blob.py with _download.py for consistency.

Comment thread packages/google-cloud-storage/google/cloud/storage/_media/_download.py Outdated
Comment thread packages/google-cloud-storage/google/cloud/storage/blob.py Outdated
Comment thread packages/google-cloud-storage/google/cloud/storage/_media/_download.py Outdated
Comment thread packages/google-cloud-storage/google/cloud/storage/blob.py Outdated
Comment thread packages/google-cloud-storage/tests/unit/test_blob.py
It has been found that GCS can occasionally send additional bytes while reading from stream.
This scenario should be logged properly for debugging and tracking purposes.

Fixes: 475824752

[Generated-by: AI]
@nidhiii-27 nidhiii-27 force-pushed the feat/log-additional-bytes branch from d7a9e73 to ad8a555 Compare June 11, 2026 10:23
@nidhiii-27

Copy link
Copy Markdown
Contributor Author

Addressed all PR review comments.

@nidhiii-27 nidhiii-27 marked this pull request as ready for review June 16, 2026 05:50
@nidhiii-27 nidhiii-27 force-pushed the feat/log-additional-bytes branch from b09b205 to d0dc047 Compare June 24, 2026 08:34
@@ -66,6 +66,8 @@ def __init__(
end=None,
headers=None,
retry=DEFAULT_RETRY,
client_info_bucket_name=None,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add these params in the doc string as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Co-authored by AI Agent

Comment on lines +496 to +501
if self.start is not None and self.start < 0 and self.end is None:
requested_length = -self.start
elif self.start is not None and self.end is not None:
requested_length = self.end - self.start + 1
elif self.start is None and self.end is not None:
requested_length = self.end + 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this if else logic , can be made more readable a

if start is not None and end is not None:
    len = end - start + 1
elif start is None:
    len = end + 1
elif start < 0:
    len = -start

and what about start > 0 ?

@nidhiii-27

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I have refactored the logic to make it cleaner and also handled the start > 0 with end is None case, which requests the remaining bytes from start to the end of the file.

Co-authored by AI Agent

@@ -66,6 +70,8 @@ def __init__(
end=None,
headers=None,
retry=DEFAULT_RETRY,
client_info_bucket_name=None,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're introduced this new param , but no caller is passing these values. @nidhiii-27

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have reverted all changes (reverted the warning logging and the unused client_info parameters) in _download.py and consolidated the warning logging logic inside blob.py after the download loop. I also expanded the requested_length calculation in blob.py to handle all range request cases correctly.

Co-authored by AI Agent

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of doing changes in this files, why not after these lines -

while not download.finished:
download.consume_next_chunk(transport, timeout=timeout)
? ?

you don't have to pass bucket_name and object_name ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have reverted all changes (reverted the warning logging and the unused client_info parameters) in _download.py and consolidated the warning logging logic inside blob.py after the download loop. I also expanded the requested_length calculation in blob.py to handle all range request cases correctly.

Co-authored by AI Agent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants