feat(storage): log additional bytes received from GCS in read path#17423
feat(storage): log additional bytes received from GCS in read path#17423nidhiii-27 wants to merge 7 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces warning logs when more bytes are downloaded than requested from GCS, implemented in both _download.py and blob.py, along with a new unit test. The review feedback highlights several critical and medium-severity issues: an undefined headers variable in _download.py that will cause a NameError, a missing function call to _get_headers in blob.py that will fail at runtime, and a flawed unit test that does not actually execute the download loop due to incorrect mocking of download.finished and consume_next_chunk. Additionally, suggestions are provided to ensure fallback values are correctly applied when bucket or object names are None, and to align the range-handling logic in blob.py with _download.py for consistency.
It has been found that GCS can occasionally send additional bytes while reading from stream. This scenario should be logged properly for debugging and tracking purposes. Fixes: 475824752 [Generated-by: AI]
d7a9e73 to
ad8a555
Compare
|
Addressed all PR review comments. |
[Generated-by: AI]
b09b205 to
d0dc047
Compare
| @@ -66,6 +66,8 @@ def __init__( | |||
| end=None, | |||
| headers=None, | |||
| retry=DEFAULT_RETRY, | |||
| client_info_bucket_name=None, | |||
There was a problem hiding this comment.
add these params in the doc string as well.
There was a problem hiding this comment.
Done
Co-authored by AI Agent
| if self.start is not None and self.start < 0 and self.end is None: | ||
| requested_length = -self.start | ||
| elif self.start is not None and self.end is not None: | ||
| requested_length = self.end - self.start + 1 | ||
| elif self.start is None and self.end is not None: | ||
| requested_length = self.end + 1 |
There was a problem hiding this comment.
this if else logic , can be made more readable a
if start is not None and end is not None:
len = end - start + 1
elif start is None:
len = end + 1
elif start < 0:
len = -startand what about start > 0 ?
There was a problem hiding this comment.
Done. I have refactored the logic to make it cleaner and also handled the start > 0 with end is None case, which requests the remaining bytes from start to the end of the file.
Co-authored by AI Agent
[Generated-by: AI]
| @@ -66,6 +70,8 @@ def __init__( | |||
| end=None, | |||
| headers=None, | |||
| retry=DEFAULT_RETRY, | |||
| client_info_bucket_name=None, | |||
There was a problem hiding this comment.
you're introduced this new param , but no caller is passing these values. @nidhiii-27
There was a problem hiding this comment.
Good point. I have reverted all changes (reverted the warning logging and the unused client_info parameters) in _download.py and consolidated the warning logging logic inside blob.py after the download loop. I also expanded the requested_length calculation in blob.py to handle all range request cases correctly.
Co-authored by AI Agent
There was a problem hiding this comment.
instead of doing changes in this files, why not after these lines -
google-cloud-python/packages/google-cloud-storage/google/cloud/storage/blob.py
Lines 1123 to 1124 in 8cb77d9
you don't have to pass bucket_name and object_name ?
There was a problem hiding this comment.
Good point. I have reverted all changes (reverted the warning logging and the unused client_info parameters) in _download.py and consolidated the warning logging logic inside blob.py after the download loop. I also expanded the requested_length calculation in blob.py to handle all range request cases correctly.
Co-authored by AI Agent
…nload.py to blob.py [Generated-by: AI]
…c readability [Generated-by: AI]
…in blob.py [Generated-by: AI]
It has been found that GCS can occasionally send additional bytes while reading from stream. This scenario should be logged properly for debugging and tracking purposes.
Fixes: 475824752