Gapic deduplication experimentation by hebaalazzeh · Pull Request #17573 · googleapis/google-cloud-python

hebaalazzeh · 2026-06-25T09:39:53Z

GAPIC Client Deduplication (Shared Utility Layer)

Goal

Reduce package size and improve maintainability of generated GAPIC clients by deduplicating redundant static helper methods and configuration logic into a shared layer.

Background

Currently, every generated service client (e.g., in google-cloud-compute, google-cloud-aiplatform) contains hundreds of lines of identical static methods for:

mTLS endpoint resolution (_get_default_mtls_endpoint)
Client certificate usage checks (_use_client_cert_effective)
Environment variable reading (_read_environment_variables)

This duplication inflates package size and makes horizontal fixes difficult.

Proposed Approach (Shared Utility Layer)

We are introducing a shared_utils.py module within the generated services package. Common logic is extracted into this module. The generated service clients are refactored to delegate to these shared functions via thin pass-through stubs.

Benefits:

Significant LOC Reduction: Thousands of lines of code saved across massive libraries.
Backward Compatibility Preserved: Backward-compatible stubs on the client class ensure that advanced users monkey-patching "private" methods continue to function.
Type Safety: Maintains strict type hints.

Manual Validation Results (`google-cloud-kms`)

A manual experiment was conducted on google-cloud-kms to validate this approach before updating the generator templates.

Steps Taken:

Created google/cloud/kms_v1/services/shared_utils.py.
Moved get_default_mtls_endpoint, use_client_cert_effective, and read_environment_variables to shared_utils.
Refactored KeyManagementServiceClient and EkmServiceClient to delegate to shared_utils.

Impact:

Replaced ~35 lines of duplicate logic per client with one-line delegations.
No functional regressions.

Testing Note (Environmental Edge Case Discovered):

During testing on Cloudtop environments, tests assuming default False states for client certs might fail if CLOUDSDK_CONTEXT_AWARE_USE_CLIENT_CERTIFICATE=true is set globally. This is an artifact of the testing environment assumptions, not the deduplication logic. Overriding this variable during testing (CLOUDSDK_CONTEXT_AWARE_USE_CLIENT_CERTIFICATE=false) confirms logic is sound.

References

Design Doc

…metrics

- Add fallback to handle single-iteration quantile calculation crashes - Normalize .pyc file paths during line counting and log exceptions - Expose worker process stderr to facilitate debugging CalledProcessError - Fix absolute paths in documentation.md to use relative directory paths

…g, and encodings - Use importlib.util.source_from_cache for robust .pyc resolution - Move importlib.util and logging imports to module level - Refine json.loads to parse only the last line of stdout - Specify UTF-8 encoding when opening files for writing

- Extract worker logic to _run_worker_and_parse for robust JSON parsing - Add early validation for iterations parameter in run_master - Add non-zero exit code check and warning for run_trace failures

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

- Execute cProfile in a fresh subprocess to ensure cold-start accuracy - Execute tracemalloc in a fresh subprocess to capture all memory allocations - Clean up master process exception handling for missing taskset vs crashed workers

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

- Use json.dumps to safely escape target_module string in all subprocess commands (mprofile, cprofile, trace)

- Replace detailed implementation architecture and alternative solutions with a concise high-level usage and expected output format per PR review feedback

…me latency

- Read /proc/self/statm inside worker before and after import - Exposes pure C-extension memory allocations (e.g. grpc/protobuf) missed by Python's internal tracemalloc - Requires zero dependencies, bypassing the need for psutil

- Recursively wipes __pycache__ and .pyc files via pathlib prior to iteration loops. - Employs Python's -B flag on worker subprocesses to prevent re-caching during the benchmark, enforcing raw .py parsing for every execution.

- Replaces --clear-pycache with an opt-out --keep-pycache flag - The profiler now automatically sweeps and enforces disk-level cold starts without requiring explicit flags from the user.

Address PR review comment by moving setup tracking (rss_before, modules_before) outside of the time.perf_counter() window to prevent artificial latency inflation.

Addresses PR feedback to avoid broad exception catching when parsing physical python files for line counts.

Addresses PR review by explicitly documenting the expected PEP 3147/488 error path for irregular .pyc files.

…ssing modules Addresses PR review by replacing null-checks with a cleaner try/except block that assumes normal module behavior and explicitly logs when a C-extension or built-in is skipped due to a missing __file__ attribute.

Addresses PR review by replacing the 'none' string check with a typed NO_CPU_PINNING = -1 constant for cleaner argument parsing and state tracking.

…askset Addresses PR review by removing the fallback unpinned execution logic when taskset is not installed. The script now fails immediately with a clear error message, preventing code duplication.

Addresses PR review by collapsing the repeated statistics.quantiles logic into a reusable _calculate_percentiles helper method.

…terations Addresses PR review by validating that 'loaded_modules' and 'loaded_lines' remain deterministic across cold-start iterations, warning the user if non-deterministic import paths are triggered.

… tracemalloc profiling Addresses PR review by replacing the hard-to-maintain dynamic code string inside run_mprofile with a properly structured multiprocessing 'spawn' context.

… syntax

…562)

gemini-code-assist

Code Review

This pull request introduces shared utilities for mTLS endpoint handling in the google-cloud-kms package and adds an import profiling toolset under scripts/import_profiler/, which includes a profiler script, documentation, and a script to rewrite __init__.py files for lazy loading. The review feedback suggests several robust improvements to the profiling scripts: converting the rglob generator to a list to prevent traversal issues during directory deletion, updating the __all__ regex parser to support lists in addition to tuples, adding boundary checks to prevent potential IndexError during multi-line import parsing, and implementing __dir__ alongside __getattr__ to support autocompletion for lazy-loaded modules.

gemini-code-assist · 2026-06-25T09:41:57Z

+    """Recursively deletes all __pycache__ directories and .pyc files to ensure disk-level cold starts."""
+    print("Sweeping directory to delete __pycache__ and force bytecode recompilation...")
+    count = 0
+    for p in pathlib.Path('.').rglob('__pycache__'):


Modifying the directory structure (deleting __pycache__ directories) while traversing it with rglob can lead to unexpected behavior or FileNotFoundError. Converting the generator to a list first ensures safe traversal.

Suggested change

for p in pathlib.Path('.').rglob('__pycache__'):

for p in list(pathlib.Path('.').rglob('__pycache__')):

gemini-code-assist · 2026-06-25T09:41:58Z

+        content = f.read()
+
+    # Find the __all__ tuple/list
+    all_match = re.search(r'__all__\s*=\s*\((.*?)\)', content, re.DOTALL)


The __all__ variable in Python can be defined as either a tuple or a list. Updating the regular expression to support both parentheses and square brackets prevents parsing failures.

Suggested change

all_match = re.search(r'__all__\s*=\s*\((.*?)\)', content, re.DOTALL)

all_match = re.search(r'__all__\s*=\s*[(\[](.*?)[)\]]', content, re.DOTALL)

gemini-code-assist · 2026-06-25T09:41:58Z

+            if "(" in names_part and ")" not in names_part:
+                # multi-line import
+                import_names_str = names_part.replace("(", "")
+                i += 1
+                while ")" not in lines[i]:
+                    import_names_str += lines[i]
+                    i += 1
+                import_names_str += lines[i].replace(")", "")


If a multi-line import is malformed or lacks a closing parenthesis, iterating over lines without checking bounds can raise an IndexError. Adding a boundary check ensures safe parsing.

Suggested change

if "(" in names_part and ")" not in names_part:

# multi-line import

import_names_str = names_part.replace("(", "")

i += 1

while ")" not in lines[i]:

import_names_str += lines[i]

i += 1

import_names_str += lines[i].replace(")", "")

if "(" in names_part and ")" not in names_part:

# multi-line import

import_names_str = names_part.replace("(", "")

i += 1

while i < len(lines) and ")" not in lines[i]:

import_names_str += lines[i]

i += 1

if i < len(lines):

import_names_str += lines[i].replace(")", "")

gemini-code-assist · 2026-06-25T09:41:58Z

+    def __getattr__(name):
+        if name in _LAZY_IMPORTS:
+            module = importlib.import_module(_LAZY_IMPORTS[name], package=__name__)
+            attr = getattr(module, name)
+            globals()[name] = attr
+            return attr
+        raise AttributeError(f"module {{__name__!r}} has no attribute {{name!r}}")


When implementing lazy loading via __getattr__ (PEP 562), it is highly recommended to also define __dir__ so that interactive environments (like IPython, Jupyter, or standard REPL) and IDEs can correctly autocomplete the lazy-imported attributes.

def __getattr__(name): if name in _LAZY_IMPORTS: module = importlib.import_module(_LAZY_IMPORTS[name], package=__name__) attr = getattr(module, name) globals()[name] = attr return attr raise AttributeError(f"module {__name__!r} has no attribute {name!r}") def __dir__(): return sorted(list(globals().keys()) + list(_LAZY_IMPORTS.keys()))

hebaalazzeh and others added 30 commits June 15, 2026 21:27

feat: add import profiler tool with dynamic loaded lines code volume …

7e325db

…metrics

fix: refactor profiler worker execution and handling

9340dc9

- Extract worker logic to _run_worker_and_parse for robust JSON parsing - Add early validation for iterations parameter in run_master - Add non-zero exit code check and warning for run_trace failures

Update scripts/import_profiler/profiler.py

d8c26a2

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update scripts/import_profiler/profiler.py

f2c998c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

refactor: replace manual sys.argv parsing with argparse

d7ae73f

Update scripts/import_profiler/profiler.py

6a94207

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update scripts/import_profiler/profiler.py

e15d972

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update scripts/import_profiler/profiler.py

051b455

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update scripts/import_profiler/profiler.py

9f55f1c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

fix: resolve code injection vulnerability when importing target_module

fb35fd3

- Use json.dumps to safely escape target_module string in all subprocess commands (mprofile, cprofile, trace)

docs: simplify profiler mechanism section

727d44e

- Replace detailed implementation architecture and alternative solutions with a concise high-level usage and expected output format per PR review feedback

update documenation md

70c2a24

docs: add inline comments explaining isolation of interpreter boot-ti…

40ac359

…me latency

feat: make pycache wiping automatic by default

88c2e7f

- Replaces --clear-pycache with an opt-out --keep-pycache flag - The profiler now automatically sweeps and enforces disk-level cold starts without requiring explicit flags from the user.

fix: tighten performance timer to strictly wrap import_module

fa00b05

Address PR review comment by moving setup tracking (rss_before, modules_before) outside of the time.perf_counter() window to prevent artificial latency inflation.

fix: replace bare Exception with OSError during LOC parsing

137085a

Addresses PR feedback to avoid broad exception catching when parsing physical python files for line counts.

docs: explain why ValueError is silently passed during source_from_cache

68465e5

Addresses PR review by explicitly documenting the expected PEP 3147/488 error path for irregular .pyc files.

refactor: replace magic string for CPU pinning with integer constant

9a3e981

Addresses PR review by replacing the 'none' string check with a typed NO_CPU_PINNING = -1 constant for cleaner argument parsing and state tracking.

refactor: remove code duplication and enforce fail-fast for missing t…

aaaa1a4

…askset Addresses PR review by removing the fallback unpinned execution logic when taskset is not installed. The script now fails immediately with a clear error message, preventing code duplication.

refactor: extract percentile calculation into a helper method

352188f

Addresses PR review by collapsing the repeated statistics.quantiles logic into a reusable _calculate_percentiles helper method.

refactor: emit warnings if dynamic import volume fluctuates between i…

c082ba7

…terations Addresses PR review by validating that 'loaded_modules' and 'loaded_lines' remain deterministic across cold-start iterations, warning the user if non-deterministic import paths are triggered.

refactor: use multiprocessing instead of dynamic string execution for…

3cc3c9d

… tracemalloc profiling Addresses PR review by replacing the hard-to-maintain dynamic code string inside run_mprofile with a properly structured multiprocessing 'spawn' context.

Prototype: PEP 0810 lazy imports architecture for GAPIC generator

237f4d8

chore: update rewrite_init.py to use list-based PEP 0810 lazy loading…

ad932ff

… syntax

hebaalazzeh added 2 commits June 25, 2026 07:41

Implement hybrid lazy loading strategy for compute_v1 (PEP 810 + PEP …

c2246a6

…562)

feat: manual deduplication experiment on google-cloud-kms

72547ec

gemini-code-assist Bot reviewed Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Gapic deduplication experimentation#17573

Gapic deduplication experimentation#17573
hebaalazzeh wants to merge 32 commits into
mainfrom
gapic-deduplication-experimentation

hebaalazzeh commented Jun 25, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	for p in pathlib.Path('.').rglob('__pycache__'):
	for p in list(pathlib.Path('.').rglob('__pycache__')):

	all_match = re.search(r'__all__\s=\s\((.*?)\)', content, re.DOTALL)
	all_match = re.search(r'__all__\s=\s[(\[](.*?)[)\]]', content, re.DOTALL)

Uh oh!

Conversation

hebaalazzeh commented Jun 25, 2026

GAPIC Client Deduplication (Shared Utility Layer)

Goal

Background

Proposed Approach (Shared Utility Layer)

Benefits:

Manual Validation Results (google-cloud-kms)

Steps Taken:

Impact:

Testing Note (Environmental Edge Case Discovered):

References

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Manual Validation Results (`google-cloud-kms`)