add quick validation for the api key and add log if api key isn't valid#119
add quick validation for the api key and add log if api key isn't valid#119yiyche wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces Data Commons API key validation before starting a data ingestion job by parsing terraform.tfvars and validating the key against api.datacommons.org. Feedback focuses on improving the robustness of the implementation: handling other HTTP errors (such as 403 Forbidden) during validation instead of silently ignoring them, refining the tfvars parser to prevent truncating API keys that contain # or // characters, and adding corresponding test coverage for keys with special characters.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| except urllib.error.HTTPError as e: | ||
| if e.code == 401: | ||
| click.secho("\n[!] ERROR: The Data Commons API Key in your terraform.tfvars was rejected by the server (HTTP 401 Unauthorized).", fg="red", bold=True) | ||
| click.secho("Ingestion cannot proceed with an invalid API Key.", fg="red") | ||
| raise click.ClickException("Invalid API key. Please check your terraform.tfvars.") |
There was a problem hiding this comment.
The current implementation of except urllib.error.HTTPError as e: only handles e.code == 401. If the server returns any other HTTP error (such as 403 Forbidden or 500 Internal Server Error), the exception is caught but silently ignored because there is no fallback or else block. This causes the CLI to proceed with starting the ingestion without any warning or error, even though the validation request failed.
We should handle 403 Forbidden as an invalid API key error, and print a warning for other HTTP status codes.
| except urllib.error.HTTPError as e: | |
| if e.code == 401: | |
| click.secho("\n[!] ERROR: The Data Commons API Key in your terraform.tfvars was rejected by the server (HTTP 401 Unauthorized).", fg="red", bold=True) | |
| click.secho("Ingestion cannot proceed with an invalid API Key.", fg="red") | |
| raise click.ClickException("Invalid API key. Please check your terraform.tfvars.") | |
| except urllib.error.HTTPError as e: | |
| if e.code in [401, 403]: | |
| click.secho(f"\n[!] ERROR: The Data Commons API Key in your terraform.tfvars was rejected by the server (HTTP {e.code}).", fg="red", bold=True) | |
| click.secho("Ingestion cannot proceed with an invalid API Key.", fg="red") | |
| raise click.ClickException("Invalid API key. Please check your terraform.tfvars.") | |
| else: | |
| click.secho(f"Warning: Could not validate API key (HTTP {e.code}). Skipping validation.", fg="yellow") |
| for line in content.splitlines(): | ||
| # Strip inline comments (e.g., # or //) | ||
| line = line.split("#", 1)[0].split("//", 1)[0].strip() | ||
| if not line: | ||
| continue | ||
| if "=" in line: | ||
| parts = line.split("=", 1) | ||
| k = parts[0].strip() | ||
| v = parts[1].strip() | ||
| if (v.startswith('"') and v.endswith('"')) or (v.startswith("'") and v.endswith("'")): | ||
| v = v[1:-1].strip() | ||
| if k in ["auth_google_datacommons_api_key", "dc_api_key"]: | ||
| return v |
There was a problem hiding this comment.
The current parser strips inline comments by splitting the entire line on # and // before parsing the key-value pair. If the API key itself contains a # or // character (which can happen with some keys or tokens), the key will be silently truncated, leading to validation and ingestion failures.
We can make the parser much more robust by splitting on = first, and then extracting the quoted string value while respecting the quotes. This prevents comments inside the string from being stripped.
| for line in content.splitlines(): | |
| # Strip inline comments (e.g., # or //) | |
| line = line.split("#", 1)[0].split("//", 1)[0].strip() | |
| if not line: | |
| continue | |
| if "=" in line: | |
| parts = line.split("=", 1) | |
| k = parts[0].strip() | |
| v = parts[1].strip() | |
| if (v.startswith('"') and v.endswith('"')) or (v.startswith("'") and v.endswith("'")): | |
| v = v[1:-1].strip() | |
| if k in ["auth_google_datacommons_api_key", "dc_api_key"]: | |
| return v | |
| for line in content.splitlines(): | |
| line = line.strip() | |
| if not line or line.startswith("#") or line.startswith("//"): | |
| continue | |
| if "=" in line: | |
| parts = line.split("=", 1) | |
| k = parts[0].strip() | |
| v = parts[1].strip() | |
| if v.startswith('"'): | |
| end_idx = v.find('"', 1) | |
| if end_idx != -1: | |
| v = v[1:end_idx] | |
| elif v.startswith("'"): | |
| end_idx = v.find("'", 1) | |
| if end_idx != -1: | |
| v = v[1:end_idx] | |
| else: | |
| v = v.split("#", 1)[0].split("//", 1)[0].strip() | |
| if k in ["auth_google_datacommons_api_key", "dc_api_key"]: | |
| return v |
| # 6. Test short key name (dc_api_key) | ||
| tfvars.write_text('dc_api_key = "my-test-key-5"') | ||
| assert get_tfvars_api_key() == "my-test-key-5" |
There was a problem hiding this comment.
To ensure the parser correctly handles API keys containing special characters like # or // without truncating them, we should add a test case covering this scenario.
| # 6. Test short key name (dc_api_key) | |
| tfvars.write_text('dc_api_key = "my-test-key-5"') | |
| assert get_tfvars_api_key() == "my-test-key-5" | |
| # 6. Test short key name (dc_api_key) | |
| tfvars.write_text('dc_api_key = "my-test-key-5"') | |
| assert get_tfvars_api_key() == "my-test-key-5" | |
| # 7. Test key containing special characters like '#' or '//' | |
| tfvars.write_text('dc_api_key = "my#key//value" # inline comment') | |
| assert get_tfvars_api_key() == "my#key//value" |
e06a39d to
0021f1b
Compare
dwnoble
left a comment
There was a problem hiding this comment.
I like doing a check for the API key before starting the ingestion job, but I think the logistics of it should go in a different direction.
TLDR; instead of doing the api key check in the CLI, can you add a new new ingestion helper endpoint (maybe as a new "ingestion.py" router here: https://github.com/datacommonsorg/import/tree/master/pipeline/workflow/ingestion-helper/routes) that does the api key check and also starts the cloud run job? then update the datacommons admin ingest start command to call the ingestion helper api endpoint instead of starting cloud run directly
Rationale:
Today, we start the ingestion job by invoking a cloud run job directly, and on completion that job starts a cloud workflow. Eventually we want to move the cloud run workflow. From a user perspective, they shouldn't care whether we start a cloud run job or workflow, they should just know that ingestion has started and how they can check on it.
Also, in addition to checking if the API key exists, we should also check things like:
- does the input data artifact gcs bucket exist?
- is there a config.json and other required files in the input directory in the above bucket?
- is there an ingestion job already running?
- probably more down the road
last thing- here the CLI checks if the api key in the terraform.tfvars is valid, but it really should be checking if the actual secret in GCP is valid (example: https://pantheon.corp.google.com/security/secret-manager/secret/itest-4771-dc-api-key/versions?e=13803378&mods=-monitoring_api_staging&project=datcom-ci). while this is totally possible, i think it's more elegant to instead update terraform to set a DC api key in the environment for the ingestion helper, then just have the ingestion helper check if its local key is valid
59f0bb8 to
fb6c10f
Compare
…e, and remove IngestionJobClient
fb6c10f to
36bef65
Compare
…e, and remove IngestionJobClient
36bef65 to
8528f72
Compare
…e, and remove IngestionJobClient
8528f72 to
3173204
Compare
just did - PTAL, thank you! |
Add API key validations: