Skip to content

add quick validation for the api key and add log if api key isn't valid#119

Open
yiyche wants to merge 2 commits into
mainfrom
feature/validate-api-key
Open

add quick validation for the api key and add log if api key isn't valid#119
yiyche wants to merge 2 commits into
mainfrom
feature/validate-api-key

Conversation

@yiyche

@yiyche yiyche commented Jun 12, 2026

Copy link
Copy Markdown
Member

Add API key validations:

  1. Local config parser (get_tfvars_api_key): Added a parser in tf_utils.py to extract the API key from the local terraform.tfvars
  2. Triggers a quick validation request against api.datacommons.org/v2/node. If the server returns an HTTP 401 Unauthorized, the CLI prints a clear warning and aborts execution immediately

@yiyche yiyche requested a review from dwnoble June 12, 2026 22:23

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Data Commons API key validation before starting a data ingestion job by parsing terraform.tfvars and validating the key against api.datacommons.org. Feedback focuses on improving the robustness of the implementation: handling other HTTP errors (such as 403 Forbidden) during validation instead of silently ignoring them, refining the tfvars parser to prevent truncating API keys that contain # or // characters, and adding corresponding test coverage for keys with special characters.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +67 to +71
except urllib.error.HTTPError as e:
if e.code == 401:
click.secho("\n[!] ERROR: The Data Commons API Key in your terraform.tfvars was rejected by the server (HTTP 401 Unauthorized).", fg="red", bold=True)
click.secho("Ingestion cannot proceed with an invalid API Key.", fg="red")
raise click.ClickException("Invalid API key. Please check your terraform.tfvars.")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of except urllib.error.HTTPError as e: only handles e.code == 401. If the server returns any other HTTP error (such as 403 Forbidden or 500 Internal Server Error), the exception is caught but silently ignored because there is no fallback or else block. This causes the CLI to proceed with starting the ingestion without any warning or error, even though the validation request failed.

We should handle 403 Forbidden as an invalid API key error, and print a warning for other HTTP status codes.

Suggested change
except urllib.error.HTTPError as e:
if e.code == 401:
click.secho("\n[!] ERROR: The Data Commons API Key in your terraform.tfvars was rejected by the server (HTTP 401 Unauthorized).", fg="red", bold=True)
click.secho("Ingestion cannot proceed with an invalid API Key.", fg="red")
raise click.ClickException("Invalid API key. Please check your terraform.tfvars.")
except urllib.error.HTTPError as e:
if e.code in [401, 403]:
click.secho(f"\n[!] ERROR: The Data Commons API Key in your terraform.tfvars was rejected by the server (HTTP {e.code}).", fg="red", bold=True)
click.secho("Ingestion cannot proceed with an invalid API Key.", fg="red")
raise click.ClickException("Invalid API key. Please check your terraform.tfvars.")
else:
click.secho(f"Warning: Could not validate API key (HTTP {e.code}). Skipping validation.", fg="yellow")

Comment on lines +149 to +161
for line in content.splitlines():
# Strip inline comments (e.g., # or //)
line = line.split("#", 1)[0].split("//", 1)[0].strip()
if not line:
continue
if "=" in line:
parts = line.split("=", 1)
k = parts[0].strip()
v = parts[1].strip()
if (v.startswith('"') and v.endswith('"')) or (v.startswith("'") and v.endswith("'")):
v = v[1:-1].strip()
if k in ["auth_google_datacommons_api_key", "dc_api_key"]:
return v

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current parser strips inline comments by splitting the entire line on # and // before parsing the key-value pair. If the API key itself contains a # or // character (which can happen with some keys or tokens), the key will be silently truncated, leading to validation and ingestion failures.

We can make the parser much more robust by splitting on = first, and then extracting the quoted string value while respecting the quotes. This prevents comments inside the string from being stripped.

Suggested change
for line in content.splitlines():
# Strip inline comments (e.g., # or //)
line = line.split("#", 1)[0].split("//", 1)[0].strip()
if not line:
continue
if "=" in line:
parts = line.split("=", 1)
k = parts[0].strip()
v = parts[1].strip()
if (v.startswith('"') and v.endswith('"')) or (v.startswith("'") and v.endswith("'")):
v = v[1:-1].strip()
if k in ["auth_google_datacommons_api_key", "dc_api_key"]:
return v
for line in content.splitlines():
line = line.strip()
if not line or line.startswith("#") or line.startswith("//"):
continue
if "=" in line:
parts = line.split("=", 1)
k = parts[0].strip()
v = parts[1].strip()
if v.startswith('"'):
end_idx = v.find('"', 1)
if end_idx != -1:
v = v[1:end_idx]
elif v.startswith("'"):
end_idx = v.find("'", 1)
if end_idx != -1:
v = v[1:end_idx]
else:
v = v.split("#", 1)[0].split("//", 1)[0].strip()
if k in ["auth_google_datacommons_api_key", "dc_api_key"]:
return v

Comment on lines +481 to +483
# 6. Test short key name (dc_api_key)
tfvars.write_text('dc_api_key = "my-test-key-5"')
assert get_tfvars_api_key() == "my-test-key-5"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure the parser correctly handles API keys containing special characters like # or // without truncating them, we should add a test case covering this scenario.

Suggested change
# 6. Test short key name (dc_api_key)
tfvars.write_text('dc_api_key = "my-test-key-5"')
assert get_tfvars_api_key() == "my-test-key-5"
# 6. Test short key name (dc_api_key)
tfvars.write_text('dc_api_key = "my-test-key-5"')
assert get_tfvars_api_key() == "my-test-key-5"
# 7. Test key containing special characters like '#' or '//'
tfvars.write_text('dc_api_key = "my#key//value" # inline comment')
assert get_tfvars_api_key() == "my#key//value"

@yiyche yiyche force-pushed the feature/validate-api-key branch from e06a39d to 0021f1b Compare June 12, 2026 22:40

@dwnoble dwnoble left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like doing a check for the API key before starting the ingestion job, but I think the logistics of it should go in a different direction.

TLDR; instead of doing the api key check in the CLI, can you add a new new ingestion helper endpoint (maybe as a new "ingestion.py" router here: https://github.com/datacommonsorg/import/tree/master/pipeline/workflow/ingestion-helper/routes) that does the api key check and also starts the cloud run job? then update the datacommons admin ingest start command to call the ingestion helper api endpoint instead of starting cloud run directly

Rationale:
Today, we start the ingestion job by invoking a cloud run job directly, and on completion that job starts a cloud workflow. Eventually we want to move the cloud run workflow. From a user perspective, they shouldn't care whether we start a cloud run job or workflow, they should just know that ingestion has started and how they can check on it.

Also, in addition to checking if the API key exists, we should also check things like:

  • does the input data artifact gcs bucket exist?
  • is there a config.json and other required files in the input directory in the above bucket?
  • is there an ingestion job already running?
  • probably more down the road

last thing- here the CLI checks if the api key in the terraform.tfvars is valid, but it really should be checking if the actual secret in GCP is valid (example: https://pantheon.corp.google.com/security/secret-manager/secret/itest-4771-dc-api-key/versions?e=13803378&mods=-monitoring_api_staging&project=datcom-ci). while this is totally possible, i think it's more elegant to instead update terraform to set a DC api key in the environment for the ingestion helper, then just have the ingestion helper check if its local key is valid

@yiyche yiyche force-pushed the feature/validate-api-key branch 3 times, most recently from 59f0bb8 to fb6c10f Compare June 13, 2026 06:22
@yiyche yiyche requested a review from dwnoble June 15, 2026 18:57

@dwnoble dwnoble left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yiyche - have you started migrating the IngestionJobClient to the ingestion helper?

Comment thread packages/datacommons-admin/datacommons_admin/tf_utils.py Outdated
Comment thread packages/datacommons-admin/datacommons_admin/ingest_cli.py
Comment thread packages/datacommons-admin/datacommons_admin/ingest_cli.py Outdated
yiyche added a commit that referenced this pull request Jun 16, 2026
@yiyche yiyche force-pushed the feature/validate-api-key branch from fb6c10f to 36bef65 Compare June 16, 2026 06:42
yiyche added a commit that referenced this pull request Jun 16, 2026
@yiyche yiyche force-pushed the feature/validate-api-key branch from 36bef65 to 8528f72 Compare June 16, 2026 06:44
@yiyche yiyche force-pushed the feature/validate-api-key branch from 8528f72 to 3173204 Compare June 16, 2026 06:46
@yiyche yiyche requested a review from dwnoble June 16, 2026 06:47
@yiyche

yiyche commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Thanks @yiyche - have you started migrating the IngestionJobClient to the ingestion helper?

just did - PTAL, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants