Add mongo_sync script for syncing MongoDB data to JSON files by SimoneBendazzoli93 · Pull Request #150 · minnelab/MAIA

SimoneBendazzoli93 · 2026-05-05T11:22:26Z

This new script connects to a MongoDB database, retrieves project and user data, filters the information, and writes it to JSON files organized by project namespace. It includes error handling for date formatting and ensures the project folder is created if it doesn't exist.

Summary by CodeRabbit

New Features
- Added automated MongoDB data synchronization that exports project information and associated user lists to JSON files with normalized date formatting and namespace-based filtering to organize project exports.

Copilot

Pull request overview

Adds a new mongo_sync utility script under the dashboard to export MongoDB maia_projects data to per-namespace JSON files, enriching each project with a derived users email list and normalizing the date field.

Changes:

- Connect to MongoDB via env-provided credentials and fetch maia_projects / maia_users.
- Build a filtered project payload (selected fields + derived users) and write Projects/<namespace>.json.
- Add basic date normalization for multiple MongoDB date representations.

Overall Verdict

REQUEST CHANGES

Critical Issues

The user-to-project membership check is incorrect (substring match vs exact membership in a comma-separated namespace list), which can produce wrong users lists in the generated JSON.
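
To make the difference concrete, here is a minimal sketch of the two checks side by side. The helper names (`parse_namespaces`, `is_member_substring`, `is_member_exact`) and the sample values are hypothetical and not taken from the script; the sketch only assumes that a user's `namespace` field is a comma-separated string.

```python
def parse_namespaces(value):
    """Split a comma-separated namespace string into trimmed, non-empty names."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def is_member_substring(project_namespace, user_namespace_value):
    # Substring test: also matches when the project namespace is only
    # a fragment of one of the user's namespaces.
    return project_namespace in (user_namespace_value or "")


def is_member_exact(project_namespace, user_namespace_value):
    # Exact test: compare against each parsed list entry.
    return project_namespace in parse_namespaces(user_namespace_value)


user_value = "team-alpha, lab"
print(is_member_substring("team", user_value))   # True (false positive)
print(is_member_exact("team", user_value))       # False
print(is_member_exact("team-alpha", user_value)) # True
```

With the substring check, a user in `team-alpha` would be listed under a project named `team`; the exact check avoids that.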

Optional Notes (non-blocking)

- Some key-mapping branches (group_id/username) are currently unreachable due to the metadata gate and should be removed or reworked.
- The script performs DB access and file writes at import time; guarding with a main()/if __name__ == "__main__": would prevent accidental side effects.
- Remove or rewrite the inline “git clone with credentials in URL” comment to avoid promoting a secret-leaking workflow.
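
For the import-time side effects, the usual shape is sketched below. This is an illustration only: `build_payloads` and the placeholder project list are hypothetical stand-ins for the script's MongoDB fetch and JSON export, which are not reproduced here.

```python
import json


def build_payloads(projects):
    """Pure function with no I/O, so importing this module has no side effects."""
    return {p["namespace"]: {"namespace": p["namespace"], "users": []} for p in projects}


def main():
    # Placeholder data; the real script would query MongoDB here
    # and write files instead of printing.
    projects = [{"namespace": "demo"}]
    print(json.dumps(build_payloads(projects), indent=4))


if __name__ == "__main__":
    # Runs only when executed as a script, not when imported.
    main()
```

Keeping the database access and file writes inside `main()` also makes the helper functions importable from tests.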

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

This new script connects to a MongoDB database, retrieves project and user data, filters the information, and writes it to JSON files organized by project namespace. It includes error handling for date formatting and ensures the project folder is created if it doesn't exist.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Simone Bendazzoli <simonebendazzoli93@gmail.com>

This update organizes the mongo_sync script by wrapping the main functionality in a `main()` function. It maintains the existing logic for connecting to the MongoDB database, retrieving project and user data, filtering the information, and writing it to JSON files. The project folder creation and date formatting error handling remain intact.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Simone Bendazzoli <simonebendazzoli93@gmail.com>

- Introduced a new script, mongo_sync.py, that connects to a MongoDB database and retrieves project and user data.
- The script filters and formats project information, including user emails associated with each project, and saves the data as JSON files in a designated "Projects" directory.
- Implemented error handling for date formatting and ensured safe filename generation to prevent path traversal vulnerabilities.

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Signed-off-by: Simone Bendazzoli <simonebendazzoli93@gmail.com>

coderabbitai · 2026-05-17T09:31:20Z

📝 Walkthrough

Walkthrough

This PR adds a MongoDB sync script that reads user and project documents from MongoDB, filters projects by matching user namespaces, normalizes metadata fields including dates, and exports each filtered project to a JSON file with a path-safe filename derived from the project namespace.

Changes

MongoDB Project Sync

| Layer / File(s) | Summary |
|---|---|
| MongoDB connection, filtering, and JSON export `dashboard/core/mongo_sync.py` | Script reads MongoDB credentials from environment, loads all users and projects into memory, filters projects by matching each project's namespace against user-derived namespace lists, normalizes the `date` field with multiple fallback paths for different shapes, generates path-safe filenames from sanitized namespaces, and writes one JSON file per project to `Projects/`. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through MongoDB's store,
Filters projects and users by the score,
Dates dance through fallback paths with care,
JSON files bloom in Projects' lair—
Namespace matching, safe and sound! 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a mongo_sync script that syncs MongoDB data to JSON files, which directly matches the file additions and functionality summarized in the raw_summary. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dashboard/core/mongo_sync.py`:
- Around line 59-67: The loop that populates filtered_project["users"] currently
appends raw user.get("email") values (in the users -> filtered_project
construction) which exports PII; change it to export a stable non-PII identifier
instead (e.g., user.get("id") or a deterministic hash of the email using a
project/system salt) so no raw emails are written to disk; update the same
pattern found around the other occurrence noted (lines ~103-104) to use the
ID/hash, and ensure any downstream consumers that expect emails read the new
field name/format (adjust keys or document the change) in the code paths
referencing filtered_project and project.
- Line 63: The code directly indexes project["namespace"] which can raise
KeyError for missing/invalid project docs; change the membership check in the
sync logic to use project.get("namespace") (or assign ns =
project.get("namespace")) and skip the record if ns is None or not a str, i.e.
only proceed to check membership against user_namespaces when a valid namespace
exists to avoid aborting the whole sync.
- Around line 71-77: The code assumes v["$date"] exists and is a string before
parsing in the block around filtered_project[k], which can raise KeyError or
TypeError; change it to safely fetch and normalize the $date value (e.g. use
v.get("$date") or check "$date" in v), ensure non-string values are converted to
str() before calling replace("+00:00")/fromisoformat, and keep the
datetime.fromisoformat(...) call inside the try/except so missing or malformed
values fall back to assigning the original (or stringified) date to
filtered_project[k]; update the logic around variables v, k, filtered_project
and datetime.fromisoformat accordingly.
- Around line 98-104: The current sanitization of raw_namespace into
safe_namespace can cause collisions (e.g., "team/a" -> "team_a") and overwrite
existing exports; update the write logic around safe_namespace (used in
project_folder.joinpath(safe_namespace + ".json") and
json.dump(filtered_project,...)) to detect collisions and produce a unique,
deterministic filename instead of silently overwriting: after computing
safe_namespace, check whether a file with that name already exists and if it was
produced from a different raw_namespace (track a mapping or compare existing
metadata), and if a collision is detected append a short deterministic suffix
(e.g., a hex hash of raw_namespace like first 8 chars) to the filename or raise
a ValueError; ensure the chosen approach uses raw_namespace for uniqueness and
still prevents path traversal via the existing Path(safe_namespace).name check
before opening the file.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b0ec6139-1168-4315-9029-4f9fde529d77

📥 Commits

Reviewing files that changed from the base of the PR and between 7757d3d and e76abbb.

📒 Files selected for processing (1)

dashboard/core/mongo_sync.py

coderabbitai · 2026-05-17T09:32:58Z

+        filtered_project = {"users": []}
+        for user in users:
+            user_namespace_value = user.get("namespace") or ""
+            user_namespaces = [namespace.strip() for namespace in user_namespace_value.split(",") if namespace.strip()]
+            if project["namespace"] in user_namespaces:
+                user_email = user.get("email")
+                if user_email:
+                    filtered_project["users"].append(user_email)
+        for k, v in project.items():


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reassess exporting raw user emails to disk (PII retention risk).

The script writes email addresses into project JSON files under Projects/. If these files are committed/synced, this creates a privacy/compliance exposure. Prefer stable user IDs or hashed emails unless explicit policy/legal basis requires raw emails.

Also applies to: 103-104

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dashboard/core/mongo_sync.py` around lines 59 - 67, The loop that populates filtered_project["users"] currently appends raw user.get("email") values (in the users -> filtered_project construction) which exports PII; change it to export a stable non-PII identifier instead (e.g., user.get("id") or a deterministic hash of the email using a project/system salt) so no raw emails are written to disk; update the same pattern found around the other occurrence noted (lines ~103-104) to use the ID/hash, and ensure any downstream consumers that expect emails read the new field name/format (adjust keys or document the change) in the code paths referencing filtered_project and project.
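
The "deterministic hash of the email using a project/system salt" option can be sketched as follows. This is one possible approach under stated assumptions, not what the script does: `pseudonymize_email` and the salt value are hypothetical, and in practice the salt would be loaded from a secret store or environment variable rather than hard-coded.

```python
import hashlib
import hmac


def pseudonymize_email(email, salt):
    """Return a deterministic, keyed SHA-256 digest of a normalized email."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(salt, normalized, hashlib.sha256).hexdigest()


salt = b"example-salt"  # hypothetical; load from a secret in practice
a = pseudonymize_email("Alice@Example.org", salt)
b = pseudonymize_email(" alice@example.org ", salt)
print(a == b)  # True: same user maps to the same identifier
print(len(a))  # 64 hex characters
```

Using a keyed HMAC rather than a bare hash means the identifiers cannot be recomputed from a list of known addresses without the salt, while still being stable across sync runs.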

coderabbitai · 2026-05-17T09:32:58Z

+        for user in users:
+            user_namespace_value = user.get("namespace") or ""
+            user_namespaces = [namespace.strip() for namespace in user_namespace_value.split(",") if namespace.strip()]
+            if project["namespace"] in user_namespaces:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard missing project namespace before membership checks.

project["namespace"] can raise KeyError and abort the whole sync when a project document is missing/invalid. Use .get() and skip invalid records safely.

Proposed fix

-            if project["namespace"] in user_namespaces:
+            project_namespace = project.get("namespace")
+            if not project_namespace:
+                continue
+            if project_namespace in user_namespaces:
                 user_email = user.get("email")
                 if user_email:
                     filtered_project["users"].append(user_email)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-            if project["namespace"] in user_namespaces:
+            project_namespace = project.get("namespace")
+            if not project_namespace:
+                continue
+            if project_namespace in user_namespaces:
+                user_email = user.get("email")
+                if user_email:
+                    filtered_project["users"].append(user_email)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dashboard/core/mongo_sync.py` at line 63, The code directly indexes project["namespace"] which can raise KeyError for missing/invalid project docs; change the membership check in the sync logic to use project.get("namespace") (or assign ns = project.get("namespace")) and skip the record if ns is None or not a str, i.e. only proceed to check membership against user_namespaces when a valid namespace exists to avoid aborting the whole sync.
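
A stricter variant of this guard, covering the "None or not a str" case from the prompt, might look like the sketch below. `valid_namespace` and the sample documents are hypothetical; the idea is to validate the namespace once per project, before the user loop, and skip the project entirely when it is unusable.

```python
def valid_namespace(project):
    """Return the project's namespace if it is a non-empty string, else None."""
    ns = project.get("namespace")
    if isinstance(ns, str) and ns.strip():
        return ns
    return None


projects = [
    {"namespace": "team-a"},
    {"name": "no namespace"},  # key missing
    {"namespace": 42},         # wrong type
    {"namespace": "  "},       # blank
]
exportable = [ns for ns in map(valid_namespace, projects) if ns is not None]
print(exportable)  # ['team-a']
```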

coderabbitai · 2026-05-17T09:32:58Z

+                    if isinstance(v, dict):
+                        date_str = v["$date"]
+                        try:
+                            date_obj = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+                            filtered_project[k] = date_obj.strftime("%Y-%m-%d")
+                        except Exception:
+                            filtered_project[k] = date_str


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle $date shape safely before parsing.

date_str = v["$date"] is outside the try, so missing $date crashes the run. Also, non-string $date values should be normalized safely.

Proposed fix

                 if k == "date":
                     if isinstance(v, dict):
-                        date_str = v["$date"]
+                        date_raw = v.get("$date")
                         try:
-                            date_obj = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
-                            filtered_project[k] = date_obj.strftime("%Y-%m-%d")
+                            if isinstance(date_raw, str):
+                                date_obj = datetime.fromisoformat(date_raw.replace("Z", "+00:00"))
+                                filtered_project[k] = date_obj.strftime("%Y-%m-%d")
+                            elif isinstance(date_raw, datetime):
+                                filtered_project[k] = date_raw.strftime("%Y-%m-%d")
+                            else:
+                                filtered_project[k] = str(date_raw) if date_raw is not None else ""
                         except Exception:
-                            filtered_project[k] = date_str
+                            filtered_project[k] = str(date_raw) if date_raw is not None else ""

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dashboard/core/mongo_sync.py` around lines 71 - 77, The code assumes v["$date"] exists and is a string before parsing in the block around filtered_project[k], which can raise KeyError or TypeError; change it to safely fetch and normalize the $date value (e.g. use v.get("$date") or check "$date" in v), ensure non-string values are converted to str() before calling replace("+00:00")/fromisoformat, and keep the datetime.fromisoformat(...) call inside the try/except so missing or malformed values fall back to assigning the original (or stringified) date to filtered_project[k]; update the logic around variables v, k, filtered_project and datetime.fromisoformat accordingly.
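
The same normalization can be pulled into a small standalone function, which is easier to test than the inline branches. This is a sketch, not the script's code: `normalize_date` is a hypothetical name, and it assumes the field is either a `{"$date": ...}` dict, a `datetime`, or a plain string.

```python
from datetime import datetime


def normalize_date(value):
    """Best-effort conversion of a date field to 'YYYY-MM-DD'."""
    # Unwrap the {"$date": ...} shape; a missing key yields None.
    raw = value.get("$date") if isinstance(value, dict) else value
    if raw is None:
        return ""
    if isinstance(raw, datetime):
        return raw.strftime("%Y-%m-%d")
    if isinstance(raw, str):
        try:
            return datetime.fromisoformat(raw.replace("Z", "+00:00")).strftime("%Y-%m-%d")
        except ValueError:
            # Unparseable strings are passed through unchanged.
            return raw
    return str(raw)


print(normalize_date({"$date": "2026-05-05T11:22:26Z"}))  # 2026-05-05
print(normalize_date(datetime(2026, 5, 17)))              # 2026-05-17
print(normalize_date("not a date"))                       # not a date
print(repr(normalize_date({})))                           # ''
```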

coderabbitai · 2026-05-17T09:32:58Z

+        safe_namespace = re.sub(r"[^a-z0-9-]", "_", raw_namespace.lower())
+        # Additionally, enforce that no path traversal can happen
+        if Path(safe_namespace).name != safe_namespace or not safe_namespace:
+            raise ValueError(f"Unsafe or empty namespace for filename: {raw_namespace}")
+
+        with open(project_folder.joinpath(safe_namespace + ".json"), "w") as f:
+            json.dump(filtered_project, f, indent=4)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent sanitized filename collisions from overwriting project exports.

Different namespaces can collapse to the same safe_namespace (e.g., team/a and team_a) and silently overwrite JSON files.

Proposed fix

+        existing_name = filtered_project.get("namespace", "")
+        file_name = f"{safe_namespace}.json"
+        output_path = project_folder / file_name
+        if output_path.exists():
+            with open(output_path, "r") as rf:
+                existing = json.load(rf)
+            if existing.get("namespace") != existing_name:
+                raise ValueError(
+                    f"Namespace collision after sanitization: '{existing_name}' -> '{file_name}'"
+                )
-
-        with open(project_folder.joinpath(safe_namespace + ".json"), "w") as f:
+        with open(output_path, "w") as f:
             json.dump(filtered_project, f, indent=4)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        safe_namespace = re.sub(r"[^a-z0-9-]", "_", raw_namespace.lower())
-        # Additionally, enforce that no path traversal can happen
-        if Path(safe_namespace).name != safe_namespace or not safe_namespace:
-            raise ValueError(f"Unsafe or empty namespace for filename: {raw_namespace}")
-
-        with open(project_folder.joinpath(safe_namespace + ".json"), "w") as f:
-            json.dump(filtered_project, f, indent=4)
+        safe_namespace = re.sub(r"[^a-z0-9-]", "_", raw_namespace.lower())
+        # Additionally, enforce that no path traversal can happen
+        if Path(safe_namespace).name != safe_namespace or not safe_namespace:
+            raise ValueError(f"Unsafe or empty namespace for filename: {raw_namespace}")
+        existing_name = filtered_project.get("namespace", "")
+        file_name = f"{safe_namespace}.json"
+        output_path = project_folder / file_name
+        if output_path.exists():
+            with open(output_path, "r") as rf:
+                existing = json.load(rf)
+            if existing.get("namespace") != existing_name:
+                raise ValueError(
+                    f"Namespace collision after sanitization: '{existing_name}' -> '{file_name}'"
+                )
+        with open(output_path, "w") as f:
+            json.dump(filtered_project, f, indent=4)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dashboard/core/mongo_sync.py` around lines 98 - 104, The current sanitization of raw_namespace into safe_namespace can cause collisions (e.g., "team/a" -> "team_a") and overwrite existing exports; update the write logic around safe_namespace (used in project_folder.joinpath(safe_namespace + ".json") and json.dump(filtered_project,...)) to detect collisions and produce a unique, deterministic filename instead of silently overwriting: after computing safe_namespace, check whether a file with that name already exists and if it was produced from a different raw_namespace (track a mapping or compare existing metadata), and if a collision is detected append a short deterministic suffix (e.g., a hex hash of raw_namespace like first 8 chars) to the filename or raise a ValueError; ensure the chosen approach uses raw_namespace for uniqueness and still prevents path traversal via the existing Path(safe_namespace).name check before opening the file.
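
The hash-suffix alternative mentioned in the prompt above could be sketched like this. `export_filename` is a hypothetical helper, and always appending the suffix (rather than only on collision) is a simplification chosen here so the filename depends only on the namespace, not on what already exists on disk.

```python
import hashlib
import re


def export_filename(raw_namespace):
    """Build a path-safe filename that stays distinct per raw namespace."""
    safe = re.sub(r"[^a-z0-9-]", "_", raw_namespace.lower())
    if not safe:
        raise ValueError(f"Empty namespace for filename: {raw_namespace!r}")
    # The suffix is derived from the raw value, so namespaces that sanitize
    # to the same string still get different filenames.
    suffix = hashlib.sha256(raw_namespace.encode("utf-8")).hexdigest()[:8]
    return f"{safe}-{suffix}.json"


# "team/a" and "team_a" both sanitize to "team_a" but get different suffixes.
print(export_filename("team/a") == export_filename("team_a"))  # False
print(export_filename("team/a") == export_filename("team/a"))  # True
```

The trade-off is that every exported file gets a suffix, which changes existing filenames; raising a `ValueError` on collision, as in the proposed fix, keeps the current names.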

coderabbitai · 2026-05-17T11:38:39Z

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #154

@SimoneBendazzoli93

Docstrings generation was requested by @SimoneBendazzoli93. * #150 (comment) The following files were modified: * `dashboard/core/mongo_sync.py`

coderabbitai · 2026-05-17T11:40:19Z

Note

Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Generating unit tests... This may take up to 20 minutes.

coderabbitai · 2026-05-17T11:45:01Z

✅ Created PR with unit tests: #155

Copilot AI review requested due to automatic review settings May 5, 2026 11:22

Copilot started reviewing on behalf of SimoneBendazzoli93 May 5, 2026 11:23 View session

Copilot AI reviewed May 5, 2026

Comment thread dashboard/mongo_sync.py Outdated

Comment thread dashboard/mongo_sync.py Outdated

Comment thread dashboard/mongo_sync.py Outdated

Comment thread dashboard/mongo_sync.py Outdated

SimoneBendazzoli93 requested a review from Copilot May 5, 2026 13:09

Copilot started reviewing on behalf of SimoneBendazzoli93 May 5, 2026 13:10 View session

Copilot AI reviewed May 5, 2026

Comment thread dashboard/mongo_sync.py Outdated

SimoneBendazzoli93 requested a review from Copilot May 5, 2026 21:56

Copilot started reviewing on behalf of SimoneBendazzoli93 May 5, 2026 21:56 View session

Copilot AI reviewed May 5, 2026

Comment thread dashboard/mongo_sync.py Outdated

Comment thread dashboard/mongo_sync.py Outdated

Comment thread dashboard/mongo_sync.py Outdated

Comment thread dashboard/mongo_sync.py Outdated

SimoneBendazzoli93 and others added 7 commits May 17, 2026 10:16

Potential fix for pull request finding

ea3eeab

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Simone Bendazzoli <simonebendazzoli93@gmail.com>

Potential fix for pull request finding

e6103f6

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Simone Bendazzoli <simonebendazzoli93@gmail.com>

Potential fix for pull request finding

d41da31

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Simone Bendazzoli <simonebendazzoli93@gmail.com>

Potential fix for pull request finding

2476268

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Simone Bendazzoli <simonebendazzoli93@gmail.com>

SimoneBendazzoli93 force-pushed the mongo-sync-to-git branch from 3af7f4b to 72b77fb Compare May 17, 2026 08:16

SimoneBendazzoli93 closed this May 17, 2026

SimoneBendazzoli93 reopened this May 17, 2026

kick github to refresh checks

f9a6e48

github-code-quality Bot found potential problems May 17, 2026

Comment thread dashboard/core/mongo_sync.py Fixed

Potential fix for pull request finding 'Statement has no effect'

f8cdedb

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Signed-off-by: Simone Bendazzoli <simonebendazzoli93@gmail.com>

SimoneBendazzoli93 self-assigned this May 17, 2026

SimoneBendazzoli93 added 2 commits May 17, 2026 11:30

kick github to refresh checks and coderabbitai

1e1fc60

kick github to refresh checks and coderabbitai

e76abbb

coderabbitai Bot reviewed May 17, 2026

coderabbitai Bot added a commit that referenced this pull request May 17, 2026

📝 Add docstrings to mongo-sync-to-git

675e94f

Docstrings generation was requested by @SimoneBendazzoli93. * #150 (comment) The following files were modified: * `dashboard/core/mongo_sync.py`

coderabbitai Bot mentioned this pull request May 17, 2026

📝 Add docstrings to mongo-sync-to-git #154

Open

coderabbitai Bot mentioned this pull request May 17, 2026

CodeRabbit Generated Unit Tests: Add generated unit tests #155

Open
