
41526 Storage [Backend] Create a file service #53

Open
EBirkenfeld wants to merge 84 commits into master from 41526__create_file_service

Conversation


@EBirkenfeld EBirkenfeld commented Oct 24, 2025

Note

Add a dedicated file storage microservice and replace legacy GCS attachment handling

  • Introduces a new standalone FastAPI service (storage/) that handles file upload and download via SeaweedFS (S3-compatible), with JWT/Redis-based auth, Alembic migrations, and structured error handling.
  • Adds a new Attachment Django model in backend/src/storage/ replacing the legacy FileAttachment workflow; attachments are now identified by file_id strings and carry access_type/source_type metadata with per-object permissions via django-guardian.
  • File references in task fields, comments, templates, and workflow descriptions are now expressed as Markdown links (e.g. [name](url)) pointing to FILE_DOMAIN; refresh_attachments syncs storage records and permissions whenever content changes.
  • Nginx is updated to proxy FILE_DOMAIN to the new pneumatic-file-service container; all compose files provision the file service stack (app, Postgres, SeaweedFS).
  • Frontend upload flow is rewritten in uploadFiles.ts to POST to the file service and receive string fileId/publicUrl; accountId is removed from all upload call sites.
  • Behavioral Change: comments now require non-empty text; the attachments list field is removed from comment creation and workflow event serializers; StoragePermission and the /workflows/attachments routes are removed; GCS upload paths are gone.
📊 Macroscope summarized af43724. 131 files reviewed, 161 issues evaluated, 81 issues filtered, 3 comments posted

🗂️ Filtered Issues

backend/src/accounts/admin.py — 0 comments posted, 3 evaluated, 2 filtered
  • line 137: When photo upload fails (in the except FileServiceException block), the error is logged to Sentry but the form continues to super().save() without raising a validation error or notifying the admin user. The user clicks save, sees a success message, but their photo was never uploaded. The upload feature appears to work while silently doing nothing on error. [ Already posted ]
  • line 137: In UserAdminChangeForm.save(), when FileServiceException is raised during photo upload, the exception is caught and logged to Sentry, but the form continues to save without the photo being updated. The admin user receives no indication that the photo upload failed - no error message is added to the form, no exception is re-raised, and no Django message is displayed. This results in a silent failure where the user believes the photo was saved but it was not. [ Already posted ]
backend/src/accounts/forms.py — 0 comments posted, 2 evaluated, 2 filtered
  • line 40: Removing the settings.PROJECT_CONF['STORAGE'] validation check in clean_photo_file means that when storage is not configured, users can now submit a photo file that will silently fail to upload (caught by FileServiceException in save()), resulting in the contact being saved without the photo and no error feedback to the user. The old code prevented this by adding a form error before the save attempt. [ Already posted ]
  • line 59: When photo upload fails (in the except FileServiceException block), the error is logged to Sentry but the form saves successfully without notifying the admin user. The admin sees success but the contact's photo was never updated, causing silent data loss. [ Cross-file consolidated ]
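A minimal sketch of the fix both entries suggest: surface the upload failure to the form instead of swallowing it. The helper names (`save_with_photo`, `add_error`) and the `photo_file` field name are hypothetical stand-ins for the real Django form code, which is not shown here.

```python
class FileServiceException(Exception):
    """Stand-in for the backend's file-service exception."""


def save_with_photo(upload_photo, add_error, save_instance):
    # Surface the failure to the form instead of swallowing it.
    try:
        photo_url = upload_photo()
    except FileServiceException as exc:
        add_error("photo_file", f"Photo upload failed: {exc}")
        return None  # skip the save so the user does not see a false success
    return save_instance(photo_url)
```

In the real form, `add_error` would be Django's `Form.add_error`, which blocks the save and renders the message next to the field.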
backend/src/accounts/services/group.py — 0 comments posted, 3 evaluated, 1 filtered
  • line 202: The condition new_name != self.instance.name at line 202 will evaluate to True when new_name is None (not provided in update_kwargs) and self.instance.name has any value. This causes the analytics tracking block to execute incorrectly when name was not being updated, potentially triggering unnecessary analytics events. [ Out of scope ]
backend/src/analysis/services.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 694: Accessing user.is_authenticated on lines 694 and 711 will raise AttributeError when user is None. The parameter user: Optional[UserModel] = None allows None values, but the code accesses user.is_authenticated without first checking that user is not None. [ Out of scope ]
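The guard the entry calls for can be one line; this is a generic sketch, not the project's actual helper:

```python
def is_authenticated(user) -> bool:
    # Treat a missing user as anonymous rather than raising AttributeError.
    return user is not None and user.is_authenticated
```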
backend/src/applications/admin.py — 0 comments posted, 3 evaluated, 2 filtered
  • line 72: If self.user is None (which is the default value set in __init__), line 72 will raise AttributeError: 'NoneType' object has no attribute 'account' when accessing self.user.account. This AttributeError is not a subclass of FileServiceException, so it will not be caught by the except block and will propagate up, causing an unhandled exception. [ Already posted ]
  • line 74: When logo upload fails (in the except FileServiceException block), file_url remains None and the integration is saved without a logo. The admin user gets no indication that the upload failed - the integration is created/updated but missing its logo image. [ Cross-file consolidated ]
backend/src/authentication/services/microsoft.py — 0 comments posted, 2 evaluated, 1 filtered
  • line 160: In _get_user_photo, account.get_owner() can return None if the account has no user with is_account_owner=True. When FileServiceClient(user=None) is instantiated and upload_file_with_attachment is called, the request will be made without authentication headers (since _make_request checks if self.user: before adding auth). This will cause a 401 error from the file service. While this is caught by the exception handler, it represents a silent failure that may not be the intended behavior - the photo upload will fail for any account without an owner. [ Already posted ]
backend/src/processes/services/events.py — 0 comments posted, 2 evaluated, 1 filtered
  • line 740: In the update method, when force_save=False (the default), partial_update modifies self.instance in memory but does not persist the changes to the database. However, notifications (send_mention_notification), websocket events (_send_workflow_event), and analytics (AnalyticService.comment_edited) are still fired based on the uncommitted in-memory state. This causes notifications and analytics to be sent for comment updates that are never actually saved to the database. [ Already posted ]
backend/src/processes/services/tasks/task.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 137: In partial_update, refresh_attachments is called when description is in update_kwargs regardless of whether force_save is True. When force_save=False, the description change has not been persisted to the database yet (it's only added to update_fields for a later save), but refresh_attachments is called immediately. This can cause attachment state to be out of sync with the actual persisted description, especially if the eventual save() fails or is never called. [ Already posted ]
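A sketch of the ordering fix implied by this entry: only trigger the attachment sync after the change has actually been persisted. The callable parameters (`save`, `refresh_attachments`) are simplified stand-ins for the real service methods.

```python
def partial_update(instance, update_kwargs, force_save, save, refresh_attachments):
    """Sketch: sync attachments only after the change is persisted."""
    for field, value in update_kwargs.items():
        setattr(instance, field, value)
    if force_save:
        save(instance)
        if "description" in update_kwargs:
            refresh_attachments(instance)  # runs against committed state only
    return instance
```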
backend/src/processes/services/templates/template.py — 0 comments posted, 2 evaluated, 2 filtered
  • line 41: The refresh_attachments function is called with self.user which can be None (as set in BaseModelService.__init__ when no user is provided), but the function signature refresh_attachments(source: models.Model, user: UserModel) expects a non-Optional UserModel. If the service is instantiated without a user and self.instance exists, this will pass None as the user argument, likely causing an AttributeError when the function attempts to access user attributes. [ Already posted ]
  • line 59: The refresh_attachments function is called with self.user which can be None (as set in BaseModelService.__init__ when no user is provided), but the function signature expects a non-Optional UserModel. The condition only checks if 'description' in update_kwargs and self.instance: but does not verify that self.user is not None, which will likely cause an AttributeError when refresh_attachments attempts to use the user object. [ Already posted ]
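Both entries reduce to adding a None check on the user before calling refresh_attachments. A hedged sketch, with the condition lifted into a helper (the helper name is invented for illustration):

```python
def maybe_refresh_attachments(instance, user, update_kwargs, refresh):
    # Skip the sync when the service has no user attached (assumed contract:
    # refresh requires a non-None user).
    if "description" in update_kwargs and instance is not None and user is not None:
        refresh(source=instance, user=user)
        return True
    return False
```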
backend/src/storage/services/attachments.py — 1 comment posted, 18 evaluated, 6 filtered
  • line 95: Task.objects.get(id=task.id) will raise Task.DoesNotExist if the task has been soft-deleted between when self.instance.task was set and this reload. The BaseSoftDeleteManager filters out records where is_deleted=True, so a soft-deleted task won't be found, causing an unhandled exception. [ Already posted ]
  • line 116: Permissions are granted to task performers who have been "directly deleted" (removed from the task). The code uses task.taskperformer_set.all() which includes performers with directly_status=DirectlyStatus.DELETED. Looking at the Task model reference, there's an exclude_directly_deleted_taskperformer_set() method specifically to filter these out. This is a security bug - users/groups who were removed as task performers will still retain access to attachments. [ Already posted ]
  • line 192: Template.objects.get(pk=template.pk) will raise Template.DoesNotExist if the template has been soft-deleted between when self.instance.template was set and this reload. The BaseSoftDeleteManager filters out records where is_deleted=True, so a soft-deleted template won't be found, causing an unhandled exception. [ Already posted ]
  • line 251: Workflow.objects.get(id=workflow.id) will raise Workflow.DoesNotExist if the workflow has been soft-deleted between when self.instance.workflow was set and this reload. The BaseSoftDeleteManager filters out records where is_deleted=True, so a soft-deleted workflow won't be found, causing an unhandled exception. [ Already posted ]
  • line 431: The bulk_create_for_scope method accepts arbitrary source_type string parameter but _assign_restricted_permissions() only handles three specific source types. If an unsupported source_type is passed with access_type=AccessType.RESTRICTED, the attachment is created but no permissions are assigned, silently leaving the file inaccessible. [ Already posted ]
  • line 506: The UserObjectPermission filter at lines 506-511 does not include content_type=ctype on the UserObjectPermission model itself, only filtering via permission__content_type. While guardian's UserObjectPermission model has its own content_type field that specifies which model the object_pk belongs to, this filter relies solely on the permission's content_type. The same applies to the GroupObjectPermission filter at lines 517-522. For robustness, both filters should include content_type=ctype to explicitly constrain by the object's content type, not just the permission's. [ Already posted ]
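The performer-filtering bug on line 116 amounts to excluding directly-deleted performers before granting permissions. A schematic sketch using plain dicts; the `"deleted"` value stands in for `DirectlyStatus.DELETED`, whose real representation is not shown in the review:

```python
DIRECTLY_DELETED = "deleted"  # assumed value of DirectlyStatus.DELETED


def performers_to_grant(performers):
    """Exclude performers who were directly removed from the task."""
    return [p for p in performers if p.get("directly_status") != DIRECTLY_DELETED]
```

In the actual code this corresponds to using the model's `exclude_directly_deleted_taskperformer_set()` instead of `taskperformer_set.all()`.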
backend/src/storage/services/file_service.py — 0 comments posted, 5 evaluated, 3 filtered
  • line 73: In _get_user_token, every call without a valid token in the request creates a new token via PneumaticToken.create() with for_api_key=True (line 76). Since API tokens are long-lived and this token is not stored or reused, repeated calls to _make_request will accumulate orphaned tokens in the cache that are never cleaned up, leading to cache memory growth over time. [ Already posted ]
  • line 206: If capture_sentry_message (called at lines 185-192 or 196-203 within the try block) raises ValueError, KeyError, or AttributeError, the exception will be caught by the except clause at line 206 and incorrectly reported as a 'File service response parsing error', masking the actual error from the Sentry logging call. [ Already posted ]
  • line 206: In the exception handler at line 206-216, if response.json() raises an exception other than ValueError, KeyError, or AttributeError (such as TypeError from an unexpected encoding issue), the exception will propagate up uncaught by this handler. While the finally block ensures file_obj is closed, the error message and logging in the except block won't apply to these edge cases. [ Already posted ]
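For the orphaned-token issue on line 73, one simple mitigation is to cache one long-lived API token per user instead of minting a new one per request. A sketch under that assumption; `create_token` stands in for `PneumaticToken.create(..., for_api_key=True)`:

```python
_token_cache: dict[int, str] = {}


def get_user_token(user_id: int, create_token) -> str:
    # Reuse one long-lived API token per user instead of minting one per request.
    token = _token_cache.get(user_id)
    if token is None:
        token = create_token(user_id)  # stand-in for PneumaticToken.create(...)
        _token_cache[user_id] = token
    return token
```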
backend/src/storage/services/file_sync.py — 1 comment posted, 10 evaluated, 7 filtered
  • line 65: The save() call on line 65 can raise django.db.IntegrityError if the generated file_id collides with an existing value (since file_id has unique=True), but the exception handler on line 76 only catches ValueError, AttributeError, and TypeError. This will cause an unhandled exception and crash the sync process. [ Already posted ]
  • line 125: The exception handler on lines 76-81 and 143-148 only catches (ValueError, AttributeError, TypeError) but Attachment.objects.create() on line 125 and attachment.save() on line 65 can raise IntegrityError (from django.db.utils) if the file_id unique constraint is violated (e.g., due to race conditions or duplicate data). This uncaught exception will crash the entire sync operation instead of incrementing the error counter and continuing. [ Already posted ]
  • line 143: The exception handler in sync_to_new_attachment_model only catches ValueError, AttributeError, and TypeError, but Attachment.objects.create() can raise django.db.IntegrityError (e.g., if account_id is required by AccountBaseMixin but the FileAttachment has no account, or for other constraint violations). This would cause the entire sync loop to abort unexpectedly instead of logging the error and continuing to the next attachment. [ Already posted ]
  • line 185: The file_id generation on line 181-185 truncates to 64 characters after combining account_id, attachment.id, salt, and filename. If the filename is very long (e.g., 60+ chars), the truncation could cut into the unique identifiers (salt, attachment.id), potentially creating collision risks for attachments with similar long filenames under the same account, though the 6-character salt provides some mitigation. [ Already posted ]
  • line 185: Truncating file_id to 64 characters with file_id[:64] after generating it from account_id, attachment.id, salt, and filename could produce non-unique values if two attachments have identical prefixes up to 64 characters (e.g., long filenames with the same account and nearby IDs). Since file_id has unique=True constraint, this would cause an IntegrityError when saving to FileAttachment. [ Already posted ]
  • line 185: The file_id truncation at line 185 (return file_id[:64]) could cause uniqueness collisions when long filenames are involved. The prefix {account_id}_{attachment_id}_{salt}_ can easily consume 20+ characters, and if two attachments have similar long filenames, truncation could produce identical file_ids. Consider truncating only the filename portion while preserving the unique prefix components. [ Failed validation ]
  • line 234: The Attachment.objects.create() call in sync_to_new_attachment_model (the caller of _determine_source_type) can raise django.db.IntegrityError or other database exceptions, but only ValueError, AttributeError, and TypeError are caught. If a database constraint violation occurs during the sync loop, it will terminate the entire batch instead of counting the error and continuing. [ Already posted ]
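The truncation fix suggested in the line 185 entries is to cap only the filename portion so the unique `{account_id}_{attachment_id}_{salt}_` prefix always survives. A self-contained sketch:

```python
MAX_FILE_ID_LEN = 64


def build_file_id(account_id: int, attachment_id: int, salt: str, filename: str) -> str:
    """Truncate only the filename so the unique prefix is never cut."""
    prefix = f"{account_id}_{attachment_id}_{salt}_"
    if len(prefix) >= MAX_FILE_ID_LEN:
        return prefix[:MAX_FILE_ID_LEN]
    return prefix + filename[: MAX_FILE_ID_LEN - len(prefix)]
```

Two attachments with the same long filename now differ in their prefix, so the unique constraint on file_id cannot collide on filename alone.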
backend/src/storage/tests/test_refresh_attachments.py — 0 comments posted, 3 evaluated, 1 filtered
  • line 178: Test test_refresh_attachments_for_text__empty_text__deletes_unused does not mock mock_filter.values_list.return_value. The actual code calls list(to_delete.values_list('id', flat=True)) before passing the result to clear_guardian_permissions_for_attachment_ids. Without the mock, this returns a Mock object, and list(Mock()) will raise TypeError because Mock is not iterable, or clear_guardian_permissions_for_attachment_ids (which is also not patched) may fail when receiving unexpected input. [ Failed validation ]
backend/src/storage/tests/test_services/test_attachments.py — 0 comments posted, 2 evaluated, 1 filtered
  • line 180: The test test_bulk_create__duplicate_file_ids__ignore_conflicts does not verify its claimed behavior. The assertion Attachment.objects.filter(task=task).count() >= 1 will always pass because the pre-existing 'existing_file' attachment was already created on line 163-168. The test should verify that: (1) 'new_file' was successfully created, and (2) exactly one attachment exists for 'existing_file' (no duplicate). A proper assertion would be something like assert Attachment.objects.filter(file_id='new_file').exists() and assert Attachment.objects.filter(task=task).count() == 2. [ Cross-file consolidated ]
backend/src/storage/tests/test_services/test_attachments_permissions.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 436: This test does not actually verify that template owner permissions work correctly. The owner user is already set as a workflow owner via create_test_workflow(user=owner) (which adds the user to workflow.owners and workflow.members). When _assign_workflow_permissions runs, the owner gets access through workflow_owners regardless of whether the template owner logic works. To properly test template owner permissions, use a different user as the template owner who is NOT the workflow creator. [ Cross-file consolidated ]
backend/src/storage/tests/test_services/test_file_sync.py — 0 comments posted, 2 evaluated, 1 filtered
  • line 353: Test test_determine_source_type__workflow_via_event__ok does not test what its name implies. The test sets workflow=workflow on the attachment, so _determine_source_type returns SourceType.WORKFLOW immediately at the first if attachment.workflow: check, before the event is ever evaluated. The event_mock setup is completely dead code and the test passes for the wrong reason - it's just testing that a workflow-attached attachment returns WORKFLOW, not anything 'via event'. [ Cross-file consolidated ]
frontend/src/public/api/commonRequest.ts — 0 comments posted, 1 evaluated, 1 filtered
  • line 117: Potential runtime crash in commonRequest when using the new fileService request type. The code introduces 'fileService' as a TRequestType and attempts to retrieve fileServiceUrl from the browser configuration (getBrowserConfigEnv().api). If the runtime configuration (e.g., common.json or injected environment) is missing fileServiceUrl, it resolves to undefined. Subsequently, mergePaths is called with undefined as the first argument (requestBaseUrlsMap['fileService']), which causes a TypeError when accessing left.length. [ Already posted ]
frontend/src/public/api/fileServiceUpload.ts — 0 comments posted, 1 evaluated, 1 filtered
  • line 18: The new function uploadFileToFileService performs deep destructuring on the result of getBrowserConfigEnv(). Since getBrowserConfigEnv returns a fallback empty object {} when configuration is missing, the api property will be undefined. Attempting to destructure urls from this undefined api property will throw a runtime TypeError, causing the application to crash if the configuration is not loaded. [ Already posted ]
frontend/src/public/components/Workflows/WorkflowsGridPage/WorkflowCard/utils/getClonedKickoff.ts — 0 comments posted, 1 evaluated, 1 filtered
  • line 71: Runtime type violation in cloneAttachments due to missing string conversion. The copyAttachment function returns ICopyAttachmentResponse where id is a number (as defined in frontend/src/public/api/copyAttachment.ts), but TUploadedFile has been updated to expect id: string. In cloneAttachments, the numeric ID from the response is assigned directly to the id property of the new attachment object (id: attachmentsMap.get(attachment.id)). This object is then cast to TUploadedFile, hiding the type mismatch. Downstream code expecting id to be a string will crash when attempting to invoke string methods (e.g., id.toLowerCase(), id.startsWith()) on the numeric value. [ Out of scope ]
storage/src/application/use_cases/file_upload.py — 0 comments posted, 3 evaluated, 3 filtered
  • line 68: In execute(), the file is uploaded to S3 storage (lines 68-73) before the database record is saved (lines 76-78). If the database operation fails after a successful S3 upload, the uploaded file becomes orphaned in storage with no cleanup mechanism. The operations should either be reordered (DB first, then S3) or include compensation logic to delete the S3 object on database failure. [ Already posted ]
  • line 68: If the database operation fails after the S3 upload succeeds, the uploaded file is orphaned in S3 storage with no cleanup. The file is uploaded to S3 at lines 68-73, then the database record is created within the unit of work at lines 76-78. If create or commit raises an exception (e.g., DatabaseConstraintError, DatabaseConnectionError), the exception propagates without any mechanism to delete the already-uploaded S3 file, leaving storage in an inconsistent state. [ Already posted ]
  • line 81: If self._fastapi_base_url contains a trailing slash, the generated public_url on line 81 will have a double slash (e.g., https://example.com//file-id). Consider normalizing the base URL or using urljoin for proper URL construction. [ Already posted ]
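Two small sketches for the entries above: compensation logic that deletes the S3 object if the database write fails afterwards, and a base-URL normalizer that avoids the double slash. The callable parameters are stand-ins; the real use case wires these to the S3 client and unit of work.

```python
def upload_with_compensation(upload_to_s3, delete_from_s3, save_record, file_path):
    # If the DB write fails after a successful upload, remove the S3 object
    # so storage does not accumulate orphaned files.
    upload_to_s3(file_path)
    try:
        return save_record(file_path)
    except Exception:
        delete_from_s3(file_path)  # best-effort cleanup
        raise


def build_public_url(base_url: str, file_id: str) -> str:
    # Normalize the base so a trailing slash cannot produce "//" in the path.
    return f"{base_url.rstrip('/')}/{file_id}"
```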
storage/src/infra/http_client.py — 0 comments posted, 3 evaluated, 2 filtered
  • line 37: The httpx.AsyncClient() is created without explicit timeout configuration. If the service expects specific timeout behavior (e.g., the 30 seconds mentioned in the error message), the client should be initialized with httpx.AsyncClient(timeout=30.0) to ensure consistent behavior. [ Already posted ]
  • line 65: The HttpTimeoutError is raised with a hardcoded timeout=30.0 value, but no timeout is actually configured on the httpx.AsyncClient(). The default timeout for httpx.AsyncClient is 5 seconds, not 30 seconds. This causes misleading error messages when timeouts occur, making debugging difficult. [ Already posted ]
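The fix both entries point at is to configure the timeout explicitly and derive the error message from the same constant, so the two cannot drift apart. A sketch; the commented client construction is the assumed call site:

```python
REQUEST_TIMEOUT_SECONDS = 30.0

# Assumed call site in the real service:
# client = httpx.AsyncClient(timeout=REQUEST_TIMEOUT_SECONDS)


def timeout_error_message(url: str) -> str:
    # The error message and the configured timeout now come from one constant.
    return f"Request to {url} timed out after {REQUEST_TIMEOUT_SECONDS:g}s"
```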
storage/src/infra/repositories/storage_service.py — 0 comments posted, 15 evaluated, 14 filtered
  • line 119: StorageError constructor expects an error_code_key that exists in DOMAIN_ERROR_CODES, but MSG_STORAGE_009.format(details=str(e)) passes a formatted message string. This will cause a KeyError when DOMAIN_ERROR_CODES[error_code_key] is accessed. Should use StorageError.bucket_create_failed(str(e)) instead. [ Already posted ]
  • line 119: StorageError is instantiated incorrectly. The StorageError.__init__ expects error_code_key as the first parameter (a key to lookup in DOMAIN_ERROR_CODES), but the code passes a formatted message string MSG_STORAGE_009.format(details=str(e)). This will cause a KeyError when DOMAIN_ERROR_CODES[error_code_key] is executed because the formatted message string won't exist as a key in the dictionary. [ Already posted ]
  • line 119: StorageError constructor expects an error_code_key (e.g., 'STORAGE_UPLOAD_FAILED') as the first argument, which is then looked up in DOMAIN_ERROR_CODES. However, on line 119-121, a formatted message string (MSG_STORAGE_009.format(...)) is passed instead of an error code key. This will cause a KeyError at runtime when DOMAIN_ERROR_CODES[error_code_key] tries to find a key like 'Failed create bucket: ...'. Should use StorageError.bucket_create_failed(details=str(e)) instead. [ Already posted ]
  • line 119: StorageError constructor expects an error_code_key (e.g., 'STORAGE_UPLOAD_FAILED') as its first argument, which gets looked up in DOMAIN_ERROR_CODES. Instead, the code passes a pre-formatted message string (MSG_STORAGE_009.format(...)) directly. This will cause a KeyError when DOMAIN_ERROR_CODES[error_code_key] is evaluated, crashing the error handling. Should use the classmethods like StorageError.bucket_create_failed(str(e)) or pass a valid key like StorageError('STORAGE_BUCKET_CREATE_FAILED', MSG_STORAGE_009.format(details=str(e))) [ Already posted ]
  • line 128: ContentType=content_type is passed to s3.put_object() even when content_type is None. While S3's put_object accepts ContentType as an optional parameter, explicitly passing None may cause a ParamValidationError from botocore since it expects a string type when the parameter is provided. The code should conditionally include this parameter only when content_type is not None. [ Already posted ]
  • line 131: StorageError is instantiated incorrectly with MSG_STORAGE_010.format(details=str(e)) as the first argument. The constructor expects an error_code_key string that maps to DOMAIN_ERROR_CODES, not a formatted error message. This will raise KeyError at runtime. [ Already posted ]
  • line 131: Same issue as above: StorageError is called with MSG_STORAGE_010.format(details=str(e)) as the first argument, but the constructor expects an error code key from DOMAIN_ERROR_CODES. This will raise a KeyError instead of the intended StorageError. Should use StorageError.upload_failed(str(e)) or pass a valid key like StorageError('STORAGE_UPLOAD_FAILED', MSG_STORAGE_010.format(details=str(e))) [ Already posted ]
  • line 131: StorageError constructor expects an error_code_key that exists in DOMAIN_ERROR_CODES, but MSG_STORAGE_010.format(details=str(e)) passes a formatted message string. This will cause a KeyError. Should use StorageError.upload_failed(str(e)) instead. [ Already posted ]
  • line 131: StorageError is called with a formatted message string (MSG_STORAGE_010.format(...)) instead of a valid error code key. This will raise a KeyError at runtime. Should use StorageError.upload_failed(details=str(e)) instead. [ Already posted ]
  • line 174: StorageError is called with error_msg (a formatted string) instead of a valid error code key. This will raise a KeyError at runtime. Should use StorageError.file_not_found_in_storage(file_path=file_path, bucket_name=bucket_name) instead. [ Already posted ]
  • line 174: StorageError is instantiated incorrectly in two places within the except ClientError block. On line 174, a plain string error_msg is passed, and on line 175, MSG_STORAGE_011.format(details=str(e)) is passed. Both should be error code keys like 'STORAGE_DOWNLOAD_FAILED' rather than formatted message strings. This will cause a KeyError when the code attempts to look up these strings in DOMAIN_ERROR_CODES. [ Already posted ]
  • line 174: StorageError constructor expects an error_code_key that exists in DOMAIN_ERROR_CODES, but error_msg is a custom formatted string. This will cause a KeyError. Should use StorageError.file_not_found_in_storage(file_path, bucket_name) instead. [ Already posted ]
  • line 175: StorageError is called with a formatted message string (MSG_STORAGE_011.format(...)) instead of a valid error code key. This will raise a KeyError at runtime. Should use StorageError.download_failed(details=str(e)) instead. [ Already posted ]
  • line 175: StorageError constructor expects an error_code_key that exists in DOMAIN_ERROR_CODES, but MSG_STORAGE_011.format(details=str(e)) passes a formatted message string. This will cause a KeyError. Should use StorageError.download_failed(str(e)) instead. [ Already posted ]
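All of the entries above describe the same misuse: passing a formatted message where the constructor expects a DOMAIN_ERROR_CODES key. A minimal stand-in reproducing the pattern (the dictionary contents and code values are invented for illustration):

```python
DOMAIN_ERROR_CODES = {"STORAGE_UPLOAD_FAILED": "STG-010"}  # assumed shape


class StorageError(Exception):
    def __init__(self, error_code_key: str, message: str):
        # Raises KeyError if a formatted message is passed as the key.
        self.code = DOMAIN_ERROR_CODES[error_code_key]
        super().__init__(message)

    @classmethod
    def upload_failed(cls, details: str) -> "StorageError":
        return cls("STORAGE_UPLOAD_FAILED", f"Upload failed: {details}")
```

Using the classmethod constructors (as the review suggests) keeps the key and the message in one place and makes the KeyError impossible.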
storage/src/presentation/api/files.py — 0 comments posted, 6 evaluated, 4 filtered
  • line 132: FileAccessDeniedError constructor at line 132 expects user_id: int as its second argument, but current_user.user_id can be None (for guest tokens or public tokens, user_id is not guaranteed to be set). While this won't crash at runtime, the error message MSG_FILE_005.format(user_id=user_id, file_id=file_id) will contain the literal string None for the user ID, producing a confusing error message. [ Already posted ]
  • line 132: Line 132 passes current_user.user_id to FileAccessDeniedError, but user_id can be None (it's not validated as non-null in AuthenticatedUser or get_current_user). The FileAccessDeniedError.__init__ signature expects user_id: int, not int | None. This will cause incorrect error message formatting (showing None instead of a user ID). [ Already posted ]
  • line 141: Line 141-143 constructs Content-Disposition header using file_record.filename directly via f-string, but filename can be None (as seen in UploadFileCommand.filename: str | None). When filename is None, the header becomes attachment; filename="None" - a literal string "None" - which is incorrect behavior. The filename should be checked for None and handled appropriately (e.g., using a default name or omitting the filename parameter). [ Already posted ]
  • line 141: Line 141-143: If file_record.filename contains special characters like double quotes ("), semicolons, or newlines, the Content-Disposition header will be malformed (e.g., attachment; filename="my"file.txt") and could cause parsing failures in HTTP clients. The filename should be properly escaped or encoded using RFC 5987 encoding. [ Already posted ]
storage/src/shared_kernel/auth/dependencies.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 27: The AuthenticatedUser class docstring claims "guaranteed non-null fields" but user_id is copied directly from auth_user.user_id without validation and can be None. The get_current_user function only checks is_anonymous and account_id is None, not user_id. This contradicts the documented guarantee and causes downstream issues (e.g., line 132 in download_file). [ Already posted ]
storage/src/shared_kernel/auth/redis_client.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 57: The get_redis_client() function creates a new RedisAuthClient instance on every call, which in turn creates a new Redis connection pool via redis.from_url(). Since callers in services.py and token_auth.py invoke get_redis_client() on each authentication request, this leads to connection pool proliferation and potential connection exhaustion under load. The client should be cached as a singleton or use a shared connection pool. [ Already posted ]
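The singleton fix suggested here can be as small as an `lru_cache` wrapper, so the process holds one client and one connection pool. `RedisAuthClient` below is a stand-in for the real class, which wraps `redis.from_url`:

```python
from functools import lru_cache


class RedisAuthClient:
    """Stand-in for the real client; assume it owns a connection pool."""
    def __init__(self, url: str):
        self.url = url


@lru_cache(maxsize=1)
def get_redis_client(url: str = "redis://localhost:6379/0") -> RedisAuthClient:
    # One cached instance means one shared connection pool per process.
    return RedisAuthClient(url)
```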
storage/src/shared_kernel/database/models.py — 0 comments posted, 2 evaluated, 1 filtered
  • line 22: The FileRecordORM model defines content_type and filename columns as nullable=False, but the FileRecord domain entity defines these fields as str | None. When FileRecordMapper.entity_to_orm() is called with a FileRecord where content_type or filename is None, the subsequent database insert/flush will fail with an IntegrityError due to the NOT NULL constraint violation. [ Already posted ]
storage/src/shared_kernel/di/container.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 27: The async for pattern in get_db_session breaks exception propagation to get_async_session. When FastAPI throws an exception into get_db_session during request cleanup, that exception is raised at the yield session line in get_db_session, but it is NOT propagated into get_async_session(). This means the except SQLAlchemyError block in get_async_session will never execute, and session.rollback() will not be called on errors. The underlying generator may be abandoned without proper cleanup. The correct pattern is to directly use get_async_session as a dependency or properly delegate using async with aclosing() or by re-yielding with a try/finally that handles exception forwarding. [ Already posted ]
storage/src/shared_kernel/exceptions/base_exceptions.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 105: The to_response method accepts arbitrary **kwargs (line 93) and passes them to ErrorResponse (line 105), but ErrorResponse is a dataclass that only accepts specific fields (timestamp, request_id). If a caller passes any kwargs with keys other than timestamp or request_id, a TypeError will be raised at runtime (e.g., to_response(trace_id="abc") would fail). [ Already posted ]
storage/src/shared_kernel/exceptions/external_service_exceptions.py — 0 comments posted, 3 evaluated, 1 filtered
  • line 29: In ExternalServiceError.__init__, line 29 accesses EXTERNAL_SERVICE_ERROR_CODES[error_code_key] without validating that error_code_key exists in the dictionary. If an invalid key is passed, this will raise a KeyError with no helpful context about valid keys. While subclasses use known keys, the class is public and could be instantiated directly with an invalid key. [ Already posted ]
storage/src/shared_kernel/middleware/auth_middleware.py — 0 comments posted, 3 evaluated, 2 filtered
  • line 141: PublicAuthService.get_token() expects a header in the format "Token <token>" (checking parts[0] == cls.HEADER_PREFIX), but authenticate_public_token receives the raw token value directly from cookies or headers without this prefix. This means get_token() will always return None for public token authentication since the raw token won't have the required "Token " prefix, causing public token authentication to silently fail. [ Already posted ]
  • line 141: The authenticate_public_token method passes the raw token to PublicAuthService.get_token(), but get_token() expects a header string in the format "Token <token_value>" (it splits on whitespace and checks parts[0] == 'Token'). When called from a cookie source (line 87), the raw_token will be just the token value without the "Token " prefix, causing get_token() to always return None. The same issue occurs for the 'X-Public-Authorization' header source if clients send just the token without the prefix format. [ Already posted ]
storage/src/shared_kernel/uow/unit_of_work.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 35: In UnitOfWork, calling commit() (line 35) which calls self._session.commit() inside the begin() context manager, then having __aexit__ call self._transaction.__aexit__(*args) on the same transaction, creates problematic double-commit behavior. SQLAlchemy's session.commit() commits and closes the transaction, but then the transaction's __aexit__ will attempt to commit/rollback on an already-finished transaction. The commit() method should operate on self._transaction or the class should not mix explicit commits with the context manager pattern. [ Failed validation ]
storage/tests/conftest.py — 0 comments posted, 7 evaluated, 4 filtered
  • line 48: The mock_upload_use_case_execute fixture patches UploadFileUseCase.execute without specifying new_callable=AsyncMock. Comparing to the similar fixtures in this file (e.g., mock_download_use_case_get_metadata at line 81-85, mock_download_use_case_get_stream at line 86-90, and the new fixtures at lines 53-60 and 63-70), async methods are patched with new_callable=AsyncMock. If UploadFileUseCase.execute is an async method, tests using this fixture that await the mocked method will fail with a TypeError because a regular MagicMock is not awaitable. [ Already posted ]
  • line 123: The side_effect setter only propagates the side effect to self._metadata.side_effect (line 123) but not to self._stream.side_effect. If the old execute method could fail at either the metadata or stream retrieval stage, this incomplete propagation could cause tests to behave unexpectedly when simulating errors. [ Already posted ]
  • line 178: If RedisAuthClient.get is an async method, the mock will not work correctly when awaited because new_callable=AsyncMock is not specified. Awaiting a regular MagicMock raises TypeError: object MagicMock can't be used in 'await' expression. Compare with similar fixtures like mock_download_use_case_get_metadata (line 54-60) that use new_callable=AsyncMock for async methods. [ Already posted ]
  • line 186: If get_db_session is an async function or async context manager, the mock will not work correctly when awaited because new_callable=AsyncMock is not specified. This would cause TypeError at runtime when test code awaits the mock. [ Already posted ]
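The awaitability difference these four comments describe is easy to demonstrate in isolation:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def demo() -> tuple:
    sync_mock = MagicMock()
    try:
        await sync_mock()  # returns a MagicMock, which is not awaitable
        magic_awaitable = True
    except TypeError:
        magic_awaitable = False

    async_mock = AsyncMock(return_value="file-1")
    value = await async_mock()  # AsyncMock calls return awaitables
    return magic_awaitable, value

magic_awaitable, value = asyncio.run(demo())
```

Hence `patch(..., new_callable=AsyncMock)` for async targets. Recent `unittest.mock` versions do auto-select `AsyncMock` when the patched attribute is a plain `async def` function, but being explicit keeps intent clear and avoids surprises when the target is wrapped by decorators.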
storage/tests/fixtures/e2e.py — 0 comments posted, 6 evaluated, 5 filtered
  • line 14: The return type annotation Iterator[MagicMock] is incorrect since the fixture uses return instead of yield. This won't cause a runtime error but the type annotation is misleading. Other similar fixtures in this file (mock_http_client, mock_storage_service, etc.) have the same issue, suggesting this follows an existing pattern, but for a fixture that uses return, the type should just be MagicMock, not Iterator[MagicMock]. [ Already posted ]
  • line 32: The return type annotation Iterator[AsyncMock] suggests this is a generator fixture using yield, but it uses return instead. While pytest handles both patterns, this type mismatch could confuse type checkers and developers. However, this matches the existing pattern in other fixtures in this file (e.g., mock_auth_middleware, mock_redis_client), so it's consistent with the codebase style. [ Already posted ]
  • line 43: The return type annotation Iterator[AsyncMock] suggests this is a generator fixture using yield, but it uses return instead. While pytest handles both patterns, this type mismatch could confuse type checkers and developers. However, this matches the existing pattern in other fixtures in this file. [ Already posted ]
  • line 54: The return type annotation Iterator[MagicMock] is incorrect since the fixture uses return instead of yield. The type should just be MagicMock. This follows an existing incorrect pattern in the file but is still technically wrong. [ Already posted ]
  • line 95: The return type annotation Iterator[AsyncMock] is incorrect - the fixture uses return instead of yield, so it returns an AsyncMock directly, not an Iterator[AsyncMock]. This mismatch will cause type checking failures and could confuse consumers expecting iterator/generator behavior (e.g., fixture teardown via generator). [ Already posted ]
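The correct annotations for the two fixture styles, sketched with the `@pytest.fixture` decorators omitted so the snippet runs standalone (`mock_http_client` is borrowed from the fixture names mentioned above):

```python
import inspect
from collections.abc import Iterator
from unittest.mock import MagicMock

def mock_http_client() -> MagicMock:
    # return-style fixture: annotate the returned type directly
    return MagicMock(name="http_client")

def mock_http_client_with_teardown() -> Iterator[MagicMock]:
    # Iterator[...] is only accurate for a yield-style generator fixture
    client = MagicMock(name="http_client")
    yield client
    client.reset_mock()  # teardown runs after the test

assert not inspect.isgeneratorfunction(mock_http_client)
assert inspect.isgeneratorfunction(mock_http_client_with_teardown)
```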
storage/tests/fixtures/integration.py — 0 comments posted, 3 evaluated, 1 filtered
  • line 14: SQLite in-memory database with :///:memory: creates a separate database per connection. Since the engine uses a connection pool and Base.metadata.create_all runs on one connection, subsequent sessions may connect to different in-memory databases where the tables don't exist. This will cause sqlalchemy.exc.OperationalError: no such table errors. Fix by using StaticPool with poolclass=StaticPool and connect_args={'check_same_thread': False} to ensure all connections share the same in-memory database. [ Already posted ]
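The root cause is reproducible with the standard library alone: two connections to `:memory:` are two different databases.

```python
import sqlite3

# Each new connection to ':memory:' gets its own private database.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")
first.execute("CREATE TABLE files (id INTEGER PRIMARY KEY)")
try:
    second.execute("SELECT count(*) FROM files")
    shared = True
except sqlite3.OperationalError:  # "no such table: files"
    shared = False
```

The suggested fix — `poolclass=StaticPool` plus `connect_args={'check_same_thread': False}` on the engine — makes SQLAlchemy hand every checkout the same underlying connection, so all sessions see the tables created by `Base.metadata.create_all`.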
storage/tests/unit/test_token_auth.py — 0 comments posted, 3 evaluated, 3 filtered
  • line 24: The mock for PneumaticToken.encrypt is not awaitable. Since encrypt is an async method and data does await cls.encrypt(token), patching it with return_value=encrypted_token creates a non-awaitable mock. When await is called on the synchronous mock, it will fail. The mock should use AsyncMock or be configured with new_callable=AsyncMock to be properly awaitable. [ Cross-file consolidated ]
  • line 53: Same issue as the first test: the encrypt mock at line 53-56 is not awaitable but PneumaticToken.data awaits the result of cls.encrypt(token). [ Already posted ]
  • line 82: Same issue as the other tests: the encrypt mock at line 82-85 is not awaitable but PneumaticToken.data awaits the result of cls.encrypt(token). [ Already posted ]

Note

High Risk
Replaces core attachment/comment/file-field handling and introduces object-level permission tracking plus new runtime services (file-service/SeaweedFS), which could impact data integrity and access control if migration paths or refresh logic misbehave.

Overview
This PR migrates backend file handling from the legacy FileAttachment/Google Cloud flow to a dedicated file microservice plus a new src.storage Attachment model, using Markdown file links (and extracted file_ids) to create/sync/delete attachment records and maintain restricted access.

Workflows, tasks, templates, comments, and file-type task fields are updated to refresh/sync attachments on create/update/delete, reassign restricted permissions when task performers change, and tighten comment validation (text now required; attachments no longer satisfy “empty comment”).

Admin/user/contact/integration image uploads now go through FileServiceClient with Sentry logging on failures; public template token auth now caches template data and explicitly invalidates cached tokens on template save; django-guardian is introduced with a custom GroupObjectPermission model and test harness tweaks; local docker-compose is expanded to run backend, file-service, its Postgres, and SeaweedFS, and GCP storage deps are updated (notably google-cloud-storage/google-auth).

Written by Cursor Bugbot for commit af43724. This will update automatically on new commits.

…ion API

- Add FileAttachment.access_type field with account/restricted options
- Add FileAttachment.file_id field for unique file identification
- Create FileAttachmentPermission model for user-specific file access
- Implement AttachmentService.check_user_permission method
- Add AttachmentsViewSet with check-permission endpoint
- Add Redis caching for public template authentication
- Add response_forbidden method to BaseResponseMixin
- Include comprehensive test coverage for permission checking

This enables fine-grained file access control with two access types:
- account: accessible by all users in the same account
- restricted: accessible only by users with explicit permissions
…integration

- Add FastAPI-based file upload and download endpoints with streaming support
- Implement Clean Architecture with domain entities, use cases, and repositories
- Add authentication middleware with JWT token validation and Redis caching
- Integrate Google Cloud Storage S3-compatible API for file storage
- Add comprehensive error handling with custom exceptions and HTTP status codes
- Implement file access permissions validation through external HTTP service
- Add database models and Alembic migrations for file metadata storage
- Include Docker containerization with docker-compose for local development
- Add comprehensive test suite with unit, integration, and e2e tests
- Configure pre-commit hooks with ruff, mypy, and pytest for code quality
…e_access_rights' into 41526__сreate_file_service

# Conflicts:
#	backend/src/processes/enums.py
#	backend/src/processes/models/__init__.py
#	backend/src/processes/models/workflows/attachment.py
#	backend/src/processes/services/attachments.py
#	backend/src/processes/tests/test_services/test_attachments.py
@EBirkenfeld EBirkenfeld self-assigned this Oct 24, 2025
@EBirkenfeld EBirkenfeld added the Backend API changes request label Oct 24, 2025
….toml to dedicated config files

- Move mypy configuration from pyproject.toml to mypy.ini for better separation of concerns
- Simplify ruff.toml configuration by removing extensive rule selections and using "ALL" selector
- Update ruff target version from py37 to py311 to match project Python version
- Remove redundant ruff configuration from pyproject.toml to avoid duplication
- Apply code formatting fixes across entire codebase
- Standardize import statements and code style according to new linting rules
- Update test files to comply with new formatting standards
…ore rule in ruff configuration

- Update docstrings across various modules to ensure consistency and clarity.
- Remove unused "D" rule from ruff.toml configuration.
- Enhance readability and maintainability of the codebase.
…ling for consistency

- Adjust import paths in test files to ensure they reference the correct locations.
- Replace instances of FileNotFoundError with DomainFileNotFoundError for better clarity in exception handling.
- Streamline fixture definitions and improve code readability across various test modules.
… configuration

- Update docstrings across various test files for consistency and clarity.
- Add new linting rules in ruff.toml for improved code quality.
- Enhance readability and maintainability of the codebase by refining fixture definitions and mock implementations.
…line permission handling

- Refactor the AuthenticationMiddleware to enhance error handling and response formatting.
- Update permission classes to use direct Request type hints instead of string annotations.
- Consolidate permission checks into FastAPI dependency wrappers for better clarity and usability.
- Remove unused exception classes and error messages to clean up the codebase.
- Adjust test cases to reflect changes in authentication and permission handling.
…exception tests

- Remove unused infrastructure error codes from error_codes.py to streamline the codebase.
- Update the AuthenticationMiddleware constructor to use direct FastAPI type hints for clarity.
- Add new tests for validation exceptions, including file size and storage errors, to improve coverage and ensure accurate error handling.
const requestBaseUrlsMap: { [key in TRequestType]: string } = {
  public: envBackendURL || publicUrl,
  local: '',
  fileService: fileServiceUrl,

🟡 Medium

api/commonRequest.ts:106 If fileServiceUrl is missing from config, mergePaths will crash accessing .length on undefined. Consider adding a fallback (e.g., fileServiceUrl || '') or validating before use.

Suggested change
-fileService: fileServiceUrl,
+fileService: fileServiceUrl || '',


Evidence trail:
Viewed frontend/src/public/api/commonRequest.ts:98-114 at 93557ff. Viewed frontend/src/public/utils/urls.ts:1-12 at 93557ff.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment on lines +89 to +94
const compressOptions = {
  useWebWorker: true,
  maxWidthOrHeight: MAX_THUMBNAIL_WIDTH_OR_HEIGHT,
};

return imageCompression(file, compressOptions);
@macroscopeapp macroscopeapp bot Feb 18, 2026

utils/uploadFilesNew.ts:89

If imageCompression throws (e.g., corrupted image), the error propagates and bypasses the thumbnail || file fallback in uploadUserAvatar. Consider wrapping the call in try-catch and returning null on failure.

No longer relevant as of b9f359a

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


const formData = new FormData();
formData.append('file', file, filename);

const data = await commonRequest<IFileServiceUploadResponse>(

🟠 High

api/fileServiceUpload.ts:27 If fileServiceUrl is missing from the runtime config, requests will go to undefined/.... Consider adding a guard to validate that fileServiceUrl is defined before making the request, or document that this config property is required.


Evidence trail:
frontend/src/public/api/fileServiceUpload.ts (lines 26-35): `commonRequest` called with `fileServiceUpload` URL and type `'fileService'`
frontend/src/public/api/commonRequest.ts (lines 106-117): `fileServiceUrl` extracted from config, mapped as base URL for 'fileService' type, then used in `mergePaths(requestBaseUrlsMap[type], url)`
frontend/src/public/utils/getConfig.ts (lines 71-74): `getBrowserConfigEnv()` returns `{}` if config missing, causing `fileServiceUrl` to be undefined at runtime

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.


file,
filename,
}: IFileServiceUploadRequest): Promise<IFileServiceUploadResponse> {
const {

🟢 Low api/fileServiceUpload.ts:18

Deep destructuring api.urls.fileServiceUpload will throw if getBrowserConfigEnv() returns {}. Consider adding a guard or using optional chaining before destructuring.


Evidence trail:
1. frontend/src/public/api/fileServiceUpload.ts lines 18-21 (REVIEWED_COMMIT): Deep destructuring `const { api: { urls: { fileServiceUpload } } } = getBrowserConfigEnv();`
2. frontend/src/public/utils/getConfig.ts lines 71-74 (REVIEWED_COMMIT): `getBrowserConfigEnv()` returns `(browserConfig && browserConfig.config) || {}` which can be an empty object `{}`
3. When `{}` is returned, `{}.api` is `undefined`, and accessing `.urls` on `undefined` throws `TypeError: Cannot read property 'urls' of undefined`

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


… redundant FileServiceConnectionException imports
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


…essary calls and add tests for logo update scenarios
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.


    api_name=self._predicate.field,
)
-self.field_value = field.attachments.exists() or None
+self.field_value = field.storage_attachments.exists() or None

Condition resolver breaks for pre-existing file field data

High Severity

The FileResolver changed from field.attachments.exists() to field.storage_attachments.exists(), and TaskFieldQuerySet.with_attachments similarly changed. The old relation points to FileAttachment records, while the new one points to the new Attachment model from src.storage.models. For any existing workflow where file fields were populated before this change, FileAttachment records exist but corresponding Attachment records may not. This causes file-based conditions to evaluate as if the field is empty, potentially routing workflows down incorrect branches.

Additional Locations (1)


def save(self, *args, **kwargs):
    super().save(*args, **kwargs)
    from src.authentication.services.public_auth import PublicAuthService
    PublicAuthService.invalidate_template_public_tokens(self)

Template save triggers non-transactional cache side effect

Medium Severity

The Template.save() override calls PublicAuthService.invalidate_template_public_tokens() on every save. This performs Redis cache deletion, which is not rolled back if the surrounding database transaction fails. If a template deactivation (is_active=False) is saved within a transaction that subsequently rolls back, the cache entry is still deleted — the template remains active in the DB but public/embed token validation by the file service fails until the cache is repopulated on the next authenticate call.
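Django's `transaction.on_commit()` hook addresses exactly this class of bug. A minimal stand-in (the `Transaction` class below mimics the hook; it is not Django code) shows why deferring the cache deletion keeps it consistent with the database outcome:

```python
# Stand-in for Django's transaction.on_commit(): side effects registered
# during the transaction run only if it actually commits.
class Transaction:
    def __init__(self) -> None:
        self._callbacks = []

    def on_commit(self, func) -> None:
        self._callbacks.append(func)

    def commit(self) -> None:
        for func in self._callbacks:
            func()
        self._callbacks.clear()

    def rollback(self) -> None:
        self._callbacks.clear()

invalidated = []

tx = Transaction()
tx.on_commit(lambda: invalidated.append("template-public-token"))
tx.rollback()                  # DB change rolled back -> cache untouched
after_rollback = list(invalidated)

tx.on_commit(lambda: invalidated.append("template-public-token"))
tx.commit()                    # DB change persisted -> cache invalidated
after_commit = list(invalidated)
```

In the real `Template.save()` this would look like `transaction.on_commit(lambda: PublicAuthService.invalidate_template_public_tokens(self))`, so the Redis deletion only happens once the row is durably saved.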


},
level=SentryLogLevel.ERROR,
)
return super().save(commit=commit)

Admin photo uploads silently fail without user feedback

Medium Severity

In UserAdminChangeForm.save(), ContactAdminForm.save(), and IntegrationCreateForm.save(), when FileServiceException is thrown during photo/logo upload, the error is logged to Sentry but the form saves successfully without the uploaded file. The admin user receives no error indication — the save appears successful while the photo/logo is quietly discarded. Using self.add_error() would surface the failure to the admin user.

Additional Locations (2)
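A sketch of the suggested `add_error()` pattern, using stand-in classes so it runs standalone (`PhotoForm.add_error` mirrors Django's form API; the real fix would live inside the Django `ModelForm.save()` overrides named above):

```python
class FileServiceException(Exception):
    pass

class PhotoForm:
    # Stand-in for a Django ModelForm: add_error() mirrors Django's API.
    def __init__(self) -> None:
        self.errors = {}

    def add_error(self, field, message) -> None:
        self.errors.setdefault(field or "__all__", []).append(message)

    def upload_photo(self) -> None:
        raise FileServiceException("file service unavailable")

    def save(self) -> "PhotoForm":
        try:
            self.upload_photo()
        except FileServiceException as exc:
            # Keep the Sentry logging as before, but also surface the
            # failure to the admin user instead of saving silently.
            self.add_error(None, f"Photo upload failed: {exc}")
        return self

form = PhotoForm().save()
```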


if task is not None and not (task.is_active or task.is_delayed):
    raise CommentedTaskNotActive
-if not text and not attachments:
+if not text:

Comment update omits with_attachments from update kwargs

Low Severity

In CommentService.update(), the with_attachments field was removed from the update kwargs, relying solely on refresh_attachments to set it. However, refresh_attachments is called after partial_update, so if partial_update triggers any side effects (like websocket notifications via serialization) before refresh_attachments runs, consumers could see a stale with_attachments value. The create path explicitly initializes it to False before refresh_attachments, but update leaves the old value intact during the gap.


# Conflicts:
#	backend/docker-compose.yml
#	backend/src/accounts/services/user.py
#	backend/src/accounts/tests/views/test_user/test_update.py
#	backend/src/accounts/views/user.py
#	backend/src/processes/enums.py
#	backend/src/processes/models/workflows/attachment.py
#	backend/src/processes/tests/fixtures.py
#	backend/src/processes/tests/test_services/test_attachments.py
#	backend/src/processes/tests/test_services/test_workflows/test_workflow_version_service.py
#	docker-compose.src.yml
#	docker-compose.yml
#	frontend/docker-compose.yml
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


STORAGE_PROVIDER: ${STORAGE_PROVIDER:-}
GOOGLE_APPLICATION_CREDENTIALS: /pneumatic_backend/google_api_credentials.json
SENTRY_DSN: ${SENTRY_DSN:-}
SENTRY_DSN: ${SENTRY_DSN:-}}

Extra closing brace breaks YAML environment variable

High Severity

SENTRY_DSN: ${SENTRY_DSN:-}} has a double closing brace }}. When the three lines above (STORAGE, STORAGE_PROVIDER, GOOGLE_APPLICATION_CREDENTIALS) were removed, the remaining SENTRY_DSN line picked up a stray }. This will either break YAML parsing of the docker-compose file or cause the environment variable value to contain a literal }, preventing Sentry from initializing correctly.


if user_groups is not None:
self.instance.user_groups.set(user_groups)
self._update_related_user_fields(old_name=old_name)
self._update_related_user_fields(old_name=old_name)

Method moved outside transaction breaks atomicity guarantee

Medium Severity

self._update_related_user_fields(old_name=old_name) was moved from inside the with transaction.atomic(): block to outside it. Previously this call ran within the same transaction as the user update and group assignment. Now, if the transaction rolls back (e.g., due to a DB constraint violation on user_groups.set), the related TaskField records would still be updated with the new name, causing data inconsistency between the user and their associated task fields.


).delete()

# Assign permissions to template owners (exclude soft-deleted)
template_owners = template.owners.filter(is_deleted=False)

🟡 Medium services/attachments.py:212

In _assign_template_permissions, the query for template owners at line 212 does not filter by type=OwnerType.USER, unlike the equivalent queries in _assign_task_permissions and _assign_workflow_permissions. If a group-type owner has its user field populated (e.g., data inconsistency or future schema change), that user incorrectly receives attachment permissions while the group itself might not. Consider adding type=OwnerType.USER to the filter at line 212 to match the other methods.

-        template_owners = template.owners.filter(is_deleted=False)
+        template_owners = template.owners.filter(
+            is_deleted=False,
+            type=OwnerType.USER,
+            user__isnull=False,
+        )

Evidence trail:
backend/src/storage/services/attachments.py lines 212-218 (_assign_template_permissions): `template_owners = template.owners.filter(is_deleted=False)` followed by `if owner.user:` check

backend/src/storage/services/attachments.py lines 133-141 (_assign_task_permissions): `TemplateOwner.objects.filter(template_id=workflow.template_id, type=OwnerType.USER, user__isnull=False, is_deleted=False)`

backend/src/storage/services/attachments.py lines 290-298 (_assign_workflow_permissions): `TemplateOwner.objects.filter(template_id=workflow.template_id, type=OwnerType.USER, user__isnull=False, is_deleted=False)`

    else:
        stats['skipped'] += 1

except (ValueError, AttributeError, TypeError) as e:

🟡 Medium services/file_sync.py:76

attachment.save(update_fields=['file_id']) at line 65 can raise IntegrityError when the generated file_id collides with an existing record, but the exception handler at line 76 only catches ValueError, AttributeError, and TypeError. This causes the entire sync_all_files() operation to crash instead of logging the error and continuing to the next attachment.

-            except (ValueError, AttributeError, TypeError) as e:
+            except (ValueError, AttributeError, TypeError, IntegrityError) as e:

Evidence trail:
backend/src/storage/services/file_sync.py lines 65 (save call) and 76 (exception handler catching only ValueError, AttributeError, TypeError); backend/src/processes/models/workflows/attachment.py lines 31-35 showing `file_id = models.CharField(max_length=64, unique=True, null=True, blank=True)` confirming the unique constraint that can cause IntegrityError on collision.


🟢 Low

Issue in backend/src/processes/tests/test_services/test_comment_service.py:1672:

The test test_update__find_attachments_in_text__ok is parameterized with markdown patterns containing attachment IDs, but the assertion no longer verifies that any IDs are extracted from the text. The old code checked update_attachments_mock.assert_called_once_with(attachment_ids), but the new assertion only verifies refresh_attachments was called with source and user — making the parameterized data unused and the test name misleading. Consider either restoring verification that IDs are extracted from text, or rename the test and remove the parameterized data if extraction is no longer the responsibility of this method.

Also found in 3 other location(s)

backend/src/storage/tests/test_services/test_attachments_permissions.py:436

This test does not actually verify that template owner permissions work correctly. The owner user is already set as a workflow owner via create_test_workflow(user=owner) (which adds the user to workflow.owners and workflow.members). When _assign_workflow_permissions runs, the owner gets access through workflow_owners regardless of whether the template owner logic works. To properly test template owner permissions, use a different user as the template owner who is NOT the workflow creator.

backend/src/storage/tests/test_services/test_file_sync.py:353

Test test_determine_source_type__workflow_via_event__ok does not test what its name implies. The test sets workflow=workflow on the attachment, so _determine_source_type returns SourceType.WORKFLOW immediately at the first if attachment.workflow: check, before the event is ever evaluated. The event_mock setup is completely dead code and the test passes for the wrong reason - it's just testing that a workflow-attached attachment returns WORKFLOW, not anything 'via event'.

backend/src/storage/tests/test_services/test_attachments.py:180

The test `test_bulk_create__duplicate_file_ids__ignore_conflicts` does not verify its claimed behavior. The assertion `Attachment.objects.filter(task=task).count() >= 1` will always pass because the pre-existing `'existing_file'` attachment was already created on lines 163-168. The test should verify that: (1) `'new_file'` was successfully created, and (2) exactly one attachment exists for `'existing_file'` (no duplicate). A proper assertion would be something like `assert Attachment.objects.filter(file_id='new_file').exists()` and `assert Attachment.objects.filter(task=task).count() == 2`.
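The difference between the vacuous `>= 1` check and the suggested exact assertions can be sketched with a plain-Python stand-in for `bulk_create(ignore_conflicts=True)`; the dict-based model and names are illustrative only, not the real ORM:

```python
# Plain-Python stand-in for bulk_create(ignore_conflicts=True) on a
# unique file_id column; names mirror the test, not the real model.
def bulk_create_ignore_conflicts(existing: dict, file_ids: list[str]) -> None:
    for file_id in file_ids:
        existing.setdefault(file_id, {"file_id": file_id})  # skip duplicates

attachments = {"existing_file": {"file_id": "existing_file"}}  # pre-created row
bulk_create_ignore_conflicts(attachments, ["existing_file", "new_file"])

# The vacuous assertion passes even if nothing was created:
assert len(attachments) >= 1
# The suggested assertions actually pin down the behavior:
assert "new_file" in attachments            # new row was created
assert len(attachments) == 2                # exactly one 'existing_file'
```

The exact-count assertion is what distinguishes "conflict was ignored" from "nothing happened at all".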

Evidence trail:
git_diff base=MERGE_BASE head=REVIEWED_COMMIT path=backend/src/processes/tests/test_services/test_comment_service.py shows:
1. Lines ~1675-1700: Parameterized data had attachment_ids like `[3349]`, `[4187]` removed
2. Line ~1710 (old): `media, attachment_ids = data` changed to just using `data`
3. Line ~1777 (old): `update_attachments_mock.assert_called_once_with(attachment_ids)` changed to `refresh_attachments_mock.assert_called_once_with(source=event, user=account_owner)`

View backend/src/processes/tests/test_services/test_comment_service.py:1672-1800 confirms current state: test name is `test_update__find_attachments_in_text__ok`, parameterized data contains markdown with attachment IDs, but assertion only checks `refresh_attachments_mock.assert_called_once_with(source=event, user=account_owner)` - no verification of extracted IDs.

