Skip to content

feat: Add monitoring and alerting on critical webhook failures (#23)#28

Merged
Queenode merged 3 commits into
Kolo-Org:mainfrom
thebabalola:fix/webhook-observability-alerting
Jun 24, 2026
Merged

feat: Add monitoring and alerting on critical webhook failures (#23)#28
Queenode merged 3 commits into
Kolo-Org:mainfrom
thebabalola:fix/webhook-observability-alerting

Conversation

@thebabalola

@thebabalola thebabalola commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

Description

This pull request addresses issue #23 by introducing robust monitoring and alerting for critical webhook failures.

Changes Made

  • ObservabilityService: Created a dedicated ObservabilityService that provides structured logging (info/warnings) and critical alerts ([CRITICAL ALERT]) for failures requiring immediate attention.
  • BotController Updates: Integrated ObservabilityService into BotController to properly log webhook message enqueue failures.
  • Error Retries: Modified the controller to return a 500 status code on critical message enqueue failures, which ensures that WhatsApp's natural retry mechanism is triggered.
  • Testing Enhancements: Extensively updated unit tests across BotController, MessageProcessor, WebhookIntegration, and StellarService to accommodate the new error-handling behavior. All 139 tests are now passing successfully.

Closes #23

…Org#23)

Closes Kolo-Org#23

- Introduce ObservabilityService for critical failures
- Update BotController to leverage observability for enqueue failures
- Return 500 status on critical message enqueue failure to trigger WhatsApp retries
- Test integration for observability flows
@Queenode

Copy link
Copy Markdown
Contributor

Thanks for the contribution @thebabalola, The observability improvements, structured logging, global error handlers, and additional tests are all valuable additions.

I have one major concern before approving this PR.

Webhook Retry Behavior

The controller behavior has changed from returning 200 on enqueue failures to returning 500:

res.sendStatus(500);

The PR description states that this is intended to trigger WhatsApp retries for failed webhook processing.

While I understand the reasoning, I do not see any evidence that webhook processing is idempotent.

For a financial application, retries can introduce duplicate processing if the same webhook is delivered more than once.

Could you please clarify:

  • How duplicate webhook deliveries are handled?
  • Is there any deduplication mechanism based on WhatsApp message IDs?
  • Can the same message be queued multiple times?
  • Are downstream transaction operations protected against duplicate execution?

Before merging, I would like confirmation that enabling retries cannot result in duplicate processing of financial actions.

Additionally, please provide test coverage (or documentation) demonstrating the expected behavior when WhatsApp retries the same webhook multiple times.

Once this concern is addressed, I'm happy to take another look.

@thebabalola

thebabalola commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

@Queenode . addressed your feedback — updated the bot controller to extract the native whatsapp message id (wamid) directly from the webhook payload and pass it to bullmq as the jobId. since bullmq natively ignores duplicate job ids, this guarantees we won't process the same message twice even if whatsapp retries the webhook delivery. also updated the tests to cover the new payload structure and idempotency checks. let me know if there's anything else you need before merging.

@Queenode

Copy link
Copy Markdown
Contributor

@Queenode . addressed your feedback — updated the bot controller to extract the native whatsapp message id (wamid) directly from the webhook payload and pass it to bullmq as the jobId. since bullmq natively ignores duplicate job ids, this guarantees we won't process the same message twice even if whatsapp retries the webhook delivery. also updated the tests to cover the new payload structure and idempotency checks. let me know if there's anything else you need before merging.

Thank you for your contribution.
You did really great.

@Queenode Queenode merged commit c20f4c4 into Kolo-Org:main Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No Monitoring or Alerting on Critical Webhook Failures

2 participants