Skip to content

Implement notification pattern for sync trigger operations#11690

Open
kamperiadis wants to merge 3 commits intodevfrom
developer/kamperiadis/notifytriggersupdate
Open

Implement notification pattern for sync trigger operations#11690
kamperiadis wants to merge 3 commits intodevfrom
developer/kamperiadis/notifytriggersupdate

Conversation

@kamperiadis
Copy link
Copy Markdown
Contributor

@kamperiadis kamperiadis commented Apr 3, 2026

Issue describing the changes in this PR

Implement a new notification pattern for Flex Consumption sync trigger operations

Pull request checklist

IMPORTANT: Currently, changes must be backported to the in-proc branch to be included in Core Tools and non-Flex deployments.

  • Backporting to the in-proc branch is not required
    • Otherwise: Link to backporting PR
  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • My changes do not require diagnostic events changes
    • Otherwise: I have added/updated all related diagnostic events and their documentation (Documentation issue linked to PR)
  • I have added all required tests (Unit tests, E2E tests)

@kamperiadis kamperiadis force-pushed the developer/kamperiadis/notifytriggersupdate branch 2 times, most recently from fc05dc0 to 7886b27 Compare April 3, 2026 21:27
@kamperiadis kamperiadis force-pushed the developer/kamperiadis/notifytriggersupdate branch from 7886b27 to c0a5fc4 Compare April 7, 2026 16:36
@kamperiadis kamperiadis changed the title Notify mesh server of triggers update Implement notification pattern for sync trigger operations Apr 7, 2026
public const string ContainerName = "CONTAINER_NAME";
public const string WebsitePodName = "WEBSITE_POD_NAME";
public const string LegionServiceHost = "LEGION_SERVICE_HOST";
public const string TriggersSyncSignalEnabled = "TRIGGERS_SYNC_SIGNAL_ENABLED";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see mention of this env setting in any design docs. When will this mode be enabled - is it tied to FDM or ATOL? Depending on your answer, I'm wondering if instead of per feature settings like this you don't need just a single var that indicates we're in that mode - that can then be used by other features you need to make conditional.

Copy link
Copy Markdown
Contributor Author

@kamperiadis kamperiadis Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not tied to FDM since this doesn't impact the deployment flow but rather the non-deployment sync trigger operations (e.g. key rotation, customer sync triggers call). This is more ATOL-related and we already have env settings for ATOL detection, but the intention for this env setting is to ensure that the platform has the implementation to support this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the behavior is tied to ATOL then - I.e. on stamps with ATOL enabled, you'll be injecting this into the container context always? Then my comment still applies - shouldn't we have a single env var that indicates we're running in the ATOL mode. What ATOL detection do we have already in host? I couldn't find any.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ATOL detection, we need to check IsFlexConsumptionSku() && IsConsumptionOnLegion() but checking if we are in ATOL mode wouldn't really help here since I am concerned that we'll release the host with a version of Legion that might not yet have the backing changes (e.g. delays in reviews for Legion PR). And then we are left in a state where the host makes the notification assuming that the platform is handling the triggers when in reality that is not the case.

return result;
}

if (_environment.IsTriggersSyncSignalEnabled())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this enable/disable check to mesh client.

{
try
{
await _meshServiceClient.NotifyTriggersChanged();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a better decoupling between sync manager and specific environment behaviors. Can we use a general event callback and have mesh client listen to that event?

Or some other decoupling instead of having the manager know to call mesh client in these circumstances.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was contemplating a similar comment, but hadn't made it yet because for better or worse, IMeshServiceClient is being used by other services in a similar way. I was thinking what event approaches might look like. Another option would be a custom IFunctionsSyncManager implementation that gets used in this mode, but that'd require a class hierarchy to be added such that the new impl. inherits behavior and only overrides a protected SetTriggersAsync for example to send its notification.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally favor the custom IFunctionsSyncManager implementation over the event callback approach just because in this case, there would be a single subscriber listening for this event and if there were an error we might want to return that to the caller.

To build on what you mentioned @mathewc, wouldn't we want to override all of TrySyncTriggersAsync? If we only override SetTriggersAsync, we would still be generating the triggers payload which would be essentially thrown away.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just refactored to create a new SignalBasedFunctionsSyncManager.cs

Comment thread src/WebJobs.Script.WebHost/Management/MeshServiceClient.cs Outdated
Comment thread src/WebJobs.Script/Environment/EnvironmentSettingNames.cs Outdated
@kamperiadis kamperiadis force-pushed the developer/kamperiadis/notifytriggersupdate branch 2 times, most recently from cb8c16e to 699e30f Compare April 8, 2026 19:00
@kamperiadis kamperiadis marked this pull request as ready for review April 8, 2026 19:21
@kamperiadis kamperiadis requested a review from a team as a code owner April 8, 2026 19:21
Copilot AI review requested due to automatic review settings April 8, 2026 19:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements a new “signal/notification” flow for SyncTriggers in Flex Consumption by adding a SignalBasedFunctionsSyncManager that notifies the mesh/appserver instead of performing the legacy trigger-sync payload flow, behind an environment flag.

Changes:

  • Add SignalBasedFunctionsSyncManager and wire it into DI when FUNCTIONS_TRIGGERS_SYNC_SIGNAL_ENABLED is enabled.
  • Extend IMeshServiceClient with NotifyTriggersChanged() and implement it in MeshServiceClient/NullMeshServiceClient.
  • Add/adjust integration tests and update release notes.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/WebJobs.Script.Tests.Integration/Management/SignalBasedFunctionsSyncManagerTests.cs New tests validating the notification-based sync manager behavior
test/WebJobs.Script.Tests.Integration/Management/MeshServiceClientTests.cs Adds a test asserting the mesh request for trigger-change notification
test/WebJobs.Script.Tests.Integration/Management/FunctionsSyncManagerTests.cs Updates strict environment mock to include the new env var
src/WebJobs.Script/Environment/EnvironmentSettingNames.cs Adds FUNCTIONS_TRIGGERS_SYNC_SIGNAL_ENABLED constant
src/WebJobs.Script/Environment/EnvironmentExtensions.cs Adds IsTriggersSyncSignalEnabled() helper
src/WebJobs.Script.WebHost/WebHostServiceCollectionExtensions.cs Switches IFunctionsSyncManager registration based on the new flag
src/WebJobs.Script.WebHost/Management/SignalBasedFunctionsSyncManager.cs New manager that sends mesh notification for trigger updates
src/WebJobs.Script.WebHost/Management/NullMeshServiceClient.cs Adds no-op NotifyTriggersChanged() implementation
src/WebJobs.Script.WebHost/Management/MeshServiceClient.cs Implements NotifyTriggersChanged() by posting update-triggers-timestamp
src/WebJobs.Script.WebHost/Management/IMeshServiceClient.cs Adds NotifyTriggersChanged() to the interface
src/WebJobs.Script.WebHost/Management/FunctionsSyncManager.cs Makes TrySyncTriggersAsync virtual to allow overriding
release_notes.md Adds an entry describing the new notification behavior

Comment thread src/WebJobs.Script.WebHost/Management/SignalBasedFunctionsSyncManager.cs Outdated
Comment thread src/WebJobs.Script.WebHost/WebHostServiceCollectionExtensions.cs
@kamperiadis kamperiadis force-pushed the developer/kamperiadis/notifytriggersupdate branch from f2a78dd to 2c5804d Compare April 8, 2026 20:59
@kamperiadis kamperiadis force-pushed the developer/kamperiadis/notifytriggersupdate branch 4 times, most recently from 9a3f87b to b0e7aa1 Compare April 13, 2026 20:01
@kamperiadis
Copy link
Copy Markdown
Contributor Author

@jviau @mathewc Thank you so much for all your feedback. I refactored the changes based on our last discussion:

  1. We want to also have background sync triggers be handled via this notification pattern
  2. We do not want to attempt to fetch the triggers payload and update the db if there have been no updates. For this reason, we might want to send the hash so that in the platform we can determine if there were no updates
  3. We want to reduce blob dependency if possible

Can you please take another look? Thank you so much.

_environment = environment;
}

public override async Task<TriggersOperationResult> TrySyncTriggersAsync(bool isBackgroundSync = false)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right method to override? There's some amount of duplication happening here and maintenance moving forward will also have to account for this derived type.

If the difference between the two implementations is purely the way the platform is notified of the change, can we be more granular about what we override (e.g., either override SetTriggersAsync or introduce a method for this purpose)?


try
{
var payload = await GetSyncTriggersPayload();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem with this implementation (part of that would be addressed with the suggestion I made above). GetSyncTriggersPayload is guarded in the base type, but it's called without any synchronization here. The call to that method operates on shared mutable state, and removing that guard will lead to problems. One clear example is the call chain into GetSyncTriggersPayload()PrepareSyncTriggers()
_secretManagerProvider.Current.ClearCache().

{
var payload = await GetSyncTriggersPayload();
string contentHash = ComputeContentHash(payload.Content);
await _meshServiceClient.NotifyTriggersChanged(contentHash);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we delegating change validation to the platform? The NotifyTriggersChanged method name imply a change, but the current implementation will be calling this for every instance and relying on the platform to determine whether there has been a change. Where is the platform persisting the hash for comparison?

I realize you wanted to avoid blob dependencies, but I'm concerned this is leading to additional requirements already handled by the base implementation. If you were to preserve the current behavior and simply change the notification mechanism, that would address problems like the unnecessary notifications coming out of the host.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your feedback @fabiocav

I have updated the class so that we only override the SetTriggersAsync method instead of overriding TrySyncTriggersAsync

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be wasteful that we are generating the triggers payload when we are not really using it for anything other than just sending the notification.

Question - in the design meeting, it was mentioned that we might want to send the hash to Legion for dedupe purposes. But wouldn't it be considered expensive to generate the triggers payload and calculate the hash to send it over in the notification?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - in the design meeting, it was mentioned that we might want to send the hash to Legion for dedupe purposes. But wouldn't it be considered expensive to generate the triggers payload and calculate the hash to send it over in the notification?

It's unnecessary work, and it leads to other inefficiencies from the platform side as well. I don't think we should be sending this hash, given the notification semantics.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kamperiadis I had a chat with @fabiocav and he makes a good point to leave this for later with this recent change. We are already doing all the blob storage checks before we get into your new set triggers handler.

Hopefully it won't be hard to change the notification payload at a later date if we want to remove storage dependency in the future.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how we may need more significant changes to this signal in the future for multi-revision, and we don't have a path to removing the storage dependency just yet, my original motivation for adding this is no longer applicable. So I agree we can remove it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jviau and @fabiocav. There is one edge case that I thought of that would be caught by sending the hash. Let's say the customer took an action where we rely on the background sync triggers to catch the update in the triggers payload.

  1. Host startup triggered the background sync triggers call
  2. The host notified the platform that there is an update to the triggers payload by sending the notification and then we updated the blob with the new hash
  3. The platform received the notification, but by the time it attempted to call GET admin/host/triggers, the worker was deleted
  4. Because the hash was already updated in the blob, in the next worker startup, we will not reattempt to notify the platform to fetch the triggers. We will only reattempt to get the triggers payload the next time we go through this flow for a non-background-sync-triggers scenario (e.g. key rotation, customer-facing sync triggers call)

Sending the hash in the notification would mitigate this in that the platform would compare this hash with what is in the db and reattempt to get the triggers payload.

But at the same time, my understanding is that we consider the background sync triggers as a best-effort attempt to have the triggers payload updated in the db (please correct me if I'm wrong) so in that case, we might not want to always send the hash just to mitigate the scenario I described above, especially since the probability of it happening is very small. What are your thoughts on this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I was giving it more thought, we only do the hash check when isBackgroundSync == true, so for key rotation, customer-facing sync call, etc., for the SignalBasedFunctionsSyncManager class, it is wasteful to be generating the triggers payload when we are relying on the platform to fetch it later and this payload is just thrown away. So wouldn't it be better if I did the following in SignalBasedFunctionsSyncManager:


public override async Task<TriggersOperationResult> TrySyncTriggersAsync(bool isBackgroundSync = false)
        {
            if (isBackgroundSync)
            {
                return await base.TrySyncTriggersAsync(isBackgroundSync);
            }

            var result = new TriggersOperationResult
            {
                Success = true
            };

            if (!IsSyncTriggersEnvironment(_webHostEnvironment, _environment))
            {
                result.Success = false;
                result.Error = "Invalid environment for SyncTriggers operation.";
                _logger.LogWarning(result.Error);

                return result;
            }

            try
            {
                await _meshServiceClient.NotifyTriggersChanged();
            }
            catch (Exception ex)
            {
                string message = "Failed to notify triggers changed.";
                _logger.LogError(ex, message);
                result.Success = false;
                result.Error = message;
            }

            return result;
        }

protected override async Task<(bool Success, string ErrorMessage)> SetTriggersAsync(string content)
        {
            try
            {
                await _meshServiceClient.NotifyTriggersChanged();

                return (true, null);
            }
            catch (Exception ex)
            {
                string message = "Failed to notify triggers changed.";
                _logger.LogError(ex, message);

                return (false, message);
            }
        }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, for the background sync triggers flow, we just follow the existing logic in FunctionsSyncManager (i.e. generating triggers payload, comparing the hash with what is in blob storage, etc.) except that for SetTriggersAsync, we will be doing the notification to the platform.

And for non-background sync triggers flows, we will just be doing the notification to the platform.

What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is wasteful to be generating the triggers payload when we are relying on the platform to fetch it later and this payload is just thrown away

I wouldn't index too much on this. The overall cost of this should be negligible in the grand scale of an instance's lifetime. Plus, it is on a background thread, so not impacting cold start.

If we ignore that optimization as a goal, does it affect your approach here?

@kamperiadis kamperiadis force-pushed the developer/kamperiadis/notifytriggersupdate branch from 0dcef3b to c2ad17e Compare April 20, 2026 16:11
using System.Threading.Tasks;
using Xunit;

namespace Microsoft.Azure.WebJobs.Script.Tests.Managment
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error Management

public const string ContainerName = "CONTAINER_NAME";
public const string WebsitePodName = "WEBSITE_POD_NAME";
public const string LegionServiceHost = "LEGION_SERVICE_HOST";
public const string FunctionsTriggersSyncSignalEnabled = "FUNCTIONS_TRIGGERS_SYNC_SIGNAL_ENABLED";
Copy link
Copy Markdown
Member

@mathewc mathewc Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of a setting named FUNCTIONS_SYNC_TRIGGERS_MODE. The default value would be "FrontEnd" when not specified, and for ATOL you'd use "Platform". We'd map this to a TriggerSyncMode enum and use that for parsing.


// Management services
services.AddSingleton<IFunctionsSyncManager, FunctionsSyncManager>();
if (SystemEnvironment.Instance.IsTriggersSyncSignalEnabled())
Copy link
Copy Markdown
Member

@mathewc mathewc Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like the mode suggestion I make below, this'd become Instance.GetTrigerSyncMode() == TriggerSyncMode.AppServer


var response = await SendAsync(
[
KeyValuePair.Create(Operation, "update-triggers-timestamp")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if you shouldn't generalize this operation name to something like "update-triggers". That way if for whatever reason you want to pass any additional info along on the operation it makes sense.


private static bool IsNotifyTriggersChangedRequest(HttpRequestMessage request)
{
var formData = request.Content.ReadAsFormDataAsync().Result;
Copy link
Copy Markdown
Member

@mathewc mathewc Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize the rest of the methods in this test are using the sync over async pattern, but we should clean this up and make it truly async (ask @fabiocav :))


public async Task NotifyTriggersChanged()
{
_logger.LogDebug("Posting triggers changed notification to appserver.");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should log "appserver" - its not really a concept that exists currently in this codebase. Instead we could just be opaque and say "Performing platform triggers changed notification."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants