Skip to content

SMIRNOFF: multiple force field files, virtual sites, and constraints#423

Merged
peastman merged 30 commits intoopenmm:mainfrom
epretti:smirnoff-vsites
Mar 3, 2026
Merged

SMIRNOFF: multiple force field files, virtual sites, and constraints#423
peastman merged 30 commits intoopenmm:mainfrom
epretti:smirnoff-vsites

Conversation

@epretti
Copy link
Member

@epretti epretti commented Dec 12, 2025

Changes to SMIRNOFFTemplateGenerator:

@codecov-commenter
Copy link

codecov-commenter commented Dec 12, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 91.86992% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.70%. Comparing base (fb8ed0e) to head (80f8a02).

Files with missing lines Patch % Lines
...penmmforcefields/generators/template_generators.py 91.86% 10 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #423      +/-   ##
==========================================
- Coverage   84.17%   83.70%   -0.47%     
==========================================
  Files           5        5              
  Lines         771      798      +27     
==========================================
+ Hits          649      668      +19     
- Misses        122      130       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@epretti epretti changed the title SMIRNOFF: support virtual sites, multiple force field files SMIRNOFF: multiple force field files, virtual sites, and constraints Feb 13, 2026
names.append(root)

return file_names
return sorted(names)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed like the output order of get_available_force_fields is arbitrary (is that right?); sorting it just makes the results easier to look at.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, the output order is arbitrary. We were concerned that, if we outputted them in some order, then users would make workflows that assumed element [-1] is the latest (and therefore recommended) FF, which is not the case (we have things like the rosemary alpha and AMBER FF port which could end up as element [-1] in some sorting schemes, but which are not our latest flagship FF)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I would think relying on choosing the last element in this list could cause trouble whether or not it was sorted. But I don't feel strongly about this either way.

Comment on lines -1463 to 1467
Replace this with an API call once this issue is addressed:
https://github.com/openforcefield/openff-toolkit/issues/477
"""

# TODO: Replace this method once there is a public API in the OpenFF toolkit for doing this
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is closed, so I removed the first note, but kept the second one, since it seems like there is presently not a function to do what this one is doing (trying to replicate the behavior of OpenFF when searching for .offxml files without full paths).

def smirnoff_filename(self):
"""Full path to the SMIRNOFF force field file"""
return self._smirnoff_filename
def smirnoff_filenames(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a (noisy) breaking change; the property name is different, and it now returns a list rather than just a string or None. I assume this would only be used if someone wants to find the paths to some named SMIRNOFF force fields that they are loading.

constrained.registerTemplateGenerator(
SMIRNOFFTemplateGenerator(molecules=molecule, forcefield="openff-2.1.0").generator
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of the logic below is correct; you may want to double-check these to ensure we are happy with the behavior from OpenMM for each set of options.

@@ -1,5 +1,6 @@
name: openmmforcefields-test
channels:
- conda-forge/label/openmm_rc
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should eventually be removable once OpenMM 8.5.0 is released.

@epretti
Copy link
Member Author

epretti commented Feb 13, 2026

This should be ready to look at; I've left some of my own comments above to draw attention to a few places.

I thought a bit about changing the behavior of the SMIRNOFF force field lookup by name.

Not allowing the name of a SMIRNOFF force field without a file extension would be inconsistent with GAFFTemplateGenerator (which only accepts a list of predefined names corresponding to the GAFF data files we bundle, without their extensions), and EspalomaTemplateGenerator (which appears to accept a predefined name, a path, or a URL, although the logic in there surrounding forcefield lookup is messy). Furthermore, all of the template generators seem to choose a default (which is supposed to be the "latest" force field, but in reality is whatever one was the latest when we last updated the code) if a user doesn't specify any force field.

My opinions on this are:

  • I'd be happy to get rid of the behavior of choosing the "latest" force field as a default (across all of the template generators). This would be a noisy breaking change for any users who rely on it. I argue nobody should rely on it, since it means the simulation results from any code doing so will not be reproducible since they could change whenever that default gets updated.
  • I don't think we should prohibit users from specifying a force field name without an extension, as long as the behavior is consistent between force_field_name and force_field_name.offxml when force_field_name is an installed force field (which it should be). I don't feel too strongly about this, so if there is some other compelling reason to prohibit it, we could.

The one other issue is this: currently, any user asking for openff-blah will get the equivalent of openff_unconstrained-blah.offxml (since openmmforcefields will load openff-blah.offxml and delete the <Constraints>). This has the nasty side effect of breaking the water models. With the changes here, the water models should always work, but users asking for openff-blah will get constraints turned on unconditionally, and would have to use openff_unconstrained-blah to recover the previous behavior.

There are a few options I suppose we could pursue:

  1. Don't try to do anything special with names or paths users pass in. This could be a quiet breaking change for some users who are using constraints=None in OpenMM.
  2. Silently rewrite openff.*-.* to openff.*_unconstrained-.* for compatibility. This relies on this naming convention being consistent in OpenFF, and also means that openff-blah would give an unconstrained force field but openff-blah.offxml would give a different (constrained) force field. It would still be a quiet breaking change for people using openff-blah.offxml, since this would switch from behaving like an "unconstrained" force field to a "constrained" one.
  3. As above, but also silently rewrite names ending in .offxml. This means that there would be no way to specify the constrained version of the force field, and could be confusing as to what files were actually being read.
  4. Stop accepting names without extensions. All code using openff-blah would break, and we would then hope people would look for our documentation to see that they need to explicitly specify openff_unconstrained-blah.offxml or openff-blah.offxml to get the version of the force field that they want. As in case 2, anyone using openff-blah.offxml already would experience a quiet breaking change.

I see no way to avoid breaking changes altogether; since the current behavior is arguably wrong, this is to be expected. I'm open to additional suggestions, or ideas on which alternative we think will be the least troublesome.

@epretti epretti marked this pull request as ready for review February 13, 2026 20:16
@peastman
Copy link
Member

I lean toward option 2. If you give a file path, you get exactly the file you specified. If you give the name of a force field rather than a file, you get the unconstrained version of the force field. This will usually produce the behavior people expect. If OpenFF changes its naming convention in the future, we'll adapt to the new convention and maintain the same behavior.

Comment on lines 1412 to 1416
@@ -1366,83 +1414,70 @@ def __init__(self, molecules=None, cache=None, forcefield=None, template_generat
forcefield = "openff-2.2.1"
# TODO: After toolkit provides date-ranked force fields,
# use latest dated version if we can sort by date, such as self.INSTALLED_FORCEFIELDS[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking) Just double checking my understanding that we're intending to remove this default FF (can be in a future PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think there was a general consensus to remove this behavior of picking a default.

@@ -0,0 +1,120 @@
<?xml version="1.0" encoding="utf-8"?>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also include a test file with some NAGL charges?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll add an OFFXML that uses NAGL charges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While these specific tests are run using partial charges stored in the SDF, this PR adds tests that use openff-2.3.0 which assigns NAGL charges. So this is covered elsewhere in this PR.

from openmm import unit
# Get SMILES and a hash of it (to use in atom names to prevent cached
# XML templates from having sizes quadratic in the molecule size)
smiles = molecule.to_smiles()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something to fix here but maybe in future work - smiles generation can often fail when e.g. if you create an openff molecule with rdkit and you create smiles with openeye toolkit installed.

It may make sense to allow users to define the toolkit registry? We just disallow openeye in the global toolkit registry for all our openmmforcefields calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed InChI; if we decide this is a better choice as a molecule identifier than SMILES for openmmforcefields' (internal) use, I'd want to do this in a separate PR, and do it uniformly throughout openmmforcefields.

forcefield = [forcefield]
try:
forcefield = list(forcefield)
except Exception:
Copy link

@IAlibay IAlibay Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] What kind of failure are you anticipating here?

Copy link
Member Author

@epretti epretti Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, since we discussed this, the purpose is to pass any unknown objects through to the openff.toolkit.ForceField instead of trying to iterate over them, so the ForceField can raise an appropriate and meaningful error instead of getting a more obscure error here.

@epretti
Copy link
Member Author

epretti commented Feb 19, 2026

I lean toward option 2. If you give a file path, you get exactly the file you specified. If you give the name of a force field rather than a file, you get the unconstrained version of the force field. This will usually produce the behavior people expect. If OpenFF changes its naming convention in the future, we'll adapt to the new convention and maintain the same behavior.

I think this sounds good. Let me know if there any cases against this in favor of another option. If not, I'll plan to implement this one.

@j-wags
Copy link
Contributor

j-wags commented Feb 20, 2026

I'm fairly strongly in favor of option 4, because

  • We don't have a spec for our non-mainline FF naming schemes, and in fact we already have some weird naming patterns on important FFs like openff_no_water_unconstrained-3.0.0-alpha0.offxml. I think the less we assume here, the better.
  • We can't control how other people might name offxml force fields that they publish, but it would be good to support those out of the box.
  • I usually take the perspective that, once it's decided that there will be an API/behavior change, the magnitude of the change doesn't totally matter since we'll want users to re-audit their code anyway. Since option 4 is the most literal and future proof (user-says-exactly-what-they-want) , I figure this is the best time to switch over to it.

@peastman
Copy link
Member

What I proposed above is that there's no connection between the name of a force field and the name of a file. We don't assume any convention about file names. There's a fixed list of built in force fields that you can specify by name. That will be the normal way most people use it. Giving the path to a file is a more advanced usage intended for people who are sure they know what they're doing.

@epretti
Copy link
Member Author

epretti commented Feb 23, 2026

Evan identified a bug in Interchange which affects a corner case (non-standard 1-4 scaling factors applied to virtual sites) that may make it into the test suit of this changeset. I hope to turn around a fix in a new release soon, but no guarantees. Follow here for updates: openforcefield/openff-interchange#1436

This is why the changes I just pushed are failing CI. I've verified locally that they work with Matt's fix to Interchange. Depending on the timeline for a bugfix release and when we want to merge this, we could temporarily remove the two lines in the new test that try to apply a non-default non-zero 1-4 scaling.

Comment on lines +1322 to +1335
"openff-1.0.0": "openff_unconstrained-1.0.0.offxml",
"openff-1.0.1": "openff_unconstrained-1.0.1.offxml",
"openff-1.1.0": "openff_unconstrained-1.1.0.offxml",
"openff-1.1.1": "openff_unconstrained-1.1.1.offxml",
"openff-1.2.0": "openff_unconstrained-1.2.0.offxml",
"openff-1.2.1": "openff_unconstrained-1.2.1.offxml",
"openff-1.3.0": "openff_unconstrained-1.3.0.offxml",
"openff-1.3.1": "openff_unconstrained-1.3.1.offxml",
"openff-2.0.0": "openff_unconstrained-2.0.0.offxml",
"openff-2.1.0": "openff_unconstrained-2.1.0.offxml",
"openff-2.1.1": "openff_unconstrained-2.1.1.offxml",
"openff-2.2.0": "openff_unconstrained-2.2.0.offxml",
"openff-2.2.1": "openff_unconstrained-2.2.1.offxml",
"openff-2.3.0": "openff_unconstrained-2.3.0.offxml",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the table that we will need to update in the future as new FFs are released. I put just the mainline non-RC/alpha/beta OpenFF FFs in here. If we want a different set of them, this could be changed; keep in mind that users can still get any forcefield installed with the OpenFF toolkit by specifying its full name with .offxml.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent - This list looks good to me!

@epretti
Copy link
Member Author

epretti commented Feb 26, 2026

There is a significant performance regression for CI after the release of Interchange 0.5.1, hence why the tests are taking 2-3 hours to run now. There are some tests that try a set of molecules on a list of force fields. This used to lead to an Antechamber/SQM call once for each molecule, but now it leads to a call per molecule per force field, and the results are no longer cached. I confirmed that this change took place between Interchange 0.5.0 and 0.5.1. (I suspect it's due to this change? @j-wags @mattwthompson But maybe the old behavior with caching was incorrect?)

Anyway, this should be ready again if anyone wishes to take a final look after the various changes we've made. With the new behavior w.r.t. specifying force field names, I've updated some of the tests and documentation text.

@mattwthompson
Copy link
Collaborator

If running $N_{forcefields} \times N_{molecules}$ using AM1-BCC is slow for each element in the CI matrix, I would argue that is the expected result. I don't think the entire molecule set needs to be executed for every listed force field - If we were building a new test suite, I would want to use the fast charge assignment of openff-2.3.0 on a medium-sized dataset and a few other force fields on a minimal dataset of < 10 molecules. At a certain number of inputs, running more molecule + force field combinations becomes less obviously about testing the behavior of this package and indistinguishable from burning coal for fun.

The behavior of openforcefield/openff-interchange#1234 is fundamentally bad and the fix didn't surface major performance regressions in our automation nor directly/intentionally interact with charge assignment. That's not to discount the possibility of a regression, though, and I'm happy to help you try to isolate the cause.

@epretti
Copy link
Member Author

epretti commented Feb 26, 2026

I do have a minimal reproducer for this performance regression that I can share if you're curious, but my guess is that parameterizing each given molecule under many different force fields (which is what the CI currently does) is not a typical use case at all, so I wouldn't be concerned about it. I think in this case that I will try to reduce the number of combinations tested per your suggestion.

Copy link
Contributor

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking, AI assisted) SystemGenerator.__init__ still defaults to small_molecule_forcefield="openff-2.2.0". We'd agreed to remove the default value from SMIRNOFFTemplateGenerator but hadn't mentioned instances outside of that. You might consider removing that default in this or another PR. And in the meantime, switching it to openff-2.3.0 could be good :-)

Noted; I think I'll save additional modifications to those defaults for another PR.

Comment on lines +1333 to +1335
assert self.get_terms(
unconstrained.createSystem(topology, constraints=AllBonds, flexibleConstraints=True)
) == (non_h_bonds | h_bonds, non_h_angles | h_angles, non_h_bonds | h_bonds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking) I won't fight your linter/formatter over the formatting of this line, but if you happen to not be using one it's be good if this followed the same pattern (line breaks after ,s) as the others for readability (it's an information-dense test so this tripped me up).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The automatic formatter set up on this repository is responsible for this and some of the other strange formatting. Frankly I think it often does a bad job and makes the code less readable, but that's another issue.

# Make force field
forcefield = openmm.app.ForceField()
forcefield.registerTemplateGenerator(
SMIRNOFFTemplateGenerator(molecules=molecule, forcefield="opc3.offxml").generator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking) For robustness, another permutation of this test could use openff_unconstrained-2.3.0.offxml, which contains the TIP3P water model and also harmonic bond and angle terms which will match the water (but the harmonic terms should be superseded by the constraints).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea; I've updated this test.

Comment on lines +1224 to +1228
if isinstance(site, LocalCoordinatesSite):
origin_weights = site.getOriginWeights()
x_weights = site.getXWeights()
y_weights = site.getYWeights()
position = site.getLocalPosition()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking, AI assisted) I did a second-pass review with Claude and it suggests a few ways to make this more failsafe WRT the parent atom identity. I can't fully vet the suggestion (is it straightforward to check the identity of the parent atom here?) but I thought I'd pass it on in case you see this as an easy thing to implement.

  1. excludeWith not explicitly set for virtual sites in generated templates (lines 1218-1252)                      
                                                                                                                    
  The VirtualSite XML elements in convert_system_to_ffxml don't include an excludeWith attribute. OpenMM defaults
  excludeWith to the first atom listed, which happens to work for SMIRNOFF's "parents" exclusion policy only if the
  first atomName in the LocalCoordinatesSite is always the parent atom. Matt Thompson noted in the 02/18 meeting: "I
   think it's likely that parent atom is always first in interchange-outputted localcoordinatesite, but not
  guaranteed." If Interchange ever changes ordering, vsite exclusions will silently break. This should either:
  - Explicitly set excludeWith to the parent atom name, or
  - Add a guard/assertion that the first particle is always the parent atom

  The energy tests would catch a mismatch today, but a future Interchange change could slip through if the test
  molecules don't exercise the edge case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To actually determine what OpenFF thinks is the parent atom, openmmforcefields would need access to whatever internal tables it uses when it is creating virtual sites, since at this point, we only have the final OpenMM System. If you can confirm how to get this information, I could make this more robust. (Playing with it a bit, it seems like creating an Interchange and iterating over interchange.collections["VirtualSites"].virtual_site_key_topology_index_map would give the appropriate indices, since the documentation says that the first atom in this list will actually be the parent atom? But I don't know and wouldn't want to implement this based on a guess.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - you'd need to have the original Interchange around to directly query this. I think this may not be feasible, since the function that all this resides in only get the final system, not the interchange that created it. So this should be fine as-is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest assuming that the first atom is the "parent," this is how it's implemented and I can't currently think of a case in which that would intentionally not be the case. Carrying around Interchange objects just for this detail may be feasible but it's a poor trade-off (significant code complexity and memory usage for no known benefit). Certainly, I don't think this should hold up the rest of this PR

(I'm having a hard time following the rest of this discussion, but above is likely to hold in any context)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in that case I think I won't try to add any additional checking. The test cases that Jeff came up with should be fairly thorough as to different kinds of vsites, so I think we will find out if this breaks down the road.

Comment on lines +1444 to +1446
raise FileNotFoundError(
f"No installed OFFXML with name {known_paths[entry]} was found for the force field {entry}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking, AI assisted)

At line 1446, entry has already been reassigned to None (the return from _search_paths), so known_paths[entry]
  will raise a KeyError before the FileNotFoundError is ever constructed. The original entry value is lost. Need to
  save the original name before the reassignment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; should be fixed. There wasn't a test for this case, since this would only ever get triggered in the event that the user's installation of openff-forcefields was broken, or we mistyped an entry in _INSTALLED_FORCEFIELDS. But it should work now.

@mattwthompson
Copy link
Collaborator

This may be repetitive, but to be explicit: since this PR is so big, I won't have the time to review it in the next few weeks. But that shouldn't be taken as negative feedback nor a reason to hold off merging this and/or making a release

@epretti
Copy link
Member Author

epretti commented Mar 2, 2026

OK, no worries. I would like to get this merged in so I can create the second PR here (multi-residue molecules) for people to take a look at. Any other comments / questions?

@epretti
Copy link
Member Author

epretti commented Mar 3, 2026

@peastman Unless you think we should wait any longer for more comments, this can be merged.

@epretti epretti enabled auto-merge (squash) March 3, 2026 20:31
@mattwthompson
Copy link
Collaborator

I don't think the auto-merge will work as long as these (phantom) 8.3.1 checks are required

Copy link
Contributor

@j-wags j-wags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections from me!

@epretti
Copy link
Member Author

epretti commented Mar 3, 2026

I cannot merge this as the repository is set up to require some (no longer present) CI runs to complete before merging is allowed, and I do not have the permissions to override it.

@peastman peastman disabled auto-merge March 3, 2026 21:49
@peastman peastman merged commit 0551fd1 into openmm:main Mar 3, 2026
8 checks passed
@peastman
Copy link
Member

peastman commented Mar 3, 2026

I remove the obsolete checks.

@epretti epretti deleted the smirnoff-vsites branch March 3, 2026 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

7 participants