SMIRNOFF: multiple force field files, virtual sites, and constraints#423
SMIRNOFF: multiple force field files, virtual sites, and constraints#423peastman merged 30 commits intoopenmm:mainfrom
Conversation
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #423 +/- ##
==========================================
- Coverage 84.17% 83.70% -0.47%
==========================================
Files 5 5
Lines 771 798 +27
==========================================
+ Hits 649 668 +19
- Misses 122 130 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
for more information, see https://pre-commit.ci
| names.append(root) | ||
|
|
||
| return file_names | ||
| return sorted(names) |
There was a problem hiding this comment.
It seemed like the output order of get_available_force_fields is arbitrary (is that right?); sorting it just makes the results easier to look at.
There was a problem hiding this comment.
That's right, the output order is arbitrary. We were concerned that, if we outputted them in some order, then users would make workflows that assumed element [-1] is the latest (and therefore recommended) FF, which is not the case (we have things like the rosemary alpha and AMBER FF port which could end up as element [-1] in some sorting schemes, but which are not our latest flagship FF)
There was a problem hiding this comment.
I see. I would think relying on choosing the last element in this list could cause trouble whether or not it was sorted. But I don't feel strongly about this either way.
| Replace this with an API call once this issue is addressed: | ||
| https://github.com/openforcefield/openff-toolkit/issues/477 | ||
| """ | ||
|
|
||
| # TODO: Replace this method once there is a public API in the OpenFF toolkit for doing this |
There was a problem hiding this comment.
This issue is closed, so I removed the first note, but kept the second one, since it seems like there is presently not a function to do what this one is doing (trying to replicate the behavior of OpenFF when searching for .offxml files without full paths).
| def smirnoff_filename(self): | ||
| """Full path to the SMIRNOFF force field file""" | ||
| return self._smirnoff_filename | ||
| def smirnoff_filenames(self): |
There was a problem hiding this comment.
This is a (noisy) breaking change; the property name is different, and it now returns a list rather than just a string or None. I assume this would only be used if someone wants to find the paths to some named SMIRNOFF force fields that they are loading.
| constrained.registerTemplateGenerator( | ||
| SMIRNOFFTemplateGenerator(molecules=molecule, forcefield="openff-2.1.0").generator | ||
| ) | ||
|
|
There was a problem hiding this comment.
I think all of the logic below is correct; you may want to double-check these to ensure we are happy with the behavior from OpenMM for each set of options.
| @@ -1,5 +1,6 @@ | |||
| name: openmmforcefields-test | |||
| channels: | |||
| - conda-forge/label/openmm_rc | |||
There was a problem hiding this comment.
This should eventually be removable once OpenMM 8.5.0 is released.
|
This should be ready to look at; I've left some of my own comments above to draw attention to a few places. I thought a bit about changing the behavior of the SMIRNOFF force field lookup by name. Not allowing the name of a SMIRNOFF force field without a file extension would be inconsistent with My opinions on this are:
The one other issue is this: currently, any user asking for There are a few options I suppose we could pursue:
I see no way to avoid breaking changes altogether; since the current behavior is arguably wrong, this is to be expected. I'm open to additional suggestions, or ideas on which alternative we think will be the least troublesome. |
|
I lean toward option 2. If you give a file path, you get exactly the file you specified. If you give the name of a force field rather than a file, you get the unconstrained version of the force field. This will usually produce the behavior people expect. If OpenFF changes its naming convention in the future, we'll adapt to the new convention and maintain the same behavior. |
| @@ -1366,83 +1414,70 @@ def __init__(self, molecules=None, cache=None, forcefield=None, template_generat | |||
| forcefield = "openff-2.2.1" | |||
| # TODO: After toolkit provides date-ranked force fields, | |||
| # use latest dated version if we can sort by date, such as self.INSTALLED_FORCEFIELDS[-1] | |||
There was a problem hiding this comment.
(not blocking) Just double checking my understanding that we're intending to remove this default FF (can be in a future PR)
There was a problem hiding this comment.
Yes, I think there was a general consensus to remove this behavior of picking a default.
| @@ -0,0 +1,120 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
There was a problem hiding this comment.
Should we also include a test file with some NAGL charges?
There was a problem hiding this comment.
Yes, I'll add an OFFXML that uses NAGL charges.
There was a problem hiding this comment.
While these specific tests are run using partial charges stored in the SDF, this PR adds tests that use openff-2.3.0 which assigns NAGL charges. So this is covered elsewhere in this PR.
| from openmm import unit | ||
| # Get SMILES and a hash of it (to use in atom names to prevent cached | ||
| # XML templates from having sizes quadratic in the molecule size) | ||
| smiles = molecule.to_smiles() |
There was a problem hiding this comment.
Not something to fix here but maybe in future work - smiles generation can often fail when e.g. if you create an openff molecule with rdkit and you create smiles with openeye toolkit installed.
It may make sense to allow users to define the toolkit registry? We just disallow openeye in the global toolkit registry for all our openmmforcefields calls.
There was a problem hiding this comment.
We discussed InChI; if we decide this is a better choice as a molecule identifier than SMILES for openmmforcefields' (internal) use, I'd want to do this in a separate PR, and do it uniformly throughout openmmforcefields.
| forcefield = [forcefield] | ||
| try: | ||
| forcefield = list(forcefield) | ||
| except Exception: |
There was a problem hiding this comment.
[nit] What kind of failure are you anticipating here?
There was a problem hiding this comment.
For the record, since we discussed this, the purpose is to pass any unknown objects through to the openff.toolkit.ForceField instead of trying to iterate over them, so the ForceField can raise an appropriate and meaningful error instead of getting a more obscure error here.
I think this sounds good. Let me know if there any cases against this in favor of another option. If not, I'll plan to implement this one. |
|
I'm fairly strongly in favor of option 4, because
|
|
What I proposed above is that there's no connection between the name of a force field and the name of a file. We don't assume any convention about file names. There's a fixed list of built in force fields that you can specify by name. That will be the normal way most people use it. Giving the path to a file is a more advanced usage intended for people who are sure they know what they're doing. |
This is why the changes I just pushed are failing CI. I've verified locally that they work with Matt's fix to Interchange. Depending on the timeline for a bugfix release and when we want to merge this, we could temporarily remove the two lines in the new test that try to apply a non-default non-zero 1-4 scaling. |
Co-authored-by: Josh Horton <joshua.horton@openforcefield.org>
| "openff-1.0.0": "openff_unconstrained-1.0.0.offxml", | ||
| "openff-1.0.1": "openff_unconstrained-1.0.1.offxml", | ||
| "openff-1.1.0": "openff_unconstrained-1.1.0.offxml", | ||
| "openff-1.1.1": "openff_unconstrained-1.1.1.offxml", | ||
| "openff-1.2.0": "openff_unconstrained-1.2.0.offxml", | ||
| "openff-1.2.1": "openff_unconstrained-1.2.1.offxml", | ||
| "openff-1.3.0": "openff_unconstrained-1.3.0.offxml", | ||
| "openff-1.3.1": "openff_unconstrained-1.3.1.offxml", | ||
| "openff-2.0.0": "openff_unconstrained-2.0.0.offxml", | ||
| "openff-2.1.0": "openff_unconstrained-2.1.0.offxml", | ||
| "openff-2.1.1": "openff_unconstrained-2.1.1.offxml", | ||
| "openff-2.2.0": "openff_unconstrained-2.2.0.offxml", | ||
| "openff-2.2.1": "openff_unconstrained-2.2.1.offxml", | ||
| "openff-2.3.0": "openff_unconstrained-2.3.0.offxml", |
There was a problem hiding this comment.
This is the table that we will need to update in the future as new FFs are released. I put just the mainline non-RC/alpha/beta OpenFF FFs in here. If we want a different set of them, this could be changed; keep in mind that users can still get any forcefield installed with the OpenFF toolkit by specifying its full name with .offxml.
There was a problem hiding this comment.
Excellent - This list looks good to me!
|
There is a significant performance regression for CI after the release of Interchange 0.5.1, hence why the tests are taking 2-3 hours to run now. There are some tests that try a set of molecules on a list of force fields. This used to lead to an Antechamber/SQM call once for each molecule, but now it leads to a call per molecule per force field, and the results are no longer cached. I confirmed that this change took place between Interchange 0.5.0 and 0.5.1. (I suspect it's due to this change? @j-wags @mattwthompson But maybe the old behavior with caching was incorrect?) Anyway, this should be ready again if anyone wishes to take a final look after the various changes we've made. With the new behavior w.r.t. specifying force field names, I've updated some of the tests and documentation text. |
|
If running The behavior of openforcefield/openff-interchange#1234 is fundamentally bad and the fix didn't surface major performance regressions in our automation nor directly/intentionally interact with charge assignment. That's not to discount the possibility of a regression, though, and I'm happy to help you try to isolate the cause. |
|
I do have a minimal reproducer for this performance regression that I can share if you're curious, but my guess is that parameterizing each given molecule under many different force fields (which is what the CI currently does) is not a typical use case at all, so I wouldn't be concerned about it. I think in this case that I will try to reduce the number of combinations tested per your suggestion. |
There was a problem hiding this comment.
(not blocking, AI assisted)
SystemGenerator.__init__still defaults tosmall_molecule_forcefield="openff-2.2.0". We'd agreed to remove the default value fromSMIRNOFFTemplateGeneratorbut hadn't mentioned instances outside of that. You might consider removing that default in this or another PR. And in the meantime, switching it toopenff-2.3.0could be good :-)
Noted; I think I'll save additional modifications to those defaults for another PR.
| assert self.get_terms( | ||
| unconstrained.createSystem(topology, constraints=AllBonds, flexibleConstraints=True) | ||
| ) == (non_h_bonds | h_bonds, non_h_angles | h_angles, non_h_bonds | h_bonds) |
There was a problem hiding this comment.
(not blocking) I won't fight your linter/formatter over the formatting of this line, but if you happen to not be using one it's be good if this followed the same pattern (line breaks after ,s) as the others for readability (it's an information-dense test so this tripped me up).
There was a problem hiding this comment.
The automatic formatter set up on this repository is responsible for this and some of the other strange formatting. Frankly I think it often does a bad job and makes the code less readable, but that's another issue.
| # Make force field | ||
| forcefield = openmm.app.ForceField() | ||
| forcefield.registerTemplateGenerator( | ||
| SMIRNOFFTemplateGenerator(molecules=molecule, forcefield="opc3.offxml").generator |
There was a problem hiding this comment.
(not blocking) For robustness, another permutation of this test could use openff_unconstrained-2.3.0.offxml, which contains the TIP3P water model and also harmonic bond and angle terms which will match the water (but the harmonic terms should be superseded by the constraints).
There was a problem hiding this comment.
Good idea; I've updated this test.
| if isinstance(site, LocalCoordinatesSite): | ||
| origin_weights = site.getOriginWeights() | ||
| x_weights = site.getXWeights() | ||
| y_weights = site.getYWeights() | ||
| position = site.getLocalPosition() |
There was a problem hiding this comment.
(not blocking, AI assisted) I did a second-pass review with Claude and it suggests a few ways to make this more failsafe WRT the parent atom identity. I can't fully vet the suggestion (is it straightforward to check the identity of the parent atom here?) but I thought I'd pass it on in case you see this as an easy thing to implement.
1. excludeWith not explicitly set for virtual sites in generated templates (lines 1218-1252)
The VirtualSite XML elements in convert_system_to_ffxml don't include an excludeWith attribute. OpenMM defaults
excludeWith to the first atom listed, which happens to work for SMIRNOFF's "parents" exclusion policy only if the
first atomName in the LocalCoordinatesSite is always the parent atom. Matt Thompson noted in the 02/18 meeting: "I
think it's likely that parent atom is always first in interchange-outputted localcoordinatesite, but not
guaranteed." If Interchange ever changes ordering, vsite exclusions will silently break. This should either:
- Explicitly set excludeWith to the parent atom name, or
- Add a guard/assertion that the first particle is always the parent atom
The energy tests would catch a mismatch today, but a future Interchange change could slip through if the test
molecules don't exercise the edge case.
There was a problem hiding this comment.
To actually determine what OpenFF thinks is the parent atom, openmmforcefields would need access to whatever internal tables it uses when it is creating virtual sites, since at this point, we only have the final OpenMM System. If you can confirm how to get this information, I could make this more robust. (Playing with it a bit, it seems like creating an Interchange and iterating over interchange.collections["VirtualSites"].virtual_site_key_topology_index_map would give the appropriate indices, since the documentation says that the first atom in this list will actually be the parent atom? But I don't know and wouldn't want to implement this based on a guess.)
There was a problem hiding this comment.
Right - you'd need to have the original Interchange around to directly query this. I think this may not be feasible, since the function that all this resides in only get the final system, not the interchange that created it. So this should be fine as-is.
There was a problem hiding this comment.
I suggest assuming that the first atom is the "parent," this is how it's implemented and I can't currently think of a case in which that would intentionally not be the case. Carrying around Interchange objects just for this detail may be feasible but it's a poor trade-off (significant code complexity and memory usage for no known benefit). Certainly, I don't think this should hold up the rest of this PR
(I'm having a hard time following the rest of this discussion, but above is likely to hold in any context)
There was a problem hiding this comment.
OK, in that case I think I won't try to add any additional checking. The test cases that Jeff came up with should be fairly thorough as to different kinds of vsites, so I think we will find out if this breaks down the road.
| raise FileNotFoundError( | ||
| f"No installed OFFXML with name {known_paths[entry]} was found for the force field {entry}" | ||
| ) |
There was a problem hiding this comment.
(not blocking, AI assisted)
At line 1446, entry has already been reassigned to None (the return from _search_paths), so known_paths[entry]
will raise a KeyError before the FileNotFoundError is ever constructed. The original entry value is lost. Need to
save the original name before the reassignment.
There was a problem hiding this comment.
Thanks; should be fixed. There wasn't a test for this case, since this would only ever get triggered in the event that the user's installation of openff-forcefields was broken, or we mistyped an entry in _INSTALLED_FORCEFIELDS. But it should work now.
Co-authored-by: Jeff Wagner <jwagnerjpl@gmail.com>
…s into smirnoff-vsites
for more information, see https://pre-commit.ci
…s into smirnoff-vsites
|
This may be repetitive, but to be explicit: since this PR is so big, I won't have the time to review it in the next few weeks. But that shouldn't be taken as negative feedback nor a reason to hold off merging this and/or making a release |
|
OK, no worries. I would like to get this merged in so I can create the second PR here (multi-residue molecules) for people to take a look at. Any other comments / questions? |
|
@peastman Unless you think we should wait any longer for more comments, this can be merged. |
|
I don't think the auto-merge will work as long as these (phantom) 8.3.1 checks are required |
|
I cannot merge this as the repository is set up to require some (no longer present) CI runs to complete before merging is allowed, and I do not have the permissions to override it. |
|
I remove the obsolete checks. |
Changes to
SMIRNOFFTemplateGenerator:SMIRNOFFTemplateGenerator? #428.