fix(playwright): filter unsupported context options in persistent browser#1796
sushant-mutnale wants to merge 3 commits into apify:master
…wser This addresses issue apify#1784 by dynamically filtering options passed to launch_persistent_context and providing a warning log for ignored options like storage_state.
Pijukatel
left a comment
Hello, thanks for the PR. Please see my comments; maybe we can use this approach on a different level.
pyproject.toml
Outdated
| "scraping", | ||
| ] | ||
| dependencies = [ | ||
| "apify-fingerprint-datapoints>=0.11.0", |
We have all these added dependencies in the optional dependencies group playwright. So please remove them from here.
| user_data_dir = tempfile.mkdtemp(prefix=self._TMP_DIR_PREFIX) | ||
| self._temp_dir = Path(user_data_dir) | ||
|
|
||
| launch_persistent_context_sig = inspect.signature(self._browser_type.launch_persistent_context) |
This is a reasonable approach, but it has some drawbacks. If the user makes a typo in an otherwise valid argument name, it will only show a warning in the log. The same goes for a completely nonsensical argument. Both cases should raise an error rather than just log a warning.
For example, this should raise (typo in headles):
persist_browser = PlaywrightPersistentBrowser(
playwright.chromium, browser_launch_options={'headles': True}
)
Maybe this approach could be adopted one level higher: not in PlaywrightPersistentBrowser, which always just calls launch_persistent_context, but in PlaywrightBrowserController, the class that decides whether to call launch_persistent_context or new_context but feeds them the same arguments.
It should properly raise exceptions for bad arguments, but it could just log a warning, as per your suggestion, for arguments that are at least valid in the other method. It would need three sets of arguments to make that distinction. Something like:
...
launch_persistent_context_sig = set(inspect.signature(BrowserType.launch_persistent_context).parameters)
new_context_sig = set(inspect.signature(Browser.new_context).parameters)
persistent_unique_options = launch_persistent_context_sig - new_context_sig
new_context_unique_options = new_context_sig - launch_persistent_context_sig
common_options = launch_persistent_context_sig & new_context_sig
...
And then raise an exception or just log based on the selected mode.
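A minimal, self-contained sketch of that raise-or-warn split follows. To run without Playwright installed, it uses two hypothetical stand-in functions in place of BrowserType.launch_persistent_context and Browser.new_context; the helper name select_context_options and the parameter lists are assumptions for illustration, not the code in this PR.

```python
import inspect
import logging

logger = logging.getLogger(__name__)


# Hypothetical stand-ins for BrowserType.launch_persistent_context and
# Browser.new_context; the real methods accept many more parameters.
def launch_persistent_context(user_data_dir, *, headless=None, viewport=None): ...
def new_context(*, viewport=None, storage_state=None): ...


def select_context_options(options, *, persistent):
    """Keep supported options, warn on cross-mode ones, raise on unknown ones."""
    persistent_params = set(inspect.signature(launch_persistent_context).parameters)
    new_context_params = set(inspect.signature(new_context).parameters)

    if persistent:
        supported, other_mode = persistent_params, new_context_params
    else:
        supported, other_mode = new_context_params, persistent_params

    accepted = {}
    for name, value in options.items():
        if name in supported:
            accepted[name] = value
        elif name in other_mode:
            # Valid for the other method only: ignore it, but tell the user.
            logger.warning('Option %r is not supported in this mode and is ignored.', name)
        else:
            # Typo or nonsensical argument: fail loudly.
            raise TypeError(f'Unknown browser option: {name!r}')
    return accepted
```

With these stand-ins, passing storage_state in persistent mode is dropped with a warning, while a typo such as headles raises TypeError in either mode.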
…owserController Moving the validation logic from the browser instance to its controller as suggested by the reviewer. This improves user experience by raising TypeError for typos and nonsensical arguments while still providing helpful warnings for valid but incompatible cross-mode options like storage_state in persistent contexts. Also fixed dependency management in pyproject.toml.
Hello! Thank you for the detailed feedback. I've refactored the validation logic into
New unit tests cover both the warning and error scenarios. Ready for another look!
Ran ruff formatter to fix CI lint error.
| "browserforge>=1.2.4", | ||
| "cachetools>=5.5.0", | ||
| "colorama>=0.4.0", | ||
| "impit>=0.8.0", | ||
| "more-itertools>=10.2.0", | ||
| "playwright>=1.58.0", |
browserforge and playwright should not be part of core dependencies
| "playwright>=1.27.0", | ||
| "scikit-learn>=1.6.0", | ||
| "apify_fingerprint_datapoints>=0.0.3", | ||
| "apify_fingerprint_datapoints>=0.11.0", |
| _launch_persistent_context_params = set(inspect.signature(PlaywrightBrowserType.launch_persistent_context).parameters) | ||
| _new_context_params = set(inspect.signature(Browser.new_context).parameters) |
Is it necessary to run these at the import time of the module?
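One possible way to defer the work, sketched under assumptions: wrap the signature inspection in a cached function so it runs on first use rather than at import. The stand-in functions and the name get_context_params are hypothetical and only keep the example runnable without Playwright.

```python
import inspect
from functools import lru_cache


# Hypothetical stand-ins for the two Playwright methods being inspected.
def launch_persistent_context(user_data_dir, *, headless=None, viewport=None): ...
def new_context(*, viewport=None, storage_state=None): ...


@lru_cache(maxsize=1)
def get_context_params():
    """Inspect both signatures once, on first call, and cache the result."""
    return (
        frozenset(inspect.signature(launch_persistent_context).parameters),
        frozenset(inspect.signature(new_context).parameters),
    )
```

Callers would unpack the pair where the sets are needed, so importing the module does not trigger any inspection.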
This PR fixes issue #1784, where PlaywrightCrawler would crash when passing context options (like storage_state) that are unsupported by Playwright's launch_persistent_context method.
Changes:
Implemented dynamic argument filtering in PlaywrightPersistentBrowser.new_context using inspect.signature.
Added a warning log to guide users when options are filtered out, suggesting the use of incognito pages as an alternative.
Added a unit test in tests/unit/browsers/test_playwright_browser.py to verify the fix and prevent regressions.
Fixes #1784