Merged
@@ -3,8 +3,10 @@
 from crawlee.crawlers import (
     PlaywrightCrawler,
     PlaywrightCrawlingContext,
+    PlaywrightPostNavCrawlingContext,
     PlaywrightPreNavCrawlingContext,
 )
+from crawlee.errors import SessionError


 async def main() -> None:
@@ -24,6 +26,14 @@ async def configure_page(context: PlaywrightPreNavCrawlingContext) -> None:
         # to speed up page loading
         await context.block_requests()

+    @crawler.post_navigation_hook
+    async def custom_captcha_check(context: PlaywrightPostNavCrawlingContext) -> None:
+        # check if the page contains a captcha
+        captcha_element = context.page.locator('input[name="captcha"]').first
+        if await captcha_element.is_visible():
+            context.log.warning('Captcha detected! Skipping the page.')
+            raise SessionError('Captcha detected')
+
     # Run the crawler with the initial list of URLs.
     await crawler.run(['https://crawlee.dev'])
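
Review note: the hook ordering this change relies on (pre-navigation hooks run before the page loads, post-navigation hooks run after it, and a post-navigation hook can reject the page by raising) can be illustrated without Crawlee or Playwright installed. The sketch below is a stand-alone toy, not Crawlee code: `MiniCrawler`, `FakePage`, and `CaptchaError` are hypothetical stand-ins invented for this note, navigation is replaced by a dictionary lookup, and whatever Crawlee itself does in response to `SessionError` is not modelled.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


class CaptchaError(Exception):
    """Hypothetical stand-in for crawlee.errors.SessionError."""


@dataclass
class FakePage:
    """Hypothetical page object; real code would receive a Playwright Page."""

    url: str
    html: str = ''


Hook = Callable[[FakePage], Awaitable[None]]


@dataclass
class MiniCrawler:
    """Toy crawler that models only the order in which hooks are called."""

    pages: dict[str, str]  # url -> canned HTML, replaces real navigation
    events: list[str] = field(default_factory=list)
    _pre: list[Hook] = field(default_factory=list)
    _post: list[Hook] = field(default_factory=list)

    def pre_navigation_hook(self, hook: Hook) -> Hook:
        # Register a coroutine to run before each simulated navigation.
        self._pre.append(hook)
        return hook

    def post_navigation_hook(self, hook: Hook) -> Hook:
        # Register a coroutine to run after each simulated navigation.
        self._post.append(hook)
        return hook

    async def run(self, urls: list[str]) -> None:
        for url in urls:
            page = FakePage(url)
            try:
                for hook in self._pre:
                    await hook(page)
                page.html = self.pages[url]  # the "navigation" step
                self.events.append(f'navigated:{url}')
                for hook in self._post:
                    await hook(page)
                self.events.append(f'handled:{url}')
            except CaptchaError:
                # A raising post hook prevents the page from being handled.
                self.events.append(f'skipped:{url}')


def build_crawler() -> MiniCrawler:
    crawler = MiniCrawler(
        pages={
            'https://ok.example': '<h1>Hello</h1>',
            'https://blocked.example': '<input name="captcha">',
        }
    )

    @crawler.pre_navigation_hook
    async def configure_page(page: FakePage) -> None:
        crawler.events.append(f'pre:{page.url}')

    @crawler.post_navigation_hook
    async def captcha_check(page: FakePage) -> None:
        # Crude substring check standing in for a locator visibility test.
        if 'name="captcha"' in page.html:
            raise CaptchaError('Captcha detected')

    return crawler


if __name__ == '__main__':
    demo = build_crawler()
    asyncio.run(demo.run(['https://ok.example', 'https://blocked.example']))
    print(demo.events)
```

In this toy, the page containing the captcha input is recorded as skipped right after navigation, while the other page proceeds to the handled state. That mirrors the intent of `custom_captcha_check` in the diff above, under the stated simplifications.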

6 changes: 3 additions & 3 deletions docs/guides/playwright_crawler.mdx
@@ -10,7 +10,7 @@ import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';

 import MultipleLaunchExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/multiple_launch_example.py';
 import BrowserConfigurationExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/browser_configuration_example.py';
-import PreNavigationExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/pre_navigation_hook_example.py';
+import NavigationHooksExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/navigation_hooks_example.py';
 import BrowserPoolPageHooksExample from '!!raw-loader!roa-loader!./code_examples/playwright_crawler/browser_pool_page_hooks_example.py';
 import PluginBrowserConfigExample from '!!raw-loader!./code_examples/playwright_crawler/plugin_browser_configuration_example.py';

@@ -67,10 +67,10 @@ For additional setup or event-driven actions around page creation and closure, t

 ## Navigation hooks

-Navigation hooks allow for additional configuration at specific points during page navigation. For example, the <ApiLink to="class/PlaywrightCrawler#pre_navigation_hook">`pre_navigation_hook`</ApiLink> is called before each navigation and provides <ApiLink to="class/PlaywrightPreNavCrawlingContext">`PlaywrightPreNavCrawlingContext`</ApiLink> - including the [page](https://playwright.dev/python/docs/api/class-page) instance and a <ApiLink to="class/PlaywrightPreNavCrawlingContext#block_requests">`block_requests`</ApiLink> helper for filtering unwanted resource types and URL patterns. See the [block requests example](https://crawlee.dev/python/docs/examples/playwright-crawler-with-block-requests) for a dedicated walkthrough.
+Navigation hooks allow for additional configuration at specific points during page navigation. The <ApiLink to="class/PlaywrightCrawler#pre_navigation_hook">`pre_navigation_hook`</ApiLink> is called before each navigation and provides <ApiLink to="class/PlaywrightPreNavCrawlingContext">`PlaywrightPreNavCrawlingContext`</ApiLink> - including the [page](https://playwright.dev/python/docs/api/class-page) instance and a <ApiLink to="class/PlaywrightPreNavCrawlingContext#block_requests">`block_requests`</ApiLink> helper for filtering unwanted resource types and URL patterns. See the [block requests example](https://crawlee.dev/python/docs/examples/playwright-crawler-with-block-requests) for a dedicated walkthrough. Similarly, the <ApiLink to="class/PlaywrightCrawler#post_navigation_hook">`post_navigation_hook`</ApiLink> is called after each navigation and provides <ApiLink to="class/PlaywrightPostNavCrawlingContext">`PlaywrightPostNavCrawlingContext`</ApiLink> - useful for post-load checks such as detecting CAPTCHAs or verifying page state.

 <RunnableCodeBlock className="language-python" language="python">
-{PreNavigationExample}
+{NavigationHooksExample}
 </RunnableCodeBlock>

 ## Conclusion