Commit bc285e5

gh-138907: Support RFC 9309 in robotparser (GH-138908)
* empty lines are always ignored instead of separating groups
* the "user-agent" line after a rule starts a new group
* groups matching the same user agent are now merged
* the rule with the longest match wins instead of the first matching rule
* in case of equal matches, the “Allow” rule wins over “Disallow”
* special characters “$” and “*” are now supported in rules
* prefer full match for user agent
1 parent c74cba1 commit bc285e5
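
The rule-matching changes listed in the commit message (longest match wins, "Allow" wins ties, "$" and "*" are special) can be illustrated with a minimal sketch. This is not the code from this commit: `pattern_to_regex`, `can_fetch` and `RULES` are hypothetical names, the rules are assumed to be already merged into one group for the matching user agent, and pattern length is assumed to be the measure of match specificity.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    (Hypothetical helper, not part of urllib.robotparser.)
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def can_fetch(rules, path):
    """Decide whether `path` may be fetched.

    `rules` is a list of (allow, pattern) pairs for one merged group.
    The longest matching pattern wins; on equal length, Allow wins.
    """
    best_len = -1
    best_allow = True  # no matching rule means the path is allowed
    for allow, pattern in rules:
        if not pattern:
            continue  # assumption: an empty pattern matches nothing
        # re.match anchors at the start, so patterns act as prefixes.
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow

# Example rule set (made up for illustration).
RULES = [
    (False, "/private/"),
    (True, "/private/public/"),
    (False, "/*.pdf$"),
    (False, "/tmp"),
    (True, "/tmp"),
]
```

With these rules, `/private/public/page` is allowed because the longer "Allow" pattern beats the shorter "Disallow" one, `/docs/a.pdf` is disallowed by the `$`-anchored wildcard rule, and `/tmp/file` is allowed because "Allow" wins the tie.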

4 files changed

Lines changed: 441 additions & 111 deletions


Doc/library/urllib.robotparser.rst

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
 This module provides a single class, :class:`RobotFileParser`, which answers
 questions about whether or not a particular user agent can fetch a URL on the
 website that published the :file:`robots.txt` file. For more details on the
-structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.
+structure of :file:`robots.txt` files, see :rfc:`9309`.
 
 
 .. class:: RobotFileParser(url='')
