Has trouble telling sections apart on "Barack Obama" #332
Open
Description
To reproduce:
import mwparserfromhell
import requests

obama = requests.get("https://en.wikipedia.org/wiki/Barack_Obama?action=raw").text
parsed = mwparserfromhell.parse(obama)
sections = parsed.get_sections(levels=[2])
for section in sections:
    print(section.filter_headings())

This results in:
['==Early life and career==', '===Education===', '===Family and personal life===', '===Religious views===']
['==Legal career==', '===Civil rights attorney===']
['==Legislative career==', '===Illinois Senate (1997–2004)===', '===2004 U.S. Senate campaign in Illinois===', '===U.S. Senate (2005–2008)===']
['==Presidential campaigns==', '===2008===', '===2012===']
['==Presidency (2009–2017)==', '===First 100 days===', '===Domestic policy===', '====Racial issues====', '====LGBT rights====', '===== Same-sex marriage =====', '====Economic policy====', '====Environmental policy====', '====Health care reform====', '===Foreign policy===', '====War in Iraq====', '====Afghanistan and Pakistan====', '=====Killing of Osama bin Laden=====', '====Relations with Cuba====', '====Israel====', '====Libya====', '====Syrian civil war====', '====Iran nuclear talks====', '====Russia====']
['==Cultural and political image==', '=== Job approval ===', '===Foreign perceptions===', '=== Thanks, Obama ===', '==Post-presidency (2017–present)==', '==Legacy and recognition ==', '===Presidential library===', '=== Awards and honors ===', '===Eponymy===', '==Bibliography==', '===Books===', '===Audiobooks===', '===Articles===', '==See also==', '===Politics===', '===Other===', '===Lists===', '==Notes==', '==References==', '===Bibliography===', '==Further reading==', '==External links==', '===Official===', '===Other===']
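The article contains more level-2 headings, but `get_sections(levels=[2])` stops splitting after "Cultural and political image": everything from "Post-presidency (2017–present)" onward, including several further level-2 headings, is lumped into that one section.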
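For anyone triaging this, a rough cross-check that does not depend on the parser may help confirm how many level-2 headings the raw wikitext actually contains. The sketch below is only an assumption-laden approximation: `level2_titles` and `LEVEL2_HEADING` are hypothetical names introduced here, the regex is not how mwparserfromhell or MediaWiki detect headings, and it ignores headings hidden inside comments, `<nowiki>`, or templates.

```python
import re

# Hypothetical helper, not part of mwparserfromhell: a line counts as a
# level-2 heading if it starts with exactly two '=' and ends with exactly
# two '=' (trailing spaces or tabs allowed).
LEVEL2_HEADING = re.compile(
    r"^==(?!=)[ \t]*(.+?)[ \t]*(?<!=)==[ \t]*$", re.MULTILINE
)


def level2_titles(wikitext):
    """Return the titles of all lines that look like level-2 headings."""
    return [m.group(1) for m in LEVEL2_HEADING.finditer(wikitext)]


# Small sample built from headings that appear in the output above.
sample = """Lead text.
==Cultural and political image==
=== Job approval ===
==Post-presidency (2017–present)==
==Legacy and recognition ==
===Presidential library===
==Bibliography==
"""

titles = level2_titles(sample)
for title in titles:
    print(title)
```

Running `level2_titles(obama)` on the text fetched in the reproduction and comparing its length with `len(sections)` should show how many level-2 headings were not used as section boundaries.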