Bug report
Bug description:
There appears to be a bug in the findall() function of the re library: two patterns that describe the same set of strings do not produce the same results. For example, when searching a DNA sequence for a GC-rich region, re.findall("[GC]{12,}", dnaString) and re.findall("(G|C){12,}", dnaString) return different results:
import re
dnaString = "GCCGCGGGGGCCCCCGCGCCCGGGGATATTATAAAGGGGGGGGCCCCCCCCCCCCCCCCCCCCGC"
allGCrich = re.findall("[GC]{12,}", dnaString)
print(allGCrich)
# prints "['GCCGCGGGGGCCCCCGCGCCCGGGG', 'GGGGGGGGCCCCCCCCCCCCCCCCCCCCGC']" as desired
allGCrich = re.findall("(G|C){12,}", dnaString)
print(allGCrich)
# prints "['G', 'C']" which doesn't appear to be correct
This does not appear to occur with some of the other functions in the re library, such as search(): re.search("(G|C){12,}", dnaString) and re.search("[GC]{12,}", dnaString) find the same match, as desired.
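A possibly relevant comparison, offered as a sketch rather than a diagnosis: the re documentation states that when a pattern contains a capturing group, findall() returns the text of the group instead of the whole match, and a group that repeats keeps only its last iteration. If that is what is happening here, a non-capturing group (?:G|C), or finditer() with group(0), should give the same output as the character class. The variable names below are illustrative only.

```python
import re

dna_string = "GCCGCGGGGGCCCCCGCGCCCGGGGATATTATAAAGGGGGGGGCCCCCCCCCCCCCCCCCCCCGC"

# Baseline: a character class has no groups, so findall() returns whole matches.
char_class = re.findall("[GC]{12,}", dna_string)
print(char_class)

# Capturing group: findall() returns the group's text, i.e. the last
# character captured in each match, not the whole match.
capturing = re.findall("(G|C){12,}", dna_string)
print(capturing)

# Non-capturing group (?:...): no groups again, so whole matches are returned.
non_capturing = re.findall("(?:G|C){12,}", dna_string)
print(non_capturing)

# finditer() yields match objects; group(0) is always the whole match,
# even when the pattern contains a capturing group.
full_matches = [m.group(0) for m in re.finditer("(G|C){12,}", dna_string)]
print(full_matches)
```

If the non-capturing and finditer() variants print the same list as the character class, the difference would come from how findall() reports groups rather than from the matching itself, which would also explain why search() looks consistent.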
CPython versions tested on:
3.14
Operating systems tested on:
macOS