python - RE to handle both formats
I have two types of files.
One contains a line like this:
"55.28 longurl0.20s: preplan async"
The other contains a line like this:
>55.28 longurl0.20s: preplan async</a></span><br></td>
In both cases, I'd like to capture the content starting at longurl and ending at </a> or at the end of the line.
>>> import re
>>> b = "55.28 longurl0.20s: preplan async"
>>> a = ">55.28 longurl0.20s: preplan async</a></span><br></td>"
>>> re.findall(r'longurl\d*.\d*s:[^<]+', a)
['longurl0.20s: preplan async']
>>> re.findall(r'longurl\d*.\d*.*$', b)
['longurl0.20s: preplan async']
Can a single regex cover both?
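Why not use longurl\d+[^<]+? It matches from longurl up to, but not including, the first <, or to the end of the string if there is no <: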
>>> import re
>>> a = ">55.28 longurl0.20s: preplan async</a></span><br></td>"
>>> b = "55.28 longurl0.20s: preplan async"
>>> re.findall(r'longurl\d+[^<]+', a)
['longurl0.20s: preplan async']
>>> re.findall(r'longurl\d+[^<]+', b)
['longurl0.20s: preplan async']
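If the input might span several lines, or you want the pattern to insist on the exact number format, a slightly stricter variant may be worth considering. The sketch below is only one way to do it: the helper name extract_longurl and the constant LONGURL_RE are made up for illustration. It assumes the value always looks like digits, a dot, digits, then "s:". It escapes the dot, which in the question's patterns matches any character, and excludes newlines so a match cannot run past the end of its line.

```python
import re

# Hypothetical names for illustration; the pattern assumes the value always has
# the form <digits>.<digits>s: followed by the text to keep.
# \.      matches a literal dot (an unescaped . would match any character)
# [^<\n]+ stops at the first '<' or at the end of the line, whichever comes first
LONGURL_RE = re.compile(r'longurl\d+\.\d+s:[^<\n]+')


def extract_longurl(line):
    """Return the text from 'longurl' up to the first '<' or end of line, else None."""
    match = LONGURL_RE.search(line)
    return match.group(0) if match else None


plain = "55.28 longurl0.20s: preplan async"
html = ">55.28 longurl0.20s: preplan async</a></span><br></td>"

print(extract_longurl(plain))
print(extract_longurl(html))
```

Both calls should return the same string, since the character class ends the match at < in the HTML line and at the end of the string in the plain one.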