python - Regex to handle both formats


I have two types of files.

One contains a line like this:

"55.28 longurl0.20s: preplan async" 

The other contains a line like this:

>55.28 longurl0.20s: preplan async</a></span><br></td> 

In both cases, I'd like to extract the content starting at longurl and ending just before </a> or at the end of the line.

>>> import re
>>> b = "55.28 longurl0.20s: preplan async"
>>> a = ">55.28 longurl0.20s: preplan async</a></span><br></td>"
>>> re.findall(r'longurl\d*.\d*s:[^<]+', a)
['longurl0.20s: preplan async']
>>> re.findall(r'longurl\d*.\d*.*$', b)
['longurl0.20s: preplan async']

Can a single regex cover both?

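Why not use longurl\d+[^<]+ ? The class [^<]+ consumes everything up to the next < or, if there is none, up to the end of the string: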

>>> import re
>>> a = ">55.28 longurl0.20s: preplan async</a></span><br></td>"
>>> b = "55.28 longurl0.20s: preplan async"
>>> re.findall(r'longurl\d+[^<]+', a)
['longurl0.20s: preplan async']
>>> re.findall(r'longurl\d+[^<]+', b)
['longurl0.20s: preplan async']
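If the same extraction is needed across many lines, the pattern from the answer could be wrapped in a small helper. This is only a sketch: the names LONGURL_RE and extract_longurl are made up for illustration and do not appear in the original post.

```python
import re

# Hypothetical names; the pattern itself is the one suggested in the answer.
# [^<]+ stops at the first '<' (the HTML case) or runs to the end of the
# string when no '<' is present (the plain-text case).
LONGURL_RE = re.compile(r'longurl\d+[^<]+')


def extract_longurl(line):
    """Return the text from 'longurl' up to the first '<' or end of string, or None."""
    match = LONGURL_RE.search(line)
    return match.group(0) if match else None


plain = "55.28 longurl0.20s: preplan async"
html = ">55.28 longurl0.20s: preplan async</a></span><br></td>"

print(extract_longurl(plain))
print(extract_longurl(html))
```

One caveat: [^<] also matches a newline, so lines read straight from a file would keep their trailing "\n" in the result. Stripping each line first, or using [^<\n]+ instead, avoids that.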
