regex - How to extract hrefs from an email body in Perl?


I'm trying to extract URLs from the body of an email. There can be more than one.

This is how I'm trying to parse the URLs:

    use strict;
    use warnings;
    use Net::IMAP::Simple;
    use Email::Simple;
    use IO::Socket::SSL;

    # the IMAP connection code ($imap, $i) is omitted here to save space

    my $es = Email::Simple->new( join '', @{ $imap->get($i) } );
    my $text = $es->body;
    print $text;
    my $matches = ($text =~ /<a[^>]*href="([^"]*)"[^>]*>.*<\/a>/);
    print $matches;

$text contains the following:

    --047d7b47229eb3d9f404e58fd90a
    Content-Type: text/plain; charset=ISO-8859-1

    try1 <http://www.washingtonpost.com/>

    try2 <http://www.thesun.co.uk/sol/homepage/>

    --047d7b47229eb3d9f404e58fd90a
    Content-Type: text/html; charset=ISO-8859-1

    <div dir="ltr"><a href="http://www.washingtonpost.com/">try1</a><br><div><br></div><div><a href="http://www.thesun.co.uk/sol/homepage/">try2</a><br></div></div>

    --047d7b47229eb3d9f404e58fd90a--

The program's output is just 1, nothing else.

Can anyone help?

Thanks in advance.

Email::Simple is not suitable for MIME messages. Use Courriel instead. Regex is not suitable for HTML parsing. Use Web::Query instead.

    use Courriel qw();
    use Web::Query qw();

    my $email = Courriel->parse( text => join …);
    my $html = $email->html_body_part;
    my @url = Web::Query->new_from_html($html)->find('a[href]')->attr('href');
    __END__
    http://www.washingtonpost.com/
    http://www.thesun.co.uk/sol/homepage/
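As a side note on why the original code printed 1: a match assigned to a scalar only reports success or failure. The captured text comes back when the match is evaluated in list context, and the /g flag is needed to collect every link. The sketch below uses only core Perl and the HTML part from the question. The helper name extract_hrefs is hypothetical, and the pattern drops the trailing `.*<\/a>` from the question, because that greedy `.*` would run to the last `</a>` whenever both links sit on one line. It is only meant to illustrate context, not to replace a real HTML parser as recommended above.

```perl
use strict;
use warnings;

# HTML part copied from the message body shown in the question.
my $html = '<div dir="ltr"><a href="http://www.washingtonpost.com/">try1</a><br>'
         . '<div><br></div><div><a href="http://www.thesun.co.uk/sol/homepage/">try2</a><br></div></div>';

# Scalar context: the match operator only says whether it matched (1 on success).
my $matched = ( $html =~ /<a[^>]*href="([^"]*)"/ );

# Hypothetical helper: in list context with /g, the match returns
# the captured group from every successful match.
sub extract_hrefs {
    my ($text) = @_;
    my @urls = $text =~ /<a\b[^>]*\bhref="([^"]*)"/g;
    return @urls;
}

print "matched: $matched\n";
print "$_\n" for extract_hrefs($html);
```

This pattern assumes double-quoted href values and a lowercase tag name, which holds for the sample message but not for HTML in general.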
