regex - How to extract href from an email body in Perl?
I'm trying to extract the URLs (there can be more than one) that come in the body of an email.
I'm trying to parse the URLs like this:
    use strict;
    use warnings;
    use Net::IMAP::Simple;
    use Email::Simple;
    use IO::Socket::SSL;

    # The IMAP connection code goes here (omitted to save space)

    my $es = Email::Simple->new( join '', @{ $imap->get($i) } );
    my $text = $es->body;
    print $text;

    my $matches = ($text =~ /<a[^>]*href="([^"]*)"[^>]*>.*<\/a>/);
    print $matches;

$text contains the following text:
    --047d7b47229eb3d9f404e58fd90a
    Content-Type: text/plain; charset=ISO-8859-1

    try1 <http://www.washingtonpost.com/>
    try2 <http://www.thesun.co.uk/sol/homepage/>

    --047d7b47229eb3d9f404e58fd90a
    Content-Type: text/html; charset=ISO-8859-1

    <div dir="ltr"><a href="http://www.washingtonpost.com/">try1</a><br><div><br></div><div><a href="http://www.thesun.co.uk/sol/homepage/">try2</a><br></div></div>

    --047d7b47229eb3d9f404e58fd90a--

The program outputs 1, and only that.
Can anyone help?
Thanks in advance.
Email::Simple is not suitable for MIME messages; use Courriel instead. Regex is not suitable for HTML parsing; use Web::Query instead.
    use Courriel qw();
    use Web::Query qw();

    my $email = Courriel->parse( text => join … );
    my $html  = $email->html_body_part;
    my @url   = Web::Query->new_from_html($html)->find('a[href]')->attr('href');
    __END__
    http://www.washingtonpost.com/
    http://www.thesun.co.uk/sol/homepage/
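As a side note on why the original program printed 1: assigning a match to a scalar puts the match operator in scalar context, where it only reports success or failure. In list context with the /g flag it returns the captured groups instead. The following is a minimal sketch of that difference using core Perl only; the $html string is copied from the HTML part in the question, and the variable names $matched and @urls are just illustrative. It is meant to explain the output, not to replace a real HTML parser, since a regex like this breaks easily on other markup.

```perl
use strict;
use warnings;

# Sample HTML copied from the text/html part shown in the question.
my $html = '<div dir="ltr"><a href="http://www.washingtonpost.com/">try1</a><br>'
         . '<div><br></div><div><a href="http://www.thesun.co.uk/sol/homepage/">try2</a><br></div></div>';

# Scalar context: the match operator only reports success (1) or failure ('').
my $matched = ( $html =~ /<a[^>]*href="([^"]*)"/ );

# List context with /g: returns every captured group, one per match.
my @urls = ( $html =~ /<a[^>]*href="([^"]*)"/g );

print "matched: $matched\n";
print "$_\n" for @urls;
```

Here $matched holds 1, which is what the question's program printed, while @urls holds the two href values.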