regex - Perl: Replacing links (HTML) that meet certain criteria
On my forum, I want to automatically add rel="nofollow" to links that point to external sites. For instance, someone creates a post with the following text:
link 1: <a href="http://www.external1.com" target="_blank">external link 1</a>
link 2: <a href="http://www.myforum.com">local link 1</a>
link 3: <a href="http://www.external2.com">external link 2</a>
link 4: <a href="http://www.myforum.com/test" alt="local">local link 2</a>
Using Perl, I want this changed to:
link 1: <a href="http://www.external1.com" target="_blank" rel="nofollow">external link 1</a>
link 2: <a href="http://www.myforum.com">local link 1</a>
link 3: <a href="http://www.external2.com" rel="nofollow">external link 2</a>
link 4: <a href="http://www.myforum.com/test" alt="local">local link 2</a>
I can do this using quite a few lines of code, but I was hoping to do it with one or more regexes. I can't figure out how.
I'd use a regex with the global and eval flags and a callback, e.g. like so:
#!/usr/bin/perl
use strict;
use warnings;

# Matches an href that points at the forum's own domain (http or https, optional www.)
my $internal_link = qr{href="https?://(?:www\.)?myforum\.com};

my $html = '
lorem ipsum <a href="http://www.external1.com" target="_blank">external link 1</a>
lorem ipsum <a href="http://www.myforum.com">local link 1</a>
lorem ipsum <a href="http://www.external2.com">external link 2</a>
lorem ipsum <a href="http://www.myforum.com/test" alt="local">local link 2</a>
';

# /g visits every opening <a ...> tag; /e evaluates the replacement as Perl code
$html =~ s/<a ([^>]+)>/"<a " . replace_externals($1) . ">"/eg;

print $html;

# Returns the attribute string unchanged for internal links,
# otherwise appends rel="nofollow"
sub replace_externals {
    my ($inner) = @_;
    return $inner =~ $internal_link ? $inner : "$inner rel=\"nofollow\"";
}
Alternatively, you can surely use negative look-aheads, but they hurt readability.
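For comparison, here is a rough sketch of what the look-ahead variant might look like. The helper name add_nofollow is made up for illustration, and the sketch assumes every tag starts with "<a " and that internal links always use the myforum.com domain from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# add_nofollow is a hypothetical helper name used only for this sketch.
sub add_nofollow {
    my ($html) = @_;
    # (?!...) makes the match fail when the tag's attributes contain an href
    # pointing at myforum.com, so internal links are left untouched.
    $html =~ s{<a (?![^>]*href="https?://(?:www\.)?myforum\.com)([^>]+)>}{<a $1 rel="nofollow">}g;
    return $html;
}

print add_nofollow('<a href="http://www.external2.com">external link 2</a> <a href="http://www.myforum.com/test" alt="local">local link 2</a>'), "\n";
# prints: <a href="http://www.external2.com" rel="nofollow">external link 2</a> <a href="http://www.myforum.com/test" alt="local">local link 2</a>
```

This does the whole job in one substitution without /e, but the condition is buried inside the pattern, which is the readability cost mentioned above.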