regex - What is the syntax in an Oracle regexp_replace statement to ignore a tag and everything inside it? -


i need regular expression replace in oracle. want replace occurrences of word or phrase exists in document, long word or phrase not exist within set of tags. tags defined me (hot html or xml) , current concept is;

<term type=pos id=123>some phrase</term> 

i have created function wrapper (non-functional) regexp_replace looks this;

  function annotate_one_term(in_text in varchar2, search_term in varchar2, term_type in varchar2, record_id in number) return clob       regex_search varchar2(512);     regex_replace varchar2(512);   begin     regex_search := '((<term.*?</term>|[^<])*?)(^|\w)('|| search_term ||')($|\w)';     regex_replace := '\1 <term id='|| to_char(record_id)||' type=' || term_type ||'>'|| search_term ||'</term> ';     dbms_output.put_line('regex_search = ' || regex_search);     dbms_output.put_line('regex_replace = ' || regex_replace);     return trim(regexp_replace(in_text, regex_search, regex_replace,1,0,'in'));   end annotate_one_term; 

when called this;

select annotate_one_term(                          annotate_one_term('dog elephant dog cat cat dog dogfish fishdog mouse dog', 'dog cat', 'pos', 123),                          'dog', 'pos',456)  dual; 

it returns;

<term id=456 type=pos>dog</term> elephant <term id=123 type=pos>dog cat</term> cat <term id=456 type=pos>dog</term> dogfish fishdog mouse <term id=456 type=pos>dog</term> 

which correct. if called this;

select annotate_one_term(                          annotate_one_term('elephant dog cat cat dogfish fishdog mouse', 'dog cat', 'pos', 123),                          'dog', 'pos',456)  dual; 

it returns;

elephant <term id=123 type=pos <term id=456 type=pos>dog</term> cat</term> cat dogfish fishdog mouse 

which wrong. appears eating ">" , finding word/phrase within tags.

i actively trying increase knowledge regular expressions, 1 has eluded me far.

i understand tried match "negatively", tried direct match using closing markup <\term> , seems work:

create or replace function annotate_one_term(in_text      in varchar2,                                              search_term  in varchar2,                                              term_type    in varchar2,                                              record_id    in number) return clob     regex_search  varchar2(512);     regex_replace varchar2(512); begin     regex_search := '(?</term>| |^)' || search_term || '( |$)';     regex_replace := '\1<term id=' || to_char(record_id) || ' type='                       || term_type || '>' || search_term || '</term>\2';      return trim(regexp_replace(in_text, regex_search, regex_replace,1,0,'in')); end annotate_one_term; 

we obtain:

select annotate_one_term(            annotate_one_term('dog elephant dog cat cat dog dogfish fishdog mouse dog',                              'dog cat', 'pos', 123),            'dog', 'pos',456)  dual; 

gives :

<term id=456 type=pos>dog</term> elephant <term id=123 type=pos>dog cat</term> cat <term id=456 type=pos>dog</term> dogfish fishdog mouse <term id=456 type=pos>dog</term> 

and

select annotate_one_term(           annotate_one_term('elephant dog cat cat dogfish fishdog mouse',                             'dog cat', 'pos', 123),           'dog', 'pos',456)  dual; 

gives:

elephant <term id=123 type=pos>dog cat</term> cat dogfish fishdog mouse 

as expected, no cross terms. have use tricks because oracle doesn't support lookahead/lookbehind assertions (at least in version, 11g).


Comments

Popular posts from this blog

How to remove text and logo OR add Overflow on Android ActionBar using AppCompat on API 8? -

html - How to style widget with post count different than without post count -

url rewriting - How to redirect a http POST with urlrewritefilter -