Java - Unicode regex pattern not working
I am trying to match a sequence of Unicode characters:
Pattern pattern = Pattern.compile("\\u05[dDeE][0-9a-fA-F]{2,}");
String text = "\\n \\u05db\\u05d3\\u05d5\\u05e8\\u05d2\\u05dc\\n <\\/span>\\n<br style=\\";
Matcher match = pattern.matcher(text);
But doing this gives an exception:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 4
\u05[dDeE][0-9a-fA-F]+
    ^
How can I still use regex characters (like "[") when matching Unicode?
Edit: I'm trying to parse some text. Somewhere in the text there is a sequence of Unicode characters, and I know their code range.
Edit 2: I'm now using a range instead: [\\u05d0-\\u05ea]{2,} but it still can't match the text above.
Edit 3: OK, it's working now. The problem was that I used two backslashes instead of one, both in the regex and in the text. The solution, assuming I know there are 2 characters or more, is:
[\u05d0-\u05ea]{2,}
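For illustration, here is a minimal sketch of that working pattern inside a complete program. The class name HebrewRunFinder and the method findRuns are made up for this example, and it assumes the input already contains real Hebrew characters rather than backslash-u escape text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class HebrewRunFinder {
    // With a single backslash, the Java compiler replaces each Unicode escape
    // with the real character, so the regex engine sees a plain character range.
    private static final Pattern HEBREW_RUN = Pattern.compile("[\u05d0-\u05ea]{2,}");

    // Collects every run of two or more characters in the range.
    static List<String> findRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = HEBREW_RUN.matcher(text);
        while (matcher.find()) {
            runs.add(matcher.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Sample text with real Hebrew letters (single backslashes in the literal).
        String text = "\n \u05db\u05d3\u05d5\u05e8\u05d2\u05dc\n </span>";
        System.out.println(findRuns(text));
    }
}
```

A single character from the range on its own is skipped, because the quantifier asks for at least two.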
Here is what is causing the exception:
\\u05[dDeE][0-9a-fA-F]{2,}
^^^^
The Java regular expression parser thinks you are trying to match a Unicode code point using the escape sequence \uNNNN. It throws the exception because \u requires 4 hexadecimal digits after it and here there are only 2, namely 05. You would need to change it to \\u0005 if that is what you want.
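A small sketch that reproduces the failure may make this concrete. The class IncompleteEscapeDemo and the helper compileError are hypothetical names; the only library calls used are Pattern.compile and PatternSyntaxException.getDescription.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original post.
class IncompleteEscapeDemo {
    // Returns the regex engine's error description, or null if the regex compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Only two hex digits follow the escape, so the regex engine rejects it.
        System.out.println(compileError("\\u05[dDeE][0-9a-fA-F]{2,}"));
        // Four hex digits per escape: the regex engine accepts the range.
        System.out.println(compileError("[\\u05d0-\\u05ea]{2,}"));
    }
}
```

The first call should report the same "Illegal Unicode escape sequence" problem quoted in the question, while the second returns null because both escapes are complete.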
On the other hand, if you want to match \\u in the target string, you need to quadruple-escape each backslash: \ becomes \\\\, so to match \\u you need \\\\\\\\u.
\\\\\\\\u05[dDeE][0-9a-fA-F]{2,}
Finally, if you want to match Unicode code points written literally in the target string, you need to modify the last expression a bit, like this:
(?:\\\\\\\\u05[dDeE][0-9a-fA-F]){2,}
Edit: since there is only 1 backslash in the target string, the regular expression should be:
(?:\\\\u05[dDeE][0-9a-fA-F]){2,}
This will match \u05db\u05d3\u05d5\u05e8\u05d2\u05dc in the string
<\/span><\/span><span dir=\"rtl\">\n \u05db\u05d3\u05d5\u05e8\u05d2\u05dc\n <\/span>\n<br style=\"clear : both; font-size : 1px;\">\n<\/div>"}, 200, null, null);
Edit 2: if you want to match the literal text \u05db\u05d3\u05d5\u05e8\u05d2\u05dc, you can't use a range.
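As a rough sketch of the single-backslash case, assume the input really contains escape text (a backslash, the letter u, and four hex digits), as a raw JSON response would. LiteralEscapeFinder and firstEscapeRun are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class LiteralEscapeFinder {
    // Four backslashes in the Java literal become two in the regex,
    // and those two match one literal backslash in the input.
    private static final Pattern ESCAPE_RUN =
            Pattern.compile("(?:\\\\u05[dDeE][0-9a-fA-F]){2,}");

    // Returns the first run of two or more escape sequences, or null if none is found.
    static String firstEscapeRun(String text) {
        Matcher matcher = ESCAPE_RUN.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Doubled backslashes here, so the string holds one backslash per escape.
        String text = "\\n \\u05db\\u05d3\\u05d5\\u05e8\\u05d2\\u05dc\\n <\\/span>";
        System.out.println(firstEscapeRun(text));
    }
}
```

If the input holds real Hebrew characters instead of escape text, this pattern finds nothing, which is why the two cases need different expressions.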
On the other hand, if you want to match Unicode code points between 05d0 and 05df, you can use a range (note the hyphen between the two endpoints):
(?:[\\u05d0-\\u05df]){2,}
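A sketch of the range form, written with a hyphen between the two endpoints, is shown here in both spellings: with doubled backslashes the regex engine resolves the escapes, and with single backslashes the Java compiler does. RangeEscapeDemo and bothFind are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class RangeEscapeDemo {
    // Doubled backslashes: the string holds escape text and the regex engine resolves it.
    static final Pattern ENGINE_LEVEL = Pattern.compile("(?:[\\u05d0-\\u05df]){2,}");
    // Single backslashes: the compiler puts the real characters into the string.
    static final Pattern COMPILER_LEVEL = Pattern.compile("(?:[\u05d0-\u05df]){2,}");

    // True only if both spellings of the range find a match in the text.
    static boolean bothFind(String text) {
        return ENGINE_LEVEL.matcher(text).find() && COMPILER_LEVEL.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(bothFind("\u05db\u05d3\u05d5")); // letters inside the range
        System.out.println(bothFind("\u05e8\u05e2"));       // letters above 05df
    }
}
```

Both patterns should behave the same on real characters; the choice mostly affects how readable the source is.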