unicode - Getting MySQL to properly distinguish Japanese characters in SELECT calls


I'm setting up a database for linguistic analysis, and Japanese kana are giving me a bit of trouble.

Unlike other questions on this so far, I don't know that it's an encoding issue, per se. I've set the collation to utf8_unicode_ci, and on the surface it's saving and recalling things right.

The problem, however, comes up with related kana, such as キ (ki) and ギ (gi). For sorting purposes, Japanese doesn't distinguish between the two unless they are in direct conflict. For example:

  • ぎ (gi) comes before きかい (kikai)
  • きる (kiru) comes before ぎわく (giwaku)
  • き (ki) comes before ぎ (gi)

It's this behavior that I think is at the root of the problem. When loading the data set from an external file, I had a SELECT call verify whether a specific Japanese reading had already been logged. If it was there, the query fetched its id to pair with the headword; otherwise a new entry was added and paired thereafter.

What I noticed after everything was put in is that wherever two such similar readings occurred, the first one encountered was logged and then showed up as a false positive for the other whenever it appeared. For example:

  • キョウ (kyou) appeared first, so characters read ギョウ (gyou) got paired with kyou instead
  • ズ (zu) appeared before ス (su), so likewise more characters got incorrectly matched.

I can go through and manually sort it out if need be, but I'd like to set the database to take a stricter view when differentiating between characters (e.g. if two characters have different UTF-8 code points, treat them as different characters). Is there a way to get this behavior?

You can use the utf8_bin collation, which compares characters by their Unicode code points.
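As a rough sketch of how that might be applied: the table name `readings`, the column `reading`, and its `VARCHAR(64)` type are all hypothetical, since the question does not show the schema. The first statement changes the column's collation permanently; the second leaves the column alone and forces a binary comparison for a single lookup.

```sql
-- Hypothetical schema: a table `readings` with a utf8 text column `reading`.
-- Option 1: make every comparison on the column code-point based.
ALTER TABLE readings
    MODIFY reading VARCHAR(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;

-- Option 2: keep the column's collation and override it for one query.
SELECT id
FROM readings
WHERE reading = _utf8'ギョウ' COLLATE utf8_bin;
```

Note that a binary collation is strict in every respect: it also separates hiragana from katakana (き vs. キ) and upper case from lower case in Latin text, which may or may not be what the lookup needs.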

The utf8_general_ci collation also distinguishes キョウ and ギョウ.
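One way to check how each collation treats a pair of readings on a given server is to compare the literals directly. This is only a sketch: the `_utf8` introducer is used so that the literals are read as utf8 regardless of the connection character set, and the column aliases are made up for readability. Each result column is 1 if the two strings compare equal under that collation and 0 otherwise.

```sql
-- Compare the same two readings under three collations.
SELECT
    _utf8'キョウ' COLLATE utf8_unicode_ci = _utf8'ギョウ' AS unicode_ci_equal,
    _utf8'キョウ' COLLATE utf8_general_ci = _utf8'ギョウ' AS general_ci_equal,
    _utf8'キョウ' COLLATE utf8_bin        = _utf8'ギョウ' AS bin_equal;
```

If the server behaves as described in this thread, only the utf8_unicode_ci comparison should report the two readings as equal.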

