or: Why “ä” and “ä” aren’t the same.
Don’t these characters look the same to you?
To me, they do. Well, now they do: during a project I noticed that one of the characters didn’t show up on screen, while it was clearly visible in the code inspection tools of Chrome and Firefox.
What had happened?
A colleague copied and pasted text from a PDF file and used parts of it in a description text.
It seems that some software, instead of using the single precomposed “ä” (U+00E4, UTF-8 bytes C3 A4), writes the equivalent combination of “a” (U+0061) followed by the combining mark “¨” (U+0308, UTF-8 bytes CC 88).
Often the standalone combining “¨” is not contained in publicly available fonts, so it is not rendered. This character is called a trema or diaeresis.
Fortunately the php-intl package already contains a solution for my problem – the Normalizer Class: https://www.php.net/manual/en/normalizer.normalize.php
I attached an example for you:
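<?php
// Precomposed "ä" (U+00E4) vs. "a" plus combining diaeresis (U+0308).
// Written as escapes so that copy and paste cannot silently merge them.
$a = "\u{00E4}";
$b = "a\u{0308}";
// Before normalization the byte sequences differ: %C3%A4 a%CC%88
echo urlencode($a);
echo ' ';
echo urlencode($b) . PHP_EOL . PHP_EOL;
// FORM_C (NFC) composes base character and combining mark into one code point.
$a = Normalizer::normalize($a, Normalizer::FORM_C);
$b = Normalizer::normalize($b, Normalizer::FORM_C);
// After normalization both print %C3%A4
echo urlencode($a);
echo ' ';
echo urlencode($b);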
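
If text from sources like that PDF can reach your application in several places, it may be handy to wrap the call in a small helper. The following is only a sketch: the function name to_nfc is made up for this post, and it assumes the intl extension is loaded and the input is meant to be UTF-8.

```php
<?php
// Hypothetical helper (not part of php-intl): returns the NFC form of a
// UTF-8 string, or the input unchanged if it cannot be normalized.
function to_nfc(string $text): string
{
    // Skip the work when the string is already in composed form.
    if (Normalizer::isNormalized($text, Normalizer::FORM_C)) {
        return $text;
    }
    // normalize() returns false on failure, e.g. for invalid UTF-8.
    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
    return $normalized === false ? $text : $normalized;
}

$composed   = "\u{00E4}";   // precomposed "ä"
$decomposed = "a\u{0308}";  // "a" followed by the combining diaeresis

var_dump($composed === $decomposed);                 // bool(false)
var_dump(to_nfc($composed) === to_nfc($decomposed)); // bool(true)
```

Returning the input unchanged on failure is a design choice for this sketch; depending on your project you might prefer to throw an exception or log the offending string instead.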
Sources:
https://chars.suikawiki.org/string?s=%C3%A4
https://chars.suikawiki.org/string?s=a%CC%88
https://blog.marcoka.de/index.php/posts/mit-umlauten-ins-21jahrhundert
https://www.php.net/manual/en/normalizer.normalize.php