Simple PHP cyr2lat command-line transliteration filter from Bulgarian to Latin
Sometimes you need to convert Bulgarian Cyrillic text to its Latin equivalent (a process known as "romanization"; see Romanization of Bulgarian).
One possible use case is generating URL slugs from Bulgarian text.
Since this is a common task, it is very useful, in the best Unix tradition, to have a simple command-line filter: you pipe Cyrillic text into it, and it writes the romanized version to its output.
Here is a simple version of such a filter, cyr2lat, written in PHP:
#!/usr/bin/env php
<?php
// Bulgarian Cyrillic letters: lowercase first, then uppercase.
$cyr = array(
    'а','б','в','г','д','е','ж','з','и','й','к','л','м','н','о','п','р',
    'с','т','у','ф','х','ц','ч','ш','щ','ъ','ь','ю','я',
    'А','Б','В','Г','Д','Е','Ж','З','И','Й','К','Л','М','Н','О','П','Р',
    'С','Т','У','Ф','Х','Ц','Ч','Ш','Щ','Ъ','Ь','Ю','Я'
);
// Latin equivalents, in exactly the same order as $cyr.
$lat = array(
    'a','b','v','g','d','e','zh','z','i','y','k','l','m','n','o','p','r',
    's','t','u','f','h','ts','ch','sh','sht','a','y','yu','ya',
    'A','B','V','G','D','E','Zh','Z','I','Y','K','L','M','N','O','P','R',
    'S','T','U','F','H','Ts','Ch','Sh','Sht','A','Y','Yu','Ya'
);
// Read standard input line by line and echo the romanized text.
$in = fopen("php://stdin", "r");
// Compare against false explicitly so that a final line such as "0" is not dropped.
while (($line = fgets($in)) !== false) {
    echo str_replace($cyr, $lat, $line);
}
To use it, save the script to a file named cyr2lat.php, then make it executable:
chmod 755 cyr2lat.php
... and optionally move it to a directory in your PATH:
mv cyr2lat.php /usr/local/bin
or
mv cyr2lat.php ~/bin
After this, you can run, for example:
echo "Това е текст на кирилица" | cyr2lat.php
and you will get:
Tova e tekst na kirilitsa
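For the URL-slug use case mentioned at the start, the same table can be reused inside a PHP function. The following is only a minimal sketch: the function name cyr2lat_slug and the slug rules (lowercase, hyphens between words) are assumptions made for illustration, not part of cyr2lat.php. It uses strtr() with a single associative array in place of the two parallel arrays.

```php
<?php
// Hypothetical helper (not part of cyr2lat.php): Bulgarian text -> URL slug.
function cyr2lat_slug(string $text): string
{
    // Same transliteration table as the filter, as one associative array.
    static $map = [
        'а'=>'a','б'=>'b','в'=>'v','г'=>'g','д'=>'d','е'=>'e','ж'=>'zh','з'=>'z',
        'и'=>'i','й'=>'y','к'=>'k','л'=>'l','м'=>'m','н'=>'n','о'=>'o','п'=>'p',
        'р'=>'r','с'=>'s','т'=>'t','у'=>'u','ф'=>'f','х'=>'h','ц'=>'ts','ч'=>'ch',
        'ш'=>'sh','щ'=>'sht','ъ'=>'a','ь'=>'y','ю'=>'yu','я'=>'ya',
        'А'=>'A','Б'=>'B','В'=>'V','Г'=>'G','Д'=>'D','Е'=>'E','Ж'=>'Zh','З'=>'Z',
        'И'=>'I','Й'=>'Y','К'=>'K','Л'=>'L','М'=>'M','Н'=>'N','О'=>'O','П'=>'P',
        'Р'=>'R','С'=>'S','Т'=>'T','У'=>'U','Ф'=>'F','Х'=>'H','Ц'=>'Ts','Ч'=>'Ch',
        'Ш'=>'Sh','Щ'=>'Sht','Ъ'=>'A','Ь'=>'Y','Ю'=>'Yu','Я'=>'Ya',
    ];

    // Romanize in a single pass, then lowercase the (now Latin) letters.
    $latin = strtolower(strtr($text, $map));

    // Assumed slug rule: every run of characters other than a-z and 0-9
    // becomes one hyphen; leading and trailing hyphens are removed.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $latin);

    return trim($slug, '-');
}

echo cyr2lat_slug("Това е текст на кирилица"), "\n"; // prints tova-e-tekst-na-kirilitsa
```

Any character that is neither in the table nor an ASCII letter or digit ends up as a hyphen, so punctuation and untranslated symbols never reach the URL.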
NB: This filter assumes that the input text is encoded in UTF-8. If your input is in the cp1251 encoding, pipe it through iconv first, like this:
echo "Това е пак текст на кирилица, но този път с кодировка cp1251" | iconv -f cp1251 -t utf-8 | cyr2lat.php
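If you would rather not depend on the external iconv command, the same conversion could in principle be done inside PHP before the replacement step. The sketch below assumes that PHP's iconv extension is loaded; the helper name cp1251_to_utf8 is hypothetical and is not part of the script above.

```php
<?php
// Hypothetical helper: convert one cp1251 (Windows-1251) string to UTF-8.
// Assumes the iconv extension is available in this PHP build.
function cp1251_to_utf8(string $line): string
{
    // @ suppresses the notice iconv() raises on an undefined byte.
    $utf8 = @iconv('CP1251', 'UTF-8', $line);

    // iconv() returns false on failure; fall back to the unchanged input.
    return $utf8 === false ? $line : $utf8;
}

// The bytes D2 EE E2 E0 are the word "Това" in cp1251.
echo cp1251_to_utf8("\xD2\xEE\xE2\xE0"), "\n"; // prints Това
```

In the filter, such a helper would be applied to each line read from standard input before it is passed to str_replace().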