Shifting Unicode Codepoints
While working on a word search generator that translated letter symbols (sort of like cryptograms, except this one’s impossible to solve without the key), php ran out of memory on a rather ordinary function.
function uniord($c) {
$h = ord($c{0});
if ($h <= 0x7F) {
return $h;
} else if ($h < 0xC2) {
return false;
} else if ($h <= 0xDF) {
return ($h & 0x1F) << 6 | (ord($c{1}) & 0x3F);
} else if ($h <= 0xEF) {
return ($h & 0x0F) << 12 | (ord($c{1}) & 0x3F) << 6
| (ord($c{2}) & 0x3F);
} else if ($h <= 0xF4) {
return ($h & 0x0F) << 18 | (ord($c{1}) & 0x3F) << 12
| (ord($c{2}) & 0x3F) << 6
| (ord($c{3}) & 0x3F);
} else {
return false;
}
}
function unichr($c) {
if ($c <= 0x7F) {
return chr($c);
} else if ($c <= 0x7FF) {
return chr(0xC0 | $c >> 6) . chr(0x80 | $c & 0x3F);
} else if ($c <= 0xFFFF) {
return chr(0xE0 | $c >> 12) . chr(0x80 | $c >> 6 & 0x3F)
. chr(0x80 | $c & 0x3F);
} else if ($c <= 0x10FFFF) {
return chr(0xF0 | $c >> 18) . chr(0x80 | $c >> 12 & 0x3F)
. chr(0x80 | $c >> 6 & 0x3F)
. chr(0x80 | $c & 0x3F);
} else {
return false;
}
}
function asc_shift($string, $amount) {
$key = substr($string, 0, 1);
if(strlen($string)==1) {
return unichr(uniord($key) + $amount);
} else {
return unichr(uniord($key) + $amount) . asc_shift(substr($string, 1, strlen($string)-1), $amount);
}
}
$h = ord($c{0});
if ($h <= 0x7F) {
return $h;
} else if ($h < 0xC2) {
return false;
} else if ($h <= 0xDF) {
return ($h & 0x1F) << 6 | (ord($c{1}) & 0x3F);
} else if ($h <= 0xEF) {
return ($h & 0x0F) << 12 | (ord($c{1}) & 0x3F) << 6
| (ord($c{2}) & 0x3F);
} else if ($h <= 0xF4) {
return ($h & 0x0F) << 18 | (ord($c{1}) & 0x3F) << 12
| (ord($c{2}) & 0x3F) << 6
| (ord($c{3}) & 0x3F);
} else {
return false;
}
}
function unichr($c) {
if ($c <= 0x7F) {
return chr($c);
} else if ($c <= 0x7FF) {
return chr(0xC0 | $c >> 6) . chr(0x80 | $c & 0x3F);
} else if ($c <= 0xFFFF) {
return chr(0xE0 | $c >> 12) . chr(0x80 | $c >> 6 & 0x3F)
. chr(0x80 | $c & 0x3F);
} else if ($c <= 0x10FFFF) {
return chr(0xF0 | $c >> 18) . chr(0x80 | $c >> 12 & 0x3F)
. chr(0x80 | $c >> 6 & 0x3F)
. chr(0x80 | $c & 0x3F);
} else {
return false;
}
}
function asc_shift($string, $amount) {
$key = substr($string, 0, 1);
if(strlen($string)==1) {
return unichr(uniord($key) + $amount);
} else {
return unichr(uniord($key) + $amount) . asc_shift(substr($string, 1, strlen($string)-1), $amount);
}
}
Now I understand why there is a need for a lower level implementation of UTF-8 strings by default in php 6. Python support of UTF-8 support has already been included in Python 3.0 released December 3rd, 2008.