I think I’m very nearly on the verge of beginning to understand UTF-8.
Internal UTF-8 string, encoded
Wrong:
sentinel:~ rmp$ perl -MHTML::Entities -e 'print encode_entities("°")'
&Acirc;&deg;
Right:
sentinel:~ rmp$ perl -Mutf8 -MHTML::Entities -e 'print encode_entities("°")'
&deg;
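As far as I can tell, the difference comes down to how perl reads the literal in the source. A minimal sketch, assuming the script file itself is saved as UTF-8 (the variable names are mine, purely for illustration):

```perl
use strict;
use warnings;

# Without the utf8 pragma the literal is read as raw bytes;
# with it, the same bytes are decoded into a single character.
my $bytes = do { no utf8;  "°" };   # two bytes: 0xC2 0xB0
my $chars = do { use utf8; "°" };   # one character: U+00B0

printf "without utf8: length %d\n", length $bytes;
printf "with utf8:    length %d\n", length $chars;
```

encode_entities works character by character, so in the first case it sees two separate characters (U+00C2 and U+00B0) and produces an entity for each.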
External UTF-8 input, encoded
Wrong:
sentinel:~ rmp$ echo "°" | perl -MHTML::Entities -e 'print encode_entities(<>)'
&Acirc;&deg;
Right:
sentinel:~ rmp$ echo "°" | perl -MHTML::Entities -e 'binmode STDIN, ":utf8"; print encode_entities(<>)'
&deg;
External UTF-8 string, as UTF-8 (unencoded)
Wrong:
sentinel:~ rmp$ echo "°" | perl -e 'binmode STDIN, ":utf8"; print <>'
?
Right:
sentinel:~ rmp$ echo "°" | perl -e 'binmode STDIN, ":utf8";
binmode STDOUT, ":utf8"; print <>'
°
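The same effect can be seen without a pipe, using in-memory filehandles. This is only a sketch: it uses the stricter ":encoding(UTF-8)" layer rather than the ":utf8" layer from the one-liners above, and the variable names are my own.

```perl
use strict;
use warnings;

my $utf8_bytes = "\xC2\xB0\n";   # the bytes that echo "°" writes to the pipe

# Read the same bytes with and without a decoding layer on input.
open my $raw_in, '<', \$utf8_bytes or die $!;
open my $dec_in, '<:encoding(UTF-8)', \$utf8_bytes or die $!;
my $raw_line = <$raw_in>;   # 3 items: 0xC2, 0xB0, newline
my $dec_line = <$dec_in>;   # 2 items: U+00B0, newline

printf "raw read:     length %d\n", length $raw_line;
printf "decoded read: length %d\n", length $dec_line;

# Write the decoded line back out through an encoding layer.
my $out = '';
open my $enc_out, '>:encoding(UTF-8)', \$out or die $!;
print {$enc_out} $dec_line;
close $enc_out or die $!;

print "round trip matches the input bytes\n" if $out eq $utf8_bytes;
```

Decoding on the way in and encoding on the way out is what makes the round trip come back byte for byte; leave off either layer and you get one of the "Wrong" results.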
External Input – Encoding after-the-fact
Wrong:
sentinel:~ rmp$ echo "°" | perl -Mutf8 -e '$in=<>; utf8::upgrade($in);
binmode STDOUT, ":utf8"; print $in'
Â°
Wrong:
sentinel:~ rmp$ echo "°" | perl -Mutf8 -e '$in=<>; utf8::encode($in);
binmode STDOUT, ":utf8"; print $in'
ÃÂ°
Wrong:
sentinel:~ rmp$ echo "°" | perl -Mutf8 -e '$in=<>; utf8::downgrade($in);
binmode STDOUT, ":utf8"; print $in'
Â°
Right:
sentinel:~ rmp$ echo "°" | perl -Mutf8 -e '$in=<>; utf8::decode($in);
binmode STDOUT, ":utf8"; print $in'
°
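To see why only decode does the job, it helps to look at what each of the four functions leaves in the string. A rough sketch, with a hypothetical codepoints helper of my own that lists each character's code point in hex:

```perl
use strict;
use warnings;

# Hypothetical helper: the code points of a string, in hex.
sub codepoints { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $raw = "\xC2\xB0";   # the two bytes of a UTF-8 encoded "°", as read from STDIN

my ($up, $enc, $down, $dec) = ($raw) x 4;
utf8::upgrade($up);      # changes the internal representation only
utf8::encode($enc);      # encodes each byte again: double encoding
utf8::downgrade($down);  # changes the internal representation only
utf8::decode($dec);      # interprets the bytes as UTF-8: one character

printf "%-9s %s\n", 'upgrade',   codepoints($up);    # C2 B0
printf "%-9s %s\n", 'encode',    codepoints($enc);   # C3 82 C2 B0
printf "%-9s %s\n", 'downgrade', codepoints($down);  # C2 B0
printf "%-9s %s\n", 'decode',    codepoints($dec);   # B0
```

Only decode ends up with the single character U+00B0, which the ":utf8" layer on STDOUT then writes out correctly.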