{"id":242,"date":"2010-06-14T09:19:41","date_gmt":"2010-06-14T09:19:41","guid":{"rendered":"http:\/\/psyphi.net\/blog\/?p=242"},"modified":"2010-06-14T09:19:41","modified_gmt":"2010-06-14T09:19:41","slug":"adventures-in-utf-8","status":"publish","type":"post","link":"https:\/\/psyphi.net\/blog\/2010\/06\/adventures-in-utf-8\/","title":{"rendered":"Adventures in UTF-8"},"content":{"rendered":"<p>I think I&#8217;m very nearly at the verge of beginning to understand UTF-8.<\/p>\n<p><strong>Internal UTF-8 string, encoded<\/strong><br \/>\nWrong:<\/p>\n<pre><code>sentinel:~ rmp$ perl -MHTML::Entities -e 'print encode_entities(\"\u00b0\")'\r\n&amp;Acirc;&amp;deg;<\/code><\/pre>\n<p>Right:<\/p>\n<pre><code>sentinel:~ rmp$ perl -Mutf8 -MHTML::Entities -e 'print encode_entities(\"\u00b0\")'\r\n&amp;deg;<\/code><\/pre>\n<p><strong>External UTF-8 input, encoded<\/strong><br \/>\nWrong:<\/p>\n<pre><code>sentinel:~ rmp$ echo \"\u00b0\" | perl -MHTML::Entities -e 'print encode_entities(&lt;&gt;)'\r\n&amp;Acirc;&amp;deg;<\/code><\/pre>\n<p>Right:<\/p>\n<pre><code>sentinel:~ rmp$ echo \"\u00b0\" | perl -MHTML::Entities -e 'binmode STDIN, \":utf8\"; print encode_entities(&lt;&gt;)'\r\n&amp;deg;<\/code><\/pre>\n<p><strong>External UTF-8 string, as UTF-8 (unencoded)<\/strong><br \/>\nWrong:<\/p>\n<pre><code>sentinel:~ rmp$ echo \"\u00b0\" | perl -e 'binmode STDIN, \":utf8\"; print &lt;&gt;'\r\n?<\/code><\/pre>\n<p>Right:<\/p>\n<pre><code>sentinel:~ rmp$ echo \"\u00b0\" | perl -e 'binmode STDIN, \":utf8\"; \r\nbinmode STDOUT, \":utf8\"; print &lt;&gt;'\r\n\u00b0<\/code><\/pre>\n<p><strong>External Input &#8211; Encoding after-the-fact<\/strong><br \/>\nWrong:<\/p>\n<pre><code>sentinel:~ rmp$ echo \"\u00b0\" | perl -Mutf8 -e '$in=&lt;&gt;; utf8::upgrade($in);\r\nbinmode STDOUT, \":utf8\"; print $in'\r\n\u00c2\u00b0<\/code><\/pre>\n<p>Wrong:<\/p>\n<pre><code>sentinel:~ rmp$ echo \"\u00b0\" | perl -Mutf8 -e '$in=&lt;&gt;; 
utf8::encode($in);\r\nbinmode STDOUT, \":utf8\"; print $in'\r\n\u00c3\u0082\u00c2\u00b0<\/code><\/pre>\n<p>Wrong:<\/p>\n<pre><code>sentinel:~ rmp$ echo \"\u00b0\" | perl -Mutf8 -e '$in=&lt;&gt;; utf8::downgrade($in); \r\nbinmode STDOUT, \":utf8\"; print $in'\r\n\u00c2\u00b0<\/code><\/pre>\n<p>Right:<\/p>\n<pre><code>sentinel:~ rmp$ echo \"\u00b0\" | perl -Mutf8 -e '$in=&lt;&gt;; utf8::decode($in);\r\nbinmode STDOUT, \":utf8\"; print $in'\r\n\u00b0<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I think I&#8217;m very nearly at the verge of beginning to understand UTF-8. Internal UTF-8 string, encoded Wrong: sentinel:~ rmp$ perl -MHTML::Entities -e &#8216;print encode_entities(&#8220;\u00b0&#8221;)&#8217; &amp;Acirc;&amp;deg; Right: sentinel:~ rmp$ perl -Mutf8 -MHTML::Entities -e &#8216;print encode_entities(&#8220;\u00b0&#8221;)&#8217; &amp;deg; External UTF-8 input, encoded Wrong: sentinel:~ rmp$ echo &#8220;\u00b0&#8221; | perl -MHTML::Entities -e &#8216;print encode_entities(&lt;&gt;)&#8217; &amp;Acirc;&amp;deg; Right: sentinel:~ rmp$ &hellip; <a href=\"https:\/\/psyphi.net\/blog\/2010\/06\/adventures-in-utf-8\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Adventures in 
UTF-8&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[11],"tags":[406,407,21,405,404],"class_list":["post-242","post","type-post","status-publish","format-standard","hentry","category-programming","tag-encoding","tag-entities","tag-perl","tag-utf-8","tag-utf8"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/242","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/comments?post=242"}],"version-history":[{"count":10,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/242\/revisions"}],"predecessor-version":[{"id":252,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/242\/revisions\/252"}],"wp:attachment":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/media?parent=242"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/categories?post=242"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/tags?post=242"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}