[Tut] Python decode() - Printable Version +- Sick Gaming (https://www.sickgaming.net) +-- Forum: Programming (https://www.sickgaming.net/forum-76.html) +--- Forum: Python (https://www.sickgaming.net/forum-83.html) +--- Thread: [Tut] Python decode() (/thread-100252.html) |
[Tut] Python decode() - xSicKxBot - 11-18-2022 Python decode() <div> <p>This tutorial explains the Python <code>decode()</code> method, its arguments, and some examples. Before we dive into <code>decode()</code> itself, let’s build some background on encoding and decoding so its purpose is clear.</p> <h2>Encoding and Decoding – What Does It Mean?</h2> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="682" src="https://blog.finxter.com/wp-content/uploads/2022/11/image-183-1024x682.png" alt="" class="wp-image-897231" srcset="https://blog.finxter.com/wp-content/uploads/2022/11/image-183-1024x682.png 1024w, https://blog.finxter.com/wp-content/uploads/2022/11/image-183-300x200.png 300w, https://blog.finxter.com/wp-content/uploads/2022/11/image-183-768x512.png 768w, https://blog.finxter.com/wp-content/uploads/2022/11/image-183.png 1373w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <p>Programs often have to handle characters from many languages. Developers frequently internationalize applications so that messages and error output can appear in English, Russian, Japanese, French, Hebrew, or other languages.</p> <p>Python’s <strong>string</strong> type (<code>str</code>) uses the <strong>Unicode Standard</strong> to represent characters, which lets Python programs work with any character that Unicode defines.</p> <p><a rel="noreferrer noopener" href="https://www.unicode.org/" data-type="URL" data-id="https://www.unicode.org/" target="_blank">Unicode</a> aims to list every character used by human languages and gives each character a unique code.
The Unicode Consortium regularly updates the standard to cover new languages and symbols.</p> <div class="wp-block-image"> <figure class="aligncenter size-large"><img decoding="async" loading="lazy" width="1024" height="767" src="https://blog.finxter.com/wp-content/uploads/2022/11/image-185-1024x767.png" alt="" class="wp-image-897244" srcset="https://blog.finxter.com/wp-content/uploads/2022/11/image-185-1024x767.png 1024w, https://blog.finxter.com/wp-content/uploads/2022/11/image-185-300x225.png 300w, https://blog.finxter.com/wp-content/uploads/2022/11/image-185-768x576.png 768w, https://blog.finxter.com/wp-content/uploads/2022/11/image-185.png 1221w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> </div> <p>A <strong>character</strong> is the smallest component of text. For example, ‘a’, ‘B’, ‘c’, ‘È’, and ‘Í’ are all different characters. Which characters are in use depends on the language or context. For example, the character “Roman Numeral One” is ‘Ⅰ’, which is distinct from the uppercase letter ‘I’. Although they look the same, they are two different characters with different meanings.</p> <p>The Unicode standard describes how characters are represented by <strong>code points</strong>. A code point value is an integer from 0 to 0x10FFFF, conventionally written as “U+” followed by hexadecimal digits: ‘Ⅰ’ is U+2160, while ‘I’ is U+0049.
[1]</p> <h2>What Are Encodings?</h2> <div class="wp-block-image"> <figure class="aligncenter size-large"><img decoding="async" loading="lazy" width="1024" height="683" src="https://blog.finxter.com/wp-content/uploads/2022/11/image-184-1024x683.png" alt="" class="wp-image-897242" srcset="https://blog.finxter.com/wp-content/uploads/2022/11/image-184-1024x683.png 1024w, https://blog.finxter.com/wp-content/uploads/2022/11/image-184-300x200.png 300w, https://blog.finxter.com/wp-content/uploads/2022/11/image-184-768x513.png 768w, https://blog.finxter.com/wp-content/uploads/2022/11/image-184.png 1371w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> </div> <p>A Unicode string is a sequence of code points. In memory, that sequence is represented as <strong>code units</strong>, which are then mapped to 8-bit bytes. A <strong>character encoding</strong> is the set of rules for translating a Unicode string into a sequence of bytes.</p> <p>UTF-8 is the most commonly used encoding, and it is the default for Python’s <code>str.encode()</code> and <code>bytes.decode()</code>. UTF stands for “Unicode Transformation Format”, and the ‘8’ means the encoding uses 8-bit code units: each code point becomes one to four bytes.
[2]</p> <h2>Python decode()</h2> <div class="wp-block-image"> <figure class="aligncenter size-large"><img decoding="async" loading="lazy" width="1024" height="682" src="https://blog.finxter.com/wp-content/uploads/2022/11/image-186-1024x682.png" alt="" class="wp-image-897245" srcset="https://blog.finxter.com/wp-content/uploads/2022/11/image-186-1024x682.png 1024w, https://blog.finxter.com/wp-content/uploads/2022/11/image-186-300x200.png 300w, https://blog.finxter.com/wp-content/uploads/2022/11/image-186-768x512.png 768w, https://blog.finxter.com/wp-content/uploads/2022/11/image-186.png 1373w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> </div> <p>Encoders and decoders convert text between different representations. In particular, the Python <code>bytes.decode()</code> method converts a <code>bytes</code> object into a <code>str</code> object.</p> <p class="has-global-color-8-background-color has-background">The <code>decode()</code> method reads a sequence of bytes as data in the given encoding scheme and returns the resulting string. It is the inverse of the Python <code><a href="https://blog.finxter.com/python-string-encode/" data-type="post" data-id="26008" target="_blank" rel="noreferrer noopener">encode()</a></code> method.
</p> <p><code>decode()</code> takes the name of the encoding that produced the bytes, decodes them, and returns the original string.</p> <p>Here’s the syntax of the method:</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">bytes.decode(encoding='utf-8', errors='strict')

# Example:
text = b'Python'.decode(encoding='utf-8', errors='strict')  # 'Python'</pre> <p>The <code>decode()</code> arguments:</p> <figure class="wp-block-table is-style-stripes"> <table> <thead> <tr> <th>Argument</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td><code>encoding</code> (optional)</td> <td>The encoding the bytes were encoded with; defaults to <code>'utf-8'</code>. <a href="http://docs.python.org/library/codecs.html#standard-encodings" target="_blank" rel="noreferrer noopener">Standard Encodings</a> lists the encodings that ship with Python.</td> </tr> <tr> <td><code>errors</code> (optional)</td> <td><p>Decides how to handle decoding errors:</p> <p><code>'strict'</code> <em>[default]</em>: decoding errors raise a <code>UnicodeDecodeError</code> (a subclass of <code>UnicodeError</code>).
</p> <p>Other possible values are:</p> <p><code>'ignore'</code> – Ignore the character and continue with the next</p> <p><code>'replace'</code> – Replace with a suitable replacement character</p> <p><code>'xmlcharrefreplace'</code> – Inserts an XML character reference </p> <p><code>'backslashreplace'</code> – Inserts a backslash escape sequence (<code>\uNNNN</code>) instead of un-encodable Unicode characters<code></p> <p>'namereplace'</code> – Inserts a <code>\N{...}</code> escape sequence and any other name registered via <code>codecs.register_error()</code></td> </tr> </tbody> </table> </figure> <h2>Example 1</h2> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="2" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">text = "Python Decode converts text string from one encoding scheme to the desired one." encoded_text = text.encode('ubtf8', 'strict') print("Encoded String: ", encoded_text) print("Decoded String: ", encoded_text.decode('utf8', 'strict'))</pre> <ul> <li><strong>Encoded String</strong>: <code>b'Python Decode converts text from one encoding scheme to desired encoding scheme.'</code></li> <li><strong>Decoded String</strong>: <code>Python Decode converts text from one encoding scheme to desired encoding scheme.</code></li> </ul> <h2>Example 2</h2> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="1,6,8" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> b'\x81abc'.decode("utf-8", "strict") Traceback (most recent call last): File "<pyshell#55>", line 1, in <module> b'\x81abc'.decode("utf-8", "strict") UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte >>> b'\x80abc'.decode("utf-8", "backslashreplace") '\\x80abc' >>> b'\x80abc'.decode("utf-8", "ignore") 'abc'</pre> 
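<h2>Code Points in Practice</h2> <p>To tie the discussion of characters and code points to real code, here is a minimal sketch built on the standard <code>ord()</code>, <code>chr()</code>, and <code>unicodedata.name()</code> functions. It assumes Python 3 and shows that ‘Ⅰ’ and ‘I’ have different code points, and that Python’s largest code point, <code>sys.maxunicode</code>, is 0x10FFFF. The helper <code>describe()</code> is hypothetical, written only for this illustration.</p>

```python
import sys
import unicodedata


def describe(ch: str) -> str:
    """Hypothetical helper: format a character as U+XXXX plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


roman_one = "\u2160"  # ROMAN NUMERAL ONE
latin_i = "I"         # LATIN CAPITAL LETTER I

print(describe(roman_one))   # U+2160 ROMAN NUMERAL ONE
print(describe(latin_i))     # U+0049 LATIN CAPITAL LETTER I
print(roman_one == latin_i)  # False: same look, different characters

# chr() is the inverse of ord()
assert chr(ord(roman_one)) == roman_one

# The largest code point Python accepts matches the Unicode upper limit
print(hex(sys.maxunicode))   # 0x10ffff
```

<p>Because the two characters compare unequal, code that searches or sorts text sees them as different even though a reader may not.</p>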
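<h2>From Code Points to Bytes</h2> <p>The following sketch makes the encoding rules concrete: it encodes a few characters with UTF-8, prints how many bytes each one needs, and decodes the bytes back with <code>bytes.decode()</code>. The sample characters are arbitrary picks, and the last lines show what can happen when bytes are decoded with a codec other than the one that produced them.</p>

```python
# Arbitrary sample characters: a, È (U+00C8), Ⅰ (U+2160), 😀 (U+1F600)
samples = ["a", "\u00c8", "\u2160", "\U0001F600"]

for ch in samples:
    data = ch.encode("utf-8")    # str -> bytes
    back = data.decode("utf-8")  # bytes -> str
    print(f"U+{ord(ch):04X} -> {data!r} ({len(data)} byte(s))")
    assert back == ch            # decoding with the same codec reverses encoding

# Decoding with the wrong codec may not raise; it can silently produce different text
wrong = "\u00c8".encode("utf-8").decode("latin-1")
print(repr(wrong))  # 'Ã\x88'
```

<p>The four samples take one, two, three, and four bytes respectively, which is why UTF-8 is called a variable-length encoding. The Latin-1 mix-up at the end is a reminder that <code>decode()</code> can only recover the original text when it is given the encoding that was actually used.</p>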
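<h2>Defaults and Encoding Names</h2> <p>The syntax of <code>bytes.decode()</code> gives both arguments defaults. This small sketch, assuming Python 3, checks that calling <code>decode()</code> with no arguments behaves like <code>decode(encoding='utf-8', errors='strict')</code>, and uses <code>codecs.lookup()</code> to show that several spellings of the UTF-8 name resolve to the same codec. The codec name <code>nosuchcodec</code> is deliberately made up to trigger a lookup failure.</p>

```python
import codecs

data = "na\u00efve".encode("utf-8")  # "naïve" as UTF-8 bytes

# With no arguments, decode() uses encoding='utf-8' and errors='strict'
assert data.decode() == data.decode(encoding="utf-8", errors="strict")

# Encoding names are normalized, so these spellings all find the same codec
for name in ["utf-8", "UTF-8", "utf8", "utf_8"]:
    print(f"{name!r:>8} -> {codecs.lookup(name).name}")

# A made-up, unregistered name fails with LookupError before any bytes are decoded
try:
    data.decode("nosuchcodec")
except LookupError as exc:
    print(exc)  # unknown encoding: nosuchcodec
```
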
<h2>References</h2> <ul> <li>[1] <a href="https://home.unicode.org/" target="_blank" rel="noreferrer noopener">Unicode</a></li> <li>[2] <a href="https://docs.python.org/3/howto/unicode.html" target="_blank" rel="noreferrer noopener">Unicode HOWTO — Python 3.11.0 documentation</a></li> <li>[3] <a href="https://docs.python.org/3/library/codecs.html#standard-encodings" target="_blank" rel="noreferrer noopener">Codec registry and base classes — Python 3.11.0 documentation</a></li> <li>[4] <a href="https://www.askpython.com/python/string/python-encode-and-decode-functions" target="_blank" rel="noreferrer noopener">Python encode() and decode() Functions – AskPython</a></li> <li>[5] <a href="https://www.geeksforgeeks.org/python-strings-decode-method/" target="_blank" rel="noreferrer noopener">Python Strings decode() method – GeeksforGeeks</a></li> <li>[6] <a href="https://www.tutorialspoint.com/python/string_decode.htm" target="_blank" rel="noreferrer noopener">Python String decode() Method</a></li> <li>[7] <a href="https://www.digitalocean.com/community/tutorials/python-string-encode-decode" target="_blank" rel="noreferrer noopener">Python String encode() decode() | DigitalOcean</a></li> </ul> </div> https://www.sickgaming.net/blog/2022/11/16/python-decode/