
Converting Unicode Sequences To A String In Python 3

While parsing an HTML response to extract data with Python 3.4 on Kubuntu 15.10 in the Bash CLI, I use print() and get output that looks like this: \u05ea\u05d4 \u05e0\u05e9\u05d
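A quick way to tell whether a string holds literal backslash-u escape text or the decoded characters is to compare lengths. The following is a minimal sketch that assumes the text contains literal \uXXXX escapes; the sample string is illustrative and is not the asker's data. Wrapping the text in double quotes and passing it to json.loads() decodes the escapes, but only if the text contains no unescaped double quotes, lone backslashes, or control characters.

```python
import json

# Illustrative sample (an assumption): literal escape text, 6 characters per escape.
escaped = '\\u05ea\\u05d4'
print(escaped)        # shows the escape text itself: \u05ea\u05d4
print(len(escaped))   # 12

# Turn the text into a JSON string literal so json.loads() decodes each \uXXXX.
# Assumes the text has no unescaped '"' or control characters.
decoded = json.loads('"' + escaped + '"')
print(len(decoded))                          # 2
print([hex(ord(c)) for c in decoded])        # ['0x5ea', '0x5d4']
print(decoded == '\u05ea\u05d4')             # True
```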

Solution 1:

It appears your input uses the backslash as an escape character; you should unescape the text before passing it to json:

>>> foobar = '{\\"body\\": \\"\\\\u05e9\\"}'
>>> import re
>>> json_text = re.sub(r'\\(.)', r'\1', foobar)  # unescape
>>> import json
>>> print(json.loads(json_text)['body'])
ש
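The same two steps from the session can be wrapped in a small function. This is a sketch only: loads_backslash_escaped is a hypothetical name, and it assumes the JSON was escaped exactly one extra time (every backslash-plus-character pair stands for that character).

```python
import json
import re


def loads_backslash_escaped(text):
    """Hypothetical helper: remove one level of backslash escaping, then parse JSON.

    Assumes the input is JSON that was escaped exactly once more, so each
    backslash followed by a character stands for that character alone.
    """
    json_text = re.sub(r'\\(.)', r'\1', text)  # unescape: \" -> ", \\ -> \
    return json.loads(json_text)


# Same input as the interactive session above.
foobar = '{\\"body\\": \\"\\\\u05e9\\"}'
data = loads_backslash_escaped(foobar)
print(data['body'] == '\u05e9')  # True (U+05E9 is the Hebrew letter shin)
```

If the text might also be escaped zero times or twice, a single re.sub() pass is not enough, so the caller has to know how the data was produced.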

Don't use the 'unicode-escape' encoding on JSON text; it may produce different results, because JSON writes characters outside the Basic Multilingual Plane as surrogate pairs:

>>> import json
>>> json_text = '["\\ud83d\\ude02"]'
>>> json.loads(json_text)
['😂']
>>> json_text.encode('ascii', 'strict').decode('unicode-escape')  # XXX don't do it
'["\ud83d\ude02"]'

'😂', that is '\U0001F602', is U+1F602 (FACE WITH TEARS OF JOY); the 'unicode-escape' result above holds two lone surrogates instead of this one character.
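The difference comes down to UTF-16 surrogate decoding. The sketch below shows that json.loads() yields one code point, checks the same result with the standard surrogate-pair arithmetic, and then shows one possible workaround (not part of the answer above) for text already damaged by 'unicode-escape': round-tripping the lone surrogates through UTF-16 with the 'surrogatepass' error handler. Parsing with json.loads() in the first place remains the simpler fix.

```python
import json

json_text = '["\\ud83d\\ude02"]'

# json.loads() combines the surrogate pair into a single code point.
combined = json.loads(json_text)[0]
print(len(combined), hex(ord(combined)))   # 1 0x1f602

# The same result computed by hand from the high and low surrogates.
high, low = 0xD83D, 0xDE02
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(hex(code_point))                     # 0x1f602

# 'unicode-escape' decodes each \uXXXX separately: '[', '"', 2 surrogates, '"', ']'.
broken = json_text.encode('ascii', 'strict').decode('unicode-escape')
print(len(broken))                         # 6

# Workaround: encode the lone surrogates as raw UTF-16 code units, then decode,
# which lets the UTF-16 decoder pair them up again.
inner = broken[2:4]
repaired = inner.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
print(repaired == combined)                # True
```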
