In ColdFusion, URLDecode() decodes a URL-encoded string (URL encoding represents some characters as a percent sign followed by the two-character hexadecimal representation of the character).
For example, %3A is the hex equivalent of ":" (colon). When URLDecode() is applied to different strings, as below:
URLDecode("%3A%") = :% -- Valid
URLDecode("%EE") = � -- this is because EE has no equivalent character.
But when I decode the string "%ara%", which is invalid, the result I get is "%ar%": the second occurrence of the character "a" is missing. Can anyone explain why this is happening?
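For comparison only, here is a sketch in Python 3 (not ColdFusion) using the standard library's urllib.parse.unquote, which decodes valid %XX escapes and leaves a malformed one such as "%ar" as literal text. It does not explain what ColdFusion's URLDecode() does internally; it only shows what a lenient decoder produces for the same three inputs.

```python
from urllib.parse import unquote

# unquote() decodes valid %XX escapes and keeps malformed ones as literal text.
# "%EE" is the lone byte 0xEE, which is not valid UTF-8 on its own, so with the
# default errors="replace" it becomes U+FFFD (the replacement character).
samples = ["%3A%", "%EE", "%ara%"]
decoded = {s: unquote(s) for s in samples}

for source, result in decoded.items():
    print(repr(source), "->", repr(result))
```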
Python: 2.7
I need to convert a string to UTF-7 before going ahead, so I used the code below in the Python 2.7 interpreter:
>>> mbox = u'한국의'
>>> mbox.encode('utf-7').replace(b"+", b"&").replace(b"/", b",")
'&1VytbcdY-'
When I use the same code in my Python script, as shown below, the output for mbox is b'&Ti1W,XaE' instead of b'&Ti1W,XaE-'; that is, the "-" at the end of the string is missing when running as a script instead of in the interpreter.
mbox = "b'" + mbox + "'"
print mbox
mbox = mbox.encode('utf-7').replace(b"+", b"&").replace(b"/", b",")
print mbox
Please suggest.
Quoting from Wikipedia's description of UTF-7:
Other characters must be encoded in UTF-16 (hence U+10000 and higher would be encoded into surrogates), big-endian (hence higher-order bits appear first), and then in modified Base64. The start of these blocks of modified Base64 encoded UTF-16 is indicated by a + sign. The end is indicated by any character not in the modified Base64 set. If the character after the modified Base64 is a - (ASCII hyphen-minus) then it is consumed by the decoder and decoding resumes with the next character. Otherwise decoding resumes with the character after the base64.
Any block of encoded characters must end with a non-Base64 character. If the character right after the block is already outside the Base64 set, it ends the block; otherwise a - is added to the end of the block. Your first example includes a - for this reason: the block is at the very end of the string. Your second example doesn't need one because ' is not part of the Base64 character set.
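A quick way to see the termination rule is Python 3's built-in utf-7 codec (this uses Python 3, where str is Unicode, rather than the question's Python 2.7). The sketch encodes the same Korean text at the end of a string, before a ' and before a letter:

```python
def utf7(s: str) -> bytes:
    """Encode with Python's built-in UTF-7 codec."""
    return s.encode("utf-7")


korean = "\ud55c\uad6d\uc758"  # the text 한국의 from the question

at_end = utf7(korean)                # block ends the string: "-" is appended
before_quote = utf7(korean + "'")    # "'" is not Base64, so it ends the block itself
before_letter = utf7(korean + "a")   # "a" is Base64, so an explicit "-" is needed

for encoded in (at_end, before_quote, before_letter):
    print(encoded)
```

Only the letter a, which belongs to the Base64 set, forces an explicit - in the middle of the string.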
If your intent is to create a Python literal that creates a valid UTF-7 string, just do things in a different order.
mbox = b"b'" + mbox.encode('utf-7').replace(b"+", b"&").replace(b"/", b",") + b"'"
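The same reordering as a Python 3 sketch. The helper name modified_utf7_sketch is hypothetical; it only applies the question's "+" to "&" and "/" to "," substitutions after encoding and is not a complete implementation of any mailbox-name encoding:

```python
def modified_utf7_sketch(s: str) -> bytes:
    """Hypothetical helper: UTF-7 encode, then apply the question's substitutions."""
    return s.encode("utf-7").replace(b"+", b"&").replace(b"/", b",")


mbox = "\ud55c\uad6d\uc758"  # 한국의

# Wrapping first: the closing "'" terminates the Base64 block, so no "-" appears.
naive = modified_utf7_sketch("b'" + mbox + "'")

# Encoding first: the block ends the encoded text, so "-" is appended before wrapping.
literal = b"b'" + modified_utf7_sketch(mbox) + b"'"

print(naive)
print(literal)
```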
I'm trying to parse Cyrillic text from a web page, and I get this error when I try to print soup.text for a string that includes quotation marks around a word, as in «word»:
error: 'charmap' codec can't encode character u'\xab' in position 6: character maps to <undefined>
The original page string (UTF-8), as returned raw by urllib2.urlopen:
bbb = '\xab\x80\xd1\x8c\xc2\xbb'
\xab and \xbb are the quotation marks « and ».
I tried converting it to Unicode by hand (BeautifulSoup does this too):
unicode(bbb, 'utf8', errors='ignore')
But in spite of errors='ignore', the unknown elements still exist in it.
I get:
u'\xab\u0446\u0435\u0437\u0430\u0440\u044c\xbb'
I tried to delete all the unknown elements starting with \x using a regular expression, but it doesn't work:
bbb = re.sub(r'[\x00-\x7f]', r' ', bbb)
But in spite of errors='ignore', the unknown elements still exist
u'\xbb' is not an unknown element; there is no problem there. It represents the character U+00BB Right-Pointing Double Angle Quotation Mark. The Unicode string literals u'\xbb' and u'\u00bb' represent the same string.
\x has a different meaning depending on what kind of string literal it is used in. In a byte string, it introduces a hex-encoded byte from 0x00 to 0xFF. In a Unicode string, it introduces a hex-encoded character from U+0000 to U+00FF. When producing the repr() representation of a string, Python prefers to output characters in the range up to U+00FF using \x escapes rather than the arguably-clearer \u escapes, because they're shorter.
The \u and \x are merely alternative ways to refer to a character in the string literal representation; they are not literally part of the value of the string. There is no actual backslash in the value, so you can't use re to try to remove characters that might appear in the repr() form as backslash escapes.
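A short Python 3 sketch of the point above, using the decoded text from the question (in Python 3, str is Unicode, and ascii() gives roughly what Python 2's repr() showed):

```python
import re

# The decoded text from the question: «цезарь»
text = "\xab\u0446\u0435\u0437\u0430\u0440\u044c\xbb"

same = "\xbb" == "\u00bb"            # two spellings of the same character U+00BB
length = len(text)                   # 8 characters; no backslashes are stored
stripped = re.sub(r"\\x", "", text)  # searches for a literal backslash + "x": finds none

print(same, length, stripped == text)
print(ascii(text))                   # the escaped form is only a representation
print(text.encode("utf-8"))          # the UTF-8 bytes, e.g. « is b'\xc2\xab'
```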
The actual error:
error: 'charmap' codec can't encode character u'\xab' in position 6: character maps to <undefined>
This is just the usual PrintFails problem again. Apparently your console is using an encoding that doesn't include the character U+00AB.
If you are using the Windows Command Prompt, you could try to use win-unicode-console as a workaround for the brokenness of that particular console.
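To check that the problem lies in the console's code page rather than in the string, you can reproduce the error without printing at all. This Python 3 sketch assumes, for illustration only, a console using cp866 (the OEM code page of Russian Windows consoles), which has no «:

```python
text = "\xab\u0446\u0435\u0437\u0430\u0440\u044c\xbb"  # «цезарь»

# Encoding to UTF-8 always works: the string itself is fine.
utf8_bytes = text.encode("utf-8")

# Assumed console code page: cp866 has Cyrillic letters but no «,
# so encoding fails the same way print() fails on such a console.
try:
    text.encode("cp866")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```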
data = "000000000000000117c80378b8da0e33559b5997f2ad55e2f7d18ec1975b9717"
result1 = data.decode('hex')[::-1]
The hex data are decoded to decimal, which is 6,860,217,587,554,922,525,607,992,740,653,361,396,256,930,700,588,249,487,127
Then the decimal number 6,860,217,587,554,922,525,607,992,740,653,361,396,256,930,700,588,249,487,127 is converted to bits, its order is reversed (little-endian), and the result is stored in the result1 variable as a bitarray?
Is this exactly what happens with that code, or did I misunderstand something?
So the result1 variable is a bitarray?
If it's just an integer variable, how can it hold such a long decimal value?
Strings in Python are delimited by double or single quotes, so the variable data contains a string.
You can check the type of a variable directly in python:
data = "000000000000000117c80378b8da0e33559b5997f2ad55e2f7d18ec1975b9717"
type(data)
which outputs
<type 'str'>
meaning that the variable is a string.
When you call the method decode('hex') on a string, you obtain another string:
data.decode('hex')
'\x00\x00\x00\x00\x00\x00\x00\x01\x17\xc8\x03x\xb8\xda\x0e3U\x9bY\x97\xf2\xadU\xe2\xf7\xd1\x8e\xc1\x97[\x97\x17'
Every character in your original string is interpreted as a hexadecimal digit, and every pair of hexadecimal digits (e.g. "17") is converted into a single byte, which Python displays using the escape sequence \x, becoming "\x17". Bytes that correspond to printable ASCII characters are shown as the character itself, so "78" becomes "x".
When you write "\x41", you are telling Python to interpret 41 as the hexadecimal code of a single character, here the ASCII character "A".
An ASCII table lists the hexadecimal, decimal and octal values associated with the ASCII characters.
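For example, in Python 3 the hexadecimal, decimal and octal forms of "A" can be checked directly:

```python
code_point = ord("\x41")  # "\x41" is the same one-character string as "A"
print("\x41" == "A", code_point, hex(code_point), oct(code_point))  # True 65 0x41 0o101
```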
If you try for example
"48454C4C4F".decode('hex')
you obtain the string "HELLO"
Lastly when you use [::-1] on a string you reverse it:
"48454C4C4F".decode('hex')[::-1]
produces the string "OLLEH"
You can find more about escape characters in the Python documentation.
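For readers on Python 3, where str has no decode('hex'), here is a sketch of the equivalent using bytes.fromhex. It also touches the integer question, since Python integers have arbitrary precision:

```python
data = "000000000000000117c80378b8da0e33559b5997f2ad55e2f7d18ec1975b9717"

raw = bytes.fromhex(data)  # Python 3 equivalent of data.decode('hex'): 32 bytes
result1 = raw[::-1]        # reverses the order of the bytes (not of the bits)

# An int can hold the whole value; reading the reversed bytes as little-endian
# gives back the same number as reading the hex string directly.
as_int = int(data, 16)
same_value = int.from_bytes(result1, "little") == as_int

print(type(result1).__name__, len(result1), same_value)
print(bytes.fromhex("48454C4C4F")[::-1])  # the "OLLEH" example from above
```

So result1 holds bytes in reversed order, not a bitarray; no integer is created unless you ask for one.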
I am reading a web service's output from Classic ASP.
The web service output is as follows:
<boolean xmlns="http://somewebsite.com/">true</boolean>
This is the expected output.
I have written the code below to read this output in Classic ASP:
Set obj1 = Server.createobject("MSXML2.ServerXMLHTTP.3.0")
URL1 = "http://webserive.asmx/method?para=2"
obj1.open "GET", URL1, False
obj1.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=utf-8"
obj1.setRequestHeader "SOAPAction", URL1
obj1.send
if obj1.responseText <> "" Then
response.write "ok." & obj1.responseText
end if
But this code prints the following output:
"ok. true"
There is a space in the output which is not expected.
This is the problem.
Please advise.
For all intents and purposes, your ResponseText output is a valid HTML structure as far as a web browser is concerned, and the browser treats it accordingly. When you use Response.Write() to send content to the browser, it is sent "as is", so in this case the <boolean> element is treated as markup and only the contained text true is displayed.
To fix this, you first need to HTML-encode the ResponseText before you send it to the browser, so the browser knows to treat what is sent as plain text. You do this by calling the method Server.HTMLEncode():
Response.Write "ok." & Server.HTMLEncode(obj1.ResponseText)
According to MSDN:
The HTMLEncode method applies HTML encoding to a specified string. This is useful as a quick method of encoding form data and other client request data before using it in your Web application. Encoding data converts potentially unsafe characters to their HTML-encoded equivalent.
If the string to be encoded is not Double-Byte Character Set (DBCS), HTMLEncode converts characters as follows:
The less-than character (<) is converted to &lt;.
The greater-than character (>) is converted to &gt;.
The ampersand character (&) is converted to &amp;.
The double-quote character (") is converted to &quot;.
Any ASCII code character whose code is greater-than or equal to 0x80 is converted to &#<number>, where <number> is the ASCII character value.
If the string to be encoded is DBCS, HTMLEncode converts characters as follows:
All extended characters are converted.
Any ASCII code character whose code is greater-than or equal to 0x80 is converted to &#<number>, where <number> is the ASCII character value.
Half-width Katakana characters in the Japanese code page are not converted.
At the moment, when you send the ResponseText, this happens:
ResponseText
<boolean xmlns="http://somewebsite.com/">true</boolean>
Output at client
ok.true
If you use Server.HTMLEncode(), it will be:
ResponseText (HTML Encoded)
&lt;boolean xmlns=&quot;http://somewebsite.com/&quot;&gt;true&lt;/boolean&gt;
Output at client
ok.<boolean xmlns="http://somewebsite.com/">true</boolean>
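As an analogue in Python 3 (not ASP), the standard library's html.escape performs the same kind of encoding; its exact output can differ from Server.HTMLEncode in details such as single quotes:

```python
import html

response_text = '<boolean xmlns="http://somewebsite.com/">true</boolean>'

# Replace &, <, > and (with quote=True) " and ' by character references,
# so the browser shows the markup as text instead of parsing it as a tag.
encoded = html.escape(response_text, quote=True)
print("ok." + encoded)
```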
I think I may have misunderstood what you require. If you want to parse the XML so that only the content of
<boolean xmlns="http://somewebsite.com/">true</boolean>
is returned to the browser (in this case true), then you need to parse the XML using something like XPath to get the underlying value.
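'After the initial Send()
Dim xml, node
If obj1.Status = 200 Then
Set xml = obj1.ResponseXML
Call xml.SetProperty("SelectionLanguage", "XPath")
'<boolean> is in a default namespace, so bind that namespace to a prefix before querying
Call xml.SetProperty("SelectionNamespaces", "xmlns:ws='http://somewebsite.com/'")
Set node = xml.SelectSingleNode("/ws:boolean")
If Not node Is Nothing Then Call Response.Write(node.Text)
End If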
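For comparison, the same extraction in Python 3 with xml.etree.ElementTree (an analogue, not the MSXML API). It also shows why looking for a child named "boolean" finds nothing here: the document element already is <boolean>, and it lives in the default namespace http://somewebsite.com/:

```python
import xml.etree.ElementTree as ET

response_text = '<boolean xmlns="http://somewebsite.com/">true</boolean>'

root = ET.fromstring(response_text)  # root is the <boolean> element itself
print(root.tag)                      # the namespace URI becomes part of the tag name

child = root.find("boolean")         # None: find() searches children, and there are none
value = root.text                    # the text we actually want
is_true = value.strip().lower() == "true"
print(child, value, is_true)
```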
I need to validate a hexadecimal string value (containing only A-F, a-f or 0-9, in any combination).
I have searched various forums and SO as well, and found some solutions, but none of them is satisfying: in some cases they fail to give correct results.
Below are some samples.
translate(upper(<VALUE-TO-CHECK>), '0123456789ABCDEF', '.') != '..'
The above code gives incorrect results for values such as '1234567890ABCDEF', '000000' or '100000'.
REGEXP_LIKE(LTRIM(RTRIM(<VALUE-TO-CHECK>)), '[a-f|A-F|0-9].*');
The above code gives an incorrect result for the value 'Q1W'.
hex_num := TO_NUMBER(<VALUE-TO-CHECK>, 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
EXCEPTION
WHEN value_error THEN -- When value_error that means not convertible to HEX value
RETURN FALSE;
The above code gives an incorrect result for a 64-byte hexadecimal character value, i.e. 'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC'.
Can anyone please help me validate hexadecimal values?
select
case
when regexp_like(:str, '^[0-9A-Fa-f]+$') then 'Hex'
else 'NotHex'
end typ
from dual
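The anchoring is what matters. A Python 3 sketch (an analogue of the Oracle check, not Oracle SQL) tests the question's sample values against an anchored hex pattern and against the unanchored pattern from the question. Like REGEXP_LIKE, re.search matches anywhere in the string, and a | inside [...] is a literal pipe, not alternation:

```python
import re

# Anchored: the whole value must consist of one or more hex digits.
HEX_RE = re.compile(r"[0-9A-Fa-f]+")

# The question's pattern: matches as soon as one hex digit (or "|") appears anywhere.
UNANCHORED_RE = re.compile(r"[a-f|A-F|0-9].*")


def is_hex(value: str) -> bool:
    return HEX_RE.fullmatch(value) is not None


samples = ["1234567890ABCDEF", "000000", "100000", "Q1W", "C" * 64, "12 34"]
for s in samples:
    print(s, is_hex(s), UNANCHORED_RE.search(s) is not None)
```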