base64 encode null terminator - c++

Hi, I am currently trying to encode a string using base64 in C++.
The string itself encodes fine, but I would like to have an extra null character at the end of the decoded string (so that the null character also shows up in the text file I save the decoded string into).
I am using this base64 code: http://www.adp-gmbh.ch/cpp/common/base64.html
I hope you can give me some advice on how to make this possible. I already tried writing two null characters at the end of the string I am encoding, but it seems the encoding method only reads up to the first occurrence of a null character.

A cursory look at the encoding function does not show any special handling of NUL, and neither does the decoding function. Are you sure the issue is not in the way you test for NUL in the decoded string?
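To illustrate where a NUL usually gets lost: a std::string can hold NUL bytes, but anything that goes through a C string (a string literal, c_str(), strlen) stops at the first one. A minimal sketch, assuming the input is held in a std::string; with_trailing_nul is a made-up helper name, not part of the linked base64 code:

```cpp
#include <string>

// Hypothetical helper: returns `text` with one explicit NUL byte appended.
// The std::string(const char*) constructor stops at the first NUL, so
// writing "abc\0\0" as a literal does not help; the byte has to be
// appended after the string object exists.
std::string with_trailing_nul(const std::string& text) {
    std::string out = text;
    out.push_back('\0');  // size() grows by one; the NUL is now real data
    return out;
}
```

When encoding, take the length from s.size() rather than from strlen(s.c_str()), and write the decoded result with something length-based such as out.write(decoded.data(), decoded.size()).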

Related

How to convert UTF-16 to UTF-8 using C++?

I already know about 'codecvt', 'WideCharToMultiByte', and so on.
I use Korean, for example '안녕하세요'.
That text can be stored in a normal string class, right?
But in my case I have a file 'test.txt' that contains '안녕하세요'.
I read 'test.txt' with getline():
ifstream file("test.txt");
string temp;
getline(file, temp);
cout << temp;
When I print it with cout, the text is garbled.
I know this is a wide-character problem, so I tried MultiByteToWideChar.
That works.
But it is not what I want.
In the end I want to read wide-character files and store the result in a 'string' variable.
So my question is:
How do I convert UTF-16 (wide character / wstring) to UTF-8 (multibyte / string) without changing the text?
This is what I want:
wstring temp = L"안녕하세요";
string temp2 = convert_to_string(temp);
->
temp2 holds "안녕하세요" as UTF-8
As mentioned in the comments, see "Convert C++ std::string to UTF-16-LE encoded string" for code that does the conversion.
But since you already hold the Korean text in a wstring, you avoid the trouble of distinguishing UTF-16-LE from UTF-16-BE, and you can readily find the Unicode code point of each Korean character in the string. So your problem boils down to finding the UTF-8 representation of a code point. That is not hard; see page 3 of https://www.rfc-editor.org/rfc/rfc3629 (also Wikipedia: https://en.wikipedia.org/wiki/UTF-8).
Sample code is in
"Convert Unicode code points to UTF-8 and UTF-32".
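For reference, the code-point-to-UTF-8 step from RFC 3629 can be sketched as follows. This is only an illustration, not the code from the linked posts: append_utf8 and utf16_to_utf8 are made-up names, and the input is assumed to be a std::u16string (wchar_t is only 16 bits wide on Windows, so a wstring would need copying into one on other platforms).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of one code point to `out`
// (bit layout as in RFC 3629, section 3).
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts UTF-16 to UTF-8, combining surrogate pairs into one code point.
// Unpaired surrogates are not rejected here; they are passed through as-is.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Hangul syllables all sit in the three-byte range, so '안' (U+C548) becomes the bytes EC 95 88.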

Decoding %E6%B0%94%E6%97%8B%E5%93%88%E5%88%A9.txt to a valid string

I am trying to decode a filename*= field of content disposition header. I get a string something like:
%E6%B0%94%E6%97%8B%E5%93%88%E5%88%A9.txt
I have figured out that replacing % with \x works and gives the correct file name:
气旋哈利.txt
Is there a standard way of doing this in C++? Is there any library available to decode this?
I tried:
boost::replace_all(encoded_data, "%", "\\x");
boost::locale::generator gen;
std::locale locl = gen.generate("en_US.utf-8");
decoded_data = boost::locale::conv::from_utf(encoded_data, locl);
But it prints the replaced string instead of the Chinese characters:
\xE6\xB0\x94\xE6\x97\x8B\xE5\x93\x88\xE5\x88\xA9.txt
Any idea where I am going wrong?
Escape codes like "\xE6" only work in string and character literals, not in strings in general. That's because they are handled by the compiler when it compiles the program.
However, it's not very hard to do yourself with a simple loop that checks for the '%' character, takes the next two characters, converts them to a number, and uses that number as a "character".
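The loop described above could look roughly like this. It is a sketch, not a library API: percent_decode and hex_digit_value are made-up names, and a '%' that is not followed by two hex digits is simply copied through unchanged.

```cpp
#include <cstddef>
#include <string>

// Returns the value of one hex digit, or -1 if `c` is not a hex digit.
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Replaces every "%XY" (X, Y hex digits) with the single byte 0xXY.
std::string percent_decode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_digit_value(in[i + 1]);
            int lo = hex_digit_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two digits that were just consumed
                continue;
            }
        }
        out.push_back(in[i]);  // ordinary character or malformed escape
    }
    return out;
}
```

The result is a sequence of raw bytes. For the file name in the question those bytes are UTF-8, so printing them on a UTF-8 terminal shows 气旋哈利.txt.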

Converting Hexadecimal(\x) in a string to unicode (\u)

I'm currently running into a problem.
I'm getting a string from a URL and decoding it with curl_easy_unescape, which gives me a decoded string. So far so good.
Here is the problem: when the URL contains the escaped form of ü, curl_easy_unescape turns it into \xfc, so my string now contains \xfc.
I need it as a "ü".
I need an actual "ü" in my string, otherwise I get an error that the string is not valid UTF-8. And I need it inside a string. For example
"Hallü howre yoü"
with curl_easy_escape this turns into
"Hall\xfc+howre+you\xfc"
And I want to turn the \xfc back into "ü" or into "\u00fc".
The solutions I have tried so far:
changing the \x to \u00. That would do the trick, but the replacement doesn't work.
encoding the string as UTF-8
taking the decimal value of 0xFC and assigning it to a char.
I don't have a clue how to resolve this.

UTF 8 encoded Japanese string in XML

I am trying to create a SOAP call with a Japanese string. The problem is that when I encode this string as UTF-8, the result contains many control characters (e.g. 0x1B (Esc)). If I remove all such control characters to make it a valid SOAP call, the Japanese content appears as garbage on the server side.
How can I create a valid SOAP request for Japanese characters? Any suggestion is highly appreciated.
I am using C++ with MS-DOM.
With Best Regards.
If I remember correctly that's true: of the first 32 Unicode code points, only tab, line feed and carriage return are allowed as characters in XML 1.0 documents, and the rest are not allowed even when escaped with &#. I'm not sure whether they're allowed in HTML, but the server certainly thinks they're not allowed in your requests, and it gets the only meaningful vote.
I notice that your document claims to be encoded in iso-2022-jp, not utf-8. And indeed, the sequence of characters ESC $ B that appears in your document is valid iso-2022-jp. It indicates that the data is switching encodings (from ASCII to a 2-byte Japanese encoding called JIS X 0208-1983).
But somewhere in the process of constructing your request, something has seen that 0x1B byte and interpreted it as a character U+001B, not realising that it's intended as one byte in data that's already encoded in the document encoding. So, it has XML-escaped it as a "best effort", even though that's not valid XML.
Probably, whatever is serializing your XML document doesn't know that the encoding is supposed to be iso-2022-jp. I imagine it thinks it's supposed to be serializing the document as ASCII, ISO-Latin-1, or UTF-8, and the <meta> element means nothing to it (that's an HTML way of specifying the encoding anyway, it has no particular significance in XML). But I don't know MS-DOM, so I don't know how to correct that.
If you just remove the ESC characters from iso-2022-jp data, then you conceal the fact that the data has switched encodings, and so the decoder will continue to interpret all that 7nMK stuff as ASCII, when it's supposed to be interpreted as JIS X 0208-1983. Hence, garbage.
Something else strange -- the iso-2022-jp code to switch back to ASCII is ESC ( B, but I see |(B</font> in your data, when I'd expect the same thing to happen to the second ESC character as happened to the first: &#0x1B(B</font>. Similarly, $B#M#S(B and $BL#D+(B are mangled attempts to switch from ASCII to JIS X 0208-1983 and back, and again the ESC characters have just disappeared rather than being escaped.
I have no explanation for why some ESC characters have disappeared and one has been escaped, but it cannot be a coincidence that what you generate looks almost, but not quite, like valid iso-2022-jp. iso-2022-jp is a 7-bit encoding, so part of the problem might be that you've taken iso-2022-jp data and run it through a function that converts ISO-Latin-1 (or some other 8-bit encoding whose lower half matches ASCII, for example any Windows code page) to UTF-8. If so, that function leaves 7-bit data unchanged, so nothing actually gets converted to UTF-8. Then, when the result is interpreted as UTF-8, the data has ESC characters in it.
If you want to send the data as UTF-8, then first of all you need to actually convert it out of iso-2022-jp (to wide characters or to UTF-8, whichever your SOAP or XML library expects). Secondly you need to label it as UTF-8, not as iso-2022-jp. Finally you need to serialize the whole document as UTF-8, although as I've said you might already be doing that.
As pointed out by Steve Jessop, it looks like you have encoded the text as iso-2022-jp, not UTF-8. So the first thing to do is to check that and ensure that you have proper UTF-8.
If the problem still persists, consider encoding the text so that only plain ASCII characters end up in the request.
The simplest option is "hex encoding", where you just write the hex value of each byte as ASCII digits, e.g. the 0x1B byte becomes "1B", i.e. the two bytes 0x31, 0x42.
If you want to be fancy you could use MIME or even UUENCODE.
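The hex encoding mentioned above is only a few lines. A sketch, with hex_encode as a made-up name; the receiving side would of course have to know about the convention and reverse it:

```cpp
#include <string>

// Writes every input byte as two uppercase hex digits,
// so 0x1B becomes the two characters '1' and 'B'.
std::string hex_encode(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

The output is twice the size of the input but contains nothing except 0-9 and A-F, so no XML serializer will touch it.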

URL encoding for multibyte character string in c++

I am trying to URL-encode some of my strings in C++. The strings can contain multibyte characters like ™, ®, ©, etc.
Input text: Something ™
Output should be: Something%20%E2%84%A2
I can URL-encode and decode in JS with encodeURIComponent and decodeURIComponent,
but I have some native code in C++ and need to encode the text there.
Any help here would be a great relief.
It's not too hard to do manually if you can't find a library. First encode the string as UTF-8 (there are other posts on SO about using the standard library to do that if the string is in another encoding), then replace every byte with a value above 127, and every character that's restricted in URLs, with its percent encoding (a percent sign followed by the two hexadecimal digits representing the byte's value).
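A sketch of that manual approach, assuming the input is already UTF-8. url_encode is a made-up name, and the set of characters left alone is the "unreserved" set from RFC 3986 (letters, digits, '-', '.', '_', '~'), which is slightly stricter than what encodeURIComponent leaves unescaped:

```cpp
#include <string>

// Percent-encodes every byte that is not an RFC 3986 unreserved character.
// Bytes above 127 (the UTF-8 lead and continuation bytes) are always encoded.
std::string url_encode(const std::string& utf8) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(utf8.size() * 3);
    for (unsigned char b : utf8) {
        const bool unreserved =
            (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
            (b >= '0' && b <= '9') ||
            b == '-' || b == '.' || b == '_' || b == '~';
        if (unreserved) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back('%');
            out.push_back(digits[b >> 4]);
            out.push_back(digits[b & 0x0F]);
        }
    }
    return out;
}
```

With the input from the question, "Something ™" as UTF-8 bytes, this gives Something%20%E2%84%A2.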