Convert special characters(UTF-8)

Convert special characters(UTF-8) - c++

I am trying to convert characters such as à using <iconv.h>, but I get garbage (�) in the output.
This is the code:
#include <stdio.h>
#include <string.h>
#include <iconv.h>
#include <iostream>
int main()
{
char *gbk_str = "àèì asdsa sd aä";
char dest_str[100];
char *out = dest_str;
size_t inbytes = strlen(gbk_str);
size_t outbytes = sizeof dest_str;
iconv_t conv = iconv_open("ISO8859-1", "UTF-8");
if (conv == (iconv_t)-1) {
std::cout <<"iconv_open";
return 1;
}
if (iconv(conv, &gbk_str, &inbytes, &out, &outbytes) == (size_t)-1) {
std::cout << "iconv";
return 1;
}
dest_str[sizeof dest_str - outbytes] = 0;
puts(dest_str);
return 0;
}
It comes back with:
Itlian character: POLS 6000 Impianto riduzione d. velocità
byte encoding : 50 4f 4c 53 20 36 30 30 30 20 49 6d 70 69 61 6e 74 6f 20 72 69 64 75 7a 69 6f 6e 65 20 64 2e 20 76 65 6c 6f 63 69 74 c3 a0
Converted: POLS 6000 Impianto riduzione d. velocità -> POLS 6000 Impianto riduzione d. velocit340

You have to set your terminal's character encoding to match the output of the conversion, ISO8859-1. In MATE Terminal on Linux the setting is under:
Terminal >> Set Character Encoding >> Central European (WINDOWS-1250)
With that change I could see the correct output from your program. Without it, the output was indeed garbage.

Related

Strange problem with decryption of AES using OpenSSL, Gets padded with same looking junk, Base64 padding issue

Can someone tell me what's wrong with my code?
It works fine in my test example, but in the production code it decrypts the string and then appends padding symbols, as if to maintain some kind of block size.
I didn't post my encrypt/decrypt methods because they would make this post too big, and they work fine: my test example encrypts and decrypts properly. ini.GetValue is an INI retrieval method that returns a const char*; I never had problems with it before I added encryption, and the Base64 size is the same as in the example code, so I believe it works fine.
The symptom is visible in the output: the production ciphertext has 2 null bytes appended to it, which I find strange because both pieces of code are nearly identical. I'm not good at C++, so I'm probably overlooking some basic char array stuff.
The encryption I use is AES-256-CBC from OpenSSL 1.1.1.
Look at my outputs to see what's wrong.
Good looking example code:
Ciphertext is:
000000: 7a e1 69 61 65 bb 74 ad 1a 68 8a ae 73 70 b6 0e z.iae.t..h..sp..
000010: 4f c9 45 9b 44 ca e2 be e2 aa 16 14 cd b1 79 7b O.E.D.........y{
000020: 86 a5 92 26 e6 08 3e 55 61 4e 60 03 50 f3 e4 c1 ...&..>UaN`.P...
000030: fe 5a 2c 0b df c9 1b d8 92 1f 48 75 0d f8 c2 44 .Z,.......Hu...D
Base64 (size=88):
000000: 65 75 46 70 59 57 57 37 64 4b 30 61 61 49 71 75 euFpYWW7dK0aaIqu
000010: 63 33 43 32 44 6b 2f 4a 52 5a 74 45 79 75 4b 2b c3C2Dk/JRZtEyuK+
000020: 34 71 6f 57 46 4d 32 78 65 58 75 47 70 5a 49 6d 4qoWFM2xeXuGpZIm
000030: 35 67 67 2b 56 57 46 4f 59 41 4e 51 38 2b 54 42 5gg+VWFOYANQ8+TB
000040: 2f 6c 6f 73 43 39 2f 4a 47 39 69 53 48 30 68 31 /losC9/JG9iSH0h1
000050: 44 66 6a 43 52 41 3d 3d DfjCRA==
b cip len = 64
a cip len = 16
plain b = 0
plain a = 3
Decrypted text is:
wtf
Decrypted base64 is:
wtf
000000: 77 74 66 00 wtf.
Bad production code example:
Base64 (size=88)
000000: 6a 7a 48 30 46 71 73 54 45 47 4d 76 2f 67 76 59 jzH0FqsTEGMv/gvY
000010: 4d 73 34 54 2f 39 58 32 6c 37 54 31 4d 6d 56 61 Ms4T/9X2l7T1MmVa
000020: 36 45 4f 38 52 64 45 57 42 6b 65 48 71 31 31 45 6EO8RdEWBkeHq11E
000030: 39 2b 77 37 47 4e 49 4a 47 4a 71 42 55 74 54 70 9+w7GNIJGJqBUtTp
000040: 30 36 58 46 31 4d 66 45 79 44 45 71 5a 69 58 54 06XF1MfEyDEqZiXT
000050: 79 45 53 6b 65 41 3d 3d yESkeA==
Ciphertext is:
000000: 8f 31 f4 16 ab 13 10 63 2f fe 0b d8 32 ce 13 ff .1.....c/...2...
000010: d5 f6 97 b4 f5 32 65 5a e8 43 bc 45 d1 16 06 47 .....2eZ.C.E...G
000020: 87 ab 5d 44 f7 ec 3b 18 d2 09 18 9a 81 52 d4 e9 ..]D..;......R..
000030: d3 a5 c5 d4 c7 c4 c8 31 2a 66 25 d3 c8 44 a4 78 .......1*f%..D.x
000040: 00 00 ..
b cip len = 65
a cip len = 17
crypt miss-match
plain b = 16
crypt write fail
plain a = 16
000000: 77 74 66 09 09 09 09 09 09 09 09 05 05 05 05 05 wtf.............
Here is my code. As you can see, both versions look very similar, so I don't understand what the problem is.
Here is a little helper function I use for the hexdump output.
void Hexdump(void* ptr, int buflen)
{
unsigned char* buf = (unsigned char*)ptr;
int i, j;
for (i = 0; i < buflen; i += 16) {
myprintf("%06x: ", i);
for (j = 0; j < 16; j++)
if (i + j < buflen)
myprintf("%02x ", buf[i + j]);
else
myprintf(" ");
myprintf(" ");
for (j = 0; j < 16; j++)
if (i + j < buflen)
myprintf("%c", isprint(buf[i + j]) ? buf[i + j] : '.');
myprintf("\n");
}
}
char* base64(const unsigned char* input, int length) {
const auto pl = 4 * ((length + 2) / 3);
auto output = reinterpret_cast<char*>(calloc(pl + 1, 1)); //+1 for the terminating null that EVP_EncodeBlock adds on
const auto ol = EVP_EncodeBlock(reinterpret_cast<unsigned char*>(output), input, length);
if (pl != ol) { myprintf("b64 calc %d,%d\n",pl, ol); }
return output;
}
unsigned char* decode64(const char* input, int length) {
const auto pl = 3 * length / 4;
auto output = reinterpret_cast<unsigned char*>(calloc(pl + 1, 1));
const auto ol = EVP_DecodeBlock(output, reinterpret_cast<const unsigned char*>(input), length);
if (pl != ol) { myprintf("d64 calc %d,%d\n", pl, ol); }
return output;
}
Here is the test example that works fine.
/* enc test */
/* Message to be encrypted */
unsigned char* plaintext = (unsigned char*)"wtf";
/*
* Buffer for ciphertext. Ensure the buffer is long enough for the
* ciphertext which may be longer than the plaintext, depending on the
* algorithm and mode.
*/
unsigned char* ciphertext = new unsigned char[128];
/* Buffer for the decrypted text */
unsigned char decryptedtext[128];
int decryptedtext_len, ciphertext_len;
/* Encrypt the plaintext */
ciphertext_len = encrypt(plaintext, strlen((char*)plaintext), ciphertext);
/* Do something useful with the ciphertext here */
myprintf("Ciphertext is:\n");
Hexdump((void*)ciphertext, ciphertext_len);
myprintf("Base64 (size=%d):\n", strlen(base64(ciphertext, ciphertext_len)));
Hexdump((void*)base64(ciphertext, ciphertext_len), 4 * ((ciphertext_len + 2) / 3));
/* Decrypt the ciphertext */
decryptedtext_len = decrypt(ciphertext, ciphertext_len, decryptedtext);
/* Add a NULL terminator. We are expecting printable text */
decryptedtext[decryptedtext_len] = '\0';
/* Show the decrypted text */
myprintf("Decrypted text is:\n");
myprintf("%s\n", decryptedtext);
myprintf("Decrypted base64 is:\n");
myprintf("%s\n", decode64(base64(decryptedtext, decryptedtext_len), 4 * ((decryptedtext_len + 2) / 3)));
Hexdump(decode64(base64(decryptedtext, decryptedtext_len), 4 * ((decryptedtext_len + 2) / 3)), 4 * ((decryptedtext_len + 2) / 3));
/* enc test end */
Here is the bad production code:
//Decrypt the username
const char* b64buffer = ini.GetValue("Credentials", "SavedPassword", "");
int b64buffer_length = strlen(b64buffer);
myprintf("Base64 (size=%d)\n", b64buffer_length);
Hexdump((void*)b64buffer, b64buffer_length);
int decryptedtext_len;
int decoded_size = 3 * b64buffer_length / 4;
unsigned char* decryptedtext = new unsigned char[decoded_size];
//unsigned char* ciphertext = decode64(b64buffer, b64buffer_length); //had this before same problem as below line, this worked without initializing new memory I perfer to fix this back up
unsigned char* ciphertext = new unsigned char[decoded_size];
memcpy(ciphertext, decode64(b64buffer, b64buffer_length), decoded_size); //same problem as top line.
myprintf("Ciphertext is:\n");
Hexdump((void*)ciphertext, decoded_size);
/* Decrypt the ciphertext */
decryptedtext_len = decrypt(ciphertext, decoded_size - 1, decryptedtext);
/* Add a NULL terminator. We are expecting printable text */
decryptedtext[decryptedtext_len] = '\0';
Hexdump(decryptedtext, decryptedtext_len);
strcpy(password_setting, (char*)decryptedtext); //save decrypted password back
delete[] decryptedtext;
delete[] ciphertext;

In the example that works, you get ciphertext_len directly from the encryption function. When you display the ciphertext, you use this length.
In the "bad production code", you calculate decoded_size from the length of the Base64 data. However, Base64-encoded data always has a length that is a multiple of 4. If the original data size is not a multiple of 3, one or two padding characters are added to the string. Both of your examples have two of them: the '=' characters at the end of the Base64 data.
When calculating the length of the decoded data, you need to account for these characters. If there are no '=' characters at the end of the string, use the length that you calculated (3 * N / 4). If there is one '=' character, reduce the calculated length by 1, and if there are two, reduce it by 2. (There will never be 3 padding characters.) In your case 3 * 88 / 4 = 66, but only 64 of those bytes are ciphertext, which is where the two trailing 00 bytes in the production hexdump come from.
Edit: Here is my fix: (sspoke)
char* base64(const unsigned char* input, int length) {
const auto pl = 4 * ((length + 2) / 3);
auto output = reinterpret_cast<char*>(calloc(pl + 1, 1)); //+1 for the terminating null that EVP_EncodeBlock adds on
const auto ol = EVP_EncodeBlock(reinterpret_cast<unsigned char*>(output), input, length);
if (pl != ol) { printf("encode64 fail size size %d,%d\n",pl, ol); }
return output;
}
unsigned char* decode64(const char* input, int* length) {
//Old code generated base length sizes because it didn't take into account the '==' signs.
const auto pl = 3 * *length / 4;
auto output = reinterpret_cast<unsigned char*>(calloc(pl + 1, 1));
const auto ol = EVP_DecodeBlock(output, reinterpret_cast<const unsigned char*>(input), *length);
if (pl != ol) { printf("decode64 fail size size %d,%d\n", pl, ol); }
//Little bug fix I added to fix incorrect length's because '==' signs are not considered in the output. -sspoke
if (*length > 3 && input[*length - 1] == '=' && input[*length - 2] == '=')
*length = ol - 2;
else if (*length > 2 && input[*length - 1] == '=')
*length = ol - 1;
else
*length = ol;
return output;
}

Remove Control Characters out of a string

I have a string given, which contains the following content (so the following lines are stored in a String-variable):
S*⸮
------------------------
K!
NAG 00.10
K"
NMAGICSTAR 2 L V1.0-1
K#
AUFSTELLORT: S 00000000
K$
GERAET NR.: 0000000000
KC
ZULASSUNGS NR.:411107770
K)
BAUART: NAG5A02
K(
ABLAUFDATUM: 2021/04
------------------------
Can anyone help me or give me a hint on how to remove the control codes (the S*⸮ and K! lines, for example) from this string? There is always a small rectangle before each control code; I don't know why it was removed when I pasted the text here. In the end, the result should be:
------------------------
NAG 00.10
NMAGICSTAR 2 L V1.0-1
AUFSTELLORT: S 00000000
GERAET NR.: 0000000000
ZULASSUNGS NR.:411107770
BAUART: NAG5A02
ABLAUFDATUM: 2021/04
------------------------
Let me finally quote something out of the documentation, maybe it helps:
Each line is max. 24 characters long and must end with LF [0Ah]
Control Code "ESC 'S' 21h LF" means: XON Startsequence with manufacturer code, machine code and dataset code
I am trying to do this whole task on an ESP32/ Arduino IDE (C++).

This is not an answer. You can use the following code to print your string as integers in hex form, with a wide gap after every 12 characters and a new line after every 24. This arrangement makes it easier to count off 24 characters.
#include <iostream>
#include <string>
// Print every byte of str as two hex digits, 24 bytes per line.
void dump_str(const std::string& str)
{
    std::cout << std::hex;
    for (std::size_t i = 0; i < str.size(); i++)
    {
        // Go through unsigned char so bytes >= 0x80 do not print as ffffff80.
        const int n = static_cast<unsigned char>(str[i]);
        if (i % 24 == 0) std::cout << std::endl;
        else if (i % 12 == 0) std::cout << "   ";
        if (n < 16) std::cout << " " << '0' << n;
        else std::cout << " " << n;
    }
}
int main ()
{
std::string str ( "some\r\ttest\rst\2\athis is \n a ran\5dom\10g\n\nTake this for granted. To bo or not to be\a\a\t\t");
dump_str(str);
}
Output of this example (the meaning of each number can be looked up in an ASCII table):
73 6f 6d 65 0d 09 74 65 73 74 0d 73 74 02 07 74 68 69 73 20 69 73 20 0a
20 61 20 72 61 6e 05 64 6f 6d 08 67 0a 0a 54 61 6b 65 20 74 68 69 73 20
66 6f 72 20 67 72 61 6e 74 65 64 2e 20 54 6f 20 62 6f 20 6f 72 20 6e 6f
74 20 74 6f 20 62 65 07 07 09 09
Pass your string to the dump_str function above, then copy the resulting table and append it to your post.

Here's how to split the string into tokens. Pass "\r\n" as delim to split at \r and \n characters only (the default also splits at spaces and tabs). This way you can iterate over each line separately.
void str_split(const std::string& in, std::vector<std::string>& out, const std::string& delim=" \t\r\n")
{
std::string::size_type firstPos = in.find_first_not_of(delim);
std::string::size_type secondPos = in.find_first_of(delim, firstPos);
out.clear();
if(firstPos != std::string::npos)
out.push_back( in.substr( firstPos, secondPos - firstPos ) );
while( secondPos != std::string::npos )
{
firstPos = in.find_first_not_of(delim, secondPos);
if(firstPos == std::string::npos)
break;
secondPos = in.find_first_of( delim, firstPos );
out.push_back( in.substr( firstPos, secondPos - firstPos ) );
}
}

Write raw memory to file

I am trying to dump a block of memory (allocated with malloc) to a file. I want to dump the raw data because I don't know what is stored in the memory (int, float, double) at the point where I want to dump it.
What's the best way to do this?
I have tried a few things already, but none of them worked as I wanted.

In C, it's quite trivial, really:
const size_t size = 4711;
void *data = malloc(size);
if(data != NULL)
{
    FILE *out = fopen("memory.bin", "wb");
    if(out != NULL)
    {
        const unsigned char *pos = data;
        size_t to_go = size;
        while(to_go > 0)
        {
            /* Element size 1, so the return value is a byte count. */
            const size_t wrote = fwrite(pos, 1, to_go, out);
            if(wrote == 0)
                break;
            pos += wrote;
            to_go -= wrote;
        }
        fclose(out);
    }
    free(data);
}
The above attempts to properly loop fwrite() to handle short writes; that's where most of the complexity comes from.

It's not clear what you mean by "not working".
You could reinterpret_cast the memory to a char * and write it to file easily.
Reading it back again is a different matter.

The "C++ way" of doing it would probably involve using std::ostream::write with a stream in binary mode.
#include <fstream>
#include <string>
bool write_file_binary (std::string const & filename,
char const * data, size_t const bytes)
{
std::ofstream b_stream(filename.c_str(),
std::fstream::out | std::fstream::binary);
if (b_stream)
{
b_stream.write(data, bytes);
return (b_stream.good());
}
return false;
}
int main (void)
{
double * buffer = new double[100];
write_file_binary("test.bin",
reinterpret_cast<char const *>(buffer),
sizeof(double)*100);
delete[] buffer;
return 0;
}

If this is C++, this might help you, as part of serializing and deserializing,
I write the raw memory array to a file (using new[] is essentially the same
as malloc in the C world):
https://github.com/goblinhack/simple-c-plus-plus-serializer
#include "hexdump.h"
auto elems = 128;
static void serialize (std::ofstream out)
{
    auto a = new char[elems];
    for (auto i = 0; i < elems; i++) {
        a[i] = i;
    }
    out << bits(a);
    hexdump(a, elems);
}
Output:
128 bytes:
0000 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |................|
0010 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f |................|
0020 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f | !"#$%&'()*+,-./|
0030 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f |0123456789:;<=>?|
0040 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f |@ABCDEFGHIJKLMNO|
0050 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f |PQRSTUVWXYZ[\]^_|
0060 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f |`abcdefghijklmno|
0070 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f |pqrstuvwxyz{|}~.|

How can I set a strings value from hex?

For example, the hex value of the text "Book of Summoning" is "42 6F 6F 6B 20 6F 66 20 53 75 6D 6D 6F 6E 69 6E 67".
I want to be able to set the variable as if I had written string a = "Book of Summoning";
but using the hex value.
With input like this or something similar:
std::string hex = "42 6F 6F 6B 20 6F 66 20 53 75 6D 6D 6F 6E 69 6E 67";
I want to set a string variable from it so that the string reads "Book of Summoning".
If I looked at this variable in debug mode, each character of the string would have one of these space-separated hex values, but printing the string would of course print "Book of Summoning".
If I just knew how to do it with one character, I could build such a function.
Alternatively, if you can do it with decimal instead of hex, that will work for me too, as I'll just write a function to convert from hex to decimal.
EDIT:
In debug mode I can see that the string's first char 'B' has a 66 beside it, which I'm guessing is the decimal value of that character. If I knew how to get that value, or how to set a string by setting that value, I could do all this, but I don't know if I can.

Like this:
std::string hex = "42 6F 6F 6B 20 6F 66 20 53 75 6D 6D 6F 6E 69 6E 67";
std::istringstream iss(hex);
int i;
while (iss >> std::hex >> i)
std::cout << static_cast<char>(i);
// alternatively
// s += static_cast<char>(i);
// where s is a std::string
This assumes the input is already sanitized and contains values that fit in a char.

#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
#include <iterator>
#include <algorithm>
int main()
{
std::string hex = "42 6F 6F 6B 20 6F 66 20 53 75 6D 6D 6F 6E 69 6E 67";
std::istringstream is( hex );
is >> std::hex;
std::copy( std::istream_iterator<int>( is ), std::istream_iterator<int>(),
std::ostream_iterator<char>( std::cout ) );
std::cout << std::endl;
}
EDIT: I added the missing header <sstream>.

The algorithm:
- Start traversing the input string.
- Append the current character to a temporary string.
- At each space:
  - convert the temporary string into a temporary number via strtol (or similar, where you can specify the base) using base 16
  - reset the temporary string to be empty
  - get the ASCII character for your temporary number
  - append it to the final string.
- Go on as long as there are characters in the input string, then convert whatever is left in the temporary string, since the last value has no space after it.

c++ XOR string key hex

I am trying to XOR some already encrypted files.
I know that the XOR key is 0x14 or dec(20).
My code works except for one thing: every '4' is missing from the output.
Here is my function for the XOR:
void xor(string &nString) // Time to undo what we did from above :D
{
const int KEY = 0x14;
int strLen = (nString.length());
char *cString = (char*)(nString.c_str());
for (int i = 0; i < strLen; i++)
{
*(cString+i) = (*(cString+i) ^ KEY);
}
}
Here is part of my main:
ifstream inFile;
inFile.open("ExpTable.bin");
if (!inFile) {
cout << "Unable to open file";
}
string data;
while (inFile >> data) {
xor(data);
cout << data << endl;
}
inFile.close();
Here is a part of the encrypted file:
$y{bq //0 move
%c|{ //1 who
&c|qfq //2 where
'saufp //3 guard
 x{wu`}{z //4 location
But " x{wu`}{z" is returning //location. It's not displaying the 4.
Note the space in front of the x; that is supposed to be decoded to 4.
What am I missing? Why is it not showing any of the 4s? <space> = 4 // 4 = <space>
UPDATE
This is the list of all the specific conversions:
HEX(enc) ASCII(dec)
20 4
21 5
22 6
23 7
24 0
25 1
26 2
27 3
28 <
29 =
2a >
2b ?
2c 8
2d 9
2e :
2f ;
30 $
31 %
32 &
33 '
34 <space>
35 !
36 "
37 #
38 ,
39 -
3a .
3b /
3c (
3d )
3e *
3f +
40 T
41 U
42 V
43 W
44 P
45 Q
46 R
47 S
48 \
49 ]
4a ^
4b _
4c X
4d Y
4e Z
4f [
50 D
51 E
52 F
53 G
54 @
55 A
56 B
57 C
58 L
59 M
5a N
5b O
5c H
5d I
5e J
5f K
60 t
61 u
62 v
63 w
64 p
65 q
66 r
67 s
68 |
69 }
6a ~
6b <DEL>
6c x
6d y
6e z
6f {
70 d
71 e
72 f
73 g
74 `
75 a
76 b
77 c
78 l
79 m
7a n
7b o
7c h
7d i
7e j
7f k
1d /tab
1e /newline

Get rid of all casts.
Don't use >> for input.
That should fix your problems.
Edit:
// got bored, wrote some (untested) code
ifstream inFile;
inFile.open("ExpTable.bin", ios::in | ios::binary);
if (!inFile) {
    cerr << "Unable to open ExpTable.bin: " << strerror(errno) << "\n";
    exit(EXIT_FAILURE);
}
char c;
while (inFile.get(c)) {
    cout.put(c ^ '\x14');
}
inFile.close();

Are you sure that it is printing '//location'? I think it would print '// location' -- note the space after the double-slash. You are XORing 0x34 with 0x14. The result is 0x20, which is a space character. Why would you want to xor everything with 0x14 anyway?
** edit ** ignore the above; I missed part of your question. The real answer:
Are you entirely sure that the character before the x is a 0x20? Perhaps it's some unprintable character that looks like a space? I would check the hex value.
