Outputting ASCII in Hex form? - c++

I'm trying to implement AES for a school project. My goal is to output the encrypted text to both the screen and a .txt file. The encryption goes totally as expected, and I can verify this by looking at this:
for (int j = 0; j<object.words * 4; j++)
{
printf("%02x ", Encryptor.out[j]);
}
The text it is encrypting is "im so glad this works", with the 128-bit key 'dog', and this loop prints the first 16 characters of the encryption, which reads:
c8 88 45 0d 5d 40 ff 5b a4 55 91 c9 c4 00 f5 a4
I've verified that this is what AES should print in this context. Later, I have the following lines of output:
this is Encryptor.out[0] in cout: ╚
This is Encryptor.out[0] in printf with the format code '%02x': c8
Press any key to continue . . .
My cout call probably just needs a formatting code, so I'm not concerned about that. The complication is at this point:
ofstream OutFile("Encrypted.txt");
OutFile << Encryptor.out[0];
At this point, the only thing contained within Encrypted.txt is the single character 'È'. I know that c8 in hex is 'È' in ASCII, but I want it to print the original hex value.
So ultimately, my question is, how do I get this character to be saved in my output file as 'c8'? Is there a formatting code that ofstream can use, or do I have to jump through some hoops?
Thanks guys!

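As @stark commented, you can print data in hex with std::hex, which changes how the stream formats integers. It has no effect on characters, though, so each byte has to be converted to an integer type before it is written. Going through unsigned char first keeps bytes of 0x80 and above from being sign-extended, and std::setw(2) with std::setfill('0') reproduces the zero padding of '%02x'. You can use
ofstream OutFile("Encrypted.txt");
// std::setw and std::setfill need #include <iomanip>
OutFile << std::hex << std::setfill('0');
for (int j = 0; j < object.words * 4; j++)
{
// Go through unsigned char so bytes >= 0x80 are not sign-extended
OutFile << std::setw(2) << static_cast<int>(static_cast<unsigned char>(Encryptor.out[j])) << ' ';
}
// Reset back to normal printing
OutFile << std::dec << std::setfill(' ');
and the file will contain the hex values (c8 88 45 ...) rather than the accented character È.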
Check out std::hex here http://en.cppreference.com/w/cpp/io/manip/hex
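If you want the same formatting for the screen and the file, it may help to wrap it in a small function that takes any std::ostream. This is only a sketch: writeHex is a made-up name, and it assumes the ciphertext is available as a buffer of unsigned char, which the question does not actually show.

```cpp
#include <cstddef>
#include <iomanip>
#include <ios>
#include <ostream>

// Hypothetical helper: writes `size` bytes as two-digit lowercase hex values
// separated by single spaces, then restores the stream's previous formatting.
void writeHex(std::ostream& os, const unsigned char* data, std::size_t size)
{
    const std::ios_base::fmtflags oldFlags = os.flags();
    const char oldFill = os.fill('0');   // fill() returns the previous fill character
    os << std::hex;
    for (std::size_t i = 0; i < size; ++i) {
        if (i != 0) {
            os << ' ';
        }
        // setw only applies to the next item, so it is repeated for every byte.
        os << std::setw(2) << static_cast<int>(data[i]);
    }
    os.flags(oldFlags);
    os.fill(oldFill);
}
```

The same call then works with std::cout and with an std::ofstream opened on Encrypted.txt, and later output on that stream is decimal again.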

Related

C++: Convert String into Hex with support for diacritics

Diacritic Wikipedia
I've built an EEPROM programmer (an EEPROM is kind of like a very low-tech USB stick) and I'm writing a program which reads text from a txt file and then converts it into bin/hex that the programmer can use to write the data onto the EEPROM. I've got everything except for a function that converts the string into hex. I've tried using this code, which works somewhat well.
string Text = "This is a string.";
for(int i = 0; i < Text.size(); i++) {
cout << uppercase << hex << (int)Text[i] << " ";
}
This will output this:
54 68 69 73 20 69 73 20 61 20 73 74 72 69 6E 67 2E
But when giving it:
Thïs ìs â stríng.
It will return this:
54 68 FFFFFFC3 FFFFFFAF 73 20 FFFFFFC3 FFFFFFAC 73 20 FFFFFFC3 FFFFFFA2 20 73 74 72 FFFFFFC3 FFFFFFAD 6E 67 2E
This doesn't look right to me. My best guess is that normal char are converted to ASCII and the special ones get converted in some form of Unicode.
Is there a way to make everything Unicode?
Side note The EEPROM can only hold 2k bytes so the more space efficient the better.
So my end goal is:
Make a function that turns a string into its hex equivalent.
With the end result being space efficient and supporting diacritics.
Make another function that could read the hex and turn it into a string, also with support for diacritics.
If that is not possible I'm willing to use a custom formatting that would store an 'ê' like "|e^" for example. With an equivalent of "|" as a way for me to intercept a special character.
Thanks for your help!
A double cast is needed here: first cast the character to (unsigned char), then cast to (int):
(int)(unsigned char)Text[i]
This is necessary because casting directly to (unsigned int) does not work as you might expect. The signed char value is first widened to int, which sign-extends it, and only then is the conversion to unsigned applied, so 0xC3 still comes out as FFFFFFC3.
See this on https://godbolt.org/z/GhYz3T8v6
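For the two functions the question asks for, here is a minimal sketch built on that cast. The names toHexString and fromHexString are made up for illustration, and the code assumes the input string already holds UTF-8 bytes (which is what the C3 AF pairs in the output above suggest). The bytes are passed through untouched, so diacritics survive the round trip.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: "Th\xC3\xAFs" -> "54 68 C3 AF 73"
std::string toHexString(const std::string& text)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string hex;
    for (char c : text) {
        // Convert to unsigned char first to avoid sign extension.
        const unsigned char byte = static_cast<unsigned char>(c);
        if (!hex.empty()) {
            hex += ' ';
        }
        hex += digits[byte >> 4];
        hex += digits[byte & 0x0F];
    }
    return hex;
}

// Hypothetical helper: the inverse of toHexString. Assumes every token is a
// hex value in the range 00..FF; larger values would be truncated.
std::string fromHexString(const std::string& hex)
{
    std::istringstream in(hex);
    std::string text;
    unsigned int value = 0;
    while (in >> std::hex >> value) {
        text += static_cast<char>(static_cast<unsigned char>(value));
    }
    return text;
}
```

Note that the hex text is only a human-readable view. If the programmer writes the raw bytes to the EEPROM, each ASCII character costs one byte and each of these accented letters costs two.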

Why do Chinese characters turn into gibberish after it runs through compiler?

So I am writing a program to turn a Chinese-English definition .txt file into a vocab trainer that runs through the CLI. However, on Windows, when I compile this in VS2017, the output turns into gibberish and I'm not sure why. I think it was working OK on Linux, but Windows seems to mess it up quite a bit. Does this have something to do with the encoding table in Windows? Am I missing something? I wrote the code and the input file on Linux, but I also tried writing the characters using the Windows IME and still got the same result. I think the picture speaks best for itself. Thanks
Note: Added sample of input/output as it appears in Windows, as requested. Also, input is UTF-8.
Sample of input
人(rén),person
刀(dāo),knife
力(lì),power
又(yòu),right hand; again
口(kǒu),mouth
Sample of output
人(rén),person
刀(dāo),knife
力(lì),power
又(yòu),right hand; again
口(kǒu),mouth
土(tǔ),earth
Picture of Input file & Output
TL;DR: The Windows terminal hates Unicode. You can work around it, but it's not pretty.
Your issues here are unrelated to "char versus wchar_t". In fact, there's nothing wrong with your program! The problems only arise when the text leaves through cout and arrives at the terminal.
You're probably used to thinking of a char as a "character"; this is a common (but understandable) misconception. In C/C++, the char type is usually synonymous with an 8-bit integer, and thus is more accurately described as a byte.
Your text file chineseVocab.txt is encoded as UTF-8. When you read this file via fstream, what you get is a string of UTF-8-encoded bytes.
There is no such thing as a "character" in I/O; you're always transmitting bytes in a particular encoding. In your example, you are reading UTF-8-encoded bytes from a file handle (fin).
Try running this, and you should see identical results on both platforms (Windows and Linux):
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
fstream fin("chineseVocab.txt");
string line;
while (getline(fin, line))
{
cout << "Number of bytes in the line: " << dec << line.length() << endl;
cout << " ";
for (char c : line)
{
// Here we need to trick the compiler into displaying this "char" as an integer:
unsigned int byte = (unsigned char)c;
cout << hex << byte << " ";
}
cout << endl;
cout << endl;
}
return 0;
}
Here's what I see in mine (Windows):
Number of bytes in the line: 16
e4 ba ba 28 72 c3 a9 6e 29 2c 70 65 72 73 6f 6e
Number of bytes in the line: 15
e5 88 80 28 64 c4 81 6f 29 2c 6b 6e 69 66 65
Number of bytes in the line: 14
e5 8a 9b 28 6c c3 ac 29 2c 70 6f 77 65 72
Number of bytes in the line: 27
e5 8f 88 28 79 c3 b2 75 29 2c 72 69 67 68 74 20 68 61 6e 64 3b 20 61 67 61 69 6e
Number of bytes in the line: 15
e5 8f a3 28 6b c7 92 75 29 2c 6d 6f 75 74 68
So far, so good.
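Notice that the byte counts are larger than the number of characters you see: the first line is 16 bytes but only 13 characters, because 人 takes three bytes and é takes two. Here is a small sketch that counts characters (code points) instead of bytes. countCodePoints is a made-up name, and the function assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a valid UTF-8 string.
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (char c : utf8) {
        // Continuation bytes have the bit pattern 10xxxxxx.
        // Every other byte starts a new code point.
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```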
The problem starts now: you want to write those same UTF-8-encoded bytes to another file handle (cout).
The cout file handle is connected to your CLI (the "terminal", the "console", the "shell", whatever you wanna call it). The CLI reads bytes from cout and decodes them into characters so they can be displayed.
Linux terminals are usually configured to use a UTF-8 decoder. Good news! Your bytes are UTF-8-encoded, so your Linux terminal's decoder matches the text file's encoding. That's why everything looks good in the terminal.
Windows terminals, on the other hand, are usually configured to use a system-dependent decoder (yours appears to be DOS codepage 437). Bad news! Your bytes are UTF-8-encoded, so your Windows terminal's decoder does not match the text file's encoding. That's why everything looks garbled in the terminal.
OK, so how do you solve this? Unfortunately, I couldn't find any portable way to do it... You will need to fork your program into a Linux version and a Windows version. In the Windows version:
Convert your UTF-8 bytes into UTF-16 code units.
Set standard output to UTF-16 mode.
Write to wcout instead of cout
Tell your users to change their terminals to a font that supports Chinese characters.
Here's the code:
#include <fstream>
#include <iostream>
#include <string>
#include <windows.h>
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
using namespace std;
// Based on this article:
// https://msdn.microsoft.com/magazine/mt763237?f=255&MSPPError=-2147217396
wstring utf16FromUtf8(const string & utf8)
{
std::wstring utf16;
// Empty input --> empty output
if (utf8.length() == 0)
return utf16;
// Reject the string if its bytes do not constitute valid UTF-8
constexpr DWORD kFlags = MB_ERR_INVALID_CHARS;
// Compute how many 16-bit code units are needed to store this string:
const int nCodeUnits = ::MultiByteToWideChar(
CP_UTF8, // Source string is in UTF-8
kFlags, // Conversion flags
utf8.data(), // Source UTF-8 string pointer
utf8.length(), // Length of the source UTF-8 string, in bytes
nullptr, // Unused - no conversion done in this step
0 // Request size of destination buffer, in wchar_ts
);
// Invalid UTF-8 detected? Return empty string:
if (!nCodeUnits)
return utf16;
// Allocate space for the UTF-16 code units:
utf16.resize(nCodeUnits);
// Convert from UTF-8 to UTF-16
int result = ::MultiByteToWideChar(
CP_UTF8, // Source string is in UTF-8
kFlags, // Conversion flags
utf8.data(), // Source UTF-8 string pointer
utf8.length(), // Length of source UTF-8 string, in bytes
&utf16[0], // Pointer to destination buffer
nCodeUnits // Size of destination buffer, in code units
);
// Conversion failed? Return empty string:
if (!result)
utf16.clear();
return utf16;
}
int main()
{
// Based on this article:
// https://blogs.msmvps.com/gdicanio/2017/08/22/printing-utf-8-text-to-the-windows-console/
_setmode(_fileno(stdout), _O_U16TEXT);
fstream fin("chineseVocab.txt");
string line;
while (getline(fin, line))
wcout << utf16FromUtf8(line) << endl;
return 0;
}
In my terminal, it mostly looks OK after I change the font to MS Gothic. Some characters are still messed up, but that's due to the font not supporting them.

Use of backspace with endl and \n in C++

I have written a small C++ program to understand the use of \b. The program is given below -
#include <iostream>
using namespace std;
int main(){
cout << "Hello World!" << "\b";
return 0;
}
So, this program gives the desired output Hello World. This should not happen, because backspace only moves the cursor one position back and does not delete the character from the buffer. So why is the ! not printed?
Now, consider another program:
#include <iostream>
using namespace std;
int main(){
cout << "Hello World!" << "\b";
cout << "\nAnother Line\n";
return 0;
}
So, here the output is -
Hello World!
Another Line
Why does the backspace not work here? Newline should not flush the buffer, so the ! should be deleted. What is the issue here?
Also, when I add either endl or \n after \b, the output is Hello World! in both cases. But the newline character does not flush the buffer, whereas endl does. So why is the output the same in both cases?
I assume the output from your first program looks something like this?
$ ./hello
Hello World$
If so, the ! is not deleted from the buffer; it is clobbered when the shell prints the prompt.
With regard to the second program, the moment at which the buffer is flushed only influences when \b is sent to the terminal, not how it is processed. The \b is part of the stream, and a terminal happens to interpret it as "back up one column". Here the cursor backs up over the !, and the \n that follows moves it to the next line before anything overwrites the !. If this is not clear, take a look at the actual bytes sent to stdout:
$ ./hello2 | hexdump -C
00000000 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 08 0a 41 6e |Hello World!..An|
00000010 6f 74 68 65 72 20 4c 69 6e 65 0a |other Line.|
0000001b
The \b is followed by the \n (08 and 0a respectively), matching what you wrote to cout in your program.
Finally, cout is flushed when the program exits so it does not matter whether you pass \n or endl in this example. In fact, \n will likely flush anyway since stdout is connected to a terminal.
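To make the "clobbered, not deleted" point concrete, here is a toy model of what a terminal does with those bytes. It is only a sketch: renderScreen is a made-up name, and it handles just \b, \n and ordinary characters, which is far less than a real terminal does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the screen lines a very simple terminal would
// show after receiving `bytes`.
std::vector<std::string> renderScreen(const std::string& bytes)
{
    std::vector<std::string> lines(1);
    std::size_t col = 0;   // cursor column on the current (last) line
    for (char c : bytes) {
        if (c == '\b') {
            // Backspace only moves the cursor; nothing is erased.
            if (col > 0) {
                --col;
            }
        } else if (c == '\n') {
            lines.emplace_back();
            col = 0;
        } else {
            std::string& line = lines.back();
            if (col < line.size()) {
                line[col] = c;        // overwrite whatever is under the cursor
            } else {
                line.push_back(c);
            }
            ++col;
        }
    }
    return lines;
}
```

Feeding it "Hello World!\b" followed by a prompt such as "$ " gives "Hello World$ ", while "Hello World!\b\nAnother Line\n" leaves the ! in place because the newline arrives before anything is written over it.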

C++ turning a .txt with Hexa characters to a .txt with hexa characters with \x in front

I would like to ask if there is a way in C++ to turn a .txt file with hex digits, for example
0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 69 73 20 70 72
into a new .txt that looks like this
\x0E\x1F\xBA\x0E\x00\xB4\x09\xCD\x21\xB8\x01\x4C\xCD\x21\x54\x68\x69\x73\x20\x70\x72"
I searched for the answer on Google but found nothing. I also tried a script in C++, but it does not work and fails with the error message "24 11 \x used with no following hex digits"
#include <iostream>
#include <fstream>
#include<vector>
using namespace std;
int main()
{
string hexaEnter;
ifstream read;
ofstream write;
write.open ("newhexa.txt",std::ios_base::app);
read.open("hexa.txt");
while (!read.eof() )
{
read >> hexaEnter;
write << "\x" + hexaEnter;
}
write.close();
read.close();
system("pause");
return 1;
}
write << "\x" + hexaEnter;
// ^^
Here, C++ sees the beginning of a hex escape sequence, like \x0E or \x1F, but it can't find the actual hex values because you didn't provide any.
That's because what you intended to do was literally write the character \ and the character x, so escape the backslash to make that happen:
write << "\\x" + hexaEnter;
// ^^^
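As an aside, your loop condition is wrong. while (!read.eof()) only becomes false after a read has already failed, so the last token can be written twice. Test the read itself instead, as in while (read >> hexaEnter).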

Loop through hex variable in C

I have the following code in a project that writes the ASCII representation of a packet to a Unix tty:
int written = 0;
int start_of_data = 3;
//write data to fifo
while (length) {
if ((written = write(fifo_fd, &packet[start_of_data], length)) == -1)
{
printf("Error writing to FIFO\n");
} else {
length -= written;
}
}
I just want to take the data that would have been written to the socket and put it in a variable. To debug, I have just been trying to printf the first letter/digit. I have tried numerous ways to get it to print out, but I keep getting hex forms (I think).
The expected output is: 13176
and the hex value is: 31 33 31 37 36 0D 0A (if that is even hex)
Obviously my C skills are not the sharpest tools in the shed. Any help would be appreciated.
update: I am using hexdump() to get the output
These are the ASCII codes of characters: 31 is '1', 33 is '3' etc. 0D and 0A are the terminating newline characters, also known as '\r' and '\n', respectively. So if you convert the values to characters, you can print them out directly, e.g. with printf using the %c or %s format codes. As you can check in any ASCII table, the values you posted do represent "13176" :-)
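As a small illustration of "put it in a variable", here is a sketch that copies such bytes into a string and drops the trailing line ending. It is written in C++ to match the other examples on this page, although the question is about C; bytesToText is a made-up name, and the buffer type is an assumption because the question does not show how packet is declared.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: treats the bytes as ASCII text and strips any
// trailing '\r' and '\n' characters.
std::string bytesToText(const unsigned char* data, std::size_t size)
{
    std::string text(reinterpret_cast<const char*>(data), size);
    while (!text.empty() && (text.back() == '\r' || text.back() == '\n')) {
        text.pop_back();
    }
    return text;
}
```

With the bytes 31 33 31 37 36 0D 0A this gives the string 13176. In plain C, something like printf("%.*s", (int)length, &packet[start_of_data]) should print the same characters without needing a terminating null byte.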