Proper way to convert HEX to ASCII read from a file C++ - c++

In my code below, CODE 1 reads HEX from a file and stores it in a string, but it isn't converted to ASCII when printed out.
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
int main(int argc, char** argv)
{
// CODE 1
std::ifstream input("C:\\test.txt"); // The test.txt contains \x48\x83\xEC\x28\x48\x83
std::stringstream sstr;
input >> sstr.rdbuf();
std::string test = sstr.str();
std::cout << "\nString from file: " << test;
//char* lol = new char[test.size()];
//memcpy(lol, test.data(), test.size());
////////////////////////////////////////////////////////
// CODE 2
std::string test_2 = "\x48\x83\xEC\x28\x48\x83";
std::cout << "\n\nHardcoded string: " << test_2 << "\n";
// Prints as ASCII "H(H" , which I want my CODE 1 to do.
}
In my CODE 2 sample, the same HEX is used and it prints as ASCII. Why isn't it the same for CODE 1?

Okay, it looks like there is some confusion. First, I have to ask if you're SURE you know what is in your file.
That is, does it contain, oh, it looks like about 20 characters:
\
x
4
8
et cetera?
Or does it contain a hex 48 (one byte), a hex 83 (one byte), for a total of 5-ish characters?
I bet it's the first. I bet your file is about 20 characters long and literally contains the string that's getting printed.
And if so, then the code is doing what you expect. It's reading a line of text and writing it back out. If you want it to actually interpret it like the compiler does, then you're going to have to do the steps yourself.
Now, if it actually contains the hex characters (but I bet it doesn't), then that's a little different problem, and we'll have to look at that. But I think you just have a string of characters that includes \x in it. And reading / writing that isn't going to automatically do some magic for you.

When you read from file, the backslash characters are not escaped. Your test string from file is literally an array of chars: {'\\', 'x', '4', '8', ... }
Whereas your hardcoded literal string, "\x48\x83\xEC\x28\x48\x83"; is fully hex escaped by the compiler.
If you really want to store your data as a text file as a series of "backslash x NN" sequences, you'll need to convert after you read from file. Here's a hacked up loop that would do it for you.
std::string test = sstr.str();
char temp[3] = {};
size_t t = 0;
std::string corrected;
for (char c : test)
{
if (isxdigit(c))
{
temp[t] = c;
t++;
if (t == 2)
{
t = 0;
unsigned char uc = (unsigned char)strtoul(temp, nullptr, 16); // temp holds the two hex digits plus a terminating NUL
corrected += (char)uc;
}
}
}
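A slightly stricter variant of the loop above, as a minimal sketch: it only converts complete `\xNN` sequences and copies everything else (a trailing newline, stray letters such as "cafe") unchanged. The name `decode_hex_escapes` is made up for this illustration and is not part of any library.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: turns the text "\x48" (backslash, 'x', two hex digits)
// into the single byte 0x48. Anything that is not such a sequence is copied as-is.
std::string decode_hex_escapes(const std::string& text)
{
    auto is_hex = [](char c) {
        return std::isxdigit(static_cast<unsigned char>(c)) != 0;
    };
    // Numeric value of one hex digit; only called after is_hex() succeeded.
    auto hex_value = [](char c) {
        const unsigned char u = static_cast<unsigned char>(c);
        return std::isdigit(u) ? u - '0' : std::tolower(u) - 'a' + 10;
    };

    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const bool is_escape = i + 3 < text.size()
            && text[i] == '\\' && text[i + 1] == 'x'
            && is_hex(text[i + 2]) && is_hex(text[i + 3]);
        if (is_escape) {
            out += static_cast<char>(hex_value(text[i + 2]) * 16 + hex_value(text[i + 3]));
            i += 4; // skip the backslash, the 'x' and both digits
        } else {
            out += text[i];
            ++i;
        }
    }
    assert(out.size() <= text.size()); // decoding never makes the text longer
    return out;
}
```

In CODE 1 this would be used as `std::cout << decode_hex_escapes(test);` after reading the file.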

You can split the string read from the file on \x, convert each piece from string to int,
and finally cast the int to char.
These resources will be helpful:
strtok and convert

Related

How to read a specific amount of characters

I can read the characters from a file with this code.
It displays 2 characters at a time, each pair on a new line:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
char ch[3] = "";
ifstream file("example.txt");
while (file.read(ch, sizeof(ch)-1))
{
cout << ch << endl;
}
return 0;
}
My problem is that if the number of characters is odd, it doesn't display the last character in the text file.
My text file contains this:
abcdefg
It doesn't display the letter g in the console.
It displays this:
ab
cd
ef
I want it to display like this:
ab
cd
ef
g
I want to use this to read 1000 characters at a time from a large file, so I don't want to read character by character, which takes a lot of time. If you can fix the problem or have a better suggestion, please share it.
The following piece of code should work:
while (file) {
file.read(ch, sizeof(ch) - 1);
int number_read_chars = file.gcount();
// print chars here ...
}
By moving the read call into the loop, you'll be able to handle the last call, where too few characters are available. The gcount method will provide you with the information how many characters were actually read by the last unformatted input operation, e.g. read.
Please note, when reading less than sizeof(ch) chars, you manually have to insert a NUL character at the position returned by gcount, if you intend to use the buffer as a C string, as those are null terminated:
ch[file.gcount()] = '\0';
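Put together, the gcount approach could look like the sketch below. `read_chunks` is a name invented for this example; it takes any `std::istream`, so it works with an `std::ifstream` as in the question or with an `std::istringstream` for a quick test. Building each chunk from a pointer plus a length avoids the manual NUL termination.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Hypothetical helper: reads the stream in pieces of at most chunk_size
// characters and returns them, including a shorter final piece.
std::vector<std::string> read_chunks(std::istream& in, std::size_t chunk_size)
{
    assert(chunk_size > 0); // a zero-sized read would never reach end of file

    std::vector<std::string> chunks;
    std::string buffer(chunk_size, '\0');
    while (in) {
        in.read(&buffer[0], static_cast<std::streamsize>(chunk_size));
        const std::streamsize got = in.gcount(); // characters actually read
        if (got > 0) {
            // Construct from pointer + length, so no terminator is needed.
            chunks.emplace_back(buffer.data(), static_cast<std::size_t>(got));
        }
    }
    return chunks;
}
```

For the file in the question, `read_chunks(file, 2)` would yield the pieces to print one per line; for the large file, pass 1000 instead.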

String handling with Nordic characters is difficult in C++

I have tried many ways to solve this problem. I just want to split a string or do something with each character. As soon as there are Nordic characters in the string, it's not possible to split that string.
The length() function returns the right answer if we look at memory use, but that's not the same as the string length. "ABCÆØÅ" does not have 6 as the length, it has 9: one extra for each special character.
Does anybody have a good answer?
The test below shows the problem: some letters and a lot of ? marks. :-(
int main()
{
string name = "some æøå string";
for_each(name.begin(), name.end(), [] (char c) {
cout << c;
cout << endl;
});
}
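If your terminal supports UTF-8 encoding, there should be no problem using std::cout with the string you enter, but you need to tell the compiler that you typed a UTF-8 string, like this (note that since C++20 a u8 literal has type const char8_t[] and no longer initializes a std::string directly):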
int main()
{
string name = u8"some æøå string";
for_each(name.begin(), name.end(), [] (char c) {
cout << c;
cout << endl;
});
cout<<name; //this will also work
return 0; //add this just to be tidy
}
You need to do that because a character in UTF-8 might need 1, 2, 3 or 4 bytes depending on which character it is.
Then, depending on what you need to do (for example, split between characters), you should create a function that detects how long each UTF-8 character is. Then you create a 'string' for each UTF-8 character and extract as many characters as needed from the original string.
There is a very good, very compact library, utf8proc, that lets you do such things.
utf8proc has helped me resolve these kinds of issues in many projects.
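The "detect how long each UTF-8 character is" step can be sketched by looking at the lead byte of each sequence. This is only an illustration under the assumption that the input is valid UTF-8; the names `utf8_sequence_length` and `split_utf8` are made up here and are not utf8proc functions. It does no validation and does not handle combining characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Unexpected bytes count as length 1 so the caller always moves forward.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 1;                            // stray continuation or invalid byte
}

// Splits a UTF-8 string into one std::string per encoded character.
std::vector<std::string> split_utf8(const std::string& text)
{
    std::vector<std::string> characters;
    for (std::size_t i = 0; i < text.size();) {
        const std::size_t len =
            utf8_sequence_length(static_cast<unsigned char>(text[i]));
        assert(len >= 1);
        characters.push_back(text.substr(i, len)); // substr clamps at the end
        i += len;
    }
    return characters;
}
```

With this, "ABCÆØÅ" encoded as UTF-8 has `size()` 9 but splits into 6 pieces, each of which can be printed on its own line.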

Bit manipulation on character string

Can we apply bit manipulation on a character string?
If so, is it always possible to retrieve back a character string from the manipulated string?
I was hoping to use the XOR operator on two strings by converting them to binary and then back to character string.
I took up some code from another StackOverflow question but it only solves half the problem
std::string TextToBinaryString(string words)
{
string binaryString = "";
for (char& _char : words)
{
binaryString +=std::bitset<8>(_char).to_string();
}
return binaryString;
}
I don't know how to convert this string of ones and zeroes back to a string of characters.
I did see std::stoi suggested as a solution in some Google search results, but I was not able to understand them.
The manipulation that I wish to do is
std::string message("Hello World");
int n = message.size();
bin_string = TextToBinaryString(message)
std::string left,right;
bin_string.copy(left,n/2,0);
bin_string.copy(right,n,n/2);
std::string result = left^right;
I know I can hardcode this by picking up every entry and applying the operation but it is the conversion of the binary string back to characters that are making me scratch my head.
EDIT: I am trying to implement a cipher framework called a Feistel cipher (sorry, I should have made that clear before). It uses the property of XOR that when you XOR something with the same thing again it cancels out, e.g. (A^B)^B=A. I wanted to output the ciphered gibberish in the middle. Hence the query.
Can we apply bit manipulation on a character string?
Yes.
A character is an integer type, so you can do anything to them you can do to any other integer. What happened when you tried?
If so, is it always possible to retrieve back a character string from the manipulated string?
No. It is sometimes possible to recover the original string, but some manipulations are not reversible.
XOR, the particular operation you asked about, is self-reversing, so it works in that case but not in general.
A cheesy example (depends on ASCII character set, don't do this in real code for converting case, etc. etc.)
#include <iostream>
#include <string>
int main() {
std::string s("a");
std::cout << "original: " << s << '\n';
s[0] ^= 0x20;
std::cout << "modified: " << s << '\n';
s[0] ^= 0x20;
std::cout << "restored: " << s << '\n';
}
shows (on an ASCII-compatible system)
original: a
modified: A
restored: a
Note that I'm not converting "a" into "1100001" first, then using XOR to (somehow) zero bit 5, giving "1000001", and then converting that back into "A". Why would I?
This part of your question suggests you don't understand the difference between values and representations: the character is always stored in binary. You can also always treat it as if it is stored in octal, or in decimal, or in hexadecimal - the choice of base only affects how we write (or print) the value, and not what the value is in itself.
Writing a Feistel cipher where the plaintext and key are the same length is trivial:
std::string feistel(std::string const &text, std::string const &key)
{
std::string result;
std::transform(text.begin(), text.end(), key.begin(),
std::back_inserter(result),
[](char a, char b) { return a^b; }
);
return result;
}
This doesn't work at all if the key is shorter, though - looping round the key appropriately is left as an exercise for the reader.
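One possible take on that exercise, as a sketch: index the key modulo its length so it wraps around. Note that this is just the repeating-key XOR step discussed above, not a complete Feistel network with rounds and half-block swaps, and `xor_with_key` is a name chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// XORs every character of text with the key, repeating the key as needed.
// Applying the function twice with the same key restores the original text.
std::string xor_with_key(const std::string& text, const std::string& key)
{
    assert(!key.empty()); // an empty key would mean a modulo by zero below

    std::string result(text);
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = static_cast<char>(result[i] ^ key[i % key.size()]);
    }
    return result;
}
```

Because the result may contain non-printable bytes (including zero bytes), it is safer to inspect it via `size()` and individual byte values than to print it directly.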
Oh, and printing the encoded string is unlikely to work nicely (unless your key is helpfully just a sequence of space characters, as above).
You probably want something like this:
#include<string>
#include<cassert>
using namespace std;
std::string someBitmanipulation(string words)
{
for (char& thechar : words)
{
thechar ^= 0x5A; // xor with 0x5A
}
return words; // words was passed by value, so this returns the modified copy
}
int main()
{
std::string original{ "ABC" };
// xor each char of original with 0x5a and put the result into manipulated
auto manipulated = someBitmanipulation(original);
// check if manipulating the manipulated string is the same as the original string
assert(original == someBitmanipulation(manipulated));
}
You don't need std::bitset at all.
Now change thechar ^= 0x5A; to say thechar |= 0x5A; and see what happens.

How could I copy data that contain '\0' character

I'm trying to copy data that contains '\0'. I'm using C++.
When my search turned up nothing, I decided to write my own function to copy data from one char* to another char*. But it doesn't return the wanted result!
My attempt is the following:
#include <iostream>
char* my_strcpy( char* arr_out, char* arr_in, int bloc )
{
char* pc= arr_out;
for(size_t i=0;i<bloc;++i)
{
*arr_out++ = *arr_in++ ;
}
*arr_out = '\0';
return pc;
}
int main()
{
char * out= new char[20];
my_strcpy(out,"12345aa\0aaaaa AA",20);
std::cout<<"output data: "<< out << std::endl;
std::cout<< "the length of my output data: " << strlen(out)<<std::endl;
system("pause");
return 0;
}
the result is here:
I don't understand what is wrong with my code.
Thank you for help in advance.
Your my_strcpy is working fine. When you write a char* to cout or calculate its length with strlen, they stop at \0 as per C string behaviour. By the way, you can use memcpy to copy a block of char regardless of \0.
If you know the length of the 'string' then use memcpy. Strcpy will halt its copy when it meets a string terminator, the \0. Memcpy will not, it will copy the \0 and anything that follows.
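A minimal sketch of the memcpy approach, assuming the caller knows the byte count. `copy_bytes` is a hypothetical helper name; it returns a std::string because that type keeps its own length and so preserves embedded zero bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies exactly `count` bytes from source, zero bytes included.
std::string copy_bytes(const char* source, std::size_t count)
{
    assert(source != nullptr || count == 0);

    std::string destination(count, '\0');
    if (count > 0) {
        // memcpy copies a fixed number of bytes and never looks for '\0'.
        std::memcpy(&destination[0], source, count);
    }
    return destination;
}
```

For a string literal stored in an array, `sizeof data - 1` gives the number of data bytes without the final terminator, which avoids counting them by hand: with `const char data[] = "12345aa\0aaaaa AA";`, `std::strlen(data)` stops at the embedded zero, while `copy_bytes(data, sizeof data - 1)` keeps everything after it.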
(Note: For any readers who are unaware that \0 is a single-character byte with value zero in string literals in C and C++, not to be confused with the \\0 expression that results in a two-byte sequence of an actual backslash followed by an actual zero in the string... I will direct you to Dr. Rebmu's explanation of how to split a string in C for further misinformation.)
C++ strings can maintain their length independent of any embedded \0. They copy their contents based on this length. The only catch is that the constructor taking a C string and no length is guided by the null terminator as to what you wanted the length to be.
To override this, you can pass in a length explicitly. Make sure the length is accurate, though. You have 16 bytes of data, and 17 if you want the null terminator in the string literal to make it into your string as part of the data.
#include <iostream>
#include <string>
using namespace std;
int main() {
string str ("12345aa\0aaaaa AA", 17);
string str2 = str;
cout << str;
cout << str2;
return 0;
}
(Try not to hardcode such lengths if you can avoid it. Note that you didn't count it right, and when I corrected another answer here they got it wrong as well. It's error prone.)
On my terminal that outputs:
12345aaaaaaa AA
12345aaaaaaa AA
But note that what you're doing here is actually streaming a 0 byte to the stdout. I'm not sure how formalized the behavior of different terminal standards are for dealing with that. Things outside of the printable range can be used for all kinds of purposes depending on the kind of terminal you're running... positioning the cursor on the screen, changing the color, etc. I wouldn't write out strings with embedded zeros like that unless I knew what the semantics were going to be on the stream receiving them.
If what you're dealing with are bytes, consider using a std::vector<char> instead so as not to confuse the issue. Many libraries offer alternatives, such as Qt's QByteArray.
Your function is fine (except that you should pass to it 17 instead of 20). If you need to output null characters, one way is to convert the data to std::string:
std::string outStr(out, out + 17);
std::cout<< "output data: "<< outStr << std::endl;
std::cout<< "the length of my output data: " << outStr.length() <<std::endl;
I don't understand what is wrong with my code.
my_strcpy(out,"12345aa\0aaaaa AA",20);
Your string contains the character '\', which is interpreted as the start of an escape sequence. To prevent this you have to double the backslash:
my_strcpy(out,"12345aa\\0aaaaa AA",20);
Test
output data: 12345aa\0aaaaa AA
the length of my output data: 18
Your string is already terminated midway.
my_strcpy(out,"12345aa\0aaaaa AA",20);
Why do you intend to have \0 in between like that? Use some other delimiter if you so desire.
Otherwise, since std::cout and strlen interpret a \0 as a string terminator, you get surprises.
What I mean is: follow the convention, i.e. '\0' as the string terminator.

C++ printf: newline (\n) from commandline argument

How do I print a format string passed as an argument?
example.cpp:
#include <cstdio>
int main(int ac, char* av[])
{
printf(av[1],"anything");
return 0;
}
try:
example.exe "print this\non newline"
output is:
print this\non newline
instead I want:
print this
on newline
No, do not do that! That is a very severe vulnerability. You should never accept format strings as input. If you would like to print a newline whenever you see a "\n", a better approach would be:
#include <iostream>
#include <cstdlib>
int main(int argc, char* argv[])
{
if ( argc != 2 ){
std::cerr << "Exactly one parameter required!" << std::endl;
return 1;
}
int idx = 0;
const char* str = argv[1];
while ( str[idx] != '\0' ){
if ( (str[idx]=='\\') && (str[idx+1]=='n') ){
std::cout << std::endl;
idx+=2;
}else{
std::cout << str[idx];
idx++;
}
}
return 0;
}
Or, if you are including the Boost C++ Libraries in your project, you can use the boost::replace_all function to replace instances of "\\n" with "\n", as suggested by Pukku.
At least if I understand correctly, your question is really about converting the "\n" escape sequence into a new-line character. That happens at compile time, so if (for example) you enter the "\n" on the command line, it gets printed out as "\n" instead of being converted to a new-line character.
I wrote some code years ago to convert escape sequences when you want it done. Please don't pass it as the first argument to printf though. If you want to print a string entered by the user, use fputs, or the "%s" conversion format:
int main(int argc, char **argv) {
if (argc > 1)
printf("%s", translate(argv[1]));
return 0;
}
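The translate function itself is not shown in the answer above. As a rough stand-in, here is a minimal sketch that handles only \n, \t and \\ and leaves every other backslash sequence untouched. The name `translate_escapes` and the choice of supported escapes are assumptions made for this illustration, and it returns a std::string rather than a char*.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a translate() helper: converts the two-character
// sequences \n, \t and \\ typed by the user into the characters they name.
std::string translate_escapes(const std::string& input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '\\' && i + 1 < input.size()) {
            switch (input[i + 1]) {
                case 'n':  out += '\n'; ++i; continue; // skip the 'n' as well
                case 't':  out += '\t'; ++i; continue;
                case '\\': out += '\\'; ++i; continue;
                default:   break; // unknown escape: keep the backslash as typed
            }
        }
        out += input[i];
    }
    assert(out.size() <= input.size()); // translation never grows the text
    return out;
}
```

The result is still printed as data, never as a format string, for example with `std::fputs(translate_escapes(argv[1]).c_str(), stdout);` or `std::cout << translate_escapes(argv[1]);`.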
You can't do that because \n and the like are parsed by the C compiler. In the generated code, the actual numerical value is written.
What this means is that your input string will have to actually contain the character value 13 (or 10 or both) to be considered a new line because the C functions do not know how to handle these special characters since the C compiler does it for them.
Alternatively you can just replace every instance of \\n with \n in your string before sending it to printf.
Passing user arguments directly to printf enables an exploit known as a format string attack.
See Wikipedia for much more detail.
There's no way to automatically have the string contain a newline. You'll have to do some kind of string replace on your own before you use the parameter.
It is only the compiler that converts \n etc to the actual ASCII character when it finds that sequence in a string.
If you want to do it for a string that you get from somewhere, you need to manipulate the string directly and replace the string "\n" with a CR/LF etc. etc.
If you do that, don't forget that "\\" becomes '\' too.
Please never ever use char* buffers in C++, there is a nice std::string class that's safer and more elegant.
I know the answer, but is this thread still active?
By the way, you can try:
example.exe "print this$(echo -e "\n ")on newline"
I tried it and it worked.
Regards,
Shahid nx