How to convert a string of hex values to a string? - c++

Say I have a string like:
string hex = "48656c6c6f";
Where every two characters correspond to the hex representation of their ASCII value, e.g.:
0x48 0x65 0x6c 0x6c 0x6f = "Hello"
So how can I get "Hello" from "48656c6c6f" without having to create an ASCII lookup table? atoi() obviously won't work here.

int len = hex.length();
std::string newString;
for(int i=0; i< len; i+=2)
{
std::string byte = hex.substr(i,2);
char chr = (char) (int)strtol(byte.c_str(), NULL, 16); // base 16: parse the pair as hex
newString.push_back(chr);
}

Hex digits are very easy to convert to binary:
// C++98 guarantees that '0', '1', ... '9' are consecutive.
// It only guarantees that 'a' ... 'f' and 'A' ... 'F' are
// in increasing order, but the only two alternative encodings
// of the basic source character set that are still used by
// anyone today (ASCII and EBCDIC) make them consecutive.
unsigned char hexval(unsigned char c)
{
if ('0' <= c && c <= '9')
return c - '0';
else if ('a' <= c && c <= 'f')
return c - 'a' + 10;
else if ('A' <= c && c <= 'F')
return c - 'A' + 10;
else abort();
}
So to do the whole string looks something like this:
void hex2ascii(const string& in, string& out)
{
out.clear();
out.reserve(in.length() / 2);
for (string::const_iterator p = in.begin(); p != in.end(); p++)
{
unsigned char c = hexval(*p);
p++;
if (p == in.end()) break; // incomplete last digit - should report error
c = (c << 4) + hexval(*p); // + takes precedence over <<
out.push_back(c);
}
}
You might reasonably ask why one would do it this way when there's strtol, and using it is significantly less code (as in James Curran's answer). Well, that approach is a full decimal order of magnitude slower, because it copies each two-byte chunk (possibly allocating heap memory to do so) and then invokes a general text-to-number conversion routine that cannot be written as efficiently as the specialized code above. Christian's approach (using istringstream) is five times slower than that. Here's a benchmark plot - you can tell the difference even with a tiny block of data to decode, and it becomes blatant as the input gets larger. (Note that both axes are on a log scale.)
Is this premature optimization? Hell no. This is the kind of operation that gets shoved in a library routine, forgotten about, and then called thousands of times a second. It needs to scream. I worked on a project a few years back that made very heavy use of SHA1 checksums internally -- we got 10-20% speedups on common operations by storing them as raw bytes instead of hex, converting only when we had to show them to the user -- and that was with conversion functions that had already been tuned to death. One might honestly prefer brevity to performance here, depending on what the larger task is, but if so, why on earth are you coding in C++?
Also, from a pedagogical perspective, I think it's useful to show hand-coded examples for this kind of problem; it reveals more about what the computer has to do.
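The comment in hex2ascii notes that an incomplete last digit should report an error. Here is a minimal sketch of one way to do that, assuming a bool return value is acceptable to the caller. The names hex_digit_value and hex_to_bytes are hypothetical, not part of the answer above, and the digit test makes the same ASCII/EBCDIC assumption as hexval.

```cpp
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hex_digit_value(unsigned char c)
{
    if ('0' <= c && c <= '9') return c - '0';
    if ('a' <= c && c <= 'f') return c - 'a' + 10;
    if ('A' <= c && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes pairs of hex digits from `in` into raw bytes in `out`.
// Returns false (leaving `out` empty) on odd length or a non-hex character,
// instead of calling abort().
bool hex_to_bytes(const std::string& in, std::string& out)
{
    out.clear();
    if (in.size() % 2 != 0) return false;   // incomplete last digit
    out.reserve(in.size() / 2);
    for (std::string::size_type i = 0; i < in.size(); i += 2)
    {
        int hi = hex_digit_value(static_cast<unsigned char>(in[i]));
        int lo = hex_digit_value(static_cast<unsigned char>(in[i + 1]));
        if (hi < 0 || lo < 0) { out.clear(); return false; }
        out.push_back(static_cast<char>((hi << 4) | lo));
    }
    return true;
}
```

Called with "48656c6c6f" this fills out with "Hello" and returns true; an input such as "486" or "4g" returns false.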

std::string str("48656c6c6f");
std::string res;
res.reserve(str.size() / 2);
for (int i = 0; i < str.size(); i += 2)
{
std::istringstream iss(str.substr(i, 2));
int temp;
iss >> std::hex >> temp;
res += static_cast<char>(temp);
}
std::cout << res;

strtol should do the job if you pass 16 as the base; alternatively, add 0x to each hex digit pair and pass 0 as the base.

Related

How to replace a char in string with another char fast(I think test didn't want common way)

I was asked this question in a technical test.
They asked how to change ' ' to '_' in a string.
I think they didn't want the common answer, like this (I am fairly sure of that):
void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar)
{
for(size_t i = 0 ; i < strLength ; i++)
{
if(originalStr[i] == originalChar)
{
originalStr[i] = newChar ;
}
}
}
So I answered like this: use a machine word. (I didn't actually write code; they just wanted an explanation of how to do it.)
My idea was to compare each 8 bytes of the string (on a 64-bit OS) with an 8-byte mask.
If they are equal, replace 8 bytes at a time.
When the CPU reads data smaller than a word, it has to do an extra operation to clear the remaining bits.
That is slow, so I tried to compare the chars a word at a time.
void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar)
{
size_t mask = 0;
size_t replaced = 0;
for(size_t i = 0 ; i < sizeof(size_t) ; i++)
{
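mask |= (size_t)(unsigned char)originalChar << (i * 8); // one copy per byte, so shift by whole bytes
replaced |= (size_t)(unsigned char)newChar << (i * 8);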
}
for(size_t i = 0 ; i < strLength ; i++)
{
// if 8 byte data equal with 8 byte data filled with originalChar
// replace 8 byte data with 8 byte data filled with newChar
if(i % sizeof(size_t) == 0 &&
strLength - i > sizeof(size_t) &&
*(size_t*)(originalStr + i) == mask)
{
*(size_t*)(originalStr + i) = replaced;
i += sizeof(size_t) - 1; // the loop's own i++ supplies the last step
continue;
}
if(originalStr[i] == originalChar)
{
originalStr[i] = newChar ;
}
}
}
Is there any faster way?
Do not try to optimize code when you do not know where its bottleneck is. Try to write clear, readable code.
This function declaration and definition
void replaceChar(char originalStr[], size_t strLength, char originalChar, char newChar)
{
for(size_t i = 0 ; i < strLength ; i++)
{
if(originalStr[i] == originalChar)
{
originalStr[i] = newChar ;
}
}
}
does not make sense because it duplicates the behavior of the standard algorithm std::replace.
Moreover, for such a simple general-purpose function the identifier names are too long.
If you need a similar function specifically for C strings, it can look, for example, like the one in the demonstration program below:
#include <iostream>
#include <cstring>
char * replaceChar( char s[], char from, char to )
{
for ( char *p = s; ( p = strchr( p, from ) ) != nullptr; ++p )
{
*p = to;
}
return s;
}
int main()
{
char s[] = "Hello C strings!";
std::cout << replaceChar( s, ' ', '_' ) << '\n';
return 0;
}
The program output is
Hello_C_strings!
As for your second function, it is unreadable. Using the continue statement in the body of a for loop makes its logic difficult to follow.
Since a character array is not necessarily aligned to the size of size_t, the function is not as fast as you think.
If you need a highly optimized function, you should write it directly in assembler.
The first thing in the road to being fast is being correct. The problem with the original proposal is that sizeof(s) should be a cached value of strlen(s). Then the obvious problem is that this approach scans the string twice -- first to find the terminating character and then the character to be replaced.
This should be addressed by a data structure with known length, or a data structure with enough guaranteed excess data that multiple bytes can be processed at once without undefined behaviour.
Once this is solved (the OP has been edited to fix this), the problem with the proposed approach of scanning 8 bytes of data for ALL the bytes being the same is that the generic case does not have 8 successive matching characters, but maybe only 7. In all those cases one would need to scan the same area twice (on top of scanning for the string's terminating character).
If the string length is not known, the best thing is to use a low level method:
while (*ptr != 0) {
if (*ptr == search_char) {
*ptr = replace_char;
}
++ptr;
}
If the string length is known, it's best to use the library algorithm std::replace, or its low-level counterpart:
for (std::size_t i = 0; i < size; ++i) {
if (str[i] == search_char) {
str[i] = replace_char;
}
}
Any decent compiler is able to autovectorize this, although the compiler might generate a larger variety of kernels than intended (one kernel for small sizes, one for intermediate and one to process in chunks of 32 or 64 bytes).
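For the known-length case, the std::replace call could look roughly like this. It is only a sketch; replace_in_string and replace_in_buffer are hypothetical wrapper names introduced for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Replaces every `from` with `to` in a std::string, whose length is known.
std::string replace_in_string(std::string s, char from, char to)
{
    std::replace(s.begin(), s.end(), from, to);
    return s;
}

// Same operation on a raw character buffer of known length.
void replace_in_buffer(char* buf, std::size_t len, char from, char to)
{
    std::replace(buf, buf + len, from, to);
}
```

For example, replace_in_string("Hello C strings!", ' ', '_') yields "Hello_C_strings!". The loop inside std::replace is the same simple form as the hand-written one above, so the optimizer sees the same code.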

Checking the size of hashes in C++

As one would do with a blockchain, I want to check if a hash satisfies a size requirement. This is fairly easy in Python, but I am having some difficulty implementing the same system in C++. To be clear about what I am after, this first example is the python implementation:
difficulty = 25
hash = "0000004fbbc4261dc666d31d4718566b7e11770c2414e1b48c9e37e380e8e0f0"
print(int(hash, 16) < 2 ** (256 - difficulty))
The main problem I'm having is with these numbers - it is difficult to deal with such large numbers in C++ (2 ** 256, for example). This is solved with the boost/multiprecision library:
boost::multiprecision::cpp_int x = boost::multiprecision::pow(boost::multiprecision::cpp_int(2), 256);
However, I cannot seem to find a way to convert my hash into a numeric value for comparison. Here is a generic example of what I am trying to do:
int main() {
string hash = "0000004fbbc4261dc666d31d4718566b7e11770c2414e1b48c9e37e380e8e0f0";
unsigned difficulty = 256 - 25; // the exponent must be an integer
cpp_int requirement = boost::multiprecision::pow(cpp_int(2), difficulty);
// Something to convert hash into a number for comparison (converted_hash)
if (converted_hash < requirement) {
cout << "True" << endl;
}
return 0;
}
The hash is either being received from my web server or from a local python script, in which case the hash is read into the C++ program via fstream. Either way, it will be a string upon arrival.
Since I am already integrating python into this project, I am not entirely opposed to simply using the Python version of this algorithm; however, sometimes taking the easier path prevents you from learning, so unless this is a really cumbersome task, I would like to try to accomplish it in C++.
Your basic need is to compute how many zero bits exist before the first non-zero bit. This has nothing to do with multi-precision really, it can be reformulated into a simple counting problem:
// takes hexadecimal ASCII [0-9a-fA-F]
inline int count_zeros(char ch) {
if (ch < '1') return 4;
if (ch < '2') return 3;
if (ch < '4') return 2;
if (ch < '8') return 1;
return 0; // see ASCII table, [a-zA-Z] are all greater than '8'
}
int count_zeros(const std::string& hash) {
int sum = 0;
for (char ch : hash) {
int zeros = count_zeros(ch);
sum += zeros;
if (zeros < 4)
break;
}
return sum;
}
A fun optimization is to realize there are two termination conditions for the loop, and we can fold them together if we check for characters less than '0' which includes the null terminator and also will stop on any invalid input:
// takes hexadecimal [0-9a-fA-F]
inline int count_zeros(char ch) {
if (ch < '0') return 0; // change 1
if (ch < '1') return 4;
if (ch < '2') return 3;
if (ch < '4') return 2;
if (ch < '8') return 1;
return 0; // see ASCII table, [a-zA-Z] are all greater than '8'
}
int count_zeros(const std::string& hash) {
int sum = 0;
for (const char* it = hash.c_str(); ; ++it) { // change 2
int zeros = count_zeros(*it);
sum += zeros;
if (zeros < 4)
break;
}
return sum;
}
This produces smaller code when compiled with g++ -Os.
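To tie the count back to the original comparison: for a 256-bit hash written as exactly 64 hex digits, int(hash, 16) < 2 ** (256 - difficulty) holds exactly when the top difficulty bits are zero. A minimal sketch under that assumption follows; leading_zero_bits, count_leading_zero_bits and meets_difficulty are hypothetical names, and the digit comparisons assume ASCII as in the answer above.

```cpp
#include <string>

// Leading zero bits contributed by one hex digit (ASCII [0-9a-fA-F]).
// Anything below '0', including the terminating '\0', counts as 0.
inline int leading_zero_bits(char ch)
{
    if (ch < '0') return 0;
    if (ch < '1') return 4;
    if (ch < '2') return 3;
    if (ch < '4') return 2;
    if (ch < '8') return 1;
    return 0; // '8', '9' and all letters have their top bit set
}

// Counts zero bits before the first set bit of the hex string.
int count_leading_zero_bits(const std::string& hash)
{
    int sum = 0;
    for (const char* it = hash.c_str(); ; ++it)
    {
        int zeros = leading_zero_bits(*it);
        sum += zeros;
        if (zeros < 4) break; // first non-zero digit, or end of string
    }
    return sum;
}

// Assumes `hash` is a full-width, 64-digit hex encoding of a 256-bit value.
bool meets_difficulty(const std::string& hash, int difficulty)
{
    return count_leading_zero_bits(hash) >= difficulty;
}
```

With the hash from the question, the six leading '0' digits give 24 bits and the '4' gives one more, so a difficulty of 25 passes and 26 does not.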

Comparing a char

So, I am trying to figure out the best/simplest way to do this. For my algorithms class we are supposed to read in a string (containing up to 40 characters) from a file and use the first character of the string (data[1]; we are starting the array at 1 and want to use data[0] for something else later) as the number of rotations (up to 26) by which to rotate the letters that follow (it's a Caesar cipher, basically).
An example of what we are trying to do is read in from a file something like : 2ABCD and output CDEF.
I've definitely made attempts, but I am just not sure how to compare the first letter in the array char[] to see which number, up to 26, it is. This is how I had it implemented (not the entire code, just the part that I'm having issues with):
int rotation = 0;
char data[41];
for(int i = 0; i < 41; i++)
{
data[i] = 0;
}
int j = 0;
while(!infile.eof())
{
infile >> data[j+1];
j++;
}
for(int i = 1; i < 27; i++)
{
if( i == data[1])
{
rotation = i;
cout << rotation;
}
}
My output is always 0 for rotation.
I'm sure the problem lies in the fact that I am trying to compare a char to a number and will probably have to convert to ascii? But I just wanted to ask and see if there was a better approach and get some pointers in the right direction, as I am pretty new to C++ syntax.
Thanks, as always.
Instead of formatted input, use unformatted input. Use
data[j+1] = infile.get();
instead of
infile >> data[j+1];
Also, the comparison of i to data[1] needs to be different.
for(int i = 1; i < 27; i++)
{
if( i == data[1]-'0')
// ^^^ need this to get the number 2 from the character '2'.
{
rotation = i;
std::cout << "Rotation: " << rotation << std::endl;
}
}
You can do this using modulo math, since characters can be treated as numbers.
Let's assume only uppercase letters (which makes the concept easier to understand).
Given:
static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const std::string original_text = "MY DOG EATS HOMEWORK";
std::string encrypted_text;
The loop:
for (unsigned int i = 0; i < original_text.size(); ++i)
{
Let's convert the character in the string to a number:
char c = original_text[i];
unsigned int cypher_index = c - 'A';
The cypher_index now contains the alphabetic offset of the letter, e.g. 'A' has index of 0.
Next, we rotate the cypher_index by adding an offset and using modulo arithmetic to "circle around":
cypher_index += (rotation_character - 'A'); // Add in the offset.
cypher_index = cypher_index % (sizeof(letters) - 1); // Wrap around; sizeof also counts the '\0', so subtract 1 to get 26.
Finally, the new, shifted, letter is created by looking up in the letters array and append to the encrypted string:
encrypted_text += letters[cypher_index];
} // End of for loop.
The modulo operation, using the % operator, is great for when a "wrap around" of indices is needed.
With some more arithmetic and arrays, the process can be expanded to handle all letters and also some symbols.
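Put together, the modulo approach might look like the sketch below. The function name rotate_upper is hypothetical, the code assumes an encoding such as ASCII where 'A' to 'Z' are contiguous, and characters outside that range (spaces, for instance) are passed through unchanged rather than indexed.

```cpp
#include <string>

// Rotates uppercase letters by `shift` positions, wrapping from 'Z' to 'A'.
// Any other character is copied as-is.
std::string rotate_upper(const std::string& text, int shift)
{
    const int alphabet = 26;
    shift = ((shift % alphabet) + alphabet) % alphabet; // normalise to 0..25
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        char c = text[i];
        if (c >= 'A' && c <= 'Z')
        {
            int index = (c - 'A' + shift) % alphabet;   // wrap around
            result += static_cast<char>('A' + index);
        }
        else
        {
            result += c;
        }
    }
    return result;
}
```

For the example in the question, rotate_upper("ABCD", 2) gives "CDEF", and rotate_upper("XYZ", 3) wraps around to "ABC".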
First of all, casting the data chars to int before comparing them is not enough: putting (int) before an element of the char array keeps its character code, so (int)'2' is 50, not 2. Subtract '0' to get the digit's value.
Second, keep in mind that the ASCII table doesn't start with digits or letters. Control characters and symbols come first; the digits start at 48 and the uppercase letters at 65. So data[1] holds a number well above 26, and i never becomes equal to it.
The ASCII integer value of uppercase letters ranges from 65 to 90. In C and its descendents, you can just use 'A' through 'Z' in your for loop:
change
for(int i = 1; i < 27; i++)
to
for(int i = 'A'; i <= 'Z'; i++)
and you'll be comparing uppercase values. The statement
cout << rotation;
will print the ASCII values read from infile.
How much of the standard library are you permitted to use? Something like this would likely work better:
#include <iostream>
#include <string>
#include <sstream>
int main()
{
int rotation = 0;
std::string data;
std::stringstream ss( "2ABCD" );
ss >> rotation;
ss >> data;
for ( int i = 0; i < data.length(); i++ ) {
data[i] += rotation;
}
// C++11
// for ( auto& c : data ) {
// c += rotation;
// }
std::cout << data;
}
I used a stringstream instead of a file stream for this example, so just replace ss with your infile. Also note that I didn't handle the wrap-around case (i.e., Z += 1 isn't going to give you A; you'll need to do some extra handling here), because I wanted to leave that to you :)
The reason your rotation is always 0 is that i is never == data[1]. ASCII character digits do not have the same underlying numeric value as the integers they represent. For example, if data[1] is '5', its integer value is actually 53. Hint: you'll need to know these values when you handle the wrap-around case. Do a quick google for "ASCII table" and you'll see all the different values.
Your determination of the rotation is also flawed in that you're only checking data[1]. What happens if you have a two-digit number, like 10?
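One way to cope with a multi-digit rotation is to let the stream extract the whole number before reading the letters. This is a rough sketch rather than the assignment's required approach; parse_rotation is a hypothetical name, and it reads from a string so it can be tried without a file.

```cpp
#include <sstream>
#include <string>

// Splits input such as "10ABCD" into rotation = 10 and letters = "ABCD".
// Returns false if the input does not start with a number.
bool parse_rotation(const std::string& line, int& rotation, std::string& letters)
{
    letters.clear();
    std::istringstream in(line);
    if (!(in >> rotation)) return false; // formatted input reads all the digits
    in >> letters;                       // the rest, up to the next whitespace
    return true;
}
```

Parsing "10ABCD" this way yields 10 and "ABCD". For a single digit character, subtracting '0' gives the same kind of result: '5' - '0' is 5.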

Having an issue with character displacement

I'm reading a file with a line of text. I'm reading the file and changing the characters based on a displacement given by the user. While it works for some characters, it doesn't for others beyond a certain point.
My file contains this text: "This is crazy".
When I run my code with a displacement of 20, this is what I get:
▒bc▒ c▒ w▒u▒▒
string Security::EncWordUsingRot(int rotNum, string word)
{
rotNum = rotNum%26;
string encWord = word;
for (int i = 0; i < word.size(); i++)
{
char c = word[i];
c = tolower(c);
if ((c < 'a') || (c > 'z'))
encWord[i] = c;
else
{
c = (c + rotNum);
if (c > 'z')
c = (c - 26);
}
encWord[i] = c;
}
return encWord;
}
EDIT
I changed the commented sections to correct my error. I changed unsigned char c = word[i] back to char c = word[i]. I also added another two lines of code that took care of the value of c being lower than 'a'. I did this because I noticed an issue when I wanted to essentially return the encrypted statement to its original form.
string Security::EncWordUsingRot(int rotNum, string word)
{
rotNum = rotNum%26;
string encWord = word;
for (int i = 0; i < word.size(); i++)
{
char c = word[i]; //removed unsigned
c = tolower(c);
if ((c < 'a') || (c > 'z'))
encWord[i] = c;
else
{
c = (c + rotNum);
if (c > 'z')
c = (c - 26);
if (c < 'a') //but I added this if statement if the value of c is less than 'a'
c = (c + 26);
}
encWord[i] = c;
}
return encWord;
}
Change:
char c = word[i];
To:
unsigned char c = word[i];
In C and C++ you should always pay attention to numeric overflow because the language assumption is that a programmer will never make such a mistake.
A char is a kind of integer and is quite often 8 bits and signed, giving it an acceptable range of -128...127. This means that when you store a value in a char variable you should never exceed those bounds.
char is also a "storage type" meaning that computations are never done using chars and for example
char a = 'z'; // numeric value is 122 in ASCII
int x = a + 20; // 122 + 20 = 142
x will actually get the value 142 because the computation did not "overflow" (all char values are first converted to integers in an expression)
However, storing a value bigger than the allowable range in a variable is not portable (before C++20 the result of the conversion is implementation-defined), and code like
char a = 'z'; // numeric value is 122 in ASCII
char x = a + 20; // 122 + 20 = 142 (too big, won't fit)
is not acceptable: the computation is fine but the result doesn't fit into x.
Storing a value outside the valid range for signed chars in a signed char variable is exactly what your code did and that's the reason for the strange observed behaviour.
A simple solution is to use an integer to store the intermediate results instead of a char.
A few more notes:
A few functions about chars are indeed handling integers because they must be able to handle the special value EOF in addition to all valid chars. For example fgetc returns an int and isspace accepts an int (they return/accept either the code of the char converted to unsigned or EOF).
char could be signed or not depending on the compiler/options; if unsigned and 8-bit wide the allowable range is 0...255
Most often when storing a value outside bounds into a variable you simply get a "wrapping" behavior, but before C++20 this is not guaranteed. Overflow in signed arithmetic itself is undefined behaviour: for example, for int x and y a compiler is allowed to optimize (x + 20 < y + 20) to (x < y) because the assumption is that the programmer will never ever overflow with signed numeric values.

Input C-style string and get the length

The string input format is like this
str1 str2
I don't know the number of characters in the input beforehand, so I need to store the 2 strings and get their lengths.
Using C-style strings, I tried to make use of the scanf library function but was unsuccessful in getting the length. This is what I have:
// M W are arrays of char with size 25000
while (T--)
{
memset(M,'0',25000);memset(W,'0',25000);
scanf("%s",M);
scanf("%s",W);
i = 0;m = 0;w = 0;
while (M[i] != '0')
{
++m; ++i; // incrementing till array reaches '0'
}
i = 0;
while (W[i] != '0')
{
++w; ++i;
}
cout << m << w;
}
This is not efficient, mainly because of the memset calls.
Note:
I'd be better off using std::string, but because of the 25000-character input and the memory constraints of cin I switched to this. If there is an efficient way to read a string, that would be good.
Aside from the answers already given, I think your code is slightly wrong:
memset(M,'0',25000);memset(W,'0',25000);
Do you really mean to fill the string with the character zero (value 48 or 0x30 [assuming ASCII, before some pedant downvotes my answer and points out that there are other encodings]), or with a NUL (the character with value zero)? The latter is 0, not '0'.
scanf("%s",M);
scanf("%s",W);
i = 0;m = 0;w = 0;
while (M[i] != '0')
{
++m; ++i; // incrementing till array reaches '0'
}
If you are looking for the end of the string, you should be using 0, not '0' (as per above).
Of course, scanf will put a 0 at the end of the string for you, so there's no need to fill the whole string with 0 [or '0'].
And strlen is an existing function that will give the length of a C-style string, and will most likely have a more clever algorithm than just checking each character and incrementing two variables, making it faster [for long strings at least].
You do not need memset when using scanf; scanf adds the terminating '\0' to the string.
Also, strlen is a simpler way to determine the string's length:
scanf("%s %s", M, W); // provided that M and W contain enough space to store the string
m = strlen(M); // don't forget #include <string.h>
w = strlen(W);
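Since %s on its own does not limit how much is written, a field width one less than the buffer size keeps scanf within bounds. A small sketch under the question's 25000-byte buffers; read_two is a hypothetical helper, and it uses sscanf on a line held in memory only so the example is self-contained (with stdin, the same format string works with scanf).

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Parses two whitespace-separated words from `line` into M and W, which must
// each have room for 25000 chars, and stores their lengths in m and w.
// Returns false if fewer than two words were found.
bool read_two(const char* line, char* M, char* W, std::size_t& m, std::size_t& w)
{
    // %24999s reads at most 24999 characters and then appends the '\0'.
    if (std::sscanf(line, "%24999s %24999s", M, W) != 2) return false;
    m = std::strlen(M);
    w = std::strlen(W);
    return true;
}
```

No memset is needed before each read, because every successful %s conversion terminates the string it writes.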
A C-style strlen that needs no memset may look like this:
#include <iostream>
using namespace std;
// Named my_strlen to avoid clashing with the standard library's strlen.
unsigned my_strlen(const char *str) {
const char *p = str;
unsigned len = 0;
while (*p != '\0') {
len++;
p++; // advance to the next character
}
return len;
}
int main() {
cout << my_strlen("C-style string");
return 0;
}
It returns 14.