What's the best hash function for a hex string? - c++

I'm looking to encode a set of hexadecimal values stored in strings using a hash function. Since the hex "alphabet" is composed of only 16 letters, what would be the best hash algorithm with the least amount of collisions?

This is a bit too general a question, as you left out any constraints on the hash function and what you're going to do with the hashes. (On a side note, hashing isn't an encoding.)
That being said, with an alphabet of 16 letters, you need 4 bits to store each one, i.e. you could cram each pair of letters into a single byte and build an XOR sum over those bytes to get an 8-bit hash. Of course, that can be extended to any other word length, too (but you left out too much information). For instance, like this:
uint8_t
hexhash(const char *str)
{
    uint8_t res = 0;
    while (*str && *(str + 1)) {
        res ^= (fromchar(*str) << 4) | fromchar(*(str + 1));
        str += 2; // EDIT: forgot this in my original reply
    }
    return res;
}
(where 'fromchar' is a function to return 0 for '0', 1 for '1', ..., 15 for 'f')
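For completeness, a minimal fromchar could look like this (a sketch assuming ASCII and lower-case hex input; add range checks as needed):

#include <stdint.h>

static uint8_t fromchar(char c)
{
    /* '0'..'9' map to 0..9, 'a'..'f' map to 10..15 */
    return (c <= '9') ? (uint8_t)(c - '0') : (uint8_t)(c - 'a' + 10);
}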

Related

Add a bit value to a string

I am trying to send a packet over a network and so want it to be as small as possible (in terms of size).
Each of the inputs can contain a common prefix substring, like ABCD. In such cases, I just want to send a single bit, say 1, denoting that the current string has the same prefix ABCD, and append it to the remaining string. So, if the string was ABCDEF, I will send 1EF; if it was LKMPS, I wish to send the string LKMPS as is.
Could someone please point out how I could add a bit to a string?
Edit: I get that adding a 1 to a string does not mean that this 1 is a bit - it is just a character that I added to the string. And that exactly is my question - for each string, how do I send a bit denoting that the prefix matches? And then send the remaining part of the string that is different?
In common networking hardware, you won't be able to send individual bits. And most architectures cannot address individual bits, either.
However, you can still minimize the size as you want by using one of the bits that you may not be using. For instance, if your strings contain only 7-bit ASCII characters, you could use the highest bit to encode the information you want in the first byte of the string.
For example, if the first byte is:
0b01000001 == 0x41 == 'A'
Then set the highest bit using |:
(0b01000001 | 0x80) == 0b11000001 == 0xC1
To test for the bit, use &:
(0b01000001 & 0x80) == 0
(0b11000001 & 0x80) != 0
To remove the bit (in the case where it was set) to get back the original first byte:
(0b11000001 & 0x7F) == 0b01000001 == 0x41 == 'A'
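Put together, a minimal sketch of that scheme (assuming 7-bit ASCII payloads; the string "EF" is just an example):

#include <cassert>
#include <string>

int main()
{
    std::string msg = "EF";                     // remainder after the shared prefix
    msg[0] = static_cast<char>(msg[0] | 0x80);  // set the flag: prefix present

    bool hasPrefix = (static_cast<unsigned char>(msg[0]) & 0x80) != 0;
    assert(hasPrefix);                          // test the flag

    msg[0] = static_cast<char>(msg[0] & 0x7F);  // clear the flag again
    assert(msg == "EF");
}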
If you're working with a buffer for use in your communications protocol, it should generally not be an std::string. Standard strings are not intended for use as buffers; and they can't generally be prepended in-place with anything.
It's possible that you may be better served by an std::vector<std::byte>, or by a (compile-time-fixed-size) std::array, or again by a class of your own making. That is especially true if you want your "in-place" prepending of bits or characters not merely to keep the same span of memory for your buffer, but to actually avoid moving any of the existing data. For that, you'd need twice the maximum length of the buffer and would start in the middle, so you can either append or prepend data without shifting anything, while maintaining bit-resolution "pointers" to the effective start and end of the full part of the buffer. This would be readily achievable with, yes you guessed it, your own custom buffer class.
I think the smallest amount of memory you can work with is 8 bits.
If you wanted to operate with bits, you could specify 8 prefixes as follows:
#include <cstdint>
#include <iostream>

enum message_header {
    prefix_on = 1 << 0,
    bitdata_1 = 1 << 1,
    bitdata_2 = 1 << 2,
    bitdata_3 = 1 << 3,
    bitdata_4 = 1 << 4,
    bitdata_5 = 1 << 5,
    bitdata_6 = 1 << 6,
    bitdata_7 = 1 << 7,
};

int main() {
    uint8_t a(0);
    a ^= prefix_on;          // toggle the prefix flag on
    if (a & prefix_on) {
        std::cout << "prefix_on" << std::endl;
    }
}
That being said, networks are pretty fast nowadays, so I wouldn't bother.

How to convert a string say "test" to unsigned int in c++

I have to use an encryption algorithm which takes an unsigned int as input. For this I want to convert my password, which is an alphanumeric 8-character string, to an int.
I am using the below code and am not sure if it works right. I want to convert my characters, say "test", to an unsigned integer.
I do get an output value. But I'm not sure if this is the right way of doing this and if there can be any side effects.
Can you please explain what actually is happening here?
unsigned int ConvertStringToUInt(CString Input)
{
    unsigned int output;
    output  = ((unsigned int)Input[3] << 24);
    output += ((unsigned int)Input[2] << 16);
    output += ((unsigned int)Input[1] << 8);
    output += ((unsigned int)Input[0]);
    return output;
}
For an input of "ABCD" the output of ConvertStringToUInt will be 0x44434241 because:
0x41 is the ASCII code of 'A'
0x42 is the ASCII code of 'B'
0x43 is the ASCII code of 'C'
0x44 is the ASCII code of 'D'
<< being the shift left operator.
So we have:
0x44 << 24 = 0x44000000
0x43 << 16 = 0x00430000
0x42 << 8 = 0x00004200
output =
0x44000000
+ 0x00430000
+ 0x00004200
+ 0x00000041
============
0x44434241
Be aware that your ConvertStringToUInt function only works if the length of the provided string is exactly 4, so this function is useless for your case because the length of your password is 8.
You can't do a unique mapping of a 8 character alphanumeric string to a 32 bit integer.
(10 + 26 + 26) ^ 8 is 218,340,105,584,896 (digits + upper-case and lower-case letters)
(10 + 26) ^ 8 is 2,821,109,907,456 (digits + case-insensitive letters)
2 ^ 32 is 4,294,967,296 (a 32 bit unsigned int)
So if you need to convert your 8 characters into a 32 bit number, you will need to use hashing. And that means that multiple passwords will map to the same key.
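As an illustration (my code, not part of the original answer, and not suitable for password security), a classic non-cryptographic 32-bit hash such as FNV-1a shows what such a many-to-one mapping looks like:

#include <cstdint>
#include <string>

uint32_t fnv1a32(const std::string& s)
{
    uint32_t h = 2166136261u;       // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;                     // mix in each byte
        h *= 16777619u;             // FNV prime
    }
    return h;
}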
Note that this is NOT encryption, because the mapping is not reversible. It cannot be reversible. This can be proven mathematically.
The Wikipedia page on hash functions is a good place to start learning about this. Also the page on the pigeonhole principle.
However, it should also be noted that 8 character passwords are too small to be secure. And if you are hashing to a 32 bit code, brute-force attacks will be easy.
What you are trying to do is to reinvent a hashing algorithm, very poorly. I strongly recommend using SHA-256 or an equivalent hashing algorithm available in your system's libraries, which is best practice and usually sufficient for transmitting and comparing passwords.
You should start reading the basics on that issue before writing any more code, or else the security level of your application won't be much better than with no hashing/encryption at all, plus a false sense of being on the safe side.

Convert hex- bin- or decimal string to long long in C++

I have this code which handles strings like "19485" or "10011010" or "AF294EC"...
long long toDecimalFromString(string value, Format format) {
    long long dec = 0;
    for (int i = value.size() - 1; i >= 0; i--) {
        char ch = value.at(i);
        int val = int(ch);
        if (ch >= '0' && ch <= '9') {
            val = val - 48;
        } else {
            val = val - 55;
        }
        dec = dec + val * (long long)(pow((int) format, (value.size() - 1) - i));
    }
    return dec;
}
This code works for all values which are not in 2's complement. If I pass a hex string which is supposed to be a negative number in decimal, I don't get the right result.
If you don't handle the minus sign, it won't handle itself. Check for it, and memorize the fact that you've seen it. Then, at the end, if you'd seen a '-' as the first character, negate the result.
Other points:
You don't need (nor want) to use pow: it's just result = format * result + digit each time through.
You do need to validate your input, making sure that the digit you obtain is legal in the base (and that you don't have any other odd characters).
You also need to check for overflow.
You should use isdigit and isalpha (or islower and isupper) for your character checking.
You should use e.g. val -= '0' (and not 48) for your conversion from character code to digit value.
You should use [i], and not at(i), to read the individual characters. Compile with the usual development options, and you'll get a crash, rather than an exception, in case of error. But you should probably use iterators, and not an index, to go through the string. It's far more idiomatic.
You should almost certainly accept both upper and lower case for the alphas, and probably skip leading white space as well.
Technically, there's also no guarantee that the alphabetic characters are in order and adjacent. In practice, I think you can count on it for characters in the range 'A'-'F' (or 'a'-'f'), but the surest way of converting a character to a digit is to use a table lookup.
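Putting those points together, a rough sketch (my code, not the original answer's; it assumes 'a'-'f' are adjacent as on ASCII machines, and omits the overflow check for brevity):

#include <cctype>
#include <stdexcept>
#include <string>

long long toDecimalFromString(const std::string& value, int base)
{
    auto it = value.begin();
    while (it != value.end() && std::isspace(static_cast<unsigned char>(*it)))
        ++it;                                        // skip leading white space
    bool negative = false;
    if (it != value.end() && (*it == '-' || *it == '+')) {
        negative = (*it == '-');                     // memorize the sign
        ++it;
    }
    long long result = 0;
    for (; it != value.end(); ++it) {
        unsigned char c = static_cast<unsigned char>(*it);
        int digit;
        if (std::isdigit(c))
            digit = c - '0';
        else if (std::isxdigit(c))
            digit = std::tolower(c) - 'a' + 10;      // accept both cases
        else
            throw std::invalid_argument("bad character");
        if (digit >= base)
            throw std::invalid_argument("digit not legal in this base");
        result = base * result + digit;              // no pow needed
    }
    return negative ? -result : result;
}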
You need to know whether the specified number is to be interpreted as signed or unsigned (in other words, is "ffffffff" -1 or 4294967295?).
If signed, then to detect a negative number test the most-significant bit. If the ms bit is set, then after converting the number as you do (generating an unsigned value), take the two's complement (bitwise negate it, then add 1).
Note: to test the ms bit you can't just test the leading character. If the number is signed, is "ff" supposed to be -1 or 255?. You need to know the size of the expected result (if 32 bits and signed, then "ffffffff" is negative, or -1. But if 64 bits and signed, "ffffffff' is positive, or 4294967295). Thus there is more than one right answer for the example "ffffffff".
Instead of testing ms bit you could just test if unsigned result is greater than the "midway point" of the result range (for example 2^31 -1 for 32-bit numbers).
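For a fixed, known width, the whole procedure might look like this sketch (assuming a 32-bit signed result; hexToSigned32 is a name I made up):

#include <cstdint>
#include <string>

int32_t hexToSigned32(const std::string& s)
{
    int64_t v = static_cast<int64_t>(std::stoull(s, nullptr, 16));
    if (v & 0x80000000ll)       // ms bit of the 32-bit field set?
        v -= 0x100000000ll;     // undo the two's-complement wrap
    return static_cast<int32_t>(v);
}

With this, hexToSigned32("ffffffff") yields -1, while hexToSigned32("7fffffff") yields 2147483647.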

Convert char[] of hexadecimal numbers to char[] of letters corresponding to the hexadecimal numbers in the ASCII table, and reversing it

I have a char a[] of hexadecimal characters like this:
"315c4eeaa8b5f8aaf9174145bf43e1784b8fa00dc71d885a804e5ee9fa40b16349c146fb778cdf2d3aff021dfff5b403b510d0d0455468aeb98622b137dae857553ccd8883a7bc37520e06e515d22c954eba5025b8cc57ee59418ce7dc6bc41556bdb36bbca3e8774301fbcaa3b83b220809560987815f65286764703de0f3d524400a19b159610b11ef3e"
I want to convert it to letters corresponding to each hexadecimal number like this:
68656c6c6f = hello
and store it in char b[] and then do the reverse
I don't want a block of code please, I want explanation and what libraries was used and how to use it.
Thanks
Assuming you are talking about ASCII codes: the first step is to find the size of b. Assuming every character is represented by 2 hexadecimal digits (for example, a tab would be 09), the size of b is simply strlen(a) / 2 + 1.
That done, you need to go through letters of a, 2 by 2, convert them to their integer value and store it as a string. Written as a formula you have:
b[i] = (to_digit(a[2*i]) << 4) + to_digit(a[2*i+1])
where to_digit(x) converts '0'-'9' to 0-9 and 'a'-'z' or 'A'-'Z' to 10-15.
Note that if characters below 0x10 are shown with only one character (the only one I can think of is tab), then instead of using 2*i as the index into a, you should keep a next_index in your loop which is advanced by 2 if a[next_index] < '8', or by 1 otherwise. In the latter case, b[i] = to_digit(a[next_index]).
The reverse of this operation is very similar. Each character b[i] is written as:
a[2*i] = to_char(b[i] >> 4)
a[2*i+1] = to_char(b[i] & 0xf)
where to_char is the opposite of to_digit.
Converting the hexadecimal string to a character string can be done by using std::substr to get the next two characters of the hex string, then using std::stoi to convert the substring to an integer. This can be cast to a character that is added to a std::string. The std::stoi function is C++11 only, and if you don't have it you can use e.g. std::strtol.
To do the opposite you loop over each character in the input string, cast it to an integer and put it in an std::ostringstream preceded by manipulators to have it presented as a two-digit, zero-prefixed hexadecimal number. Append to the output string.
Use std::string::c_str to get an old-style C char pointer if needed.
No external library, only using the C++ standard library.
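A short sketch of that approach (C++11 for std::stoi; the function names are mine):

#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

std::string hexToChars(const std::string& hex)
{
    std::string out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        out += static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16));
    return out;
}

std::string charsToHex(const std::string& s)
{
    std::ostringstream oss;
    for (unsigned char c : s)
        oss << std::hex << std::setw(2) << std::setfill('0')
            << static_cast<int>(c);              // two-digit, zero-prefixed
    return oss.str();
}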
Forward:
Read two hex chars from input.
Convert to int (0..255). (hint: sscanf is one way)
Append int to output char array
Repeat 1-3 until out of chars.
Null terminate the array
Reverse:
Read single char from array
Convert to 2 hexadecimal chars (hint: sprintf is one way).
Concat buffer from (2) to final output string buffer.
Repeat 1-3 until out of chars.
Almost forgot to mention: only stdio.h and the regular C runtime are required, assuming you're using sscanf and sprintf. You could alternatively create a pair of conversion tables that would radically speed up the conversions.
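For instance (a sketch of those steps; the caller provides buffers of strlen(hex)/2 + 1 and 2*strlen(s) + 1 bytes respectively):

#include <cstdio>
#include <cstring>

void hex_to_bytes(const char *hex, char *out)
{
    size_t n = strlen(hex) / 2;
    for (size_t i = 0; i < n; ++i) {
        unsigned v;
        sscanf(hex + 2 * i, "%2x", &v);   // read two hex chars as one byte
        out[i] = (char)v;
    }
    out[n] = '\0';                        // null terminate
}

void bytes_to_hex(const char *s, char *out)
{
    for (size_t i = 0; s[i] != '\0'; ++i)
        sprintf(out + 2 * i, "%02x", (unsigned)(unsigned char)s[i]);
}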
Here's a simple piece of code to do the trick:
#include <string>

unsigned int hex_digit_value(char c)
{
    if ('0' <= c && c <= '9') { return c - '0'; }
    if ('a' <= c && c <= 'f') { return c + 10 - 'a'; }
    if ('A' <= c && c <= 'F') { return c + 10 - 'A'; }
    return -1;   // invalid digit (wraps to UINT_MAX; see the notes below)
}

std::string dehexify(std::string const & s)
{
    std::string result(s.size() / 2, '\0');   // one output byte per digit pair
    for (std::size_t i = 0; i != s.size() / 2; ++i)
    {
        result[i] = hex_digit_value(s[2 * i]) * 16
                  + hex_digit_value(s[2 * i + 1]);
    }
    return result;
}
Usage:
char const a[] = "12AB";
std::string s = dehexify(a);
Notes:
A proper implementation would add checks that the input string length is even and that each digit is in fact a valid hex numeral.
Dehexifying has nothing to do with ASCII. It just turns any hexified sequence of nibbles into a sequence of bytes. I just use std::string as a convenient "container of bytes", which is exactly what it is.
There are dozens of answers on SO showing you how to go the other way; just search for "hexify".
Each hexadecimal digit corresponds to 4 bits, because 4 bits has 16 possible bit patterns (and there are 16 possible hex digits, each standing for a unique 4-bit pattern).
So, two hexadecimal digits correspond to 8 bits.
And on most computers nowadays (some Texas Instruments digital signal processors are an exception) a C++ char is 8 bits.
This means that each C++ char is represented by 2 hex digits.
So, simply read two hex digits at a time, convert to int using e.g. an istringstream, convert that to char, and append each char value to a std::string.
The other direction is just opposite, but with a twist.
Because char is signed on most systems, you need to convert to unsigned char before converting that value again to hex digits.
Conversion to and from hexadecimal can be done using hex, like e.g.
cout << hex << x;
cin >> hex >> x;
for a suitable definition of x, e.g. int x
This should work for string streams as well.
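For example, with string streams (a small sketch of the idea above):

#include <sstream>
#include <string>

int main()
{
    int x = 0;
    std::istringstream in("6c");
    in >> std::hex >> x;              // x is now 0x6c, i.e. 'l'

    std::ostringstream out;
    out << std::hex << int(static_cast<unsigned char>('l'));
    std::string digits = out.str();   // "6c"
}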

How to write individual bytes to file in C++

Given the fact that I generate a string containing "0" and "1" of a random length, how can I write the data to a file as bits instead of ASCII text?
Given my random string has 12 bits, I know that I should write 2 bytes (or add 4 more 0 bits to make 16 bits) in order to write the 1st byte and the 2nd byte.
Regardless of the size, given I have an array of char[8] or int[8] or a string, how can I write each individual group of bits as one byte in the output file?
I've googled a lot everywhere (it's my 3rd day looking for an answer) and didn't understand how to do it.
Thank you.
You don't do I/O with an array of bits.
Instead, you do two separate steps. First, convert your array of bits to a number. Then, do binary file I/O using that number.
For the first step, the types uint8_t and uint16_t found in <stdint.h> and the bit manipulation operators << (shift left) and | (or) will be useful.
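A minimal sketch of those two steps (the 12-bit string and the file name are made up for illustration; the padding and byte-order policy are up to you):

#include <cstdint>
#include <fstream>
#include <string>

int main()
{
    std::string bits = "101101010011";               // 12 bits, as in the question
    uint16_t value = 0;
    for (char c : bits)
        value = static_cast<uint16_t>((value << 1) | (c == '1'));  // step 1: bits -> number
    value = static_cast<uint16_t>(value << (16 - bits.size()));    // pad to 16 bits

    std::ofstream f("out.bin", std::ios::binary);                  // step 2: binary I/O
    f.write(reinterpret_cast<const char*>(&value), sizeof value);  // host byte order
}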
You haven't said what API you're using, so I'm going to assume you're using I/O streams. To write data to the stream just do this:
f.write(buf, len);
You can't write single bits, the best granularity you are going to get is bytes. If you want bits you will have to do some bitwise work to your byte buffer before you write it.
If you want to pack your 8-element array of chars into one byte, you can do something like this:
char data[8] = ...;
char byte = 0;
for (unsigned i = 0; i != 8; ++i)
{
    byte |= (data[i] & 1) << i;
}
f.put(byte);
If data contains ASCII '0' or '1' characters rather than actual 0 or 1 bits, replace the |= line with this:
byte |= (data[i] == '1') << i;
Make an unsigned char out of the bits in an array:
unsigned char make_byte(char input[8]) {
    unsigned char result = 0;
    for (int i = 0; i < 8; i++)
        if (input[i] != '0')
            result |= (1 << i);
    return result;
}
This assumes input[0] should become the least significant bit in the byte, and input[7] the most significant.
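Typical use, assuming make_byte above is visible in the same file and the stream is opened in binary mode (the file name is just an example):

#include <fstream>

int main()
{
    char bits[8] = {'1', '0', '1', '1', '0', '1', '0', '1'};
    unsigned char byte = make_byte(bits);    // bits[0] becomes the least significant bit
    std::ofstream f("bits.bin", std::ios::binary);
    f.put(static_cast<char>(byte));
}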