How to convert a string, say "test", to an unsigned int in C++

I have to use an encryption algorithm which takes an unsigned int as its input. For this I want to convert my password, which is an alphanumeric 8-character string, to an int.
I am using the code below and am not sure if it works correctly. I want to convert my characters, say "test", to an unsigned integer.
I do get an output value, but I'm not sure whether this is the right way of doing it and whether there can be any side effects.
Can you please explain what actually is happening here?
unsigned int ConvertStringToUInt(CString Input)
{
    unsigned int output;
    output  = ((unsigned int)Input[3] << 24);
    output += ((unsigned int)Input[2] << 16);
    output += ((unsigned int)Input[1] << 8);
    output += ((unsigned int)Input[0]);
    return output;
}

For an input of "ABCD" the output of ConvertStringToUInt will be 0x44434241 because:
0x41 is the ASCII code of 'A'
0x42 is the ASCII code of 'B'
0x43 is the ASCII code of 'C'
0x44 is the ASCII code of 'D'
<< being the shift left operator.
So we have:
0x44 << 24 = 0x44000000
0x43 << 16 = 0x00430000
0x42 << 8 = 0x00004200
output =
0x44000000
+ 0x00430000
+ 0x00004200
+ 0x00000041
============
0x44434241
Be aware that your ConvertStringToUInt function only works if the length of the provided string is exactly 4, so this function is useless for your case because the length of your password is 8.
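If you do still want that 4-character packing itself, a minimal sketch with a length check could look like this (using std::string instead of CString; the assertion and names are my own additions):
#include <cassert>
#include <string>

// Pack exactly 4 characters into a 32-bit value, low character in the low byte.
unsigned int ConvertStringToUInt(const std::string& input)
{
    assert(input.size() == 4);  // the packing only fits 4 bytes into 32 bits
    return (static_cast<unsigned int>(static_cast<unsigned char>(input[3])) << 24) |
           (static_cast<unsigned int>(static_cast<unsigned char>(input[2])) << 16) |
           (static_cast<unsigned int>(static_cast<unsigned char>(input[1])) << 8)  |
            static_cast<unsigned int>(static_cast<unsigned char>(input[0]));
}

int main()
{
    assert(ConvertStringToUInt("ABCD") == 0x44434241);  // matches the worked example above
}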

You can't do a unique mapping of an 8-character alphanumeric string to a 32-bit integer.
(10 + 26 + 26) ^ 8 is 218,340,105,584,896 (digits + upper-case and lower-case letters)
(10 + 26) ^ 8 is 2,821,109,907,456 (digits + case-insensitive letters)
2 ^ 32 is 4,294,967,296 (a 32 bit unsigned int)
So if you need to convert your 8 characters into a 32-bit number, you will need to use hashing, and that means that multiple passwords will map to the same key.
Note that this is NOT encryption, because the mapping is not reversible. It cannot be reversible; this can be proven mathematically.
The Wikipedia page on hash functions is a good place to start learning about this. Also the page on the pigeonhole principle.
However, it should also be noted that 8-character passwords are too small to be secure. And if you are hashing to a 32-bit code, brute-force attacks will be easy.
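To illustrate what "hashing 8 characters down to 32 bits" means in practice, here is a minimal sketch using the well-known (non-cryptographic) FNV-1a hash. It only illustrates the many-to-one mapping; it is not a way to protect passwords (see the next answer for that):
#include <cstdint>
#include <iostream>
#include <string>

// 32-bit FNV-1a: every input string maps to some 32-bit value,
// so distinct passwords can (and will) collide.
uint32_t fnv1a32(const std::string& s)
{
    uint32_t hash = 2166136261u;   // FNV offset basis
    for (unsigned char c : s) {
        hash ^= c;
        hash *= 16777619u;         // FNV prime
    }
    return hash;
}

int main()
{
    std::cout << std::hex << fnv1a32("testtest") << '\n';
}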

What you are trying to do is reinvent a hashing algorithm, very poorly. I strongly recommend using SHA-256 or an equivalent hashing algorithm available in your system's libraries; that is best practice and usually sufficient for transmitting and comparing passwords.
You should start reading the basics on that issue before writing any more code, otherwise the security level of your application won't be much better than with no hashing/encryption at all, but with a false sense of being on the safe side. Start here, for instance.
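As a rough illustration only, assuming OpenSSL is available on your system (link with -lcrypto), hashing a password with its one-shot SHA256() function could look like the sketch below. A real application should also add a salt and use a dedicated password-hashing scheme:
#include <cstdio>
#include <string>
#include <openssl/sha.h>   // assumes OpenSSL is installed

int main()
{
    std::string password = "testtest";
    unsigned char digest[SHA256_DIGEST_LENGTH];   // 32 bytes

    // One-shot SHA-256 over the raw password bytes.
    SHA256(reinterpret_cast<const unsigned char*>(password.data()),
           password.size(), digest);

    // Print the digest as hexadecimal.
    for (unsigned char b : digest)
        std::printf("%02x", b);
    std::printf("\n");
    return 0;
}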

Related

16-bit to 10-bit conversion code explanation

I came across the following code to convert 16-bit numbers to 10-bit numbers and store it inside an integer. Could anyone maybe explain to me what exactly is happening with the AND 0x03?
// Convert the data to 10-bits
int xAccl = (((data[1] & 0x03) * 256) + data[0]);
if (xAccl > 511) {
    xAccl -= 1024;
}
Link to where I got the code: https://www.instructables.com/id/Measurement-of-Acceleration-Using-ADXL345-and-Ardu/
The bitwise operator & applies a mask; in this case, it clears the 6 highest bits of that byte.
Basically, this code does a modulo % 1024 (for unsigned values).
data[1] takes the 2nd byte; & 0x03 masks that byte with binary 11 - so: takes 2 bits; * 256 is the same as << 8 - i.e. pushes those 2 bits into the 9th and 10th positions; adding data[0] then combines the two bytes (personally I'd have used |, not +).
So, xAccl now holds the first 10 bits, with data[0] as the low byte (little-endian byte ordering).
The > 511 check seems to be a sign check; essentially, it is saying "if the 10th bit is set, treat the entire thing as a negative integer as though we'd used 10-bit two's complement rules".
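In other words, the snippet reconstructs a signed 10-bit reading from two bytes. A minimal standalone sketch of the same logic (the helper name and sample values are my own):
#include <cstdint>
#include <iostream>

// Combine the low byte and the two masked bits of the high byte,
// then sign-extend from 10 bits, exactly as in the snippet above.
int to10bit(uint8_t lo, uint8_t hi)
{
    int value = ((hi & 0x03) << 8) | lo;  // same as (hi & 0x03) * 256 + lo
    if (value > 511) value -= 1024;       // two's-complement sign extension
    return value;
}

int main()
{
    uint8_t data[2] = {0xFF, 0x03};                  // example raw reading
    std::cout << to10bit(data[0], data[1]) << '\n';  // prints -1
}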

Add a bit value to a string

I am trying to send a packet over a network and so want it to be as small as possible (in terms of size).
Each of the inputs can contain a common prefix substring, like ABCD. In such cases, I just want to send a single bit, say 1, denoting that the current string has the same prefix ABCD, and append it to the remaining string. So, if the string was ABCDEF, I will send 1EF; if it was LKMPS, I wish to send the string LKMPS as is.
Could someone please point out how I could add a bit to a string?
Edit: I get that adding a 1 to a string does not mean that this 1 is a bit - it is just a character that I added to the string. And that exactly is my question - for each string, how do I send a bit denoting that the prefix matches? And then send the remaining part of the string that is different?
In common networking hardware, you won't be able to send individual bits. And most architectures cannot address individual bits, either.
However, you can still minimize the size as you want by using one of the bits that you may not be using. For instance, if your strings contain only 7-bit ASCII characters, you could use the highest bit to encode the information you want in the first byte of the string.
For example, if the first byte is:
0b01000001 == 0x41 == 'A'
Then set the highest bit using |:
(0b01000001 | 0x80) == 0b11000001 == 0xC1
To test for the bit, use &:
(0b01000001 & 0x80) == 0
(0b11000001 & 0x80) != 0
To remove the bit (in the case where it was set) to get back the original first byte:
(0b11000001 & 0x7F) == 0b01000001 == 0x41 == 'A'
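A minimal sketch of that idea, assuming the payload consists of plain 7-bit ASCII bytes (the helper names are my own):
#include <cassert>

// Set, test, and clear the "has common prefix" flag in the top bit of the first byte.
void setPrefixFlag(unsigned char* buf)       { buf[0] |= 0x80; }
bool hasPrefixFlag(const unsigned char* buf) { return (buf[0] & 0x80) != 0; }
void clearPrefixFlag(unsigned char* buf)     { buf[0] &= 0x7F; }

int main()
{
    unsigned char msg[] = { 'E', 'F' };      // suffix after the shared "ABCD" prefix
    setPrefixFlag(msg);
    assert(hasPrefixFlag(msg) && msg[0] == 0xC5);  // 'E' (0x45) with the top bit set
    clearPrefixFlag(msg);
    assert(msg[0] == 'E');                   // original character restored
}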
If you're working with a buffer for use in your communications protocol, it should generally not be an std::string. Standard strings are not intended for use as buffers; and they can't generally be prepended in-place with anything.
It's possible that you may be better served by an std::vector<std::byte>; or by a (compile-time-fixed-size) std::array. Or, again, a class of your own making. That is especially true if you want your "in-place" prepending of bits or characters to not merely keep the same span of memory for your buffer, but to actually not move any of the existing data. For that, you'd need twice the maximum length of the buffer, and start it at the middle, so you can either append or prepend data without shifting anything - while maintaining bit-resolution "pointers" to the effective start and end of the full part of the buffer. This would be readily achievable with, yes you guessed it, your own custom buffer class.
I think the smallest amount of memory you can work with is 8 bits.
If you wanted to operate with bits, you could specify 8 prefixes as follows:
#include <cstdint>
#include <iostream>

enum message_header {
    prefix_on = 1 << 0,
    bitdata_1 = 1 << 1,
    bitdata_2 = 1 << 2,
    bitdata_3 = 1 << 3,
    bitdata_4 = 1 << 4,
    bitdata_5 = 1 << 5,
    bitdata_6 = 1 << 6,
    bitdata_7 = 1 << 7,
};

int main() {
    uint8_t a(0);
    a |= prefix_on;          // set the prefix flag
    if (a & prefix_on) {
        std::cout << "prefix_on" << std::endl;
    }
}
That being said, networks are pretty fast nowadays, so I wouldn't bother.

What's the best hash function for a hex string?

I'm looking to encode a set of hexadecimal values stored in strings using a hash function. Since the hex "alphabet" is composed of only 16 letters, what would be the best hash algorithm with the least amount of collisions?
Bit of a too general question, as you left out any constraints on the hash function and what you're going to do with the hashes. (On a side note, hashing isn't an encoding.)
That being said, with an alphabet of 16 letters you need 4 bits to store each letter, so you could build an XOR sum over pairs of letters crammed into a single byte to get an 8-bit hash. Of course, that can be extended to any other word length, too (but you left out too much information).
For instance, like this:
uint8_t
hexhash(const char *str)
{
    uint8_t res = 0;
    while (*str && *(str + 1)) {
        res ^= (fromchar(*str) << 4) | fromchar(*(str + 1));
        str += 2; // EDIT: forgot this in my original reply
    }
    return res;
}
(where 'fromchar' is a function to return 0 for '0', 1 for '1', ..., 15 for 'f')
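For completeness, a hypothetical fromchar could look like the fragment below; it would have to be defined (or at least declared) before hexhash, and the behaviour for invalid characters is my own choice:
#include <stdint.h>

/* Hypothetical fromchar(): maps '0'..'9', 'a'..'f', 'A'..'F' to 0..15
 * (and, for simplicity, anything else to 0). */
static uint8_t
fromchar(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return 0;
}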

printf: Displaying an SHA1 hash in hexadecimal

I have been following the msdn example that shows how to hash data using the Windows CryptoAPI. The example can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa382380%28v=vs.85%29.aspx
I have modified the code to use the SHA1 algorithm.
I don't understand how the code that displays the hash (shown below) in hexadecimal works; more specifically, I don't understand what the >> 4 operator and the & 0xf operator do.
if (CryptGetHashParam(hHash, HP_HASHVAL, rgbHash, &cbHash, 0)) {
    printf("MD5 hash of file %s is: ", filename);
    for (DWORD i = 0; i < cbHash; i++)
    {
        printf("%c%c", rgbDigits[rgbHash[i] >> 4],
               rgbDigits[rgbHash[i] & 0xf]);
    }
    printf("\n");
}
I would be grateful if someone could explain this for me, thanks in advance :)
x >> 4 shifts x right four bits. x & 0xf does a bitwise and between x and 0xf. 0xf has its four least significant bits set, and all the other bits clear.
Assuming rgbHash is an array of unsigned char, this means the first expression retains only the four most significant bits and the second expression the four least significant bits of the (presumably) 8-bit input.
Four bits is exactly what will fit in one hexadecimal digit, so each of those is used to look up a hexadecimal digit in an array which presumably looks something like this:
char rgbDigits[] = "0123456789abcdef"; // or possibly upper-case letters
This code uses simple bit-'filtering' techniques:
">> 4" means shift right by 4 places, which in turn means 'divide by 16'.
"& 0xf" is a bitwise AND operation, which means 'take the lowest 4 bits'.
Both of these values are then used as indices into rgbDigits, which presumably produces output in a valid, human-readable range.
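To see the nibble-splitting in isolation, here is a small standalone sketch (with made-up input bytes) that prints an arbitrary byte array as hexadecimal in the same way:
#include <cstdio>

int main()
{
    // ">> 4" extracts the high nibble, "& 0xf" the low nibble;
    // each nibble indexes one hexadecimal digit.
    const char digits[] = "0123456789abcdef";
    const unsigned char bytes[] = { 0xDE, 0xAD, 0xBE, 0xEF };

    for (unsigned char b : bytes)
        std::printf("%c%c", digits[b >> 4], digits[b & 0xf]);
    std::printf("\n");   // prints: deadbeef
}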

Convert 128-bit hexadecimal string to base-36 string

I have a 128-bit number in hexadecimal stored in a string (from md5, security isn't a concern here) that I'd like to convert to a base-36 string. If it were a 64-bit or less number I'd convert it to a 64-bit integer then use an algorithm I found to convert integers to base-36 strings but this number is too large for that so I'm kind of at a loss for how to approach this. Any guidance would be appreciated.
Edit: After Roland Illig pointed out the hassle of saying 0/O and 1/l over the phone, and that base-36 doesn't gain much data density over hex, I think I may end up staying with hex. I'm still curious, though, whether there is a relatively simple way to convert a hex string of arbitrary length to a base-36 string.
A base-36 encoding requires 6 bits to store each token. Same as base-64 but not using 28 of the available tokens. Solving 36^n >= 2^128 yields n >= log(2^128) / log(36) or 25 tokens to encode the value.
A base-64 encoding also requires 6 bits, all possible token values are used. Solving 64^n >= 2^128 yields n >= log(2^128) / log(64) or 22 tokens to encode the value.
Calculating the base-36 encoding requires dividing by powers of 36. No easy shortcuts, you need a division algorithm that can work with 128-bit values. The base-64 encoding is much easier to compute since it is a power of 2. Just take 6 bits at a time and shift by 6, in total 22 times to consume all 128 bits.
Why do you want to use base-36? Base-64 encoders are standard. If you really have a constraint on the token space (you shouldn't, ASCII rulez) then at least use a base-32 encoding. Or any power of 2, base-16 is hex.
If the only thing that is missing is the support for 128 bit unsigned integers, here is the solution for you:
#include <stdio.h>
#include <inttypes.h>

typedef struct {
    uint32_t v3, v2, v1, v0;
} uint128;

static void
uint128_divmod(uint128 *out_div, uint32_t *out_mod, const uint128 *in_num, uint32_t in_den)
{
    uint64_t x = 0;

    x = (x << 32) + in_num->v3;
    out_div->v3 = x / in_den;
    x %= in_den;

    x = (x << 32) + in_num->v2;
    out_div->v2 = x / in_den;
    x %= in_den;

    x = (x << 32) + in_num->v1;
    out_div->v1 = x / in_den;
    x %= in_den;

    x = (x << 32) + in_num->v0;
    out_div->v0 = x / in_den;
    x %= in_den;

    *out_mod = x;
}

int
main(void)
{
    uint128 x = { 0x12345678, 0x12345678, 0x12345678, 0x12345678 };
    uint128 result;
    uint32_t mod;

    uint128_divmod(&result, &mod, &x, 16);
    fprintf(stdout, "%08"PRIx32" %08"PRIx32" %08"PRIx32" %08"PRIx32" rest %08"PRIx32"\n",
            result.v3, result.v2, result.v1, result.v0, mod);
    return 0;
}
Using this function you can repeatedly compute the mod-36 result, which leads you to the number encoded as base-36.
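For instance, a sketch of that loop could look like this (reusing the uint128 struct and uint128_divmod function from the listing above; the digit order and the fixed count of 25 digits are my own choices):
#include <string>

// Repeatedly divide by 36 and collect the remainders as base-36 digits,
// most significant digit first. 25 digits are enough to cover 2^128.
std::string to_base36(uint128 x)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string out;

    for (int i = 0; i < 25; i++) {
        uint128 quotient;
        uint32_t remainder;
        uint128_divmod(&quotient, &remainder, &x, 36);
        out.insert(out.begin(), digits[remainder]);  // least significant digit ends up last
        x = quotient;
    }
    return out;
}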
If you are using C++ with .NET 4 you could always use the System.Numerics.BigInteger class. You could try calling one of the ToString overloads to get you to base 36.
Alternatively, look at one of the many big-integer libraries, e.g. Matt McCutchen's C++ Big Integer Library, although you might have to look into the depths of the classes to use a custom base such as 36.
Two things:
1. It really isn't that hard to divide a byte string by 36. But if you can't be bothered to implement that, you can use base-32 encoding, which would need 26 bytes instead of 25.
2. If you want to be able to read the result over the phone to humans, you absolutely must add a simple checksum to your string, which will cost one or two bytes but will save you a huge amount of 'Chinese whispers' hassle from hard-of-hearing customers.
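A minimal sketch of such a checksum (the scheme itself is my own choice, purely for illustration): append one extra base-36 digit equal to the sum of all digit values modulo 36, so that a single misheard character is detected.
#include <cassert>
#include <string>

const std::string kDigits = "0123456789abcdefghijklmnopqrstuvwxyz";

// Append one checksum digit: sum of all base-36 digit values, modulo 36.
std::string addChecksum(const std::string& code)
{
    unsigned sum = 0;
    for (char c : code)
        sum += static_cast<unsigned>(kDigits.find(c));  // assumes valid base-36 digits
    return code + kDigits[sum % 36];
}

// Recompute the checksum and compare against the received last digit.
bool verifyChecksum(const std::string& codeWithCheck)
{
    if (codeWithCheck.empty()) return false;
    std::string body = codeWithCheck.substr(0, codeWithCheck.size() - 1);
    return addChecksum(body) == codeWithCheck;
}

int main()
{
    std::string c = addChecksum("a1b2c3");
    assert(verifyChecksum(c));
    assert(!verifyChecksum("a1b2c9" + std::string(1, c.back())));  // one altered digit is caught
}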