Convert MD5 string to base 62 string in C++

I'm trying to convert an MD5 string (base 16) to a base 62 string in C++. Every solution I've found so far for converting to base 62 only works if you can represent your number as a 64-bit integer or smaller. An MD5 string is 128 bits, and I'm not getting anywhere with this on my own.
Should I just include a bigint library and be done with it?

Let's see. 128/log2(62)=21.497. That means you'd need 22 "digits" for a base-62 representation.
If you're just interested in a string representation that's not longer than 22 characters and doesn't use more than 62 different characters, you don't need a real base-62 representation. You can break the 128 bits into smaller pieces and encode the pieces separately; this way you won't need any 128-bit arithmetic. You could split the 128 bits into 2x64 bits and encode each 64-bit chunk with a string of length 11. Doing so is even possible with fewer than 62 characters, so you could eliminate some of them to avoid "visual ambiguities". For example, remove l, 1, B, 8. That leaves 58 different characters, and 11*log2(58)=64.438, which is just enough to encode 64 bits.
Getting the two 64 bit chunks is not that difficult:
#include &lt;climits&gt;

#if CHAR_BIT != 8
#error "platform not supported, CHAR_BIT==8 expected"
#endif

// 'long long' was not part of C++ when this was written,
// but it's usually a supported extension
typedef unsigned long long uint64;

uint64 bits2uint64_bigendian(unsigned char const buff[]) {
    return (static_cast<uint64>(buff[0]) << 56)
         | (static_cast<uint64>(buff[1]) << 48)
         | (static_cast<uint64>(buff[2]) << 40)
         | (static_cast<uint64>(buff[3]) << 32)
         | (static_cast<uint64>(buff[4]) << 24)
         | (static_cast<uint64>(buff[5]) << 16)
         | (static_cast<uint64>(buff[6]) << 8)
         |  static_cast<uint64>(buff[7]);
}

int main() {
    unsigned char md5sum[16] = {...};
    uint64 hi = bits2uint64_bigendian(md5sum);
    uint64 lo = bits2uint64_bigendian(md5sum + 8);
}
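The encoding of each 64-bit chunk is then an ordinary fixed-length base conversion. Here is a minimal sketch (my addition, reusing the uint64 typedef above; the 58-character alphabet with l, 1, B, 8 removed is just one possible choice):
#include <string>
#include <algorithm>

// Sketch: encode one 64-bit chunk as exactly 11 characters over a
// 58-symbol alphabet (the 62 characters minus l, 1, B, 8).
// 11 * log2(58) ~= 64.4 bits, so 11 characters always suffice.
std::string encode58(uint64 value) {
    static const char alphabet[] =
        "abcdefghijkmnopqrstuvwxyzACDEFGHIJKLMNOPQRSTUVWXYZ02345679";
    std::string out;
    for (int i = 0; i < 11; ++i) {   // fixed length, padded with 'a' for small values
        out += alphabet[value % 58];
        value /= 58;
    }
    std::reverse(out.begin(), out.end());
    return out;
}
Concatenating encode58(hi) and encode58(lo) then gives a fixed 22-character string.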

For simplicity, you can use my uint128_t C++ class (http://www.codef00.com/code/uint128.h). With it, a base converter would look pretty much as simple as this:
#include "uint128.h"
#include <iostream>
#include <algorithm>
int main() {
char a[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
uint128_t x = U128_C(0x130eb6003540debd012d96ce69453aed);
std::string r;
r.reserve(22); // shouldn't result in more than 22 chars
// 6-bits per 62-bit value means (128 / 6 == 21.3)
while(x != 0) {
r += a[(x % 62).to_integer()];
x /= 62;
}
// when converting bases by division, the digits are reversed...fix that :-)
std::reverse(r.begin(), r.end());
std::cout << r << std::endl;
}
This prints:
J7JWEJ0YbMGqaJFCGkUxZ

GMP provides a convenient C++ binding for arbitrary-precision integers.
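A rough sketch of that route (my addition, assuming the gmpxx C++ bindings are available and reusing the sample digest value from the uint128_t answer; link with -lgmpxx -lgmp):
#include <gmpxx.h>
#include <iostream>

int main() {
    // parse the 128-bit md5 digest as a base-16 integer
    mpz_class x("130eb6003540debd012d96ce69453aed", 16);
    // GMP supports output bases up to 62; for bases 37..62 the digit order is 0-9, A-Z, a-z
    std::cout << x.get_str(62) << std::endl;
}
This parses the digest as a base-16 integer and prints its base-62 form directly.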

Related

Reverse nibbles of a hexadecimal number in C++

What would be the fastest way possible to reverse the nibbles (i.e. the hex digits) of a hexadecimal number in C++?
Here's an example of what I mean: 0x12345 -> 0x54321
Here's what I already have:
unsigned int rotation (unsigned int hex) {
    unsigned int result = 0;
    while (hex) {
        result = (result << 4) | (hex & 0xF);
        hex >>= 4;
    }
    return result;
}
This problem can be split into two parts:
Reverse the nibbles of the integer: reverse the bytes, then swap the two nibbles within each byte.
Shift the reversed result right by some amount to adjust for the "variable length" of the input. There are std::countl_zero(x) & -4 leading zero bits (the number of leading zero bits, rounded down to a multiple of 4) that correspond to leading zeroes in the hexadecimal form; shifting right by that amount keeps them from participating in the reversal.
For example, using some of the new functions from <bit>:
#include <stdint.h>
#include <bit>

uint32_t reverse_nibbles(uint32_t x) {
    // reverse bytes
    uint32_t r = std::byteswap(x);
    // swap adjacent nibbles
    r = ((r & 0x0F0F0F0F) << 4) | ((r >> 4) & 0x0F0F0F0F);
    // adjust for variable-length of input
    int len_of_zero_prefix = std::countl_zero(x) & -4;
    return r >> len_of_zero_prefix;
}
That requires C++23 for std::byteswap, which may be a bit optimistic; you can substitute some other byteswap.
Easily adaptable to uint64_t too.
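For reference, a sketch of that uint64_t adaptation (my addition; note the explicit guard for x == 0, since a shift by 64 would otherwise be undefined):
#include <cstdint>
#include <bit>

uint64_t reverse_nibbles64(uint64_t x) {
    if (x == 0) return 0;                      // avoid shifting by 64 below
    uint64_t r = std::byteswap(x);             // reverse bytes (C++23)
    // swap adjacent nibbles within each byte
    r = ((r & 0x0F0F0F0F0F0F0F0Full) << 4) | ((r >> 4) & 0x0F0F0F0F0F0F0F0Full);
    // drop the whole leading-zero nibbles of the input
    int len_of_zero_prefix = std::countl_zero(x) & -4;
    return r >> len_of_zero_prefix;
}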
I would do it without loops, based on the assumption that the input is 32 bits:
result = (hex & 0x0000000f) << 28
| (hex & 0x000000f0) << 20
| (hex & 0x00000f00) << 12
....
Don't know if it's faster, but I find it more readable.
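Completing that pattern for all eight nibbles (my sketch; as stated above, it assumes the input occupies the full 32 bits, so there is no variable-length adjustment):
#include <cstdint>

uint32_t reverse_nibbles_unrolled(uint32_t hex) {
    // each masked nibble is moved to its mirrored position
    return (hex & 0x0000000f) << 28
         | (hex & 0x000000f0) << 20
         | (hex & 0x00000f00) << 12
         | (hex & 0x0000f000) << 4
         | (hex & 0x000f0000) >> 4
         | (hex & 0x00f00000) >> 12
         | (hex & 0x0f000000) >> 20
         | (hex & 0xf0000000) >> 28;
}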

not able to shift hex data in an unsigned long

I am trying to convert an IEEE 754 floating point representation to its decimal equivalent. I have some example data [7E FF 01 46 4B CD CC CC CC CC CC 10 40 1B 7E], which is in hex.
char strResponseData[STATUS_BUFFERSIZE]={0};
unsigned long strData = (((strResponseData[12] & 0xFF) << 512)
                       | ((strResponseData[11] & 0xFF) << 256)
                       | ((strResponseData[10] & 0xFF) << 128)
                       | ((strResponseData[9]  & 0xFF) << 64)
                       | ((strResponseData[8]  & 0xFF) << 32)
                       | ((strResponseData[7]  & 0xFF) << 16)
                       | ((strResponseData[6]  & 0xFF) << 8)
                       |  (strResponseData[5]  & 0xFF));
value = IEEEHexToDec(strData,1);
Then I am passing this value to this function:
IEEEHexToDec(unsigned long number, int isDoublePrecision)
{
    int mantissaShift = isDoublePrecision ? 52 : 23;
    unsigned long exponentMask = isDoublePrecision ? 0x7FF0000000000000 : 0x7f800000;
    int bias = isDoublePrecision ? 1023 : 127;
    int signShift = isDoublePrecision ? 63 : 31;

    int sign = (number >> signShift) & 0x01;
    int exponent = ((number & exponentMask) >> mantissaShift) - bias;

    int power = -1;
    double total = 0.0;
    for ( int i = 0; i < mantissaShift; i++ )
    {
        int calc = (number >> (mantissaShift-i-1)) & 0x01;
        total += calc * pow(2.0, power);
        power--;
    }

    double value = (sign ? -1 : 1) * pow(2.0, exponent) * (total + 1.0);
    return value;
}
But in return I am getting the value 0, and when I try to print strData it only gives me CCCCCD.
I am using the Eclipse IDE.
Please, I need some suggestions.
((strResponseData[12] & 0xFF)<< 512 )
First, the << operator takes a number of bits to shift; you seem to be confusing it with multiplication by the resulting power of two. While that has the same effect, you need to supply the exponent. Given that you have no typical data types of 512-bit width, it's fairly certain that this should actually be:
((strResponseData[12] & 0xFF)<< 9 )
Next, the value being shifted needs to be of a type wide enough to hold the result before you do the shift. A char is obviously not sufficient, so you need to explicitly cast the value to a sufficiently wide type before you perform the shift.
Additionally, keep in mind that depending on your platform an unsigned long may be either a 32-bit or a 64-bit type. If you are doing an operation with a bit shift whose result would not fit in 32 bits, you may want to use an unsigned long long, or better yet make things unambiguous, for example with #include <stdint.h> and types such as uint32_t or uint64_t. Given that your question is tagged "embedded", this is especially important to keep in mind, as you might be targeting a 32 (or even 8) bit processor while sometimes building the algorithms for testing on the development machine instead.
Further, a char can be either a signed or an unsigned type. Before shifting, you should make that explicit. Given that you are combining multiple pieces of something, it is almost certain that at least most of these should be treated as unsigned.
So probably you want something like
((uint32_t)(strResponseData[12] & 0xFF)<< 9 )
Unless you are on an odd platform where char is not 8 bits (for example, some TI DSPs), you probably don't need to pre-mask with 0xff, but it's not hurting anything.
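Putting those points together, here is a sketch based on my reading of the question (packing bytes [5] through [12] of the response, with [12] as the most significant byte, into one 64-bit pattern); the function name is mine:
#include <stdint.h>

uint64_t pack_bytes_5_to_12(const char *strResponseData)
{
    uint64_t value = 0;
    int i;
    /* take the bytes from most significant ([12]) down to least significant ([5]);
       each byte is masked to 0..255 and widened before it is shifted into place */
    for (i = 12; i >= 5; --i) {
        value = (value << 8) | (uint64_t)(strResponseData[i] & 0xFF);
    }
    return value;
}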
Finally, it is not 100% clear what you are starting with:
i have an example data [7E FF 01 46 4B CD CC CC CC CC CC 10 40 1B 7E] which is in hex.
This is ambiguous, as it is not clear whether you mean
[0x7e, 0xff, 0x01, 0x46...]
Which would be an array of byte values that debugging code has printed out in hex for human convenience, or if you actually mean that you have something such as
"[7E FF 01 46 .... ]"
Which would be a string of text containing a human-readable representation of hex digits as printable characters. In the latter case, you'd first have to convert the character representation of the hex digits or octets into numeric values.
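For the latter case, a minimal sketch (my addition; assumes whitespace-separated two-digit octets such as "7E FF 01 46"):
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Turn a human-readable string of hex octets into raw byte values
// before doing any bit manipulation.
std::vector<uint8_t> parse_hex_octets(const std::string &text) {
    std::vector<uint8_t> bytes;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        bytes.push_back(static_cast<uint8_t>(std::stoul(token, nullptr, 16)));
    }
    return bytes;
}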

Manually changing a group of bytes in an unsigned int

I'm working with C and I'm trying to figure out how to change a set of bits in a 32-bit unsigned integer.
For example, if I have
int a = 17212403u;
In binary, that becomes 1000001101010001111110011. Now, suppose I label these bits from the least significant end (little-endian style), so that the rightmost bit represents the ones, the second from the right the twos, and so on. How can I manually change a group of bits?
For example, suppose I wanted to change the bits such that the 11th bit to the 15th bit has the decimal value of 17. How would this be possible?
I was thinking of getting that range like this:
unsigned int range = (a << (sizeof(a) * 8) - 14) >> (28)
But I'm not sure where to go on from now.
You will (1) first have to clear bits 11..15, and (2) then set the bits according to the value you want. To achieve (1), create a "mask" that has all bits set to 1 except the ones you want to clear; then use a & bitMask to set those bits to 0. For (2), use | myValue to set the bits to the value wanted.
Use the bit shift operator << to place the mask and the value at the right positions:
int main(int argc, char** argv) {
    // Let's assume a range of 5 bits
    unsigned int bitRange = 0x1Fu;          // is ...00000000011111
    // Let's assume to position the range from bit 11 onwards (i.e. move 10 left):
    bitRange = bitRange << 10;              // something like 000000111110000000000
    unsigned int bitMask = ~bitRange;       // something like 111111000001111111111
    unsigned int valueToSet = (17u << 10);  // corresponds to 000000100010000000000
    unsigned int a = (17212403u & bitMask) | valueToSet;
    return 0;
}
This is the long version to explain what's going on. In brief, you could also write:
unsigned int a = (17212403u & ~(0x1Fu << 10)) | (17u << 10);
The 11th to the 15th bit is 5 bits, assuming you meant to include the 15th bit. Five set bits correspond to the hex value 0x1f.
Then you shift these 5 bits 11 positions to the left: 0x1f << 11
Now we have a mask for bits 11 through 15 that we want to clear in the original variable. We do that by inverting the mask and bitwise ANDing the variable with the inverted mask: a & ~(0x1f << 11)
Next, shift the value 17 up to the 11th bit: 17 << 11
Then we bitwise OR that into the 5 bits we just cleared:
unsigned int b = (a & ~(0x1f << 11)) | (17 << 11);
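Both answers use the same clear-then-set idea; here is a generalized sketch of it (my addition, not from either answer) with the position and width as parameters:
#include <stdio.h>

/* Clear `width` bits starting at bit position `pos` (0-based, bit 0 = ones)
   and insert `value` there. Assumes 0 < width < 32. */
unsigned int set_bit_field(unsigned int word, unsigned int pos,
                           unsigned int width, unsigned int value)
{
    unsigned int mask = ((1u << width) - 1u) << pos;
    return (word & ~mask) | ((value << pos) & mask);
}

int main(void)
{
    /* put 17 into bits 10..14, i.e. the "11th to 15th bit" counting from 1 */
    printf("%u\n", set_bit_field(17212403u, 10, 5, 17u));
    return 0;
}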
Consider using bit fields. This allows you to name and access sub-sections of the integer as though they were integer members of a struct.
For info on C bitfields see:
https://www.tutorialspoint.com/cprogramming/c_bit_fields.htm
Below is code to do what you want, using bitfields. The "middle5" member of the struct holds bits 11-15. The "lower11" member is a filler for the lower 11 bits, so that the "middle5" member will be in the right place.
#include <stdio.h>

void showBits(unsigned int w)
{
    unsigned int bit = 1u << 31;
    while (bit > 0)
    {
        printf("%d", ((bit & w) != 0) ? 1 : 0);
        bit >>= 1;
    }
    printf("\n");
}

int main(int argc, char* argv[])
{
    struct aBitfield {
        unsigned int lower11: 11;
        unsigned int middle5: 5;
        unsigned int upper16: 16;
    };

    union uintBits {
        unsigned int whole;
        struct aBitfield parts;
    };

    union uintBits b;
    b.whole = 17212403u;

    printf("Before:\n");
    showBits(b.whole);

    b.parts.middle5 = 17;

    printf("After:\n");
    showBits(b.whole);
}
Output of the program:
Before:
00000001000001101010001111110011
After:
00000001000001101000101111110011
Of course, you would want to use more meaningful naming for the various fields.
Be careful though: bitfields may be laid out differently on different platforms, so this may not be completely portable.

n bit 2s binary to decimal in C++

I am trying to convert a string of signed binary numbers to a decimal value in C++ using stoi, as shown below.
stoi( binaryString, nullptr, 2 );
My inputs are binary strings in two's complement format, and stoi works fine as long as the number of digits is eight. For instance, "1100" results in 12, because stoi probably perceives it as "00001100".
But for a 4-bit system, 1100 in two's complement format equals -4. Any clues how to do this kind of conversion for arbitrary bit-length two's complement numbers in C++?
Handle signedness for numbers with fewer bits:
convert binary -> decimal
calculate the two's complement if the sign bit is set (wherever your sign bit is, depending on the word length).
#include <cstdio>
#include <string>

#define BITSIZE 4
#define SIGNFLAG (1<<(BITSIZE-1))  // 0b1000
#define DATABITS (SIGNFLAG-1)      // 0b0111

int main() {
    int x = std::stoi( "1100", NULL, 2);  // x = 12
    if ((x & SIGNFLAG) != 0) {            // signflag set
        x = (~x & DATABITS) + 1;          // 2s complement without signflag
        x = -x;                           // negative number
    }
    printf("%d\n", x);                    // -4
    return 0;
}
You can use strtoul, which is the unsigned equivalent. The only difference is that it returns an unsigned long, instead of an int.
You probably can implement the standard two's complement value formula,
w = -a[N-1] * 2^(N-1) + sum(i = 0 .. N-2) of a[i] * 2^i,
in C++, where a is binaryString (a[i] being the bit with weight 2^i), N is binaryString.size() and w is the result.
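A direct, unoptimized implementation of that formula might look like this (my sketch; bit i is counted from the right-hand end of the string, and N is assumed to be at most 63 so the weights fit in a long):
#include <cstddef>
#include <string>

long twos_complement_value(const std::string &bits) {
    const std::size_t N = bits.size();
    long w = 0;
    for (std::size_t i = 0; i < N; ++i) {
        long bit = bits[N - 1 - i] - '0';  // bit i, least significant first
        long weight = 1L << i;
        if (i == N - 1)
            weight = -weight;              // the sign bit has negative weight
        w += bit * weight;
    }
    return w;                              // e.g. "1100" -> -4
}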
The correct answer would probably depend on what you ultimately want to do with the int after you convert it. If you want to do signed math with it then you would need to 'sign extend' your result after the stoi conversion -- this is what the compiler does internally on a cast operation from one size signed int to another.
You can manually do this with something like this for a 4-bit system:
int myInt;
myInt = std::stoi( "1100", NULL, 2);
myInt |= myInt & 0x08 ? (-16 ) : 0;
Note, I used 0x08 as the test mask and -16 as the OR mask, as this is for a 4-bit result. You can change the masks to be correct for whatever your input bit length is. Also, using a negative int like this will correctly sign-extend no matter what your system's integer size is.
Example for an arbitrary bit-width system (I used bitWidth to denote the size):
myInt = std::stoi( "1100", NULL, 2);
int bitWidth = 4;
myInt |= myInt & (1 << (bitWidth-1)) ? ( -(1<<bitWidth) ) : 0;
You can use the bitset header file for this:
#include <iostream>
#include <bitset>
using namespace std;

int main()
{
    bitset<4> bs;
    int no;
    cin >> bs;
    if (bs[3])
    {
        // sign bit set: the two's complement value is the unsigned value minus 2^4
        no = (int)bs.to_ulong() - 16;
    }
    else
        no = bs.to_ulong();
    cout << no;
    return 0;
}
Since to_ulong() returns an unsigned long, you have to check the sign bit yourself.

Convert 128-bit hexadecimal string to base-36 string

I have a 128-bit number in hexadecimal stored in a string (from md5, security isn't a concern here) that I'd like to convert to a base-36 string. If it were a 64-bit or smaller number I'd convert it to a 64-bit integer and then use an algorithm I found to convert integers to base-36 strings, but this number is too large for that, so I'm kind of at a loss for how to approach this. Any guidance would be appreciated.
Edit: After Roland Illig pointed out the hassle of saying 0/O and 1/l over the phone and not gaining much data density over hex, I think I may end up staying with hex. I'm still curious, though, whether there is a relatively simple way to convert a hex string of arbitrary length to a base-36 string.
A base-36 encoding requires 6 bits to store each token. Same as base-64 but not using 28 of the available tokens. Solving 36^n >= 2^128 yields n >= log(2^128) / log(36) or 25 tokens to encode the value.
A base-64 encoding also requires 6 bits, all possible token values are used. Solving 64^n >= 2^128 yields n >= log(2^128) / log(64) or 22 tokens to encode the value.
Calculating the base-36 encoding requires dividing by powers of 36. No easy shortcuts, you need a division algorithm that can work with 128-bit values. The base-64 encoding is much easier to compute since it is a power of 2. Just take 6 bits at a time and shift by 6, in total 22 times to consume all 128 bits.
Why do you want to use base-36? Base-64 encoders are standard. If you really have a constraint on the token space (you shouldn't, ASCII rulez) then at least use a base-32 encoding. Or any power of 2, base-16 is hex.
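To illustrate the power-of-two shortcut, here is a sketch (my addition; the alphabet and the hi/lo representation of the 128-bit value are assumptions) that peels off 6 bits at a time, 22 times:
#include <cstdint>
#include <string>
#include <algorithm>

std::string encode_base64ish(uint64_t hi, uint64_t lo) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (int i = 0; i < 22; ++i) {    // 22 * 6 = 132 bits >= 128
        out += alphabet[lo & 0x3F];   // lowest 6 bits of the 128-bit value
        lo = (lo >> 6) | (hi << 58);  // 128-bit right shift by 6
        hi >>= 6;
    }
    std::reverse(out.begin(), out.end());
    return out;
}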
If the only thing that is missing is the support for 128 bit unsigned integers, here is the solution for you:
#include <stdio.h>
#include <inttypes.h>

typedef struct {
    uint32_t v3, v2, v1, v0;
} uint128;

static void
uint128_divmod(uint128 *out_div, uint32_t *out_mod, const uint128 *in_num, uint32_t in_den)
{
    uint64_t x = 0;

    x = (x << 32) + in_num->v3;
    out_div->v3 = x / in_den;
    x %= in_den;

    x = (x << 32) + in_num->v2;
    out_div->v2 = x / in_den;
    x %= in_den;

    x = (x << 32) + in_num->v1;
    out_div->v1 = x / in_den;
    x %= in_den;

    x = (x << 32) + in_num->v0;
    out_div->v0 = x / in_den;
    x %= in_den;

    *out_mod = x;
}

int
main(void)
{
    uint128 x = { 0x12345678, 0x12345678, 0x12345678, 0x12345678 };
    uint128 result;
    uint32_t mod;

    uint128_divmod(&result, &mod, &x, 16);
    fprintf(stdout, "%08"PRIx32" %08"PRIx32" %08"PRIx32" %08"PRIx32" rest %08"PRIx32"\n",
        result.v3, result.v2, result.v1, result.v0, mod);
    return 0;
}
Using this function you can repeatedly compute the mod-36 result, which leads you to the number encoded as base-36.
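To make that last step concrete, here is a sketch of the repeated divide-by-36 loop (my addition, reusing the uint128 type and uint128_divmod from the code above, so it needs to live in the same file; a 128-bit value needs at most 25 base-36 digits, so a 26-byte buffer suffices):
#include <stddef.h>

static const char *uint128_to_base36(const uint128 *in, char *buf, size_t bufsize)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    uint128 q = *in;
    size_t i = bufsize - 1;

    buf[i] = '\0';
    do {
        uint128 next;
        uint32_t r;
        uint128_divmod(&next, &r, &q, 36);
        buf[--i] = digits[r];    /* remainders come out least significant first */
        q = next;
    } while (q.v0 | q.v1 | q.v2 | q.v3);
    return &buf[i];              /* the encoded number starts here */
}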
If you are using C++ with .NET 4 you could always use the System.Numerics.BigInteger class. You could try calling one of the ToString overrides to get you to base 36.
Alternatively, look at one of the many big integer libraries, e.g. Matt McCutchen's C++ Big Integer Library, although you might have to look into the depths of the classes to use a custom base such as 36.
Two things:
1. It really isn't that hard to divide a byte string by 36. But if you can't be bothered to implement that, you can use base-32 encoding, which would need 26 bytes instead of 25.
2. If you want to be able to read the result over the phone to humans, you absolutely must add a simple checksum to your string, which will cost one or two bytes but will save you a huge amount of 'Chinese whispers' hassle from hard-of-hearing customers.
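For the checksum point, one very simple option (my sketch, not from the answer) is to append one extra base-36 character equal to the sum of the digit values modulo 36; this catches any single-character substitution, though not transpositions:
#include <cstring>
#include <string>

char base36_checksum(const std::string &encoded) {
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    unsigned sum = 0;
    for (char c : encoded) {
        // assumes every character of `encoded` is one of the 36 digits
        sum += static_cast<unsigned>(std::strchr(digits, c) - digits);
    }
    return digits[sum % 36];
}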