So in my code I have a series of chars which I want to replace with random data. Since rand can replace ints, I figured I could save some time by replacing four chars at once instead of one at a time. So basically instead of this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i++) // generating the data to send.
TXT[i] = rand() % 255;
I'd like to do something like:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4) // generating the data to send.
TXT[i] = rand() % 4294967295;
Something that effect, but I'm not sure how to do the latter part. Any help you can give me is greatly appreciated, thanks!
That won't work. The compiler will take the result from rand() % big_number and chop off the extra data to fit it in an unsigned char.
Speed-wise, your initial approach was fine. The optimization you contemplated is valid, but most likely unneeded. It probably wouldn't make a noticeable difference.
What you wanted to do is possible, of course, but given your mistake, I'd say the effort to understand how right now far outweights the benefits. Keep learning, and the next time you run across code like this, you'll know what to do (and judge if it's necessary), look back on this moment and smile :).
You'll have to access memory directly, and do some transformations on your data. You probably want something like this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght/sizeof(int); i+=sizeof(int)) // generating the data to send.
{
int *temp = (int*)&TXT[i]; // very ugly
*temp = rand() % 4294967295;
}
It can be problematic though because of alignment issues, so be careful. Alignment issues can cause your program to crash unexpectedly, and are hard to debug. I wouldn't do this if I were you, your initial code is just fine.
TXT[i] = rand() % 4294967295;
Will not work the way you expect it to. Perhaps you are expecting that rand()%4294967295 will generate a 4 byte integer(which you maybe interpreting as 4 different characters). The value that rand()%4294967295, produces will be type cast into a single char and will get assigned to only one of the index of TXT[i].
Though it's not quire clear as to why you need to make 4 assigning at the same time, one approach would be to use bit operators to obtain 4 different significant bytes of the number generated and those can then be assigned to the four different index.
There are valid answers just so much C does not care very much about what type it stores at which address. So you can get away with something like:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
char *arr;
int *iArr;
int main (void){
int i;
arr = malloc(100);
/* Error handling ommitted, yes that's evil */
iArr = (int*) arr;
for (i = 0; i < 25; i++) {
iArr[i] = rand() % INT_MAX;
}
for (i = 0; i < 25; i++) {
printf("iArr[%d] = %d\n", i, iArr[i]);
}
for (i = 0; i < 100; i++) {
printf("arr[%d] = %c\n", i, arr[i]);
}
free(arr);
return 0;
}
In the end an array is just some contiguous block in memory. And you can interpret it as you like (if you want). If you know that sizeof(int) = 4 * sizeof(char) then the above code will work.
I do not say I recommend it. And the others have pointed out whatever happened the first loop through all the chars in TXT will yield the same result. One could think for example of unrolling a loop but really I'd not care about that.
The (int*) just alone is warning enough. It means to the compiler, do not think about what you think the type is just "believe" he programmer that he knows better.
Well this "know better" is probably the root of all evil in C programming....
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4)
// generating the data to send.
TXT[i] = rand() % 4294967295;
This has a few issues:
TXT is not guaranteed to be memory-aligned as needed for the CPU to write int data (whether it works - perhaps relatively slowly - or not - e.g. SIGBUS on Solaris - is hardware specific)
the last 1-3 characters may be missed (even if you change i + 4 to i += 4 ;-P)
rand() returns an int anyway - you don't need to mod it with anything
you need to write your random data via an int* so you're accessing 4 bytes at a time and not simply slicing a byte off the end of the random data and overwriting every fourth single character
for stuff like this where you're dependent on the size of int, you should really write it in terms of sizeof(int) so it'll work even if int isn't 32 bits, or use a (currently sadly) non-Standard but common typedef such as int32_t (or on Windows I think it's __int32, or you can use a boost or other library header to get int32_t, or write your own typedef).
It's actually pretty tricky to align your text data: your code suggests you want int-sized slices from the 35th character... even if the overall character array is aligned properly for ints, the 35th character will not be.
If it really is always the 35th, then you can pad the data with a leading character so you're accessing the 36th (being a multiple of presumably 32-bit int size), then align the text to an 32-bit address (with a compiler-specific #pragma or using a union with int32_t). If the real code varies the character you start overwriting from, such that you can't simply align the data once, then you're stuck with:
your original character-at-a-time overwrites
non-portable unaligned overwrites (if that's possible and better on your system), OR
implementing code that overwrites up to three leading unaligned characters, then switches to 32-bit integer overwrite mode for aligned addresses, then back to character-by-character overwrites for up to three trailing characters.
That does not work because the generated value is converted to type of array element - char in this particular case. But you are free to interpret allocated memory in the manner you like. For example, you could convert it into array int:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght-sizeof(int); i+=sizeof(int)) // generating the data to send.
*(int*)(TXT+i) = rand(); // There is no need in modulo operator
for (; i < flenght; ++i) // generating the data to send.
TXT[i] = rand(); // There is no need in modulo operator either
I just want to complete solution with the remarks about modulo operator and handling of arrays not multiple of sizeof(int).
1) % means "the remainder when divided by", so you want rand() % 256 for a char, or else you will never get chars with a value of 255. Similarly for the int case, although here there is no point in doing a modulus operation anyway, since you want the entire range of output values.
2) rand usually only generates two bytes at a time; check the value of RAND_MAX.
3) 34 isn't divisible by 4 anyway, so you will have to handle the end case specially.
4) You will want to cast the pointer, and it won't work if it isn't already aligned. Once you have the cast, though, there is no need to account for the sizeof(int) in your iteration: pointer arithmetic automatically takes care of the element size.
5) Chances are very good that it won't make a noticeable difference. If scribbling random data into an array is really the bottleneck in your program, then it isn't really doing anything significiant anyway.
Related
I have a long list of numbers between 0 and 67600. Now I want to store them using an array that is 67600 elements long. An element is set to 1 if a number was in the set and it is set to 0 if the number is not in the set. ie. each time I need only 1bit information for storing the presence of a number. Is there any hack in C/C++ that helps me achieve this?
In C++ you can use std::vector<bool> if the size is dynamic (it's a special case of std::vector, see this) otherwise there is std::bitset (prefer std::bitset if possible.) There is also boost::dynamic_bitset if you need to set/change the size at runtime. You can find info on it here, it is pretty cool!
In C (and C++) you can manually implement this with bitwise operators. A good summary of common operations is here. One thing I want to mention is its a good idea to use unsigned integers when you are doing bit operations. << and >> are undefined when shifting negative integers. You will need to allocate arrays of some integral type like uint32_t. If you want to store N bits, it will take N/32 of these uint32_ts. Bit i is stored in the i % 32'th bit of the i / 32'th uint32_t. You may want to use a differently sized integral type depending on your architecture and other constraints. Note: prefer using an existing implementation (e.g. as described in the first paragraph for C++, search Google for C solutions) over rolling your own (unless you specifically want to, in which case I suggest learning more about binary/bit manipulation from elsewhere before tackling this.) This kind of thing has been done to death and there are "good" solutions.
There are a number of tricks that will maybe only consume one bit: e.g. arrays of bitfields (applicable in C as well), but whether less space gets used is up to compiler. See this link.
Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information - your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running in an OS with malloc/new it would not be sane for your allocator to track data to such a small precision due to overhead. That last qualification was pretty silly - you won't find an architecture in use that allows you to operate on less than 8 bits at a time I imagine :)
You should use std::bitset.
std::bitset functions like an array of bool (actually like std::array, since it copies by value), but only uses 1 bit of storage for each element.
Another option is vector<bool>, which I don't recommend because:
It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
That type is often maligned by standards-purists because it claims to be a standard container, but fails to adhere to the definition of a standard container*.
*For example, a standard-conforming function could expect &container.front() to produce a pointer to the first element of any container type, which fails with std::vector<bool>. Perhaps a nitpick for your usage case, but still worth knowing about.
There is in fact! std::vector<bool> has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool
See the doc, it stores it as efficiently as possible.
Edit: as somebody else said, std::bitset is also available: http://en.cppreference.com/w/cpp/utility/bitset
If you want to write it in C, have an array of char that is 67601 bits in length (67601/8 = 8451) and then turn on/off the appropriate bit for each value.
Others have given the right idea. Here's my own implementation of a bitsarr, or 'array' of bits. An unsigned char is one byte, so it's essentially an array of unsigned chars that stores information in individual bits. I added the option of storing TWO or FOUR bit values in addition to ONE bit values, because those both divide 8 (the size of a byte), and would be useful if you want to store a huge number of integers that will range from 0-3 or 0-15.
When setting and getting, the math is done in the functions, so you can just give it an index as if it were a normal array--it knows where to look.
Also, it's the user's responsibility to not pass a value to set that's too large, or it will screw up other values. It could be modified so that overflow loops back around to 0, but that would just make it more convoluted, so I decided to trust myself.
#include<stdio.h>
#include <stdlib.h>
#define BYTE 8
typedef enum {ONE=1, TWO=2, FOUR=4} numbits;
typedef struct bitsarr{
unsigned char* buckets;
numbits n;
} bitsarr;
bitsarr new_bitsarr(int size, numbits n)
{
int b = sizeof(unsigned char)*BYTE;
int numbuckets = (size*n + b - 1)/b;
bitsarr ret;
ret.buckets = malloc(sizeof(ret.buckets)*numbuckets);
ret.n = n;
return ret;
}
void bitsarr_delete(bitsarr xp)
{
free(xp.buckets);
}
void bitsarr_set(bitsarr *xp, int index, int value)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);
//longer version
/*unsigned int width, width_in_place, zeros, old, newbits, new;
width = (1 << xp->n) - 1;
width_in_place = width << innerdex*xp->n;
zeros = ~width_in_place;
old = xp->buckets[buckdex];
old = old & zeros;
newbits = value << innerdex*xp->n;
new = newbits | old;
xp->buckets[buckdex] = new; */
}
int bitsarr_get(bitsarr *xp, int index)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;
//longer version
/*unsigned int width = (1 << xp->n) - 1;
unsigned int width_in_place = width << innerdex*xp->n;
unsigned int val = xp->buckets[buckdex];
unsigned int retshifted = width_in_place & val;
unsigned int ret = retshifted >> innerdex*xp->n;
return ret; */
}
int main()
{
bitsarr x = new_bitsarr(100, FOUR);
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, 15-i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
bitsarr_delete(x);
}
I'm writing a websocket server and I have to deal with masked data that I need to unmask.
The mask is unsigned char[4], and the data is a unsigned char* buffer as well.
I don't want to XOR byte by byte, I'd much rather XOR 4-bytes at a time.
uint32_t * const end = reinterpret_cast<uint32_t *>(data_+length);
for(uint32_t *i = reinterpret_cast<uint32_t *>(data_); i != end; ++i) {
*i ^= mask_;
}
Is there anything wrong with the use of reinterpret_cast in this situation?
The alternative would be the following code which isn't as clear and not as fast:
uint64_t j = 0;
uint8_t *end = data_+length;
for(uint8_t *i = data_; i != end; ++i,++j) {
*i ^= mask_[j % 4];
}
I'm all ears for alternatives, including ones dependent on c++11 features.
The are a few potential problems with the approach posted:
On some systems objects of a type bigger than char needs to be aligned properly to be accessible. A typical requirement for uint32_t is that the object is aligned to an address divisible by four.
If length / sizeof(uint32_t) != 0 the loop may never terminate.
Depending on the endianess of the system mask needs to contain different values. If mask is produced by *reinterpret_cast<uint32_t>(char_mask) of a suitable array this shouldn't be an array.
If these issues are taken care of, reinterpret_cast<...>(...) can be used in the situation you have. Reinterpreting the meaning of pointers is one of the reasons this operation is there and sometimes it is needed. I would create a suitable test case to verify that it works properly, though, to avoid having to hunt down problems when porting the code to a different platform.
Personally I would go with a different approach until profiling shows that it is too slow:
char* it(data);
if (4 < length) {
for (char* end(data + length - 4); it < end; it += 4) {
it[0] ^= mask_[0];
it[1] ^= mask_[1];
it[2] ^= mask_[2];
it[3] ^= mask_[3];
}
}
it != data + length && *it++ ^= mask_[0];
it != data + length && *it++ ^= mask_[1];
it != data + length && *it++ ^= mask_[2];
it != data + length && *it++ ^= mask_[3];
I'm definitely using a number of similar approaches in software which meant to be really faster and haven't found them to be a notable performance problem.
There's nothing specifically wrong with reinterpret_cast in this case. But, take care.
The 32-bit loop as it stands is incorrect, because it doesn't cater for the case where the payload isn't a multiple of 32 bits in size. Two possible solutions, I suppose:
replace the != with < in the for loop check (there's a reason why people use <, and it's not because they're dumb...) and do the trailing 1-3 bytes bytewise
arrange the buffer so that the size of the buffer for the payload part is a multiple of 32 bits, and just XOR the extra bytes. (Presumably the code checks the payload length when returning bytes to the caller, so this doesn't matter.)
Additionally, depending on how the code is structured you might also have to cope with misaligned data accesses for some CPUs. If you have an entire frame buffered, header and all, in a buffer that's 32-bit aligned, and if the payload length is <126 bytes or >65,535 bytes, then both the masking key and the payload will be misaligned.
For whatever it's worth, my server uses something like the first loop:
for(int i=0;i<n;++i)
payload[i]^=key[i&3];
Unlike the 32-bit option, this is basically impossible to get wrong.
Why is the output of the following program 84215045?
int grid[110];
int main()
{
memset(grid, 5, 100 * sizeof(int));
printf("%d", grid[0]);
return 0;
}
memset sets each byte of the destination buffer to the specified value. On your system, an int is four bytes, each of which is 5 after the call to memset. Thus, grid[0] has the value 0x05050505 (hexadecimal), which is 84215045 in decimal.
Some platforms provide alternative APIs to memset that write wider patterns to the destination buffer; for example, on OS X or iOS, you could use:
int pattern = 5;
memset_pattern4(grid, &pattern, sizeof grid);
to get the behavior that you seem to expect. What platform are you targeting?
In C++, you should just use std::fill_n:
std::fill_n(grid, 100, 5);
memset(grid, 5, 100 * sizeof(int));
You are setting 400 bytes, starting at (char*)grid and ending at (char*)grid + (100 * sizeof(int)), to the value 5 (the casts are necessary here because memset deals in bytes, whereas pointer arithmetic deals in objects.
84215045 in hex is 0x05050505; since int (on your platform/compiler/etc.) is represented by four bytes, when you print it, you get "four fives."
memset is about setting bytes, not values. One of the many ways to set array values in C++ is std::fill_n:
std::fill_n(grid, 100, 5);
Don't use memset.
You set each byte [] of the memory to the value of 5. Each int is 4 bytes long [5][5][5][5], which the compiler correctly interprets as 5*256*256*256 + 5*256*256 + 5*256 + 5 = 84215045. Instead, use a for loop, which also doesn't require sizeof(). In general, sizeof() means you're doing something the hard way.
for(int i=0; i<110; ++i)
grid[i] = 5;
Well, the memset writes bytes, with the selected value. Therefore an int will look something like this:
00000101 00000101 00000101 00000101
Which is then interpreted as 84215045.
You haven't actually said what you want your program to do.
Assuming that you want to set each of the first 100 elements of grid to 5 (and ignoring the 100 vs. 110 discrepancy), just do this:
for (int i = 0; i < 100; i ++) {
grid[i] = 5;
}
I understand that you're concerned about speed, but your concern is probably misplaced. On the one hand, memset() is likely to be optimized and therefore faster than a simple loop. On the other hand, the optimization is likely to consist of writing more than one byte at a time, which is what this loop does. On the other other hand, memset() is a loop anyway; writing the loop explicitly rather than burying it in a function call doesn't change that. On the other other other hand, even if the loop is slow, it's not likely to matter; concentrate on writing clear code, and think about optimizing it if actual measurements indicate that there's a significant performance issue.
You've spent many orders of magnitude more time writing the question than your computer will spend setting grid.
Finally, before I run out of hands (too late!), it doesn't matter how fast memset() is if it doesn't do what you want. (Not setting grid at all is even faster!)
If you type man memset on your shell, it tells you that
void * memset(void *b, int c, size_t len)
A plain English explanation of this would be, it fills a byte string b of length len with each byte a value c.
For your case,
memset(grid, 5, 100 * sizeof(int));
Since sizeof(int)==4, thus the above code pieces looked like:
for (int i=0; i<100; i++)
grid[i]=0x05050505;
OR
char *grid2 = (char*)grid;
for (int i=0; i<100*sizeof(int); i++)
grid2[i]=0x05;
It would print out 84215045
But in most C code, we want to initialize a piece of memory block to value zero.
char type --> \0 or NUL
int type --> 0
float type --> 0.0f
double type --> 0.0
pointer type --> nullptr
And either gcc or clang etc. modern compilers can take well care of this for you automatically.
// variadic length array (VLA) introduced in C99
int len = 20;
char carr[len];
int iarr[len];
float farr[len];
double darr[len];
memset(carr, 0, sizeof(char)*len);
memset(iarr, 0, sizeof(int)*len);
memset(farr, 0, sizeof(float)*len);
memset(darr, 0, sizeof(double)*len);
for (int i=0; i<len; i++)
{
printf("%2d: %c\n", i, carr[i]);
printf("%2d: %i\n", i, iarr[i]);
printf("%2d: %f\n", i, farr[i]);
printf("%2d: %lf\n", i, darr[i]);
}
But be aware, C ISO Committee does not imposed such definitions, it is compiler-specific.
Since the memset writes bytes,I usually use it to set an int array to zero like:
int a[100];
memset(a,0,sizeof(a));
or you can use it to set a char array,since a char is exactly a byte:
char a[100];
memset(a,'*',sizeof(a));
what's more,an int array can also be set to -1 by memset:
memset(a,-1,sizeof(a));
This is because -1 is 0xffffffff in int,and is 0xff in char(a byte).
This code has been tested. Here is a way to memset an "Integer" array to a value between 0 to 255.
MinColCost=new unsigned char[(Len+1) * sizeof(int)];
memset(MinColCost,0x5,(Len+1)*sizeof(int));
memset(MinColCost,0xff,(Len+1)*sizeof(int));
In C/C++, is there an easy way to apply bitwise operators (specifically left/right shifts) to dynamically allocated memory?
For example, let's say I did this:
unsigned char * bytes=new unsigned char[3];
bytes[0]=1;
bytes[1]=1;
bytes[2]=1;
I would like a way to do this:
bytes>>=2;
(then the 'bytes' would have the following values):
bytes[0]==0
bytes[1]==64
bytes[2]==64
Why the values should be that way:
After allocation, the bytes look like this:
[00000001][00000001][00000001]
But I'm looking to treat the bytes as one long string of bits, like this:
[000000010000000100000001]
A right shift by two would cause the bits to look like this:
[000000000100000001000000]
Which finally looks like this when separated back into the 3 bytes (thus the 0, 64, 64):
[00000000][01000000][01000000]
Any ideas? Should I maybe make a struct/class and overload the appropriate operators? Edit: If so, any tips on how to proceed? Note: I'm looking for a way to implement this myself (with some guidance) as a learning experience.
I'm going to assume you want bits carried from one byte to the next, as John Knoeller suggests.
The requirements here are insufficient. You need to specify the order of the bits relative to the order of the bytes - when the least significant bit falls out of one byte, does to go to the next higher or next lower byte.
What you are describing, though, used to be very common for graphics programming. You have basically described a monochrome bitmap horizontal scrolling algorithm.
Assuming that "right" means higher addresses but less significant bits (ie matching the normal writing conventions for both) a single-bit shift will be something like...
void scroll_right (unsigned char* p_Array, int p_Size)
{
unsigned char orig_l = 0;
unsigned char orig_r;
unsigned char* dest = p_Array;
while (p_Size > 0)
{
p_Size--;
orig_r = *p_Array++;
*dest++ = (orig_l << 7) + (orig_r >> 1);
orig_l = orig_r;
}
}
Adapting the code for variable shift sizes shouldn't be a big problem. There's obvious opportunities for optimisation (e.g. doing 2, 4 or 8 bytes at a time) but I'll leave that to you.
To shift left, though, you should use a separate loop which should start at the highest address and work downwards.
If you want to expand "on demand", note that the orig_l variable contains the last byte above. To check for an overflow, check if (orig_l << 7) is non-zero. If your bytes are in an std::vector, inserting at either end should be no problem.
EDIT I should have said - optimising to handle 2, 4 or 8 bytes at a time will create alignment issues. When reading 2-byte words from an unaligned char array, for instance, it's best to do the odd byte read first so that later word reads are all at even addresses up until the end of the loop.
On x86 this isn't necessary, but it is a lot faster. On some processors it's necessary. Just do a switch based on the base (address & 1), (address & 3) or (address & 7) to handle the first few bytes at the start, before the loop. You also need to special case the trailing bytes after the main loop.
Decouple the allocation from the accessor/mutators
Next, see if a standard container like bitset can do the job for you
Otherwise check out boost::dynamic_bitset
If all fails, roll your own class
Rough example:
typedef unsigned char byte;
byte extract(byte value, int startbit, int bitcount)
{
byte result;
result = (byte)(value << (startbit - 1));
result = (byte)(result >> (CHAR_BITS - bitcount));
return result;
}
byte *right_shift(byte *bytes, size_t nbytes, size_t n) {
byte rollover = 0;
for (int i = 0; i < nbytes; ++i) {
bytes[ i ] = (bytes[ i ] >> n) | (rollover < n);
byte rollover = extract(bytes[ i ], 0, n);
}
return &bytes[ 0 ];
}
Here's how I would do it for two bytes:
unsigned int rollover = byte[0] & 0x3;
byte[0] >>= 2;
byte[1] = byte[1] >> 2 | (rollover << 6);
From there, you can generalize this into a loop for n bytes. For flexibility, you will want to generate the magic numbers (0x3 and 6) rather then hardcode them.
I'd look into something similar to this:
#define number_of_bytes 3
template<size_t num_bytes>
union MyUnion
{
char bytes[num_bytes];
__int64 ints[num_bytes / sizeof(__int64) + 1];
};
void main()
{
MyUnion<number_of_bytes> mu;
mu.bytes[0] = 1;
mu.bytes[1] = 1;
mu.bytes[2] = 1;
mu.ints[0] >>= 2;
}
Just play with it. You'll get the idea I believe.
Operator overloading is syntactic sugar. It's really just a way of calling a function and passing your byte array without having it look like you are calling a function.
So I would start by writing this function
unsigned char * ShiftBytes(unsigned char * bytes, size_t count_of_bytes, int shift);
Then if you want to wrap this up in an operator overload in order to make it easier to use or because you just prefer that syntax, you can do that as well. Or you can just call the function.
I have a function I've written to convert from a 64-bit integer to a base 62 string. Originally, I achieved this like so:
char* charset = " 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
int charsetLength = strlen(charset);
std::string integerToKey(unsigned long long input)
{
unsigned long long num = input;
string key = "";
while(num)
{
key += charset[num % charsetLength];
num /= charsetLength;
}
return key;
}
However, this was too slow.
I improved the speed by providing an option to generate a lookup table. The table is about 624 strings in size, and is generated like so:
// Create the integer to key conversion lookup table
int lookupChars;
if(lookupDisabled)
lookupChars = 1;
else
largeLookup ? lookupChars = 4 : lookupChars = 2;
lookupSize = pow(charsetLength, lookupChars);
integerToKeyLookup = new char*[lookupSize];
for(unsigned long i = 0; i < lookupSize; i++)
{
unsigned long num = i;
int j = 0;
integerToKeyLookup[i] = new char[lookupChars];
while(num)
{
integerToKeyLookup[i][j] = charset[num % charsetLength];
num /= charsetLength;
j++;
}
// Null terminate the string
integerToKeyLookup[i][j] = '\0';
}
The actual conversion then looks like this:
std::string integerToKey(unsigned long long input)
{
unsigned long long num = input;
string key = "";
while(num)
{
key += integerToKeyLookup[num % lookupSize];
num /= lookupSize;
}
return key;
}
This improved speed by a large margin, but I still believe it can be improved. Memory usage on a 32-bit system is around 300 MB, and more than 400 MB on a 64-bit system. It seems like I should be able to reduce memory and/or improve speed, but I'm not sure how.
If anyone could help me figure out how this table could be further optimized, I'd greatly appreciate it.
Using some kind of string builder rather than repeated concatenation into 'key' would provide a significant speed boost.
You may want to reserve memory in advance for your string key. This may get you a decent performance gain, as well as a gain in memory utilization. Whenever you call the append operator on std::string, it may double the size of the internal buffer if it has to reallocate. This means each string may be taking up significantly more memory than is necessary to store the characters. You can avoid this by reserving memory for the string in advance.
I agree with Rob Walker - you're concentrating on improving performance in the wrong area. The string is the slowest part.
I timed the code (your original is broken, btw) and your original (when fixed) was 44982140 cycles for 100000 lookups and the following code is about 13113670.
const char* charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
#define CHARSET_LENGTH 62
// maximum size = 11 chars
void integerToKey(char result[13], unsigned long long input)
{
char* p = result;
while(input > 0)
{
*p++ = charset[input % CHARSET_LENGTH];
input /= CHARSET_LENGTH;
}
// null termination
*p = '\0';
// need to reverse the output
char* o = result;
while(o + 1 < p)
swap(*++o, *--p);
}
This is almost a textbook case of how not to do this. Concatenating strings in a loop is a bad idea, both because appending isn't particularly fast, and because you're constantly allocating memory.
Note: your question states that you're converting to base-62, but the code seems to have 63 symbols. Which are you trying to do?
Given a 64-bit integer, you can calculate that you won't need any more than 11 digits in the result, so using a static 12 character buffer will certainly help improve your speed. On the other hand, it's likely that your C++ library has a long-long equivalent to ultoa, which will be pretty optimal.
Edit: Here's something I whipped up. It allows you to specify any desired base as well:
std::string ullToString(unsigned long long v, int base = 64) {
assert(base < 65);
assert(base > 1);
static const char digits[]="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";
const int max_length=65;
static char buffer[max_length];
buffer[max_length-1]=0;
char *d = buffer + max_length-1;
do {
d--;
int remainder = v % base;
v /= base;
*d = digits[remainder];
} while(v>0);
return d;
}
This only creates one std::string object, and doesn't move memory around unnecessarily. It currently doesn't zero-pad the output, but it's trivial to change it to do that to however many digits of output you want.
You don't need to copy input into num, because you pass it by value. You can also compute the length of charset in compiletime, there's no need to compute it in runtime every single time you call the function.
But these are very minor performance issues. I think the the most significant help you can gain is by avoiding the string concatenation in the loop. When you construct the key string pass the string constructor the length of your result string so that there is only one allocation for the string. Then in the loop when you concatenate into the string you will not re-allocate.
You can make things even slightly more efficient if you take the target string as a reference parameter or even as two iterators like the standard algorithms do. But that is arguably a step too far.
By the way, what if the value passed in for input is zero? You won't even enter the loop; shouldn't key then be "0"?
I see the value passed in for input can't be negative, but just so we note: the C remainder operator isn't a modulo operator.
Why not just use a base64 library? Is really important that 63 equals '11' and not a longer string?
size_t base64_encode(char* outbuffer, size_t maxoutbuflen, const char* inbuffer, size_t inbuflen);
std::string integerToKey(unsigned long long input) {
char buffer[14];
size_t len = base64_encode(buffer, sizeof buffer, (const char*)&input, sizeof input);
return std::string(buffer, len);
}
Yes, every string will end with an equal size. If you don't want it to, strip off the equal sign. (Just remember to add it back if you need to decode the number.)
Of course, my real question is why are you turning a fixed width 8byte value and not using it directly as your "key" instead of the variable length string value?
Footnote: I'm well aware of the endian issues with this. He didn't say what the key will be used for and so I assume it isn't being used in network communications between machines of disparate endian-ness.
If you could add two more symbols so that it is converting to base-64, your modulus and division operations would turn into a bit mask and shift. Much faster than a division.
If all you need is a short string key, converting to base-64 numbers would speed up things a lot, since div/mod 64 is very cheap (shift/mask).