Right shift a binary string efficiently in C++

If I have a string that represents an integer in binary form such as
1101101
and I want to circularly right shift it to obtain
1110110
One way I could think of would be converting the string into an int and using (taken from wikipedia)
// https://stackoverflow.com/a/776550/3770260
template <typename INT>
#if __cplusplus > 201100L // Apply constexpr to C++ 11 to ease optimization
constexpr
#endif // See also https://stackoverflow.com/a/7269693/3770260
INT rol(INT val, size_t len) {
#if __cplusplus > 201100L && _wp_force_unsigned_rotate // Apply unsigned check C++ 11 to make sense
static_assert(std::is_unsigned<INT>::value,
"Rotate Left only makes sense for unsigned types");
#endif
return (val << len) | ((unsigned) val >> (-len & (sizeof(INT) * CHAR_BIT - 1)));
}
However, if the string consists of, say, 10^6 chars, then this does not work, as the integer representation exceeds even the range of __int64.
In that case I could think of a solution by looping over the string
//let str be a char string of length n
char temp = str[n - 1];
for(int i = n - 1; i > 0; i--)
{
str[i] = str[i - 1];
}
str[0] = temp;
This solution runs in O(n) due to the loop over the length of the string, n. My question is: is there a more efficient way to implement circular shifting for large binary strings?
EDIT
Both input and output are std::strings

You have to move memory one way or another, so your proposed solution is as fast as it gets.
You might also use standard std::string functions:
str.insert(str.begin(), str[n - 1]);
str.erase(str.end() - 1);
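or memmove (not memcpy, whose behaviour is undefined for overlapping ranges; I don't actually recommend this, it's for the sake of argument)
char temp = str[n - 1];
memmove(&str[0] + 1, &str[0], n - 1); // &str[0] is writable; str.data() is const before C++17
str[0] = temp;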
Note that memmove may look faster, but it does essentially the same work as your loop: every byte still has to be moved, only inside a library function. It might be faster for much larger data blocks, of size 1000 bytes or more, since memmove implementations and the CPU are optimized to move large chunks of memory. But you won't be able to measure any difference for 10 or 20 bytes.
Moreover, the compiler will most likely run additional optimizations when it sees your for loop: it can recognize that you are moving memory and choose the best option.
The compiler is also good at dealing with std::string methods. These are common operations and the compiler knows the best way to handle them.
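The same circular shift can also be spelled with std::rotate from <algorithm>. The following is only a minimal sketch: it is still O(n), so it is an alternative spelling rather than a faster algorithm, and the helper name rotate_right is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): circularly shift the
// characters of s to the right by k positions.
std::string rotate_right(std::string s, std::size_t k) {
    if (s.empty()) return s;
    k %= s.size();
    // Rotating the reversed range left by k is the same as rotating
    // the string itself right by k.
    std::rotate(s.rbegin(), s.rbegin() + static_cast<std::ptrdiff_t>(k), s.rend());
    return s;
}
```

Calling rotate_right(str, 1) corresponds to the single-step shift in the question; larger values of k shift by several positions in one pass.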

Related

Writing a program for a computer that uses Little or Big endian, and having the same result [duplicate]

This question is about endianness.
The goal is to write 2 bytes to a file for a game on a computer. I want to make sure that people with different computers get the same result whether they use Little- or Big-Endian.
Which of these snippets do I use?
char a[2] = { 0x5c, 0x7B };
fout.write(a, 2);
or
int a = 0x7B5C;
fout.write((char*)&a, 2);
Thanks a bunch.

From wikipedia:
In its most common usage, endianness indicates the ordering of bytes within a multi-byte number.
So for char a[2] = { 0x5c, 0x7B };, a[1] will be always 0x7B
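However, for int a = 0x7B5C; with char* oneByte = (char*)&a;, the value of oneByte[0] depends on the platform: it is 0x5C on a little-endian machine, while on a big-endian machine it is 0x7B if int is 2 bytes wide or 0x00 if int is 4 bytes wide. As you can see, you have to play with casts and byte pointers (bear in mind the risk of undefined behaviour with such casts; this is for explanation purposes only).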

One way that is used quite often is to write a 'signature' or 'magic' number as the first data in the file - typically a 16-bit integer whose value, when read back, will depend on whether or not the reading platform has the same endianness as the writing platform. If you then detect a mismatch, all data (of more than one byte) read from the file will need to be byte swapped.
Here's some outline code:
void ByteSwap(void *buffer, size_t length)
{
unsigned char *p = static_cast<unsigned char *>(buffer);
for (size_t i = 0; i < length / 2; ++i) {
unsigned char tmp = *(p + i);
*(p + i) = *(p + length - i - 1);
*(p + length - i - 1) = tmp;
}
return;
}
bool WriteData(void *data, size_t size, size_t num, FILE *file)
{
uint16_t magic = 0xAB12; // Something that can be tested for byte-reversal
if (fwrite(&magic, sizeof(uint16_t), 1, file) != 1) return false;
if (fwrite(data, size, num, file) != num) return false;
return true;
}
bool ReadData(void *data, size_t size, size_t num, FILE *file)
{
uint16_t test_magic;
bool is_reversed;
if (fread(&test_magic, sizeof(uint16_t), 1, file) != 1) return false;
if (test_magic == 0xAB12) is_reversed = false;
else if (test_magic == 0x12AB) is_reversed = true;
else return false; // Error - needs handling!
if (fread(data, size, num, file) != num) return false;
if (is_reversed && (size > 1)) {
for (size_t i = 0; i < num; ++i) ByteSwap(static_cast<char *>(data) + (i*size), size);
}
return true;
}
Of course, in the real world, you wouldn't need to write/read the 'magic' number for every input/output operation - just once per file, and store the is_reversed flag for future use when reading data back.
Also, with proper use of C++ code, you would probably be using std::ostream/std::istream arguments, rather than the FILE* I have shown - but the sample I have posted has been extracted (with only very little modification) from code that I actually use in my projects (to do just this test). But conversion to better use of modern C++ should be straightforward.
Feel free to ask for further clarification and/or explanation.
NOTE: The ByteSwap function I have provided is not ideal! It almost certainly breaks strict aliasing rules and may well cause undefined behaviour on some platforms, if used carelessly. Also, it is not the most efficient method for small data units (like int variables). One could (and should) provide one's own byte-reversal function(s) to handle specific types of variables - a good case for overloading the function with different argument types.

Which of these snippets do I use?
The first one. It has same output regardless of native endianness.
But you'll find that if you need to interpret those bytes as some integer value, that is not so straightforward. char a[2] = { 0x5c, 0x7B } can represent either 0x5c7B (big endian) or 0x7B5c (little endian). So, which one did you intend?
The solution for cross platform interpretation of integers is to decide on particular byte order for the reading and writing. De-facto "standard" for cross platform data is to use big endian.
To write a number in big endian, start by bit-shifting the input value right so that the most significant byte is in the place of the least significant byte. Mask all other bytes (technically redundant in first iteration, but we'll loop back soon). Write this byte to the output. Repeat this for all other bytes in order of significance.
This algorithm produces same output regardless of the native endianness - it will even work on exotic "middle" endian systems if you ever encounter one. Writing to little endian is similar, but in reverse order.
To read a big endian value, read the first byte of input, shift it left so that it goes to the place of most significant byte. Combine the shifted byte with the result (initially zero) using bitwise-or. Repeat with the next byte by shifting to the second most significant place and so on.
to know the Endianness of a computer?
To know the endianness of a system, you can use std::endian from the <bit> header in C++20. Prior to that, you can use implementation-specific macros from the endian.h header. Or you can do a simple calculation like you suggest.
But you never really need to know the endianness of a system. You can simply use the algorithms that I described, which work on systems of all endianness without having to know what that endianness is.

C hack for storing a bit that takes 1 bit space?

I have a long list of numbers between 0 and 67600. Now I want to store them using an array that is 67600 elements long. An element is set to 1 if a number was in the set and it is set to 0 if the number is not in the set. ie. each time I need only 1bit information for storing the presence of a number. Is there any hack in C/C++ that helps me achieve this?

In C++ you can use std::vector<bool> if the size is dynamic (it's a special case of std::vector, see this) otherwise there is std::bitset (prefer std::bitset if possible.) There is also boost::dynamic_bitset if you need to set/change the size at runtime. You can find info on it here, it is pretty cool!
In C (and C++) you can manually implement this with bitwise operators. A good summary of common operations is here. One thing I want to mention is that it's a good idea to use unsigned integers when you are doing bit operations: in C, and in C++ before C++20, left-shifting a negative integer is undefined and right-shifting one is implementation-defined. You will need to allocate arrays of some integral type like uint32_t. If you want to store N bits, it will take (N + 31) / 32 of these uint32_ts (that is, N/32 rounded up). Bit i is stored in the i % 32'th bit of the i / 32'th uint32_t. You may want to use a differently sized integral type depending on your architecture and other constraints. Note: prefer using an existing implementation (e.g. as described in the first paragraph for C++, search Google for C solutions) over rolling your own (unless you specifically want to, in which case I suggest learning more about binary/bit manipulation from elsewhere before tackling this). This kind of thing has been done to death and there are "good" solutions.
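As a rough illustration of the i / 32 and i % 32 indexing just described, here is a minimal sketch in C++. The class name BitArray and its member names are hypothetical, and there is no bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal bit array backed by 32-bit words (illustration only).
class BitArray {
public:
    // Round up so that nbits bits always fit.
    explicit BitArray(std::size_t nbits) : words_((nbits + 31) / 32, 0u) {}

    void set(std::size_t i)   { words_[i / 32] |=  (std::uint32_t{1} << (i % 32)); }
    void clear(std::size_t i) { words_[i / 32] &= ~(std::uint32_t{1} << (i % 32)); }
    bool test(std::size_t i) const { return ((words_[i / 32] >> (i % 32)) & 1u) != 0; }

    std::size_t word_count() const { return words_.size(); }

private:
    std::vector<std::uint32_t> words_;
};
```

For the range in the question, BitArray(67601) needs 2113 words, i.e. a little over 8 KB.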
There are a number of tricks that will maybe only consume one bit: e.g. arrays of bitfields (applicable in C as well), but whether less space gets used is up to compiler. See this link.
Please note that whatever you do, you will almost surely never be able to use exactly N bits to store N bits of information - your computer very likely can't allocate less than 8 bits: if you want 7 bits you'll have to waste 1 bit, and if you want 9 you will have to take 16 bits and waste 7 of them. Even if your computer (CPU + RAM etc.) could "operate" on single bits, if you're running in an OS with malloc/new it would not be sane for your allocator to track data to such a small precision due to overhead. That last qualification was pretty silly - you won't find an architecture in use that allows you to operate on less than 8 bits at a time I imagine :)

You should use std::bitset.
std::bitset functions like an array of bool (actually like std::array, since it copies by value), but only uses 1 bit of storage for each element.
Another option is vector<bool>, which I don't recommend because:
It uses slower pointer indirection and heap memory to enable resizing, which you don't need.
That type is often maligned by standards-purists because it claims to be a standard container, but fails to adhere to the definition of a standard container*.
*For example, a standard-conforming function could expect &container.front() to produce a pointer to the first element of any container type, which fails with std::vector<bool>. Perhaps a nitpick for your usage case, but still worth knowing about.
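A short sketch of the std::bitset approach for the range in the question might look like this. The names kMaxValue and mark_present are made up for illustration; set, test and count are the standard std::bitset members.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxValue = 67600;  // largest value mentioned in the question

// Hypothetical helper: mark every number in `numbers` as present.
// Values above kMaxValue are ignored rather than reported.
void mark_present(std::bitset<kMaxValue + 1>& seen,
                  const std::vector<unsigned>& numbers) {
    for (unsigned n : numbers) {
        if (n <= kMaxValue) seen.set(n);
    }
}
```

After the call, seen.test(n) answers whether n was in the list, and seen.count() gives the number of distinct values.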

There is in fact! std::vector<bool> has a specialization for this: http://en.cppreference.com/w/cpp/container/vector_bool
See the doc, it stores it as efficiently as possible.
Edit: as somebody else said, std::bitset is also available: http://en.cppreference.com/w/cpp/utility/bitset

If you want to write it in C, have an array of char that is 67601 bits in length (67601/8 rounded up = 8451 bytes) and then turn on/off the appropriate bit for each value.

Others have given the right idea. Here's my own implementation of a bitsarr, or 'array' of bits. An unsigned char is one byte, so it's essentially an array of unsigned chars that stores information in individual bits. I added the option of storing TWO or FOUR bit values in addition to ONE bit values, because those both divide 8 (the size of a byte), and would be useful if you want to store a huge number of integers that will range from 0-3 or 0-15.
When setting and getting, the math is done in the functions, so you can just give it an index as if it were a normal array--it knows where to look.
Also, it's the user's responsibility to not pass a value to set that's too large, or it will screw up other values. It could be modified so that overflow loops back around to 0, but that would just make it more convoluted, so I decided to trust myself.
#include<stdio.h>
#include <stdlib.h>
#define BYTE 8
typedef enum {ONE=1, TWO=2, FOUR=4} numbits;
typedef struct bitsarr{
unsigned char* buckets;
numbits n;
} bitsarr;
bitsarr new_bitsarr(int size, numbits n)
{
int b = sizeof(unsigned char)*BYTE;
int numbuckets = (size*n + b - 1)/b;
bitsarr ret;
ret.buckets = calloc(numbuckets, sizeof(*ret.buckets)); // element size, not pointer size; zero-initialised
ret.n = n;
return ret;
}
void bitsarr_delete(bitsarr xp)
{
free(xp.buckets);
}
void bitsarr_set(bitsarr *xp, int index, int value)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
xp->buckets[buckdex] = (value << innerdex*xp->n) | ((~(((1 << xp->n) - 1) << innerdex*xp->n)) & xp->buckets[buckdex]);
//longer version
/*unsigned int width, width_in_place, zeros, old, newbits, new;
width = (1 << xp->n) - 1;
width_in_place = width << innerdex*xp->n;
zeros = ~width_in_place;
old = xp->buckets[buckdex];
old = old & zeros;
newbits = value << innerdex*xp->n;
new = newbits | old;
xp->buckets[buckdex] = new; */
}
int bitsarr_get(bitsarr *xp, int index)
{
int buckdex, innerdex;
buckdex = index/(BYTE/xp->n);
innerdex = index%(BYTE/xp->n);
return ((((1 << xp->n) - 1) << innerdex*xp->n) & (xp->buckets[buckdex])) >> innerdex*xp->n;
//longer version
/*unsigned int width = (1 << xp->n) - 1;
unsigned int width_in_place = width << innerdex*xp->n;
unsigned int val = xp->buckets[buckdex];
unsigned int retshifted = width_in_place & val;
unsigned int ret = retshifted >> innerdex*xp->n;
return ret; */
}
int main()
{
bitsarr x = new_bitsarr(100, FOUR);
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
for(int i = 0; i<16; i++)
bitsarr_set(&x, i, 15-i);
for(int i = 0; i<16; i++)
printf("%d\n", bitsarr_get(&x, i));
bitsarr_delete(x);
}

Replacing multiple chars at the same time

So in my code I have a series of chars which I want to replace with random data. Since rand returns an int, I figured I could save some time by replacing four chars at once instead of one at a time. So basically instead of this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i++) // generating the data to send.
TXT[i] = rand() % 255;
I'd like to do something like:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4) // generating the data to send.
TXT[i] = rand() % 4294967295;
Something to that effect, but I'm not sure how to do the latter part. Any help you can give me is greatly appreciated, thanks!

That won't work. The compiler will take the result from rand() % big_number and chop off the extra data to fit it in an unsigned char.
Speed-wise, your initial approach was fine. The optimization you contemplated is valid, but most likely unneeded. It probably wouldn't make a noticeable difference.
What you wanted to do is possible, of course, but given your mistake, I'd say the effort to understand how right now far outweighs the benefits. Keep learning, and the next time you run across code like this, you'll know what to do (and judge if it's necessary), look back on this moment and smile :).

You'll have to access memory directly, and do some transformations on your data. You probably want something like this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i + sizeof(int) <= flenght; i+=sizeof(int)) // generating the data to send.
{
int *temp = (int*)&TXT[i]; // very ugly, and possibly misaligned
*temp = rand(); // rand() already returns an int; any trailing bytes still need a byte-wise loop
}
It can be problematic though because of alignment issues, so be careful. Alignment issues can cause your program to crash unexpectedly, and are hard to debug. I wouldn't do this if I were you, your initial code is just fine.

TXT[i] = rand() % 4294967295;
Will not work the way you expect it to. Perhaps you are expecting that rand() % 4294967295 will generate a 4-byte integer (which you may be interpreting as 4 different characters). The value that rand() % 4294967295 produces will be converted to a single char and assigned to only one element, TXT[i].
Though it's not quite clear why you need to make four assignments at the same time, one approach would be to use bit operators to obtain the 4 different bytes of the generated number, which can then be assigned to four different indexes.
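A minimal sketch of that bit-operator approach follows. Note one swapped-in technique: it uses std::mt19937 from <random> instead of rand(), because rand() is only guaranteed to deliver 15 random bits per call, whereas std::mt19937 produces 32. The function name fill_random and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper: fill buf[begin, end) with random bytes, drawing one
// 32-bit value per four bytes and splitting it with shifts and masks.
void fill_random(unsigned char* buf, std::size_t begin, std::size_t end,
                 std::mt19937& gen) {
    std::size_t i = begin;
    while (i < end) {
        const std::uint32_t r = static_cast<std::uint32_t>(gen());
        // Take up to four bytes from r; the inner bound handles a short tail.
        for (int k = 0; k < 4 && i < end; ++k, ++i) {
            buf[i] = static_cast<unsigned char>((r >> (8 * k)) & 0xFFu);
        }
    }
}
```

Since every store is a plain byte write, this avoids the alignment problems of casting the buffer to int*, and the tail of the buffer needs no special case.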

There are valid answers already; I'll just add that C does not care very much about what type it stores at which address. So you can get away with something like:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
char *arr;
int *iArr;
int main (void){
int i;
arr = malloc(100);
/* Error handling omitted, yes that's evil */
iArr = (int*) arr;
for (i = 0; i < 25; i++) {
iArr[i] = rand() % INT_MAX;
}
for (i = 0; i < 25; i++) {
printf("iArr[%d] = %d\n", i, iArr[i]);
}
for (i = 0; i < 100; i++) {
printf("arr[%d] = %c\n", i, arr[i]);
}
free(arr);
return 0;
}
In the end an array is just some contiguous block in memory. And you can interpret it as you like (if you want). If you know that sizeof(int) == 4 * sizeof(char) then the above code will work.
I do not say I recommend it. And as the others have pointed out, the original loop over all the chars in TXT yields the same result anyway. One could think, for example, of unrolling a loop, but really I'd not care about that.
The (int*) cast alone is warning enough. It tells the compiler: do not rely on what you think the type is, just "believe" the programmer that he knows better.
Well this "know better" is probably the root of all evil in C programming....

unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4)
// generating the data to send.
TXT[i] = rand() % 4294967295;
This has a few issues:
TXT is not guaranteed to be memory-aligned as needed for the CPU to write int data (whether it works - perhaps relatively slowly - or not - e.g. SIGBUS on Solaris - is hardware specific)
the last 1-3 characters may be missed (even if you change i + 4 to i += 4 ;-P)
rand() returns an int anyway - you don't need to mod it with anything
you need to write your random data via an int* so you're accessing 4 bytes at a time and not simply slicing a byte off the end of the random data and overwriting every fourth single character
for stuff like this where you're dependent on the size of int, you should really write it in terms of sizeof(int) so it'll work even if int isn't 32 bits, or use a fixed-width typedef such as int32_t from <stdint.h> (standard since C99 and C++11; with older Windows compilers I think it's __int32, or you can use a boost or other library header to get int32_t, or write your own typedef).
It's actually pretty tricky to align your text data: your code suggests you want int-sized slices from the 35th character... even if the overall character array is aligned properly for ints, the 35th character will not be.
If it really is always the 35th, then you can pad the data with a leading character so you're accessing the 36th (being a multiple of presumably 32-bit int size), then align the text to a 32-bit boundary (with a compiler-specific #pragma or using a union with int32_t). If the real code varies the character you start overwriting from, such that you can't simply align the data once, then you're stuck with:
your original character-at-a-time overwrites
non-portable unaligned overwrites (if that's possible and better on your system), OR
implementing code that overwrites up to three leading unaligned characters, then switches to 32-bit integer overwrite mode for aligned addresses, then back to character-by-character overwrites for up to three trailing characters.

That does not work because the generated value is converted to type of array element - char in this particular case. But you are free to interpret allocated memory in the manner you like. For example, you could convert it into array int:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i + sizeof(int) <= flenght; i+=sizeof(int)) // generating the data to send.
*(int*)(TXT+i) = rand(); // There is no need for the modulo operator
for (; i < flenght; ++i) // filling the remaining bytes one at a time.
TXT[i] = rand(); // There is no need for the modulo operator either
I just want to complete the solution with remarks about the modulo operator and the handling of arrays whose size is not a multiple of sizeof(int).

1) % means "the remainder when divided by", so you want rand() % 256 for a char, or else you will never get chars with a value of 255. Similarly for the int case, although here there is no point in doing a modulus operation anyway, since you want the entire range of output values.
2) rand is only guaranteed to produce values up to 32767 (15 bits) per call; check the value of RAND_MAX.
3) 34 isn't divisible by 4 anyway, so you will have to handle the end case specially.
4) You will want to cast the pointer, and it won't work if it isn't already aligned. Once you have the cast, though, there is no need to account for the sizeof(int) in your iteration: pointer arithmetic automatically takes care of the element size.
5) Chances are very good that it won't make a noticeable difference. If scribbling random data into an array is really the bottleneck in your program, then it isn't really doing anything significant anyway.

C/C++: Bitwise operators on dynamically allocated memory

In C/C++, is there an easy way to apply bitwise operators (specifically left/right shifts) to dynamically allocated memory?
For example, let's say I did this:
unsigned char * bytes=new unsigned char[3];
bytes[0]=1;
bytes[1]=1;
bytes[2]=1;
I would like a way to do this:
bytes>>=2;
(then the 'bytes' would have the following values):
bytes[0]==0
bytes[1]==64
bytes[2]==64
Why the values should be that way:
After allocation, the bytes look like this:
[00000001][00000001][00000001]
But I'm looking to treat the bytes as one long string of bits, like this:
[000000010000000100000001]
A right shift by two would cause the bits to look like this:
[000000000100000001000000]
Which finally looks like this when separated back into the 3 bytes (thus the 0, 64, 64):
[00000000][01000000][01000000]
Any ideas? Should I maybe make a struct/class and overload the appropriate operators? Edit: If so, any tips on how to proceed? Note: I'm looking for a way to implement this myself (with some guidance) as a learning experience.

I'm going to assume you want bits carried from one byte to the next, as John Knoeller suggests.
The requirements here are insufficient. You need to specify the order of the bits relative to the order of the bytes: when the least significant bit falls out of one byte, does it go to the next higher or the next lower byte?
What you are describing, though, used to be very common for graphics programming. You have basically described a monochrome bitmap horizontal scrolling algorithm.
Assuming that "right" means higher addresses but less significant bits (ie matching the normal writing conventions for both) a single-bit shift will be something like...
void scroll_right (unsigned char* p_Array, int p_Size)
{
unsigned char orig_l = 0;
unsigned char orig_r;
unsigned char* dest = p_Array;
while (p_Size > 0)
{
p_Size--;
orig_r = *p_Array++;
*dest++ = (orig_l << 7) + (orig_r >> 1);
orig_l = orig_r;
}
}
Adapting the code for variable shift sizes shouldn't be a big problem. There are obvious opportunities for optimisation (e.g. doing 2, 4 or 8 bytes at a time) but I'll leave that to you.
To shift left, though, you should use a separate loop which should start at the highest address and work downwards.
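If you want to expand "on demand", note that the orig_l variable contains the last byte above. To check for an overflow, check whether its lowest bit is set, i.e. whether (orig_l & 1) is non-zero (the expression (orig_l << 7) is promoted to int, so on its own it is non-zero whenever any bit of orig_l is set). If your bytes are in an std::vector, inserting at either end should be no problem.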
EDIT I should have said - optimising to handle 2, 4 or 8 bytes at a time will create alignment issues. When reading 2-byte words from an unaligned char array, for instance, it's best to do the odd byte read first so that later word reads are all at even addresses up until the end of the loop.
On x86 this isn't necessary, but it is a lot faster. On some processors it's necessary. Just do a switch based on the base (address & 1), (address & 3) or (address & 7) to handle the first few bytes at the start, before the loop. You also need to special case the trailing bytes after the main loop.

Decouple the allocation from the accessor/mutators
Next, see if a standard container like bitset can do the job for you
Otherwise check out boost::dynamic_bitset
If all fails, roll your own class
Rough example:
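#include <climits> // CHAR_BIT
#include <cstddef> // size_t
typedef unsigned char byte;
// Extract bitcount bits of value, starting at bit startbit
// (1-based, counted from the most significant bit).
byte extract(byte value, int startbit, int bitcount)
{
byte result;
result = (byte)(value << (startbit - 1));
result = (byte)(result >> (CHAR_BIT - bitcount));
return result;
}
// Shift the whole array right by n bits (0 < n < CHAR_BIT),
// carrying the bits that fall out of each byte into the next one.
byte *right_shift(byte *bytes, size_t nbytes, size_t n) {
byte rollover = 0;
for (size_t i = 0; i < nbytes; ++i) {
// Save the low n bits before this byte is overwritten.
byte next = extract(bytes[ i ], (int)(CHAR_BIT - n + 1), (int)n);
bytes[ i ] = (byte)((bytes[ i ] >> n) | (rollover << (CHAR_BIT - n)));
rollover = next;
}
return &bytes[ 0 ];
}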

Here's how I would do it for two bytes:
unsigned int rollover = byte[0] & 0x3;
byte[0] >>= 2;
byte[1] = byte[1] >> 2 | (rollover << 6);
From there, you can generalize this into a loop for n bytes. For flexibility, you will want to generate the magic numbers (0x3 and 6) rather than hardcode them.

I'd look into something similar to this:
#define number_of_bytes 3
template<size_t num_bytes>
union MyUnion
{
char bytes[num_bytes];
__int64 ints[num_bytes / sizeof(__int64) + 1];
};
int main()
{
MyUnion<number_of_bytes> mu;
mu.bytes[0] = 1;
mu.bytes[1] = 1;
mu.bytes[2] = 1;
mu.ints[0] >>= 2;
}
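Just play with it. You'll get the idea I believe. Note that the result depends on the byte order of the machine, and that __int64 is Microsoft-specific.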

Operator overloading is syntactic sugar. It's really just a way of calling a function and passing your byte array without having it look like you are calling a function.
So I would start by writing this function
unsigned char * ShiftBytes(unsigned char * bytes, size_t count_of_bytes, int shift);
Then if you want to wrap this up in an operator overload in order to make it easier to use or because you just prefer that syntax, you can do that as well. Or you can just call the function.
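As a sketch of what such a wrapper could look like, the struct below puts the shift logic directly into operator>>=. Everything here is an assumption made for illustration: the name BitBuffer, the choice of std::vector as storage, the convention that a right shift moves bits towards higher indexes (as in the question), and 8-bit bytes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper (illustration only): treats the bytes as one long
// bit string, most significant bit of bytes[0] first. Assumes 8-bit bytes.
struct BitBuffer {
    std::vector<unsigned char> bytes;

    BitBuffer& operator>>=(std::size_t shift) {
        const std::size_t n = bytes.size();
        const std::size_t byte_shift = shift / 8;
        const unsigned bit_shift = static_cast<unsigned>(shift % 8);
        if (byte_shift >= n) {  // everything is shifted out
            bytes.assign(n, 0);
            return *this;
        }
        // Walk from the last byte down so sources are read before being overwritten.
        for (std::size_t i = n; i-- > 0;) {
            unsigned value = 0;
            if (i >= byte_shift) {
                value = static_cast<unsigned>(bytes[i - byte_shift]) >> bit_shift;
                if (bit_shift != 0 && i > byte_shift) {
                    // Bits carried in from the preceding byte.
                    value |= static_cast<unsigned>(bytes[i - byte_shift - 1]) << (8 - bit_shift);
                }
            }
            bytes[i] = static_cast<unsigned char>(value);
        }
        return *this;
    }
};
```

With bytes set to {1, 1, 1}, applying >>= 2 gives {0, 64, 64}, matching the example in the question. A matching operator<<= would walk in the opposite direction.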

Is there a way to improve the speed or efficiency of this lookup? (C/C++)

I have a function I've written to convert from a 64-bit integer to a base 62 string. Originally, I achieved this like so:
char* charset = " 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
int charsetLength = strlen(charset);
std::string integerToKey(unsigned long long input)
{
unsigned long long num = input;
string key = "";
while(num)
{
key += charset[num % charsetLength];
num /= charsetLength;
}
return key;
}
However, this was too slow.
I improved the speed by providing an option to generate a lookup table. The table is about 62^4 strings in size, and is generated like so:
// Create the integer to key conversion lookup table
int lookupChars;
if(lookupDisabled)
lookupChars = 1;
else
lookupChars = largeLookup ? 4 : 2;
lookupSize = pow(charsetLength, lookupChars);
integerToKeyLookup = new char*[lookupSize];
for(unsigned long i = 0; i < lookupSize; i++)
{
unsigned long num = i;
int j = 0;
integerToKeyLookup[i] = new char[lookupChars + 1]; // + 1 for the null terminator
while(num)
{
integerToKeyLookup[i][j] = charset[num % charsetLength];
num /= charsetLength;
j++;
}
// Null terminate the string
integerToKeyLookup[i][j] = '\0';
}
The actual conversion then looks like this:
std::string integerToKey(unsigned long long input)
{
unsigned long long num = input;
string key = "";
while(num)
{
key += integerToKeyLookup[num % lookupSize];
num /= lookupSize;
}
return key;
}
This improved speed by a large margin, but I still believe it can be improved. Memory usage on a 32-bit system is around 300 MB, and more than 400 MB on a 64-bit system. It seems like I should be able to reduce memory and/or improve speed, but I'm not sure how.
If anyone could help me figure out how this table could be further optimized, I'd greatly appreciate it.

Using some kind of string builder rather than repeated concatenation into 'key' would provide a significant speed boost.

You may want to reserve memory in advance for your string key. This may get you a decent performance gain, as well as a gain in memory utilization. Whenever you call the append operator on std::string, it may double the size of the internal buffer if it has to reallocate. This means each string may be taking up significantly more memory than is necessary to store the characters. You can avoid this by reserving memory for the string in advance.
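A minimal sketch of the reserve idea, under a few assumptions that are not in the question: the function name integerToKeyReserved is made up, the charset has exactly 62 symbols (no leading space), zero maps to "0", and the digits are reversed so the most significant one comes first.

```cpp
#include <algorithm>
#include <string>

// Hypothetical variant of integerToKey that reserves the buffer once.
std::string integerToKeyReserved(unsigned long long input) {
    static const char charset[] =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const unsigned base = 62;

    std::string key;
    key.reserve(11);  // 62^11 > 2^64, so 11 digits always suffice

    do {
        key += charset[input % base];
        input /= base;
    } while (input != 0);

    std::reverse(key.begin(), key.end());  // most significant digit first
    return key;
}
```

Whether this is measurably faster depends on the std::string implementation; with a small-string optimisation an 11-character key may not allocate at all.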

I agree with Rob Walker - you're concentrating on improving performance in the wrong area. The string is the slowest part.
I timed the code (your original is broken, btw) and your original (when fixed) was 44982140 cycles for 100000 lookups and the following code is about 13113670.
const char* charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
#define CHARSET_LENGTH 62
// maximum size = 11 chars
void integerToKey(char result[13], unsigned long long input)
{
char* p = result;
while(input > 0)
{
*p++ = charset[input % CHARSET_LENGTH];
input /= CHARSET_LENGTH;
}
// null termination
*p = '\0';
// need to reverse the output
char* o = result;
while(o + 1 < p)
swap(*o++, *--p); // post-increment, so that result[0] takes part in the reversal
}

This is almost a textbook case of how not to do this. Concatenating strings in a loop is a bad idea, both because appending isn't particularly fast, and because you're constantly allocating memory.
Note: your question states that you're converting to base-62, but the code seems to have 63 symbols. Which are you trying to do?
Given a 64-bit integer, you can calculate that you won't need any more than 11 digits in the result, so using a static 12 character buffer will certainly help improve your speed. On the other hand, it's likely that your C++ library has a long-long equivalent to ultoa, which will be pretty optimal.
Edit: Here's something I whipped up. It allows you to specify any desired base as well:
std::string ullToString(unsigned long long v, int base = 64) {
assert(base < 65);
assert(base > 1);
static const char digits[]="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";
const int max_length=65;
static char buffer[max_length];
buffer[max_length-1]=0;
char *d = buffer + max_length-1;
do {
d--;
int remainder = v % base;
v /= base;
*d = digits[remainder];
} while(v>0);
return d;
}
This only creates one std::string object, and doesn't move memory around unnecessarily. It currently doesn't zero-pad the output, but it's trivial to change it to do that to however many digits of output you want.

You don't need to copy input into num, because you pass it by value. You can also compute the length of charset in compiletime, there's no need to compute it in runtime every single time you call the function.
But these are very minor performance issues. I think the most significant help you can gain is by avoiding the string concatenation in the loop. When you construct the key string, pass the string constructor the length of your result string so that there is only one allocation for the string. Then in the loop, when you write into the string, you will not re-allocate.
You can make things even slightly more efficient if you take the target string as a reference parameter or even as two iterators like the standard algorithms do. But that is arguably a step too far.
By the way, what if the value passed in for input is zero? You won't even enter the loop; shouldn't key then be "0"?
I see the value passed in for input can't be negative, but just so we note: the C remainder operator isn't a modulo operator.

Why not just use a base64 library? Is it really important that 63 equals '11' and not a longer string?
size_t base64_encode(char* outbuffer, size_t maxoutbuflen, const char* inbuffer, size_t inbuflen);
std::string integerToKey(unsigned long long input) {
char buffer[14];
size_t len = base64_encode(buffer, sizeof buffer, (const char*)&input, sizeof input);
return std::string(buffer, len);
}
Yes, every string will end with an equals sign. If you don't want it to, strip off the equals sign. (Just remember to add it back if you need to decode the number.)
Of course, my real question is why you are converting a fixed-width 8-byte value at all, instead of using it directly as your "key" rather than the variable-length string value.
Footnote: I'm well aware of the endian issues with this. He didn't say what the key will be used for and so I assume it isn't being used in network communications between machines of disparate endian-ness.

If you could add two more symbols so that it is converting to base-64, your modulus and division operations would turn into a bit mask and shift. Much faster than a division.

If all you need is a short string key, converting to base-64 numbers would speed up things a lot, since div/mod 64 is very cheap (shift/mask).
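To make the shift/mask point concrete, here is a small sketch of a base-64 conversion. The function name integerToKey64 is hypothetical, and the 64-symbol alphabet (the 62 symbols plus '+' and '/') is an assumption borrowed from one of the other answers.

```cpp
#include <string>

// Hypothetical base-64 key conversion using a mask and a shift instead of
// % and /. Digits come out most significant first.
std::string integerToKey64(unsigned long long input) {
    static const char digits[] =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/";
    char buffer[11];  // a 64-bit value needs at most ceil(64 / 6) = 11 digits
    char* p = buffer + sizeof buffer;
    do {
        *--p = digits[input & 63u];  // same as input % 64
        input >>= 6;                 // same as input / 64
    } while (input != 0);
    return std::string(p, buffer + sizeof buffer);
}
```

The buffer is filled from the back, so no reversal pass is needed and only one std::string is constructed.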
