Fast code for searching bit-array for contiguous set/clear bits?

Fast code for searching bit-array for contiguous set/clear bits? - c++

Is there some reasonably fast code out there which can help me quickly search a large bitmap (a few megabytes) for runs of contiguous zero or one bits?
By "reasonably fast" I mean something that can take advantage of the machine word size and compare entire words at once, instead of doing bit-by-bit analysis which is horrifically slow (such as one does with vector<bool>).
It's very useful for e.g. searching the bitmap of a volume for free space (for defragmentation, etc.).

Windows has an RTL_BITMAP data structure one can use along with its APIs.
But I needed the code for this sometime ago, and so I wrote it here (warning, it's a little ugly):
https://gist.github.com/3206128
I have only partially tested it, so it might still have bugs (especially on reverse). But a recent version (only slightly different from this one) seemed to be usable for me, so it's worth a try.
The fundamental operation for the entire thing is being able to -- quickly -- find the length of a run of bits:
long long GetRunLength(
const void *const pBitmap, unsigned long long nBitmapBits,
long long startInclusive, long long endExclusive,
const bool reverse, /*out*/ bool *pBit);
Everything else should be easy to build upon this, given its versatility.
I tried to include some SSE code, but it didn't noticeably improve the performance. However, in general, the code is many times faster than doing bit-by-bit analysis, so I think it might be useful.
It should be easy to test if you can get a hold of vector<bool>'s buffer somehow -- and if you're on Visual C++, then there's a function I included which does that for you. If you find bugs, feel free to let me know.

I can't figure how to do well directly on memory words, so I've made up a quick solution which is working on bytes; for convenience, let's sketch the algorithm for counting contiguous ones:
Construct two tables of size 256 where you will write for each number between 0 and 255, the number of trailing 1's at the beginning and at the end of the byte. For example, for the number 167 (10100111 in binary), put 1 in the first table and 3 in the second table. Let's call the first table BBeg and the second table BEnd. Then, for each byte b, two cases: if it is 255, add 8 to your current sum of your current contiguous set of ones, and you are in a region of ones. Else, you end a region with BBeg[b] bits and begin a new one with BEnd[b] bits.
Depending on what information you want, you can adapt this algorithm (this is a reason why I don't put here any code, I don't know what output you want).
A flaw is that it does not count (small) contiguous set of ones inside one byte ...
Beside this algorithm, a friend tells me that if it is for disk compression, just look for bytes different from 0 (empty disk area) and 255 (full disk area). It is a quick heuristic to build a map of what blocks you have to compress. Maybe it is beyond the scope of this topic ...

Sounds like this might be useful:
http://www.aggregate.org/MAGIC/#Population%20Count%20%28Ones%20Count%29
and
http://www.aggregate.org/MAGIC/#Leading%20Zero%20Count
You don't say if you wanted to do some sort of RLE or to simply count in-bytes zeros and one bits (like 0b1001 should return 1x1 2x0 1x1).
A look up table plus SWAR algorithm for fast check might gives you that information easily.
A bit like this:
byte lut[0x10000] = { /* see below */ };
for (uint * word = words; word < words + bitmapSize; word++) {
if (word == 0 || word == (uint)-1) // Fast bailout
{
// Do what you want if all 0 or all 1
}
byte hiVal = lut[*word >> 16], loVal = lut[*word & 0xFFFF];
// Do what you want with hiVal and loVal
The LUT will have to be constructed depending on your intended algorithm. If you want to count the number of contiguous 0 and 1 in the word, you'll built it like this:
for (int i = 0; i < sizeof(lut); i++)
lut[i] = countContiguousZero(i); // Or countContiguousOne(i)
// The implementation of countContiguousZero can be slow, you don't care
// The result of the function should return the largest number of contiguous zero (0 to 15, using the 4 low bits of the byte, and might return the position of the run in the 4 high bits of the byte
// Since you've already dismissed word = 0, you don't need the 16 contiguous zero case.

Related

Using part of a variable as bool

Let's say memory is precious, and I have a class with a uint32_t member variable ui and I know that the values will stay below 1 million. The class also hase some bool members.
Does it make sense to use the highest (highest 2,3,..) bit(s) of ui in order to save memory, since bool is 1 byte?
If it does make sense, what is the most efficient way to get the highest (leftmost?) bit (or 2nd)? I read a few old threads and there seems to be disagreement about using inline ASM or some sort of shift.

It's a bit dangerous to use part of the bits as bool. The thing is that the way the numbers are kept in binary, makes it harder to maintain that keeping mechanism correct.
Negative numbers are kept as a complement of positive. Check this for more explanation. You may assign number to be 10 and then setting bool bit from false to true, and the number may turn out to become huge negative number as a result.
As for getting if n-th bit is 0 or 1 you can use this, where 0-th bit is the right most:
int nth_bit(int a, int n){
return a & (1 << n);
}
It will return 0 or 1 identifying the n-th bit.

Well, if the memory is in fact precious, you should look deeper.
1,000,000 uses only 20 bits. This is less that 3 bytes. So you can allocate 3 bytes to keep your value and up to four booleans. Obviously, access will be a bit more complicated, but you save 25% of memory!
If you know that the values are below 524,287, for example, you can save another 15% by packing it (with bool) into 20 bits :)
Also, keeping bool in a separate array (as you said in a comment) would kill performance if you need to access the value and a corresponding bool simultaneously because they are far apart and will likely never be in a cache.

Keeping track of boolean data

I need to keep track of n samples. The information I am keeping track of is of boolean type, i.e. something is true or false. As soon as I am on sample n+1, i basically want to ignore the oldest sample and record information about the newest one.
So say I keep track of samples, I may have something like
OLDEST 0 0 1 1 0 NEWEST
If the next sample is 1, this will become
OLDEST 0 1 1 0 1 NEWEST
if the next one is 0, this will become...
OLDEST 1 1 0 1 0 NEWEST
So what is the best way to implement this in terms of simplicity and memory?
Some ideas I had:
Vector of bool (this would require shifting elements so seems expensive)
Storing it as bits...and using bit shifting (memorywise --cheap? but is there a limit on the number of samples?)
Linked lists? (might be an overkill for the task)
Thanks for the ideas and suggestions :)

You want a set of bits. Maybe you can look into a std::bitset
http://www.sgi.com/tech/stl/bitset.html
Very straightfoward to use, optimal memory consumption and probably the best performance
The only limitation is that you need to know at compile-time the value of n. If you want to set it on runtime, have a look at boost http://www.boost.org/doc/libs/1_36_0/libs/dynamic_bitset/dynamic_bitset.html

Sounds like a perfect use of a ring buffer. Unfortunately there isn't one in the standard library, but you could use boost.
Alternately roll your own using a fixed-length std::list and splice the head node to the tail when you need to overwrite an old element.

It really depends on how many samples you want to keep.
vector<bool> could be a valid option; I would expect an
erase() on the first element to be reasonably efficient.
Otherwise, there's deque<bool>. If you know how many elements
you want to keep at compile time, bitset<N> is probably better
than either.
In any case, you'll have to wrap the standard container in some
additional logic; none have the actual logic you need (that of
a ring buffer).

If you only need 8 bits... then use a char and do logical shifts "<<, >>" and do a mask to look at the one you need.
16 Bits - short
32 Bits - int
64 Bits - long
etc...
Example:
Oldest 00110010 Newest -> Oldest 1001100101 Newest
Done by:
char c = 0x32; // 50 decimal or 00110010 in binary
c<<1; // Logical shift left once.
c++; // Add one, sense LSB is the newest.
//Now look at the 3rd newest bit
print("The 3rd newest bit is: %d\n", (c & 0x4));
Simple and EXTREMELY cheap on resources. Will be VERY VERY high performance.

From your question, it's not clear what you intend to do with the samples. If all you care about is storing the N most recent samples, you could try the following. I'll do it for "chars" and let you figure out how to optimize for "bool" should you need that.
char buffer[N];
int samples = 0;
void record_sample( char value )
{
buffer[samples%N] = value;
samples = samples + 1;
}
Once you've stored N samples (once you've called record_sample N times) you can read the oldest and newest samples like so:
char oldest_sample()
{
return buffer[samples%N];
}
char newest_sample()
{
return buffer[(samples+N-1)%N];
}
Things get a little trickier if you intend to read the oldest sample before you've already stored N samples - but not that much trickier. For that, you want a "ring buffer" which you can find in boost and on wikipedia.

how to efficiently access 3^20 vectors in a 2^30 bits of memory

I want to store a 20-dimensional array where each coordinate can have 3 values,
in a minimal amount of memory (2^30 or 1 Gigabyte).
It is not a sparse array, I really need every value.
Furthermore I want the values to be integers of arbirary but fixed precision,
say 256 bits or 8 words
example;
set_big_array(1,0,0,0,1,2,2,0,0,2,1,1,2,0,0,0,1,1,1,2, some_256_bit_value);
and
get_big_array(1,0,0,0,1,2,2,0,0,2,1,1,2,0,0,0,1,1,1,2, &some_256_bit_value);
Because the value 3 is relative prime of 2. its difficult to implement this using
efficient bitwise shift, and and or operators.
I want this to be as fast as possible.
any thoughts?

Seems tricky to me without some compression:
3^20 = 3486784401 values to store
256bits / 8bitsPerByte = 32 bytes per value
3486784401 * 32 = 111577100832 size for values in bytes
111577100832 / (1024^3) = 104 Gb
You're trying to fit 104 Gb in 1 Gb. There'd need to be some pattern to the data that could be used to compress it.
Sorry, I know this isn't much help, but maybe you can rethink your strategy.

There are 3.48e9 variants of 20-tuple of indexes that are 0,1,2. If you wish to store a 256 bit value at each index, that means you're talking about 8.92e11 bits - about a terabit, or about 100GB.
I'm not sure what you're trying to do, but that sounds computationally expensive. It may be reasonable feasible as a memory-mapped file, and may be reasonably fast as a memory-mapped file on an SSD.
What are you trying to do?
So, a practical solution would be to use a 64-bit OS and a large memory-mapped file (preferably on an SSD) and simply compute the address for a given element in the typical way for arrays, i.e. as sum-of(forall-i(i-th-index * 3^i)) * 32 bytes in pseudeo-math. Or, use a very very expensive machine with that much memory, or another algorithm that doesn't require this array in the first place.
A few notes on platforms: Windows 7 supports just 192GB of memory, so using physical memory for a structure like this is possible but really pushing it (more expensive editions support more). If you can find a machine at all that is. According to microsoft's page on the matter the user-mode virtual address space is 7-8TB, so mmap/virtual memory should be doable. Alex Ionescu explains why there's such a low limit on virtual memory despite an apparently 64-bit architecture. Wikipedia puts linux's addressable limits at 128TB, though probably that's before the kernel/usermode split.
Assuming you want to address such a multidimensional array, you must process each index at least once: that means any algorithm will be O(N) where N is the number of indexes. As mentioned before, you don't need to convert to base-2 addressing or anything else, the only thing that matters is that you can compute the integer offset - and which base the maths happens in is irrelevant. You should use the most compact representation possible and ignore the fact that each dimension is not a multiple of 2.
So, for a 16-dimensional array, that address computation function could be:
int offset = 0;
for(int ii=0;ii<16;ii++)
offset = offset*3 + indexes[ii];
return &the_array[offset];
As previously said, this is just the common array indexing formula, nothing special about it. Note that even for "just" 16 dimensions, if each item is 32 bytes, you're dealing with a little more than a gigabyte of data.

Maybe i understand your question wrong. But can't you just use a normal array?
INT256 bigArray[3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3];
OR
INT256 ********************bigArray = malloc(3^20 * 8);
bigArray[1][0][0][1][2][0][1][1][0][0][0][0][1][1][2][1][1][1][1][1] = some_256_bit_value;
etc.
Edit:
Will not work because you would need 3^20 * 8Byte = ca. 25GByte.
The malloc variant is wrong.

I'll start by doing a direct calculation of the address, then see if I can optimize it
address = 0;
for(i=15; i>=0; i--)
{
address = 3*address + array[i];
}
address = address * number_of_bytes_needed_for_array_value

2^30 bits is 2^27 bytes so not actually a gigabyte, it's an eighth of a gigabyte.
It appears impossible to do because of the mathematics although of course you can create the data size bigger then compress it, which may get you down to the required size although it cannot guarantee. (It must fail to some of the time as the compression is lossless).
If you do not require immediate "random" access your solution may be a "variable sized" two-bit word so your most commonly stored value takes only 1 bit and the other two take 2 bits.
If 0 is your most common value then:
0 = 0
10 = 1
11 = 2
or something like that.
In that case you will be able to store your bits in sequence this way.
It could take up to 2^40 bits this way but probably will not.
You could pre-run through your data and see which is the commonly occurring value and use that to indicate your single-bit word.
You can also compress your data after you have serialized it in up to 2^40 bits.
My assumption here is that you will be using disk possibly with memory mapping as you are unlikely to have that much memory available.
My assumption is that space is everything and not time.

You might want to take a look at something like STXXL, an implementation of the STL designed for handling very large volumes of data

You can actually use a pointer-to-array20 to have your compiler implement the index calculations for you:
/* Note: there are 19 of the [3]'s below */
my_256bit_type (*foo)[3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3][3];
foo = allocate_giant_array();
foo[0][1][1][0][2][1][2][2][0][2][1][0][2][1][0][0][2][1][0][0] = some_256bit_value;

C++, using one byte to store two variables

I am working on representation of the chess board, and I am planning to store it in 32 bytes array, where each byte will be used to store two pieces. (That way only 4 bits are needed per piece)
Doing it in that way, results in a overhead for accessing particular index of the board.
Do you think that, this code can be optimised or completely different method of accessing indexes can be used?
c++
char getPosition(unsigned char* c, int index){
//moving pointer
c+=(index>>1);
//odd number
if (index & 1){
//taking right part
return *c & 0xF;
}else
{
//taking left part
return *c>>4;
}
}
void setValue(unsigned char* board, char value, int index){
//moving pointer
board+=(index>>1);
//odd number
if (index & 1){
//replace right part
//save left value only 4 bits
*board = (*board & 0xF0) + value;
}else
{
//replacing left part
*board = (*board & 0xF) + (value<<4);
}
}
int main() {
char* c = (char*)malloc(32);
for (int i = 0; i < 64 ; i++){
setValue((unsigned char*)c, i % 8,i);
}
for (int i = 0; i < 64 ; i++){
cout<<(int)getPosition((unsigned char*)c, i)<<" ";
if (((i+1) % 8 == 0) && (i > 0)){
cout<<endl;
}
}
return 0;
}
I am equally interested in your opinions regarding chess representations, and optimisation of the method above, as a stand alone problem.
Thanks a lot
EDIT
Thanks for your replies. A while ago I created checkers game, where I was using 64 bytes board representation. This time I am trying some different methods, just to see what I like. Memory is not such a big problem. Bit-boards is definitely on my list to try. Thanks

That's the problem with premature optimization. Where your chess board would have taken 64 bytes to store, now it takes 32. What has this really boughten you? Did you actually analyze the situation to see if you needed to save that memory?
Assuming that you used one of the least optimal search method, straight AB search to depth D with no heuristics, and you generate all possible moves in a position before searching, then absolute maximum memory required for your board is going to be sizeof(board) * W * D. If we assume a rather large W = 100 and large D = 30 then you're going to have 3000 boards in memory at depth D. 64k vs 32k...is it really worth it?
On the other hand, you've increased the amount of operations necessary to access board[location] and this will be called many millions of times per search.
When building chess AI's the main thing you'll end up looking for is cpu cycles, not memory. This may vary a little bit if you're targeting a cell phone or something, but even at that you're going to worry more about speed before you'll ever reach enough depth to cause any memory issues.
As to which representation I prefer...I like bitboards. Haven't done a lot of serious measurements but I did compare two engines I made, one bitboard and one array, and the bitboard one was faster and could reach much greater depths than the other.

Let me be the first to point out a potential bug (depending on compilers and compiler settings). And bugs being why premature optimization is evil:
//taking left part
return *c>>4;
if *c is negative, then >> may repeat the negative high bit. ie in binary:
0b10100000 >> 4 == 0b11111010
for some compilers (ie the C++ standard leaves it to the compiler to decide - both whether to carry the high bit, and whether a char is signed or unsigned).
If you do want to go forward with your packed bits (and let me say that you probably shouldn't bother, but it is up to you), I would suggest wrapping the packed bits into a class, and overriding [] such that
board[x][y]
gives you the unpacked bits. Then you can turn the packing on and off easily, and having the same syntax in either case. If you inline the operator overloads, it should be as efficient as the code you have now.

Well, 64 bytes is a very small amount of RAM. You're better off just using a char[8][8]. That is, unless you plan on storing a ton of chess boards. Doing char[8][8] makes it easier (and faster) to access the board and do more complex operations on it.
If you're still interested in storing the board in packed representation (either for practice or to store a lot of boards), I say you're "doing it right" regarding the bit operations. You may want to consider inlining your accessors if you're going for speed using the inline keyword.

Is space enough of a consideration where you can't just use a full byte to represent a square? That would make accesses easier to follow on the program and additionally most likely faster as the bit manipulations are not required.
Otherwise to make sure everything goes smoothly I would make sure all your types are unsigned: getPosition return unsigned char, and qualify all your numeric literals with "U" (0xF0U for example) to make sure they're always interpreted as unsigned. Most likely you won't have any problems with signed-ness but why take chances on some architecture that behaves unexpectedly?

Nice code, but if you are really that deep into performance optimization, you should probably learn more about your particular CPU architecture.
AFAIK, you may found that storing a chess piece in as much 8 bytes will be more efficient. Even if you recurse 15 moves deep, L2 cache size would hardly be a constraint, but RAM misalignment may be. I would guess that proper handling of a chess board would include Expand() and Reduce() functions to translate between board representations during different parts of the algorithm: some may be faster on compact representation, and some vice versa. For example, caching, and algorithms involving hashing by composition of two adjacent cells might be good for the compact structure, all else no.
I would also consider developing some helper hardware, like some FPGA board, or some GPU code, if performance is so important..

As a chess player, I can tell you: There's more to a position than the mere placement of each piece. You have to take in to consideration some other things:
Which side has to move next?
Can white and/or black castle king and/or queenside?
Can a pawn be taken en passant?
How many moves have passed since the last pawn move and/or capturing move?
If the data structure you use to represent a position doesn't reflect this information, then you're in big trouble.

10 character id that's globally and locally unique

I need to generate a 10 character unique id (SIP/VOIP folks need to know that it's for a param icid-value in the P-Charging-Vector header). Each character shall be one of the 26 ASCII letters (case sensitive), one of the 10 ASCII digits, or the hyphen-minus.
It MUST be 'globally unique (outside of the machine generating the id)' and sufficiently 'locally unique (within the machine generating the id)', and all that needs to be packed into 10 characters, phew!
Here's my take on it. I'm FIRST encoding the 'MUST' be encoded globally unique local ip address into base-63 (its an unsigned long int that will occupy 1-6 characters after encoding) and then as much as I can of the current time stamp (its a time_t/long long int that will occupy 9-4 characters after encoding depending on how much space the encoded ip address occupies in the first place).
I've also added loop count 'i' to the time stamp to preserve the uniqueness in case the function is called more than once in a second.
Is this good enough to be globally and locally unique or is there another better approach?
Gaurav
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
//base-63 character set
static char set[]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-";
// b63() returns the next vacant location in char array x
int b63(long long longlong,char *x,int index){
if(index > 9)
return index+1;
//printf("index=%d,longlong=%lld,longlong%63=%lld\n",index,longlong,longlong%63);
if(longlong < 63){
x[index] = set[longlong];
return index+1;
}
x[index] = set[longlong%63];
return b63(longlong/63,x,index+1);
}
int main(){
char x[11],y[11] = {0}; /* '\0' is taken care of here */
//let's generate 10 million ids
for(int i=0; i<10000000; i++){
/* add i to timestamp to take care of sub-second function calls,
3770168404(is a sample ip address in n/w byte order) = 84.52.184.224 */
b63((long long)time(NULL)+i,x,b63((long long)3770168404,x,0));
// reverse the char array to get proper base-63 output
for(int j=0,k=9; j<10; j++,k--)
y[j] = x[k];
printf("%s\n",y);
}
return 0;
}

It MUST be 'globally unique (outside
of the machine generating the id)' and
sufficiently 'locally unique (within
the machine generating the id)', and
all that needs to be packed into 10
characters, phew!
Are you in control of all the software generating ids? Are you doling out the ids? If not...
I know nothing about SIP, but there's got to be a misunderstanding that you have about the spec (or the spec must be wrong). If another developer attempts to build an id using a different algorithm than the one you've cooked up, you will have collisions with their ids, meaning they will know longer be globally unique in that system.
I'd go back to the SIP documentation, see if there's an appendix with an algorithm for generating these ids. Or maybe a smarter SO user than I can answer what the SIP algorithm for generating these id's is.

I would have a serious look at RFC 4122 which describes the generation of 128-bit GUIDs. There are several different generation algorithms, some of which may fit (MAC address-based one springs to mind). This is a bigger number-space than yours 2^128 = 3.4 * 10^38 compared with 63^10 = 9.8 * 10^17, so you may have to make some compromises on uniqueness. Consider factors like how frequently the IDs will be generated.
However in the RFC, they have considered some practical issues, like the ability to generate large numbers of unique values efficiently by pre-allocating blocks of IDs.

Can't you just have a distributed ID table ?

Machines on NAT'ed LANs will often have an IP from a small range, and not all of the 32-bit values would be valid (think multicast, etc). Machines may also grab the same timestamp, especially if the granularity is large (such as seconds); keep in mind that the year is very often going to be the same, so it's the lower bits that will give you the most 'uniqueness'.
You may want to take the various values, hash them with a cryptographic hash, and translate that to the characters you are permitted to use, truncating to the 10 characters.
But you're dealing with a value with less than 60 bits; you need to think carefully about the implications of a collision. You might be approaching the problem the wrong way...

Well, if I cast aside the fact that I think this is a bad idea, and concentrate on a solution to your problem, here's what I would do:
You have an id range of 10^63, which correspond to roughly 60 bits. You want it to be both "globally" and "locally" unique. Let's generate the first N bits to be globally unique, and the rest to be locally unique. The concatenation of the two will have the properties you are looking for.
First, the global uniqueness : IP won't work, especially local ones, they hold very little entropy. I would go with MAC addresses, they were made for being globally unique. They cover a range of 256^6, so using up 6*8 = 48 bits.
Now, for the locally unique : why not use the process ID ? I'm making the assumption that the uniqueness is per process, if it's not, you'll have to think of something else. On Linux, process ID is 32 bits. If we wanted to nitpick, the 2 most significant bytes probably hold very little entropy, as they would at 0 on most machines. So discard them if you know what you're doing.
So now you'll see you have a problem as it would use up to 70 bits to generate a decent (but not bulletproof) globally and locally unique ID (using my technique anyway). And since I would also advise to put in a random number (at least 8 bits long) just in case, it definitely won't fit. So if I were you, I would hash the ~78 generated bits to SHA1 (for example), and convert the first 60 bits of the resulting hash to your ID format. To do so, notice that you have a 63 characters range to chose from, so almost the full range of 6 bits. So split the hash in 6 bits pieces, and use the first 10 pieces to select the 10 characters of your ID from the 63 character range. Obviously, the range of 6 bits is 64 possible values (you only want 63), so if you have a 6 bits piece equals to 63, either floor it to 62 or assume modulo 63 and pick 0. It will slightly bias the distribution, but it's not too bad.
So there, that should get you a decent globally and locally pseudo-unique ID.
A few last points: according to the Birthday paradox, you'll get a ~ 1 % chance of collisions after generating ~ 142 million IDs, and a 99% chance after generating 3 billions IDs. So if you hit great commercial success and have millions of IDs being generated, get a larger ID.
Finally, I think I provided a "better than the worse" solution to your problem, but I can't help but think you're attacking this problem in the wrong fashion, and possibly as other have mentioned, misreading the specs. So use this if there are no other ways that would be more "bulletproof" (centralised ID provider, much longer ID ... ).
Edit: I re-read your question, and you say you call this function possibly many times a second. I was assuming this was to serve as some kind of application ID, generated once at the start of your application, and never changed afterwards until it exited. Since it's not the case, you should definitely add a random number and if you generate a lot of IDs, make that at least a 32 bits number. And read and re-read the Birthday Paradox I linked to above. And seed your number generator to a highly entropic value, like the usec value of the current timestamp for example. Or even go so far as to get your random values from /dev/urandom .
Very honestly, my take on your endeavour is that 60 bits is probably not enough...

Hmm, using the system clock may be a weakness... what if someone sets the clock back? You might re-generate the same ID again. But if you are going to use the clock, you might call gettimeofday() instead of time(); at least that way you'll get better resolution than one second.

#Doug T.
No, I'm not in control of all the software generating the ids.
I agree without a standardized algorithm there maybe collisions, I've raised this issue in the appropriate mailing lists.
#Florian
Taking a cue from you're reply. I decided to use the /dev/urandom PRNG for a 32 bit random number as the space unique component of the id. I assume that every machine will have its own noise signature and it can be assumed to be safely globally unique in space at an instant of time. The time unique component that I used earlier remains the same.
These unique ids are generated to collate all the billing information collected from different network functions that independently generated charging information of a particular call during call processing.
Here's the updated code below:
Gaurav
#include <stdio.h>
#include <string.h>
#include <time.h>
//base-63 character set
static char set[]="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-";
// b63() returns the next vacant location in char array x
int b63(long long longlong, char *x, int index){
if(index > 9)
return index+1;
if(longlong < 63){
x[index] = set[longlong];
return index+1;
}
x[index] = set[longlong%63];
return b63(longlong/63, x, index+1);
}
int main(){
unsigned int number;
char x[11], y[11] = {0};
FILE *urandom = fopen("/dev/urandom", "r");
if(!urandom)
return -1;
//let's generate a 1 billion ids
for(int i=0; i<1000000000; i++){
fread(&number, 1, sizeof(number), urandom);
// add i to timestamp to take care of sub-second function calls,
b63((long long)time(NULL)+i, x, b63((long long)number, x, 0));
// reverse the char array to get proper base-63 output
for(int j=0, k=9; j<10; j++, k--)
y[j] = x[k];
printf("%s\n", y);
}
if(urandom)
fclose(urandom);
return 0;
}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js