Possible to generate 8-byte Unique ID for an entity - c++

I need a unique identifier to distinguish entities, but in practice there are not many of them alive at any one time, and a uid can be reused once its entity is destroyed. Entities are created in a distributed system, and several can be created at the same moment.
I am currently using a popular UUID library, but a UUID is a 128-bit number. For my system an integer type is more than enough; since uids can be recycled, 8 bytes should be plenty. So I think there is a lot of room for optimization.
For example:
bool isEqual(const char *uid1, const char *uid2) {
    return strcmp(uid1, uid2) == 0;
}
If I can make uid an integer instead of a string, then I don't need to use the string comparison function.
bool isEqual(int uid1, int uid2) {
    return uid1 == uid2;
}
But I don't know whether there are mature libraries that meet my needs.
So I want to ask:
How feasible is it to implement this myself?
What difficulties will I encounter?
What should I pay attention to?
Is there a library that already implements something similar?
Is it worth it?
BTW, I can use C/C++/lua.

If you want a custom dedicated uid generation on a fully controlled distributed system, you have 3 possibilities:
A central system simply generates serial values and the other systems ask it for each new uid. Simple and fully deterministic, but the generator is a Single Point Of Failure.
Each (logical) system receives an id and combines it with a local serial number. For example, as long as the number of systems stays below 65536 you could use 16 bits for the system id and 48 bits for the serial (see the sketch after this list). Fully deterministic, but requires an admin to give each system its id.
Random. High-quality random number generators that comply with crypto requirements should give you pseudo-uids with a low probability of collision. But it is only probabilistic, so a collision is still possible.
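A minimal sketch of the second option, assuming each (logical) system has been handed a 16-bit node id by an admin and keeps a 48-bit process-local serial (the class and member names are just illustrative):

#include <atomic>
#include <cstdint>

// Option 2: 16-bit system/node id (assigned by an admin) + 48-bit local serial.
// The atomic takes care of synchronisation between threads of one process.
class UidGenerator {
public:
    explicit UidGenerator(uint16_t node_id) : node_id_(node_id), serial_(0) {}

    uint64_t next() {
        // fetch_add hands each caller a distinct serial even under concurrency;
        // the mask keeps only 48 bits, so the serial wraps after 2^48 allocations.
        uint64_t serial = serial_.fetch_add(1) & 0x0000FFFFFFFFFFFFULL;
        return (static_cast<uint64_t>(node_id_) << 48) | serial;
    }

private:
    uint16_t node_id_;
    std::atomic<uint64_t> serial_;
};

With this layout two uids are equal exactly when their 64-bit values are equal, so the comparison from the question becomes a plain uid1 == uid2.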
Points to pay attention to:
race conditions. If more than one process can be client for a generator, you must ensure that the uid generation is correctly synchronized
uid recycling. If the whole system is designed to live long enough to exhaust a serial generator, you will have to keep somewhere the list of still-existing entities and their uids.
for the probabilistic solution, the risk of collision grows with the square of the maximum number of simultaneous entities (the birthday problem). You should carefully evaluate that probability and decide whether the risk is acceptable (a quick estimate is sketched below).
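For the probabilistic option, the usual birthday approximation gives a quick estimate of the collision risk; a small sketch, assuming random 64-bit ids (the function name is just for illustration):

#include <cmath>
#include <cstdio>

// Birthday approximation for random 64-bit ids:
// p(collision) ~= 1 - exp(-n * (n - 1) / (2 * 2^64))
double collision_probability(double n_entities) {
    const double id_space = std::pow(2.0, 64);
    return 1.0 - std::exp(-n_entities * (n_entities - 1.0) / (2.0 * id_space));
}

int main() {
    std::printf("%g\n", collision_probability(1e6)); // ~2.7e-8 with a million live entities
    std::printf("%g\n", collision_probability(1e9)); // ~2.7e-2 with a billion
}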
Are such solutions already implemented?
Yes, in database systems that allow automatic id generation.
Worth it?
Only you can say...

We have a tiny, secure, unique string ID generator for Python which allows you to reduce the ID length (at the cost of a higher collision probability); you can pass the length as an argument. To use it in a Python environment:
pip install nanoid
from nanoid import generate
generate(size=10) => u'p1yS9T21Bf'
To check how the IDs are generated and their collision probability for a given length, visit https://zelark.github.io/nano-id-cc/
Refer: https://pypi.org/project/nanoid/

Related

How can we generate multiple random numbers in Ethereum?

I want my smart contract to return 7 or 8 UNIQUE random numbers ranging from 1 to 100 upon calling the contract. What can be the best approach to obtain such result?
If you are trying to build roulettes, lotteries, or card games on the Ethereum blockchain, the fact that the blockchain is deterministic imposes certain difficulties on those who choose to write their own pseudo-random number generator (PRNG).
Some Vulnerable Methods Currently Used
If you are using block variables like block.coinbase, block.difficulty, block.timestamp, etc. as the source of entropy, keep in mind that all of these can be manipulated by miners, so they cannot be used as a source of entropy because of the miners' incentive. And since block variables are obviously shared within the same block, an attacker can easily use internal messages to yield the same outcome.
Other methods use the blockhash of the current or some past block, or the blockhash of a past block combined with a private seed; block.blockhash(block.number) is used in these cases. However, at the moment of transaction execution in the EVM the blockhash of the block being created is not yet known, for obvious reasons, and the EVM will always yield zero. If we try it with the blockhash of a previous block, an attacker can write an exploit contract with the same code and call the target contract via an internal message; the "random" numbers for the two contracts will be the same.
Even if we combine the blockhash with a private seed, being transparent in nature, the blockchain must not be used to store secrets in plaintext. It is trivial to extract the value of the private variable pointer from the contract storage and supply it as an argument to an exploit.
Some Areas Worth Exploring
External oracles
Signidice
Commit–reveal approach
With external oracles like Oraclize, smart contracts can request data from web APIs such as currency exchange rates, weather forecasts, stock prices, or random numbers (e.g. from random.org). The key drawback of this approach is that it is centralized. Will the Oraclize daemon tamper with the results? Can we trust random.org?
Instead of Oraclize, we can also use BTCRelay which is a bridge between Ethereum and Bitcoin blockchains. Using BTCRelay, smart contracts in the Ethereum blockchain can request future Bitcoin blockhashes and use them as a source of entropy.
Signidice is an algorithm based on cryptographic signatures that can be used for random number generation in smart contracts involving only two parties: the player and the house. The algorithm works as follows:
The player makes a bet by calling a smart contract.
The house sees the bet, signs it with its private key, and sends the signature to the smart contract.
The smart contract verifies the signature using the known public key.
This signature is then used to generate a random number.
Commit–reveal approach consists of two phases:
A “commit” stage, when the parties submit their cryptographically protected secrets to the smart contract.
A “reveal” stage, when the parties announce cleartext seeds, the smart contract verifies that they are correct, and the seeds are used to generate a random number.
A better implementation of the commit–reveal approach is Randao. Commit–reveal can be combined with future blockhashes to make it more secure.
This pretty much covers all the methods for random number generation using Ethereum.
Like Raghav said, random numbers on the blockchain are hard. The public nature of the network makes it very hard to generate a number that cannot be pre-calculated.
With that said, one of the best solutions is to use an oracle that gets the random number from an external (read: non-blockchain based) source. Take a look at this guide. The Ethtroll Dapp is a good example of this, so take a look at the code here. They use Oraclize to get a random number from Random.org.
An issue with using an oracle is the centralization factor. If you set up your Dapp in the way I have described above, you are at the mercy of a rogue employee at two different centralized services, Oraclize and Random.org. Though it would be unlikely for someone to manipulate either of these sources, people will perform irrational acts for potential economic gain.
Use a Chainlink VRF.
There are a number of issues with using the blockhash or similar as the method of random seeding. If an attacker knows the blockhash before your contract does, they can use that information to gain a malicious advantage on whatever it is you're trying to do. An oracle can help here, but an oracle is a central point of failure and must be able to prove that its numbers are random.
You need to have an oracle network that can:
Prove that the numbers generated are random.
Have enough oracles/nodes that even if one fails/is corrupt, your smart contract will persist.
At this time, the example below shows how to solve #1. You can solve #2 by pulling from a sufficient number of nodes who support the Chainlink VRF.
For an exact implementation, see this answer from a similar question.
You'll want to make a request to a node with a function that takes a seed generated by you:
function rollDice(uint256 userProvidedSeed) public returns (bytes32 requestId) {
    require(LINK.balanceOf(address(this)) > fee, "Not enough LINK - fill contract with faucet");
    uint256 seed = uint256(keccak256(abi.encode(userProvidedSeed, blockhash(block.number)))); // Hash user seed and blockhash
    bytes32 _requestId = requestRandomness(keyHash, fee, seed);
    emit RequestRandomness(_requestId, keyHash, seed);
    return _requestId;
}
And when the value is returned, you'll mod it by 100 and add 1. You'll need to call this 7 or 8 times if you want 7 or 8 random numbers.
function fulfillRandomness(bytes32 requestId, uint256 randomness) external override {
    uint256 d100Result = randomness.mod(100).add(1); // map the random value into the range 1..100
    emit RequestRandomnessFulfilled(requestId, randomness);
}
I have a brainstorming idea; maybe it helps somebody.
It is a simplified commit–reveal approach with just one participant.
A title is needed for every random generation. That title should follow a standard format and be easy to audit.
First I Commit("Alice's Lottery") on the smart contract. If the title is repeated (check the hashes) it will be rejected.
To reveal, you have to wait for at least 1 extra block confirmation, and those 2 blocks should come from different miners to ensure a miner is not attacking this smart contract.
Then you execute Reveal("Alice's Lottery").
The magic happens here: the sources of randomness are the title, msg.sender, the blockhash of the commit block, and block.blockhash(commitBlockNumber+1), because nobody can predict the future hash nor which miner will find it (you could also add coinbase or timestamp to get more randomness).
You could also check whether the timestamps of commitBlockNumber and commitBlockNumber+1 are too close together or too far apart; this could indicate that some miner is trying to force a block, so you could reject that lottery.
And of course if you see suspiciously close transactions with commits like ("Alice's Lottery") || ("AAlice's Lottery"), you can suspect that the lottery is being gamed.
You could also do this with more than 2 "interval" blocks.

Why is it necessary to calculate a hash based on difficulty in a blockchain?

For example, if a certain block requires 5 leading zeros, can't we just write a 32-byte random hash which has 5 leading zeroes, like
00000C66DA510B7C524F7EF33279BCA641E2E8BF94B6B15AC6343CD2B706F673
or
00000C66DA510B7C524F7EF33279BCA641E2E8BF94B6B15AC6343CD2ASDFERTY
Also, if I am designing my own blockchain, will choosing any random number without hashing make a difference to the security of the blockchain?
You can't choose the hash of a blockchain block. Each block has some data (including the hash of the previous block), and then a "nonce", and then a cryptographic hash of the whole thing. One of the essential design properties of a cryptographic hash is that nobody knows how to choose the input to make the output come out a particular value.
What a blockchain miner does all day is compute HASH(prev_hash || new data || 1), HASH(prev_hash || new data || 2), HASH(prev_hash || new data || 3), ... until they reach a number that makes the hash come out with the required number of leading zeroes. That's the work that they are proving they have done. And then they publish the block, with its nonce and hash, and it takes only a trivial amount of work by comparison to verify that HASH(prev_hash || new data || n) has the required number of leading zeroes, so the block is accepted.
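A toy sketch of that loop, using std::hash as a stand-in for a real cryptographic hash (an actual chain would use SHA-256 or similar) and expressing the leading-zero rule as a numeric target:

#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

// Toy proof-of-work: find a nonce such that the (non-cryptographic!) hash of
// prev_hash + data + nonce starts with `difficulty` zero hex digits.
uint64_t mine(const std::string& prev_hash, const std::string& data, int difficulty) {
    const uint64_t target = UINT64_MAX >> (4 * difficulty); // each hex digit is 4 bits
    for (uint64_t nonce = 0;; ++nonce) {
        uint64_t h = std::hash<std::string>{}(prev_hash + data + std::to_string(nonce));
        if (h <= target) return nonce; // hash has the required number of leading zeroes
    }
}

int main() {
    uint64_t nonce = mine("00000c66...", "block payload", 5);
    std::cout << "found nonce " << nonce << "\n"; // verification afterwards takes a single hash
}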
Proof of work is not a mandatory feature of a blockchain, and in fact it would be really nice if we could figure out a good alternative, because proof of work is what makes Bitcoin consume as much electricity as the entire country of Sweden or whatever it's up to these days. A blockchain is just a sequence of blocks of data, chained together with cryptographic hashes and signatures. What proof-of-work does - what including the "hash must have so many leading zeroes" rule does - is make it difficult for someone to control which miner will successfully mine the next block, and that is argued to mean "no one individual can force the distributed system to accept their next block instead of someone else's."
If some of the assertions in the previous paragraph seem a bit, um, shaky to you, you are absolutely correct. That is why cryptocurrency is still a field of active research rather than something everyone uses all the time.

Checksumming large swathes of prime numbers? (for verification)

Are there any clever algorithms for computing high-quality checksums on millions or billions of prime numbers? I.e. with maximum error-detection capability and perhaps segmentable?
Motivation:
Small primes - up to 64 bits in size - can be sieved on demand to the tune of millions per second, by using a small bitmap for sieving potential factors (up to 2^32-1) and a second bitmap for sieving the numbers in the target range.
Algorithm and implementation are reasonably simple and straightforward but the devil is in the details: values tend to push against - or exceed - the limits of builtin integral types everywhere, boundary cases abound (so to speak) and even differences in floating point strictness can cause breakage if programming is not suitably defensive. Not to mention the mayhem that an optimising compiler can wreak, even on already-compiled, already-tested code in a static lib (if link-time code generation is used). Not to mention that faster algorithms tend to be a lot more complicated and thus even more brittle.
This has two consequences: test results are basically meaningless unless the tests are performed using the final executable image, and it becomes highly desirable to verify proper operation at runtime, during normal use.
Checking against pre-computed values would give the highest degree of confidence, but the required files are big and clunky. A text file with 10 million primes is on the order of 100 MB uncompressed and more than 10 MB compressed; storing byte-encoded differences requires one byte per prime, and entropy coding can at best reduce the size to half (5 MB for 10 million primes). Hence even a file that covers only the small factors up to 2^32 would weigh in at about 100 MB, and the complexity of the decoder would exceed that of the windowed sieve itself.
This means that checking against files is not feasible except as a final release check for a newly-built executable. Not to mention that the trustworthy files are not easy to come by. The Prime Pages offer files for the first 50 million primes, and even the amazing primos.mat.br goes only up to 1,000,000,000,000. This is unfortunate since many of the boundary cases (== need for testing) occur between 2^62 and 2^64-1.
This leaves checksumming. That way the space requirements would be marginal, and only proportional to the number of test cases. I don't want to require that a decent checksum like MD5 or SHA-256 be available, and with the target numbers all being prime it should be possible to generate a high-quality, high-resolution checksum with some simple ops on the numbers themselves.
This is what I've come up with so far. The raw digest consists of four 64-bit numbers; at the end it can be folded down to the desired size.
for (unsigned i = 0; i < ELEMENTS(primes); ++i)
{
    digest[0] *= primes[i];              // running product (must be initialised to 1)
    digest[1] += digest[0];              // sum of sequence of running products
    digest[2] += primes[i];              // running sum
    digest[3] += digest[2] * primes[i];  // Hornerish sum
}
At two (non-dependent) muls per prime the speed is decent enough, and except for the simple sum each of the components has always uncovered all errors I tried to sneak past the digest. However, I'm not a mathematician, and empirical testing is not a guarantee of efficacy.
Are there some mathematical properties that can be exploited to design - rather than 'cook' as I did - a sensible, reliable checksum?
Is it possible to design the checksum in a way that makes it steppable, in the sense that subranges can be processed separately and then the results combined with a bit of arithmetic to give the same result as if the whole range had been checksummed in one go? Same thing as all advanced CRC implementations tend to have nowadays, to enable parallel processing.
EDIT The rationale for the current scheme is this: the count, the sum and the product do not depend on the order in which primes are added to the digest; they can be computed on separate blocks and then combined. The checksum does depend on the order; that's its raison d'être. However, it would be nice if the two checksums of two consecutive blocks could be combined somehow to give the checksum of the combined block.
The count and the sum can sometimes be verified against external sources, like certain sequences on oeis.org, or against sources like the batches of 10 million primes at primos.mat.br (the index gives first and last prime, the number == 10 million is implied). No such luck for product and checksum, though.
Before I throw major time and computing horsepower at the computation and verification of digests covering the whole range of small factors up to 2^64 I'd like to hear what the experts think about this...
The scheme I'm currently test-driving in 32-bit and 64-bit variants looks like this:
template<typename word_t>
struct digest_t
{
    word_t count;
    word_t sum;
    word_t product;
    word_t checksum;
    // ...
    void add_prime (word_t n)
    {
        count    += 1;
        sum      += n;
        product  *= n;
        checksum += n * sum + product;
    }
};
This has the advantage that the 32-bit digest components are equal to the lower halves of the corresponding 64-bit values, meaning only 64-bit digests need to be computed and stored even if fast 32-bit verification is desired. A 32-bit version of the digest can be found in this simple sieve test program at pastebin, for hands-on experimentation. The full Monty in a revised, templated version can be found in a newer paste, for a sieve that works up to 2^64-1.
I've done a good bit of work parallelizing operations on Cell architectures. This has a similar feel.
In this case, I would use a hash function that's fast and possibly incremental (e.g. xxHash or MurmurHash3) and a hash list (which is a less flexible specialization of a Merkle Tree).
These hashes are extremely fast. It's going to be surprisingly hard to get better with some simple set of operations. The hash list affords parallelism -- different blocks of the list can be handled by different threads, and then you hash the hashes. You could also use a Merkle Tree, but I suspect that'd just be more complex without much benefit.
Virtually divide your range into aligned blocks -- we'll call these microblocks. (e.g. a microblock is a range such as [n<<15, (n+1)<<15) )
To handle a microblock, compute what you need to compute, add it to a buffer, hash the buffer. (An incremental hash function will afford a smaller buffer. The buffer doesn't have to be filled with the same length of data every time.)
Each microblock hash will be placed in a circular buffer.
Divide the circular buffer into hashable blocks ("macroblocks"). Incrementally hash these macroblocks in the proper order as they become available or if there's no more microblocks left.
The resulting hash is the one you want.
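A minimal single-threaded sketch of this scheme, using 64-bit FNV-1a as a stand-in for xxHash/MurmurHash3; the block layout and function names are illustrative assumptions:

#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over a byte range (stand-in for a faster incremental hash).
uint64_t fnv1a(const void* data, std::size_t len, uint64_t h = 14695981039346656037ULL) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Hash list: hash each microblock of primes, then hash the list of microblock hashes.
uint64_t digest_blocks(const std::vector<std::vector<uint64_t>>& microblocks) {
    std::vector<uint64_t> block_hashes;
    block_hashes.reserve(microblocks.size());
    for (const auto& block : microblocks)
        block_hashes.push_back(fnv1a(block.data(), block.size() * sizeof(uint64_t)));
    // In the parallel version each microblock hash is produced by a worker thread and
    // dropped into the circular buffer; here the hashes are simply consumed in order.
    return fnv1a(block_hashes.data(), block_hashes.size() * sizeof(uint64_t));
}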
Some additional notes:
I recommend a design where threads reserve a range of pending microblocks that the circular buffer has space for, process them, dump the values in the circular buffer, and repeat.
This has the added benefit that you can decide how many threads you want to use on the fly. e.g. when requesting a new range of microblocks, each thread could detect if there's too many/little threads running and adjust.
I personally would have the thread adding the last microblock hash to a macroblock clean up that macroblock. Less parameters to tune this way.
Maintaining a circular buffer isn't as hard as it sounds -- the lowest order macroblock still unhandled defines what portion of the "macroblock space" the circular buffer represents. All you need is a simple counter that increments when appropriate to express this.
Another benefit is that since the threads go through a reserve/work/reserve/work cycle on a regular basis, a thread that is unexpectedly slow won't hinder the running time nearly as badly.
If you're looking to make something less robust but easier, you could forgo a good bit of the work by using a "striped" pattern -- decide on the max number of threads (N), and have each thread handle every N-th microblock (offset by its thread "ID") and hash the resulting macroblocks per thread instead. Then at the end, hash the macroblock hashes from the N threads. If you have fewer than N threads, you can divide the work up amongst the number of threads you do want. (e.g. 64 max threads, but three real threads, thread 0 handles 21 virtual threads, thread 1 handles 21 virtual threads, and thread 2 handles 22 virtual threads -- not ideal, but not terrible) This is essentially a shallow Merkle tree instead of a hash list.
Kaganar's excellent answer demonstrates how to make things work even if the digests for adjacent blocks cannot be combined mathematically to give the same result as if the combined block had been digested instead.
The only drawback of his solution is that the resulting block structure is by necessity rather rigid, rather like PKI with its official all-encompassing hierarchy of certifications vs. 'guerrilla style' PGP whose web of trust covers only the few subjects who are of interest. In other words, it requires devising a global addressing structure/hierarchy.
This is the digest in its current form; the change is that the order-dependent part has been simplified to its essential minimum:
void add_prime (word_t n)
{
    count    += 1;
    sum      += n;
    product  *= n;
    checksum += n * count;
}
Here are the lessons learnt from practical work with that digest:
count, sum and product (i.e. partial primorial modulo word size) turned out to be exceedingly useful because of the fact that they relate to things also found elsewhere in the world, like certain lists at OEIS
count and sum were very useful because the first tends to be naturally available when manipulating (generating, using, comparing) batches of primes, and the sum is easily computed on the fly with zero effort; this allows partial verification against existing results without going the whole hog of instantiating and updating a digest, and without the overhead of two - comparatively slow - multiplications
count is also exceedingly useful as it must by necessity be part of any indexing superstructure built on systems of digests, and conversely it can guide the search straight to the block (range) containing the nth prime, or to the blocks overlapped by the nth through (n+k)th primes
the order dependency of the fourth component (checksum) turned out to be less of a hindrance than anticipated, since small primes tend to 'occur' (be generated or used) in order, in situations where verification might be desired
the order dependency of the checksum - and lack of combinability - made it perfectly useless outside of the specific block for which it was generated
fixed-size auxiliary program structures - like the ubiquitous small factor bitmaps - are best verified as raw memory for startup self-checks, instead of running a primes digest on them; this drastically reduces complexity and speeds things up by several orders of magnitude
For many practical purposes the order-dependent checksum could simply be dropped, leaving you with a three-component digest that is trivially combinable for adjacent ranges.
For verification of fixed ranges (like in self-tests) the checksum component is still useful. Any other kind of checksum - the moral equivalent of a CRC - would be just as useful for that and probably faster. It would be even more useful if an order-independent (combinable) way of supplementing the resolution of the first three components could be found. Extending the resolution beyond the first three components is most relevant for bigger computing efforts, like sieving, verifying and digesting trillions of primes for posterity.
One such candidate for an order-independent, combinable fourth component is the sum of squares.
Overall the digest turned out to be quite useful as is, despite the drawbacks concerning the checksum component. The best way of looking at the digest is probably as consisting of a 'characteristic' part (the first three components, combinable) and a checksum part that is only relevant for the specific block. The latter could just as well be replaced with a hash of any desired resolution. Kaganar's solution indicates how this checksum/hash can be integrated into a system that extends beyond a single block, despite its inherent non-combinability.
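A sketch of what the combinable part could look like, with the sum of squares as the hypothetical order-independent fourth component; everything is taken modulo the word size, so combining the digests of two ranges is plain component-wise arithmetic:

#include <cstdint>

// Order-independent, combinable digest: the digest of a concatenation of two
// ranges equals the component-wise merge of the two range digests.
struct combinable_digest_t {
    uint64_t count   = 0;
    uint64_t sum     = 0;
    uint64_t product = 1;
    uint64_t sum_sq  = 0; // hypothetical fourth component: sum of squares

    void add_prime(uint64_t n) {
        count   += 1;
        sum     += n;
        product *= n;
        sum_sq  += n * n;
    }

    void combine(const combinable_digest_t& other) {
        count   += other.count;
        sum     += other.sum;
        product *= other.product;
        sum_sq  += other.sum_sq;
    }
};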
The summary of prime number sources seems to have fallen by the wayside, so here it is:
up to 1,000,000,000,000 available as files from sites like primos.mat.br
up to nearly 2^64 in super-fast bulk via the primesieve.org console program (pipe)
up to 2^64-1 - and beyond - via the gp/PARI program (pipe, about 1 million primes/minute)
I'm answering this question again in a second answer since this is a very different and hopefully better tack:
It occurred to me that what you're doing is basically looking for a checksum, not over a list of primes, but over a range of a bitfield where a number is prime (bit is set to 1) or it's not (bit is set to 0). You're going to have a lot more 0's than 1's for any interesting range, so you hopefully only have to do an operation for the 1's.
Typically the problem with using a trivial in-any-order hash is that they handle multiplicity poorly and are oblivious to order. But you don't care about either of these problems -- every bit can only be set or unset once.
From that point of view, a bitwise-exclusive-or or addition should be just fine if combined with a good hashing function of the index of the bit -- that is, the found prime. (If your primes are 64-bit you could go with some of the functions here.)
So, for the ultimate simplicity that will give you the same value for any set of ranges of inputs, yes, stick to hashing and combining it with a simple operation like you are. But change to a traditional hash function which appears "random" given its input -- hash64shift on the linked page is likely what you're looking for. The probability of a meaningful collision is remote. Most hash functions stink, however -- make sure you pick one that is known to have good properties. (Avalanches well, etc.) Thomas Wang's are usually not so bad. (Bob Jenkins's are fantastic, but he sticks mostly to 32-bit functions. His mix function on the linked page is very good, though probably overkill.)
Parallelizing the check is obviously trivial, the code size and effort is vastly reduced from my other answer, and there's much less synchronization and almost no buffering that needs to occur.
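A sketch of that idea; the splitmix64 finalizer stands in here for hash64shift (any well-avalanching 64-bit integer hash will do), and XOR makes the digest order-independent, so subranges can be digested in parallel and merged with a single XOR:

#include <cstdint>
#include <vector>

// splitmix64 finalizer: a well-mixing 64-bit integer hash (stand-in for hash64shift).
uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// XOR of hashed primes: order-independent and trivially combinable.
uint64_t digest_primes(const std::vector<uint64_t>& primes) {
    uint64_t d = 0;
    for (uint64_t p : primes)
        d ^= mix64(p);
    return d;
}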

How do I convert a directory path to a unique numerical identifier (Linux/C++)?

I am investigating ways to take a directory (folder) and derive some form of unique numerical identifier. I have investigated "string to hash" methods; however, the pigeonhole principle means that one can never derive a truly unique number for every possible string.
String to unique hash is no good.
I have recently been investigating other means of achieving my goal and thus have the following question to ask:
Directory time stamps - how 'unique' are they?
To what resolution are the time stamps reported by 'stat' as described here (second post)? If the resolution is small enough, is it possible for more than one folder to share the exact same time stamp on a Linux system?
If anyone has other methods/techniques they'd like to share, I'd be happy to listen :)
Edit 1: To clarify my use case in response to the answers posted so far: I am working on Android platforms, so the filesystem is not linked to any other (except of course for removable media such as Micro SD cards).
I am inserting each path into a database but trying to avoid string comparisons when querying the table. The use of maps/hashmaps is not an option here. Yes, the path itself is unique, but ideally I need a numerical identifier that can be used to query the table as opposed to the path itself. The identifier must also be unique per path. I have experimented with std::collate but found there were many collisions in the hashes (a dataset of 20,000 paths yields approximately 100 collisions). What was even more surprising is that the hashes appeared to be largely different each time my application is run. I wonder if it's seeded somehow?
Many thanks,
P
On any UNIX-based system, you can use the inode number as a unique identifier within that file system. Combining it with the device number will make it unique within the machine. If you wanted it to be globally unique, you could throw in the system's primary MAC address.
Keep in mind, however, that:
The inode number will "follow" the directory if it is moved or renamed. It will change if the directory is deleted and replaced.
The inode number will not be stable across systems, beyond one or two really special directories. (For instance, / is usually inode 2.)
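A minimal sketch of that approach using stat(2); packing st_dev and st_ino into one 64-bit value as shown is an illustrative assumption (either field can be wider than 32 bits, in which case you would keep the pair as two separate numbers):

#include <sys/stat.h>
#include <cstdint>
#include <cstdio>

// Identify a directory by its (device, inode) pair.
bool dir_id(const char* path, uint64_t* out) {
    struct stat st;
    if (stat(path, &st) != 0)
        return false;
    // Illustrative packing: device number in the high 32 bits, inode in the low 32.
    *out = (static_cast<uint64_t>(st.st_dev) << 32) | (st.st_ino & 0xFFFFFFFFULL);
    return true;
}

int main() {
    uint64_t id;
    if (dir_id("/tmp", &id))
        std::printf("id = %llu\n", static_cast<unsigned long long>(id));
}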
+1 duskwuff, good one!
Another way is to simply treat the dir's path as a number ("BigInt").
Take this dir for example: /opt/www/log
It's 12 chars long.
12 * 8 bits = 96 bits
Thus you have a 96-bit number, which you can represent in hex/base64/anything (in case you need to pass it as an HTML link).
I'd personally go for duskwuff's approach though.
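A tiny sketch of that encoding, simply rendering the path's bytes as one big hex number:

#include <cstdio>
#include <string>

// Treat the path's bytes as one big number and render it in hex
// ("/opt/www/log" -> 12 bytes -> a 96-bit value).
std::string path_to_hex(const std::string& path) {
    std::string hex;
    char buf[3];
    for (unsigned char c : path) {
        std::snprintf(buf, sizeof buf, "%02x", c);
        hex += buf;
    }
    return hex;
}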
I think it depends very much on the purpose of why you want a unique numerical identifier. Timestamps can change, inodes can change, disknumbers can change, MAC adresses can change. (Still, +1 for duskwuff)
In some scenarios you can simply make a table, where each path you add gets a new, unique number, just like a numerical key column in a database.
Although hashes can collide, in any realistic environment this is extremely unlikely (provided you don't use the lousiest algorithm around...). It is much more likely that you get errors due to flaws in your implementation, for example treating "/tmp" differently from "/tmp/" because you don't normalize the paths before hashing them. Or you want to distinguish physical folders but forget to check for hardlinks and symlinks to the same folder, and thus get multiple hashes/ids for the same dir.
Again, depending on your use case, a collision is not necessarily fatal: if you find that a new path results in the same hash as an existing one (will not happen!) you can still react to that case. (*)
Just to help your imagination: if you use a 64-bit hash, you could fill 150 000 000 1 TB hard disk drives with empty folders (nothing on them but short folder names...) and then you will have a collision for sure. If you think that's too risky (blink, blink), use a 128-bit hash, which makes it about 18 446 744 073 709 551 616 (2^64) times less likely.
Hashes are designed to make collisions unlikely, and even good old MD5 will do its job well if nobody willingly tries to produce a collision.
(*) Edit:
Your pigeonhole article already points that out: a collision just means that lookup is no longer O(1), but slightly slower. As it rarely ever happens, you can easily live with that. If you use a std::map (no hash) or std::unordered_map, you won't have to worry about collisions. See the difference between map and hashmap in the STL.
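A sketch of the table idea from above: normalise the path first, then hand out a sequential id per new path, so there are no collisions at all (the class name is illustrative):

#include <cstdint>
#include <string>
#include <unordered_map>

// Assign each normalised path a new sequential id; known paths get their existing id.
class PathIdTable {
public:
    uint64_t id_for(const std::string& normalised_path) {
        auto it = ids_.find(normalised_path);
        if (it != ids_.end())
            return it->second;
        uint64_t id = next_id_++;
        ids_.emplace(normalised_path, id);
        return id;
    }
private:
    std::unordered_map<std::string, uint64_t> ids_;
    uint64_t next_id_ = 1;
};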

Algorithm for generating a unique ID in C++?

What can be the best algorithm to generate a unique id in C++?
The ID should be a 32-bit unsigned integer.
Getting a unique 32-bit ID is intuitively simple: the next one. Works 4 billion times. Unique for 136 years if you need one a second. The devil is in the detail: what was the previous one? You need a reliable way to persist the last used value and an atomic way to update it.
How hard that will be depends on the scope of the ID. If it is one thread in one process then you only need a file. If it is multiple threads in one process then you need a file and a mutex. If it is multiple processes on one machine then you need a file and a named mutex. If it is multiple processes on multiple machines then you need to assign an authoritative ID provider, a single server that all machines talk to. A database engine is a common provider like that; they have this built in as a feature, an auto-increment column.
The expense of getting the ID goes up progressively as the scope widens. When it becomes impractical (the scope is the Internet, or the provider is too slow or unavailable), you need to give up on a 32-bit value and switch to a random value. One that's random enough to make the machine being struck by a meteor at least a million times more likely than repeating the same ID. A goo-ID. It is only 4 times as large.
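A minimal sketch of the file-plus-mutex case (one process, multiple threads); the state file name and the error handling are simplified assumptions, and a real implementation would also flush/fsync before handing out the id:

#include <cstdint>
#include <fstream>
#include <mutex>
#include <string>

// Persist the last used value in a file; the mutex serialises threads in this process.
class PersistentIdGenerator {
public:
    explicit PersistentIdGenerator(const std::string& path) : path_(path), last_(0) {
        std::ifstream in(path_);
        in >> last_; // start from 0 if no state file exists yet
    }
    uint32_t next() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++last_;
        std::ofstream out(path_, std::ios::trunc);
        out << last_; // persist before returning the new id
        return last_;
    }
private:
    std::string path_;
    uint32_t last_;
    std::mutex mutex_;
};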
Here's the simplest ID I can think of.
#include <cstdint> // for uintptr_t
MyObject obj;
uint32_t id = static_cast<uint32_t>(reinterpret_cast<uintptr_t>(&obj)); // truncates the address on 64-bit platforms
At any given time, this ID will be unique across the application. No other object will be located at the same address. Of course, if you restart the application, the object may be assigned a new ID. And once the object's lifetime ends, another object may be assigned the same ID.
And objects in different memory spaces (say, on different computers) may be assigned identical IDs.
And last but not least, if the pointer size is larger than 32 bits, the mapping will not be unique.
But since we know nothing about what kind of ID you want, and how unique it should be, this seems as good an answer as any.
You can see this (the complete answer, I think, is on Stack Overflow).
There are some notes on unique ids in C++ on Linux on this site. You can also use uuid on Linux; see this man page and the sample that goes with it.
If you use Windows and need the Windows APIs, see this MSDN page.
This Wikipedia page is also useful: http://en.wikipedia.org/wiki/Universally_Unique_Identifier.
DWORD uid = ::GetTickCount(); // milliseconds since system start -- wraps after ~49.7 days and is not guaranteed unique
::Sleep(100);                 // spacing calls 100 ms apart only keeps consecutive ids distinct within one process
If you can afford to use Boost, then there is a UUID library that should do the trick. It's very straightforward to use - check the documentation and this answer.
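A minimal usage sketch with Boost.UUID:

#include <boost/uuid/uuid.hpp>
#include <boost/uuid/uuid_generators.hpp>
#include <boost/uuid/uuid_io.hpp>
#include <iostream>

int main() {
    boost::uuids::random_generator gen; // generates random (version 4) UUIDs
    boost::uuids::uuid id = gen();
    std::cout << boost::uuids::to_string(id) << "\n";
}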
There is little context, but if you are looking for a unique ID for objects within your application you can always use a singleton approach similar to
class IDGenerator {
public:
    static IDGenerator * instance ();
    uint32_t next () { return _id++; }
private:
    IDGenerator () : _id(0) {}
    static IDGenerator * only_copy;
    uint32_t _id;
};

IDGenerator * IDGenerator::only_copy = nullptr; // the static member needs an out-of-class definition

IDGenerator *
IDGenerator::instance () {
    if (!only_copy) {
        only_copy = new IDGenerator(); // note: not thread-safe; guard with a mutex if needed
    }
    return only_copy;
}
And now you can get a unique ID at any time by doing:
IDGenerator::instance()->next ()