Following are the things that I already know about these:
Hamming codes can be used both to detect and to correct errors, while a CRC can only detect them.
CRC is used in communication, while Hamming code is used to detect errors in memory disks.
My main question is what is the advantage of using one code over another?
Hamming codes are used where a fixed length of data is to be checked and corrected, while a CRC works for any length of data.
It will depend on your application, but as you have pointed out, the two are
quite different. The main difference I would consider is the amount of
overhead needed for each.
For a simple (7,4) Hamming code, you are adding 75% overhead to your data
(3 parity bits for every 4 data bits) in order to get the ability to correct
one single-bit error for every 4 data bits.
So if you were sending or storing a 1000-byte message, you would really have
to send/store 1750 bytes. That's a lot of overhead!
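To make the overhead concrete, here is a minimal sketch of a (7,4) Hamming encoder and single-error-correcting decoder. The function names and the bit layout (parity bits at positions 1, 2 and 4) are my own choices for illustration, not something taken from the question.

```cpp
#include <cstddef>
#include <cstdint>

// Encode the low 4 bits of `data` into a 7-bit codeword.
// Positions are numbered 1..7; position k is stored in bit (k - 1).
// Positions 1, 2 and 4 hold parity; positions 3, 5, 6 and 7 hold d0..d3.
inline std::uint8_t hamming74_encode(std::uint8_t data)
{
    const unsigned d0 = (data >> 0) & 1u;
    const unsigned d1 = (data >> 1) & 1u;
    const unsigned d2 = (data >> 2) & 1u;
    const unsigned d3 = (data >> 3) & 1u;

    const unsigned p1 = d0 ^ d1 ^ d3;  // covers positions 1, 3, 5, 7
    const unsigned p2 = d0 ^ d2 ^ d3;  // covers positions 2, 3, 6, 7
    const unsigned p4 = d1 ^ d2 ^ d3;  // covers positions 4, 5, 6, 7

    return static_cast<std::uint8_t>(
        (p1 << 0) | (p2 << 1) | (d0 << 2) | (p4 << 3) |
        (d1 << 4) | (d2 << 5) | (d3 << 6));
}

// Decode a 7-bit codeword, correcting at most one flipped bit.
inline std::uint8_t hamming74_decode(std::uint8_t code)
{
    // XOR of the positions of all set bits: 0 for a valid codeword,
    // otherwise the position of the single flipped bit.
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos <= 7; ++pos)
        if ((code >> (pos - 1)) & 1u)
            syndrome ^= pos;

    if (syndrome != 0)
        code = static_cast<std::uint8_t>(code ^ (1u << (syndrome - 1)));

    return static_cast<std::uint8_t>(
        ((code >> 2) & 1u) | (((code >> 4) & 1u) << 1) |
        (((code >> 5) & 1u) << 2) | (((code >> 6) & 1u) << 3));
}

// Bytes needed to store `data_bytes` of payload as (7,4) codewords.
inline std::size_t hamming74_encoded_bytes(std::size_t data_bytes)
{
    const std::size_t bits = data_bytes * 2 * 7;  // 2 codewords per byte
    return (bits + 7) / 8;
}
```

Flipping any single bit of a codeword still decodes to the original 4 data bits, and hamming74_encoded_bytes(1000) gives the 1750 bytes mentioned above.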
For a CRC, you are accumulating a single result over a large amount of data in
order to detect if there is an error somewhere in the data. You do not need
to tell exactly where it is, just that something is wrong. For that, you
could accumulate a 32-bit CRC over your message and do pretty well.
So for our 1000 byte message, you would really be sending/storing 1004 bytes.
That is very efficient if all you need is detection of a problem.
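For comparison, a bit-at-a-time CRC-32 fits in a few lines. This sketch assumes the common reflected polynomial 0xEDB88320 (the one used by zlib and Ethernet); the function name is made up, and a real implementation would normally use a table-driven or hardware-assisted version.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 over a buffer (reflected polynomial 0xEDB88320,
// initial value and final XOR of 0xFFFFFFFF).
inline std::uint32_t crc32_bitwise(const unsigned char* data, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift out one bit; XOR in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The sender appends the 4-byte result to the message; the receiver recomputes it over the received data and compares. A mismatch says that something is wrong, but not where.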
I'm getting this warning and an error afterwards when I try to parse a large message. I know that it is larger than 64MB, which is the default limit. I am using message.ParseFromIstream now. Does anyone know how to get access to the CodedInputStream object to call the SetTotalBytesLimit function, or any other way to solve this problem?
Reading dangerously large protocol message. If the message turns out
to be larger than 67108864 bytes, parsing will be halted for security
reasons. To increase the limit (or to disable these warnings), see
CodedInputStream::SetTotalBytesLimit() in
google/protobuf/io/coded_stream.h.
The correct fix: You should try to limit the sizes of your protobuf messages. Please see:
https://developers.google.com/protocol-buffers/docs/techniques#streaming
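The idea behind the streaming technique is to write many small records, each preceded by its size, instead of one huge message. Below is a rough sketch of such framing in plain C++. It does not use the protobuf API at all: the std::string payloads merely stand in for individually serialized small messages, and the 4-byte little-endian length prefix, the function names and the kMaxRecordBytes cap are all assumptions made for the example.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-record cap, so a corrupt length cannot trigger a huge allocation.
const std::uint32_t kMaxRecordBytes = 64u * 1024u * 1024u;

// Write one record: 4-byte little-endian length, then the payload bytes.
inline void write_framed(std::ostream& out, const std::string& payload)
{
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    const char header[4] = {
        static_cast<char>(n & 0xFF), static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF), static_cast<char>((n >> 24) & 0xFF)};
    out.write(header, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Read the next record; returns false at end of stream, on a short read,
// or if the declared length exceeds the cap.
inline bool read_framed(std::istream& in, std::string& payload)
{
    unsigned char header[4];
    if (!in.read(reinterpret_cast<char*>(header), 4))
        return false;
    const std::uint32_t n = header[0] | (header[1] << 8) | (header[2] << 16) |
                            (static_cast<std::uint32_t>(header[3]) << 24);
    if (n > kMaxRecordBytes)
        return false;
    payload.resize(n);
    return n == 0 || static_cast<bool>(in.read(&payload[0], n));
}

// Convenience helpers for an in-memory round trip.
inline std::string frame_all(const std::vector<std::string>& records)
{
    std::ostringstream out;
    for (const std::string& r : records)
        write_framed(out, r);
    return out.str();
}

inline std::vector<std::string> unframe_all(const std::string& bytes)
{
    std::istringstream in(bytes);
    std::vector<std::string> records;
    std::string payload;
    while (read_framed(in, payload))
        records.push_back(payload);
    return records;
}
```

With this layout the reader only ever holds one small record in memory at a time, so no single parse comes anywhere near the 64MB limit.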
The quick and dirty (read not recommended) approach:
In the file coded_stream.h of the protobuf library source, change the values of kDefaultTotalBytesLimit and kDefaultTotalBytesWarningThreshold, recompile, and reinstall.
Just reading the documentation of the function that the error already told you about would have answered that question:
Hint: If you are reading this because your program is printing a
warning about dangerously large protocol messages, you may be confused
about what to do next. The best option is to change your design such
that excessively large messages are not necessary. For example, try to
design file formats to consist of many small messages rather than a
single large one. If this is infeasible, you will need to increase the
limit. Chances are, though, that your code never constructs a
CodedInputStream on which the limit can be set. You probably parse
messages by calling things like Message::ParseFromString(). In this
case, you will need to change your code to instead construct some sort
of ZeroCopyInputStream (e.g. an ArrayInputStream), construct a
CodedInputStream around that, then call
Message::ParseFromCodedStream() instead. Then you can adjust the
limit. Yes, it's more work, but you're doing something unusual.
Source
Also it's probably a really good idea to follow the first part of the advice and redesign the application.
Here's a comment from the code (google/protobuf/io/coded_stream.h) that sets the message limit, for those who are wondering what security reason they are talking about. In my case I cannot modify how my application works, so I have to change this limit.
This thread is quite old, but recently deep learning has gotten attention and the library Caffe uses Protobuf, so maybe more people will stumble upon this. I have to do neural network stuff with Caffe, and the whole network took so much memory even with the smallest batch size.
// Total Bytes Limit -----------------------------------------------
// To prevent malicious users from sending excessively large messages
// and causing integer overflows or memory exhaustion, CodedInputStream
// imposes a hard limit on the total number of bytes it will read.
// Sets the maximum number of bytes that this CodedInputStream will read
// before refusing to continue. To prevent integer overflows in the
// protocol buffers implementation, as well as to prevent servers from
// allocating enormous amounts of memory to hold parsed messages, the
// maximum message length should be limited to the shortest length that
// will not harm usability. The theoretical shortest message that could
// cause integer overflows is 512MB. The default limit is 64MB. Apps
// should set shorter limits if possible. If warning_threshold is not -1,
// a warning will be printed to stderr after warning_threshold bytes are
// read. For backwards compatibility all negative values get squashed to -1,
// as other negative values might have special internal meanings.
// An error will always be printed to stderr if the limit is reached.
//
// This is unrelated to PushLimit()/PopLimit().
//
// Hint: If you are reading this because your program is printing a
// warning about dangerously large protocol messages, you may be
// confused about what to do next. The best option is to change your
// design such that excessively large messages are not necessary.
// For example, try to design file formats to consist of many small
// messages rather than a single large one. If this is infeasible,
// you will need to increase the limit. Chances are, though, that
// your code never constructs a CodedInputStream on which the limit
// can be set. You probably parse messages by calling things like
// Message::ParseFromString(). In this case, you will need to change
// your code to instead construct some sort of ZeroCopyInputStream
// (e.g. an ArrayInputStream), construct a CodedInputStream around
// that, then call Message::ParseFromCodedStream() instead. Then
// you can adjust the limit. Yes, it's more work, but you're doing
// something unusual.
Are there any clever algorithms for computing high-quality checksums on millions or billions of prime numbers? I.e. with maximum error-detection capability and perhaps segmentable?
Motivation:
Small primes - up to 64 bits in size - can be sieved on demand to the tune of millions per second, by using a small bitmap for sieving potential factors (up to 2^32-1) and a second bitmap for sieving the numbers in the target range.
Algorithm and implementation are reasonably simple and straightforward but the devil is in the details: values tend to push against - or exceed - the limits of builtin integral types everywhere, boundary cases abound (so to speak) and even differences in floating point strictness can cause breakage if programming is not suitably defensive. Not to mention the mayhem that an optimising compiler can wreak, even on already-compiled, already-tested code in a static lib (if link-time code generation is used). Not to mention that faster algorithms tend to be a lot more complicated and thus even more brittle.
This has two consequences: test results are basically meaningless unless the tests are performed using the final executable image, and it becomes highly desirable to verify proper operation at runtime, during normal use.
Checking against pre-computed values would give the highest degree of confidence but the required files are big and clunky. A text file with 10 million primes takes on the order of 100 MB uncompressed and more than 10 MB compressed; storing byte-encoded differences requires one byte per prime and entropy coding can at best reduce the size to half (5 MB for 10 million primes). Hence even a file that covers only the small factors up to 2^32 would weigh in at about 100 MB, and the complexity of the decoder would exceed that of the windowed sieve itself.
This means that checking against files is not feasible except as a final release check for a newly-built executable. Not to mention that the trustworthy files are not easy to come by. The Prime Pages offer files for the first 50 million primes, and even the amazing primos.mat.br goes only up to 1,000,000,000,000. This is unfortunate since many of the boundary cases (== need for testing) occur between 2^62 and 2^64-1.
This leaves checksumming. That way the space requirements would be marginal, and only proportional to the number of test cases. I don't want to require that a decent checksum like MD5 or SHA-256 be available, and with the target numbers all being prime it should be possible to generate a high-quality, high-resolution checksum with some simple ops on the numbers themselves.
This is what I've come up with so far. The raw digest consists of four 64-bit numbers; at the end it can be folded down to the desired size.
for (unsigned i = 0; i < ELEMENTS(primes); ++i)
{
    digest[0] *= primes[i];              // running product (must be initialised to 1)
    digest[1] += digest[0];              // sum of sequence of running products
    digest[2] += primes[i];              // running sum
    digest[3] += digest[2] * primes[i];  // Hornerish sum
}
At two (non-dependent) muls per prime the speed is decent enough, and except for the simple sum each of the components has always uncovered all errors I tried to sneak past the digest. However, I'm not a mathematician, and empirical testing is not a guarantee of efficacy.
Are there some mathematical properties that can be exploited to design - rather than 'cook' as I did - a sensible, reliable checksum?
Is it possible to design the checksum in a way that makes it steppable, in the sense that subranges can be processed separately and then the results combined with a bit of arithmetic to give the same result as if the whole range had been checksummed in one go? Same thing as all advanced CRC implementations tend to have nowadays, to enable parallel processing.
EDIT The rationale for the current scheme is this: the count, the sum and the product do not depend on the order in which primes are added to the digest; they can be computed on separate blocks and then combined. The checksum does depend on the order; that's its raison d'être. However, it would be nice if the two checksums of two consecutive blocks could be combined somehow to give the checksum of the combined block.
The count and the sum can sometimes be verified against external sources, like certain sequences on oeis.org, or against sources like the batches of 10 million primes at primos.mat.br (the index gives first and last prime, the number == 10 million is implied). No such luck for product and checksum, though.
Before I throw major time and computing horsepower at the computation and verification of digests covering the whole range of small factors up to 2^64 I'd like to hear what the experts think about this...
The scheme I'm currently test-driving in 32-bit and 64-bit variants looks like this:
template<typename word_t>
struct digest_t
{
    word_t count;
    word_t sum;
    word_t product;
    word_t checksum;
    // ...
    void add_prime (word_t n)
    {
        count    += 1;
        sum      += n;
        product  *= n;
        checksum += n * sum + product;
    }
};
This has the advantage that the 32-bit digest components are equal to the lower halves of the corresponding 64-bit values, meaning only 64-bit digests need to be computed and stored even if fast 32-bit verification is desired. A 32-bit version of the digest can be found in this simple sieve test program on pastebin, for hands-on experimentation. The full Monty in a revised, templated version can be found in a newer paste for a sieve that works up to 2^64-1.
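The "lower halves" property follows from all operations being plain addition and multiplication modulo 2^32 or 2^64. Here is a self-contained sketch for checking it. The default member initialisers are my assumption for the part elided with "// ..." above (product starting at 1, everything else at 0), and the sketch is only meant for 32-bit and 64-bit unsigned word types.

```cpp
#include <cstdint>

template <typename word_t>
struct digest_t
{
    word_t count = 0;
    word_t sum = 0;
    word_t product = 1;   // multiplicative identity (assumed initial value)
    word_t checksum = 0;

    void add_prime(word_t n)
    {
        count += 1;
        sum += n;
        product *= n;                   // wraps modulo 2^bits
        checksum += n * sum + product;  // order-dependent component
    }
};

// Feed the same primes into a 64-bit and a 32-bit digest and report
// whether every 32-bit component equals the low half of the 64-bit one.
inline bool low_halves_match(const std::uint64_t* primes, unsigned n)
{
    digest_t<std::uint64_t> d64;
    digest_t<std::uint32_t> d32;
    for (unsigned i = 0; i < n; ++i) {
        d64.add_prime(primes[i]);
        d32.add_prime(static_cast<std::uint32_t>(primes[i]));
    }
    return d32.count == static_cast<std::uint32_t>(d64.count)
        && d32.sum == static_cast<std::uint32_t>(d64.sum)
        && d32.product == static_cast<std::uint32_t>(d64.product)
        && d32.checksum == static_cast<std::uint32_t>(d64.checksum);
}
```

The check also holds for primes above 2^32, because truncating the input and then computing modulo 2^32 gives the same result as computing modulo 2^64 and truncating afterwards.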
I've done a good bit of work parallelizing operations on Cell architectures. This has a similar feel.
In this case, I would use a hash function that's fast and possibly incremental (e.g. xxHash or MurmurHash3) and a hash list (which is a less flexible specialization of a Merkle Tree).
These hashes are extremely fast. It's going to be surprisingly hard to get better with some simple set of operations. The hash list affords parallelism -- different blocks of the list can be handled by different threads, and then you hash the hashes. You could also use a Merkle Tree, but I suspect that'd just be more complex without much benefit.
Virtually divide your range into aligned blocks -- we'll call these microblocks. (e.g. a microblock is a range such as [n<<15, (n+1)<<15) )
To handle a microblock, compute what you need to compute, add it to a buffer, hash the buffer. (An incremental hash function will afford a smaller buffer. The buffer doesn't have to be filled with the same length of data every time.)
Each microblock hash will be placed in a circular buffer.
Divide the circular buffer into hashable blocks ("macroblocks"). Incrementally hash these macroblocks in the proper order as they become available or if there's no more microblocks left.
The resulting hash is the one you want.
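A stripped-down, single-threaded sketch of the hash-list idea follows. It uses FNV-1a purely as a simple stand-in for xxHash or MurmurHash3, collapses the macroblock level into one final hash, and leaves out the circular buffer and threading entirely. The names are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a, 64-bit. A stand-in for a faster hash such as xxHash.
inline std::uint64_t fnv1a64(const void* data, std::size_t len)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// Hash every microblock on its own, then hash the list of microblock hashes.
// Each microblock holds the primes found in one aligned range.
inline std::uint64_t hash_list(
    const std::vector<std::vector<std::uint64_t>>& microblocks)
{
    std::vector<std::uint64_t> hashes;
    hashes.reserve(microblocks.size());
    for (const auto& block : microblocks) {
        // These iterations are independent, so they could run on separate threads.
        hashes.push_back(
            fnv1a64(block.data(), block.size() * sizeof(std::uint64_t)));
    }
    // The final pass must see the microblock hashes in range order.
    return fnv1a64(hashes.data(), hashes.size() * sizeof(std::uint64_t));
}
```

Note that hashing raw memory like this makes the result depend on the byte order of the machine; a portable version would serialise the values explicitly.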
Some additional notes:
I recommend a design where threads reserve a range of pending microblocks that the circular buffer has space for, process them, dump the values in the circular buffer, and repeat.
This has the added benefit that you can decide how many threads you want to use on the fly. e.g. when requesting a new range of microblocks, each thread could detect if there are too many/few threads running and adjust.
I personally would have the thread adding the last microblock hash to a macroblock clean up that macroblock. Fewer parameters to tune this way.
Maintaining a circular buffer isn't as hard as it sounds -- the lowest order macroblock still unhandled defines what portion of the "macroblock space" the circular buffer represents. All you need is a simple counter that increments when appropriate to express this.
Another benefit is that since the threads go through a reserve/work/reserve/work cycle on a regular basis, a thread that is unexpectedly slow won't hinder the running time nearly as badly.
If you're looking to make something less robust but easier, you could forgo a good bit of the work by using a "striped" pattern -- decide on the max number of threads (N), and have each thread handle every N-th microblock (offset by its thread "ID") and hash the resulting macroblocks per thread instead. Then at the end, hash the macroblock hashes from the N threads. If you have fewer than N threads, you can divide the work up amongst the number of threads you do want. (e.g. 64 max threads, but three real threads, thread 0 handles 21 virtual threads, thread 1 handles 21 virtual threads, and thread 2 handles 22 virtual threads -- not ideal, but not terrible) This is essentially a shallow Merkle tree instead of a hash list.
Kaganar's excellent answer demonstrates how to make things work even if the digests for adjacent blocks cannot be combined mathematically to give the same result as if the combined block had been digested instead.
The only drawback of his solution is that the resulting block structure is by necessity rather rigid, rather like PKI with its official all-encompassing hierarchy of certifications vs. 'guerrilla style' PGP whose web of trust covers only the few subjects who are of interest. In other words, it requires devising a global addressing structure/hierarchy.
This is the digest in its current form; the change is that the order-dependent part has been simplified to its essential minimum:
void add_prime (word_t n)
{
    count    += 1;
    sum      += n;
    product  *= n;
    checksum += n * count;
}
Here are the lessons learnt from practical work with that digest:
count, sum and product (i.e. partial primorial modulo word size) turned out to be exceedingly useful because of the fact that they relate to things also found elsewhere in the world, like certain lists at OEIS
count and sum were very useful because the first tends to be naturally available when manipulating (generating, using, comparing) batches of primes, and the sum is easily computed on the fly with zero effort; this allows partial verification against existing results without going the whole hog of instantiating and updating a digest, and without the overhead of two - comparatively slow - multiplications
count is also exceedingly useful as it must by necessity be part of any indexing superstructure built on systems of digests, and conversely it can guide the search straight to the block (range) containing the nth prime, or to the blocks overlapped by the nth through (n+k)th primes
the order dependency of the fourth component (checksum) turned out to be less of a hindrance than anticipated, since small primes tend to 'occur' (be generated or used) in order, in situations where verification might be desired
the order dependency of the checksum - and lack of combinability - made it perfectly useless outside of the specific block for which it was generated
fixed-size auxiliary program structures - like the ubiquitous small factor bitmaps - are best verified as raw memory for startup self-checks, instead of running a primes digest on them; this drastically reduces complexity and speeds things up by several orders of magnitude
For many practical purposes the order-dependent checksum could simply be dropped, leaving you with a three-component digest that is trivially combinable for adjacent ranges.
For verification of fixed ranges (like in self-tests) the checksum component is still useful. Any other kind of checksum - the moral equivalent of a CRC - would be just as useful for that and probably faster. It would be even more useful if an order-independent (combinable) way of supplementing the resolution of the first three components could be found. Extending the resolution beyond the first three components is most relevant for bigger computing efforts, like sieving, verifying and digesting trillions of primes for posterity.
One such candidate for an order-independent, combinable fourth component is the sum of squares.
Overall the digest turned out to be quite useful as is, despite the drawbacks concerning the checksum component. The best way of looking at the digest is probably as consisting of a 'characteristic' part (the first three components, combinable) and a checksum part that is only relevant for the specific block. The latter could just as well be replaced with a hash of any desired resolution. Kaganar's solution indicates how this checksum/hash can be integrated into a system that extends beyond a single block, despite its inherent non-combinability.
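As an illustration of the combinable 'characteristic' part, here is a sketch with count, sum and product plus the sum of squares as a fourth order-independent component. The struct and member names are invented for the example, and all arithmetic wraps modulo 2^64.

```cpp
#include <cstdint>

struct combinable_digest
{
    std::uint64_t count = 0;
    std::uint64_t sum = 0;
    std::uint64_t product = 1;  // partial primorial modulo 2^64
    std::uint64_t sum_sq = 0;   // sum of squares modulo 2^64

    void add_prime(std::uint64_t n)
    {
        count += 1;
        sum += n;
        product *= n;
        sum_sq += n * n;
    }

    // Merge the digest of another block into this one. None of the
    // components depends on order, so blocks can be merged in any order.
    void combine(const combinable_digest& other)
    {
        count += other.count;
        sum += other.sum;
        product *= other.product;
        sum_sq += other.sum_sq;
    }

    bool operator==(const combinable_digest& other) const
    {
        return count == other.count && sum == other.sum
            && product == other.product && sum_sq == other.sum_sq;
    }
};
```

Digesting a range in one go and digesting its sub-ranges separately before calling combine() give identical results, which is what makes per-block digests usable as building blocks for larger ranges.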
The summary of prime number sources seems to have fallen by the wayside, so here it is:
up to 1,000,000,000,000 available as files from sites like primos.mat.br
up to 2^64-10*2^32 in super-fast bulk via the primesieve.org console program (pipe)
up to 2^64-1 - and beyond - via the gp/PARI program (pipe, about 1 million primes/minute)
I'm answering this question again in a second answer since this is a very different and hopefully better tack:
It occurred to me that what you're doing is basically looking for a checksum, not over a list of primes, but over a range of a bitfield where a number is prime (bit is set to 1) or it's not (bit is set to 0). You're going to have a lot more 0's than 1's for any interesting range, so you hopefully only have to do an operation for the 1's.
Typically the problem with using a trivial in-any-order hash is that they handle multiplicity poorly and are oblivious to order. But you don't care about either of these problems -- every bit can only be set or unset once.
From that point of view, a bitwise-exclusive-or or addition should be just fine if combined with a good hashing function of the index of the bit -- that is, the found prime. (If your primes are 64-bit you could go with some of the functions here.)
So, for the ultimate simplicity that will give you the same value for any set of ranges of inputs, yes, stick to hashing and combining it with a simple operation like you are. But change to a traditional hash function which appears "random" given its input -- hash64shift on the linked page is likely what you're looking for. The probability of a meaningful collision is remote. Most hash functions stink, however -- make sure you pick one that is known to have good properties. (Avalanches well, etc.) Thomas Wang's are usually not so bad. (Bob Jenkins's are fantastic, but he sticks mostly to 32-bit functions. His mix function on the linked page is very good, though probably overkill.)
Parallelizing the check is obviously trivial, the code size and effort is vastly reduced from my other answer, and there's much less synchronization and almost no buffering that needs to occur.
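A minimal sketch of that approach: mix each prime with a 64-bit bit mixer and XOR the results together. The mixer shown is the SplitMix64 finaliser, used here as a stand-in because I can reproduce it reliably; it is not the hash64shift function from the linked page, and the struct name is made up.

```cpp
#include <cstdint>

// SplitMix64 finaliser: a bijective 64-bit mixer with good avalanche
// behaviour. A stand-in for hash64shift or any other well-tested mixer.
inline std::uint64_t mix64(std::uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ull;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebull;
    x ^= x >> 31;
    return x;
}

// Order-independent digest: XOR of the mixed primes.
struct xor_digest
{
    std::uint64_t value = 0;

    void add_prime(std::uint64_t p) { value ^= mix64(p); }

    // Digests of disjoint ranges combine with a plain XOR, in any order.
    void combine(const xor_digest& other) { value ^= other.value; }
};
```

Because XOR cancels duplicates, this only works when every prime is added exactly once, which is the situation described above. Replacing the XOR with addition modulo 2^64 keeps the same combine-in-any-order property.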
I am having run-time memory allocation errors with a C++ application. I have eliminated memory leaks, invalid pointer references and out-of-bounds vector assignments as the source of the issue - I am pretty sure it's something to do with memory fragmentation and I am hoping to get help with how to further diagnose and correct the problem.
My code is too large to post (about 15,000 lines - I know not huge but clearly too large to put online), so I am going to describe things with a few relevant snippets of code.
Basically, my program takes a bunch of string and numerical data sets as inputs (class objects with vector variables of type double, string, int and bool), performs a series of calculations, and then spits out the resulting numbers. I have tested and re-tested the calculations and outputs - everything is calculating as it should, and on smaller datasets things run perfectly.
However, when I scale things up, I start getting memory allocation errors, but I don't think I am even close to approaching the memory limits of my system - please see the two graphs below...my program cycles through a series of scenarios (performing identical calculations under a different set of parameters for each scenario) - in the first graph, I run 7 scenarios on a dataset of about 200 entries. As the graph shows, each "cycle" results in memory swinging up and back down to its baseline, and the overall memory usage is tiny (see the seven small blips on the right half of the bottom graph). On the second graph, I am now running a dataset of about 10,000 entries (see notes on dataset below). In this case, I only get through 2 full cycles before getting my error (as it is trying to resize a class object for the third scenario). You can see the first two scenarios in the bottom right-half graph; a lot more memory usage than before, but still only a small fraction of available memory. And as with the smaller dataset, usage increases while my scenario runs, and then decreases back to its initial level before reaching the next scenario.
This pattern, along with other tests I have done, leads me to believe it's some sort of fragmentation problem. The error always occurs when I am attempting to resize a vector, although the particular resize operation that causes the error varies based on the dataset size. Can anyone help me understand what's going on here and how I might fix it? I can describe things in much greater detail but already felt like my post was getting long...please ask questions if you need to and I will respond/edit promptly.
Clarification on the data set
The numbers 200 and 10,000 represent the number of unique records I am analyzing. Each record contains somewhere between 75 and 200 elements / variables, many of which are then being manipulated. Further, each variable is being manipulated over time and across multiple iterations (both dimensions variable). As a result, for an average "record" (the 200 to 10,000 referenced above), there could be easily as many as 200,000 values associated with it - a sample calculation:
1 Record * 75 Variables * 150 periods * 20 iterations = 225,000 unique values per record.
Offending Code (in this specific instance):
vector<LoanOverrides> LO;
LO.resize(NumOverrides + 1); // Error is occurring here. I am certain that NumOverrides is a valid numerical entry = 2985
// Sample class definition
class LoanOverrides {
public:
    string IntexDealName;
    string LoanID;
    string UniqueID;
    string PrepayRate;
    string PrepayUnits;
    double DefaultRate;
    string DefaultUnits;
    double SeverityRate;
    string SeverityUnits;
    double DefaultAdvP;
    double DefaultAdvI;
    double RecoveryLag;
    string RateModRate;
    string RateModUnits;
    string BalanceForgivenessRate;
    string BalanceForgivenessRateUnits;
    string ForbearanceRate;
    string ForbearanceRateUnits;
    double ForbearanceRecoveryRate;
    string ForbearanceRecoveryUnits;
    double BalloonExtension;
    double ExtendPctOfPrincipal;
    double CouponStepUp;
};
You have a 64-bit operating system capable of allocating large quantities of memory, but have built your application as a 32-bit application, which can only allocate a maximum of about 3GB of memory. You are trying to allocate more than that.
Try compiling as a 64-bit application. This may enable you to reach your goals. You may have to increase your pagefile size.
See if you can dispose of intermediate results earlier than you are currently doing so.
Try calculating how much memory is being used/would be used by your algorithm, and try reworking your algorithm to use less.
Try avoiding duplicating data by reworking your algorithm. I see you have a lot of reference data, which by the looks of it isn't going to change during the application run. You could put all of that into a single vector, which you allocate once, then refer to them via integer indexes everywhere else, rather than copying them. (Just guessing that you are copying them).
Try avoiding loading all the data at once by reworking your algorithm to work on batches.
Without knowing more about your application, it is impossible to offer better advice. But basically you are running out of memory because you are allocating a huge, huge amount of it, and based on your application and the snippets you have posted I think you can probably avoid doing so with a little thought. Good luck.
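As a starting point for that calculation, here is a back-of-the-envelope sketch using the numbers from the question. It assumes every value is stored as an 8-byte double and that all values for all records are held in memory at the same time. Both are my assumptions and the real program may well differ; the function names are made up.

```cpp
#include <cstdint>

// Upper-bound estimate: one value per record, variable, period and iteration.
inline std::uint64_t estimated_bytes(std::uint64_t records,
                                     std::uint64_t variables,
                                     std::uint64_t periods,
                                     std::uint64_t iterations,
                                     std::uint64_t bytes_per_value)
{
    return records * variables * periods * iterations * bytes_per_value;
}

// True if the estimate cannot fit into a 32-bit address space (4 GiB),
// regardless of how much physical memory the machine has.
inline bool exceeds_32bit_address_space(std::uint64_t bytes)
{
    return bytes > (std::uint64_t(1) << 32);
}
```

With 10,000 records, 75 variables, 150 periods and 20 iterations this comes to 18,000,000,000 bytes (about 18 GB), while the 200-record run needs about 360 MB under the same assumptions. If the program really keeps anything close to that resident, a 32-bit build cannot succeed on the large dataset.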
So I know functionally what I would like to happen, I just don't know the best way to make a computer do it... in C++...
I would like to implement a C++ function that maps a 10 bit sequence to a 6 bit sequence.
Never mind what the bits stand for right now... There are 2^10 = 1024 possible inputs. There are 2^6 = 64 different outputs. Probably lots of patterns. Obviously lots of patterns. But it's complicated. It's a known mapping, just a complicated mapping.
The output is just one of 64 possibilities. Maybe they all don't get used. They probably won't. But assume they do.
Right now, I'm thinking a quadruple nested switch statement that just takes care of each of the 1024 cases and takes care of business inline, assigning appropriate values to whatever pointer to whatever structure I passed to this function. This seems to be naive and sort of slow. Not that I've implemented it, but that's why I want to ask you first.
This basic function (mapping) will have to be run at every statement node, often more than once, for as many statements this system wishes to support. I ask you, how do I map 10 bits to 6 bits as efficiently as possible in C++?
I know what the mapping is, I know which inputs of 10 bits go with what output of 6 bits... I could totally hard-code that ... somehow? Multi-switch is so ugly. How can I map my 10 bits to 6 bits?! Neural net? Memory muffin? What would you do?
Note to self: So here is why I am not a fan of the lookup table. Let's assume all inputs are equally likely (of course they are not, and could be ordered more effectively, but still) then it will take on average 512 memory advances of the array to retrieve the output values... It seems that if you make a (global, why not) binary tree 10 levels deep, you cover the 1024 inputs and can retrieve the output in an average of just 10 steps... and maybe less if there are good patterns... given a deterministic function that is run so often, how best to retrieve known outputs from known inputs?
I would use a lookup table of 1024 elements. So hard-code that and just access it by index.
This saves the need for a massive switch statement and will probably be much more readable.
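A sketch of what that could look like. Indexing an array is a single address computation followed by one load, so there is no searching through the table at all. The reference_mapping function below is a made-up placeholder for the real, known mapping; in practice the 1024 values could just as well be written out as a literal array.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the real, known 10-bit -> 6-bit mapping.
inline std::uint8_t reference_mapping(std::uint16_t in10)
{
    return static_cast<std::uint8_t>((in10 * 37u + (in10 >> 3)) & 0x3Fu);
}

// Build the 1024-entry table once from the reference mapping.
inline std::array<std::uint8_t, 1024> build_table()
{
    std::array<std::uint8_t, 1024> table{};
    for (std::uint16_t i = 0; i < 1024; ++i)
        table[i] = reference_mapping(i);
    return table;
}

// Constant-time lookup: mask to 10 bits, then index directly.
inline std::uint8_t map10to6(std::uint16_t in10)
{
    static const std::array<std::uint8_t, 1024> table = build_table();
    return table[in10 & 0x3FFu];
}
```

The whole table is 1 KB, which fits comfortably in the L1 data cache of typical CPUs.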
Depends on your definition of efficiency.
Time-efficient: Look-up table.
Space-efficient: Use a Karnaugh map.
Use a look-up table of size 1024.
If you need to map to some particular 6-bit values, use a lookup table of size 64 (not 1024!) after dividing by 16. This will fit into the cache more easily than a 16-times redundant 1024-entry table (and the cost of a possible cache miss outweighs the 2 extra cycles for a right shift by far).
Otherwise, if a simple sequential mapping is fine, just do a divide by 16.
1024/64 = 16, so dividing by 16 (a right shift with compiler optimizations turned on), maps to 6 bits (sequentially). It cannot get more efficient than that.
My (DSP) application produces data at a constant rate. The rate depends on the configuration that is selected by the user. I would like to know how many bytes are generated per second. The data structure contains a repeated (packed) floating point field. The length of the field is constant, but can be changed by the user.
Is there a protocol buffers function that will calculate the message size before serialization?
If you have built the message objects, you can call ByteSize() on the message, which returns the number of bytes the serialized message would take up. There is a link to the C++ docs of ByteSize.
It's impossible to know ahead of time, because protobuf packs the structures it is given into the fewest bytes possible - it won't use four bytes for int x = 1; for example - so the library would have to walk the entire graph to know the output size.
I believe you could find this out by doing a serialize operation to a protobuf-compliant stream of your own design that just counts the bytes it is given. That could be costly, but no more costly than it would be for the library to do that work.
You could fill out the message without sending it and then call CalculateSize() on it.
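For the specific case in the question (a packed repeated float field of known length), the size can also be worked out by hand from the wire format: one varint key, one varint holding the payload length, then 4 bytes per float. The sketch below does not use the protobuf library; the function names are made up, and it only covers that one field, so any other fields in the message and any framing around it would have to be added on top.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a value occupies when encoded as a base-128 varint.
inline std::size_t varint_size(std::uint64_t v)
{
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Encoded size of one packed repeated float field with `count` elements.
inline std::size_t packed_float_field_size(std::uint32_t field_number,
                                           std::size_t count)
{
    if (count == 0)
        return 0;  // an empty packed field is not written at all

    // Key = (field_number << 3) | wire type 2 (length-delimited).
    const std::uint64_t key = (static_cast<std::uint64_t>(field_number) << 3) | 2u;
    const std::size_t payload = count * 4;  // floats are fixed 32-bit
    return varint_size(key) + varint_size(payload) + payload;
}
```

For example, field number 1 with 100 floats comes to 1 + 2 + 400 = 403 bytes; multiplying by the number of messages per second gives the data rate. Calling ByteSize() on a filled-in message is still the safer cross-check.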