I'm curious to know how the algorithm behind 'Beyond Compare' works.
I guess there's a standard (well-known?) algorithm they used to implement the "character vs. character" comparison. Do you know the name of this algorithm? Thank you.
Beyond Compare uses a number of different algorithms depending on the file type and configuration. In v4 the line alignment algorithms are explicitly named in the interface:
Standard alignment - This is a proprietary algorithm; we haven't made the details publicly available.
Myers O(ND) alignment - This is the same one that the GNU diff utility and most other applications use. It's based on the paper "An O(ND) difference algorithm and its variations" by Eugene Myers (1986).
Patience Diff alignment - This is the "Patience Diff" algorithm that Bram Cohen originally developed for Bazaar, which he talks about here.
The character alignment to highlight differences within lines is based on the Myers O(ND) algorithm with some post-processing to clean up the results.
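To give a feel for the Myers O(ND) idea mentioned above, here is a minimal sketch of the greedy forward search from the 1986 paper. It only computes the length of the shortest edit script (insertions plus deletions) between two strings. It is not Beyond Compare's code and it includes none of the post-processing; the function name `myers_edit_distance` is my own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length of the shortest edit script (insertions + deletions) that turns
// `a` into `b`, using the greedy forward search from Myers' paper.
std::size_t myers_edit_distance(const std::string& a, const std::string& b) {
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    const int max_d = n + m;
    const int offset = max_d + 1;  // shifts diagonal k into a valid index
    // v[offset + k] is the furthest x reached so far on diagonal k = x - y.
    std::vector<int> v(2 * max_d + 3, 0);
    for (int d = 0; d <= max_d; ++d) {
        for (int k = -d; k <= d; k += 2) {
            int x;
            if (k == -d || (k != d && v[offset + k - 1] < v[offset + k + 1])) {
                x = v[offset + k + 1];      // move down: take a character from b
            } else {
                x = v[offset + k - 1] + 1;  // move right: drop a character from a
            }
            int y = x - k;
            // Follow the "snake": matching characters cost nothing.
            while (x < n && y < m && a[x] == b[y]) { ++x; ++y; }
            v[offset + k] = x;
            if (x >= n && y >= m) return static_cast<std::size_t>(d);
        }
    }
    assert(false && "unreachable: d = n + m always suffices");
    return static_cast<std::size_t>(max_d);
}
```

For the example pair used in the paper, `myers_edit_distance("ABCABBA", "CBABAC")` returns 5. A real diff tool would also record the path so the actual edits can be reported.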
I'm looking for a C++-based alternative to the SystemVerilog language.
While I doubt anything out there can match the simplicity and flexibility of the SystemVerilog constraint language, I have settled on using either Z3 or Gecode for what I'm working on, primarily because they're both under the MIT license.
What I'm looking for is:
Support for variable-sized bit vectors AND bit vector arithmetic logic operations. For example:
bit_vector a<30>;
bit_vector b<30>;
constraint {
a == (b << 2);
a == (b * 2);
b < a;
}
The problem with Gecode, as far as I can tell, is that it does not provide bit vectors right out of the box. However, its programming model seems a bit simpler, and it does provide a means for one to create their own types of variables. So I could perhaps create some kind of wrapper around the C++ bitset, similar to how IntVar wraps around 32-bit integers. However, that would lack the ability to perform multiplication-based constraints, since C++ bitsets don't support such operations.
Z3 does provide bit vectors right out of the box, but I'm not sure how it would handle trying to set constraints on, for example, 128-bit vectors. I'm also unsure how I can specify that I want to produce a variety of randomized variables that satisfy a constraint when possible. With Gecode, it's much clearer given how thorough its documentation is.
A simplistic constraint programming model, close or similar to SystemVerilog. For example, a language where I only need to type (x == y + z) instead of something like EQ(x, y + z). As far as I can tell, both APIs provide such a simple programming model.
A means of performing constrained randomization, for the sake of producing random stimulus. As in, I can provide some random seed that, depending on the constraints, results in an answer that may differ from the previous answer. Similar to how SystemVerilog randomize calls may produce new random results. Gecode seems to support the use of random seeds. With Z3, it's much less clear.
Support for weighted distribution. Gecode appears to support this via weighted sets. I imagine I can establish a relationship between certain conditions and boolean variables, and then add weights to those variables. Z3 appears to be more flexible in that you can assign weights to expressions, via the optimize class.
At the moment, I'm undecided, because Z3 lacks in documentation what Gecode lacks in out-of-the-box variable types. I'm wondering if anyone has any prior experience using either tool to achieve what SystemVerilog could. I'd like to hear any suggestions for any other API under a flexible license as well.
While z3 (or any SMT solver) can handle all of these, getting a nice sampling of satisfying assignments would be rather difficult to control. SMT solvers are optimized for just giving you a model, and they don't have much in terms of how you want to sample the solution space.
Incidentally, this is an active research area in SMT solving. Here's a paper that appeared only 6 weeks ago on this very topic: https://ieeexplore.ieee.org/document/8894251
So, I'd say if support for "good sampling" is your primary motivation, using an SMT solver is probably not the best choice. If your goal is to find satisfying assignments for bit-vectors expressed conveniently (there are high-level APIs in any language you can imagine these days), then z3 would be an extremely fine choice.
From your description, good sampling sounds like the primary motivation though, and for that SMT solvers are probably not that great. At least not for the time being.
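To make the sampling problem concrete, here is a naive baseline that uses neither Z3 nor Gecode: a small fixed-width bit vector with wrap-around arithmetic, plus seeded rejection sampling. Everything here (`BitVec`, `sample_pair`, the `Constraint` alias) is a hypothetical helper of my own, not an API from either library. If I read them with wrap-around semantics, the three constraints in the question cannot all hold at once (a == b << 2 and a == b * 2 together force a to be 0), so the usage below only combines the first and the last.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <random>
#include <utility>

// Hypothetical helper: unsigned bit vector of 1..64 bits with wrap-around.
struct BitVec {
    std::uint64_t value;
    unsigned width;
    BitVec(std::uint64_t v, unsigned w) : value(v & mask(w)), width(w) {
        assert(w >= 1 && w <= 64);
    }
    static std::uint64_t mask(unsigned w) {
        return w >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << w) - 1);
    }
};

inline BitVec operator<<(const BitVec& x, unsigned s) {
    return BitVec(s >= x.width ? 0 : x.value << s, x.width);
}
inline BitVec operator*(const BitVec& x, std::uint64_t k) {
    return BitVec(x.value * k, x.width);  // truncated to x.width bits
}
inline bool operator==(const BitVec& x, const BitVec& y) {
    return x.width == y.width && x.value == y.value;
}
inline bool operator<(const BitVec& x, const BitVec& y) {
    return x.value < y.value;
}

using Constraint = std::function<bool(const BitVec&, const BitVec&)>;

// Draws random (a, b) pairs until the constraint holds or tries run out.
// The same seed always yields the same pair.
std::optional<std::pair<BitVec, BitVec>> sample_pair(
        unsigned width, const Constraint& ok, std::uint64_t seed,
        int max_tries = 100000) {
    std::mt19937_64 rng(seed);
    for (int i = 0; i < max_tries; ++i) {
        BitVec a(rng(), width);
        BitVec b(rng(), width);
        if (ok(a, b)) return std::make_pair(a, b);
    }
    return std::nullopt;  // constraint too tight for blind sampling
}
```

A call such as `sample_pair(8, [](const BitVec& a, const BitVec& b) { return a == (b << 2) && b < a; }, 42)` works at 8 bits, but the hit rate collapses as the width grows. That is exactly why blind sampling does not scale and why sampling from a solver's solution space is still a research topic.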
Are there any good libraries in c++ for sequential nonlinear optimization with constraints?
I am looking for inequality constraints and/or upper and lower bounds.
There is already a Stack Overflow question about this, but not all of the libraries suggested there support constraints.
I know of NLopt, but it doesn't work well for my specific problem. Are there any others?
I finally found the solution that I was looking for, in case anyone else is interested: Ipopt.
One SQP algorithm that you could try is DONLP2. It was originally written in Fortran 77 but there is an ANSI C version as well. It uses dense algebra, so it is primarily suitable for small to medium-sized problems. It is free for academic use. You need to request the code directly from the author; follow the instructions in the link.
UPDATE: Sequential Quadratic Programming is only one approach to minimizing non-linear objective functions subject to constraints; there are also, for example, interior point methods. One very good large-scale open-source C++ alternative that applies the interior point approach is Ipopt (already mentioned in another answer). There is also, for example, the commercial package KNITRO. If you cannot or do not want to provide gradients of the objective function and constraints, you could also have a look at COBYLA2, of which a C version can be downloaded here.
For further inspiration, you could also consult the Decision Tree For Optimization Software, which lists different optimization codes suitable for a wide range of different problems.
I am trying to write connected-component labelling code in C++ that uses the two-pass algorithm with 4-connectivity. You might want to see https://en.wikipedia.org/wiki/Connected-component_labeling. That algorithm uses a data structure called union-find. I don't understand the structure, and I cannot code it because I don't understand how the algorithm uses it.
Do you know how union-find is used in that algorithm? Failing that, is there a native C++ library for it, or do you know of any source that would help me understand the structure? An animation might be useful.
The Union-Find data structure is also called a "disjoint-set". You can find more descriptions and information about disjoint-sets on the Wikipedia page (http://en.wikipedia.org/wiki/Disjoint-set_data_structure). A more detailed introduction to disjoint-set data structures can be found in the book "Introduction to Algorithms", Chapter 21 (also listed as Reference 1 on the Wikipedia page).
Usually when we talk about disjoint-set data structures, we are talking about a specific implementation called a "disjoint-set forest". What is good about this specific implementation is that 1) it is really easy to implement, and 2) it has excellent time complexity (amortized almost constant per operation).
You can also find some pseudocode of how to implement disjoint-set forest in the Wikipedia page or in Chapter 21 of the book I've mentioned.
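Since the question was how the labelling algorithm actually uses the structure, here is a minimal sketch of a disjoint-set forest (union by rank, path halving) together with a two-pass, 4-connectivity labelling routine built on it. The names `DisjointSet` and `label_components` are my own, and the image is assumed to be a grid of ints where non-zero means foreground.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

class DisjointSet {
public:
    explicit DisjointSet(std::size_t n) : parent_(n), rank_(n, 0) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }
    // Creates a new singleton set and returns its id.
    std::size_t add() {
        parent_.push_back(parent_.size());
        rank_.push_back(0);
        return parent_.size() - 1;
    }
    // Returns the representative of x's set, shortening the path on the way.
    std::size_t find(std::size_t x) {
        assert(x < parent_.size());
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    // Merges the sets containing a and b (union by rank).
    void unite(std::size_t a, std::size_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent_[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
    }
private:
    std::vector<std::size_t> parent_;
    std::vector<unsigned> rank_;
};

// Two-pass labelling, 4-connectivity. Returns 0 for background pixels and
// the same non-zero label for every pixel of one connected component.
std::vector<std::vector<std::size_t>> label_components(
        const std::vector<std::vector<int>>& image) {
    DisjointSet sets(1);  // id 0 is reserved for the background
    std::vector<std::vector<std::size_t>> labels(image.size());
    // First pass: assign provisional labels and record equivalences.
    for (std::size_t r = 0; r < image.size(); ++r) {
        labels[r].assign(image[r].size(), 0);
        for (std::size_t c = 0; c < image[r].size(); ++c) {
            if (image[r][c] == 0) continue;
            const std::size_t up =
                (r > 0 && c < labels[r - 1].size()) ? labels[r - 1][c] : 0;
            const std::size_t left = (c > 0) ? labels[r][c - 1] : 0;
            if (up == 0 && left == 0) {
                labels[r][c] = sets.add();   // start a new component
            } else if (up != 0 && left != 0) {
                labels[r][c] = up;
                sets.unite(up, left);        // both neighbours are one component
            } else {
                labels[r][c] = (up != 0) ? up : left;
            }
        }
    }
    // Second pass: replace each provisional label by its representative.
    for (auto& row : labels)
        for (auto& label : row)
            if (label != 0) label = sets.find(label);
    return labels;
}
```

The union-find part only answers one question: which provisional labels turned out to belong to the same component. The first pass records those equivalences with `unite`, and the second pass resolves them with `find`.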
See: http://berkeley.intel-research.net/arahimi/connected/ and http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5F7A5FE1F4DCBA968A0B0E99B0593F71?doi=10.1.1.2.5996&rep=rep1&type=pdf
I have recently read an article about fast sqrt calculation. So I have decided to ask the SO community and its experts to help me find out which STL algorithms or mathematical calculations can be implemented faster with programming hacks.
It would be great if you can give examples or links.
Thanks in advance.
System library developers have more concerns than just performance in mind:
Correctness and standards compliance: Critical!
General use: No optimisations are introduced, unless they benefit the majority of users.
Maintainability: Good hand-written assembly code can be faster, but you don't see much of it. Why?
Portability: Decent libraries should be portable to more than just Windows/x86/32bit.
Many optimisation hacks that you see around violate one or more of the requirements above.
In addition, optimisations that will be useless or even break when the next generation CPU comes around the corner are not a welcome thing.
If you don't have profiler evidence on it being really useful, don't bother optimising the system libraries. If you do, work on your own algorithms and code first, anyway...
EDIT:
I should also mention a couple of other all-encompassing concerns:
The cost/effort to profit/result ratio: Optimisations are an investment. Some of them are seemingly-impressive bubbles. Others are deeper and more effective in the long run. Their benefits must always be considered in relation to the cost of developing and maintaining them.
The marketing people: No matter what you think, you'll end up doing whatever they want - or think they want.
Probably all of them can be made faster for a specific problem domain.
Now the real question is, which ones should you hack to make faster? None, until the profiler tells you to.
Several of the algorithms in <algorithm> can be optimized for vector<bool>::[const_]iterator. These include:
find
count
fill
fill_n
copy
copy_backward
move // C++0x
move_backward // C++0x
swap_ranges
rotate
equal
I've probably missed some. But all of the above algorithms can be optimized to work on many bits at a time instead of just one bit at a time (as would a naive implementation).
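As a rough illustration of "many bits at a time", here is a sketch of `count` over packed bits, once bit by bit and once word by word. This is not libc++'s implementation. The layout (bit i stored in `words[i / 64]` at position `i % 64`) and both function names are assumptions made for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive version: inspects one bit per loop iteration.
std::size_t count_bits_naive(const std::vector<std::uint64_t>& words,
                             std::size_t nbits) {
    assert(nbits <= words.size() * 64);
    std::size_t total = 0;
    for (std::size_t i = 0; i < nbits; ++i)
        total += (words[i / 64] >> (i % 64)) & 1u;
    return total;
}

// Word-wise version: counts 64 bits per iteration, then masks the tail.
std::size_t count_bits_wordwise(const std::vector<std::uint64_t>& words,
                                std::size_t nbits) {
    assert(nbits <= words.size() * 64);
    std::size_t total = 0;
    const std::size_t full_words = nbits / 64;
    for (std::size_t w = 0; w < full_words; ++w)
        total += std::bitset<64>(words[w]).count();
    const std::size_t rest = nbits % 64;
    if (rest != 0) {
        // Keep only the low `rest` bits of the final, partially used word.
        const std::uint64_t mask = (std::uint64_t{1} << rest) - 1;
        total += std::bitset<64>(words[full_words] & mask).count();
    }
    return total;
}
```

Both functions return the same result. The second one does one population count per word instead of one shift-and-mask per bit, which is the kind of saving the list above is about.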
This is an optimization that I suspect is sorely missing from most STL implementations. It is not missing from this one:
http://libcxx.llvm.org/
This is where you really need to listen to project managers and MBAs. What you're suggesting is re-implementing parts of the STL and/or the standard C library. There is an associated cost in terms of time to implement and maintenance burden of doing so, so you shouldn't do it unless you really, genuinely need to, as John points out. The rule is simple: is this calculation you're doing slowing you down (i.e. are you bound by the CPU)? If not, don't create your own implementation just for the sake of it.
Now, if you're really interested in fast maths, there are a few places you can start. The GNU Multiple Precision Arithmetic Library (GMP) implements many algorithms from "Modern Computer Arithmetic" and "Seminumerical Algorithms", which are all about doing maths on arbitrary-precision integers and floats insanely fast. The guys who write it optimise in assembly per build platform; it is about as fast as you can get in single-core mode. This is the most general case I can think of for optimised maths, i.e. one that isn't specific to a certain domain.
Bringing my first and second paragraphs together with what thkala has said, consider that GMP/MPIR have optimised assembly versions for each CPU architecture and OS they support. Really. It's a big job, but it is what makes those libraries so fast on the specific, small subset of programming problems they target.
Sometimes domain-specific enhancements can be made. This is about understanding the problem in question. For example, when doing arithmetic in Rijndael's finite field you can, based on the knowledge that the field has characteristic 2 and that its elements are polynomials with 8 terms, assume that each element fits in a uint8_t and that addition and subtraction are both equivalent to xor. How does this work? Each coefficient of such a polynomial is either zero or one. When you add or subtract two polynomials term by term, the result is zero if the two coefficients are both zero or both one, and one if they are different. That is equivalent to xor across an 8-bit binary string, where each bit represents a term in the polynomial. Multiplication is also relatively efficient. You can bet that Rijndael was designed to take advantage of this kind of result.
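Here is a small sketch of that arithmetic. Addition is a single xor, and multiplication is shift-and-xor with reduction by the Rijndael polynomial x^8 + x^4 + x^3 + x + 1 (the constant 0x1B below). The function names are my own, and the check values are the worked examples from the AES specification (FIPS 197).

```cpp
#include <cassert>
#include <cstdint>

// Addition (and subtraction) in GF(2^8): coefficient-wise xor.
std::uint8_t gf256_add(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a ^ b);
}

// Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
std::uint8_t gf256_mul(std::uint8_t a, std::uint8_t b) {
    std::uint8_t result = 0;
    for (int i = 0; i < 8; ++i) {
        if (b & 1) result ^= a;              // add a if this term of b is set
        const bool overflow = (a & 0x80) != 0;
        a = static_cast<std::uint8_t>(a << 1);  // multiply a by x
        if (overflow) a ^= 0x1B;             // reduce: x^8 = x^4 + x^3 + x + 1
        b >>= 1;
    }
    return result;
}

// Usage: the worked examples from FIPS 197.
void gf256_self_check() {
    assert(gf256_add(0x57, 0x83) == 0xD4);
    assert(gf256_mul(0x57, 0x83) == 0xC1);
    assert(gf256_mul(0x57, 0x13) == 0xFE);
}
```

A production AES implementation would normally use lookup tables or dedicated CPU instructions instead of this loop. The point is only that knowing the field makes the arithmetic trivial.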
That's a very specific result. It depends entirely on what you're doing to make things efficient. I can't imagine many STL functions are purely optimised for cpu speed, because amongst other things STL provides: collections via templates, which are about memory, file access which is about storage, exception handling etc. In short, being really fast is a narrow subset of what STL does and what it aims to achieve. Also, you should note that optimisation has different views. For example, if your app is heavy on IO, you are IO bound. Having a massively efficient square root calculation isn't really helpful since "slowness" really means waiting on the disk/OS/your file parsing routine.
In short, you as a developer of an STL library are trying to build an "all round" library for many different use cases.
But, since these things are always interesting, you might well be interested in bit twiddling hacks. I can't remember where I saw that, but I've definitely stolen that link from somebody else on here.
Almost none. The standard library is designed the way it is for a reason.
Taking sqrt, which you mention as an example, the standard library version is written to be as fast as possible, without sacrificing numerical accuracy or portability.
The article you mention is really beyond useless. There are some good articles floating around the 'net, describing more efficient ways to implement square roots. But this article isn't among them (it doesn't even measure whether the described algorithms are faster!) Carmack's trick is slower than std::sqrt on a modern CPU, as well as being less accurate.
It was used in a game something like 12 years ago, when CPUs had very different performance characteristics. It was faster then, but CPUs have changed, and today, it's both slower and less accurate than the CPU's built-in sqrt instruction.
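For reference, this is roughly what the trick looks like, written with `memcpy` instead of pointer casts to avoid undefined behaviour. Note that it approximates 1/sqrt(x) rather than sqrt(x). The sketch assumes a 32-bit IEEE-754 `float`, and `relative_error` is a helper I added so the accuracy loss can be measured against `std::sqrt`. It makes no claim about speed; that has to be measured on the target CPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Approximates 1/sqrt(x) with the well-known magic-constant trick followed
// by one Newton-Raphson step. Assumes a 32-bit IEEE-754 float.
float fast_inv_sqrt(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    assert(x > 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);    // reinterpret the float's bits
    bits = 0x5f3759dfu - (bits >> 1);       // crude initial guess
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);   // one refinement step
}

// Relative error of the approximation compared with the standard library.
float relative_error(float x) {
    const float exact = 1.0f / std::sqrt(x);
    return std::fabs(fast_inv_sqrt(x) - exact) / exact;
}
```

With a single refinement step the result is only good to a few digits, which is the accuracy trade-off described above.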
You can implement a square root function which is faster than std::sqrt without losing accuracy, but then you lose portability, as it'll rely on CPU features not present on older CPUs.
Speed, accuracy, portability: choose any two. The standard library tries to balance all three, which means that the speed isn't as good as it could be if you were willing to sacrifice accuracy or portability, and accuracy is good, but not as good as it could be if you were willing to sacrifice speed, and so on.
In general, forget any notion of optimizing the standard library. The question you should be asking is whether you can write more specialized code.
The standard library has to cover every case. If you don't need that, you might be able to speed up the cases that you do need. But then it is no longer a suitable replacement for the standard library.
Now, there are no doubt parts of the standard library that could be optimized. The C++ IOStreams library in particular comes to mind. It is often naively, and very inefficiently, implemented. The C++ committee's technical report on C++ performance has an entire chapter dedicated to exploring how IOStreams could be implemented to be faster.
But that's I/O, where performance is often considered to be "unimportant".
For the rest of the standard library, you're unlikely to find much room for optimization.
I'm looking for good source code in C, C++ or Python to understand how a hash function is implemented and how a hash table is implemented using it.
I'd also appreciate any very good material on how hash functions and hash table implementations work.
Thanks in advance.
Hashtables are central to Python, both as the 'dict' type and for the implementation of classes and namespaces, so the implementation has been refined and optimised over the years. You can see the C source for the dict object here.
Each Python type implements its own hash function - browse the source for the other objects to see their implementations.
If you want to learn, I suggest you look at the Java implementation of java.util.HashMap. It's clear code, well-documented and comparatively short. Admittedly, it's neither C, nor C++, nor Python, but you probably don't want to read GNU libstdc++'s upcoming implementation of a hashtable, which is dominated by the complexity of the C++ Standard Template Library.
To begin with, you should read the definition of the java.util.Map interface. Then you can jump directly into the details of the java.util.HashMap. And everything that's missing you will find in java.util.AbstractMap.
The implementation of a good hash function is independent of the programming language. The basic task of it is to map an arbitrarily large value set onto a small value set (usually some kind of integer type), so that the resulting values are evenly distributed.
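As one concrete example of such a mapping, here is a sketch of the 32-bit FNV-1a string hash, followed by the usual reduction to a bucket index. FNV-1a is just one simple, well-known choice; it is not what Java's HashMap or Python's dict use, and `bucket_index` is a helper name of my own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// 32-bit FNV-1a: xor each byte into the hash, then multiply by the FNV prime.
std::uint32_t fnv1a_32(std::string_view data) {
    std::uint32_t hash = 2166136261u;       // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 16777619u;                  // FNV prime, wraps modulo 2^32
    }
    return hash;
}

// Maps an arbitrary key onto one of `bucket_count` buckets.
std::size_t bucket_index(std::string_view key, std::size_t bucket_count) {
    assert(bucket_count > 0);
    return fnv1a_32(key) % bucket_count;
}
```

For instance, `fnv1a_32("a")` is 0xE40C292C, so with 16 buckets the key "a" lands in bucket 12.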
There is a problem with your question: there are as many types of hash map as there are uses.
There are many strategies for dealing with hash collisions and reallocation, depending on the constraints you have. You may find an average solution, of course, that will mostly fit, but if I were you I would look at Wikipedia (like Dennis suggested) to get an idea of the subtleties of the various implementations.
As I said, you can mostly think of the strategies in two ways:
Handling hash collisions: buckets (and of which kind?), open addressing, double hashing, ...
Reallocation: freeze the map, or amortized linear?
Also, do you want baked-in multi-threading support? Using atomic operations it's possible to get lock-free multithreaded hashmaps, as has been proven in Java by Cliff Click (Google Tech Talk).
As you can see, there is no one size fits them all. I would consider learning the principles first, then going down to the implementation details.
C++ std::unordered_map uses linked-list buckets and the freeze-the-map strategy; as usual with the STL, no concern is given to synchronization.
Python's dict is the foundation of the language, but I don't know which strategies they chose.
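To illustrate the linked-list bucket and freeze-the-map strategies in the simplest possible form, here is a toy map from strings to ints. It is a teaching sketch only, not how std::unordered_map is actually written. The class name `ChainedMap` and the growth policy (double the bucket count once there are more entries than buckets) are my own choices.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class ChainedMap {
public:
    explicit ChainedMap(std::size_t buckets = 8) : buckets_(buckets) {
        assert(buckets > 0);
    }

    // Inserts the key or overwrites its existing value.
    void put(const std::string& key, int value) {
        Bucket& bucket = buckets_[index_of(key, buckets_.size())];
        for (auto& entry : bucket)
            if (entry.first == key) { entry.second = value; return; }
        bucket.emplace_front(key, value);
        // "Freeze the map": rebuild everything in one go when it gets crowded.
        if (++size_ > buckets_.size()) rehash(buckets_.size() * 2);
    }

    std::optional<int> get(const std::string& key) const {
        for (const auto& entry : buckets_[index_of(key, buckets_.size())])
            if (entry.first == key) return entry.second;
        return std::nullopt;
    }

    std::size_t size() const { return size_; }
    std::size_t bucket_count() const { return buckets_.size(); }

private:
    using Bucket = std::forward_list<std::pair<std::string, int>>;

    static std::size_t index_of(const std::string& key, std::size_t n) {
        return std::hash<std::string>{}(key) % n;
    }

    // Moves every entry into a fresh, larger bucket array.
    void rehash(std::size_t new_count) {
        std::vector<Bucket> fresh(new_count);
        for (auto& bucket : buckets_)
            for (auto& entry : bucket)
                fresh[index_of(entry.first, new_count)].push_front(std::move(entry));
        buckets_ = std::move(fresh);
    }

    std::vector<Bucket> buckets_;
    std::size_t size_ = 0;
};
```

Usage is just `ChainedMap m; m.put("x", 1);` followed by `m.get("x")`, which returns an empty optional for missing keys. Open addressing would replace the per-bucket list with probing inside a single array, and an incremental strategy would spread the rehash over several insertions.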