Can Bubble Sort be categorized as a "decrease and conquer" algorithm?

I have found many implementations of bubble sort online, referred to as divide and conquer solutions. However, I believe decrease and conquer is more appropriate, according to the definitions:
- A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly.
- The name "divide and conquer" should be used only when each problem may generate two or more subproblems. The name "decrease and conquer" has been proposed instead for the single-subproblem class.
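To make the "decrease (by one) and conquer" reading concrete, here is a minimal recursive sketch (my own illustration, not taken from any of the linked implementations): each pass bubbles the largest remaining element to the end, after which exactly one subproblem of size n - 1 remains.

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Recursive bubble sort: one pass moves the maximum to position n - 1,
    // then the same procedure is applied to the first n - 1 elements.
    void bubble_sort(std::vector<int>& a, std::size_t n) {
        if (n < 2) return;                        // trivially sorted
        for (std::size_t i = 0; i + 1 < n; ++i)   // one bubbling pass
            if (a[i] > a[i + 1]) std::swap(a[i], a[i + 1]);
        bubble_sort(a, n - 1);                    // single subproblem of size n - 1
    }

Called as bubble_sort(v, v.size()), each level produces exactly one smaller subproblem, which matches the decrease-and-conquer definition rather than the two-or-more-subproblems requirement of divide and conquer.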

Related

Getting all solutions in Google or-tools

I have a linear problem of finding all solutions that meet all constraints.
For example, my variables are [0.323, 0.123, 1.32, 6.3, ...].
Is it possible to get, for example, the top 100 solutions sorted by a fitness (maximization/minimization) function?
In a continuous LP, enumerating different solutions is a difficult concept. E.g. consider max x, s.t. x <= 1. Obviously x=1 and x=0.99999 are solutions, and so are the infinite number of solutions in between. We could enumerate "corner solutions" (or basic solutions). See here for an example. Such a scheme could be adapted to find the first 100 different corner points sorted by the objective. For models with discrete variables, many constraint programming solvers will give you the possibility to find many solutions.
If you can define a fitness function as you suggested, then you might first want to solve the LP that maximizes this function. Afterwards you can include an objective cutoff that forces your second solution to be slightly worse than the first: introduce a cut consisting of your objective function with a right-hand side of (optimal value - epsilon); a sketch of this idea follows below.
Of course, this will not give you all (basic) solutions, but you might discover which variables are always at the same value or how much variance there is between the different solutions.
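A minimal sketch of that cutoff loop, assuming OR-Tools' C++ MPSolver interface with the GLOP backend (the toy model, variable bounds and the epsilon value below are illustrative, not from the question): solve once for the best objective value, add a constraint bounding the objective by (best - epsilon), and solve again.

    #include <cstdio>
    #include <memory>
    #include "ortools/linear_solver/linear_solver.h"

    using operations_research::MPConstraint;
    using operations_research::MPObjective;
    using operations_research::MPSolver;
    using operations_research::MPVariable;

    int main() {
      std::unique_ptr<MPSolver> solver(MPSolver::CreateSolver("GLOP"));
      MPVariable* const x = solver->MakeNumVar(0.0, 10.0, "x");           // toy variable
      MPConstraint* const c = solver->MakeRowConstraint(-MPSolver::infinity(), 1.0);
      c->SetCoefficient(x, 1.0);                                          // x <= 1
      MPObjective* const obj = solver->MutableObjective();
      obj->SetCoefficient(x, 1.0);
      obj->SetMaximization();                                             // max x

      solver->Solve();
      const double best = obj->Value();                                   // optimal objective

      // Objective cutoff: the next solution must be at least epsilon worse.
      const double epsilon = 1e-3;                                        // illustrative tolerance
      MPConstraint* const cut =
          solver->MakeRowConstraint(-MPSolver::infinity(), best - epsilon);
      cut->SetCoefficient(x, 1.0);                                        // objective expr <= best - epsilon
      solver->Solve();
      std::printf("second-best solution: x = %f\n", x->solution_value());
    }

Repeating the cut-and-resolve step yields a sequence of solutions with decreasing objective values, which is the "top N by fitness" idea from the question, though, as noted above, it will not enumerate all basic solutions.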

How can I implement Python sets in another language (maybe C++)?

I want to translate some Python code that I have already written to C++ or another fast language because Python isn't quite fast enough to do what I want to do. However, the code in question abuses some of the impressive features of Python sets, specifically the average O(1) membership testing which I spam within performance-critical loops, and I am unsure of how to implement Python sets in another language.
Python's time-complexity wiki page states that sets have O(1) membership testing on average and O(n) in the worst case. I tested this personally using timeit and was astonished by how blazingly fast Python sets do membership testing, even with large N. I looked at this Stack Overflow answer to see how C++ sets compare when using find operations to test whether an element is a member of a given set, and it said that find is O(log(n)).
I hypothesize the time complexity for find is logarithmic in that C++ std library sets are implemented with some sort of binary tree. I think that because Python sets have average O(1) membership testing and worst case O(n), they are probably implemented with some sort of associative array with buckets which can just look up an element with ease and test it for some dummy value which indicates that the element is not part of the set.
The thing is, I don't want to slow down any part of my code by switching to another language (since that is the problem I'm trying to fix in the first place), so how could I implement my own version of Python sets (specifically just the fast membership testing) in another language? Does anybody know anything about how Python sets are implemented, and if not, could anyone give me any general hints to point me in the right direction?
I'm not looking for source code, just general ideas and links that will help me get started.
I have done a bit of research on associative arrays and I think I understand the basic idea behind their implementation, but I'm unsure of their memory usage. If Python sets are indeed really just associative arrays, how can I implement them with minimal memory use?
Additional note: The sets in question that I want to use will have up to 50,000 elements and each element of the set will be in a large range (say [-999999999, 999999999]).
The theoretical difference between O(1) and O(log n) means very little in practice, especially when comparing two different languages. log n is small for most practical values of n, and the constant factors of each implementation are easily more significant.
C++11 has unordered_set and unordered_map now. Even if you cannot use C++11, there are always the Boost version and the tr1 version (the latter is named hash_* instead of unordered_*).
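As a rough sketch of the drop-in replacement being suggested (the reserve size and sample values are taken from the question's stated limits, up to 50,000 elements in roughly [-999999999, 999999999]; the rest is illustrative), std::unordered_set gives average O(1) membership testing much like a Python set:

    #include <cstdint>
    #include <iostream>
    #include <unordered_set>

    int main() {
        std::unordered_set<std::int64_t> s;
        s.reserve(50000);                  // question mentions up to 50,000 elements

        s.insert(123456789);               // values may span a large range
        s.insert(-987654321);

        // Average O(1) membership test, the C++ analogue of `x in py_set`.
        if (s.count(123456789)) {
            std::cout << "found\n";
        }
    }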
Several points: you have, as has been pointed out, std::set and std::unordered_set (the latter only in C++11, but most compilers have offered something similar as an extension for many years now). The first is implemented with some sort of balanced tree (usually a red-black tree), the second as a hash table. Which one is faster depends on the data type: the first requires some sort of ordering relationship (e.g. < if it is defined on the type, but you can define your own); the second an equivalence relationship (==, for example) and a hash function compatible with this equivalence relationship. The first is O(lg n), the second O(1), if you have a good hash function. Thus:
If comparison for order is significantly faster than hashing, std::set may actually be faster, at least for "smaller" data sets, where "smaller" depends on how large the difference is: for strings, for example, the comparison will often resolve after the first couple of characters, whereas the hash code will look at every character. In one experiment I did (many years back), with strings of 30-50 characters, I found the break-even point to be about 100,000 elements.
For some data types, simply finding a good hash function which is compatible with the type may be difficult. Python uses a hash table for its set, and if you define a type with a __hash__ function that always returns 1, it will be very, very slow. Writing a good hash function isn't always obvious.
Finally, both are node-based containers, which means they use a lot more memory than e.g. std::vector, with very poor locality. If lookup is the predominant operation, you might want to consider std::vector, keeping it sorted and using std::lower_bound for the lookup. Depending on the type, this can result in a significant speed-up, and much less memory use.
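A minimal sketch of that last suggestion (the element type and values are just for illustration): keep a sorted std::vector and use std::lower_bound, an O(log n) lookup with much better locality and memory use than a node-based container.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Membership test on a sorted vector via binary search.
    bool contains(const std::vector<int>& sorted, int value) {
        auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
        return it != sorted.end() && *it == value;
    }

    int main() {
        std::vector<int> data = {42, 7, 19, 3, 88};
        std::sort(data.begin(), data.end());      // keep the vector sorted once, up front

        std::cout << contains(data, 19) << '\n';  // prints 1
        std::cout << contains(data, 5) << '\n';   // prints 0
    }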

When to use which Sorting algorithm and when definitely shouldn't

We see a lot of sorting techniques like merge sort, quicksort and heapsort. Could you help me decide which of these sorting techniques should be used in which kind of environment (as in, for which kind of problem)? When should we use which of these sorting algorithms, and where should we not (what are their disadvantages in time and space)?
I am expecting answer something in this form:
a) We would use Merge sort when... we should definitely not use Merge Sort when...
b) We would use Quick sort when... we should definitely not use quick Sort when...
There are a few basic parameters that characterize the behavior of each sorting algorithm:
average case computational complexity
worst case computational complexity
memory requirements
stability (i.e. is it a stable sort or not?)
All of these are widely documented for all commonly used sorts, and this is all the information one needs to provide an answer in the format that you want. However, since even four parameters for each sort make for a lot of things to consider, not all of which will be relevant, it isn't a very good idea to try and give such a "scripted" answer. Furthermore, there are even more advanced concepts that could come into consideration (such as behavior when run on almost-sorted or reverse-sorted data, cache performance, resistance to maliciously constructed input), making such an answer even more lengthy and error-prone.
I suggest that you spend some time familiarizing yourself with the four basic concepts mentioned above, perhaps by visualizing how each type of sort works on simple input and reading an introductory text on sorting algorithms. Do this and soon enough you will be able to answer such questions yourself.
For starters, take a look at this comparison table on Wikipedia; the comparison criteria will give you clues about what to look for in an algorithm and its possible tradeoffs.

When to use merge sort and when to use quick sort?

The wikipedia article for merge sort.
The wikipedia article for quick sort.
Both articles have excellent visualizations.
Both have O(n*log(n)) average-case complexity.
So obviously the distribution of the data will affect the speed of the sort. My guess would be that since a comparison can just as quickly compare any two values, no matter their spread, the range of data values does not matter.
More importantly, one should consider the lateral distribution (x direction) with respect to ordering (magnitude removed), i.e. how close the data already is to being in order.
A good test case to consider would be if the test data already had some level of sorting...
It typically depends on the data structures involved. Quick sort is typically the fastest, but it doesn't guarantee O(n*log(n)); there are degenerate cases where it becomes O(n^2). Heap sort is the usual alternative; it guarantees O(n*log(n)), regardless of the initial order, but it has a much higher constant factor. It's usually used when you need a hard upper limit on the time taken. Some more recent algorithms use quick sort, but attempt to recognize when it starts to degenerate, and switch to heap sort then. Merge sort is used when the data structure doesn't support random access, since it works with pure sequential access (forward iterators, rather than random access iterators). It's used in std::list<>::sort, for example. It's also widely used for external sorting, where random access can be very, very expensive compared to sequential access. (When sorting a file which doesn't fit into memory, you might break it into chunks which fit into memory, sort each chunk using quicksort, write it out to a file, then merge sort the generated files.)
Mergesort is quicker when dealing with linked lists. This is because pointers can easily be relinked when merging lists, and the merge step only requires one sequential pass (O(n)) through the lists.
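A small sketch of why the linked-list case is convenient (the node type and function below are my own illustration): the merge step of mergesort only rewires existing next pointers, with no element copying and no random access.

    // Minimal singly-linked list node, for illustration only.
    struct Node {
        int value;
        Node* next;
    };

    // Merge two already-sorted lists by relinking next pointers:
    // one sequential pass, O(n) time, no extra allocation.
    Node* merge(Node* a, Node* b) {
        Node dummy{0, nullptr};
        Node* tail = &dummy;
        while (a && b) {
            Node*& smaller = (a->value <= b->value) ? a : b;
            tail->next = smaller;       // relink the node, don't copy it
            tail = smaller;
            smaller = smaller->next;
        }
        tail->next = a ? a : b;         // append whatever remains
        return dummy.next;
    }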
Quicksort's in-place algorithm requires the movement (swapping) of data. While this can be very efficient for an in-memory dataset, it can be much more expensive if your dataset doesn't fit in memory. The result would be lots of I/O.
These days, there is a lot of parallelization that occurs. Parallelizing mergesort is simpler than parallelizing in-place quicksort. If not using the in-place algorithm, then the space complexity for quicksort is O(n), which is the same as for mergesort.
So, to generalize, quicksort is probably more effective for datasets that fit in memory. For stuff that's larger, it's better to use mergesort.
The other general time to use mergesort over quicksort is if the data is very similar (that is, not close to being uniformly distributed). Quicksort relies on choosing a pivot; if many of the values are equal or very similar, a poor pivot is more likely to be chosen, leading to very unbalanced partitions and an O(n^2) runtime. The most straightforward example is a list in which all the values are the same.
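As an illustration of that failure mode (a deliberately naive textbook quicksort with a Lomuto partition and last-element pivot, not any library's implementation): on all-equal input every element falls on the same side of the pivot, so each call shrinks the problem by only one element and the total work becomes quadratic.

    #include <iostream>
    #include <utility>
    #include <vector>

    static long long comparisons = 0;   // counts work to show the quadratic blow-up

    // Naive quicksort: Lomuto partition, last element as pivot.
    void naive_quicksort(std::vector<int>& a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; ++j) {
            ++comparisons;
            if (a[j] <= pivot) std::swap(a[i++], a[j]);
        }
        std::swap(a[i], a[hi]);
        naive_quicksort(a, lo, i - 1);   // on all-equal input this side has n - 1 elements
        naive_quicksort(a, i + 1, hi);   // ...and this side is empty
    }

    int main() {
        std::vector<int> all_equal(2000, 7);   // every value identical
        naive_quicksort(all_equal, 0, static_cast<int>(all_equal.size()) - 1);
        // Roughly n^2/2 comparisons (about 2,000,000) instead of about n*log2(n) (about 22,000).
        std::cout << comparisons << " comparisons\n";
    }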
There is a real-world sorting algorithm -- called Timsort -- that does exploit the idea that data encountered in the wild is often partially sorted.
The algorithm is derived from merge sort and insertion sort, and is used in CPython, Java 7 and Android.
See the Wikipedia article for more details.
While Java 6 and earlier versions use merge sort as their sorting algorithm, C# uses QuickSort as its sorting algorithm.
QuickSort typically performs better than merge sort in practice even though they are both O(n log n); QuickSort has a smaller constant factor than merge sort.
Of the two, use merge sort when you need a stable sort. You can use a modified quicksort (such as introsort) when you don't, since it tends to be faster and it uses less memory.
Plain old Quicksort as described by Hoare is quite sensitive to performance-killing special cases that make it Theta(n^2), so you normally do need a modified version. That's where the data-distribution comes in, since merge sort doesn't have bad cases. Once you start modifying quicksort you can go on with all sorts of different tweaks, and introsort is one of the more effective ones. It detects on the fly whether it's in a killer case, and if so switches to heapsort.
In fact, Hoare's most basic Quicksort fails worst for already-sorted data, and so your "good test cases" with some level of sorting will kill it to some level. That fact is for curiosity only, though, since it only takes a very small tweak to avoid that, nothing like as complicated as going all the way to introsort. So it's simplistic to even bother analyzing the version that's killed by sorted data.
In practice, in C++ you'd generally use std::stable_sort and std::sort rather than worrying too much about the exact algorithm.
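A small usage sketch of that advice (the record type and data are invented): in common standard-library implementations std::sort is an introsort variant and std::stable_sort is a mergesort variant, so the choice is mostly a question of whether you need stability.

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Record {
        std::string name;
        int score;
    };

    int main() {
        std::vector<Record> v = {{"ann", 3}, {"bob", 1}, {"cat", 3}, {"dan", 2}};

        // Unstable sort (typically introsort): fine when equal keys may be reordered.
        std::sort(v.begin(), v.end(),
                  [](const Record& a, const Record& b) { return a.score < b.score; });

        // Stable sort (typically a mergesort variant): equal scores keep their relative order.
        std::stable_sort(v.begin(), v.end(),
                         [](const Record& a, const Record& b) { return a.score < b.score; });

        for (const Record& r : v) std::cout << r.name << ' ' << r.score << '\n';
    }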
Remember in practice, unless you have a very large data set and/or are executing the sort many many times, it probably won't matter at all. That being said, quicksort is generally considered the 'fastest' n*log(n) sorter. See this question already asked: Quick Sort Vs Merge Sort

What is the best single-source shortest path algorithm for programming contests?

I was working on this graph problem from the UVa problem set. It's a single-source-shortest-paths problem with no negative edge weights. From what I've gathered, the algorithm with the best big-O running time for such problems is Dijkstra with a Fibonacci heap as the priority queue, although practically speaking a binary heap is easier to implement and works pretty well too.
However, it would seem that even a binary heap takes quite some time to roll, and in a competition time is limited. I am aware that the STL provides some heap algorithms and priority queues, but they don't seem to provide a decrease-key function which Dijkstra's needs. Or am I wrong here?
It seems that another possibility is to simply not use Dijkstra's. This forum thread has people claiming that they solved the above problem with breadth-first search / Bellman-Ford, which are much easier to code up. (Edit: OTOH, Dijkstra's with an unsorted array for the priority queue timed out.) That BFS/Bellman-Ford worked surprised me a little as I thought that the input size was quite large. I guess different problems will require solutions of different complexity, but my question is, how often would I need to use Dijkstra's in such competitions? Should I practice more on the simpler-but-slower algorithms instead?
If you can come up with a good best-first heuristic, I would try using A*.
Based on my own experience, I never needed to implement Dijkstra's algorithm with a heap in a programming contest. You can get away, most of the time, with using a slower but efficient-enough algorithm. You might use an optimal Dijkstra implementation to solve a problem which expects a different/simpler algorithm, but this is rarely the case.
You can implement Dijkstra using heaps/priority queues without decrease-key in (I think) O((E+V) log V). If you want to decrease a key, simply add the new entry to your priority queue (leaving the old entry still in the queue) and update your array of distances. When you take the minimum element out of your queue, first check that it matches your distance array; if it doesn't, it was a key you wanted to decrease, so just ignore it.
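A compact sketch of that lazy-deletion trick (the adjacency-list representation and type aliases are my own choices, not from the thread), using std::priority_queue as a min-heap of (distance, vertex) pairs and skipping stale entries when they are popped:

    #include <cstdint>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    using Edge = std::pair<int, std::int64_t>;    // (neighbor, weight)
    using Graph = std::vector<std::vector<Edge>>;

    const std::int64_t INF = std::int64_t(1) << 60;

    // Dijkstra without decrease-key: push duplicate entries, skip stale pops.
    std::vector<std::int64_t> dijkstra(const Graph& g, int source) {
        std::vector<std::int64_t> dist(g.size(), INF);
        using State = std::pair<std::int64_t, int>;   // (distance, vertex)
        std::priority_queue<State, std::vector<State>, std::greater<State>> pq;

        dist[source] = 0;
        pq.push({0, source});
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d != dist[u]) continue;               // stale entry: ignore it
            for (auto [v, w] : g[u]) {
                if (d + w < dist[v]) {
                    dist[v] = d + w;                  // "decrease key" = push a fresh entry
                    pq.push({dist[v], v});
                }
            }
        }
        return dist;
    }

Each edge can push at most one entry, so the queue holds O(E) items and the whole thing runs in O(E log E), which is O(E log V) and fast enough for typical contest limits.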
The Boost Graph Library appears to have implementations for both Dijkstra and Bellman-Ford.
Dijkstra and a simple priority queue should do nicely even for large datasets. If you're practicing, you could try it also with a binary heap and compare performance. Certainly, I think implementing a Fibonacci heap is a little fringe, and I would choose to practice on other data structures and algorithms first.
Interestingly, using a priority queue is equivalent to breadth-first search with the heuristic of exploring the current best solution first.