I need to print all the prime numbers less than a given number n. I can use the Sieve of Eratosthenes, but the running time of that algorithm is not O(n). Is there any O(n) time solution for this problem?
The Sieve of Eratosthenes has time complexity O(n log log n). The function log log n grows very slowly; for example, log(log(10^9)) is about 3. That means you can effectively treat the log log n factor as a constant and ignore it, giving a time complexity of "nearly" O(n).
There are various algorithms that operate in time O(n) or O(n / log log n), including Pritchard's wheel sieves and the Sieve of Atkin. But constant factors generally make those algorithms slower in practice than the Sieve of Eratosthenes. Unless you need extreme speed, and you know what you are doing, and you are willing to spend lots of time doing it, the practical answer is the Sieve of Eratosthenes.
Your question says that you are going to print the list of primes. In that case, output will dominate the run time of any algorithm you choose. So do yourself a favor and implement a simple Sieve of Eratosthenes.
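For reference, a minimal sketch of such a sieve in C++ (the function name and the choice to print to stdout are mine, purely for illustration):

    #include <cstdio>
    #include <vector>

    // Print all primes below n using a plain Sieve of Eratosthenes.
    void print_primes_below(int n) {
        std::vector<bool> is_prime(n > 0 ? n : 0, true);
        if (n > 0) is_prime[0] = false;
        if (n > 1) is_prime[1] = false;
        for (long long i = 2; i * i < n; ++i) {
            if (!is_prime[i]) continue;
            for (long long j = i * i; j < n; j += i)
                is_prime[j] = false;   // i*i is the first multiple not already marked
        }
        for (int i = 2; i < n; ++i)
            if (is_prime[i]) std::printf("%d\n", i);
    }

    int main() {
        print_primes_below(100);   // example: primes less than 100
    }

The inner marking loop runs roughly n/2 + n/3 + n/5 + ... times over all primes, which is where the O(n log log n) bound comes from.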
I don't think you'll find an algorithm for checking an arbitrary number for primality with an O(n) time complexity. I'm pretty certain the NSA (and any other organisations that deal with crypto issues) wouldn't be very happy with that :-)
The only way you'll get O(n) or better is to pre-calculate (for example) the first fifty million primes (or use someone else's already-precalculated list) and use that as a lookup. I have such a file locally and it's a simple matter of running grep over it to see if a number is prime. It doesn't help for arbitrary numbers, but I rarely have to use ones that big. Crypto guys, of course, would consider such a list vanishingly small for their purposes.
And, if you turn it into a bitmap (about 120M for the first fifty million primes), you can even reduce the complexity to O(1) by simply turning the number into a byte offset and bit mask - a couple of bit shifts and/or bitwise operations.
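A sketch of that lookup, assuming the bitmap has already been built (from a sieve, or loaded from disk) with one bit per integer, bit i set when i is prime:

    #include <vector>

    // O(1) primality check: byte offset = n / 8, bit mask = 1 << (n % 8).
    bool is_prime_bitmap(const std::vector<unsigned char>& bitmap, unsigned n) {
        return (bitmap[n >> 3] >> (n & 7)) & 1;
    }

Storing only odd numbers, as many real prime bitmaps do, halves the size; the shift and mask just change accordingly.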
However, getting a list of the primes below a certain n is certainly doable in O(n) time complexity. The Atkin and Bernstein paper detailing the Sieve of Atkin claims:
We introduce an algorithm that computes the prime numbers up to N using O(N/log(log(N))) additions ...
which is actually slightly better than O(n).
However, it's still unlikely to compete with a lookup solution. My advice would be to use Atkin or Eratosthenes to make a list - it doesn't really matter since you'll only be doing this bit once so performance would not be critical.
Then use the list itself for checking primality.
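A sketch of that check, assuming the precomputed list has been loaded into a sorted vector (which it will be if it came from a sieve):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Binary search over the precomputed, ascending list of primes.
    bool is_prime_in_list(const std::vector<std::uint64_t>& primes, std::uint64_t n) {
        return std::binary_search(primes.begin(), primes.end(), n);
    }

Each query is O(log k) for a list of k primes - effectively constant for any list you can realistically hold in memory.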
I have seen many coding sites state time limits and source-code size constraints to consider when submitting a solution to a problem. I can never work out whether my code will pass: if it is exponential, making the time limit is doubtful, while an O(n^2) solution might fit in 2 seconds depending on the size of the input. How can I get a rough idea of whether a test case of a given size will pass within the stated time?
Some good examples would be helpful.
There are some rules of thumb, but a lot depends on the hardware and programming language the judge system is using. The best way is to run some tests yourself - for loops, or pushing random numbers into a priority queue - just to get a feeling for it.
Mostly, if you need more than 10^7 steps (where a step can consist of several simple operations), you have to watch out for a timeout. That means:
If the running time is O(n!), then n > 11 is already critical: you have at least 10^7 operations, and that is a lot.
If the running time is O(2^n), then it is safe for n <= 20, but too risky for n > 25.
If the running time is O(n^3), then n should be around 300.
For O(n^2), n = 5000 could be OK, but 10000 would very probably fail.
For n <= 200000, algorithms with O(n log n) are mostly OK.
For n <= 10^7, linear running times are OK; after that you would need sublinear algorithms.
But, as already said, these numbers can vary depending on the judging system.
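A quick way to "get a feeling for it" on your own machine, as suggested above (the step count and the loop body are only illustrative):

    #include <chrono>
    #include <cstdio>

    int main() {
        const long long steps = 100000000;   // 10^8 simple steps
        volatile long long sink = 0;         // volatile keeps the loop from being optimized away
        auto start = std::chrono::steady_clock::now();
        for (long long i = 0; i < steps; ++i)
            sink = sink + (i & 7);
        auto stop = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        std::printf("%lld steps took %.0f ms\n", steps, ms);
    }

If 10^8 trivial steps finish in well under a second, that also shows why the 10^7 rule of thumb leaves a safety margin: real steps (divisions, memory accesses, priority-queue updates) cost several times more than an add and a mask.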
I have an algorithm that runs on my dual-core, 3 GHz Intel processor in 250 ms on average, and I am trying to optimize it. Currently, I have an std::nth_element call that is invoked around 6,000 times on std::vectors of between 150 and 300 elements, taking about 50 ms in total on average. I've spent some time optimizing the comparator I use, which currently looks up two doubles from a vector and does a simple < comparison. The comparator accounts for a negligible fraction of the time spent in std::nth_element. The comparator's copy-constructor is also simple.
Since this call is currently taking 20% of the time for my algorithm, and since the time is mostly spent in the code for nth_element that I did not write (i.e. not the comparator), I'm wondering if anyone knows of a way of optimizing nth_element using SIMD or any other approach? I've seen some questions on parallelizing std::nth_element using OpenCL and multiple threads, but since the vectors are pretty short, I'm not sure how much benefit I would get from that approach, though I'm open to being told I'm wrong.
If there is an SSE approach, I can use any SSE instruction up to (the current, I think) SSE4.2.
Thanks!
Two thoughts:
Multithreading probably won't speed up processing for any single vector, but might help you as the number of vectors grows large.
Sorting is too powerful a tool for your problem: you're computing the entire order of the vector, but you don't care about anything but the top few. You know for each vector how many elements make up the top 5%, so instead of sorting the whole thing you should make one pass through the array and find the k largest. You can do this in O(n) time with k extra space, so it's probably a win overall.
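A minimal sketch of that one-pass idea (the function name is mine, and it assumes k >= 1): keep at most 2k candidates and compact them with std::nth_element each time the buffer fills. Each compaction is O(k) and happens once per k insertions, so the whole pass is O(n) with O(k) extra space.

    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <vector>

    // Return the k largest values of 'data' (in no particular order).
    std::vector<double> top_k(const std::vector<double>& data, std::size_t k) {
        std::vector<double> buf;
        buf.reserve(2 * k);
        for (double x : data) {
            buf.push_back(x);
            if (buf.size() == 2 * k) {
                // Move the k largest to the front, then drop the rest.
                std::nth_element(buf.begin(), buf.begin() + k, buf.end(),
                                 std::greater<double>());
                buf.resize(k);
            }
        }
        if (buf.size() > k) {   // final compaction for the leftovers
            std::nth_element(buf.begin(), buf.begin() + k, buf.end(),
                             std::greater<double>());
            buf.resize(k);
        }
        return buf;
    }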
I was reading about sorting methods, including bubble sort, selection sort, merge sort, heap sort, bucket sort, etc. They also come with time complexities, which help us know which sort is efficient. So I have a basic question: given some data, how do we choose a sorting method? Time complexity is one parameter that helps us decide on a sorting method, but are there other parameters for choosing one?
Just trying to figure out sorting for better understanding.
I have some queries about heap sort:
Where do we use heap sort?
What is the biggest advantage of heap sort (other than its O(n log n) time complexity)?
What are the disadvantages of heap sort?
What is the build time for a heap? (I heard O(n), but I'm not sure.)
Are there scenarios where we have to use heap sort, or where heap sort is the better option (other than priority queues)?
Before applying heap sort to data, what parameters of the data should we look at?
The two main theoretical features of sorting algorithms are time complexity and space complexity.
In general, time complexity lets us know how the performance of the algorithm changes as the size of the data set increases. Things to consider:
How much data are you expecting to sort? This will help you know whether you need to look for an algorithm with a very low time complexity.
How sorted will your data be already? Will it be partly sorted? Randomly sorted? This can affect the time complexity of the sorting algorithm. Most algorithms will have worst and best cases - you want to make sure you're not using an algorithm on a worst-case data set.
Time complexity is not the same as running time. Remember that time complexity only describes how the performance of an algorithm varies as the size of the data set increases. An algorithm that always does one pass over all the input is O(n) - its performance is linearly correlated with the size of the input. But an algorithm that always does two passes over the data set is also O(n) - the correlation is still linear, even if the constant (and actual running time) is different.
Similarly, space complexity describes how much space an algorithm needs to run. For example, a simple sort such as insertion sort needs an additional fixed amount of space to store the value of the element currently being inserted. This is an auxiliary space complexity of O(1) - it doesn't change with the size of the input. However, merge sort creates extra arrays in memory while it runs, with an auxiliary space complexity of O(n). This means the amount of extra space it requires is linearly correlated with the size of the input.
Of course, algorithm design is often a trade-off between time and space - algorithms with a low space complexity may require more time, and algorithms with a low time complexity may require more space.
For more information, you may find this tutorial useful.
To answer your updated question, you may find the wikipedia page on Heap Sort useful.
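As a quick illustration of the build-then-sort split the questions ask about, here is a small sketch using the standard library's heap primitives rather than a hand-rolled heap:

    #include <algorithm>
    #include <vector>

    void heap_sort(std::vector<int>& v) {
        std::make_heap(v.begin(), v.end());   // heapify: O(n)
        std::sort_heap(v.begin(), v.end());   // n pops: O(n log n), leaves v ascending
    }

The sort is in place (O(1) auxiliary space), which is the advantage usually cited over merge sort; the disadvantages usually cited are that it is not stable and has poor cache behaviour compared with quicksort.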
If you mean criteria for what type of sort to choose, here are some other items to consider.
The amount of data you have: do you have ten, one hundred, a thousand, or millions of items to sort?
Complexity of the algorithm: the more complex it is, the more testing will be needed to make sure it is correct. For small amounts, a bubble sort or quick sort is easy to code and test, versus other sorts which may be overkill for the amount of data you have to sort.
How much time it will take to sort: if you have a large set, bubble/quick sort will take a lot of time, but if you have a lot of time, that may not be an issue. However, using a more complex algorithm will cut down the time to sort, at the cost of more effort in coding and testing, which may be worth it if sorting goes from a long time (hours/days) to a shorter amount of time.
The data itself: is the data close to being the same for everything? For some sorts you may end up with a linear list, so if you know something about the composition of the data, it may help in determining which algorithm to choose for the effort.
The amount of resources available: do you have lots of memory in which to store all the items, or do you need to store items to disk? If everything cannot fit in memory, merge sort may be best, whereas others may be better if you can work with everything in memory.
I have code that I'm running for a project. It is O(N^2), where N is 200 in my case. There is an algorithm that turns this O(N^2) into O(N log N). This means that, with the new algorithm, it should be ~100 times faster. However, I'm only getting about a 2-fold increase (aka 2x faster).
I'm trying to narrow down things to see if I messed something up, or whether it's something inherent to the way I coded this program. For starters, I have a lot of function overhead within nested classes. For example, I have a lot of this (within many loops):
energy = globals->pair_style->LJ->energy();
Since I'm getting the right results when it comes to the actual data, just not the expected speed increase, I'm wondering if function overhead can actually cause that much of a speed decrease - by as much as 50-fold.
Thanks!
Firstly, your interpretation that O(N log N) is ~100 times faster than O(N^2) for N = 200 is incorrect. Big-O notation deals with upper bounds and behaviour in the limit, and doesn't account for any multiplicative constants in the complexity. (Even ignoring constants, N^2 / (N log2 N) at N = 200 is only about 26.)
Secondly, yes, on modern hardware function calls tend to be relatively expensive due to pipeline disruption. To find out how big a factor this is in your case, you'd have to come up with some microbenchmarks.
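A sketch of such a microbenchmark (the Potential class is a hypothetical stand-in for the globals->pair_style->LJ chain): time the same arithmetic once through an indirect, non-inlinable call and once written out in the loop, and compare.

    #include <chrono>
    #include <cstdio>

    struct Potential {   // hypothetical stand-in for the real LJ object
        virtual double energy(double r) const { return 4.0 * (1.0 / (r * r) - 1.0 / r); }
        virtual ~Potential() = default;
    };

    int main() {
        Potential pot;
        Potential* volatile ptr = &pot;   // volatile pointer defeats devirtualization/inlining
        const long long iters = 100000000;
        double sum = 0.0;

        auto t0 = std::chrono::steady_clock::now();
        for (long long i = 0; i < iters; ++i) {
            double r = 1.0 + (i & 15) * 0.01;
            sum += ptr->energy(r);                   // indirect call every iteration
        }
        auto t1 = std::chrono::steady_clock::now();
        for (long long i = 0; i < iters; ++i) {
            double r = 1.0 + (i & 15) * 0.01;
            sum += 4.0 * (1.0 / (r * r) - 1.0 / r);  // same math, no call
        }
        auto t2 = std::chrono::steady_clock::now();

        std::printf("call: %.0f ms, no call: %.0f ms (checksum %f)\n",
                    std::chrono::duration<double, std::milli>(t1 - t0).count(),
                    std::chrono::duration<double, std::milli>(t2 - t1).count(), sum);
    }

If the two loops differ by only a modest factor, function-call overhead is not your missing 50x; look at memory access patterns and algorithmic constants instead.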
The absolute biggest hit is cache misses. An L1 cache miss is relatively cheap, but when you miss in L2 (or L3, if you have it) you may be losing hundreds or even thousands of cycles to the resulting stall.
The thing is, though, this may only be part of the problem. Do not optimise your code until you have profiled it. Identify the slow areas and then figure out WHY they are slow. Once you have an understanding of why it's running slowly, you have a good chance of optimising it.
As an aside, O notation is very handy but is not the be-all and end-all. I've seen O(n^2) algorithms work significantly faster than O(n log n) for small amounts of data (and small may mean less than several thousand) because they cache far more effectively.
The important thing about Big O notation is that it only specifies the limit of the execution time, as the data set size increases - any constants are thrown away. While O(N^2) is indeed slower than O(N log N), the actual run times might be N^2 vs. 1000N log N - that is, an O(N^2) can be faster than O(N log N) on some data sets.
Without more details, it's hard to say more - yes, function calls do indeed have a fair amount of overhead, and that might be why you're not seeing a bigger increase in performance - or it might just be the case that your O(N log N) doesn't perform quite as well on a data set of your size.
I've worked on image processing algorithms, and calling a function per pixel (i.e. 307,200 calls for 640x480) can significantly reduce performance. Try declaring your function inline, or making it a macro. This can quickly show you whether the function calls are to blame. Also try some profiling tools. VS 2010 comes with some nice ones, or there is also Intel VTune and glowcode. They can help show where you are spending time.
IMHO I don't think that ~1600 function calls (200 log 200) should reduce performance much at all.
I suggest profiling it using:
gprof (requires compile-time instrumentation)
valgrind --tool=callgrind and kcachegrind; an excellent tool with excellent visualization
The big FAQ topic on profiling is here: How can I profile C++ code running in Linux?
What is the complexity of a 2-phase multi-way external sort using quick sort (O(n log n)) as the internal sort?
Not an expert here but...
If I understand correctly, what you describe as phases are the number of passes your algorithm will make over the input, right? In this case, a running time approximation would be the number of passes (2 in your case) * the time necessary to read and write the whole input to the external device.
When evaluating the complexity of such algorithms, it is hard to put it in the usual running-time terms. There are many aspects that could influence the result (sequential/non-sequential access, technology, etc.). The common approach is to give complexity in terms of passes, which accounts for the number of devices used, the number of items in the input, and the number of items that can fit in memory.
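As a concrete (and standard) way to count it for the two-phase case - assuming N records in the input, room for M records in memory at a time, and block transfers of B records to and from the external device:

    phase 1: read the input M records at a time, quick sort each chunk in memory,
             write each sorted run back out            -> ceil(N / M) runs, one full pass
    phase 2: a single multi-way merge of all the runs  -> one more full pass
             (only possible while ceil(N / M) does not exceed the merge fan-in,
              roughly N <= M^2 / B)

    I/O cost:  2 passes, each reading and writing everything, so about 4 * (N / B) block transfers
    CPU cost:  O(N log N) overall - O(N log M) for the internal quick sorts plus
               O(N log(N / M)) for the merge comparisons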
The point is that the sorting algorithm is dominated by the I/O operations. Internal quick sort should be OK (despite its quadratic worst case).
Also, I'm not sure if you counted the initial distribution. This is also a pass.