how msd reduces examine number of keys - c++

I am reading MSD at following link.
http://www.cs.princeton.edu/~rs/AlgsDS07/18RadixSort.pdf
Here it is mentioned that in MSD may not have to examine all the keys on page 20. How is this related to program on page 18. When I try to put the example in code walk through I am not able to understand how we reduced to examine all the keys.
Thanks!

Let's say you have: (0's can really be anything)
600000
100000
300000
500000
400000
200000
If you only sort on the most significant digit, you get:
100000
200000
300000
400000
500000
600000
After this the array is already sorted and we can stop - there's no need to check the other digits.
While it may not be so ideal in practice, there are certainly cases where we don't have to evaluate all the digits of all the elements.
The if (hi <= lo + 1) return; statement should cause it to return if there's only one element in the subarray, preventing checking unnecessary data.

Related

Haskell List Generator High Memory Usage

While working on a competitive programming problem I discovered an interesting issue that drastically reduced the performance of some of my code. After much experimentation I have managed to reduce the issue to the following minimal example:
module Main where
main = interact handle
handle :: String -> String
-- handle s = show $ sum l
-- handle s = show $ length l
-- handle s = show $ seq (length l) (sum l)
where
l = [0..10^8] :: [Int]
If you uncomment each commented line individually, compile with ghc -O2 test.hs and run with time ./test > /dev/null, you should get something like the following:
For sum l:
0.02user 0.00system 0:00.03elapsed 93%CPU (0avgtext+0avgdata 3380maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps
For length l:
0.02user 0.00system 0:00.02elapsed 100%CPU (0avgtext+0avgdata 3256maxresident)k
0inputs+0outputs (0major+161minor)pagefaults 0swaps
For seq (length l) (sum l):
5.47user 1.15system 0:06.63elapsed 99%CPU (0avgtext+0avgdata 7949048maxresident)k
0inputs+0outputs (0major+1986697minor)pagefaults 0swaps
Look at that huge increase in peak memory usage. This makes some amount of sense, because of course both sum and length can lazily consume the list as a stream, while the seq will be triggering the evaluation of the whole list, which must then be stored. But the seq version of the code is using just shy of 8 GB of memory to handle a list that contains just 400 MB of actual data. The purely functional nature of Haskell lists could explain some small constant factor, but a 20 fold increase in memory seems unintended.
This behaviour can be triggered by a number of things. Perhaps the easiest way is using force from Control.DeepSeq, but the way in which I originally encountered this was while using Data.Array.IArray (I can only use the standard library) and trying to construct an array from a list. The implementation of Array is monadic, and so was forcing the evaluation of the list from which it was being constructed.
If anyone has any insight into the underlying cause of this behaviour, I would be very interested to learn why this happens. I would of course also appreciate any suggestions as to how to avoid this issue, bearing in mind that I have to use just the standard library in this case, and that every Array constructor takes and eventually forces a list.
I hope you find this issue as interesting as I did, but hopefully less baffling.
EDIT: user2407038's comment made me realize I had forgotten to post profiling results. I have tried profiling this code and the profiler simply states that 100% of allocations are performed in handle.l, so it seems that simply anything that forces the evaluation of the list uses huge amounts of memory. As I mentioned above, using the force function from Control.DeepSeq, constructing an Array, or anything else that forces the list causes this behaviour. I am confused as to why it would ever require 8 GB of memory to compute a list containing 400 MB of data. Even if every element in the list required two 64-bit pointers, that is still only a factor of 5, and I would think GHC would be able to do something more efficient than that. If not this is an obvious bottleneck for the Array package, as constructing any array inherently requires us to allocate far more memory than the array itself.
So, ultimately: Does anyone have any idea why forcing a list requires such huge amounts of memory, which has such a high cost on performance?
EDIT: user2407038 provided a link to the very helpful GHC Memory Footprint reference. This explains exactly the data sizes of everything, and almost entirely explains the huge overhead: An [Int] is specified as requiring 5N+1 words of memory, which at 8 bytes per word gives 40 bytes per element. In this example that would suggest 4 GB, which accounts for half the total peak usage. It is easy to then believe that the evaluation of sum would then add a similar factor, so this answers my question.
Thanks to all commenters for your help.
EDIT: As I mentioned above, I originally encountered this behaviour why trying to construct an Array. Having had a bit of a dig into GHC.Arr I have found what I think is the root cause of this behaviour when constructing an array: The constructor folds over the list to compose a program in the ST monad that it then runs. Obviously the ST can't be executed until it is completely composed, and in this case the ST construct will be large and linear in the size of the input. To avoid this behaviour we would have to somehow modify the constructor to stream elements from the list as it adds them in ST.
There are multiple factors that come to play here. The first one is that GHC will lazily lift l out of handle. This would enable handle to reuse l, so that you don't have to recalculate it every time, but in this case it creates a space leak. You can check this if you -ddump-simplified core:
Main.handle_l :: [Int]
[GblId,
Str=DmdType,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 40 0}]
Main.handle_l =
case Main.handle3 of _ [Occ=Dead] { GHC.Types.I# y_a1HY ->
GHC.Enum.eftInt 0 y_a1HY
}
The functionality to calculate the [0..10^7] 1 is hidden away in other functions, but essentially, handle_l = [0..10^7], at top-level (TopLvl=True). It won't get reclaimed, since you may or may not use handle again. If we use handle s = show $ length l, l itself will be inlined. You will not find any TopLvl=True function that has type [Int].
So GHC detects that you use l twice and creates a top-level CAF. How big is that CAF? An Int takes two words:
data Int = I# Int#
One for I#, one for Int#. How much for [Int]?
data [a] = [] | (:) a ([a]) -- pseudo, but similar
That's one word for [], and three words for (:) a ([a]). A list of [Int] with size N will therefore have a total size of (3N + 1) + 2N words, in your case 5N+1 words. Given your memory, I assume a word is 8byte on your plattform, so we end up with
5 * 10^8 * 8 bytes = 4 000 000 000 bytes
So how do we get rid of that list? The first option we have is to get rid of l:
handle _ = show $ seq (length [0..10^8]) (sum [0..10^8])
This will now run in constant memory due to foldr/buildr rules. While we have [0..10^8] there twice, they don't share the same name. If we check the -stats, we will see that it runs in constant memory:
> SO.exe +RTS -s
5000000050000000 4,800,066,848 bytes allocated in the heap
159,312 bytes copied during GC
43,832 bytes maximum residency (2 sample(s))
20,576 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 9154 colls, 0 par 0.031s 0.013s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0002s
INIT time 0.000s ( 0.000s elapsed)
MUT time 4.188s ( 4.232s elapsed)
GC time 0.031s ( 0.013s elapsed)
EXIT time 0.000s ( 0.001s elapsed)
Total time 4.219s ( 4.247s elapsed)
%GC time 0.7% (0.3% elapsed)
Alloc rate 1,146,284,620 bytes per MUT second
Productivity 99.3% of total user, 98.6% of total elapsed
But that's not really nice, since we now have to track all the uses of [0..10^8]. What if we create a function instead?
handle :: String -> String
handle _ = show $ seq (length $ l ()) (sum $ l ())
where
{-# INLINE l #-}
l _ = [0..10^7] :: [Int]
This works, but we must inline l, otherwise we get the same problem as before if we use opti­miza­tions. -O1 (and -O2) enable -ffull-laziness, which—together with common sub­expres­sion eli­mi­na­tion—would lift l () to the top. So we either need to inline it or use -O2 -fno-full-laziness to prevent that behaviour.
1 Had to decrease the list size, otherwise I would have started swapping.

Can't figure out how to program in outputs for the given situations

So, I'm working on an assignment for my intro to computer science class. The assignment is as follows.
There is an organism whose population can be determined according to
the following rules:
The organism requires at least one other organism to propagate. Thus,
if the population goes to 1, then the organism will become extinct in
one time cycle (e.g. one breeding season). In an unusual turn of
events, an even number of organisms is not a good thing. The
organisms will form pairs and from each pair, only one organism will
survive If there are an odd number of organisms and this number is
greater than 1 (e.g., 3,5,7,9,…), then this is good for population
growth. The organisms cannot pair up and in one time cycle, each
organism will produce 2 other organisms. In addition, one other
organism will be created. (As an example, let us say there are 3
organisms. Since 3 is an odd number greater than 1, in one time
cycle, each of the 3 organisms will produce 2 others. This yields 6
additional organisms. Furthermore, there is one more organism
produced so the total will be 10 organisms, 3 originals, 6 produced by
the 3, and then 1 more.)
A: Write a program that tests initial populations from 1 to 100,000.
Find all populations that do not eventually become extinct.
Write your answer here:
B: Find the value of the initial population that eventually goes
extinct but that has the largest number of time cycles before it does.
Write your answer here:
The general idea of what I have so far is (lacking sytanx) is this with P representing the population
int generations = 0;
{
if (P is odd) //I'll use a modulus modifier to divide by two and if the result is not 0 then I'll know it's odd
P = 3P + 1
else
P = 1/2 P
generations = generations + 1
}
The problem for me is that I'm uncertain how to tell what numbers will not go extinct or how to figure out which population takes the longest time to go extinct. Any suggestions would be helpful.
Basically what you want to do is this: wrap your code into a while-loop that exits if either P==1 or generations > someMaxValue.
Wrap this construct into a for-loop that counts from 1 to 100,000 and uses this count to set the initial P.
If you always store the generations after your while-loop (e.g. into an array) you can then search for the greatest element in the array.
This problem can actually be harder than it looks at the first sight. First, you should use memorization to speed things up - for example, with 3 you get 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 -> 0, so you know the answer for all those numbers as well (note that every power of 2 will extinct).
But as pointed out by #Jerry, the problem is with the generations which eventually do not extinct - it will be difficult to say when to actually stop. The only chance is that there will (always) be a recurrence (number of organisms you already passed once when examining the current number of organisms), then you can say for sure that the organisms will not extinct.
Edit: I hacked a solution quickly and if it is correct, you are lucky - every population between 1-100,000 seems to eventually extinct (as my program terminated so I didn't actually need to check for recurrences). Will not give you the solution for now so that you can try by yourself and learn, but according to my program the largest number of cycles is 351 (and the number is close to 3/4 of the range). According to the google search for Collatz conjecture, that is a correct number (they say 350 to go to population of 1, where I'm adding one extra cycle to 0), also the initial population number agrees.
One additional hint: Check for integer overflow, and use 64-bit integer (unsigned __int64, unsigned long long) to calculate the population growth, as with 32-bit unsignet int, there is already an overflow in the range of 1-100,000 (the population can indeed grow much higher intermediately) - that was a problem in my initial solution, although it did not change the result. With 64-bit ints I was able to calculate up to 100,000,000 in relatively decent time (didn't try more; optimized release MSVC build), for that I had to limit the memo table to first 80,000,000 items to not go out of memory (compiled in 32-bit with LARGEADDRESSAWARE to be able to use up to 4 GB of memory - when compiled 64-bit the table could of course be larger).

Finding all values that occurs odd number of times in huge list of positive integers

I came across this question from a colleague.
Q: Given a huge list (say some thousands)of positive integers & has many values repeating in the list, how to find those values occurring odd number of times?
Like 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 1 2 3 4 5 6 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1...
Here,
1 occrus 8 times
2 occurs 7 times (must be listed in output)
3 occurs 6 times
4 occurs 5 times (must be listed in output)
& so on... (the above set of values is only for explaining the problem but really there would be any positive numbers in the list in any order).
Originally we were looking at deriving a logic (to be based on c).
I suggested the following,
Using a hash table and the values from the list as an index/key to the table, keep updating the count in the corresponding index every time when the value is encountered while walking through the list; however, how to decide on the size of the hash table?? I couldn't say it surely though it might require Hashtable as big as the list.
Once the list is walked through & the hash table is populated (with the 'count' number of occurrences for each values/indices), only way to find/list the odd number of times occurring value is to walk through the table & find it out? Is that's the only way to do?
This might not be the best solution given this scenario.
Can you please suggest on any other efficient way of doing it so??
I sought in SO, but there were queries/replies on finding a single value occurring odd number of times but none like the one I have mentioned.
The relevance for this question is not known but seems to be asked in his interview...
Please suggest.
Thank You,
If the values to be counted are bounded by even a moderately reasonable limit then you can just create an array of counters, and use the values to be counted as the array indices. You don't need a tight bound, and "reasonable" is somewhat a matter of platform. I would not hesitate to take this approach for a bound (and therefore array size) sufficient for all uint16_t values, and that's not a hard limit:
#define UPPER_BOUND 65536
uint64_t count[UPPER_BOUND];
void count_values(size_t num_values, uint16_t values[num_values]) {
size_t i;
memset(count, 0, sizeof(count));
for (i = 0; i < num_values; i += 1) {
count[values[i]] += 1;
)
}
Since you only need to track even vs. odd counts, though, you really only need one bit per distinct value in the input. Squeezing it that far is a bit extreme, but this isn't so bad:
#define UPPER_BOUND 65536
uint8_t odd[UPPER_BOUND];
void count_values(size_t num_values, uint16_t values[num_values]) {
size_t i;
memset(odd, 0, sizeof(odd));
for (i = 0; i < num_values; i += 1) {
odd[values[i]] ^= 1;
)
}
At the end, odd[i] contains 1 if the value i appeared an odd number of times, and it contains 0 if i appeared an even number of times.
On the other hand, if the values to be counted are so widely distributed that an array would require too much memory, then the hash table approach seems reasonable. In that case, however, you are asking the wrong question. Rather than
how to decide on the size of the hash table?
you should be asking something along the lines of "what hash table implementation doesn't require me to manage the table size manually?" There are several. Personally, I have used UTHash successfully, though as of recently it is no longer maintained.
You could also use a linked list maintained in order, or a search tree. No doubt there are other viable choices.
You also asked
Once the list is walked through & the hash table is populated (with the 'count' number of occurrences for each values/indices), only way to find/list the odd number of times occurring value is to walk through the table & find it out? Is that's the only way to do?
If you perform the analysis via the general approach we have discussed so far then yes, the only way to read out the result is to iterate through the counts. I can imagine alternative, more complicated, approaches wherein you switch numbers between lists of those having even counts and those having odd counts, but I'm having trouble seeing how whatever efficiency you might gain in readout could fail to be swamped by the efficiency loss at the counting stage.
In your specific case, you can walk the list and toggle the value's existence in a set. The resulting set will contain all of the values that appeared an odd number of times. However, this only works for that specific predicate, and the more generic count-then-filter algorithm you describe will be required if you wanted, say, all of the entries that appear an even number of times.
Both algorithms should be O(N) time and worst-case O(N) space, and the constants will probably be lower for the set-based algorithm, but you'll need to benchmark it against your data. In practice, I'd run with the more generic algorithm unless there was a clear performance problem.

Permutations of English Alphabet of a given length

So I have this code. Not sure if it works because the runtime for the program is still continuing.
void permute(std::vector<std::string>& wordsVector, std::string prefix, int length, std::string alphabet) {
if (length == 0) {
//end the recursion
wordsVector.push_back(prefix);
}
else {
for (int i = 0; i < alphabet.length(); ++i) {
permute(wordsVector, prefix + alphabet.at(i), length - 1, alphabet);
}
}}
where I'm trying to get all combinations of characters in the English alphabet of a given length. I'm not sure if the approach is correct at the moment.
Alphabet consists of A-Z in a string of length 26. WordsVectors holds all the different combinations of words. prefix is meant to pass through recursively until a word is made and length is self explanatory.
Example, if I give the length of 7 to the function, I expect a size of 26 x 25 x 24 x 23 x 22 x 21 x 20 = 3315312000 if I'm correct, following the formula for permutations.
I don't think programs are meant to run this long so either I'm hitting an infinite loop or something is wrong with my approach. Please advise. Thanks.
Surely the stack would overflow but concentrating on your question even if you write an iterative program it will take a long time ( not an infinite loop just very long )
[26L, 650L, 15600L, 358800L, 7893600L, 165765600L, 3315312000L, 62990928000L, 1133836704000L, 19275223968000L, 308403583488000L, 4626053752320000L, 64764752532480000L, 841941782922240000L, 10103301395066880000L, 111136315345735680000L, 1111363153457356800000L, 10002268381116211200000L, 80018147048929689600000L, 560127029342507827200000L, 3360762176055046963200000L, 16803810880275234816000000L, 67215243521100939264000000L, 201645730563302817792000000L, 403291461126605635584000000L, 403291461126605635584000000L]
The above list is the number of possibilities for 1<=n<=26. You can see as n increases number of possibilities increases tremendously. Say you have 1GHz processor that does 10^9 operations per second. Say consider number of possibilities for n=26 its 403291461126605635584000000L. Its evident that if you sit down to list all possibilities its so so long ( so so many years ) that
you will feel it has hit an infinite loop. Finally I have not looked that closely into your code , but in nutshell even if you write it correctly,iteratively and don't store (again can't have this much memory) and just print all possibilities its going to take long time for larger values of n.
EDIT
As jaromax and others said if you just want to write it for smaller values of n,
say less than 10-12 you can write an iterative program to list/print them. It will run quite fast for small values. But if you also want to store them them then n will have to be say less than 5 say. (Really depends on how much RAM is available or you could find some permutations write them to disk, then depends on how much disk memory you can spare, again refer the number of possibilities list I posted above. It gives a rough idea of both time and space complexity).
I think there could be quite a problem that you do this on stack. A large part of the calculation you do recursively and this means every time allocated space for function.
Try to reformulate it linearly. I think I had such a problem before.
Your question implies you think there are 26x25x24x ... permutations
Your code doesn't have anything I can see to avoid "AAAAAAA" being a permutation, in which case there are 26x26x26x ...
So in addition to being a very complicated way of counting in base 26, I think it's also giving bad answers?

randomized quick sort [crashes on some inputs]

I removed the codes because it's homework. If you actually needs the help, you can either look at the discussion I had with George B (below), or PM me.
Hi guys. This is a homework assignment. I have tested it against other sorting algorithms, and Q.S. is the only one that is crashing on some random inputs.
The program is quit long (with other stuff), but input is randomly generated....
I spent a few hours tracing the codes and still couldn't figure out any error....
Q.S. is probably very easy for the professionals, so I hope to receive advices on this implementation....
Any input is appreciated!
Q: What is "random"?
A: A portion of generation is included.
void randomArray(unsigned long*& A, unsigned long size)
{
//Note that RAND_MAX is a little small for some compilers (2^16-1).
//In order to test our algorithms on large arrays without huge
//numbers of duplicates, we'll set the high-order and low-order
//parts of the return value with two random values.
A = new unsigned long[size];
for(unsigned long i=0; i<size; i++)
A[i] = (rand()<<16) | (rand());
//Another note: initially, if you want to test your program out with smaller
//arrays and small numbers, just reduce A[i] mod k for some small value k as in the following:
//A[i] = rand() % 16;
//this may help you debug at first.
}
Q: What kind of error?
Well, I am not getting compilation error. Without Q.S., I can ran other four sorting algorithm without problems (I can continuously running the sorting). When Q.S is activated, after running the program one or two or three times, or even at the first time of running, the program ends (I am using Eclipse, so the consoles ends).
enter the number of elements, or a
negative number to quit: 5 {some
arrays}
selection sort took 0 seconds. merge
sort took 0 seconds. quick sort took 0
seconds. heap sort took 0 seconds.
bucket sort took 0 seconds. {output of
5 sorted arrays}
enter the number of elements, or a
negative number to quit: 6 {some
arrays}
selection sort took 0 seconds. merge
sort took 0 seconds. quick sort took 0
seconds. heap sort took 0 seconds.
bucket sort took 0 seconds.
{output of 5 sorted arrays}
enter the number of elements, or a
negative number to quit: 8 {arrays}
--- console ends---
Again, the problem is that it crashes quite often, so this suggests that there is a high possibility of access violation,,, but doing 10+ tracings I don't see the problem.... (maybe I overloaded my brain stack -_- )
Thanks.
Hint:
q is unsigned (the result of the partition function)
so, q-1 is also unsigned
what if q is zero?
(It is homework so you have to figure it out I guess :) )
Trace your algorithm with the array {2,5,2}. Obviously your program crashes as soon as there are duplicate numbers in your list. The first call of Partition will return 2 as the index of r. Thus, the second call of quickSort(A,3,2) will access memory locations not within array boundaries. Its always a good idea to do the boundary checks for arrays manually and generate understandable output to more easily trace and debug your program.