Retracing a previously followed path - C++

I have a grid of 1000 x 1000 in which a person Q travels from a start point A to a stop point B. When Q starts out from A, he walks randomly till he reaches B. By walking randomly, I mean that from any position (i,j) where Q currently is, Q can travel to (i+1,j), (i-1,j), (i,j+1), (i,j-1) with equal probability. If Q reaches B in this manner, he gets a treasure stored at B, and now he wants to retrace the exact same path he followed from A to B, only backwards.
Is there a way to implement this in C++ without explicitly storing the path in a vector?

You might be able to do something like this:
Store the random number seed
Get a random number between 1 and 4 for a directional move
Store a move count, beginning with 0 (already at destination)
For each move where you don't get to your destination, increment the count.
Subtract a fixed number from your random number each time.
Once you reach your destination, traverse the move count in reverse, going from count down to 0, regenerating each move from the seed and taking the opposite move.
The point is to relate the move count and the seed. Assuming the random generator is a pure (deterministic) function, given the same input you should always get the same output. You could store the initial time, fix the time step, and then let your seed be the current time at each time step, but the idea is to let your seed be related to a count.
Using this method, you should be able to extract your path using only the begin time and the number of ticks it took to reach the target. As an added bonus, you can store how long it took to get to your destination in ticks and derive other variables that depend on that time state.
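To make that idea concrete, here is a minimal sketch (my own, not the poster's code). It relies on the generator being deterministic for a fixed seed within one program, and regenerates the k-th move by replaying the sequence from the start, so nothing but the seed and the move count is stored, at the cost of O(count^2) total work for the walk back:

#include <cstddef>
#include <cstdint>
#include <random>

// Directions: 0=N, 1=S, 2=E, 3=W, chosen so that m ^ 1 is the opposite of m.
int move_at(std::uint32_t seed, std::size_t k) {
    std::mt19937 gen(seed);                       // same seed => same sequence
    std::uniform_int_distribution<int> dir(0, 3);
    int m = 0;
    for (std::size_t i = 0; i <= k; ++i) m = dir(gen); // replay up to draw k
    return m;
}

void retrace(std::uint32_t seed, std::size_t count) {
    for (std::size_t k = count; k-- > 0; ) {      // last move first
        int opposite = move_at(seed, k) ^ 1;      // 0<->1, 2<->3
        // apply 'opposite' to the current position here
        (void)opposite;
    }
}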

Use a reversible pseudo-random generator.
For instance, with a linear congruential generator Y = (a·X + b) mod c, the relation can be inverted as X = (a′·Y + b′) mod c, provided a is invertible modulo c (take a′ = a⁻¹ and b′ = −a⁻¹·b).
With such a generator, you can go back and forth freely along the path.
Suggestion for a quick (but not supported by theory) approach: use an accumulator and add an arbitrary constant, ignoring the overflows; this process is exactly inverted by subtraction. Take two independent bits of the accumulator to form your random number.
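A minimal sketch of such a generator, assuming the modulus is 2^64 (so ordinary unsigned wrap-around is the "mod") and multiplier/increment constants borrowed from Knuth's MMIX LCG; any odd multiplier is invertible modulo a power of two, and its inverse can be computed with a few Newton-Hensel iterations:

#include <cstdint>

constexpr std::uint64_t A = 6364136223846793005ULL; // Knuth's MMIX multiplier
constexpr std::uint64_t C = 1442695040888963407ULL; // Knuth's MMIX increment

// Inverse of an odd number modulo 2^64: each round doubles the correct bits.
constexpr std::uint64_t inverse(std::uint64_t a) {
    std::uint64_t x = a;               // already correct modulo 8
    for (int i = 0; i < 5; ++i) x *= 2 - a * x;
    return x;
}
constexpr std::uint64_t A_INV = inverse(A);

std::uint64_t forward(std::uint64_t x)  { return A * x + C; }       // X -> Y
std::uint64_t backward(std::uint64_t y) { return A_INV * (y - C); } // Y -> X

// Two high bits of the state give a move in {0, 1, 2, 3}.
int move_from(std::uint64_t x) { return static_cast<int>(x >> 62); }

Since backward(forward(x)) == x for every state, the walk can be replayed in either direction from any point.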

Try recursion:
void travel() {
    if (treasureFound())
        return;
    else {
        NextStep p;
        chooseNextStep(&p);
        travel();
        moveBackwards(&p);
        return;
    }
}
You could also store your path, but you don't have to store all the coordinates; 1 char per move is enough to describe it, for example 'N', 'S', 'E', 'W'.
NextStep in my example could also be a char.
Also, if you prefer to store data on the heap and not on the stack, use a pointer!
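For illustration, a minimal sketch of that char-per-move variant: record one letter per step while walking, then retrace by reading the string backwards (here the retrace just prints the opposite moves instead of applying them):

#include <iostream>
#include <string>

char opposite(char c) {
    switch (c) {
        case 'N': return 'S';
        case 'S': return 'N';
        case 'E': return 'W';
        default:  return 'E'; // opposite of 'W'
    }
}

void retrace(const std::string& path) {
    // walk the recorded moves backwards, taking the opposite of each
    for (auto it = path.rbegin(); it != path.rend(); ++it)
        std::cout << opposite(*it);
    std::cout << '\n';
}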

You can store it implicitly via recursion.
The idea is fairly simple: you test if you are at the treasure and return true if you are. Otherwise you randomly pick a different route and return its result, either to process the path or to backtrack if necessary. Changing this approach to match exactly what your loop is doing should not be too hard (e.g. maybe you only want to backtrack once all options are exhausted).
Upon returning from each function, you immediately have the path in reverse.
bool walk_around(size_t x, size_t y) {
    if (treasure(x, y)) return true;
    if (better_abort()) return false;
    size_t x2, y2;
    randomly_choose(&x2, &y2);
    if (walk_around(x2, y2)) {
        std::cout << x << "," << y << "\n";
        return true;
    }
    else return false;
}
Note that there is a danger in this approach: the thread stack (where the data to return from functions is stored) is usually limited to a few MB. You are in the area where it is possible to require enough space (1000*1000 recursive calls if your maze creates a Hamiltonian path) that you might need to increase this limit. A quick Google search will turn up an approach appropriate for your OS.


How do I calculate the time complexity of the following function?

Here is a recursive function that traverses a multimap<string, string> graph. It checks each itr->second (s_tmp); if s_tmp is equal to the desired string (Exp), it prints itr->first and executes the function again for that itr->first.
string findOriginalExp(string Exp) {
    cout << "*****findOriginalExp Function*****" << endl;
    string str;
    if (graph.empty()) {
        str = "map is empty";
    } else {
        for (auto itr = graph.begin(); itr != graph.end(); itr++) {
            string s_tmp = itr->second;
            string f_tmp = itr->first;
            string nll = "null";
            //s_tmp.compare(Exp) == 0
            if (s_tmp == Exp) {
                if (f_tmp.compare(nll) == 0) {
                    cout << Exp << " :is original experience.";
                    return Exp;
                } else {
                    return findOriginalExp(itr->first);
                }
            } else {
                str = "No element is equal to Exp.";
            }
        }
    }
    return str;
}
There are no rules for stopping and it seems to be completely random. How is the time complexity of this function calculated?
I am not going to analyse your function but instead try to answer in a more general way. It seems like you are looking for a simple expression such as O(n) or O(n^2) for the complexity of your function. However, complexity is not always that simple to estimate.
In your case it strongly depends on the contents of graph and on what the user passes as a parameter.
As an analogy consider this function:
int foo(int x){
    if (x == 0) return x;
    if (x == 42) return foo(42);
    if (x > 0) return foo(x-1);
    return foo(x/2);
}
In the worst case it never returns to the caller. If we ignore x >= 42 then the worst-case complexity is O(x). This alone isn't that useful as information for the user. What I as a user really need to know is:
Don't ever call it with x >= 42.
O(1) if x==0
O(x) if x>0
O(log(-x)) if x < 0
Now try to make similar considerations for your function. The easy case is when Exp is not in graph; in that case there is no recursion. I am almost sure that for the "right" input your function can be made to never return. Find out which cases those are and document them. In between you have cases that return after a finite number of steps. If you have no clue at all how to get your hands on them analytically, you can always set up a benchmark and measure. Measuring the runtime for input sizes 10, 50, 100, 1000... should be sufficient to distinguish between linear, quadratic and logarithmic dependence.
PS: Just a tip: Don't forget what the code is actually supposed to do and what time complexity is needed to solve that problem (often it is easier to discuss that in an abstract way rather than diving too deep into code). In the silly example above the whole function can be replaced by its equivalent int foo(int){ return 0; } which obviously has constant complexity and does not need to be any more complex than that.
This function takes a directed graph and a vertex in that graph and chases edges going into it backwards to find a vertex with no edge pointing into it. The operation of finding the vertex "behind" any given vertex takes O(n) string comparisons in n the number of k/v pairs in the graph (this is the for loop). It does this m times, where m is the length of the path it must follow (which it does through the recursion). Therefore, it has time complexity O(m * n) string comparisons in n the number of k/v pairs and m the length of the path.
Note that there's generally no such thing as "the" time complexity for just some function you see written in code. You have to define what variables you want to describe the time in terms of, and also the operations with which you want to measure the time. E.g. if we want to write this purely in terms of n the number of k/v pairs, you run into a problem, because if the graph contains a suitably placed cycle, the function doesn't terminate! If you further constrain the graph to be acyclic, then the maximum length of any path is constrained by m < n, and then you can also get that this function does O(n^2) string comparisons for an acyclic graph with n edges.
You should approximate the control flow of the recursive calls by using a recurrence relation. It's been about 30 years since I took college classes in discrete math, but generally you write it out like pseudocode, just enough to see how many calls there are. In some cases just counting how many calls appear in the longest case on the right-hand side is useful, but you generally need to plug one expansion back in and from that derive a polynomial or power relationship.
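To make that concrete for this function: each call scans the whole multimap once and makes at most one recursive call, so with m the path length and n the number of k/v pairs the recurrence is T(m) = T(m-1) + c·n with T(0) = c, which unrolls to T(m) = c·n·m + c = O(m·n), matching the path-chasing analysis above.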

Which is the best way to create and store cycles using C/C++?

I have the structs:
struct CYCLE {
    vector<Arc> route;
    float COST;
};
struct Arc {
    int i, j;
    Arc() {}
    Arc(const Arc& obj) : i(obj.i), j(obj.j) {}
    Arc(int _i, int _j) : i(_i), j(_j) {}
};
To store the cycles that have already been created, I thought about using:
vector<CYCLE> ConjCycles;
For each cycle created, I need to verify that it has not already been added to ConjCycles.
The cycle 1-2-2-1 is the same as the cycle 2-2-1-2.
How can I detect that cycles like those are the same?
I thought about using a map to control this.
However, I don't know how to set a key to the cycle, so that the two cycles described above have the same key.
You have quite a lot of redundancy in your cycle representation, e. g. for a cycle 1-3-2-4-1:
{ (1, 3), (3, 2), (2, 4), (4, 1) }
If we consider a cycle as a cyclic graph, then you store the edges in your data structure. It would be more efficient to store the vertices instead:
struct Cycle
{
    std::vector<int> vertices;
};
The edges you get implicitly from vertices[n] and vertices[n + 1]; the last vertex is always the same as the first one, so do not store it explicitly, the last edge then will be vertices[vertices.size() - 1], vertices[0].
Be aware that this is only the internal representation; you can still construct the cycle from a sequence of edges (Arcs). You'd most likely check the sequence in the constructor and possibly throw an exception if it is invalid (there are alternatives, though, if you dislike exceptions...).
Then you need some kind of equivalence. My proposition would be:
If the number of vertices is not equal, the cycles cannot be equal.
It might shorten the rest of the algorithm (though that would have to be evaluated!) to count the number of occurrences of each vertex id first; these counts must match.
Search for the minimum vertex id in each cycle; from there on, compare each subsequent value, wrapping around in the vector when the end is reached.
If the sequences match, you're done. This does not yet cover the case of multiple minimum values, though; if that happens, you might just repeat the step trying the next minimum value in one cycle while staying with the same one in the other. You might try the same in parallel with the maxima, or, if you have counted the occurrences anyway (see above), use whichever of the minima/maxima occurs less often.
Edit: Further improvement (idea inspired by [Scheff]'s comment to the question):
Instead of re-trying each minimum found, we should preferably select some kind of absolute minimum from the relative minima found so far; a relative minimum x is smaller than a relative minimum y if the successor of x is smaller than the successor of y; if both successors are equal, look at the next successors, and so on. If you discover more than one absolute minimum (i.e. some indirect successor gets equal to the initial minimum), then you have a cycle in which some sub-cycle repeats itself multiple times (1-2-3-1-2-3-1-2-3). Then it does not matter which "absolute" minimum you select...
You'd definitely skip step 2 above then, though.
Find the minimum already in the constructor and store it. Then comparison gets easy, you just start in both cycles at their respective minimum...
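A minimal sketch of that idea: normalize each cycle to its lexicographically smallest rotation once (an O(n^2) helper in the worst case; the minimum-chasing scheme above is the optimized version), after which equality checks and map keys become plain vector comparisons. canonical() is a hypothetical helper, not part of the struct above:

#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<int> canonical(std::vector<int> v) {
    const std::size_t n = v.size();
    std::size_t best = 0;
    for (std::size_t i = 1; i < n; ++i) {
        // lexicographically compare the rotation starting at i
        // with the best rotation found so far
        for (std::size_t k = 0; k < n; ++k) {
            int a = v[(i + k) % n], b = v[(best + k) % n];
            if (a != b) { if (a < b) best = i; break; }
        }
    }
    std::rotate(v.begin(), v.begin() + best, v.end());
    return v;
}

Two cycles are then equal iff canonical(a.vertices) == canonical(b.vertices), and canonical(vertices) can serve directly as a map key.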

Is std::sort the best choice to do in-place sort for a huge array with limited integer value?

I want to sort an array with a huge number (millions or even billions) of elements, while the values are integers within a small range (1 to 100 or 1 to 1000); in such a case, are std::sort and the parallelized version __gnu_parallel::sort the best choice for me?
Actually I want to sort a vector of my own class with an integer member representing the processor index.
Since there are other members inside the class, even if two objects have the same integer member used for comparing, they might not be regarded as the same data.
Counting sort would be the right choice if you know that your range is so limited. If the range is [0, m) the most efficient way to do it is to have a vector in which the index represents the element and the value the count. For example:
vector<int> to_sort;
vector<int> counts;
for (int i : to_sort) {
    if (counts.size() <= (size_t)i) {
        counts.resize(i + 1, 0);
    }
    counts[i]++;
}
Note that the count at i is lazily initialized but you can resize once if you know m.
If you are sorting objects by some field and they are all distinct, you can modify the above as:
vector<T> to_sort;
vector<vector<const T*>> count_sorted;
for (const T& t : to_sort) {
    const int i = t.sort_field();
    if (count_sorted.size() <= (size_t)i) {
        count_sorted.resize(i + 1, {});
    }
    count_sorted[i].push_back(&t);
}
Now the main difference is that your space requirements grow substantially, because you need to store the vectors of pointers. The space complexity went from O(m) to O(n). Time complexity is the same. Note that the algorithm is stable. The code above assumes that to_sort stays in scope during the life cycle of count_sorted. If your Ts implement move semantics you can store the objects themselves and move them in. If you need count_sorted to outlive to_sort you will need to do so or make copies.
If you have a range of type [-l, m), the substance does not change much, but your index now represents the value i + l and you need to know l beforehand.
Finally, it should be trivial to simulate an iteration through the sorted array by iterating through the counts array taking into account the value of the count. If you want stl like iterators you might need a custom data structure that encapsulates that behavior.
Note: in the previous version of this answer I mentioned multiset as a way to use a data structure to count sort. This would be efficient in some java implementations (I believe the Guava implementation would be efficient) but not in C++ where the keys in the RB tree are just repeated many times.
You say "in-place", I therefore assume that you don't want to use O(n) extra memory.
First, count the number of objects with each value (as in Giovanni's and ronaldo's answers). You still need to get the objects into the right locations in-place. I think the following works, but I haven't implemented or tested it:
Create a cumulative sum from your counts, so that you know what index each object needs to go to. For example, if the counts are 1: 3, 2: 5, 3: 7, then the cumulative sums are 1: 0, 2: 3, 3: 8, 4: 15, meaning that the first object with value 1 in the final array will be at index 0, the first object with value 2 will be at index 3, and so on.
The basic idea now is to go through the vector, starting from the beginning. Get the element's processor index, and look up the corresponding cumulative sum. This is where you want it to be. If it's already in that location, move on to the next element of the vector and increment the cumulative sum (so that the next object with that value goes in the next position along). If it's not already in the right location, swap it with the correct location, increment the cumulative sum, and then continue the process for the element you swapped into this position in the vector.
There's a potential problem when you reach the start of a block of elements that have already been moved into place. You can solve that by remembering the original cumulative sums, "noticing" when you reach one, and jump ahead to the current cumulative sum for that value, so that you don't revisit any elements that you've already swapped into place. There might be a cleverer way to deal with this, but I don't know it.
Finally, compare the performance (and correctness!) of your code against std::sort. This has better time complexity than std::sort, but that doesn't mean it's necessarily faster for your actual data.
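For what it's worth, an untested sketch of that procedure (assuming keys in [0, K) come from a caller-supplied key function; next[] holds the running cumulative sums and start[] keeps the original ones, so blocks that are already in place are skipped rather than revisited):

#include <utility>
#include <vector>

template <class T, class Key>
void inplace_counting_sort(std::vector<T>& v, int K, Key key) {
    std::vector<int> start(K + 1, 0);
    for (const T& t : v) ++start[key(t) + 1];              // counts, shifted by one
    for (int k = 0; k < K; ++k) start[k + 1] += start[k];  // cumulative sums
    std::vector<int> next(start.begin(), start.end() - 1); // running positions
    for (int k = 0; k < K; ++k) {
        while (next[k] < start[k + 1]) {                   // block k not finished
            int i = next[k];
            int ki = key(v[i]);
            if (ki == k) ++next[k];                        // already in its own block
            else { std::swap(v[i], v[next[ki]]); ++next[ki]; } // send it home
        }
    }
}

Each swap puts one element into its final block, so the whole pass is O(n + K); like the American Flag Sort described in the next answer, it is not stable.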
You definitely want to use counting sort. But not the one you're thinking of. Its main selling point is that its time complexity is O(N+X), where X is the maximum key value you allow.
Regular old counting sort (as seen in some other answers) can only sort integers, or has to be implemented with a multiset or some other data structure (becoming O(N log N)). But a more general version of counting sort can be used to sort (in place) anything that can provide an integer key, which is perfectly suited to your use case.
The algorithm is somewhat different though, and it's also known as American Flag Sort. Just like regular counting sort, it starts off by calculating the counts.
After that, it builds a prefix sums array of the counts. This is so that we can know how many elements should be placed behind a particular item, thus allowing us to index into the right place in constant time.
Since we know the correct final position of the items, we can just swap them into place. Doing just that would work if there weren't any repetitions but, since it's almost certain that there will be repetitions, we have to be more careful.
First: when we put something into its place, we have to increment the value in the prefix sum so that the next element with the same value doesn't displace the previous element from its place.
Second: either
keep track of how many elements of each value we have already put into place, so that we don't keep moving elements of values that have already reached their place; this requires a second copy of the counts array (prior to calculating the prefix sum), as well as a "move count" array.
keep a copy of the prefix sums shifted over by one, so that we stop moving elements once the stored position of the latest element reaches the first position of the next value.
Even though the first approach is somewhat more intuitive, I chose the second method (because it's faster and uses less memory).
#include <algorithm> // std::iter_swap
#include <iterator>  // std::distance

template<class It, class KeyOf>
void countsort(It begin, It end, KeyOf key_of) {
    constexpr int max_value = 1000;
    int final_destination[max_value] = {}; // zero initialized
    int destination[max_value] = {};       // zero initialized
    // Record counts
    for (It it = begin; it != end; ++it)
        final_destination[key_of(*it)]++;
    // Build prefix sum of counts
    for (int i = 1; i < max_value; ++i) {
        final_destination[i] += final_destination[i-1];
        destination[i] = final_destination[i-1];
    }
    for (auto it = begin; it != end; ++it) {
        auto key = key_of(*it);
        // while item is not in the correct position
        while (std::distance(begin, it) != destination[key] &&
               // and not all items of this value have reached their final position
               final_destination[key] != destination[key]) {
            // swap into the right place
            std::iter_swap(it, begin + destination[key]);
            // tidy up for next iteration
            ++destination[key];
            key = key_of(*it);
        }
    }
}
Usage:
vector<Person> records = populateRecords();
countsort(records.begin(), records.end(), [](Person const &p){
    return p.id() - 1; // map [1, 1000] -> [0, 1000)
});
This can be further generalized to become MSD Radix Sort; here's a talk by Malte Skarupke about it: https://www.youtube.com/watch?v=zqs87a_7zxw
Here's a neat visualization of the algorithm: https://www.youtube.com/watch?v=k1XkZ5ANO64
The answer given by Giovanni Botta is perfect, and counting sort is definitely the way to go. However, I personally prefer not to resize the vector progressively; I'd rather do it this way (assuming your range is [0, 1000]):
vector<int> to_sort;
vector<int> counts(1001);
int maxvalue = 0;
for (int i : to_sort) {
    if (i > maxvalue) maxvalue = i;
    counts[i]++;
}
counts.resize(maxvalue + 1);
It is essentially the same, but there is no need to constantly manage the size of the counts vector. Depending on your memory constraints, you could use one solution or the other.

C++ recursive function, calling current depth

I'm writing a function for calculating integrals recursively, using the trapezoid rule. For some f(x) on the interval (a,b), the method is to calculate the area of the big trapezoid with side (b-a) and then compare it with the sum of small trapezoids formed after dividing the interval into n parts. If the difference is larger than some given error, the function is called again for each small trapezoid and the results summed. If the difference is smaller, it returns the arithmetic mean of the two values.
The function takes two parameters: a function pointer to the function to be integrated, and a constant reference to an auxiliary structure containing information such as the interval (a,b), the number of partitions, etc.:
struct Config {
    double min, max;
    int partitions;
    double precision;
};
The problem arises when I want to change the number of partitions with each iteration; for the moment, let's say just increment it by one. I see no way of doing this without resorting to querying the current depth of the recursion:
double integrate(const Config &conf, funptr f) {
    double a = conf.min, b = conf.max;
    int n = conf.partitions;
    // calculating the trapezoid areas here
    if (std::abs(bigTrapezoid - sumOfSmallTrapezoids) > conf.precision) {
        double s = 0.;
        Config* configs = new Config[n];
        int newpartitions = n + (calls);
        for (int i = 0; i < n; ++i) {
            configs[i] = { a + i*(b-a)/n, a + (i+1)*(b-a)/n, newpartitions, conf.precision };
            s += integrate(configs[i], f);
        }
        delete[] configs;
        return s;
    }
    else {
        return 0.5*(bigTrapezoid + sumOfSmallTrapezoids);
    }
}
The part I'm missing here is of course a way to find (calls). I have tried doing something similar to this answer, but it does not work; in fact it freezes the PC until the process is killed. But perhaps I'm doing it wrong. I do not want to add an extra parameter to the function or an additional variable to the structure. How should I proceed?
You cannot "find" calls, but you can definitely pass it yourself, like this:
double integrate(const Config &conf, funptr f, int calls = 0) {
    ...
    s += integrate(configs[i], f, calls + 1);
    ...
}
It seems to me that int newpartitions = n + 1; would be enough, no? At every recursion level, the number of partitions increases by one. Say conf.partitions starts off at 1. If the routine needs to recurse down a new level, newpartitions is 2, and you will build 2 new Config instances, each with 2 as the value for partitions. Recursing down another level, newpartitions is 3, and you build 3 Configs, each with 3 as partitions, and so on.
The trick here is to make sure your code is robust enough to avoid infinite recursion.
By the way, it seems inefficient to me to use dynamic allocation for Config instances that have to be destroyed after the loop. Why not build a single Config instance on the stack inside the loop? Your code should run much faster that way.
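A minimal sketch of that suggestion, with the per-level increment from above (the trapezoid computation is elided with placeholder values, and funptr is assumed to be a plain function pointer, as in the question):

#include <cmath>

using funptr = double (*)(double); // assumption: plain function pointer

double integrate(const Config &conf, funptr f) {
    const double a = conf.min, b = conf.max;
    const int n = conf.partitions;
    // ... compute bigTrapezoid and sumOfSmallTrapezoids as before ...
    double bigTrapezoid = 0., sumOfSmallTrapezoids = 0.; // placeholders
    if (std::abs(bigTrapezoid - sumOfSmallTrapezoids) > conf.precision) {
        double s = 0.;
        for (int i = 0; i < n; ++i) {
            // one Config on the stack per iteration: no new/delete needed
            const Config sub{ a + i*(b-a)/n, a + (i+1)*(b-a)/n,
                              n + 1, conf.precision };
            s += integrate(sub, f);
        }
        return s;
    }
    return 0.5 * (bigTrapezoid + sumOfSmallTrapezoids);
}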

Testing for a new random value that doesn't exist in a set

During testing, I'm stuck with testing a piece of code that receives a list of numbers and is supposed to return a new random key that doesn't exist in the list.
The valid range is any number between 1 and 1,000,000 - which makes it too hard to brute-force in tests.
What's the best approach for testing this? I've considered testing with a smaller range (say, 100) but that too, given the basic randomizing algorithms, will take too long once the list gets close to its maximal size.
You can pick a random number in 1-1,000,000 and then search linearly forward until you find a free place (wrapping around to 1 after 1,000,000 has failed to match).
This way the distribution of numbers is not uniform (it is while the set is mostly empty, but it gets worse and worse as the set fills up), but this is much faster than checking a fresh random value each time, and I hope the skew from randomness doesn't matter too much for a test. On the other hand, you're sure you only need one call to random(), and it can't ever take more than 1,000,000 checks to find the empty space.
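A sketch of this scheme, assuming the already-used keys have been copied into an unordered_set:

#include <cstdlib>
#include <unordered_set>

int next_free_key(const std::unordered_set<int>& used) {
    const int N = 1000000;
    int k = std::rand() % N + 1;          // the single call to the RNG
    for (int tries = 0; tries < N; ++tries) {
        if (used.count(k) == 0) return k; // found a free place
        k = (k % N) + 1;                  // linear probe, wrapping N -> 1
    }
    return -1;                            // every key is taken
}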
I wonder if you could break your functionality (or the test, or both) into two parts:
The random number generation (at least the part that belongs in your code, not the standard API call, I guess, unless you want to test that too).
The fact that your method must return a value that is not in the list.
For example, you could have this code (or a more refined version, according to your technologies):
public class RandomGenerator {
    public int getValue() {
        return <random implementation>;
    }
}
public class RandomNewGenerator {
    RandomGenerator randomGenerator = new RandomGenerator();
    public int getValue(List<Integer> ints) {
        // note that a Set<Integer> would be more efficient
        while (true) {
            Integer i = randomGenerator.getValue();
            if (!ints.contains(i)) {
                return i;
            }
        }
    }
}
In real code, I would change stuff (use an interface, inject using Spring and so on)...
That way, in your test for RandomNewGenerator, you can override the RandomGenerator with an implementation that returns a known series of values. You can then test your RandomNewGenerator without facing any randomness.
I believe this is indeed the spirit of JUnit tests: to make them simple, lightning fast, and even better, repeatable! This last quality actually allows your tests to be used as regression tests, which is very convenient.
Example test code:
public class RandomNewGeneratorTest {
    // do the set up
    private List<Integer> empties = ... //
    private List<Integer> basics = ... // set up to include 1, 2, 7, 8

    private class Random extends RandomGenerator {
        int current;
        Random(int initial) {
            current = initial;
        }
        public int getValue() {
            return current++; // incremental values for test, not random
        }
    }

    public void testEmpty() {
        RandomNewGenerator randomNewGenerator = new RandomNewGenerator();
        // do a simple injection of dependency
        randomNewGenerator.randomGenerator = new Random(1);
        // random starts at 1, builds up
        assertEquals(1, randomNewGenerator.getValue(empties));
        assertEquals(2, randomNewGenerator.getValue(empties));
        assertEquals(3, randomNewGenerator.getValue(empties));
        assertEquals(4, randomNewGenerator.getValue(empties));
    }

    public void testBasic() {
        RandomNewGenerator randomNewGenerator = new RandomNewGenerator();
        // do a simple injection of dependency
        randomNewGenerator.randomGenerator = new Random(5);
        // random starts at 5, builds up
        // I expect 7, 8 to be skipped
        assertEquals(5, randomNewGenerator.getValue(basics));
        assertEquals(6, randomNewGenerator.getValue(basics));
        assertEquals(9, randomNewGenerator.getValue(basics));
    }
}
Note that this code is only a raw sample. You could alter it any way you need, for example by giving to the random generator a sequence of the values it must return. You could test for returning twice in a row the same number, for example.
One approach that might work would be to take the initial list and populate a 1-million-element vector for all indices i from 1...1,000,000, with 1 if i is taken and 0 if i is not taken.
Count the size of the initial list, call this size s.
Generate a random number j, 0 <= j < 1,000,000 - s (the number of free slots). Loop through the array, find the j-th element which is 0, and return its index.
Edit: On closer inspection of #lapo's answer - my answer appears to amount to the same thing, but be a bit slower.
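A sketch of that approach (a hypothetical helper, assuming the taken flags are already populated; because j is drawn over the free slots only, the result is uniform):

#include <random>
#include <vector>

int random_free_key(const std::vector<bool>& taken, std::mt19937& gen) {
    const int N = static_cast<int>(taken.size()); // 1,000,000
    int s = 0;
    for (bool b : taken) s += b;                  // size of the initial list
    if (s == N) return -1;                        // no free key left
    std::uniform_int_distribution<int> dist(0, N - s - 1);
    int j = dist(gen);
    for (int i = 0; i < N; ++i)
        if (!taken[i] && j-- == 0) return i + 1;  // j-th zero; keys are 1-based
    return -1;                                    // not reached
}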
The distribution of Lapo's answer isn't uniform once your set of chosen numbers starts to get too full. You'll get an even distribution of integers with the following modifications (a sketch follows the list):
Hold your initial set of numbers in a bit array, where each element in the bit array corresponds to a number in your initial set. True indicates that an item exists in the set, false otherwise.
Next, create an array of integers from 1 - 1,000,000. Shuffle the array. This set of numbers will be your new keys.
Hold a pointer to the last index into your list of new keys. When you want to generate a new key, increment the pointer to the next item in new keys; you can test whether it was already chosen in your initial set in constant time. If the item already exists in the set, increment the pointer to the next item in new keys; otherwise return the item and set its state in the bit array to true.
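A sketch of these modifications together (KeyDealer is a hypothetical name; N would be 1,000,000 here):

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

struct KeyDealer {
    std::vector<bool> taken;   // bit array over keys 1..N
    std::vector<int>  keys;    // shuffled candidate keys
    std::size_t pos = 0;       // pointer into the shuffled keys

    KeyDealer(int N, const std::vector<int>& initial)
        : taken(N + 1, false), keys(N) {
        for (int k : initial) taken[k] = true;
        std::iota(keys.begin(), keys.end(), 1); // 1, 2, ..., N
        std::shuffle(keys.begin(), keys.end(),
                     std::mt19937{std::random_device{}()});
    }

    int next() {
        while (pos < keys.size() && taken[keys[pos]]) ++pos; // skip used keys
        if (pos == keys.size()) return -1;                   // exhausted
        taken[keys[pos]] = true;
        return keys[pos++];
    }
};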
What do you mean by "too long"? Comparing a value against 1,000,000 values in a list should only take a few milliseconds. And I can see no other solution than comparing against all values, unless the list is sorted and you can narrow the range to inspect. You could of course sort the list and then perform a binary search taking no more than 20 steps, but sorting would be much more expensive than a linear search.
I just did a test on a quite slow PC and it took about 20 ms to scan a list with 1,000,000 numbers for a given number in C#. Using an array it took 14 ms. Isn't that fast enough? A binary search did the job in 0.3 microseconds. Finally, using a hash set the lookup took only about 90 nanoseconds.
If you have to write the algorithm yourself, I suggest a simple trick. Keep two lists - one with the assigned numbers, one with the unassigned numbers, starting with all numbers from 1 to 1,000,000. If you need a new number, just get a random number between zero (inclusive) and the length of the unassigned numbers list (exclusive), pick the number at this index and move it to the assigned numbers list. Done.
I tested this approach, too, and it took about 460 milliseconds to move all 1,000,000 numbers from the unassigned to the assigned numbers list, using a hash set for the unassigned numbers to speed up the deletion and a list for the assigned numbers. That is only about 460 nanoseconds to generate a new unique random number in the given range. You have to use a sorted dictionary to avoid interference between the random number generator and the hash algorithm.
Finally, you could also take the numbers from 1 to 1,000,000, put them into a list, shuffle them for a while, and then just take one after the other from the list. Aside from the initial time to shuffle the list, this will run in no time at all.
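A sketch of the two-list idea (using swap-and-pop on a plain vector instead of a hash set, which keeps removal O(1) without any hashing; the caller keeps its own assigned list if it needs one):

#include <random>
#include <vector>

struct UniqueKeySource {
    std::vector<int> unassigned;
    std::mt19937 gen{std::random_device{}()};

    explicit UniqueKeySource(int N) : unassigned(N) {
        for (int i = 0; i < N; ++i) unassigned[i] = i + 1; // 1..N
    }

    int next() {
        if (unassigned.empty()) return -1; // all keys handed out
        std::uniform_int_distribution<std::size_t> d(0, unassigned.size() - 1);
        std::size_t i = d(gen);
        std::swap(unassigned[i], unassigned.back()); // O(1) removal
        int key = unassigned.back();
        unassigned.pop_back();
        return key;
    }
};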