Would this method be efficient at finding string permutations - C++

#include <iostream>
#include <string>
using namespace std;
int main()
{
string word;
cin>>word;
int s = word.size();
string original_word = word;
do
{
for(decltype(s) i =1; i!= s;++i){
auto temp =word[i-1];
word[i-1] = word[i];
word[i] = temp;
cout<<word<<endl;
}
}while(word!=original_word);
}
Is this solution efficient, and how does it compare to doing this recursively?
Edit: When I tested the program it displayed all permutations,
i.e. cat produced:
cat
act
atc
tac
tca
cta

Let's imagine tracing this code on the input 12345. On the first pass through the do ... while loop, your code steps the array through these configurations:
21345
23145
23415
23451
Notice that after this iteration of the loop finishes, you've cyclically shifted the array one step. This means that at the end of the next do ... while iteration you'll have cyclically shifted the array twice, then three times, then four times, etc. After n iterations, this will reset the array back to its original configuration. Since each pass of bubbling the character to the end goes through at most n intermediary steps, your approach will generate at most n^2 different permutations of the input string. However, there are n! possible permutations of the input string, and n! exceeds n^2 for all n ≥ 4. As a result, this approach can't generate all possible permutations of longer strings, since it doesn't produce enough unique arrangements before returning to the start. (For a three-letter word such as cat, 3! = 6 still fits, which is why your test appeared to work.)
If you're interested in learning about a ton of different ways to enumerate permutations by individual swaps, you may want to pick up a copy of The Art of Computer Programming or search online for different methods. This is a really interesting topic and in the course of working through these algorithms I think you'll learn a ton of ways to analyze different algorithms and prove correctness.

Related

Go through the array from left to right and collect as many numbers as possible

CSES problem (https://cses.fi/problemset/task/2216/).
You are given an array that contains each number between 1…n exactly once. Your task is to collect the numbers from 1 to n in increasing order.
On each round, you go through the array from left to right and collect as many numbers as possible. What will be the total number of rounds?
Constraints: 1≤n≤2⋅10^5
This is my code in C++:
int n, res=0;
cin>>n;
int arr[n];
set <int, greater <int>> lastEl;
for(int i=0; i<n; i++) {
cin>>arr[i];
auto it=lastEl.lower_bound(arr[i]);
if(it==lastEl.end()) res++;
else lastEl.erase(*it);
lastEl.insert(arr[i]);
}
cout<<res;
I go through the array once. If the element arr[i] is smaller than all the previous ones, then I "open" a new sequence, and save the element as the last element in this sequence. I store the last elements of already opened sequences in set. If arr[i] is smaller than some of the previous elements, then I take already existing sequence with the largest last element (but less than arr[i]), and replace the last element of this sequence with arr[i].
Alas, it passes only two of the three given tests, and for the third one the output is much less than it should be. What am I doing wrong?
Let me explain my thought process in detail so that it will be easier for you next time when you face the same type of problem.
First of all, a mistake I often made when faced with this kind of problem is the urge to simulate the process. What do I mean by "simulating the process" mentioned in the problem statement? The problem mentions that a round takes place to maximize the collection of increasing numbers in a certain order. So, you start with 1, find it and see that the next number 2 is not beyond it, i.e., 2 cannot be in the same round as 1 and form an increasing sequence. So, we need another round for 2. Now we find that 2 and 3 can both be collected in the same round, as we're moving from left to right and taking numbers in increasing order. But we cannot take 4 because it appears before 2. Finally, for 4 and 5 we need another round. That makes a total of three rounds.
Now, the problem becomes very easy to solve if you simulate the process in this way. In the first round, you look for numbers that form an increasing sequence starting with 1. You remove these numbers before starting the second round. You continue this way until you've exhausted all the numbers.
But simulating this process will result in a time complexity that won't pass the constraints mentioned in the problem statement. So, we need to figure out another way that gives the same output without simulating the whole process.
Notice that the position of numbers is crucial here. Why do we need another round for 2? Because it comes before 1. We don't need another round for 3 because it comes after 2. Similarly, we need another round for 4 because it comes before 2.
So, when considering each number, we only need to be concerned with the position of the number that comes before it in the order. When considering 2, we look at the position of 1: does 1 come before or after 2? If 1 comes before 2, we don't need another round. But if 1 comes after 2, we'll need an extra round. For each number, we check this condition and increment the round count if necessary. This way, we can figure out the total number of rounds without simulating the whole process.
#include <iostream>
#include <vector>
using namespace std;
int main(int argc, char const *argv[])
{
int n;
cin >> n;
vector <int> v(n + 1), pos(n + 1);
for(int i = 1; i <= n; ++i){
cin >> v[i];
pos[v[i]] = i;
}
int total_rounds = 1; // we'll always need at least one round because the input sequence will never be empty
for(int i = 2; i <= n; ++i){
if(pos[i] < pos[i - 1]) total_rounds++;
}
cout << total_rounds << '\n';
return 0;
}
Next time you're faced with this type of problem, pause for a while and try to control your urge to simulate the process in code. Almost certainly, there will be some clever observation that allows you to reach an optimal solution.

How do I print out vectors in different order every time

I'm trying to make two vectors, where vector1 (total1) contains some strings and vector2 (total2) contains some random unique numbers (between 0 and total1.size() - 1).
I want to make a program that prints out total1's strings, but in a different order every turn. I don't want to use iterators or anything like that because I want to improve my problem-solving skills.
Here is the specific function that crashes the program.
for (unsigned i = 0; i < total1.size();)
{
v1 = rand() % total1.size();
for (unsigned s = 0; s < total1.size(); ++s)
{
if (v1 == total2[s])
;
else
{
total2.push_back(v1);
++i;
}
}
}
I'm very grateful for any help that I can get!
Can I suggest a change of algorithm? Even if your current one were correctly implemented ("s", in your code, must go from 0 to total2.size(), not total1.size(), and if the element is found you must break and generate a new random number), it has the following drawback: assume vectors of 1,000,000 elements and you are trying to place the last random number. Each attempt has a probability of 1 in 1,000,000 of producing a number not previously used, which is very small. The last-but-one number has a probability of 2 in 1,000,000, which is also small. In conclusion, your program will loop for a long time and spend lots of CPU resources.
Your best alternative is to follow @NathanOliver's suggestion and look at the function std::shuffle. Its reference page shows a possible implementation of the algorithm, which is what you are looking for.
Another simple algorithm, with some pros and cons, is:
1) Initialize total2 with the sequence 0,1,2,...,n, where n is total1.size() - 1.
2) Choose two random numbers, i1 and i2, in the range [0,n].
3) Swap elements i1 and i2 in total2.
4) Repeat from (2) a fixed number of times "R".
This method lets you know the number of steps in advance and control the level of "randomness" of the final vector (a bigger R is more random). However, the quality of its randomness is far from good.
Another method, with a better probability distribution:
1) Fill a list L with the numbers 0,1,2,...,total1.size() - 1.
2) Choose a random number i between 0 and the size of list L - 1.
3) Append to total2 the i-th element of list L.
4) Remove this element from L.
5) Repeat from (2) until L is empty.
If you just want to shuffle vector<string> total1, you can do this without the helper vector<int> total2. Here is an implementation based on the Fisher–Yates shuffle (n is total1.size()).
for(int i=n-1; i>=1; i--) {
int j=rand()%(i+1);
swap(total1[j], total1[i]); // your prof might not allow use of swap:)
}
If you must use vector<int> total2, then shuffle it using the above algorithm. Next you can use it to create a new vector<string> result from total1 where result[i]=total1[total2[i]].

Increase string overlap matrix building efficiency

I have a huge list (N = ~1million) of strings 100 characters long that I'm trying to find the overlaps between. For instance, one string might be
XXXXXXXXXXXXXXXXXXAACTGCXAACTGGAAXA (and so on)
I need to build an N by N matrix that contains the longest overlap value for every string with every other string. My current method is (pseudocode)
read in all strings to array
create empty NxN matrix
compare each string to every string with a higher array index (to avoid redoing comparisons)
Write longest overlap to matrix
There's a lot of other stuff going on, but I really need a much more efficient way to build the matrix. Even with the most powerful computing clusters I can get my hands on this method takes days.
In case you didn't guess, these are DNA fragments. X indicates "wild card" (probe gave below a threshold quality score) and all other options are a base (A, C, T, or G). I tried to write a quaternary tree algorithm, but this method was far too memory intensive.
I'd love any suggestions you can give for a more efficient method; I'm working in C++ but pseudocode/ideas or other language code would also be very helpful.
Edit: some code excerpts that illustrate my current method. Anything not particularly relevant to the concept has been removed
//part that compares them all to each other
for (int j=0; j<counter; j++) //counter holds # of DNA
for (int k=j+1; k<counter; k++)
int test = determineBestOverlap(DNArray[j],DNArray[k]);
//boring stuff
//part that compares strings. Definitely very inefficient,
//although I think the sheer number of comparisons is the main problem
int determineBestOverlap(string str1, string str2)
{
int maxCounter = 0, bestOffset = 0;
//basically just tries overlapping the strings every possible way
for (int j=0; j<str2.length(); j++)
{
int counter = 0, offset = 0;
while (str1[offset] == str2[j+offset] && str1[offset] != 'X')
{
counter++;
offset++;
}
if (counter > maxCounter)
{
maxCounter = counter;
bestOffset = j;
}
}
return maxCounter;
} //this simplified version doesn't account for flipped strings
Do you really need to know the match between ALL string pairs? If yes, then you will have to compare every string with every other string, which means you will need n^2/2 comparisons, and you will need one half terabyte of memory even if you just store one byte per string pair.
However, I assume what you are really interested in is long matches, those that have more than, say, 20 or 30 or even more than 80 characters in common, and you probably don't really want to know if two strings have 3 characters in common while 50 others are X and the remaining 47 don't match.
What i'd try if i were you - still without knowing if that fits your application - is:
1) From each string, extract the largest substring(s) that make(s) sense. I guess you want to ignore 'X'es at the start and end entirely, and if some "readable" parts are broken by a large number of 'X'es, it probably makes sense to treat the readable parts individually instead of using the longer string. A lot of this "which substrings are relevant?" depends on your data and application that i don't really know.
2) Make a list of these longest substrings, together with the number of occurrences of each substring. Order this list by string length. You may, but don't really have to, store the indexes of every original string together with the substring. You'll get something like (example)
AGCGCTXATCG 1
GAGXTGACCTG 2
.....
CGCXTATC 1
......
3) Now, from the top to the bottom of the list:
a) Set the "current string" to the string topmost on the list.
b) If the occurrence count next to the current string is > 1, you found a match. Search your original strings for the substring if you haven't remembered the indexes, and mark the match.
c) Compare the current string with all strings of the same length, to find matches where some characters are X.
d) Remove the 1st character from the current string. If the resulting string is already in your table, increase its occurrence counter by one, else enter it into the table.
e) Repeat 3d with the last, instead of the first, character removed from the current string.
f) Remove the current string from the list.
g) Repeat from 3a) until you run out of computing time, or your remaining strings become too short to be interesting.
Whether this is a better algorithm depends very much on your data and which comparisons you're really interested in. If your data is very random/you have very few matches, it will probably take longer than your original idea. But it might allow you to find the interesting parts first and skip the less interesting parts.
I don't see many ways around the fact that you need to compare each string with every other string, including shifting them, and that is by itself very slow; a computation cluster seems the best approach.
The only thing I can see to improve is the string comparison itself: replace A, C, T, G and X by binary patterns:
A = 0x01
C = 0x02
T = 0x04
G = 0x08
X = 0x0F
This way you can store one item in 4 bits, i.e. two per byte (this might not be a good idea, but it is still an option to investigate), and then compare them quickly with an AND operation, so that you 'just' have to count how many consecutive non-zero values you have. That's just a way to process the wildcard; sorry, I don't have a better idea to reduce the complexity of the overall comparison.

code to solve the "Theater Row" brain teaser

I was reading a book called "Fifty Challenging Problems in Probability", which is filled with lots of probability-related brain teasers. I wasn't able to solve one of the problems there, and wasn't able to understand the solution either. So I wrote some code to get a better feel for it. Here is the original problem.
The Theater Row:
Eight eligible bachelors and seven beautiful models happen randomly to have purchased single seats in the same 15-seat row of a theater. On the average, how many pairs of adjacent seats are ticketed for marriageable couples?
And here is my code, which computes the average number of adjacent pairs over 100 random samples:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime> // for time(), used to seed rand()
#include <algorithm>
#include <numeric>
using namespace std;
// estimates the average for the "theater row" problem
// in the book Fifty Challenging Problems in Probability.
vector<int> reduce(vector<int>& seats); // This function reduces a sequence to a form
// in which there is no adjacent 0's or 1's.
// *example: reduce(111001)=101*
int main()
{
srand(time(0));
int total=15;
int Num=100;
int count0=0; // number of women
int count1=0; // number of men
vector<int> seats; // vector representing a seat assignment,
// seats.size()=total
vector<int> vpair; // vector that has number of adjacent pairs
// as its element, vpair.size()=Num
for (int i=0; i<Num; ++i) {
count0=count1=0;
while ((count1-count0)!=1) {
count0=count1=0;
seats.clear();
for (int j=0; j<total; ++j) {
int r=rand()%2;
if (r==0)
++count0;
else
++count1;
seats.push_back(r);
}
}
for (int k=0;k<seats.size();++k)
cout<<seats[k];
reduce(seats);
for (int k=0;k<seats.size();++k)
cout<<" "<<seats[k];
vpair.push_back(seats.size()-1); // seats.size()-1 is the number
// of adj pairs.
cout<<endl;
}
double avg=static_cast<double>(accumulate(vpair.begin(),vpair.end(),0))/vpair.size();
cout<<"average pairs: "<<avg<<endl;
return 0;
}
vector<int> reduce(vector<int>& seats)
{
vector<int>::iterator iter = seats.begin();
while (iter!=seats.end()) {
if (iter+1==seats.end())
++iter;
else if (*iter==*(iter+1))
iter=seats.erase(iter);
else
++iter;
}
return seats;
}
The code generates random series of 0's (representing women) and 1's (men). It then "reduces" the random sequence so that there are no repeating 0's or 1's. For example, if the code generates a random sequence of 011100110010011 (which has 7 adjacent pairs), the sequence is reduced to 01010101. In the reduced format, to figure out the number of adjacent pairs, you just need to get the "size-1".
Here are my questions.
The answer to the question (according to the book) is 7.47, while I get an average of about 7 or so from the code. Does anybody see where the discrepancy originates?
My code seems quite inefficient sometimes. Is it due to the way I generate a random sequence? (As you can see, to generate a random sequence of 8 men and 7 women, I keep asking for a random sequence of size 15 until it happens to have 8 men (or "1") and 7 women (or "0").) Is there a better way to produce a random sequence when there is a constraint like this?
I am not so proficient when it comes to programming. I'd appreciate any comments. Thank you for your help!!
This problem is hilarious.
There are 1307674368000 (15!) possible combinations.
There are 203212800 combinations where 1 couple gets together.
But there are 3048192000 combinations where 2 couples get together.
I think the key to this problem is doing a smaller-scale problem first and using that info to create your answer. This is just an expected value problem.
Edit: Instead of running simulations, you could get the exact answer using expected value. It takes harder thinking, but the result is exact. I'll take a little while to see if I can come up with the exact answer and post it.
Important Edit (Read):
Does your code account for the case where you get more than 8 0's or 8 1's? Since you can have at most 8 men and 7 women, it should automatically fill the rest with the leftover symbols.

Writing a C++ version of the algebra game 24

I am trying to write a C++ program that works like the game 24. For those who don't know how it is played, basically you try to find any way that 4 numbers can total 24 through the four algebraic operators of +, -, /, *, and parentheses.
As an example, say someone inputs 2,3,1,5
((2+3)*5) - 1 = 24
It was relatively simple to code the function to determine if three numbers can make 24 because of the limited number of positions for parentheses, but I cannot figure out how to code it efficiently when four variables are entered.
I have some permutations working now but I still cannot enumerate all cases because I don't know how to code for the cases where the operations are the same.
Also, what is the easiest way to calculate the RPN? I came across many pages such as this one:
http://www.dreamincode.net/forums/index.php?showtopic=15406
but as a beginner, I am not sure how to implement it.
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
bool MakeSum(int num1, int num2, int num3, int num4)
{
vector<int> vi;
vi.push_back(num1);
vi.push_back(num2);
vi.push_back(num3);
vi.push_back(num4);
sort(vi.begin(),vi.end());
char a1 = '+';
char a2 = '-';
char a3 = '*';
char a4 = '/';
vector<char> va;
va.push_back(a1);
va.push_back(a2);
va.push_back(a3);
va.push_back(a4);
sort(va.begin(),va.end());
while(next_permutation(vi.begin(),vi.end()))
{
while(next_permutation(va.begin(),va.end()))
{
cout<<vi[0]<<vi[1]<<vi[2]<<vi[3]<< va[0]<<va[1]<<va[2]<<endl;
cout<<vi[0]<<vi[1]<<vi[2]<<va[0]<< vi[3]<<va[1]<<va[2]<<endl;
cout<<vi[0]<<vi[1]<<vi[2]<<va[0]<< va[1]<<vi[3]<<va[2]<<endl;
cout<<vi[0]<<vi[1]<<va[0]<<vi[2]<< vi[3]<<va[1]<<va[2]<<endl;
cout<<vi[0]<<vi[1]<<va[0]<<vi[2]<< va[1]<<vi[3]<<va[2]<<endl;
}
}
return 0;
}
int main()
{
MakeSum(5,7,2,1);
return 0;
}
So, the simple way is to permute through all possible combinations. This is slightly tricky: the order of the numbers can be important, and the order of operations certainly is.
One observation is that you are trying to generate all possible expression trees with certain properties. One property is that the tree will always have exactly 4 leaves. This means the tree will also always have exactly 3 internal nodes. There are only 3 possible shapes for such a tree (5 if you count the mirror images separately):
  A
 / \
N   A
   / \        (and the mirror image)
  N   A
     / \
    N   N

    A
   / \
  N   A
     / \
    A   N     (and the mirror image)
   / \
  N   N

      A
    /   \
   A     A
  / \   / \
 N   N N   N
In each spot for A you can have any one of the 4 operations. In each spot for N you can have any one of the numbers. But each number can only appear for one N.
Coding this as a brute force search shouldn't be too hard, and I think that after you have things done this way it will become easier to think about optimizations.
For example, + and * are commutative. This means that mirrors that flip the left and right children of those operations will have no effect. It might be possible to cut down searching through all such flips.
Someone else mentioned RPN notation. The trees directly map to this. Here is a list of all possible trees in RPN:
N N N N A A A
N N N A N A A
N N N A A N A
N N A N N A A
N N A N A N A
That's 4*3*2 = 24 possibilities for numbers, 4*4*4 = 64 possibilities for operations, 24 * 64 * 5 = 7680 total possibilities for a given set of 4 numbers. Easily countable and can be evaluated in a tiny fraction of a second on a modern system. Heck, even in BASIC on my old Atari 8 bit I bet this problem would only take minutes for a given group of 4 numbers.
You can just use Reverse Polish Notation to generate the possible expressions, which should remove the need for parentheses.
An absolutely naive way to do this would be to generate all possible strings of 4 digits and 3 operators (paying no heed to validity as an RPN), assume it is in RPN and try to evaluate it. You will hit some error cases (as in invalid RPN strings). The total number of possibilities (if I calculated correctly) is ~50,000.
A more clever way should get it down to ~7500 I believe (64*24*5 to be exact): Generate a permutation of the digits (24 ways), generate a triplet of 3 operators (4^3 = 64 ways) and now place the operators among the digits to make it valid RPN(there are 5 ways, see Omnifarious' answer).
You should be able to find permutation generators and RPN calculators easily on the web.
Hope that helps!
PS: Just FYI: RPN is nothing but the postorder traversal of the corresponding expression tree, and for d digits, the number is d! * 4^(d-1) * Choose(2(d-1), (d-1))/d. (The last term is a Catalan number.)
Edited: The solution below is wrong. We also need to consider the numbers makeable with just x_2 and x_4, and with just x_1 and x_4. This approach can still work, but it's going to be rather more complex (and even less efficient). Sorry...
Suppose we have four numbers x_1, x_2, x_3, x_4. Write
S = { all numbers we can make just using x_3, x_4 },
Then we can rewrite the set we're interested in, which I'll call
T = { all numbers we can make using x_1, x_2, x_3, x_4 }
as
T = { all numbers we can make using x_1, x_2 and some s from S }.
So an algorithm is to generate all possible numbers in S, then use each number s in S in turn to generate part of T. (This will generalise fairly easily to n numbers instead of just 4).
Here's a rough, untested code example:
#include <set> // we can use std::set to store integers without duplication
#include <vector> // we might want duplication in the inputs
// the 2-number special case
std::set<int> all_combinations_from_pair(int a, int b)
{
    std::set<int> results;
    // here we just use brute force
    results.insert(a+b); // = b+a
    results.insert(a-b);
    results.insert(b-a);
    results.insert(a*b); // = b*a
    // need to make sure it divides exactly (and never divide by zero)
    if (b!=0 && a%b==0) results.insert(a/b);
    if (a!=0 && b%a==0) results.insert(b/a);
    return results;
}
// the general case (assumes at least two inputs)
std::set<int> all_combinations_from(std::vector<int> inputs)
{
    if (inputs.size() == 2)
    {
        return all_combinations_from_pair(inputs[0], inputs[1]);
    }
    else
    {
        std::set<int> S = all_combinations_from_pair(inputs[0], inputs[1]);
        std::set<int> T;
        // the inputs left over once the first two have been combined
        std::vector<int> rest(inputs.begin() + 2, inputs.end());
        for (std::set<int>::iterator i = S.begin(); i != S.end(); i++)
        {
            // replace the first two inputs by one value s from S
            std::vector<int> new_inputs = rest;
            new_inputs.insert(new_inputs.begin(), *i);
            std::set<int> new_outputs = all_combinations_from(new_inputs);
            T.insert(new_outputs.begin(), new_outputs.end()); // set union
        }
        return T;
    }
}
If you are allowed to use the same operator twice, you probably don't want to mix the operators into the numbers. Instead, perhaps use three 0's as a placeholder for where operations will occur (none of the 4 numbers are 0, right?) and use another structure to determine which operations will be used.
The second structure could be a vector<int> initialized with three 1's followed by three 0's. The 0's correspond to the 0's in the number vector. If a 0 is preceded by zero 1's, the corresponding operation is +, if preceded by one 1, it's -, etc. For example:
6807900 <= equation of form ( 6 # 8 ) # ( 7 # 9 )
100110 <= replace #'s with (-,-,/)
possibility is (6-8)-(7/9)
Advance through the operation possibilities using next_permutation in an inner loop.
By the way, you can also return early if the number-permutation is an invalid postfix expression. All permutations of the above example less than 6708090 are invalid, and all greater are valid, so you could start with 9876000 and work your way down with prev_permutation.
Look up the Knapsack problem (here's a link to get you started: http://en.wikipedia.org/wiki/Knapsack_problem), this problem is pretty close to that, just a little harder (and the Knapsack problem is NP-complete!)
One thing that might make this faster than normal is parallelisation. Check out OpenMP. Using this, more than one check is carried out at once (your "alg" function) thus if you have a dual/quad core cpu, your program should be faster.
That said, if as suggested above the problem is NP-complete, it'll be faster, not necessarily fast.
I wrote something like this before. You need a recursive evaluator. Call evaluate; when you hit "(", call evaluate again; otherwise run along with the digits and operators until you hit ")", then return the result of the -+*/ operations to the evaluate instance above you.