Need help optimizing a program that finds all possible substrings - c++

I have to find all possible, unique substrings from a bunch of user-input strings. This group of substrings has to be alphabetically sorted without any duplicate elements, and the group must be queryable by number. Here's some example input and output:
Input:
3 // This is the user's desired number of strings
abc // So the user inputs 3 strings
abd
def
2 // This is the user's desired number of queries
7 // So the user inputs 2 queries
2
Output:
// From the alphabetically sorted group of unique substrings,
bd // This is the 7th substring
ab // And this is the 2nd substring
Here's my implementation:
#include <map>
#include <iostream>
using namespace std;
int main() {
int number_of_strings;
int number_of_queries;
int counter;
string current_string;
string current_substr;
map<string, string> substrings;
map<int, string> numbered_substrings;
int i;
int j;
int k;
// input step
cin >> number_of_strings;
string strings[number_of_strings];
for (i = 0; i < number_of_strings; ++i)
cin >> strings[i];
cin >> number_of_queries;
int queries[number_of_queries];
for (i = 0; i < number_of_queries; ++i)
cin >> queries[i];
// for each string in 'strings', I want to insert every possible
// substring from that string into my 'substrings' map.
for (i = 0; i < number_of_strings; ++i) {
current_string = strings[i];
for (j = 1; j <= current_string.length(); ++j) {
for (k = 0; k <= current_string.length()-j; ++k) {
current_substr = current_string.substr(k, j);
substrings[current_substr] = current_substr;
}
}
}
// my 'substrings' container is now sorted alphabetically and does
// not contain duplicate elements, because the container is a map.
// but I want to make the map queryable by number, so I'm iterating
// through 'substrings' and assigning each value to an int key.
counter = 1;
for (map<string,string>::iterator it = substrings.begin();
it != substrings.end(); ++it) {
numbered_substrings[counter] = it->second;
++counter;
}
// output step
for (i = 0; i < number_of_queries; ++i) {
if (queries[i] > 0 && queries[i] <= numbered_substrings.size()) {
cout << numbered_substrings[queries[i]] << endl;
} else {
cout << "INVALID" << endl;
}
}
return 0;
}
I need to optimize my algorithm, but I'm not sure how to do it. Maybe it's the fact that I have a second for loop for assigning new int keys to each substring. Help?

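Check out suffix trees. A suffix tree for a string of length n can usually be built in O(n) time, and it represents every substring of that string without storing each one separately.
This article was helpful for me: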
http://allisons.org/ll/AlgDS/Tree/Suffix/

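Minor notes:
1. include <string>; the code uses std::string and may only compile because another header happens to pull it in
2. careful with those } else {; one day you'll have a lot of else if branches
and a lot of lines and you'll wonder where an if starts and where it ends
3. careful with mismatched signed and unsigned values in comparisons... again, one day it will
come back and bite (also, it's nice to compile without errors or warnings)
4. don't define arrays with a run-time size (string strings[number_of_strings]);
variable-length arrays are not standard C++, so use std::vector instead
5. nice with ++ i. for class types such as iterators, pre-increment can avoid the temporary copy
that post-increment makes (for plain ints, compilers generate the same code either way)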
While I do agree that using proper algorithms when needed (say bubble sort, heap sort etc. for sorting, binary search, binary trees etc. for searching), sometimes I find it nice to do an optimization on current code. Imagine having a big project and implementing something requires rewrites... not many are willing to wait for you (not to mention the required unit testing, fat testing and maybe fit testing). At least my opinion. [and yes, I know some are gonna say that if it is so complicated then it was written badly from the start - but hey, you can't argue with programmers that left before you joined the team :P]
But I do agree, using existing stuff is a good alternative when called for. But back to the point. I tested it with
3, abc, def, ghi
4, 1, 3, 7, 12
I can't say whether yours is any slower than mine or vice versa; a random string generator that feeds in maybe 500 inputs (and then computes all substrings) might be a better test, but I am too lazy at 2 in the morning. At most, my way of writing it might help you (to me, at least, it seems simpler and uses fewer loops and assignments). I'm not a fan of vectors because of the slight overhead, but I used one to keep up with your requirement of dynamic querying... a static array of a constant size would be faster, obviously.
Also, while not my style of naming conventions, I decided to use your names so you can follow the code easier.
Anyway, take a look and tell me what you think:
#include <map>
#include <iostream>
#include <string> // you forgot to add this... trust me, it's important :)
#include <vector> // not a fan, but it's not that bad IF you want dynamic buffers
using namespace std;
int main ()
{
unsigned int number_of_strings = 0;
// string strings[number_of_strings]; // don't do this... variable-length arrays are not standard C++
// the = 0 above is only a default value until the user's input is read
cin >> number_of_strings;
map <string, string> substrings;
string current_string, current_substr;
unsigned int i, j, k;
for (i = 0; i < number_of_strings; ++ i)
{
cin >> current_string;
substrings[current_string] = current_string;
for (j = 1; j <= current_string.length(); ++ j)
{
for (k = 0; k <= current_string.length() - j; ++ k)
{
current_substr = current_string.substr(k, j);
substrings[current_substr] = current_substr;
}
}
}
vector <string> numbered_substrings;
for (map <string, string>::iterator it = substrings.begin(); it != substrings.end(); ++ it)
numbered_substrings.push_back(it->second);
unsigned int number_of_queries = 0;
unsigned int query = 0;
cin >> number_of_queries;
current_string.clear(); // reused as an output buffer so answers print in query order
for (i = 0; i < number_of_queries; ++ i)
{
cin >> query;
// queries are 1-based; test before subtracting, because an unsigned 0 would wrap around
if ((query >= 1) && (query <= numbered_substrings.size()))
current_string = current_string + numbered_substrings[query - 1] + '\n';
else
current_string = current_string + "INVALID\n";
}
cout << current_string;
return 0;
}
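The idea shared by both versions above (collect every substring in an ordered, duplicate-free container, then copy it into something indexable) can be sketched more compactly with std::set, which avoids storing each substring as both key and value. This is only a sketch, not a faster algorithm: the function names sortedUniqueSubstrings and kthSubstring are made up for illustration, and it still generates all O(n^2) substrings of each string.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: every distinct substring of every input, in sorted order.
std::vector<std::string> sortedUniqueSubstrings(const std::vector<std::string>& inputs) {
    std::set<std::string> seen;  // ordered and duplicate-free by construction
    for (const std::string& s : inputs) {
        for (std::size_t start = 0; start < s.size(); ++start) {
            for (std::size_t len = 1; start + len <= s.size(); ++len) {
                seen.insert(s.substr(start, len));
            }
        }
    }
    // Copy into a vector so the k-th element is an O(1) lookup.
    return std::vector<std::string>(seen.begin(), seen.end());
}

// Hypothetical helper: 1-based lookup that returns "INVALID" when k is out of range.
std::string kthSubstring(const std::vector<std::string>& sorted, long long k) {
    if (k < 1 || static_cast<unsigned long long>(k) > sorted.size()) return "INVALID";
    return sorted[static_cast<std::size_t>(k - 1)];
}
```

With the example input from the question (abc, abd, def), kthSubstring(subs, 7) gives bd and kthSubstring(subs, 2) gives ab.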

Related

Output numbers in reverse (C++) w/ vectors

I'm stuck for the first time on a lab for this class. Please help!
The prompt is:
Write a program that reads a list of integers, and outputs those integers in reverse. The input begins with an integer indicating the number of integers that follow. For coding simplicity, follow each output integer by a comma, including the last one.
Ex: If the input is:
5 2 4 6 8 10
the output is:
10,8,6,4,2,
2 questions: (1) Why does the vector not take user input unless the const int is included? (2) Why does the code not work in general? It seems to properly output, but with an error, and does not include the end line?
#include <iostream>
#include <vector>
using namespace std;
int main() {
const int MAX_ELEMENTS = 20;
vector<int> userInts(MAX_ELEMENTS);
unsigned int i;
int numInts;
cin >> numInts;
for (i = 0; i < numInts; ++i) {
cin >> userInts.at(i);
}
for (i = (numInts - 1); i >= 0; --i) {
cout << userInts.at(i) << ",";
}
cout << endl;
return 0;
}

Firstly, you need to specify the size because you are not using the vector's push_back functionality. Since you are only using at, you must specify the size ahead of time. Now, there's a few ways to do this.
Example 1:
cin >> numInts;
vector<int> userInts(numInts); // set the size AFTER the user specifies it
for (i = 0; i < numInts; ++i) {
cin >> userInts.at(i);
}
Alternatively, using push_back you can do:
vector<int> userInts; // start empty; push_back grows the vector as values arrive
for (i = 0; i < numInts; ++i) {
int t;
cin >> t;
userInts.push_back(t);
}
As for looping backwards, i >= 0 is always true for an unsigned i, so the loop never ends on its own. Instead, you can use reverse iterators.
for ( auto itr = userInts.rbegin(); itr != userInts.rend(); ++itr ) {
cout << *itr << ",";
}
If you need to use indexes for the reverse loop, you can do:
for ( i = numInts - 1; i != ~0u; --i ) { // ~0u flips every bit of 0, giving the largest unsigned int, which is what i wraps to after 0; this works in any C++ standard
cout << userInts.at(i) << ",";
}
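Another common way to write the index-based reverse loop, shown here only as a sketch, is to start one past the last index and decrement inside the loop condition. The function name reversedWithCommas is made up for illustration, and it returns a string instead of printing so the result is easy to check.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins the values in reverse order, each followed by a comma.
std::string reversedWithCommas(const std::vector<int>& values) {
    std::string out;
    // i starts one past the last index. "i-- > 0" tests the old value and then
    // decrements, so the body sees size-1, ..., 0 and never runs with a
    // wrapped-around index.
    for (std::size_t i = values.size(); i-- > 0;) {
        out += std::to_string(values[i]);
        out += ',';
    }
    return out;
}
```

For the values 2 4 6 8 10 from the prompt, this produces 10,8,6,4,2, and an empty vector produces an empty string.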

With unsigned int i; the condition i >= 0 is always true. When i wraps around past 0, at(i) accesses an out-of-range element and throws std::out_of_range.

To answer your other question:
std::vector<int> userInts;
creates a vector with no entries, so
userInts.at(i)
tries to access the (nonexistent) ith entry.
You have 2 choices:
create the vector with a lot of empty entries
ask the vector to grow dynamically
The first one is what you did:
const int MAX_ELEMENTS = 20;
vector<int> userInts(MAX_ELEMENTS);
Or you can do
userInts.push_back(x);
which makes sure there is enough space in the vector and adds the new element to the end.

C++ How to optimize this algorithm ? (std::map)

The problem is the following: We are given a number 's', s ∈ [0, 10^6], and a number 'n', n ∈ [0, 50000], then n numbers, and we have to find how many number pairs' sum is equal to the 's' number (and we must use either maps or sets to solve it)
Here is the example:
Input:
5 (this is s)
6 (this is n)
1
4
3
6
-1
5
Output:
2
explanation : these are the (1,4) and (6,−1) pairs. (1 +4 = 5 and 6 + (-1) = 5)
Here is my "solution" , I don't even know if it's correct, but it works for the example that we got.
#include <iostream>
#include <map>
#include <iterator>
using namespace std;
int main()
{
cin.tie(0);
ios::sync_with_stdio(false);
int s;
cin >> s;
int n;
cin >> n;
map<int, int> numbers;
int element;
int counter = 0;
for(int i=0; i<n;i++)
{
cin >> element;
numbers.insert(pair<int, int>(element, s-element));
}
for(map<int, int>::iterator it = numbers.begin(); it != numbers.end(); it++)
{
map<int, int>::iterator it2 = it;
while(it2 != numbers.end())
{
if(it->second == it2->first)
{
counter++;
break;
}
it2++;
}
}
cout << counter << "\n";
return 0;
}
Thanks for the answers in advance! I'm still a beginner and I'm learning, sorry.

Pairing each element with s-element is a good idea, but there is no reason to store all the pairs first and only then search for matches. Checking each element against the values already seen, as you read it, removes the O(n^2) loop you have there at the end.
The standard way using hashing would be:
seen=unordered_map<number,count>()
for 1...n:
e = read_int()
if (s-e) in seen:
duplicates+=seen[s-e] # Found new seen[s-e] duplicates.
if e in seen:
seen[e]+=1
else:
seen.insert(e,1)
return duplicates

Here's a brute-force method, using a vector:
int target_s = 0;
int quantity_numbers = 0;
std::cin >> target_s >> quantity_numbers;
std::vector<int> data(quantity_numbers);
for (int i = 0; i < quantity_numbers; ++i)
{
std::cin >> data[i];
}
int count = 0;
for (int i = 0; i < quantity_numbers; ++i)
{
for (int j = 0; j < quantity_numbers; ++j)
{
if (i == j) continue;
int pair_sum = data[i] + data[j];
if (pair_sum == target_s) ++count;
}
}
std::cout << count;
The above code counts each pair twice, once as <a,b> and once as <b,a>. Not sure if the requirement only wants <a,b> counted once in this case.
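If each unordered pair should be counted only once, which is what the expected output of 2 in the question's example suggests, the inner loop can start at i + 1. This is a sketch under that assumption: the function name countPairs is made up, and it is still O(n^2), so it only illustrates the counting rule.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts index pairs (i, j) with i < j whose values sum to target.
long long countPairs(const std::vector<int>& data, int target) {
    long long count = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Starting at i + 1 skips i == j and visits each unordered pair once.
        for (std::size_t j = i + 1; j < data.size(); ++j) {
            // Widen before adding so large inputs cannot overflow int.
            if (static_cast<long long>(data[i]) + data[j] == target) ++count;
        }
    }
    return count;
}
```

For the example input (s = 5 and the numbers 1, 4, 3, 6, -1, 5) this returns 2, for the pairs (1,4) and (6,-1).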

As always with this kind of question, selecting an appropriate algorithm will improve your solution. Writing "better" C++ code will almost never help. Also, brute forcing is almost never a solution for this kind of problem.
With the approach described below (which was of course not invented by me), we need just one std::map (or even better, a std::unordered_map) and one for loop. We do not need to store the read values in an additional std::vector or the like. So, we get low memory consumption and fast computation.
Approach. Any time, after reading a value, we will calculate the delta from the desired sum.
If we look at the required condition that the current value and some previously read value should add up to the desired sum, we can write the following mathematical equations:
currentValue + previouslyReadValue = desiredSum
or
desiredSum - currentValue = previouslyReadValue
or with
delta = desiredSum - currentValue
-->
delta == previouslyReadValue
So, we need to look at the already read values and, if any of them equal the delta (because then they would add up to the desired sum), add their count of occurrences to the resulting count of valid pairs.
The already read values and their count of occurrences will be stored in a std::unordered_map.
All this results in a solution of roughly 10 lines:
#include <iostream>
#include <unordered_map>
int main() {
// Initialize our working variables
int numberOfValues{}, desiredSum{}, currentValue{}, resultingCount{};
// Read basic parameters. Desired sum and overall number of input values.
std::cin >> desiredSum >> numberOfValues;
// Here, we will store all values and their count of occurrence
std::unordered_map<int, int> valuesAndCount{};
// Read all values and operate on them
for (int i{}; i < numberOfValues; ++i) {
std::cin >> currentValue; // Read from cin
const int delta{ desiredSum - currentValue }; // Calculate the delta from the desired sum
// Check whether the calculated delta is already in the map. Because, if the delta and the
// current value sum up to our desired sum, then we found a valid pair.
if (valuesAndCount.find(delta) != valuesAndCount.end())
// Increase the resulting count by the number of times that this delta value has already been seen
resultingCount += valuesAndCount[delta];
// Nothing special, just count the occurrence of this value.
valuesAndCount[currentValue]++;
}
std::cout << resultingCount << '\n';
return 0;
}

Finding if a string is contained in another string without "find" in c++

I wrote this program to find if a string is contained in another string (see paragraph below this, I tried to explain what I want it to do). When I test it, sometimes it works, most of the times it gives me the error "String subscript out of range". I'm very new to C++, I'd appreciate someone to tell me how can I improve this code or why I'm being dumb, because I really don't get why it doesn't work.
what i want this to do is find if string one can be found in string way;
so i want it to check for every letter of string way if the letter [i] is equal to the first letter of the string one (way[i+0]==one[0]),
and way[i+1]==one[1] and so on for all letters in one.
so for example way = abankjve and one = ank
it takes the first letter in way (a) and gets the first letter in one(a). the're equal. but we see that way[0+1] is not equal to one[1]. so o can't be true.
it goes on like this till it gets to way[2]=a. way[2+0]=one[0]. o is true. then it checks way[2+1]=one[1]. true! then it checks way[2+2]=one[2]. true! then
one is contained in way.
#include <iostream>
using namespace std;
int main()
{
string way, one;
bool o=false;
cin >> way;
cin >> one;
for (int i = 0; i < way.size(); i++)
{
for (int k = 0; k < one.size(); k++)
{
if (way[i + k]==one[k])
{
o = true;
}
}
}
cout << o << endl;
}

If you think about it, way[i+k] can go out of range.
Say way has length 5 and one has length 3.
Then i + k ranges over 0 <= i + k <= 6, which exceeds the largest valid index of way (4).
Change the first for loop from for (int i = 0; i < way.size(); i++) to
for (int i = 0; i <= static_cast<int>(way.size()) - static_cast<int>(one.size()); i++)
Note that I've cast both sizes to int. size() returns an unsigned type (size_t), so if one is longer than way, an unsigned subtraction would wrap around to a huge value instead of going negative.
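Fixing the bounds removes the out-of-range access, but the original inner loop still sets o to true as soon as any single character matches. A minimal sketch of the check described in the question, where every character of one has to match at the same starting position, could look like this. The function name contains is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `one` occurs somewhere inside `way`.
bool contains(const std::string& way, const std::string& one) {
    if (one.size() > way.size()) return false;  // cannot fit, and avoids unsigned wrap-around
    for (std::size_t i = 0; i + one.size() <= way.size(); ++i) {
        std::size_t k = 0;
        // Advance k only while the characters keep matching.
        while (k < one.size() && way[i + k] == one[k]) ++k;
        if (k == one.size()) return true;  // all characters of `one` matched at position i
    }
    return false;
}
```

With way = abankjve and one = ank from the question, the first a fails at its second character and the match is found starting at index 2.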

#include <iostream>
#include <string>
int main()
{
std::string way, one;
std::cin >> way;
std::cin >> one;
bool found{};
for (size_t i = 0; i + one.size() <= way.size(); i++) // written as an addition so the unsigned sizes cannot wrap around when one is longer than way
{
if(one == way.substr(i, one.size())) {
found = true;
break;
}
}
std::cout << found;
}

Cyclical vector - Finding the least possible 'cost' (From CodeChef)

This is a question from Codechef but please bear with me.
https://www.codechef.com/ZCOPRAC/problems/ZCO12004
The contest is for the preparation of the Zonal Computing Olympiad held in India, so it's not a competitive contest from which I'd earn anything. I just need a little help to see what is wrong with my code, because I have a feeling I've overlooked something big and stupid. :P
The problem basically states:
Imagine there is a vector or array such that the last element is
linked to the first one. Find the lowest possible sum from adding at
least one of each adjacent pairs of elements. (refer to link please)
So answer for {1,2,1,2,2} output would be 4 by adding 1+1+2.
Here is my solution:
Basically what it does is that it iterates backwards, from the end of the vector to the beginning, and stores the lowest possible sum that can be achieved from that vector onwards, in vector M. Done using dynamic programming, basically.
The first two elements of M are the possible answers. Then I do some checks to see which is possible. If M[1] is less than M[0] then the last element of the array/vector should have been included in the sum calculated in M[1].
#include <algorithm>
#include <iostream>
#include <vector>
#define print(arr) for(auto pos = arr.begin(); pos != arr.end(); ++pos) cout << *pos << " "; cout << endl;
typedef long long int ll;
using namespace std;
int main() {
int N;
ll x;
cin >> N;
vector <ll> A;
vector <ll> M(N+2);
fill(M.begin(),M.end(),0);
for (int i = 0; i < N; i++) {
cin >> x;
A.push_back(x);
}
for (int i = N-1; i >= 0; i--) {
M[i] = A[i]+*min_element(M.begin()+i+1, M.begin()+i+3);
}
if (M[0] <= M[1]) cout << M[0] << endl;
else if (M[1] < M[0]) {
if (M[N-1] <= (M[N-2])) cout << M[1] << endl;
else cout << M[0] << endl;
}
}
However, I could not pass 2 of the test cases in subtask 2. I think the last part of my code is incorrect. Any idea what I could be doing wrong? Either that, or I have misunderstood the question. The term "adjacent pairs" is sort of ambiguous. So if there are 4 numbers 3,4,5,6 does adjacent pairs mean adjacent pairs to be {(3,4) (4,5) (5,6) (6,3)} or {either (3,4) and (5,6) or (4,5) and (6,3)}? My code considers the former.
EDIT:
Thanks a lot @User_Targaryen, you cleared up some doubts about this question! Basically my implementation was the same as yours, as my idea behind using dynamic programming was the same. Only that in this case my M (your dp) was the reverse of yours. Anyway I got AC! :) (I had left some silly debugging statements and was wondering for 15 mins what went wrong xD) Updated solution:
#include <algorithm>
#include <iostream>
#include <vector>
#define print(arr) for(auto pos = arr.begin(); pos != arr.end(); ++pos) cout << *pos << " "; cout << endl;
typedef long long int ll;
using namespace std;
int main() {
int N;
ll x, sum = 0;
cin >> N;
vector <ll> A;
vector <ll> M(N+2);
fill(M.begin(),M.end(),0);
for (int i = 0; i < N; i++) {
cin >> x;
A.push_back(x);
}
for (int i = N-1; i >= 0; i--) {
M[i] = A[i]+*min_element(M.begin()+i+1, M.begin()+i+3);
}
//print(M);
reverse(A.begin(), A.end());
vector <ll> M2(N+2);
fill(M2.begin(),M2.end(),0);
for (int i = N-1; i >= 0; i--) {
M2[i] = A[i]+*min_element(M2.begin()+i+1, M2.begin()+i+3);
}
//print(M2);
cout << min(M[0], M2[0]) << endl;
}

I am attaching my accepted solution here:
#include<algorithm> // for min
#include<iostream>
using namespace std;
int main()
{
int i,j,k,n;
cin>>n;
int a[n],dp1[n],dp2[n];
int ans;
for(i=0;i<n;i++)
{
cin>>a[i];
dp1[i]=0;
dp2[i]=0;
}
if(n <= 2)
cout<< min(a[0],a[1]);
else{
i = 2;
dp1[0] = a[0];
dp1[1] = a[1];
while (i < n){
dp1[i] = a[i] + min(dp1[i-1],dp1[i-2]);
i = i + 1;
}
dp2[0] = a[n-1];
dp2[1] = a[n-2];
i = n-3;
j = 2;
while(i >= 0){
dp2[j] = a[i] + min(dp2[j-1],dp2[j-2]);
i = i - 1;
j = j + 1;
}
ans = min(dp1[n-1], dp2[n-1]);
cout<<ans;
}
return 0;
}
dp1[i] means the most optimal solution for a[0..i] that includes the i-th element
dp2[j] means the same thing for the reversed array: the most optimal solution for the last j+1 elements that includes the j-th element counted from the end
dp1[] is calculated from left to right, while dp2[] is calculated from right to left
The minimum of dp1[n-1] and dp2[n-1] is the final answer.
I did your homework!
Edit: @Alex: Dynamic programming is something that is very difficult to teach. It comes naturally with some practice. Let us consider my solution (forget about your solution for some time):
dp1[n-1] means that I definitely included the last element in the solution, and the constraint that at least one of any 2 adjacent elements needs to be picked is satisfied because it always follows:
dp1[i] = a[i] + min(dp1[i-1],dp1[i-2]);
dp2[n-1] means that I definitely included the first element in the solution, and the constraint that at least one of any 2 adjacent elements needs to be picked is satisfied as well.
So, the minimum of the above two, will give me the final result.
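The two-pass idea above can be sketched with std::vector and two rolling values instead of full dp arrays. This is an illustrative rewrite, not the accepted submission: the names costWithLastChosen and minCircularCost are made up, and treating a single element as always chosen is an assumption about the n = 1 case.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: cheapest selection over a[0..n-1] in which every adjacent
// pair has a chosen element and the last element is chosen.
long long costWithLastChosen(const std::vector<long long>& a) {
    const std::size_t n = a.size();
    if (n == 0) return 0;
    if (n == 1) return a[0];
    long long prev2 = a[0];  // plays the role of dp[i-2]
    long long prev1 = a[1];  // plays the role of dp[i-1]
    for (std::size_t i = 2; i < n; ++i) {
        long long cur = a[i] + std::min(prev1, prev2);  // dp[i] = a[i] + min(dp[i-1], dp[i-2])
        prev2 = prev1;
        prev1 = cur;
    }
    return prev1;
}

// Any valid circular selection contains the first or the last element, so take
// the cheaper of "last chosen" and "first chosen" (the latter via the reversed array).
long long minCircularCost(std::vector<long long> a) {
    const long long lastChosen = costWithLastChosen(a);
    std::reverse(a.begin(), a.end());
    const long long firstChosen = costWithLastChosen(a);
    return std::min(lastChosen, firstChosen);
}
```

For {1,2,1,2,2} from the question this gives 4, and for {3,4,5,6} it gives 8 (choosing 3 and 5).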

The idea in your M[i] array is "the minimum cost for a solution, assuming the index i is included in it".
The condition if (M[0] <= M[1]) means "if including index 0 is better than not including it, done".
If this condition doesn't hold, then, first of all, the check if (M[1] < M[0]) is superfluous - remove it. It won't fix any bugs, but will at least reduce confusion.
If the condition is false, you should output M[1], but only if it corresponds to a valid solution. That is, since index 0 is not chosen, the last index should be chosen. However, with your data structure it's impossible to know whether M[1] corresponds to a solution that chose last index - this information is lost.
To fix this, consider building two arrays - add e.g. an array L whose meaning is "the minimum cost for a solution, assuming the index i is included in it, and also index N-1 is included in it".
Then, at the end of your program, output the minimum of M[0] and L[1].

ACM 1113 - Multiple Morse Matches

I'm trying to solve the ACM 1113 (http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=3554) and I think I got a valid solution (at least the output seems to be ok for multiple entries that I've tried), the only problem is my solution is being rejected by the submission system and I don't know why since it doesn't take that long to run on my machine, could anyone please help me?
/*
* Multiple morse matches
*/
#include <iostream>
#include <vector>
#include <string>
#include <map>
using namespace std;
std::map<char,string> decodeToMorse;
string toMorse(string w){
string morse = "";
for(int i = 0; i < w.size(); i++){
morse = morse + decodeToMorse[w[i]];
}
return morse;
}
int findPossibleTr( string morse, vector<string> dictMorse, vector<string> dictWords, int index){
int count = 0;
for(int i = 0; i < dictMorse.size(); i++){
if(morse.compare( index, dictMorse[i].size(), dictMorse[i]) == 0){
//cout<<"Found " << dictWords[i] << " on index "<<index<<endl;
if(index+dictMorse[i].size()>=morse.size()){
//cout<<"Adding one for "<< dictWords[i]<<endl;
count+=1;
//return 1;
}else{
count += findPossibleTr(morse, dictMorse, dictWords, index+dictMorse[i].size());
}
}
}
return count;
}
int main(){
int ncases;
cin>>ncases;
decodeToMorse['A'] = ".-";
decodeToMorse['B'] = "-...";
decodeToMorse['C'] = "-.-.";
decodeToMorse['D'] = "-..";
decodeToMorse['E'] = ".";
decodeToMorse['F'] = "..-.";
decodeToMorse['G'] = "--.";
decodeToMorse['H'] = "....";
decodeToMorse['I'] = "..";
decodeToMorse['J'] = ".---";
decodeToMorse['K'] = "-.-";
decodeToMorse['L'] = ".-..";
decodeToMorse['M'] = "--";
decodeToMorse['N'] = "-.";
decodeToMorse['O'] = "---";
decodeToMorse['P'] = ".--.";
decodeToMorse['Q'] = "--.-";
decodeToMorse['R'] = ".-.";
decodeToMorse['S'] = "...";
decodeToMorse['T'] = "-";
decodeToMorse['U'] = "..-";
decodeToMorse['V'] = "...-";
decodeToMorse['W'] = ".--";
decodeToMorse['X'] = "-..-";
decodeToMorse['Y'] = "-.--";
decodeToMorse['Z'] = "--..";
for(int i = 0; i < ncases; i++){
vector<string> dictMorse;
vector<string> dictWords;
string morse;
cin >> morse;
int ndict;
cin >> ndict;
for(int j = 0; j < ndict; j++){
string dictw;
cin >> dictw;
dictMorse.push_back(toMorse(dictw));
dictWords.push_back(dictw);
}
cout<<findPossibleTr(morse,dictMorse, dictWords,0)<<endl;
if(ncases != 1 && i != ncases-1)
cout<<endl;
}
}
I've tried the following input:
3
.---.-.---...
7
AT
ATC
COS
OS
A
T
C
.---.--.-.-.-.---...-.---.
6
AT
TACK
TICK
ATTACK
DAWN
DUSK
.........
5
E
EE
EEE
EEEE
EEEEE
And I get the following output (as expected):
5
2
236
Only problem is that when I submit it to the judge system it says the algorithm spends more than its maximum time limit (3s). Any ideas?

Your algorithm runs out of time because it performs an exhaustive search for all distinct phrases within the dictionary that match the given Morse code. It tries every single possible concatenation of the words in the dictionary.
While this does give the correct answer, it takes time exponential in both the length of the given Morse string and the number of words in the dictionary. The question does actually mention that the number of distinct phrases is at most 2 billion.
Here's a simple test case that demonstrates this behavior:
1
... // 1000 dots
2
E
EE
The correct answer would be over 1 billion in this case, and an exhaustive search would have to enumerate all of them.
A way to solve this problem would be to use memoization, a dynamic programming technique. The key observation here is that a given suffix of the Morse string will always match the same number of distinct phrases.
Side note: in your original code, you passed morse, dictMorse and dictWords by value to your backtracking function. This results in the string and the two vectors being copied at every invocation of the recursive function, which is unnecessary. You can pass by reference, or (since this is in a competitive programming context where the guidelines of good code architecture can be bent) just declare them in global scope. I opted for the former here:
int findPossibleTr( const string &morse, const vector<string> &dictMorse, const vector<string> &dictWords, vector<int> &memo, int index ) {
if (memo[index] != -1) return memo[index];
int count = 0;
/* ... */
return memo[index] = count;
}
And in your initialization:
/* ... */
vector<int> memo(morse.size(), -1); // -1 here is a signal that the values are yet unknown
cout << findPossibleTr(morse, dictMorse, dictWords, memo, 0) << endl;
/* ... */
This spits out the answer 1318412525 to the above test case almost instantly.
For each of the T test cases, findPossibleTr is computed only once for each of the M suffixes of the Morse string. Each computation considers each of the N words once, with the comparison taking time linear in the length K of the word. In general, this takes O(TMNK) time which, depending on the input, might take too long. However, since matches seem to be relatively sparse in Morse code, it should run in time.
A more sophisticated approach would be to make use of a data structure such as a trie to speed up the string matching process, taking O(TMN) time in total.
Another note: it is not actually necessary for decodeToMorse to be a map. It can simply be an array or a vector of 26 strings. The string corresponding to character c is then decodeToMorse[c - 'A'].
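For reference, here is a self-contained sketch of the memoized recursion outlined above, with the elided loop body filled in. It differs from the snippets in a few ways that are my own assumptions: it takes the dictionary already converted to Morse, drops the unused dictWords parameter, uses long long for the counts, and the names countFrom and countPhrases are made up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of ways to split morse[index..] into dictionary words.
long long countFrom(const std::string& morse,
                    const std::vector<std::string>& dictMorse,
                    std::vector<long long>& memo,
                    std::size_t index) {
    if (index == morse.size()) return 1;        // whole string consumed: one valid phrase
    if (memo[index] != -1) return memo[index];  // this suffix was already solved
    long long count = 0;
    for (const std::string& word : dictMorse) {
        // compare() yields 0 only if the word matches exactly at this position;
        // a word longer than the remaining suffix compares unequal.
        if (!word.empty() && morse.compare(index, word.size(), word) == 0) {
            count += countFrom(morse, dictMorse, memo, index + word.size());
        }
    }
    return memo[index] = count;
}

// Hypothetical entry point: -1 in the memo marks suffixes that are not yet computed.
long long countPhrases(const std::string& morse, const std::vector<std::string>& dictMorse) {
    std::vector<long long> memo(morse.size(), -1);
    return countFrom(morse, dictMorse, memo, 0);
}
```

On the first and third test cases from the question (with the words converted to Morse) this returns 5 and 236, matching the output shown there.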

I'm writing up my process for this situation, hope it helps.
I would first analyse the algorithm to see if it's fast enough for the problem. For example, if the input n can be as large as 10^6 and the time limit is 1 second, then an O(n^2) algorithm is not going to make it.
Then, I would test against an input as 'heavy' as possible for the problem statement (max number of test cases with max input length or whatever). If it exceeds the time limit, there might be something in the code that can be optimized to get a lower constant factor. It's possible that after all the hard optimizations it's still not fast enough. In that case I would go back to step #1.
After making sure the algorithm is ok, I would generate random inputs and run a few rounds to see if there are any peculiar cases the algorithm doesn't cover yet.

There are three things I'd suggest doing to improve the performance of this code.
Firstly, all the arguments to toMorse and findPossibleTr are being passed by value. This will make a copy, which for objects like std::string and std::vector will be doing memory allocations. This will be quite costly, especially for the recursive calls to findPossibleTr. To fix it, change the function declarations to take const references, like so:
string toMorse(const string& w)
int findPossibleTr( const string& morse, const vector<string>& dictMorse, const vector<string>& dictWords, int index)
Secondly, string concatenation in toMorse is doing allocations making and throwing away lots of strings. Using a std::stringstream will speed that up:
#include <sstream>
string toMorse(const string& w){
stringstream morse;
for(int i = 0; i < w.size(); i++){
morse << decodeToMorse[w[i]];
}
return morse.str();
}
Finally, we can reuse the vectors inside the loop in main, instead of destructing the old ones and creating new ones each iteration by using clear().
// ...
vector<string> dictMorse;
vector<string> dictWords;
for(size_t i = 0; i < ncases; i++){
dictMorse.clear();
dictWords.clear();
string morse;
cin >> morse;
// ...
Putting it all together on my machine gives me a 30% speed up, from 0.006s to 0.004s on your test case. Not too bad. As a bonus, if you are on an Intel platform, Intel's optimization manual says that unsigned integers are faster than signed integers, so I've switched all ints to size_ts, which also fixes up some warnings. The complete code now becomes
/*
* Multiple morse matches
* Filipe C
*/
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <map>
using namespace std;
std::map<char,string> decodeToMorse;
string toMorse(const string& w){
stringstream morse;
for(size_t i = 0; i < w.size(); i++){
morse << decodeToMorse[w[i]];
}
return morse.str();
}
size_t findPossibleTr( const string& morse, const vector<string>& dictMorse, const vector<string>& dictWords, size_t index){
size_t count = 0;
for(size_t i = 0; i < dictMorse.size(); i++){
if(morse.compare( index, dictMorse[i].size(), dictMorse[i]) == 0){
//cout<<"Found " << dictWords[i] << " on index "<<index<<endl;
if(index+dictMorse[i].size()>=morse.size()){
//cout<<"Adding one for "<< dictWords[i]<<endl;
count+=1;
//return 1;
}else{
count += findPossibleTr(morse, dictMorse, dictWords, index+dictMorse[i].size());
}
}
}
return count;
}
int main(){
size_t ncases;
cin>>ncases;
decodeToMorse['A'] = ".-";
decodeToMorse['B'] = "-...";
decodeToMorse['C'] = "-.-.";
decodeToMorse['D'] = "-..";
decodeToMorse['E'] = ".";
decodeToMorse['F'] = "..-.";
decodeToMorse['G'] = "--.";
decodeToMorse['H'] = "....";
decodeToMorse['I'] = "..";
decodeToMorse['J'] = ".---";
decodeToMorse['K'] = "-.-";
decodeToMorse['L'] = ".-..";
decodeToMorse['M'] = "--";
decodeToMorse['N'] = "-.";
decodeToMorse['O'] = "---";
decodeToMorse['P'] = ".--.";
decodeToMorse['Q'] = "--.-";
decodeToMorse['R'] = ".-.";
decodeToMorse['S'] = "...";
decodeToMorse['T'] = "-";
decodeToMorse['U'] = "..-";
decodeToMorse['V'] = "...-";
decodeToMorse['W'] = ".--";
decodeToMorse['X'] = "-..-";
decodeToMorse['Y'] = "-.--";
decodeToMorse['Z'] = "--..";
vector<string> dictMorse;
vector<string> dictWords;
for(size_t i = 0; i < ncases; i++){
dictMorse.clear();
dictWords.clear();
string morse;
cin >> morse;
size_t ndict;
cin >> ndict;
for(size_t j = 0; j < ndict; j++){
string dictw;
cin >> dictw;
dictMorse.push_back(toMorse(dictw));
dictWords.push_back(dictw);
}
cout<<findPossibleTr(morse,dictMorse, dictWords,0)<<endl;
if(ncases != 1 && i != ncases-1)
cout<<endl;
}
}
