Finding Randomly Ordered Substring in String


For the first part of the question, we are given a long input string and have to count the occurrences of a search string in it.
For example:
Input = AXBHAAGHXAXBH
Find = AXBH
Output = 2
This can be done by calling std::string::find in a loop, e.g.:
#include <string>
#include <iostream>
int main()
{
    int occurrences = 0;
    std::string::size_type pos = 0;
    std::string inputz = "AXBHAAGHXAXBH";
    std::string target = "AXBH";
    while ((pos = inputz.find(target, pos)) != std::string::npos) {
        ++occurrences;
        pos += target.length();
    }
    std::cout << occurrences << std::endl;
}
However, I am not sure how to do the second part, which has to take the random ordering into account:
Random ordering means any permutation of the search string. Important note: the letters of an occurrence are always contiguous, but they can appear in any order.
I do not want to enumerate the cases by hand, because some search strings are too long (e.g. AXBHNMB would have too many permutations to consider); I would prefer a more general approach.
For example, if the search string is AXBH, then AXHB also counts as an occurrence.
A proper example:
Input = AXBHAAGHXAXBHABHXNBMNAHBX (the permuted occurrences are ABHX and AHBX)
Find = AXBH
Output = 4
I would prefer code for the given example, with an explanation of (or a link explaining) any new function you use.

You are correct that checking all permutations would take a lot of time. Fortunately we don't need to do that. What we can do is store the string to find in a std::map<char, int> or std::unordered_map<char, int>, then take substrings of the same length from the string we search through, convert each of those into the same kind of map, and check whether the two maps are equal. This lets us compare without caring about the order; it just makes sure we have the correct count of each character. So we would have something like:
#include <iostream>
#include <string>
#include <unordered_map>
int main()
{
    std::string source = "AHAZHBCHZCAHAHZEHHAAZHBZBZHHAAZAAHHZBAAAAHHHHZZBEWWAAHHZ ";
    std::string string_to_find = "AAHHZ";
    int counter = 0;
    // build map of the characters to find
    std::unordered_map<char, int> to_find;
    for (auto e : string_to_find)
        ++to_find[e];
    // loop through the string, grabbing string_to_find-sized chunks and comparing;
    // the condition includes the last full window and cannot underflow
    for (std::size_t i = 0; i + string_to_find.size() <= source.size();)
    {
        std::unordered_map<char, int> part;
        for (std::size_t j = i; j < string_to_find.size() + i; ++j)
            ++part[source[j]];
        if (to_find == part)
        {
            ++counter;
            i += string_to_find.size(); // skip past the match (non-overlapping count)
        }
        else
        {
            ++i;
        }
    }
    std::cout << counter;
}

A naive approach is to slide a window of the target's length over the given string.
For each window, sort its characters and check whether the result equals the sorted target string.
#include <string>
#include <iostream>
#include <algorithm>
int main()
{
    int occurrences = 0;
    std::string inputz = "AXBHAAGHXAXBHABHXNBMNAHBX";
    std::string target = "AXBH";
    std::sort(target.begin(), target.end()); // canonical (sorted) form of the target
    int inputz_length = inputz.length();
    int target_length = target.length();
    for (int i = 0; i <= inputz_length - target_length; i++)
    {
        std::string sub = inputz.substr(i, target_length);
        std::sort(sub.begin(), sub.end()); // canonical form of the current window
        if (target.compare(sub) == 0)
        {
            std::cout << i << "-->" << target << "-->" << sub << std::endl;
            occurrences++;
            // together with the loop's i++, this moves to the first index after the match
            i += target_length - 1;
        }
    }
    std::cout << occurrences << std::endl;
    return 0;
}
Output:
0-->ABHX-->ABHX
9-->ABHX-->ABHX
13-->ABHX-->ABHX
21-->ABHX-->ABHX
4
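Extra function: std::sort from the <algorithm> header.
Time complexity: O(n · m log m), where n is the input length and m the target length, since each of the roughly n windows is copied and sorted.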
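As a variation on the sorting approach above, std::is_permutation (C++11, from <algorithm>) can test each window against the target without copying and sorting it. This is a minimal sketch; the function name countPermutedOccurrences is hypothetical. Note that std::is_permutation is O(m²) in the worst case per window, so this is mainly a readability change rather than an asymptotic improvement.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Counts non-overlapping windows of `input` that are permutations of `target`.
int countPermutedOccurrences(const std::string& input, const std::string& target) {
    const std::size_t m = target.size();
    if (m == 0) return 0;
    int occurrences = 0;
    for (std::size_t i = 0; i + m <= input.size(); ) {
        // Compares the window [i, i + m) with target, ignoring order.
        if (std::is_permutation(input.begin() + i, input.begin() + i + m, target.begin())) {
            ++occurrences;
            i += m;  // continue right after the matched window
        } else {
            ++i;
        }
    }
    return occurrences;
}
```

For the question's input, countPermutedOccurrences("AXBHAAGHXAXBHABHXNBMNAHBX", "AXBH") returns 4.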

One solution is to find a canonical representation for both the search string and each substring. Two fast approaches are possible:
1) Sort the substring.
2) Calculate a histogram of letters.
Option 2 can be updated incrementally: increment the histogram bins for letters entering the search window and decrement the bins for letters leaving it.
While updating a bin, one can also check whether this particular update toggles the overall matching. Here matches counts the bins, over the whole alphabet (including letters absent from the target), whose window count equals the target count; with an empty window it starts at the number of letters that do not occur in the target:
// adding the incoming letter
if (h[incoming] == target[incoming]) matches--;
if (++h[incoming] == target[incoming]) matches++;
// subtracting the outgoing letter
if (h[outgoing] == target[outgoing]) matches--;
if (--h[outgoing] == target[outgoing]) matches++;
// once the window is full
if (matches == alphabet_size) occurrences++;
Then the overall complexity becomes O(n).
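A runnable sketch of the histogram idea, assuming 8-bit characters (so the alphabet is all 256 byte values). The function name countPermutations and the nonOverlapping flag are hypothetical: the plain sliding window counts overlapping windows (6 for the question's input, because XBHA at indices 1 and 10 also qualifies), while the question's expected output of 4 corresponds to restarting the window after each match.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Sliding-window letter histogram. `matches` counts the bins (all 256 byte
// values) whose count in the window equals the count in `pattern`; a full
// window is a permutation of `pattern` exactly when all 256 bins agree.
int countPermutations(const std::string& text, const std::string& pattern,
                      bool nonOverlapping) {
    constexpr int kAlphabet = 256;
    const std::size_t m = pattern.size();
    if (m == 0 || text.size() < m) return 0;

    std::array<int, kAlphabet> target{};  // histogram of the pattern
    for (unsigned char c : pattern) ++target[c];

    std::array<int, kAlphabet> h{};       // histogram of the current window
    int matches = 0;
    std::size_t window = 0;               // letters currently in the window

    // Empty window: a bin matches iff the pattern does not contain that letter.
    auto reset = [&] {
        h.fill(0);
        window = 0;
        matches = 0;
        for (int b = 0; b < kAlphabet; ++b)
            if (target[b] == 0) ++matches;
    };
    reset();

    int occurrences = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char in = static_cast<unsigned char>(text[i]);
        if (h[in] == target[in]) --matches;    // bin may stop matching...
        if (++h[in] == target[in]) ++matches;  // ...or start matching
        ++window;

        if (window > m) {                      // drop the letter that left the window
            const unsigned char out = static_cast<unsigned char>(text[i - m]);
            if (h[out] == target[out]) --matches;
            if (--h[out] == target[out]) ++matches;
            --window;
        }

        if (window == m && matches == kAlphabet) {
            ++occurrences;
            if (nonOverlapping) reset();       // next match must start after this one
        }
    }
    return occurrences;
}
```

Each letter enters and leaves the window at most once, so the work is linear in the text length for a fixed alphabet (plus an O(256) reset per match in the non-overlapping mode).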

Related

C++ String comparison not working in vector iteration for word counting algorithm

I am new to programming in C++ and am currently trying to write a program that counts how often each word occurs in the contents of a .txt file.
My issue right now is that when I use a vector to store the words and count duplicates by comparison, it sometimes skips some words.
for(int i = 0;i<words.size();i++) { //Using nested for loops to counts the words
    finalWords.push_back(words[i]);//Words that are unique will be counted
    int counts = 1;
    for(int j = i + 1; j<words.size();j++) {
        if(words[i] == words[j]) {
            counts++;
            words.erase(words.begin() + j); //Removing the words that is not unique
        }
        continue;
    }
    wordCount.push_back(counts);
}
In my full code, words is a string vector filled with words (some of them repeated), finalWords is an empty string vector, and wordCount is an int vector that stores the count of each word in finalWords. I thought the problem was non-printing characters such as newlines, but when I checked the input, the words the comparison failed on were not the ones near line breaks. Is there something I missed? If so, what do I need to do to fix it?
Thank you in advance!

When you erase the element at index j, the next element moves to index j, not j+1, so incrementing j afterwards skips it.
The loop should look more like this:
for(int j = i + 1; j<words.size(); ) { // no increment here
    if (words[i] == words[j]) {
        counts++;
        words.erase(words.begin() + j);
        // no increment here: the next element has moved to index j
    } else {
        ++j; // increment only when nothing was erased
    }
}
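For reference, here is the question's nested-loop approach with this fix applied, wrapped in a complete function. It is a minimal sketch: the name count_words and its signature are hypothetical, and it takes words by value so the caller's vector is not modified by the erasing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Fills finalWords with each distinct word (in first-seen order) and
// wordCount with how often it occurs. Duplicates are erased as they are counted.
void count_words(std::vector<std::string> words,
                 std::vector<std::string>& finalWords,
                 std::vector<int>& wordCount) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        finalWords.push_back(words[i]);
        int counts = 1;
        for (std::size_t j = i + 1; j < words.size(); ) {
            if (words[i] == words[j]) {
                ++counts;
                words.erase(words.begin() + j);  // next element now sits at index j
            } else {
                ++j;                             // advance only if nothing was erased
            }
        }
        wordCount.push_back(counts);
    }
}
```

This is still quadratic (and erase shifts elements), which is why the hash-map version is preferable for large inputs.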
However, as others mentioned, your code is unnecessarily complicated and inefficient.
You can use a std::unordered_map to count frequencies:
std::unordered_map<std::string, unsigned> freq;
for (const auto& word : words) {
    ++freq[word];
}
for (const auto& f : freq) {
    std::cout << f.first << " appears " << f.second << " times\n";
}

Finding the longest palindromic substring (suboptimally)

I'm working on a coding exercise that asks me to find the longest palindromic substring when given an input string. I know my solution isn't optimal in terms of efficiency but I'm trying to get a correct solution first.
So far this is what I have:
#include <string>
#include <algorithm>
#include <iostream>
class Solution {
public:
    string longestPalindrome(string s) {
        string currentLongest = "";
        for (int i = 0; i < s.length(); i++)
        {
            for (int j = i; j <= s.length(); j++)
            {
                string testcase = s.substr(i, j);
                string reversestring = testcase;
                std::reverse(reversestring.begin(), reversestring.end());
                if (testcase == reversestring)
                {
                    if (testcase.length() > currentLongest.length())
                    {
                        currentLongest = testcase;
                    }
                }
            }
        }
        return currentLongest;
    }
};
It works for several test cases, but also fails on a lot of other test cases. I suspect something is going wrong in the most inner loop of my code, but am not sure exactly what. I'm attempting to generate all possible substrings and then check if they are palindromes by comparing them with their reverse; after I establish they are a palindrome I check if it's longer than the current longest palindrome I have found.

It fails because you are not trying all the possible substrings.
In C++, substr takes two parameters: the first is the starting index and the second is the length of the substring, not an end index.
Your program treats j as an end index, so for a start index i it never tries lengths shorter than i; for example, it never checks the substring that starts at index 4 and has length 3.
In the second for loop, treat j as the length and start it from 1 instead of i (or equivalently call s.substr(i, j - i)).
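A minimal sketch of the corrected brute force, assuming the fix described above (the second argument of substr is a length). It is written as a free function rather than the question's class, and it is still O(n³), so it only addresses correctness, not speed.

```cpp
#include <cstddef>
#include <string>

// Tries every (start, length) pair and keeps the longest palindrome found;
// on ties, the earliest one wins.
std::string longestPalindrome(const std::string& s) {
    std::string currentLongest;
    for (std::size_t i = 0; i < s.length(); ++i) {
        for (std::size_t len = 1; i + len <= s.length(); ++len) {
            std::string testcase = s.substr(i, len);  // start index, then length
            std::string reversestring(testcase.rbegin(), testcase.rend());
            if (testcase == reversestring && testcase.length() > currentLongest.length())
                currentLongest = testcase;
        }
    }
    return currentLongest;
}
```

For example, longestPalindrome("babad") returns "bab" and longestPalindrome("cbbd") returns "bb".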

ACM 1113 - Multiple Morse Matches

I'm trying to solve ACM 1113 (http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&page=show_problem&problem=3554) and I think I have a valid solution (at least the output seems to be correct for several inputs I've tried). The only problem is that my solution is rejected by the submission system, and I don't know why, since it doesn't take long to run on my machine. Could anyone please help me?
/*
 * Multiple morse matches
 */
#include <iostream>
#include <vector>
#include <string>
#include <map>
using namespace std;
std::map<char,string> decodeToMorse;
string toMorse(string w){
    string morse = "";
    for(int i = 0; i < w.size(); i++){
        morse = morse + decodeToMorse[w[i]];
    }
    return morse;
}
int findPossibleTr( string morse, vector<string> dictMorse, vector<string> dictWords, int index){
    int count = 0;
    for(int i = 0; i < dictMorse.size(); i++){
        if(morse.compare( index, dictMorse[i].size(), dictMorse[i]) == 0){
            //cout<<"Found " << dictWords[i] << " on index "<<index<<endl;
            if(index+dictMorse[i].size()>=morse.size()){
                //cout<<"Adding one for "<< dictWords[i]<<endl;
                count+=1;
                //return 1;
            }else{
                count += findPossibleTr(morse, dictMorse, dictWords, index+dictMorse[i].size());
            }
        }
    }
    return count;
}
int main(){
    int ncases;
    cin>>ncases;
    decodeToMorse['A'] = ".-";
    decodeToMorse['B'] = "-...";
    decodeToMorse['C'] = "-.-.";
    decodeToMorse['D'] = "-..";
    decodeToMorse['E'] = ".";
    decodeToMorse['F'] = "..-.";
    decodeToMorse['G'] = "--.";
    decodeToMorse['H'] = "....";
    decodeToMorse['I'] = "..";
    decodeToMorse['J'] = ".---";
    decodeToMorse['K'] = "-.-";
    decodeToMorse['L'] = ".-..";
    decodeToMorse['M'] = "--";
    decodeToMorse['N'] = "-.";
    decodeToMorse['O'] = "---";
    decodeToMorse['P'] = ".--.";
    decodeToMorse['Q'] = "--.-";
    decodeToMorse['R'] = ".-.";
    decodeToMorse['S'] = "...";
    decodeToMorse['T'] = "-";
    decodeToMorse['U'] = "..-";
    decodeToMorse['V'] = "...-";
    decodeToMorse['W'] = ".--";
    decodeToMorse['X'] = "-..-";
    decodeToMorse['Y'] = "-.--";
    decodeToMorse['Z'] = "--..";
    for(int i = 0; i < ncases; i++){
        vector<string> dictMorse;
        vector<string> dictWords;
        string morse;
        cin >> morse;
        int ndict;
        cin >> ndict;
        for(int j = 0; j < ndict; j++){
            string dictw;
            cin >> dictw;
            dictMorse.push_back(toMorse(dictw));
            dictWords.push_back(dictw);
        }
        cout<<findPossibleTr(morse,dictMorse, dictWords,0)<<endl;
        if(ncases != 1 && i != ncases-1)
            cout<<endl;
    }
}
I've tried the following input:
3
.---.-.---...
7
AT
ATC
COS
OS
A
T
C
.---.--.-.-.-.---...-.---.
6
AT
TACK
TICK
ATTACK
DAWN
DUSK
.........
5
E
EE
EEE
EEEE
EEEEE
And I get the following output (as expected):
5
2
236
The only problem is that when I submit it, the judge says the program exceeds its time limit (3 s). Any ideas?

Your algorithm runs out of time because it performs an exhaustive search for all distinct phrases within the dictionary that match the given Morse code. It tries every single possible concatenation of the words in the dictionary.
While this does give the correct answer, it takes time exponential in both the length of the given Morse string and the number of words in the dictionary. The question does actually mention that the number of distinct phrases is at most 2 billion.
Here's a simple test case that demonstrates this behavior:
1
... // 1000 dots
2
E
EE
The correct answer would be over 1 billion in this case, and an exhaustive search would have to enumerate all of them.
A way to solve this problem would be to use memoization, a dynamic programming technique. The key observation here is that a given suffix of the Morse string will always match the same number of distinct phrases.
Side note: in your original code, you passed morse, dictMorse and dictWords by value to your backtracking function. This results in the string and the two vectors being copied at every invocation of the recursive function, which is unnecessary. You can pass by reference, or (since this is in a competitive programming context where the guidelines of good code architecture can be bent) just declare them in global scope. I opted for the former here:
int findPossibleTr( const string &morse, const vector<string> &dictMorse, const vector<string> &dictWords, vector<int> &memo, int index ) {
    if (memo[index] != -1) return memo[index];
    int count = 0;
    /* ... */
    return memo[index] = count;
}
And in your initialization:
/* ... */
vector<int> memo(morse.size(), -1); // -1 here is a signal that the values are yet unknown
cout << findPossibleTr(morse, dictMorse, dictWords, memo, 0) << endl;
/* ... */
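This spits out the answer 1318412525 to the above test case almost instantly. Note that the true count for that input is a Fibonacci number with more than 200 digits, far beyond the problem's 2 billion limit, so the int result has overflowed; the test case only demonstrates the running time.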
For each of the T test cases, findPossibleTr is computed only once for each of the M suffixes of the Morse string. Each computation considers each of the N words once, with the comparison taking time linear in the length K of the word. In general, this takes O(TMNK) time which, depending on the input, might take too long. However, since matches seem to be relatively sparse in Morse code, it should run in time.
A more sophisticated approach would be to make use of a data structure such as a trie to speed up the string matching process, taking O(TMN) time in total.
Another note: it is not actually necessary for decodeToMorse to be a map. It can simply be an array or a vector of 26 strings. The string corresponding to character c is then decodeToMorse[c - 'A'].
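The same memoization can also be written bottom-up, which avoids deep recursion on long Morse strings. This is a sketch under stated assumptions: the helper name countPhrases is hypothetical, dictMorse is assumed to already hold each dictionary word converted to Morse (as produced by toMorse), and long long is used so intermediate sums within the problem's 2 billion bound cannot overflow.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ways[i] = number of ways the suffix morse[i..] can be split into dictionary codes.
long long countPhrases(const std::string& morse, const std::vector<std::string>& dictMorse) {
    const std::size_t n = morse.size();
    std::vector<long long> ways(n + 1, 0);
    ways[n] = 1;  // the empty suffix has exactly one (empty) split
    for (std::size_t i = n; i-- > 0; ) {
        for (const std::string& code : dictMorse) {
            // code matches at position i: extend every split of the remaining suffix
            if (i + code.size() <= n && morse.compare(i, code.size(), code) == 0)
                ways[i] += ways[i + code.size()];
        }
    }
    return ways[0];
}
```

Each suffix is processed once against each dictionary code, giving the same O(MNK) per test case as the memoized recursion.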

I'm writing up my process for this situation, hope it helps.
I would first analyse the algorithm to see whether it is fast enough for the problem. For example, if n can be as large as 10^6 and the time limit is 1 second, then an O(n^2) algorithm is not going to make it.
Then I would test against an input as "heavy" as the problem statement allows (maximum number of test cases, maximum input length, and so on). If it exceeds the time limit, there might be something in the code that can be optimized to get a lower constant factor. It's possible that even after all the hard optimizations it's still not fast enough; in that case I would go back to step 1.
After making sure the algorithm is OK, I would generate random inputs and run a few rounds to see whether there are any peculiar cases the algorithm does not yet cover.

There are three things I'd suggest doing to improve the performance of this code.
Firstly, all the arguments to toMorse and findPossibleTr are being passed by value. This will make a copy, which for objects like std::string and std::vector will be doing memory allocations. This will be quite costly, especially for the recursive calls to findPossibleTr. To fix it, change the function declarations to take const references, like so:
string toMorse(const string& w)
int findPossibleTr( const string& morse, const vector<string>& dictMorse, const vector<string>& dictWords, int index)
Secondly, the string concatenation in toMorse (morse = morse + ...) creates and throws away a new string on every iteration. Using a std::stringstream will speed that up:
#include <sstream>
string toMorse(const string& w){
    stringstream morse;
    for(int i = 0; i < w.size(); i++){
        morse << decodeToMorse[w[i]];
    }
    return morse.str();
}
Finally, we can reuse the vectors inside the loop in main, instead of destructing the old ones and creating new ones each iteration by using clear().
// ...
vector<string> dictMorse;
vector<string> dictWords;
for(size_t i = 0; i < ncases; i++){
    dictMorse.clear();
    dictWords.clear();
    string morse;
    cin >> morse;
    // ...
Putting it all together gives me roughly a 30% speed-up on my machine, from 0.006 s to 0.004 s on your test case. Not too bad. As a bonus, if you are on an Intel platform, Intel's optimization manual says that unsigned integers are faster than signed integers, so I've switched all ints to size_t, which also fixes some signed/unsigned comparison warnings. The complete code now becomes:
/*
 * Multiple morse matches
 * Filipe C
 */
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <map>
using namespace std;
std::map<char,string> decodeToMorse;
string toMorse(const string& w){
    stringstream morse;
    for(size_t i = 0; i < w.size(); i++){
        morse << decodeToMorse[w[i]];
    }
    return morse.str();
}
size_t findPossibleTr( const string& morse, const vector<string>& dictMorse, const vector<string>& dictWords, size_t index){
    size_t count = 0;
    for(size_t i = 0; i < dictMorse.size(); i++){
        if(morse.compare( index, dictMorse[i].size(), dictMorse[i]) == 0){
            //cout<<"Found " << dictWords[i] << " on index "<<index<<endl;
            if(index+dictMorse[i].size()>=morse.size()){
                //cout<<"Adding one for "<< dictWords[i]<<endl;
                count+=1;
                //return 1;
            }else{
                count += findPossibleTr(morse, dictMorse, dictWords, index+dictMorse[i].size());
            }
        }
    }
    return count;
}
int main(){
    size_t ncases;
    cin>>ncases;
    decodeToMorse['A'] = ".-";
    decodeToMorse['B'] = "-...";
    decodeToMorse['C'] = "-.-.";
    decodeToMorse['D'] = "-..";
    decodeToMorse['E'] = ".";
    decodeToMorse['F'] = "..-.";
    decodeToMorse['G'] = "--.";
    decodeToMorse['H'] = "....";
    decodeToMorse['I'] = "..";
    decodeToMorse['J'] = ".---";
    decodeToMorse['K'] = "-.-";
    decodeToMorse['L'] = ".-..";
    decodeToMorse['M'] = "--";
    decodeToMorse['N'] = "-.";
    decodeToMorse['O'] = "---";
    decodeToMorse['P'] = ".--.";
    decodeToMorse['Q'] = "--.-";
    decodeToMorse['R'] = ".-.";
    decodeToMorse['S'] = "...";
    decodeToMorse['T'] = "-";
    decodeToMorse['U'] = "..-";
    decodeToMorse['V'] = "...-";
    decodeToMorse['W'] = ".--";
    decodeToMorse['X'] = "-..-";
    decodeToMorse['Y'] = "-.--";
    decodeToMorse['Z'] = "--..";
    vector<string> dictMorse;
    vector<string> dictWords;
    for(size_t i = 0; i < ncases; i++){
        dictMorse.clear();
        dictWords.clear();
        string morse;
        cin >> morse;
        size_t ndict;
        cin >> ndict;
        for(size_t j = 0; j < ndict; j++){
            string dictw;
            cin >> dictw;
            dictMorse.push_back(toMorse(dictw));
            dictWords.push_back(dictw);
        }
        cout<<findPossibleTr(morse,dictMorse, dictWords,0)<<endl;
        if(ncases != 1 && i != ncases-1)
            cout<<endl;
    }
}

Getting Word Frequency From Vector In c++

I have googled this question and couldn't find an answer that worked with my code, so I wrote this to get the frequency of the words. The only issue is that I am getting the wrong number of occurrences for every word except one, which I think is a fluke. I am also checking whether a word has already been entered into the vector so I don't count the same word twice.
fileSize = textFile.size();
vector<wordFrequency> words (fileSize);
int index = 0;
for(int i = 0; i <= fileSize - 1; i++)
{
    for(int j = 0; j < fileSize - 1; j++)
    {
        if(string::npos != textFile[i].find(textFile[j]) && words[i].Word != textFile[j])
        {
            words[j].Word = textFile[i];
            words[j].Times = index++;
        }
    }
    index = 0;
}
Any help would be appreciated.

Consider using a std::map<std::string,int> instead. The map class will handle ensuring that you don't have any duplicates.

Using an associative container:
typedef std::unordered_map<std::string, unsigned> WordFrequencies;
WordFrequencies count(std::vector<std::string> const& words) {
    WordFrequencies wf;
    for (std::string const& word: words) {
        wf[word] += 1;
    }
    return wf;
}
It is hard to get simpler...
Note: you can replace unordered_map with map if you want the words sorted alphabetically, and you can supply a custom comparison to treat them case-insensitively.
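A minimal sketch of that note, assuming ASCII text: a std::map with a hypothetical comparator CaseInsensitiveLess, so that iteration is alphabetical and "The" and "the" share one entry. The map keeps the spelling of the first occurrence as the key; SortedFrequencies and countSorted are illustrative names.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Orders strings alphabetically, ignoring ASCII letter case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
    }
};

using SortedFrequencies = std::map<std::string, unsigned, CaseInsensitiveLess>;

SortedFrequencies countSorted(const std::vector<std::string>& words) {
    SortedFrequencies wf;
    for (const std::string& word : words) wf[word] += 1;  // equal ignoring case -> same entry
    return wf;
}
```

Lookups use the same comparator, so wf.at("THE") finds the entry created for "the".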

Try this code instead if you do not want to use a map container:
struct wordFreq{
    string word;
    int count;
    wordFreq(string str, int c):word(str),count(c){}
};
vector<wordFreq> words;
int ffind(vector<wordFreq>::iterator i, vector<wordFreq>::iterator j, string s)
{
    for(;i<j;i++){
        if((*i).word == s)
            return 1;
    }
    return 0;
}
The code for counting the occurrences in the textfile vector is then:
for(int i=0; i< textfile.size();i++){
    if(ffind(words.begin(),words.end(),textfile[i])) // Word already counted? Then move to the next one, i.e. avoid repetitions
        continue;
    words.push_back(wordFreq(textfile[i],1)); // Add the word, as it was not counted before, and set its count to 1
    for(int j = i+1;j<textfile.size();j++){ // find possible duplicates of textfile[i]
        if(textfile[j] == (*(words.end()-1)).word)
            (*(words.end()-1)).count++;
    }
}

Need help optimizing a program that finds all possible substrings

I have to find all possible, unique substrings from a bunch of user-input strings. This group of substrings has to be alphabetically sorted without any duplicate elements, and the group must be queryable by number. Here's some example input and output:
Input:
3 // This is the user's desired number of strings
abc // So the user inputs 3 strings
abd
def
2 // This is the user's desired number of queries
7 // So the user inputs 2 queries
2
Output:
// From the alphabetically sorted group of unique substrings,
bd // This is the 7th substring
ab // And this is the 2nd substring
Here's my implementation:
#include <map>
#include <iostream>
using namespace std;
int main() {
    int number_of_strings;
    int number_of_queries;
    int counter;
    string current_string;
    string current_substr;
    map<string, string> substrings;
    map<int, string> numbered_substrings;
    int i;
    int j;
    int k;
    // input step
    cin >> number_of_strings;
    string strings[number_of_strings];
    for (i = 0; i < number_of_strings; ++i)
        cin >> strings[i];
    cin >> number_of_queries;
    int queries[number_of_queries];
    for (i = 0; i < number_of_queries; ++i)
        cin >> queries[i];
    // for each string in 'strings', I want to insert every possible
    // substring from that string into my 'substrings' map.
    for (i = 0; i < number_of_strings; ++i) {
        current_string = strings[i];
        for (j = 1; j <= current_string.length(); ++j) {
            for (k = 0; k <= current_string.length()-j; ++k) {
                current_substr = current_string.substr(k, j);
                substrings[current_substr] = current_substr;
            }
        }
    }
    // my 'substrings' container is now sorted alphabetically and does
    // not contain duplicate elements, because the container is a map.
    // but I want to make the map queryable by number, so I'm iterating
    // through 'substrings' and assigning each value to an int key.
    counter = 1;
    for (map<string,string>::iterator it = substrings.begin();
         it != substrings.end(); ++it) {
        numbered_substrings[counter] = it->second;
        ++counter;
    }
    // output step
    for (i = 0; i < number_of_queries; ++i) {
        if (queries[i] > 0 && queries[i] <= numbered_substrings.size()) {
            cout << numbered_substrings[queries[i]] << endl;
        } else {
            cout << "INVALID" << endl;
        }
    }
    return 0;
}
I need to optimize my algorithm, but I'm not sure how to do it. Maybe it's the fact that I have a second for loop for assigning new int keys to each substring. Help?

Check out suffix trees; a suffix tree can be built in O(n) time.
This article was helpful for me:
http://allisons.org/ll/AlgDS/Tree/Suffix/
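A lighter-weight relative of the suffix-tree idea is to sort all suffixes (a simple stand-in for a suffix array, not an O(n) construction) and read off the distinct substrings in order: for each suffix in sorted order, its prefixes longer than the common prefix with the previous suffix are exactly the new distinct substrings, and they appear in alphabetical order. This is a sketch under stated assumptions: it requires C++17 (std::string_view), the function name kthDistinctSubstring is hypothetical, k is 1-based as in the question, and an empty string signals an invalid query.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Returns the k-th (1-based) distinct substring in sorted order, or "" if k is out of range.
std::string kthDistinctSubstring(const std::vector<std::string>& strings, std::size_t k) {
    if (k == 0) return "";
    std::vector<std::string_view> suffixes;          // views into `strings`, no copies
    for (const std::string& s : strings)
        for (std::size_t i = 0; i < s.size(); ++i)
            suffixes.push_back(std::string_view(s).substr(i));
    std::sort(suffixes.begin(), suffixes.end());

    std::string_view prev;
    for (std::string_view suf : suffixes) {
        std::size_t lcp = 0;                         // common prefix with previous suffix
        while (lcp < prev.size() && lcp < suf.size() && prev[lcp] == suf[lcp]) ++lcp;
        std::size_t fresh = suf.size() - lcp;        // prefixes of length lcp+1..size are new
        if (k <= fresh) return std::string(suf.substr(0, lcp + k));
        k -= fresh;
        prev = suf;
    }
    return "";
}
```

With the inputs abc, abd and def, query 7 returns "bd" and query 2 returns "ab", matching the expected output, without materializing every substring.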

Minor notes:
1. #include <string>.
2. Be careful with those } else { lines; one day you'll have a lot of else if branches and a lot of lines, and you'll wonder where an if starts and where it ends.
3. Be careful with unsigned versus signed mismatches... again, one day it will come back and bite (also, it's nice to compile without errors or warnings).
4. Don't define arrays with a variable size; variable-length arrays are a compiler extension, not standard C++.
5. Nice use of ++i. For iterators it can avoid a temporary copy (for plain ints the compiler typically generates the same code either way).
While I do agree with using proper algorithms when needed (say bubble sort, heap sort etc. for sorting; binary search, binary trees etc. for searching), sometimes I find it nice to optimize the current code. Imagine having a big project where implementing something requires rewrites... not many are willing to wait for you (not to mention the required unit testing, FAT testing and maybe FIT testing). At least that's my opinion. [And yes, I know some will say that if it is so complicated then it was written badly from the start, but hey, you can't argue with programmers who left before you joined the team :P]
But I do agree that using existing solutions is a good alternative when called for. Back to the point: I tested it with
3, abc, def, ghi
4, 1, 3, 7, 12
I can't say whether yours is any slower than mine or vice versa; perhaps a random string generator that adds maybe 500 inputs (then calculates all substrings) would be a better test, but I am too lazy at 2 in the morning. At most, my way of writing it might help you (at least to me it seems simpler and uses fewer loops and assignments). I'm not a fan of vectors because of the slight overhead, but I used one to keep up with your requirement of dynamic querying; a fixed-size array might be slightly faster.
Also, while not my style of naming conventions, I decided to use your names so you can follow the code easier.
Anyway, take a look and tell me what you think:
#include <map>
#include <iostream>
#include <string> // you forgot to add this... trust me, it's important :)
#include <vector> // not a fan, but it's not that bad IF you want dynamic buffers
using namespace std;
int main ()
{
    unsigned int number_of_strings = 0;
    // string strings[number_of_strings]; // don't do this: variable-length arrays are not standard C++
    cin >> number_of_strings;
    map <string, string> substrings;
    string current_string, current_substr;
    unsigned int i, j, k;
    for (i = 0; i < number_of_strings; ++ i)
    {
        cin >> current_string;
        substrings[current_string] = current_string;
        for (j = 1; j <= current_string.length(); ++ j)
        {
            for (k = 0; k <= current_string.length() - j; ++ k)
            {
                current_substr = current_string.substr(k, j);
                substrings[current_substr] = current_substr;
            }
        }
    }
    vector <string> numbered_substrings;
    for (map <string, string>::iterator it = substrings.begin(); it != substrings.end(); ++ it)
        numbered_substrings.push_back(it->second);
    unsigned int number_of_queries = 0;
    unsigned int query = 0;
    cin >> number_of_queries;
    current_string.clear();
    for (i = 0; i < number_of_queries; ++ i)
    {
        cin >> query;
        // queries are 1-based; check the range before converting to a 0-based index
        // (an unsigned value is always >= 0, and 0 - 1 would wrap around)
        if ((query >= 1) && (query <= numbered_substrings.size()))
            current_string = current_string + numbered_substrings[query - 1] + '\n';
        else
            current_string = current_string + "INVALID\n"; // buffered too, so output order matches the queries
    }
    cout << current_string;
    return 0;
}
