Finding whether strings are anagrams - Revisited - c++

I am sure this question has been discussed several times on Stack Overflow in the past. I just want to verify whether my answer is valid. I saw this question in this thread. I apologize if this post duplicates that thread; if it has to be removed, I will remove it.
I thought of a much simpler way: just XOR the characters in each string.
That is O(n) for XORing the characters and O(1) for comparing the final values of the two strings, which gives an O(n) solution.
Even though the final values may be arbitrary (possibly unprintable) characters, if the strings are anagrams they still end up the same. Is my logic right?
So instead of sorting or hashing, can this solution be used? My code goes like this:
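#include <iostream>
using namespace std;

int main() {
    char a[7] = "Length";
    char b[7] = "enghtL";
    // Fold each string with XOR: after the loop, a[5] and b[5]
    // hold the XOR of all six characters of their string.
    for (int i = 1; i < 6; i++) {
        a[i] = a[i] ^ a[i-1];
        b[i] = b[i] ^ b[i-1];
    }
    if (a[5] == b[5]) {
        cout << "\n The strings are anagrams";
    }
    else {
        cout << "\n No they are not";
    }
    return 0;
}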

I'm sorry but this won't work.
Sure, if the strings are anagrams, the code (if it works correctly) will say so, but you will also get a lot of false positives, because many different strings yield the same XOR value.

You're condensing all the information about an n-byte string into a single byte, which is effectively a very basic hash. Whenever two strings that aren't anagrams produce a hash collision, you'll return a false positive.
If you want an O(n) method for finding anagrams, sort the words using a counting sort, then compare the results for equality.

This code might be useful as part of a solution.
My approach would be:
Check whether the two words have the same number of letters; if not, they cannot be anagrams.
Then sort the letters of each word into alphabetical (or any other) order.
Step through the sorted words and return false if the letters at any position are not equal.

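In C++ you can use string reverse iterators. Note that comparing with the reversed string only detects the special case where one word is the exact reverse of the other; a general rearrangement such as "enghtL" would be reported as not an anagram.
#include <iostream>
#include <string>

int main() {
    std::string s1 = "Length";
    std::string s2 = "htgneL";
    std::string s3 = "htgnel";
    // std::string(s.rbegin(), s.rend()) builds a reversed copy of s
    if (s1 == std::string(s2.rbegin(), s2.rend()))
        std::cout << "s1 and s2 are anagrams" << std::endl;
    else
        std::cout << "s1 and s2 aren't anagrams" << std::endl;
    if (s1 == std::string(s3.rbegin(), s3.rend()))
        std::cout << "s1 and s3 are anagrams" << std::endl;
    else
        std::cout << "s1 and s3 aren't anagrams" << std::endl;
    return 0;
}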

Related

Issue Comparing strings for an Answer Key (C++)

I'm working on a midterm project for my coding class, and while I've worked out most of the kinks, I'm struggling with comparing two string values and determining whether they are equal. The strings in question are ANSWERKEY and studentAnswers. The former is a constant that the latter is compared to.
The code in question is as follows.
if (studentAnswers == ANSWERKEY)
{
percentScore = 100.0;
cout << "Score: " << percentScore << " % " << 'A' << endl;
}
else if (studentAnswers != ANSWERKEY)
{
int count = 0;
double answerCount = 0.0;
while (count < ANSWERKEY.length())
{
if (studentAnswers.substr(count, count+1) == ANSWERKEY.substr(count, count+1))
{
answerCount++;
count++;
}
else
{
cout << "Incorrect answer." << endl;
count++;
}
}
percentScore = ((answerCount) / (double)ANSWERKEY.length()) * 100.0;
cout << "Percent score is " << percentScore << "%" << endl;
}
The exact issue I'm facing is that I can't work out a better way to compare the strings. With the current method, the output is the following:
The intro to the code runs fine. Only when I get to checking the answers against the key, in this case "abcdefabcdefabcdefab", do I run into issues. Regardless of what characters are changed, the program marks roughly half of all characters as mismatching and drops the score down because of it.
I've thought of using a pair of arrays, but then I can't work out how to set up an array when some of its values are empty. If the student's answers are too short, e.g. only 15 characters long, I don't know how to compare the blank space, or even store it in the array.
Thank you for any help you can give.
First:
if (studentAnswers == ANSWERKEY)
{...}
else if (studentAnswers != ANSWERKEY)
{ ...}
looks like overkill when comparing strings: the else if tests the exact negation of the if, so a plain else is all you need.
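Second, using a double as a counter is questionable. Incrementing by 1 stays exact for any realistic count, but an integer counter states the intent more clearly, and you can convert to double only when computing the percentage. For background on floating-point pitfalls, read about IEEE 754 and cancellation, or even SO:
double answerCount = 0.0;
...
answerCount++;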
Third:
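You are checking character by character using substr. To me it feels like using a hammer to kill bacteria:
studentAnswers.substr(count, count+1) == ANSWERKEY.substr(count, count+1)
Worse, the second argument of substr is a length, not an end position, so substr(count, count+1) returns up to count+1 characters rather than one. A single wrong answer therefore makes several of these longer substrings differ, which is likely why so many answers are marked incorrect. Comparing single characters with studentAnswers[count] == ANSWERKEY[count] avoids both problems.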
Fourth:
What if studentAnswers is shorter than ANSWERKEY? Once count goes past the end of studentAnswers, substr throws std::out_of_range.
Conclusion:
You need to clarify the inputs and expected outputs, and use the debugger to better understand what is happening during execution. Carefully check all your variables at each step of your program.

While loop in C++ (using break)

I'm currently working through the book C++ Primer (recommended on the SO book list). One exercise was essentially: read through some strings, check whether any string is repeated twice in succession, and if a string is repeated, print which word it was and break out of the loop. If no word was repeated, print that. I'm wondering a) if it's not a good solution and b) is my test condition for no repeated words ok? Because I had to add 1 to the variable to get it to work as expected. Here is my code:
#include <iostream>
#include <vector>
#include <string>
using namespace std;
int main() {
vector<string> words = {"Cow", "Cat", "Dog", "Dog", "Bird"};
string tempWord;
unsigned int i = 0;
while (i != words.size())
{
if (words[i] == tempWord)
{
cout << "Loop exited as the word " << tempWord << " was repeated.";
break;
}
else
{
tempWord = words[i];
}
// add 1 to i to test equality as i starts at 0
if (i + 1 == words.size())
cout << "No word was repeated.";
++i;
}
return 0;
}
The definition of a "good solution" depends somewhat on the requirements. The most important one is always "does it work?", but there may be speed and memory requirements on top of that.
Yours seems to work (unless the first string is empty, in which case it wrongly reports a repeat), so it's certainly not bad.
The only suggestion I'd make is to try writing a version that doesn't keep a copy of one of the strings: if the strings are very large, or there are many of them, copying them could be expensive.
I would move the test condition outside of the loop, as it seems unnecessary to perform it at every step. For readability I would add a bool:
string tempWord;
unsigned int i = 0;
bool exited = false;
while (i != words.size())
{
if (words[i] == tempWord)
{
cout << "Loop exited as the word " << tempWord << " was repeated.";
exited = true;
break;
}
else
{
tempWord = words[i];
}
++i;
}
// Doing the check afterwards instead
if (!exited)
{
cout << "No word was repeated.";
}
a) if it's not a good solution
For the input specified it is a good solution (it works). However, tempWord starts out as an empty string (std::string's default constructor), so the first time the loop runs it tests words[0] against "". Because the input does not contain an empty string, it works. But if your input started with an empty string, it would falsely report it as repeated.
b) is my test condition for no repeated words ok? Because I had to add 1 to the variable to get it to work as expected.
Yes, and that is simply because indexing starts from zero while you are testing against the number of items in the vector. For example, a vector with one element has only index 0, so i + 1 == words.size() is true exactly on the last iteration. So you were right to add 1 to i.
As an answer to the training task, your code (after the fixes suggested in other answers) looks good. However, if this were a real-world problem (and therefore didn't contain artificial restrictions like "use a for loop and break"), its writer should also consider ways of improving readability.
Using a standard library algorithm is almost always better than reinventing the wheel, so I would write this code as follows:
auto equal = std::adjacent_find(words.begin(), words.end()); // needs #include <algorithm>
if (equal == words.end())
{
cout << "No word was repeated" << endl;
}
else
{
cout << "Word " << *equal << " was repeated" << endl;
}

Iterator woes with C++ strings

For an assignment, we have to write a large program that manipulates C++ strings in a variety of ways. Most of it is working, but this particular function is giving me trouble. I am trying to cycle through a string and remove all non-alphanumeric characters (tab, blank, newline) before the first alphanumeric one, and then end the string when the first non-alphanumeric character appears again. For example, " bob jon" would be saved as "bob". Something is going wrong: every string is considered empty. Most of my peers have been telling me that
*(point++) = *marker;
can't be done and that I should change it before trying anything else. Isn't this a way to copy the value from one iterator into another and increment the destination iterator in the same statement? Is that the problem, or is it something else?
void clean_entry( const string& j, string& k )
{
string::iterator point = k.begin();
bool checker = false;
//cycle through the constant string and check for numbers and letters
for(string::const_iterator marker = j.cbegin(); marker!=j.cend(); ++marker)
{
if( isalnum(*marker) == true )
{
*(point++) = *marker; //copy to the new k string if alphanum, and increment iterator
cout << "I found a real letter!" << endl; //debugging
checker = true;
}
else if( checker == true )
break;
}
cout << "So far we have " << k << endl; //debugging
if (checker == false )
k = "(empty word)";
cout << "The new string is " << k << " apparently." << endl; //debugging
}
isalnum doesn't return bool. It returns int. The guarantee is that it returns nonzero if the character is alphanumeric and zero otherwise. This means that you can't compare the return value to true, as that comparison causes true to be converted to int, yielding 1, before the comparison is done. Because the nonzero value need not be 1, the test can fail even for letters, which leaves checker false and turns every string into "(empty word)". if(isalnum(*marker)) is both idiomatic and actually works.
Similarly, if( checker == true ) is bloated and should be if(checker), and if (checker == false ) should be if(!checker).
Your interface is questionable, since the caller must ensure that k's size is large enough to accommodate the resulting string. Better to clear k and then use push_back() or similar rather than an iterator.
On the assumption that k.size() is sufficiently large, there's nothing wrong with *(point++) = *marker;.

How to check if a string contains spaces/tabs/new lines (anything that's blank)?

I know there's an "isspace" function that checks for spaces, but that would require me to iterate through every character in the string, which can be bad for performance since this would be called a lot. Is there a fast way to check whether a std::string contains only spaces?
ex:
function(" ") // returns true
function(" 4 ") // returns false
One solution I've thought of is to use a regex; then I'll know that the string contains only whitespace if the match is false... but I'm not sure whether this would be more efficient than the isspace function.
regex: [\w\W] //checks for any word character(a,b,c..) and non-word character([,],..)
thanks in advance!
With a regular string, the best you can do will be of the form:
return str.find_first_not_of("\t\n ") == std::string::npos; // str is the string being tested
This will be O(n) in the worst case, but without knowing anything else about the string, this is the best you can do.
Any method would, of necessity, need to look at each character of the string. A loop that calls isspace() on each character is pretty efficient. If isspace() is inlined by the compiler, then this would be darn near optimal.
The loop should, of course, abort as soon as a non-space character is seen.
You are assuming that a regex doesn't iterate over the string. A regex is probably much heavier than a linear search, since it might build an FSM and traverse the string based on that.
The only way you could speed it up further and make it a near-constant-time operation is to amortize the cost: on every update to the string, scan it and cache a bool (or bit) that tracks whether it contains a space-like character, return that cached value as long as no changes have been made since, and update the bit whenever you write to the string. However, this slows down modifying operations in order to speed up your custom has_space().
For what it's worth, the std::ctype facet of a locale has a member function (scan_is) to do things like this:
#include <locale>
#include <iostream>
#include <iomanip>
#include <string>
int main() {
std::string inputs[] = {
"all lower",
"including a space"
};
std::locale loc(std::locale::classic());
std::ctype_base::mask m = std::ctype_base::space;
for (int i=0; i<2; i++) {
char const *pos;
char const *b = inputs[i].data();                     // first character
char const *e = inputs[i].data() + inputs[i].size();  // one past the last; *end() must not be dereferenced
std::cout << "Input: " << std::setw(20) << inputs[i] << ":\t";
if ((pos=std::use_facet<std::ctype<char> >(loc).scan_is(m, b, e)) == e)
std::cout << "No space character\n";
else
std::cout << "First space character at position " << pos - b << "\n";
}
return 0;
}
It's probably open to (a lot of) question whether this gives much (if any) real advantage over using isspace in a loop (or using std::find_if).
You can also use find_first_not_of if you want all the characters to be in a given list.
Then you don't need to write the loop yourself.
Here is an example
#include <string>
#include <iostream>
using namespace std;
int main()
{
string str1=" ";
string str2=" u ";
bool ContainsNotBlank1=(str1.find_first_not_of("\t\n ")==string::npos);
bool ContainsNotBlank2=(str2.find_first_not_of("\t\n ")==string::npos);
bool ContainsNotBlank3=(str2.find_first_not_of("\t\n u")==string::npos);
cout << ContainsNotBlank1 <<endl;
cout << ContainsNotBlank2 <<endl;
cout << ContainsNotBlank3 <<endl;
return 0;
}
Output:
1: because str1 contains only blanks and a tab
0: because u is not in the list "\t\n "
1: because every character of str2 (blanks, tabs and the u) is in the list "\t\n u"
Hope it helps
Tell me if you have any questions

Loop Design: Counting & Subsequent Code Duplication

Exercise 3-3 in Accelerated C++ has led me to two broader questions about loop design. The exercise's challenge is to read an arbitrary number of words into a vector, then output the number of times each distinct word appears in that input. I've included my relevant code below:
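// words is assumed to be sorted and non-empty; words_sz and size are defined elsewhere
// (presumably vector<string>::size_type and words.size())
string currentWord = words[0];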
words_sz currentWordCount = 1;
// invariant: we have counted i of the current words in the vector
for (words_sz i = 1; i < size; ++i) {
if (currentWord != words[i]) {
cout << currentWord << ": " << currentWordCount << endl;
currentWord = words[i];
currentWordCount = 0;
}
++currentWordCount;
}
cout << currentWord << ": " << currentWordCount << endl;
Note that the output code has to occur again outside the loop to deal with the last word. I realize I could move it to a function and simply call the function twice if I was worried about the complexity of duplicated code.
Question 1: Is this sort of workaround common? Is there a typical way to refactor the loop to avoid such duplication?
Question 2: While my solution is straightforward, I'm used to counting from zero. Is there a more-acceptable way to write the loop respecting that? Or is this the optimal implementation?
Why can't you use a map http://www.cplusplus.com/reference/stl/map/ with word as key and value as the count?