Code review, C++, Anagram method

I'm doing some practice questions from the book "Cracking the coding interview" and wanted to get some people to review my code for bugs and optimizations. Any feedback would be greatly appreciated.
Question: Write a method to decide if two strings are anagrams or not.
/*
Time complexity: O(n^2)
Space complexity: O(n)
*/
bool IsAnagram(std::string str1, std::string str2)
{
if(str1.length() != str2.length())
return false;
for(int i = 0; i < str1.length();i++)
{
bool found = false;
int j = 0;
while(!found && j < str2.length())
{
if(str1[i] == str2[j])
{
found = true;
str2[j] = '\0'; // mark this character as used; NULL is a pointer constant, not a char
}
j++;
}
if(!found)
return false;
}
return true;
}

Sorting both strings and comparing them is generally more efficient, O(n log n) instead of O(n^2):
#include <iostream>
#include <string>
#include <algorithm>
bool IsAnagram(std::string& str1, std::string& str2)
{
if(str1.length() != str2.length())
return false;
std::sort(str1.begin(), str1.end());
std::sort(str2.begin(), str2.end());
return str1.compare(str2) == 0;
}
int main(int argc, char* argv[])
{
std::string an1("army");
std::string an2("mary");
if(IsAnagram(an1, an2))
std::cout << "Hooray!\n";
return 0;
}
For those who dislike mutating the strings, this may be a better option. You could either drop the references from parameters 1 and 2 (pass by value) or make copies inside the function, as here. This way the parameters can be const.
bool IsAnagram2(const std::string& str1, const std::string& str2)
{
if(str1.length() != str2.length())
return false;
std::string cpy1(str1), cpy2(str2);
std::sort(cpy1.begin(), cpy1.end());
std::sort(cpy2.begin(), cpy2.end());
return cpy1.compare(cpy2) == 0;
}

O(n) algorithm. Instead of sorting (which is O(n lg n)), count up the character occurrences in s1 and compare it to the character occurrences in s2.
#include <string>
#include <iostream>
#include <limits>
bool IsAnagram(const std::string& s1, const std::string& s2)
{
if (s1.size() != s2.size()) {
return false;
}
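// Index by unsigned char: plain char may be signed, and a negative index would be out of bounds.
int count[std::numeric_limits<unsigned char>::max() + 1] = {};
for (unsigned char c : s1) {
count[c]++;
}
for (unsigned char c : s2) {
if (!count[c]) {
return false;
}
count[c]--;
}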
return true;
}
int main(int argc, char **argv)
{
if (argc < 3) {
std::cerr << "usage: " << argv[0] << " <string1> <string2>\n";
return 1;
}
std::cout << IsAnagram(argv[1], argv[2]) << std::endl;
return 0;
}

The standard library already has an algorithm, std::is_permutation, that performs this task simply:
#include <iostream>
#include <iomanip>
#include <string>
#include <algorithm>
int main()
{
std::string s( "aab" );
std::string t( "aba" );
std::cout << std::boolalpha
<< ( s.size() == t.size() &&
std::is_permutation( s.begin(), s.end(), t.begin() ) )
<< std::endl;
return 0;
}
The output is
true
So all you need is to look at how the algorithm is implemented. :)
If you want a separate function, it will look like this:
bool IsAnagram( const std::string &s1, const std::string &s2 )
{
return s1.size() == s2.size() &&
std::is_permutation( s1.begin(), s1.end(), s2.begin() );
}
Using std::sort is not a good approach, because either the original strings are modified or you have to pass them to the function by value (copying them).

Related

Performance warning for isspace function, conversion from int to bool

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <iterator>
using namespace std;
bool notSpace(char c) {
return !isspace(c);
}
bool isSpace(char c) {
return isspace(c);
}
vector<string> split(const string& s) {
vector<string> words;
string::const_iterator i = s.begin();
while (i != s.end()) {
i = find_if(i, s.end(), notSpace); // " "
if (i != s.end()) {
string::const_iterator j = i;
j = find_if(i, s.end(), isSpace);
words.push_back(string(i, j));
i = j;
}
}
return words;
}
int main() {
string test = "Hello world, I'm a simple guy";
vector<string> words = split(test);
for (vector<string>::size_type i = 0; i < words.size();i++) {
cout << words[i] << endl;
}
return 0;
}
When I compile the code I get this warning:
warning C4800: 'int': forcing value to bool 'true' or 'false'
(performance warning)
on the return of this function:
bool isSpace(char c) {
return isspace(c);
}
Is it a good habit to change isspace(c) to (isspace(c) != 0)? Or is that just unnecessary fussiness?

Take a look at the code below:
#include <iostream>
using namespace std;
bool f()
{
return 2;
}
int main()
{
cout <<f()<<endl;
return 0;
}
It will print 1 even though you return 2; that implicit conversion is why you get the warning.
Some people think of bool as a kind of small integer, but it isn't.
Back in C (before C99) there was no bool type, which is why many C functions (like isspace) return int. Even the Windows BOOL type is actually an integer and can hold values other than TRUE (1) or FALSE (0).
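To answer the habit question directly, here is a minimal sketch of the explicit form, assuming the goal is simply to make the int-to-bool conversion visible so the compiler has nothing to warn about. The cast to unsigned char is an addition of mine and not part of the original code: the <cctype> functions are only defined for values representable as unsigned char (or EOF), so passing a negative char is undefined behavior.

```cpp
#include <cassert>
#include <cctype>

// Explicit comparison: the int returned by std::isspace is turned into a
// bool on purpose, instead of relying on an implicit conversion.
// The unsigned char cast is an added precaution (assumption, see above).
bool isSpace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

bool notSpace(char c) {
    return !isSpace(c);
}
```

These two can be passed to find_if exactly like the originals in split().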

Count the number of unique words (case does not matter for this count)

Hey, so I'm having trouble figuring out the code to count the number of unique words. My thought process, in terms of pseudocode, was first to make a vector, something like vector<string> unique_word_list; Then I would get the program to read each line, so I would have something like while(getline(fin,line)). The hard part for me is coming up with the code where I check the vector (array) to see if the string is already in there. If it's in there I just increase the word count (simple enough), but if it's not in there then I just add a new element to the vector. I would really appreciate it if someone could help me out here. I feel like this is not hard, but for some reason I can't think of the code for comparing the string with what's inside the array and determining whether it's a unique word or not.

Don't use a vector - use a container that maintains uniqueness, like std::set or std::unordered_set. Just convert the string into lower case (using std::tolower) before you add it:
std::set<std::string> words;
std::string next;
while (file >> next) {
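// A lambda avoids the ambiguity between the <cctype> and <locale> overloads of std::tolower.
std::transform(next.begin(), next.end(), next.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
words.insert(next);
}
std::cout << "We have " << words.size() << " unique words.\n";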
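If you also want the per-word counts mentioned in the question, a std::map from word to count is one option. This is only a sketch under assumptions: the name count_words is hypothetical, it takes the text as a std::string rather than reading a file, and punctuation stays attached to the words.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: counts case-insensitive occurrences of each
// whitespace-separated word in `text`.
std::map<std::string, std::size_t> count_words(const std::string& text) {
    std::map<std::string, std::size_t> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        // Lower-case the word so "The" and "the" land on the same key.
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        ++counts[word];  // operator[] value-initializes a missing entry to 0
    }
    return counts;
}
```

The size() of the returned map is the number of unique words. To read from a file instead, the same loop should work with a std::ifstream in place of the std::istringstream.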

I can't help writing an answer that makes use of C++'s beautiful standard library. I'd do it like this, with a std::set:
#include <algorithm>
#include <cctype>
#include <string>
#include <set>
#include <fstream>
#include <iterator>
#include <iostream>
int main()
{
std::ifstream ifile("test.txt");
std::istream_iterator<std::string> it{ifile};
std::set<std::string> uniques;
std::transform(it, {}, std::inserter(uniques, uniques.begin()),
[](std::string str) // make it lower case, so case doesn't matter anymore
{
std::transform(str.begin(), str.end(), str.begin(), ::tolower);
return str;
});
// display the unique elements
for(auto&& elem: uniques)
std::cout << elem << " ";
// display the size:
std::cout << std::endl << uniques.size();
}
You can also define a new string type in which you change the char_traits so the comparison becomes case-insensitive. This is the code you'd need (much more lengthy than before, but you may end up reusing it), the char_traits overload is copy/pasted from cppreference.com:
#include <algorithm>
#include <cctype>
#include <string>
#include <set>
#include <fstream>
#include <iterator>
#include <iostream>
struct ci_char_traits : public std::char_traits<char> {
static bool eq(char c1, char c2) { return toupper(c1) == toupper(c2); }
static bool ne(char c1, char c2) { return toupper(c1) != toupper(c2); }
static bool lt(char c1, char c2) { return toupper(c1) < toupper(c2); }
static int compare(const char* s1, const char* s2, size_t n) {
while ( n-- != 0 ) {
if ( toupper(*s1) < toupper(*s2) ) return -1;
if ( toupper(*s1) > toupper(*s2) ) return 1;
++s1; ++s2;
}
return 0;
}
static const char* find(const char* s, size_t n, char a) {
while ( n-- > 0 ) {
if ( toupper(*s) == toupper(a) ) return s;
++s;
}
return nullptr; // char_traits::find must return a null pointer when a is absent
}
};
using ci_string = std::basic_string<char, ci_char_traits>;
// need to overload the insertion and extraction operators,
// otherwise cannot use them with our new type
std::ostream& operator<<(std::ostream& os, const ci_string& str) {
return os.write(str.data(), str.size());
}
std::istream& operator>>(std::istream& os, ci_string& str) {
std::string tmp;
os >> tmp;
str.assign(tmp.data(), tmp.size());
return os;
}
int main()
{
std::ifstream ifile("test.txt");
std::istream_iterator<ci_string> it{ifile};
std::set<ci_string> uniques(it, {}); // that's it
// display the unique elements
for (auto && elem : uniques)
std::cout << elem << " ";
// display the size:
std::cout << std::endl << uniques.size();
}

How to check if string ends with .txt

I am learning basic C++, and right now I have gotten a string from a user and I want to check if they typed the entire file name (including .txt) or not. I have the string, but how can I check if the string ends with ".txt" ?
string fileName;
cout << "Enter filename: \n";
cin >> fileName;
string txt = fileName.Right(4);
The Right(int) method only works with CString, so the above code does not work. I want to use a regular string, if possible. Any ideas?

Unfortunately this useful function was not in the standard library before C++20 (which added std::string::ends_with). It is easy to write:
bool has_suffix(const std::string &str, const std::string &suffix)
{
return str.size() >= suffix.size() &&
str.compare(str.size() - suffix.size(), suffix.size(), suffix) == 0;
}

Using boost ends_with predicate:
#include <boost/algorithm/string/predicate.hpp>
if (boost::ends_with(fileName, ".txt")) { /* ... */ }

You've gotten quite a few answers already, but I decided to add yet another:
bool ends_with(std::string const &a, std::string const &b) {
// compare the lengths first: the unsigned subtraction below would wrap around
if (a.length() < b.length())
return false;
auto pos = a.length() - b.length();
auto pos_a = &a[pos];
auto pos_b = &b[0];
while (*pos_a)
if (*pos_a++ != *pos_b++)
return false;
return true;
}
Since you have gotten quite a few answers, perhaps a quick test and summary of results would be worthwhile:
#include <iostream>
#include <string>
#include <vector>
#include <time.h>
#include <iomanip>
bool ends_with(std::string const &a, std::string const &b) {
// compare the lengths first: the unsigned subtraction below would wrap around
if (a.length() < b.length())
return false;
auto pos = a.length() - b.length();
auto pos_a = &a[pos];
auto pos_b = &b[0];
while (*pos_a)
if (*pos_a++ != *pos_b++)
return false;
return true;
}
bool ends_with_string(std::string const& str, std::string const& what) {
return what.size() <= str.size()
&& str.find(what, str.size() - what.size()) != str.npos;
}
bool has_suffix(const std::string &str, const std::string &suffix)
{
return str.size() >= suffix.size() &&
str.compare(str.size() - suffix.size(), suffix.size(), suffix) == 0;
}
bool has_suffix2(const std::string &str, const std::string &suffix)
{
std::size_t index = str.find(suffix, str.size() - suffix.size());
return (index != std::string::npos);
}
bool isEndsWith(const std::string& pstr, const std::string& substr)
{
int tlen = pstr.length();
int slen = substr.length();
if (slen > tlen)
return false;
const char* tdta = pstr.c_str();
const char* sdta = substr.c_str();
while (slen)
{
// pre-decrement so the last characters are compared first and index 0 is included
if (tdta[--tlen] != sdta[--slen])
return false;
}
return true;
}
bool ends_with_6502(const std::string& str, const std::string& end) {
size_t slen = str.size(), elen = end.size();
if (slen < elen) return false;
while (elen) {
if (str[--slen] != end[--elen]) return false;
}
return true;
}
bool ends_with_rajenpandit(std::string const &file, std::string const &suffix) {
int pos = file.find(suffix);
return (pos != std::string::npos);
}
template <class F>
bool test(std::string const &label, F f) {
static const std::vector<std::pair<std::string, bool>> tests{
{ "this is some text", false },
{ "name.txt.other", false },
{ "name.txt", true }
};
bool result = true;
std::cout << "Testing: " << std::left << std::setw(20) << label;
for (auto const &s : tests)
result &= (f(s.first, ".txt") == s.second);
if (!result) {
std::cout << "Failed\n";
return false;
}
clock_t start = clock();
for (int i = 0; i < 10000000; i++)
for (auto const &s : tests)
result &= (f(s.first, ".txt") == s.second);
clock_t stop = clock();
std::cout << double(stop - start) / CLOCKS_PER_SEC << " Seconds\n";
return result;
}
int main() {
test("Jerry Coffin", ends_with);
test("Dietrich Epp", has_suffix);
test("Dietmar", ends_with_string);
test("Roman", isEndsWith);
test("6502", ends_with_6502);
test("rajenpandit", ends_with_rajenpandit);
}
Results with gcc:
Testing: Jerry Coffin 3.416 Seconds
Testing: Dietrich Epp 3.461 Seconds
Testing: Dietmar 3.695 Seconds
Testing: Roman 3.333 Seconds
Testing: 6502 3.304 Seconds
Testing: rajenpandit Failed
Results with VC++:
Testing: Jerry Coffin 0.718 Seconds
Testing: Dietrich Epp 0.982 Seconds
Testing: Dietmar 1.087 Seconds
Testing: Roman 0.883 Seconds
Testing: 6502 0.927 Seconds
Testing: rajenpandit Failed
Yes, those were run on identical hardware, and yes I ran them a number of times, and tried different optimization options with g++ to see if I could get it to at least come sort of close to matching VC++. I couldn't. I don't have an immediate explanation of why g++ produces so much worse code for this test, but I'm fairly confident that it does.

Use std::string::substr
if (filename.substr(std::max<std::size_t>(4, filename.size())-4) == std::string(".txt")) {
// Your code here
}

You can just use another string to verify the extension, like this:
string fileName;
cout << "Enter filename: \n";
cin >> fileName;
//string txt = fileName.Right(4);
string ext="";
// cast to int so the bounds stay signed; this also handles names shorter than 5 characters
int len = (int)fileName.length();
for(int i = len-1; i >= 0 && i > len-5; i--)
{
ext += fileName[i];
}
cout<<ext;
if(ext != "txt.")
cout<<"error\n";
The comparison is against "txt." because i starts at the end of the filename, so ext is filled in reverse order.

bool has_suffix(const std::string &str, const std::string &suffix)
{
std::size_t index = str.find(suffix, str.size() - suffix.size());
return (index != std::string::npos);
}

The easiest approach is probably to verify that the string is long enough to hold ".txt" at all and to see if the string can be found at the position size() - 4, e.g.:
bool ends_with_string(std::string const& str, std::string const& what) {
return what.size() <= str.size()
&& str.find(what, str.size() - what.size()) != str.npos;
}

This is something that, unfortunately enough, is not present in the standard library and it's also somewhat annoying to write. This is my attempt:
bool ends_with(const std::string& str, const std::string& end) {
size_t slen = str.size(), elen = end.size();
if (slen < elen) return false;
while (elen) {
if (str[--slen] != end[--elen]) return false;
}
return true;
}

2 options I can think of beside mentioned ones:
1) regex - prob overkill for this, but simple regexes are nice and readable IMHO
2) rbegin - kind of nice, could be I am missing something, but here it is:
bool ends_with(const string& s, const string& ending)
{
return (s.size()>=ending.size()) && equal(ending.rbegin(), ending.rend(), s.rbegin());
}
http://coliru.stacked-crooked.com/a/4de3eafed3bff6e3
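For option 1, a regex version might look like the following. This is a minimal sketch, assuming the suffix is fixed to ".txt" and the file name contains no line breaks. The function name ends_with_txt_regex is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sketch of the regex option: true if `s` ends with the literal ".txt".
bool ends_with_txt_regex(const std::string& s) {
    // static: the pattern is compiled only once.
    // "\\." matches a literal dot, "$" anchors the match at the end.
    static const std::regex pattern("\\.txt$");
    return std::regex_search(s, pattern);
}
```

Constructing a std::regex is comparatively expensive, which is why the pattern is kept in a function-local static here. For a fixed suffix the plain comparisons in the other answers are likely to be faster.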

This should do it.
bool ends_with(const std::string & s, const std::string & suffix) {
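// the size check matters: without it the unsigned subtraction can wrap around and equal npos
return s.length() >= suffix.length() && s.rfind(suffix) == s.length() - suffix.length();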
}

Since C++17 you can utilize the path class from the filesystem library.
#include <filesystem>
bool ends_with_txt(const std::string& fileName) {
return std::filesystem::path{fileName}.extension() == ".txt";
}

Here is the "fully self-written" solution:
bool isEndsWith(const std::string& pstr, const std::string& substr)
{
int tlen = pstr.length();
int slen = substr.length();
if(slen > tlen)
return false;
const char* tdta = pstr.c_str();
const char* sdta = substr.c_str();
while(slen)
{
// pre-decrement so the last characters are compared first and index 0 is included
if(tdta[--tlen] != sdta[--slen])
return false;
}
return true;
}

C++: case-insensitive first-n-characters string comparison

My question is similar to this, but I have two strings (as char *) and the task is to replace the strnicmp function (available only for MS VC) with something like boost::iequals.
Note that strnicmp is not stricmp: it only compares the first n characters.
Is there any solution simpler than this:
void foo(const char *s1, const char *s2)
{
...
std::string str1 = s1;
std::string str2 = s2;
int n = 7;
if (boost::iequals(str1.substr(0, n), str2)) {
...
}
}

If it's really necessary, write your own function:
bool mystrnicmp(char const* s1, char const* s2, int n){
for(int i=0; i < n; ++i){
unsigned char c1 = static_cast<unsigned char>(s1[i]);
unsigned char c2 = static_cast<unsigned char>(s2[i]);
if(tolower(c1) != tolower(c2))
return false;
if(c1 == '\0' || c2 == '\0')
break;
}
return true;
}

For case insensitivity, you need a custom comparison function
(or functor):
struct EqIgnoreCase
{
bool operator()( char lhs, char rhs ) const
{
return ::tolower( static_cast<unsigned char>( lhs ) )
== ::tolower( static_cast<unsigned char>( rhs ) );
}
};
If I understand correctly, you're checking for a prefix. The
simplest way to do this is:
bool
isPrefix( std::string const& s1, std::string const& s2 )
{
return s1.size() <= s2.size()
&& std::equal( s1.begin(), s1.end(), s2.begin(), EqIgnoreCase() );
}
(Note the check of the sizes. s1 can't be a prefix of s2 if
it is longer than s2. And of course, std::equal will
encounter undefined behavior if called with s1 longer than
s2.)

For a function defined in terms of C strings (character pointers) going "up" to STL strings seems horribly inefficient, but maybe that's totally premature thinking on my part.
I would consider a straight C solution "simpler", but again that depends on one's perspective.
#include <ctype.h>
void foo(const char *s1, const char *s2)
{
size_t i, n = 7;
for(i = 0; i < n; i++)
{
if(tolower((unsigned char)s1[i]) != tolower((unsigned char)s2[i]))
return;
if(s1[i] == '\0' && s2[i] == '\0')
break;
}
/* Strings are equal, do the work. */
...
}
This assumes that if both strings end before the length of the prefix has been exhausted, it's a match.
Of course the above assumes ASCII strings where tolower() makes sense.

I suggest writing the function yourself, like this:
bool strnicmp2(const char *s, const char *t, size_t n) {
while (n > 0 && *s && *t && tolower(*s) == tolower(*t)) {
++s;
++t;
--n;
}
return n == 0 || !*s || !*t;
}

something like this ought to work..
#include <iostream>
#include <string>
#include <cctype>
#include <cstring>
#include <algorithm>
struct isequal
{
bool operator()(int l, int r) const
{
return std::tolower(l) == std::tolower(r);
}
};
bool istrncmp(const char* s1, const char* s2, size_t n)
{
size_t ls1 = std::strlen(s1);
size_t ls2 = std::strlen(s2);
// this is strict, but you can change
if (ls1 < n || ls2 < n)
return false;
return std::equal(s1, s1 + n, s2, isequal());
}
int main(void)
{
std::cout << istrncmp("fooB", "fooA", 3) << std::endl;
std::cout << istrncmp("fooB", "fooA", 5) << std::endl;
std::cout << istrncmp("fooB", "f1oA", 3) << std::endl;
return 0;
}

I don't know if this counts as simpler or not, but it has fewer lines and speed should be pretty good.
#include <boost/iterator/transform_iterator.hpp>
#include <algorithm>
#include <cctype>
#include <cstring>
bool equal_insensitive_n( char const *a, char const *b, size_t n ) {
n = std::min( n, std::min( ::strlen( a ) + 1, ::strlen( b ) + 1 ) );
#define tilc(S) boost::make_transform_iterator( (S), ::tolower )
return std::equal( tilc(a), tilc(a) + n, tilc(b) );
#undef tilc
}

Bit Operation For Finding String Difference

The following code of mine tries to find the difference between two strings.
But it's horribly slow as it iterates over the length of the string:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
int hd(string s1, string s2) {
// hd stands for "Hamming Distance"
int dif = 0;
for (unsigned i = 0; i < s1.size(); i++ ) {
string b1 = s1.substr(i,1);
string b2 = s2.substr(i,1);
if (b1 != b2) {
dif++;
}
}
return dif;
}
int main() {
string string1 = "AAAAA";
string string2 = "ATATT";
string string3 = "AAAAA";
int theHD12 = hd(string1,string2);
cout << theHD12 << endl;
int theHD13 = hd(string1,string3);
cout << theHD13 << endl;
}
Is there a fast alternative to do that?
In Perl we can have the following approach:
sub hd {
return ($_[0] ^ $_[1]) =~ tr/\001-\255//;
}
which is much, much faster than iterating over the positions.
I wonder what's the equivalent of it in C++?

Try to replace the for loop by:
for (unsigned i = 0; i < s1.size(); i++ ) {
if (s1[i] != s2[i]) {
dif++;
}
}
This should be a lot faster because no new strings are created.

Fun with the STL:
#include <numeric> //inner_product
#include <functional> //plus, not_equal_to
#include <string>
#include <stdexcept>
unsigned int
hd(const std::string& s1, const std::string& s2)
{
// TODO: What should we do if s1.size() != s2.size()?
if (s1.size() != s2.size()){
throw std::invalid_argument(
"Strings passed to hd() must have the same length"
);
}
// std::not_equal_to replaces std::not2(std::equal_to<...>()), which was removed in C++20
return std::inner_product(
s1.begin(), s1.end(), s2.begin(),
0u, std::plus<unsigned int>(),
std::not_equal_to<std::string::value_type>()
);
}

Use iterators:
int GetHammingDistance(const std::string &a, const std::string &b)
{
// Hamming distance is not defined for strings of different lengths.
assert(a.length() == b.length()); // standard assert; needs #include <cassert>
std::string::const_iterator a_it = a.begin();
std::string::const_iterator b_it = b.begin();
std::string::const_iterator a_end = a.end();
std::string::const_iterator b_end = b.end();
int distance = 0;
while (a_it != a_end && b_it != b_end)
{
if (*a_it != *b_it) ++distance;
++a_it; ++b_it;
}
return distance;
}

Choice 1: Modify your original code to be as efficient as possible.
int hd(string const& s1, string const& s2)
{
// hd stands for "Hamming Distance"
int dif = 0;
for (std::string::size_type i = 0; i < s1.size(); i++ )
{
char b1 = s1[i];
char b2 = s2[i];
dif += (b1 != b2)?1:0;
}
return dif;
}
Choice 2: Use some of the STL algorithms to do the heavy lifting.
struct HammingFunc
{
inline int operator()(char s1,char s2)
{
return s1 == s2?0:1;
}
};
int hd(string const& s1, string const& s2)
{
int diff = std::inner_product(s1.begin(),s1.end(),
s2.begin(),
0,
std::plus<int>(),HammingFunc()
);
return diff;
}

Some obvious points that might make it faster:
Pass the strings as const references, not by value
Use the indexing operator [] to get characters, not a method call
Compile with optimization on

You use strings.
As explained here
The hunt for the fastest Hamming Distance C implementation
If you can use char*, my experiments conclude that for GCC 4.7.2 on an Intel Xeon X5650 the fastest general-purpose Hamming distance function for small strings (char arrays) is:
// na = length of both strings
unsigned int HammingDistance(const char* a, unsigned int na, const char* b) {
unsigned int num_mismatches = 0;
while (na) {
if (*a != *b)
++num_mismatches;
--na;
++a;
++b;
}
return num_mismatches;
}
If your problem allows you to set an upper distance limit, so that you don't care about greater distances, and this limit is always less than the strings' length, the above example can be further optimized to:
// na = length of both strings, dist must always be < na
unsigned int HammingDistance(const char* const a, const unsigned int na, const char* const b, const unsigned int dist) {
unsigned int i = 0, num_mismatches = 0;
while(i <= dist)
{
if (a[i] != b[i])
++num_mismatches;
++i;
}
while(num_mismatches <= dist && i < na)
{
if (a[i] != b[i])
++num_mismatches;
++i;
}
return num_mismatches;
}
I am not sure if const does anything regarding speed, but i use it anyways...
