How to compare long strings in C++?

I know how to compare two strings with "==" or "compare", but if the strings are very long, should we hash them first and compare the hash codes instead?
#include <chrono>
#include <functional>
#include <iostream>
#include <string>
using namespace std;
static int n = 100000;
bool TestCompare(const string& a, const string& b) {
return a == b;
}
bool TestCompareHash(const string& a, const string& b) {
std::hash<std::string> hash_fn;
std::size_t str_hash_a = hash_fn(a);
std::size_t str_hash_b = hash_fn(b);
return str_hash_a == str_hash_b;
}
int main()
{
string a(100, 'a');
string b(100, 'c');
std::chrono::time_point<std::chrono::system_clock> now = std::chrono::system_clock::now();
for (int i = 0; i < n; i++) {
TestCompare(a, b);
}
std::chrono::duration<float> difference = std::chrono::system_clock::now() - now;
cout << "difference.count() 1: " << difference.count() << endl;
now = std::chrono::system_clock::now();
for (int i = 0; i < n; i++) {
TestCompareHash(a, b);
}
difference = std::chrono::system_clock::now() - now;
cout << "difference.count() 2: " << difference.count() << endl;
return 0;
}
I tested this code and found that the hash version slows down as the strings get longer. Why?
when string length is 100
difference.count() 1: 0.00263665
difference.count() 2: 0.00713478 //hash
when string length is 10000
difference.count() 1: 0.00322366
difference.count() 2: 1.99765 //hash
I made some improvements to the test based on the comments, such as "make both strings exact matches except for the last character".
It seems that hashing does not reduce the amount of computation. It might make sense to do this kind of thing in a database to avoid a single point of failure, but perhaps it does not make much sense for plain string comparison?

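In your case the main issue is that you need to compute the hashes first, and that costs more than comparing the strings. operator== compares characters only until the first mismatch (O(n) at worst), and in your first test the strings differ at the very first character, so it returns almost immediately. std::hash<std::string>, on the other hand, generally has to go over all the characters (O(n)) of both strings on every call.
Hashes would help if you compute and store them once and then expect to compare the strings many times.
Note that hashes can only be used to test for equality (no > or <), and equal hashes do not prove that the strings are equal: different strings can collide, so a hash match still needs a full comparison to confirm it.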
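A minimal sketch of the "compute once, compare many times" idea, assuming a small wrapper type is acceptable. HashedString and equal are hypothetical names made up for this illustration; they are not part of the standard library.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical wrapper: pays the O(n) hashing cost once, at construction.
struct HashedString {
    std::string text;
    std::size_t hash;

    explicit HashedString(std::string s)
        : text(std::move(s)), hash(std::hash<std::string>{}(text)) {}
};

// Equality test that uses the cached hash as a cheap first filter.
bool equal(const HashedString& a, const HashedString& b) {
    if (a.hash != b.hash) {
        return false;          // different hashes: the strings must differ
    }
    return a.text == b.text;   // same hash: confirm, since collisions are possible
}
```

The fallback to a.text == b.text is what keeps the result correct when two different strings happen to share a hash.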

Related

difference between string size() function and strlen in this particular case

I recently did this question
Specification:
Input Format The first line contains the number of test cases, T. Next,
T lines follow each containing a long string S.
Output Format For each long string S, display the number of times SUVO
and SUVOJIT appears in it.
I wrote the following code for this :
#include <bits/stdc++.h>
using namespace std;
int main() {
int t;
cin >> t;
while (t--) {
int suvo = 0;
int suvojit = 0;
string s;
cin >> s;
for (int i = 0; i <= s.size() - 7; i++) {
if (s.substr(i, 7) == "SUVOJIT")
suvojit++;
}
for (int i = 0; i <= s.size() - 4; i++) {
if (s.substr(i, 4) == "SUVO")
suvo++;
}
cout << "SUVO = " << suvo - suvojit << ", SUVOJIT = " << suvojit << "\n";
}
return 0;
}
The code above threw an out-of-range exception from substr() for this test case:
15
RSUVOYDSUVOJITNSUVOUSUVOJITESUVOSUVOSGSUVOKSUVOJIT
SUVOJITWSUVOSUVOJITTSUVOCKSUVOJITNSUVOSUVOJITSUVOJITSUVOSUVOSUVOJITTSUVOJ
SUVOSUVOSUVOJITASUVOJITGCEBISUVOJITKJSUVORSUVOQCGVHRQLFSUVOOHPFNJTNSUVOJITKSSUVO
SUVOJITSUVOJITJGKSUVOJITISUVOJITKJLUSUVOJITUBSUVOX
MMHBSUVOFSUVOFMSUVOJITUMSUVOJITPSVYBYPMCSUVOJIT
OASUVOSUVOJITSUVOSTDYYJSUVOJITSUVOJITSUVO
RLSUVOCPSUVOJITYSUVOSUVOOGSUVOOESUVOJITMSUVO
WVLFFSUVOJITSUVOVSUVORLESUVOJITPSUVOJITSUVO
RSUVOSUVOJITQWSUVOUMASUVOSUVOJITXNNRRUNUSUVOJIT
HYLSSUVOSUVOSUVOJITPOSUVOJIT
DGMUCSSSUVOJITMJSUVOHSUVOCWTGSUVOJIT
OBNSSUVOYSUVOSUVOJITSUVOJITRHFDSUVODSUVOJITEGSUVOSUVOSUVOJITSUVOSUVOJITSSUVOSUVOSUVOSSUVOJIT
AG
NSUVOJITSUVOSUVOJIT
CGJGDSUVOEASUVOJITSGSUVO
However, when I treated the string as a C string and took its length with strlen instead of using s.size(), the code caused no error and everything went smoothly.
So, my question is... Why did this happen?
This is my working code with the change:
#include <bits/stdc++.h>
using namespace std;
int main() {
int t;
cin >> t;
while (t--) {
int suvo = 0;
int suvojit = 0;
string s;
cin >> s;
int le = strlen(&s[0]);
for (int i = 0; i <= le - 7; i++) {
if (s.substr(i, 7) == "SUVOJIT")
suvojit++;
}
for (int i = 0; i <= le - 4; i++) {
if (s.substr(i, 4) == "SUVO")
suvo++;
}
cout << "SUVO = " << suvo - suvojit << ", SUVOJIT = " << suvojit << "\n";
}
return 0;
}
In one case, you use size_t, in the other case you use int.
If the length is, for example, 6 characters, then s.size() - 7 is not -1 but a huge number (size_t is unsigned, so the subtraction wraps around), and everything goes wrong. But if you write int len = strlen(...), then len - 7 is indeed -1 and everything is fine.
When I see a number subtracted from size_t, that's an immediate red flag. Write "i + 7 ≤ s.size()", not "i ≤ s.size() - 7".
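As a rough illustration of the "i + 7 <= s.size()" form, here is a small sketch. The helper name countOccurrences is made up for this example; it uses std::string::compare so that no temporary substrings are created.

```cpp
#include <cstddef>
#include <string>

// Counts (possibly overlapping) occurrences of needle in s.
// The loop condition adds on the left instead of subtracting from size(),
// so it stays correct when s is shorter than needle.
int countOccurrences(const std::string& s, const std::string& needle) {
    int count = 0;
    for (std::size_t i = 0; i + needle.size() <= s.size(); ++i) {
        if (s.compare(i, needle.size(), needle) == 0) {
            ++count;
        }
    }
    return count;
}
```

With the input "AG" from the failing test case, the loop body simply never runs, because 0 + 7 <= 2 is false.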
First of all, in my testing your second version leads to a problem as well.
Second, especially with older compilers (well, libraries, really) this can be horrendously inefficient, creating a huge number of temporary strings that you only use to compare with another string [1].
So, let's consider how the job should be done instead. std::string has a member named find for situations like this. It returns the position of one string inside another, or std::string::npos if there is none. It allows you to specify a starting position at which to begin searching, when you don't want to start from the beginning.
We also, of course, have two instances of essentially identical code, once to search for SUVO, the other to search for SUVOJIT. The code would be much better off with the search code moved into a function, so we only have the search code in one place.
int count_pos(std::string const &haystack, std::string const &needle) {
size_t pos = 0;
int ret = 0;
while ((pos = haystack.find(needle, pos)) != std::string::npos) {
++ret;
++pos;
}
return ret;
}
Note that this also eliminates quite a bit more messy "stuff" like having to compute the maximum possible position at which a match could take place.
[1] Why does compiler/library age matter? Older libraries often used a COW string that dynamically allocated storage for every string. More recent ones typically include what's called a "short string optimization", where storage for a short string is allocated inside the string object itself, avoiding the dynamic allocation.
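To show how the find-based counter fits the original task, here is a small sketch. count_pos is repeated from the answer so the block is self-contained; report is a hypothetical helper that formats one output line the way the question's code does.

```cpp
#include <cstddef>
#include <string>

// Same counter as in the answer: counts occurrences of needle in haystack.
int count_pos(std::string const &haystack, std::string const &needle) {
    std::size_t pos = 0;
    int ret = 0;
    while ((pos = haystack.find(needle, pos)) != std::string::npos) {
        ++ret;
        ++pos; // continue searching just past the start of this match
    }
    return ret;
}

// Hypothetical helper: builds the output line for one input string.
std::string report(std::string const &s) {
    int suvojit = count_pos(s, "SUVOJIT");
    // Every SUVOJIT also contains a SUVO, so subtract those as the question does.
    int suvo = count_pos(s, "SUVO") - suvojit;
    return "SUVO = " + std::to_string(suvo) +
           ", SUVOJIT = " + std::to_string(suvojit);
}
```

A main function would then just read t, read each string with cin >> s, and print report(s).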

String having maximum number of given substrings made after swapping some characters?

So, this is an interview question that I was going through.
I have strings a, b, and c. I want to obtain string k by swapping some letters in a, so that k should contain as many non-overlapping substrings equal either to b or c as possible. Substring of string x is a string formed by consecutive segment of characters from x. Two substrings of string x overlap if there is position i in string x occupied by both of them.
Input: The first line contains string a, the second line contains string b, and the third line contains string c (1 ≤ |a|, |b|, |c| ≤ 10^5, where |s| denotes the length of string s).
All three strings consist only of lowercase English letters.
It is possible that b and c coincide.
Output: Find one of possible strings k.
Example:
I/P
abbbaaccca
ab
aca
O/P
ababacabcc
This optimal solution has three non-overlapping substrings equal to either b or c, at positions 1-2 (ab), 3-4 (ab), and 5-7 (aca).
Now, the approach that I could think of was to make a character count array for each of the strings, and then proceed from there. Basically, iterate over the original string (a) and check for occurrences of b and c. If there are none, swap as many characters as possible to make either b or c (whichever is shorter). But clearly this is not the optimal approach.
Can anyone suggest something better? (Only pseudocode will be enough)
Thanks!
The first thing you'll need to do is count the number of occurrences of each character in each string. The occurrence counts of a will be your knapsack, which you'll need to fill with as many b's or c's as possible.
Note that when I say knapsack I mean the character count vector of a, and inserting b into a means reducing the character count vector of a by the character count vector of b.
I'm a little short on a mathematical proof, but you'll need to:
1. Insert as many b as possible into the knapsack.
2. Insert as many c as possible into the knapsack (in the space left after step 1).
3. If removing a b from the knapsack enables the insertion of more c, remove the b from the knapsack. Otherwise, finish.
4. Fill the knapsack with as many c as you can.
5. Repeat steps 3-4.
Throughout the program, keep count of the number of b and c in the knapsack. The output should be:
[b_count times b][c_count times c][char_occurrence_left_in_knapsack_for_char_x times char_x for each char_x in lower_case_english]
This should solve your problem in O(n).
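Since the greedy steps above come without a proof, here is a minimal sketch of a closely related variant that instead tries every possible number of copies of b and, for each, takes as many copies of c as the remaining letters allow. This is a swapped-in exhaustive version, not the answerer's exact procedure. The names countLetters and bestArrangement are made up for illustration, and the inputs are assumed to be non-empty lowercase a-z strings, as the problem states.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <string>

using Counts = std::array<int, 26>;

// Character count vector of s (the "knapsack" when s is a).
Counts countLetters(const std::string& s) {
    Counts c{};
    for (char ch : s) ++c[ch - 'a'];
    return c;
}

// Returns one rearrangement of a holding as many copies of b and c as possible.
std::string bestArrangement(const std::string& a, const std::string& b,
                            const std::string& c) {
    const Counts ca = countLetters(a), cb = countLetters(b), cc = countLetters(c);

    // Largest number of copies of b that the letters of a allow.
    int maxB = INT_MAX;
    for (int i = 0; i < 26; ++i)
        if (cb[i] > 0) maxB = std::min(maxB, ca[i] / cb[i]);

    int bestB = 0, bestC = 0;
    for (int nb = 0; nb <= maxB; ++nb) {
        // With nb copies of b removed, how many copies of c still fit?
        int nc = INT_MAX;
        for (int i = 0; i < 26; ++i)
            if (cc[i] > 0) nc = std::min(nc, (ca[i] - nb * cb[i]) / cc[i]);
        if (nb + nc > bestB + bestC) { bestB = nb; bestC = nc; }
    }

    // Emit the copies of b, then the copies of c, then the leftover letters.
    std::string k;
    for (int i = 0; i < bestB; ++i) k += b;
    for (int i = 0; i < bestC; ++i) k += c;
    for (int i = 0; i < 26; ++i)
        k.append(ca[i] - bestB * cb[i] - bestC * cc[i], static_cast<char>('a' + i));
    return k;
}
```

For the example in the question (a = "abbbaaccca", b = "ab", c = "aca") this picks two copies of b and one of c and returns "ababacabcc". The loop runs at most |a|/|b| + 1 times with 26 steps each, so the cost stays linear in the input size.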
Assuming that allowed characters have ASCII codes 0-127, I would write a function to count the occurrences of each character in a string:
int[] count(String s) {
    int[] res = new int[128]; // Java zero-initializes new int arrays
    for (int i = 0; i < s.length(); i++)
        res[s.charAt(i)]++;   // index by character code, not by position
    return res;
}
We can now count occurrences in each string:
int[] aCount = count(a);
int[] bCount = count(b);
int[] cCount = count(c);
We can then write a function to count how many times a string can be carved out of characters of another string:
int carveCount(int[] strCount, int[] subStrCount) {
    int min = Integer.MAX_VALUE;
    for (int i = 0; i < subStrCount.length; i++) {
        if (subStrCount[i] == 0)
            continue;
        // number of copies that this character alone would allow
        min = Math.min(min, strCount[i] / subStrCount[i]);
    }
    if (min == Integer.MAX_VALUE)
        return 0; // empty substring: nothing to carve
    // remove the characters used by the carved copies
    for (int i = 0; i < subStrCount.length; i++) {
        if (subStrCount[i] != 0)
            strCount[i] -= min * subStrCount[i];
    }
    return min;
}
and call the function:
int bFitCount = carveCount(aCount, bCount);
int cFitCount = carveCount(aCount, cCount);
EDIT: I didn't realize you wanted all characters originally in a, fixing here.
Finally, to produce the output:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < bFitCount; i++)
    sb.append(b);
for (int i = 0; i < cFitCount; i++)
    sb.append(c);
// append the characters of a that were not used by any copy
for (int i = 0; i < aCount.length; i++) {
    for (int j = 0; j < aCount[i]; j++)
        sb.append((char) i);
}
return sb.toString();
One more comment: if the goal is to maximize repetitions(b) + repetitions(c), then you may want to first swap b and c if c is shorter. This way, if they share some characters, you have a better chance of increasing the result.
The algorithm could be optimized further, but as it is it should have complexity O(n), where n is the sum of the length of the three strings.
A related problem is called Knapsack problem.
This is basically the solution described by @Tal Shalti.
I tried to keep everything readable.
My program returns abbcabacac as one of the strings with the most occurrences (3).
To get all permutations without repeating a permutation I use std::next_permutation from <algorithm>. There is not much happening in the main function. I only store the number of occurrences and the permutation if a higher number of occurrences was achieved.
int main()
{
std::string word = "abbbaaccca";
std::string patternSmall = "ab";
std::string patternLarge = "aca";
unsigned int bestOccurrence = 0;
std::string bestPermutation = "";
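// Note: std::next_permutation only visits arrangements that come lexicographically
// after the starting word; sort word first if every permutation should be examined.
do {
// count and remove occurrence
unsigned int occurrences = FindOccurrences(word, patternLarge, patternSmall);
if (occurrences > bestOccurrence) {
bestOccurrence = occurrences;
bestPermutation = word;
std::cout << word << " .. " << occurrences << std::endl;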
}
} while (std::next_permutation(word.begin(), word.end()));
std::cout << "Best Permutation " << bestPermutation << " with " << bestOccurrence << " occurrences." << std::endl;
return 0;
}
This function handles the basic algorithm. pattern1 is the longer pattern, so it will be searched for last. If a pattern is found, it will be replaced with the string "##", since this should be very rare in the English language.
The variable occurrenceCounter keeps track of the number of occurrences found.
unsigned int FindOccurrences(const std::string& word, const std::string& pattern1, const std::string& pattern2)
{
unsigned int occurrenceCounter = 0;
std::string tmpWord(word);
// Starting at -1 makes the while() loops easier: the unsigned value wraps to 0 on the first ++i
std::string::size_type i = -1;
while (FindPattern(tmpWord, pattern2, ++i)) {
occurrenceCounter++;
tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern2.size(), "##");
}
i = -1;
while (FindPattern(tmpWord, pattern1, ++i)) {
occurrenceCounter++;
tmpWord.replace(tmpWord.begin() + i, tmpWord.begin() + i + pattern1.size(), "##");
}
return occurrenceCounter;
}
This function reports the first position of the pattern through i and returns true if it was found. If the pattern is not found, string.find(...) returns std::string::npos and the function returns false. string.find(...) starts searching for the pattern from index i.
bool FindPattern(const std::string& word, const std::string& pattern, std::string::size_type& i)
{
std::string::size_type foundPosition = word.find(pattern, i);
if (foundPosition == std::string::npos) {
return false;
}
i = foundPosition;
return true;
}

Determining most freq char element in a vector<char>?

I am trying to determine the most frequent character in a vector that has chars as its elements.
I am thinking of doing this:
looping through the vector and creating a map, where a key would be a unique char found in the vector. The corresponding value would be the integer count of the frequency of that char.
After I have gone through all elements in the vector, the map will
contain all character frequencies. Thus I will then have to find
which key had the highest value and therefore determine the most
frequent character in the vector.
This seems quite convoluted though, thus I was wondering if someone could suggest if this method would be considered 'acceptable' in terms of performance/good coding
Can this be done in a better way?
If you are only using regular ASCII characters, you can make the solution a bit faster: instead of a map, use an array of size 256 and count the occurrences of the character with code x in the array cell count[x]. This replaces the map's logarithmic lookup (at most about log 256 steps) with constant-time indexing. I do not think much more can be done to optimize this algorithm.
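A minimal sketch of the array-based count, assuming 8-bit chars. The function name mostFrequent is made up for this example. Each char is cast to unsigned char before indexing, because plain char may be signed and would otherwise produce negative indices for values above 127.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Returns the most frequent char in v. Ties go to the smallest byte value,
// and an empty vector yields '\0' (both are choices made for this sketch).
char mostFrequent(const std::vector<char>& v) {
    std::array<std::size_t, 256> counts{}; // one zero-initialized cell per byte value
    for (char c : v) {
        ++counts[static_cast<unsigned char>(c)];
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] > counts[best]) {
            best = i;
        }
    }
    return static_cast<char>(static_cast<unsigned char>(best));
}
```

This makes one pass over the vector and one pass over the 256 counters, with no sorting and no tree lookups.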
Sorting a vector of chars and then iterating through it looking for the longest run seems to be about 5 times faster than the map approach (using the fairly unscientific test code below on 16M chars). On the surface the two look comparable: the sort is O(N log N), and the map does N lookups at O(log K) each for K distinct characters. However, the sorting method probably benefits from branch prediction and from working in place on a contiguous vector.
The resultant output is:
Most freq char is '\334', appears 66288 times.
usingSort() took 938 milliseconds
Most freq char is '\334', appears 66288 times.
usingMap() took 5124 milliseconds
And the code is:
#include <algorithm> // std::sort
#include <chrono>
#include <cstdlib>   // random() (POSIX)
#include <iostream>
#include <map>
#include <vector>
void usingMap(std::vector<char> v)
{
std::map<char, int> m;
for ( auto c : v )
{
++m[c]; // operator[] value-initializes a missing count to 0
}
char mostFreq = 0;
int count = 0;
for ( auto mi : m )
if ( mi.second > count )
{
mostFreq = mi.first;
count = mi.second;
}
std::cout << "Most freq char is '" << mostFreq << "', appears " << count << " times.\n";
}
void usingSort(std::vector<char> v)
{
std::sort( v.begin(), v.end() );
char currentChar = v[0];
char mostChar = v[0];
int currentCount = 0;
int mostCount = 0;
for ( auto c : v )
{
if ( c == currentChar )
currentCount++;
else
{
if ( currentCount > mostCount)
{
mostChar = currentChar;
mostCount = currentCount;
}
currentChar = c;
currentCount = 1;
}
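}
// the final run never reaches the else branch above, so check it here
if ( currentCount > mostCount )
{
mostChar = currentChar;
mostCount = currentCount;
}
std::cout << "Most freq char is '" << mostChar << "', appears " << mostCount << " times.\n";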
}
int main(int argc, const char * argv[])
{
size_t size = 1024*1024*16;
std::vector<char> v(size);
for ( int i = 0; i < size; i++)
{
v[i] = random() % 256;
}
auto t1 = std::chrono::high_resolution_clock::now();
usingSort(v);
auto t2 = std::chrono::high_resolution_clock::now();
std::cout
<< "usingSort() took "
<< std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
<< " milliseconds\n";
auto t3 = std::chrono::high_resolution_clock::now();
usingMap(v);
auto t4 = std::chrono::high_resolution_clock::now();
std::cout
<< "usingMap() took "
<< std::chrono::duration_cast<std::chrono::milliseconds>(t4-t3).count()
<< " milliseconds\n";
return 0;
}

How many palindromes can be formed by selections of characters from a string?

I'm posting this on behalf of a friend since I believe this is pretty interesting:
Take the string "abb". By leaving out
any number of letters less than the
length of the string we end up with 7
strings.
a b b ab ab bb abb
Out of these 4 are palindromes.
Similarly for the string
"hihellolookhavealookatthispalindromexxqwertyuiopasdfghjklzxcvbnmmnbvcxzlkjhgfdsapoiuytrewqxxsoundsfamiliardoesit"
(a length 112 string) 2^112 - 1
strings can be formed.
Out of these how many are
palindromes??
Below there is his implementation (in C++, C is fine too though). It's pretty slow with very long words; he wants to know what's the fastest algorithm possible for this (and I'm curious too :D).
#include <iostream>
#include <cstring>
using namespace std;
void find_palindrome(const char* str, const char* max, long& count)
{
for(const char* begin = str; begin < max; begin++) {
count++;
const char* end = strchr(begin + 1, *begin);
while(end != NULL) {
count++;
find_palindrome(begin + 1, end, count);
end = strchr(end + 1, *begin);
}
}
}
int main(int argc, char *argv[])
{
const char* s = "hihellolookhavealookatthis";
long count = 0;
find_palindrome(s, strlen(s) + s, count);
cout << count << endl;
}
First of all, your friend's solution seems to have a bug since strchr can search past max. Even if you fix this, the solution is exponential in time.
For a faster solution, you can use dynamic programming to solve this in O(n^3) time. This will require O(n^2) additional memory. Note that for long strings, even 64-bit ints as I have used here will not be enough to hold the solution.
#include <cstring> // strlen
#define MAX_SIZE 1000 // str must be shorter than MAX_SIZE characters
long long numFound[MAX_SIZE][MAX_SIZE]; //intermediate results, indexed by [startPosition][endPosition]
long long countPalindromes(const char *str) {
int len = strlen(str);
for (int startPos=0; startPos<=len; startPos++)
for (int endPos=0; endPos<=len; endPos++)
numFound[startPos][endPos] = 0;
for (int spanSize=1; spanSize<=len; spanSize++) {
for (int startPos=0; startPos<=len-spanSize; startPos++) {
int endPos = startPos + spanSize;
long long count = numFound[startPos+1][endPos]; //if str[startPos] is not in the palindrome, this will be the count
char ch = str[startPos];
//if str[startPos] is in the palindrome, choose a matching character for the palindrome end
for (int searchPos=startPos; searchPos<endPos; searchPos++) {
if (str[searchPos] == ch)
count += 1 + numFound[startPos+1][searchPos];
}
numFound[startPos][endPos] = count;
}
}
return numFound[0][len];
}
Explanation:
The array numFound[startPos][endPos] will hold the number of palindromes contained in the substring with indexes startPos to endPos.
We go over all pairs of indexes (startPos, endPos), starting from short spans and moving to longer ones. For each such pair, there are two options:
The character at str[startPos] is not in the palindrome. In that case, there are numFound[startPos+1][endPos] possible palindromes - a number that we have calculated already.
The character at str[startPos] is in the palindrome (at its beginning). We scan through the string to find a matching character to put at the end of the palindrome. For each such character, we use the already-calculated results in numFound to find the number of possibilities for the inner palindrome.
EDIT:
Clarification: when I say "number of palindromes contained in a string", this includes non-contiguous substrings. For example, the palindrome "aba" is contained in "abca".
It's possible to reduce memory usage to O(n) by taking advantage of the fact that calculation of numFound[startPos][x] only requires knowledge of numFound[startPos+1][y] for all y. I won't do this here since it complicates the code a bit.
Pregenerating lists of indices containing each letter can make the inner loop faster, but it will still be O(n^3) overall.
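The row-reuse idea mentioned above can be sketched as follows. This is an illustrative variant, not the answer's own code: the name countPalindromesLowMemory is made up, and it keeps only the row for startPos + 1 (prev) while filling the row for startPos (curr). Because the sum over searchPos gains exactly one term each time endPos advances, the sketch also carries that sum as a running total instead of rescanning, which leaves two nested loops. As with the original, the 64-bit result overflows (here it wraps) for long strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts palindromic subsequences of str, counted by position
// (non-contiguous selections allowed), using two rows of the table.
unsigned long long countPalindromesLowMemory(const std::string& str) {
    const std::size_t len = str.size();
    // prev stands for numFound[startPos + 1][*]; it starts as the all-zero row.
    std::vector<unsigned long long> prev(len + 1, 0);
    for (std::size_t startPos = len; startPos-- > 0;) {
        std::vector<unsigned long long> curr(len + 1, 0);
        const char ch = str[startPos];
        // Palindromes that start at startPos and end before endPos.
        unsigned long long withStart = 0;
        for (std::size_t endPos = startPos + 1; endPos <= len; ++endPos) {
            const std::size_t searchPos = endPos - 1; // newly available end character
            if (str[searchPos] == ch) {
                withStart += 1 + prev[searchPos];
            }
            // Either str[startPos] is left out (prev) or it is the first character.
            curr[endPos] = prev[endPos] + withStart;
        }
        prev.swap(curr);
    }
    return prev[len];
}
```

For "abb" this returns 4, matching the count given in the question (a, b, b, bb).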
I have a way to do it in O(N^2) time and O(1) space, although I think there must be better ways. Note that it counts contiguous palindromic slices.
The basic idea is that a long palindrome must contain smaller palindromes, so we only search for the minimal matches, which come in two kinds: "aa" and "aba". If we find either, we expand outwards to see whether it is part of a longer palindrome.
int count_palindromic_slices(const string &S) {
int count = 0;
for (int position=0; position<S.length(); position++) {
int offset = 0;
// Check the "aa" situation
while((position-offset>=0) && (position+offset+1)<S.length() && (S.at(position-offset))==(S.at(position+offset+1))) {
count ++;
offset ++;
}
offset = 1; // reset it for the odd length checking
// Check the string for "aba" situation
while((position-offset>=0) && position+offset<S.length() && (S.at(position-offset))==(S.at(position+offset))) {
count ++;
offset ++;
}
}
return count;
}
June 14th, 2012
After some investigation, I believe this is the best way to do it.
It is faster than the accepted answer.
Is there any mileage in making an initial traversal and building an index of all occurrences of each character?
h = { 0, 2, 27}
i = { 1, 30 }
etc.
Now, working from the left with h: the only possible palindromes are at 3 and 17. Does char[0 + 1] == char[3 - 1], and so on inwards? If so, we have a palindrome. Does char[0 + 1] == char[27 - 1]? No, so no further analysis of char[0] is needed.
Move on to char[1]: we only need to examine char[30 - 1] and inwards.
Then can probably get smart, when you've identified a palindrome running from position x->y, all inner subsets are known palindromes, hence we've dealt with some items, can eliminate those cases from later examination.
My solution using O(n) memory and O(n^2) time, where n is the string length:
palindrome.c:
#include <stdio.h>
#include <string.h>
typedef unsigned long long ull;
ull countPalindromesHelper (const char* str, const size_t len, const size_t begin, const size_t end, const ull count) {
if (begin <= 0 || end >= len) {
return count;
}
const char pred = str [begin - 1];
const char succ = str [end];
if (pred == succ) {
const ull newCount = count == 0 ? 1 : count * 2;
return countPalindromesHelper (str, len, begin - 1, end + 1, newCount);
}
return count;
}
ull countPalindromes (const char* str) {
ull count = 0;
size_t len = strlen (str);
size_t i;
for (i = 0; i < len; ++i) {
count += countPalindromesHelper (str, len, i, i, 0); // even length palindromes
count += countPalindromesHelper (str, len, i, i + 1, 1); // odd length palindromes
}
return count;
}
int main (int argc, char* argv[]) {
if (argc < 2) {
return 0;
}
const char* str = argv [1];
ull count = countPalindromes (str);
printf ("%llu\n", count);
return 0;
}
Usage:
$ gcc palindrome.c -o palindrome
$ ./palindrome myteststring
EDIT: I misread the problem as the contiguous substring version of the problem. Now given that one wants to find the palindrome count for the non-contiguous version, I strongly suspect that one could just use a math equation to solve it given the number of distinct characters and their respective character counts.
Hmmmmm, I think I would count up like this:
Each character is a palindrome on its own (minus repeated characters).
Each pair of the same character.
Each pair of the same character, with all palindromes sandwiched in the middle that can be made from the string between repeats.
Apply recursively.
Which seems to be what you're doing, although I'm not sure you don't double-count the edge cases with repeated characters.
So, basically, I can't think of a better way.
EDIT:
Thinking some more,
It can be improved with caching, because you sometimes count the palindromes in the same sub-string more than once. So, I suppose this demonstrates that there is definitely a better way.
Here is a program for finding all the possible palindromes in a string written in both Java and C++.
int main()
{
string palindrome;
cout << "Enter a String to check if it is a Palindrome";
cin >> palindrome;
int length = palindrome.length();
cout << "the length of the string is " << length << endl;
int end = length - 1;
int start = 0;
int check=1;
while (end >= start) {
if (palindrome[start] != palindrome[end]) {
cout << "The string is not a palindrome";
check=0;
break;
}
else
{
start++;
end--;
}
}
if(check)
cout << "The string is a Palindrome" << endl;
}
public String[] findPalindromes(String source) {
Set<String> palindromes = new HashSet<String>();
int count = 0;
for(int i=0; i<source.length()-1; i++) {
for(int j= i+1; j<source.length(); j++) {
String palindromeCandidate = new String(source.substring(i, j+1));
if(isPalindrome(palindromeCandidate)) {
palindromes.add(palindromeCandidate);
}
}
}
return palindromes.toArray(new String[palindromes.size()]);
}
private boolean isPalindrome(String source) {
int i =0;
int k = source.length()-1;
for(i=0; i<source.length()/2; i++) {
if(source.charAt(i) != source.charAt(k)) {
return false;
}
k--;
}
return true;
}
I am not sure, but you might try it with Fourier transforms. This problem reminded me of this one: O(n log n) Algorithm - Find three evenly spaced ones within binary string
Just my 2 cents.

bool function problem - always returns true?

#include <iostream>
#include <string>
#include <algorithm>
#include <cstdlib>
#include <cstdio>
using namespace std;
static bool isanagram(string a, string b);
int main(void)
{
int i,n,j,s;
cin >> n;
string a, b;
cin >> a >> b;
if(!isanagram(a,b)) cout << "False" << endl;
else cout << "True" << endl;
return 0;
}
static bool isanagram(string a, string b)
{
int i, j, size, s=0;
size = a.size();
bool k;
for(i=0;i<size;i++)
{
k=false;
for(j=0;j<size;j++)
{
if(a[i] == b[j]) { k = true; break; }
}
if(k==true) s+=1;
}
cout << a[2] << b[2] << endl;
if(s == size) return true;
else return false;
}
I don't know where exactly is the problem so i just pasted the whole code.
It should be a simple program that finds out whether two strings are anagrams, but it's not working and I don't know why. I used pointers in the program, so I thought they might be the problem and removed them. I removed other things as well, but it's still not working. Could you take a look and give me some idea of where I might have gone wrong with my code?
Thank you in advance.
The logic for your isanagram function is fatally flawed - it will never work correctly, even if you manage to fix the bugs in it.
You need to make sure that you have a correct algorithm before you start coding. One simple algorithm might be:
sort a
sort b
isanagram = (a == b)
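In C++ the three steps above could look roughly like this. It is a minimal sketch: the name isAnagram is chosen for illustration, and the early length check is an extra shortcut that is not part of the pseudocode.

```cpp
#include <algorithm>
#include <string>

// Takes both strings by value on purpose: the copies are sorted in place.
bool isAnagram(std::string a, std::string b) {
    if (a.size() != b.size()) {
        return false; // strings of different lengths cannot be anagrams
    }
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

Sorting puts equal letters next to each other in the same order in both strings, so repeated letters are accounted for automatically.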
It doesn't always return true:
Here's my input:
0
sdf
fda
Here's output I got:
fa
False
Regarding your task: if performance is not an issue for your task, just sort the 2 strings (using std::sort) and compare the results.
Regarding your style:
use string::length() instead of size() -- it's more idiomatic
instead of if(s == size) return true; else return false; consider return s == size
pass your strings by const reference, not by value
consider declaring variables as close to their point of use as possible (but no closer) and initializing them when you declare them (i, j, k, size all fit this hint)
Your approach is nearly there, but it has a flaw. You are only checking that every char from string a is present in string b. So if a = "aab" and b = "abc", your approach will flag them as anagrams. You also need to take the count of each char into account.
The definition of anagram is:
An anagram is a type of word play, the result of rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly once;
The easiest way, as many have suggested, is to ensure that the strings are of the same length. If they are, sort the two strings and check for equality.
If you want to patch your approach, you can make the char in string b NULL after it has been matched with a char in string a.
Something like:
if(a[i] == b[j]) { b[j] = 0; k = true; break; }
in place of your:
if(a[i] == b[j]) { k = true; break; }
This way once a char of b has been matched it cannot participate again.
There are essentially two ways of checking for anagrams:
Sort both strings and see if they match. If they are anagrams, they will both have the same letters and a sort would order them into the same sequence.
Count the frequency of each char in each string. If they are anagrams, the frequency counts for each char will be the same for both strings.
First things first: don't declare the method static. It's a confusing keyword at the best of times given all the roles it can fulfill... so reserve for times when you really have to (method or attribute of a class that is not tied to any instance for example).
Regarding the algorithm: you're nearly there, but presence only is not sufficient, you need to take the number of characters in account too.
Let's do it simply:
bool anagram(std::string const& lhs, std::string const& rhs)
{
if (lhs.size() != rhs.size()) return false; // does not cost much...
std::vector<int> count(256, 0); // count of characters
for (size_t i = 0, max = lhs.size(); i != max; ++i)
{
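++count[static_cast<unsigned char>(lhs[i])]; // cast so that chars above 127 cannot index negatively
--count[static_cast<unsigned char>(rhs[i])];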
}
for (size_t i = 0, max = count.size(); i != max; ++i)
if (count[i] != 0) return false;
return true;
} // anagram
Let's see it at work: anagram("abc","cab")
Initialization: count = [0, 0, ...., 0]
First loop i == 0 > ['a': 1, 'c': -1]
First loop i == 1 > ['a': 0, 'b': 1, 'c': -1]
First loop i == 2 > ['a': 0, 'b': 0, 'c': 0 ]
And the second loop will pass without any problem.
Variants include maintaining 2 count arrays (one for each string) and then comparing them. It's slightly less efficient... but that does not really matter.
int main(int argc, char* argv[])
{
if (argc != 3) std::cout << "Usage: Program Word1 Word2" << std::endl;
else std::cout << argv[1] << " and " << argv[2] << " are "
<< (anagram(argv[1], argv[2]) ? "" : "not ")
<< "anagrams" << std::endl;
}
I see some problems with your code. Basically the algorithm is wrong. It will match characters within a.size(). It takes no account for duplicates (in either a or b).
Essentially, you should sort the strings and then compare for equality.
If you can't sort, at least remove the b characters from the comparison, eliminate the k variable.