c++ string compare algorithm [closed] - c++

Closed 10 years ago.
what is your best string comparison algorithm?
i find O(n)
#include <cstring>
bool str_cpmr(char* str1, char* str2)
{
int l1 = strlen(str1), l2 = strlen(str2) ;
if(l1 != l2)
return false;
for(int i = 0 ; i < l1 ; i++)
if(str1[i] != str2[i])
return false ;
return true ;
}
and i wonder if there is any other / better solution.
also, how to test that accurately?
i propose to compare
100 matches
100 strings differing by one char swap
Is there more to test for string comparison?
How is it done in the C++ STL (std::string::compare)?
thanks!!!!!

Your function is O(n), but still takes roughly double the time necessary -- strlen walks through the string to find the length, then (assuming they're the same length) you walk through the strings again comparing the characters.
Instead of that, I'd walk through the strings until you reach a mismatch or the end of both strings. If you reach a mismatch, you return false. You return true if and only if you reach the end of both strings (simultaneously) without any mismatches first.

Logically it's hard to see how you can check all the values in a string for a single char mismatch in less than O(n) time - assuming you have no other info about the string.
If this is a real application and you have some knowledge of the strings and the type of differences, you could do better on average by checking every Nth char first, provided you know that the strings contain sequences of length 'N', e.g. part or phone numbers.
edit: Note this is still O(n), O() only describes the power of the scaling, it would just be O(n/N) which is still O(n). If you make the string 10x longer checking every Nth entry still takes 10x as long.

what is your best string comparison algorithm?
template< class CharT, class Traits, class Alloc >
bool operator==( const basic_string<CharT,Traits,Alloc>& lhs, const basic_string<CharT,Traits,Alloc>& rhs );
It compares two strings using only two characters of source code:
a==b;
Here's a non-smartalec answer, written in C:
bool str_cpmr(const char* str1, const char* str2)
{
    // Advance both pointers only while the current characters match.
    while( *str1 && *str1 == *str2 ) {
        ++str1;
        ++str2;
    }
    return *str1 == *str2;
}
It is exactly one loop, so it is obviously O(n), where n goes as length of the shorter string. Also, it is likely to compile to exactly 2n memory fetches. You can go faster with specialized string instructions (so calling strcmp() will probably go faster than this), but you probably won't go faster in straight C.
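For the std::string side of the question, here is a minimal sketch of how the library operations are usually called. The helper names sameText and order are made up for illustration; only operator== and std::string::compare come from the standard library, and how fast they run depends on the implementation.

```cpp
#include <string>

// Equality: true when both strings have the same length and characters.
bool sameText(const std::string& a, const std::string& b) {
    return a == b;
}

// Three-way comparison: std::string::compare returns a negative value,
// zero, or a positive value; this hypothetical helper folds it to -1/0/1.
int order(const std::string& a, const std::string& b) {
    const int r = a.compare(b);
    return (r > 0) - (r < 0);
}
```

compare only promises the sign of its result, so test it against zero rather than against a specific value.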

Your improved function might look like this:
bool str_cpmr(char* str1, char* str2)
{
if (NULL == str1 || NULL == str2) return false;
while (*str1 && *str2) {
if (*str1++ != *str2++) {
return false;
}
}
return *str1 || *str2 ? false : true;
}

If there is no additional information on the nature of the strings, there is nothing that can be better than O(n), where n is the length of the (shorter) string.
You cannot do with fewer than n comparisons! Give me an algorithm that can do it with n-1 comparisons. Then there must be a position in the string where the algorithm cannot tell if the characters are different or not. That way I can give you an example where your algorithm with n-1 comparisons fails.
You can only improve this by a constant factor. This is also where additional information comes in, e.g. if you know that the underlying hardware compares 32-bit values faster than 8-bit values, then it will be better to compare chunks of four characters instead of comparing character by character. You will not do much better.
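A rough sketch of the chunked idea, assuming both buffers are known to hold at least n bytes (so the length has already been checked elsewhere). The name equalChunked is hypothetical, and whether this beats a plain loop or memcmp depends on the compiler and hardware; it is still O(n).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compares n bytes of a and b four at a time, then finishes byte by byte.
// Assumption: both pointers refer to at least n readable bytes.
bool equalChunked(const char* a, const char* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + sizeof(std::uint32_t) <= n; i += sizeof(std::uint32_t)) {
        std::uint32_t wa, wb;
        // memcpy avoids unaligned or aliasing-unsafe reads.
        std::memcpy(&wa, a + i, sizeof wa);
        std::memcpy(&wb, b + i, sizeof wb);
        if (wa != wb) return false;
    }
    for (; i < n; ++i) {          // leftover 0..3 bytes
        if (a[i] != b[i]) return false;
    }
    return true;
}
```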

Related

How to convert one string to another by successive substitutions of characters?

I'm currently trying to design an algorithm that does the following:
I got two strings A and B which consist of lowercase characters 'a'-'z'
and I can modify string A using the following operations:
1. Select two characters 'c1' and 'c2' from the character set ['a'-'z'].
2. Replace all characters 'c1' in string A with character 'c2'.
I need to find the minimum number of operations needed to convert string A to string B when possible.
I have 2 ideas that didn't work
1. Simple range-based for cycle that changes string B and compares it with A.
2. Idea with map<char, int> that does the same.
Right now I'm stuck on unit tests for cases like these: 'ab' can be transformed into 'ba' in 3 operations and 'abc' into 'bca' in 4 operations.
My algorithm is wrong and I need some fresh ideas or working solution.
Can anyone help with this?
Here is some code that serves as a minimal reproducible example:
int Transform(string& A, string& B)
{
int count = 0;
if(A.size() != B.size()){
return -1;
}
for(int i = A.size() - 1; i >= 0; i--){
if(A[i]!=B[i]){
char rep_elem = A[i];
++count;
replace(A.begin(),A.end(),rep_elem,B[i]);
}
}
if(A != B){
return -1;
}
return count;
}
How can I improve this, or should I look for a different idea?
First of all, don't worry about string operations. Your problem is algorithmic, not textual. You should somehow analyze your data, and only afterwards print your solution.
Start with building a data structure which tells, for each letter, which letter it should be replaced with. Use an array (or std::map<char, char> — it should conceptually be similar, but have different syntax).
If you discover that you should convert a letter to two different letters — error, conversion impossible. Otherwise, count the number of non-trivial cycles in the conversion graph.
The length of your solution will be the number of letters which shouldn't be replaced by themselves plus the number of cycles.
I think the code to implement this would be too long to be helpful.
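For what it's worth, the counting rule above fits in a fairly short function. The sketch below is one possible reading of it, not a definitive solution: the name minReplaceOps is made up, the input is assumed to contain only 'a'-'z', and two details are my own assumptions on top of the answer. First, a cycle only costs an extra operation when no letter outside the cycle maps into it (otherwise that letter can be merged in first and serves as the temporary). Second, breaking a cycle needs a spare letter, so the conversion is reported as impossible when B uses all 26 letters and A differs from B.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Returns the minimum number of "replace every c1 with c2" operations
// that turn a into b, or -1 if no sequence of operations works.
// Assumption: both strings consist only of lowercase 'a'..'z'.
int minReplaceOps(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return -1;

    std::array<int, 26> to;            // to[c] = letter c must become
    to.fill(-1);                       // -1 = c does not occur in a
    std::array<bool, 26> usedInB{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int from = a[i] - 'a';
        const int target = b[i] - 'a';
        if (to[from] != -1 && to[from] != target) return -1;  // two targets
        to[from] = target;
        usedInB[target] = true;
    }

    int ops = 0;                       // letters that must actually change
    int distinctTargets = 0;
    std::array<int, 26> indegree{};    // incoming non-trivial mappings
    for (int c = 0; c < 26; ++c) {
        if (usedInB[c]) ++distinctTargets;
        if (to[c] != -1 && to[c] != c) { ++ops; ++indegree[to[c]]; }
    }
    if (ops == 0) return 0;            // a already equals b
    if (distinctTargets == 26) return -1;  // no spare letter available

    std::array<int, 26> stamp{};       // 0 = not visited yet
    for (int start = 0; start < 26; ++start) {
        int c = start;
        while (stamp[c] == 0 && to[c] != -1 && to[c] != c) {
            stamp[c] = start + 1;
            c = to[c];
        }
        if (stamp[c] == start + 1) {   // walked back into this pass: a cycle
            bool isolated = true;
            int d = c;
            do {
                if (indegree[d] != 1) isolated = false;
                d = to[d];
            } while (d != c);
            if (isolated) ++ops;       // needs one extra temporary step
        }
    }
    return ops;
}
```

With the two cases from the question, minReplaceOps("ab", "ba") gives 3 and minReplaceOps("abc", "bca") gives 4.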

Runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffff for frequency sort

The question is to sort the letters of a given string in the decreasing order of their frequencies.
Eg: If string = "tree" output = "eert" or "eetr"
I used an unordered_map and counted the frequencies of each letter and added it to the resultant string in the decreasing order of frequencies.
This is what I have tried:
string frequencySort(string s1) {
unordered_map<char,int> temp;
for(char& c: s1)
temp[c]++;
string s = "";
while(!temp.empty()) {
int max = 0;
char c='a';
for(auto it:temp) {
if(it.second > max) {
c = it.first;
max = it.second;
}
}
for(int j=0;j<max;j++)
s = s + c;
temp.erase(temp.find(c));
}
return s;
}
My code is not working for large inputs. And changing int to long long does not make it work. So the maximum frequency is within INT_MAX. I get this error:
Runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffff
I cannot paste the particular test case here as it exceeds the permissible body size for a question.
Any help is appreciated.
There is nothing logically wrong in the code, but there are many inefficiencies that could make you run out of memory in a low-memory machine.
First you pass string by value:
string frequencySort(string s1) {
This makes a new copy of the string on each call, using twice as much memory as necessary.
Instead, prefer:
string frequencySort(const string & s1) {
The repeated reallocation required for the string can cause fragmentation in the memory manager, and cause more rapid out-of-memory issues:
for(int j=0;j<max;j++)
s = s + c;
To minimize reallocation issues, use reserve
string s = "";
s.reserve(s1.length());
And the biggest performance issue:
s = s + c;
The above code copies the same string again and again, running in O(N^2) and wreaking havoc on the heap with massive fragmentation.
There are also simple inefficiencies in the code that might have a big impact on runtime for large inputs, although they don't affect complexity. The use of unordered_map for such a small set (26 english letters) has a lot of time-overhead. It might be more efficient to use std::map in this case. For large inputs it is more efficient to hold an array
int map[256] = {0};
Unfortunately, for small inputs it might be slower. Also, this will not work so well for wide characters (where there are over 2^16 possible wide characters). But for ASCII this should work pretty well.
As a benchmark I ran the string that results from this command:
perl -e 'print "abcdefghijklmnopqrstuvwxyza" x 1000000; print "y\n"'
which generates a string of about 27 million characters (the repeated unit is 27 characters long).
The code with int map[256] completed in less than 4 seconds on my laptop.
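Putting the suggestions together, a version with a fixed count table might look roughly like this. It is a sketch rather than the benchmarked code: the name frequencySortCounts is made up, ties are broken by character value (the question accepts any order for ties), and counts are kept in int, which assumes the input has fewer than INT_MAX characters.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

std::string frequencySortCounts(const std::string& s) {
    // One counter per possible byte value instead of an unordered_map.
    std::array<int, 256> counts{};
    for (unsigned char c : s) ++counts[c];

    // Collect (count, character) pairs for the characters that occur.
    std::vector<std::pair<int, unsigned char>> order;
    for (int c = 0; c < 256; ++c) {
        if (counts[c] > 0) {
            order.emplace_back(counts[c], static_cast<unsigned char>(c));
        }
    }
    // Highest count first; equal counts ordered by character value.
    std::sort(order.begin(), order.end(),
              [](const std::pair<int, unsigned char>& l,
                 const std::pair<int, unsigned char>& r) {
                  if (l.first != r.first) return l.first > r.first;
                  return l.second < r.second;
              });

    std::string out;
    out.reserve(s.size());             // single allocation for the result
    for (const auto& entry : order) {
        out.append(static_cast<std::size_t>(entry.first),
                   static_cast<char>(entry.second));
    }
    return out;
}
```

For the example in the question, frequencySortCounts("tree") returns "eert".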

Comparing two strings without using strcmp [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
This is my question:
Write a function named compareStrings(char * str1, char * str2, int i=0), which decides whether
the two received strings are equal or not. The third parameter decides whether to take case
sensitivity into account while comparing strings: 0 means case sensitive, otherwise case insensitive.
The function returns 0 if two strings are equal
Returns 1 if str1 > str2
Returns -1 if str1 < str2.
Example:
compareStrings( "apple", "Apple" ) returns 1
compareStrings( "apple a day keeps the doctor away", "apple are good for health" ) returns -1
This is the code I have written so far, but it does not compare all ASCII characters. As far as I can tell I would have to add a check for every ASCII character, which would make it very long.
Please suggest another approach to this question.
#include<iostream>
using namespace std;
int compareStrings(char * str1, char * str2);
int main()
{
char str1[]="apple";
char str2[]="Apple";
int ret;
ret=compareStrings(str1,str2);
if(ret==0)
cout<<"Both strings are equal"<<endl;
else if(ret==1)
cout<<"string 1 is bigger than 2"<<endl;
else
cout<<"string 1 is lower than 2"<<endl;
return 0;
}
int compareStrings(char * str1, char * str2)
{
for(int i=0;i<20;i++)
{
if(str1[i]==str2[i])
return 0;
else if(str1[i] >= 'A' && str1[i] <= 'Z' &&str2[i] <='a' && str2[i]<='z')
return -1;
else if(str2[i] >= 'A' && str2[i] <= 'Z' &&str1[i] <='a' && str1[i]<='z')
return 1;
}
}
There are multiple problems with the code as shown. I'm ignoring the fact that you aren't using C++ std::string type, though that is another issue.
You only compare the first twenty characters of the strings.
What happens if the strings are longer?
What is the return value from the function if the loop ends?
You compare the first twenty characters of the strings even if the strings are shorter.
You return 0 on the first character that's the same.
You return -1 if the current character in the first string is upper-case and the current character in the second is lower-case, regardless of whether the case-sensitivity flag is set or whether the letters are equivalent.
Similarly you return +1 for the converse condition.
You don't use the isalpha(), isupper(), islower() macros (prefixed with std:: from <cctype>) or equivalent functions.
You don't recognize that if one string contains a 7 and the other a 9, you should come to a decision.
Since the comparison function is not supposed to modify either string, the function prototype should use const char * arguments.
Etc.
You will need to rethink your code rather carefully. Ignore case-insensitivity until you have case-sensitive comparisons working correctly. Then you can modify it to handle case-insensitive comparisons too.
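To make the points above concrete, here is one way the function could be structured once the case-sensitive path works. Treat it as a sketch under a few assumptions: the third parameter is renamed ignoreCase for readability (the assignment calls it i), the arguments are taken as const char *, and case folding relies on std::tolower in the current C locale.

```cpp
#include <cctype>

// Returns 0 if the strings are equal, 1 if str1 > str2, -1 if str1 < str2.
// ignoreCase == 0 compares case-sensitively; any other value folds case.
int compareStrings(const char* str1, const char* str2, int ignoreCase = 0) {
    for (;; ++str1, ++str2) {
        unsigned char c1 = static_cast<unsigned char>(*str1);
        unsigned char c2 = static_cast<unsigned char>(*str2);
        if (ignoreCase) {
            c1 = static_cast<unsigned char>(std::tolower(c1));
            c2 = static_cast<unsigned char>(std::tolower(c2));
        }
        if (c1 != c2) return c1 < c2 ? -1 : 1;  // first difference decides
        if (c1 == '\0') return 0;               // both ended together
    }
}
```

The loop stops at the terminating '\0', so it handles strings of any length and never reads past the end of the shorter one.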

C++: Comparing individual elements of a string to their ASCII values? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I am trying to write a small program that determines if a string is a palindrome. Naturally, I want to ignore any character that is not a letter. I planned on achieving this by checking each element of the string by comparing their ASCII values to values that I determined: [65,90] U [97,122]
The following code is a segment from a function in which a string, string aStrn, is passed in.
while(aStrn[index] != '\0')
{
if(aStrn[index] > 64 && aStrn[index] < 91 && aStrn[index] > 96 &&
aStrn[index] < 123)
{
ordered.Push(aStrn[index]);
}
index++;
}
I tested this code by explicitly defining parameters such that if(aStrn[index] != ' ' && aStrn[index] != '\''... etc., and it worked perfectly. However, when I try the method shown above, ordered remains empty.
I can't for the life of me figure out why, so all help is greatly appreciated. I also understand that there is probably a better way to go about this but I would still like to understand why this does not work.
Unless you have a specific reason to do otherwise, you want to put your strings into std::string objects, use std::isalpha to determine whether something is a letter, and probably std::copy_if to copy the qualifying data from the source to the destination.
std::string source = "This is 1 non-palindromic string!";
std::string dest;
std::copy_if(source.begin(), source.end(),
std::back_inserter(dest),
[](unsigned char c) { return std::isalpha(c); });
You might also want to convert the string entirely to lower (or upper) case to make comparisons easier (assuming you want to treat upper and lower case letters as equal). That's also pretty trivial:
std::transform(dest.begin(), dest.end(),
dest.begin(),
[](unsigned char c) { return std::toupper(c); });
Missing parentheses and 'OR' operator. Simple mistake.
if((aStrn[index] > 64 && aStrn[index] < 91) || (aStrn[index] > 96 && aStrn[index] < 123)) fixed it.
You're allowed to compare against character literals.
if (aStrn[index] >= 'a' && aStrn[index] <= 'z' /* ... */) // for example
But there are standard library functions that do the work for you.
if (std::isalpha(static_cast<unsigned char>(aStrn[index]))) {
//...
}
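Since the end goal was a palindrome check, here is a small sketch that combines std::isalpha with case folding and skips everything that is not a letter. The function name isLetterPalindrome is made up, and it walks the string from both ends instead of using the stack (ordered) from the question.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if the letters of text read the same in both directions,
// ignoring case and every character that is not a letter.
bool isLetterPalindrome(const std::string& text) {
    std::size_t left = 0;
    std::size_t right = text.size();   // one past the candidate on the right
    while (left < right) {
        const unsigned char l = static_cast<unsigned char>(text[left]);
        const unsigned char r = static_cast<unsigned char>(text[right - 1]);
        if (!std::isalpha(l)) { ++left; continue; }    // skip non-letter
        if (!std::isalpha(r)) { --right; continue; }   // skip non-letter
        if (std::tolower(l) != std::tolower(r)) return false;
        ++left;
        --right;
    }
    return true;
}
```

A string with no letters at all counts as a palindrome under this definition, which may or may not be what you want.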

Interview: Adding two binary numbers given as strings [closed]

Closed 9 years ago.
While looking through some interview questions at http://www.glassdoor.com/Interview/Facebook-Interview-Questions-E40772.htm I came across the following question:
Given two string representations of binary numbers (e.g. "1001", "10") write a function that adds them and returns the result as a string as well (e.g. "1011").
Here's some Python code that I started to write for the problem (it is incomplete right now), but I am not very sure if this is the best (or even correct) approach. I also thought about implementing the same in C++ first, but gave up considering the added complexities in string manipulations.
def add_binary(a,b):
    temp = ""
    carry = False
    # i from len(a) to 1 and j from len(b) to 1
    bit_sum = add_bit(a[i], b[j])
    if (bit_sum == "10"):
        temp.append("0")
        carry = True
    elif carry:
        temp.append(add_bit("1",bit_sum))
    else:
        temp.append(bit_sum)
    return temp.reverse()
def add_bit(b1, b2):
    if b1 == '0':
        return b2
    elif b2 == '0':
        return b1
    elif (b1 == '1' and b2 =='1'):
        return "10"
    else:
        return None
a = "1001"
b = "10"
add_binary(a,b)
First, if the strings are short enough (less than 64 bits), I'd
probably just convert them to an internal integral type
(unsigned long long), do the addition there, and then
reconvert the results. Converting between binary strings and
internal format is really, really trivial.
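A minimal sketch of that short-string route, assuming both inputs are non-empty, contain only '0' and '1', and are short enough that the sum still fits in an unsigned long long. The name addBinaryShort is made up; std::stoull with base 2 does the parsing.

```cpp
#include <string>

std::string addBinaryShort(const std::string& lhs, const std::string& rhs) {
    // Parse both operands as base-2 numbers and add them natively.
    unsigned long long sum =
        std::stoull(lhs, nullptr, 2) + std::stoull(rhs, nullptr, 2);

    // Convert back by peeling off the lowest bit until nothing is left.
    std::string out;
    do {
        out.insert(out.begin(), static_cast<char>('0' + (sum & 1u)));
        sum >>= 1;
    } while (sum != 0);
    return out;
}
```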
Otherwise, I'd probably first normalize them so that they have
the maximum length of the results. Something like:
size_t size = std::max( lhs.size(), rhs.size() ) + 1;
lhs.insert( lhs.begin(), size - lhs.size(), '0' );
rhs.insert( rhs.begin(), size - rhs.size(), '0' );
I'd also create a results string of this size:
std::string results( size, '0' );
And a carry variable, initialized to '0':
char carry = '0';
I'd then loop over the three strings, using reverse iterators,
or more likely, just an index (which will ensure accessing the
same element of each string):
size_t current = size;
while ( current != 0 ) {
-- current;
// ...
}
Within the loop, you really only have four possibilities: I'd
just count the '1's (in lhs[current], rhs[current] and
carry), and do a switch on the results, setting
results[current] and carry appropriately:
int onesCount = 0;
if ( carry == '1' ) {
++ onesCount;
}
if ( lhs[current] == '1' ) {
++ onesCount;
}
if ( rhs[current] == '1' ) {
++ onesCount;
}
switch ( onesCount ) {
case 0:
carry = '0';
results[current] = '0';
break;
case 1:
carry = '0';
results[current] = '1';
break;
case 2:
carry = '1';
results[current] = '0';
break;
case 3:
carry = '1';
results[current] = '1';
break;
}
Personally, I think this is the simplest and the cleanest
solution, albeit a bit verbose. Alternatively, you can replace
the switch with something like:
results[current] = onesCount % 2 == 0 ? '0' : '1';
carry = onesCount < 2 ? '0' : '1';
Finally, if desired, you can suppress any leading zeros in the
results (there will be at most one), and maybe assert that
carry == '0' (because if it isn't, we've screwed up our
calculation of the size).
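Assembled into one function, the steps above could look like the sketch below. The name addBinary is made up, the carry is kept as an int rather than a char to shorten the arithmetic, and the inputs are assumed to contain only '0' and '1' without redundant leading zeros.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

std::string addBinary(std::string lhs, std::string rhs) {
    // Pad both operands to the maximum possible length of the result.
    const std::size_t size = std::max(lhs.size(), rhs.size()) + 1;
    lhs.insert(lhs.begin(), size - lhs.size(), '0');
    rhs.insert(rhs.begin(), size - rhs.size(), '0');

    std::string results(size, '0');
    int carry = 0;
    // Walk from the least significant digit (the right end) to the left.
    for (std::size_t current = size; current-- > 0; ) {
        const int onesCount =
            carry + (lhs[current] == '1') + (rhs[current] == '1');
        results[current] = onesCount % 2 == 0 ? '0' : '1';
        carry = onesCount < 2 ? 0 : 1;
    }
    // The padding leaves at most one leading zero; keep a lone "0".
    if (results.size() > 1 && results[0] == '0') {
        results.erase(0, 1);
    }
    return results;
}
```

For the example from the question, addBinary("1001", "10") returns "1011".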
The most difficult part here is the fact that we need to process the strings from right to left. We can do this by either:
Reversing the strings (input and output).
In a recursive call, process the bits when "going back", i.e. first call a recursive add on the "next bits", then add the "current bit".
Use reverse iterators and construct the result from right to left. A problem will still be how to know the resulting length in advance (so you know where to start).
The recursive solution has problems when the numbers are large, i.e. the stack might overflow.
Reversing the strings is the easiest solution, yet not the most efficient one.
A combination of the first and third option would be to process the input strings in reverse using reverse iterators, but construct the result in reverse order (so you can simply append bits), then reverse the result.
This is more or less also your approach. Implement the loop counting i and j from the string length minus 1 (!) until 0, so this will walk through the input strings in reverse order. If carry was set to true, add one to the bit sum in the next iteration. Represent the bit sum of add_bit as an integer, not as a string again, so you can add the carry.
In C++, you have the possibility to iterate through any sequence in reverse order using rbegin() and rend().
In Python you can convert strings to integers with int, which also accepts a base for the conversion:
num = '111'
print(int(num, 2))
prints 7.
For converting the result back to binary:
print("{:b}".format(4))
prints 100