What is the proper function for comparing two C-style strings? - c++

So I have a dilemma. I need to compare two C-style strings and I searched for the functions that would be the most appropiate:
memcmp //Compare two blocks of memory (function)
strcmp //Compare two strings (function )
strcoll //Compare two strings using locale (function)
strncmp //Compare characters of two strings (function)
strxfrm //Transform string using locale (function)
The first one I think is for addresses, so the idea is out.
The second one sounds like the best choice to me, but I wanna hear feedback anyway.
The other three leave me clueless.

For general string comparisons, strcmp is the appropriate function. You should use strncmp to only compare some number of characters from a string (for example, a prefix), and memcmp to compare blocks of memory.
That said, since you're using C++, you should avoid this altogether and use the std::string class, which is much easier to use and generally safer than C-style strings. You can compare two std::strings for equality easily by just using the == operator.
Hope this helps!

Both memcmp and strcmp will work fine. To use the former, you'll need to know the length of the shorter string in advance.

Related

How do I compare base36 values in C++?

As we all know, with C-style string comparisons, the value is dependent on the ASCII value of each character and just uses the strcmp function to compare. I'm confused that what the std::string compare depends on?
Although I have searched Google, I still didn't find the answer.
In addition, if strings are all base36 strings and they are all in lower case, could I compare their values by strings directly? Or I should convert them as a long variable using the strtol function? Which method is better?
Your outset "As we all know, with C-style string comparisons, the value is dependent on the ASCII value of each character..." is unfortunately wrong already. With e.g. UTF-8 strings and various forms of collation, that's simply untrue.
Then "...and just uses the strcmp function to compare." is also wrong, because C-style strings don't have an inherent way to compare but multiple ways that also depend on e.g. the encoding and locale. You could use strcmp() for bytewise equality comparisons though, but that won't always give you expected results.
To answer your question what std::string uses, that's simple. std::string is a specialization of the std::basic_string template and it delegates comparisons to its char_traits template parameter. This parameter typically uses memcmp(). It can not possibly use strcmp(), because other than a C-style string, std::string can include null chars, but strcmp() would stop at those.
std::string compare depends upon 'ASCII values' in exactly the same way that strcmp does.
For base36 comparisons, simple string comparison (either strcmp or std::string) doesn't work because "00123" and "123" are equal when representing base36 integers but they compare differently as strings. Neither does strtol work very well because of integer overflow. Instead you should probably write your own comparison routine that removes leading zeros, then compares length and finally for strings of equal length does a string comparison.

c++ best way to call function with const char* parameter type

what is the best way to call a function with the following declaration
string Extract(const char* pattern,const char* input);
i use
string str=Extract("something","input text");
is there a problem with this usage
should i use the following
char pattern[]="something";
char input[]="input";
//or use pointers with new operator and copy then free?
the both works but i like the first one but i want to know the best practice.
A literal string (e.g. "something") works just fine as a const char* argument to a function call.
The first method, i.e. passing them literally in, is usually preferable.
There are occasions though where you don't want your strings hard-coded into the text. In some ways you can say that, a bit like magic numbers, they are magic words / phrases. So you prefer to use constant identifier to store the values and pass those in instead.
This would happen often when:
1. a word has a special meaning, and is passed in many times in the code to have that meaning.
or
2. the word may be cryptic in some way and a constant identifier may be more descriptive
Unless you plain to have duplicates of the same strings, or alter those strings, I'm a fan of the first way (passing the literals directly), it means less dotting about code to find what the parameters actually are, it also means less work in passing parameters.
Seeing as this is tagged for C++, passing the literals directly allows you to easily switch the function parameters to std::string with little effort.

strcmp or string::compare?

I want to compare two strings. Is it possible with strcmp? (I tried and it does not seem to work). Is string::compare a solution?
Other than this, is there a way to compare a string to a char?
Thanks for the early comments. I was coding in C++ and yes it was std::string like some of you mentioned.
I didn't post the code because I wanted to learn the general knowledge and it is a pretty long code, so it was irrelevant for the question.
I think I learned the difference between C++ and C, thanks for pointing that out. And I will try to use overloaded operators now. And by the way string::compare worked too.
For C++, use std::string and compare using string::compare.
For C use strcmp. If your (i meant your programs) strings (for some weird reason) aren't nul terminated, use strncmp instead.
But why would someone not use something as simple as == for std::string
?
Assuming you mean std::string, why not use the overloaded operators: str1 == str2, str1 < str2?
See std::basic_string::compare and std::basic_string operators reference (in particular, there exists operator==, operator!=, operator<, etc.). What else do you need?
When using C++ use C++ functions viz. string::compare. When using C and you are forced to use char* for string, use strcmp
By your question, "is there a way to compare a string to a char?" do you mean "How do I find out if a particular char is contained in a string?" If so, the the C-library function:
char *strchr(const char *s, int c);
will do it for you.
-- pete
std::string can contain (and compare!) embedded null characters.
are*comp(...) will compare c-style strings, comparing up to the first null character (or the specified max nr of bytes/characters)
string::compare is actually implemented as a template basic_string so you can expect it to work for other types such as wstring
On the unclear phrase to "compare a string to a char" you can compare the char to *string.begin() or lookup the first occurrence (string::find_first_of and string::find_first_not_of)
Disclaimer: typed on my HTC, typos reserved :)

Why isn't ("Maya" == "Maya") true in C++?

Any idea why I get "Maya is not Maya" as a result of this code?
if ("Maya" == "Maya")
printf("Maya is Maya \n");
else
printf("Maya is not Maya \n");
Because you are actually comparing two pointers - use e.g. one of the following instead:
if (std::string("Maya") == "Maya") { /* ... */ }
if (std::strcmp("Maya", "Maya") == 0) { /* ... */ }
This is because C++03, §2.13.4 says:
An ordinary string literal has type “array of n const char”
... and in your case a conversion to pointer applies.
See also this question on why you can't provide an overload for == for this case.
You are not comparing strings, you are comparing pointer address equality.
To be more explicit -
"foo baz bar" implicitly defines an anonymous const char[m]. It is implementation-defined as to whether identical anonymous const char[m] will point to the same location in memory(a concept referred to as interning).
The function you want - in C - is strmp(char*, char*), which returns 0 on equality.
Or, in C++, what you might do is
#include <string>
std::string s1 = "foo"
std::string s2 = "bar"
and then compare s1 vs. s2 with the == operator, which is defined in an intuitive fashion for strings.
The output of your program is implementation-defined.
A string literal has the type const char[N] (that is, it's an array). Whether or not each string literal in your program is represented by a unique array is implementation-defined. (§2.13.4/2)
When you do the comparison, the arrays decay into pointers (to the first element), and you do a pointer comparison. If the compiler decides to store both string literals as the same array, the pointers compare true; if they each have their own storage, they compare false.
To compare string's, use std::strcmp(), like this:
if (std::strcmp("Maya", "Maya") == 0) // same
Typically you'd use the standard string class, std::string. It defines operator==. You'd need to make one of your literals a std::string to use that operator:
if (std::string("Maya") == "Maya") // same
What you are doing is comparing the address of one string with the address of another. Depending on the compiler and its settings, sometimes the identical literal strings will have the same address, and sometimes they won't (as apparently you found).
Any idea why i get "Maya is not Maya" as a result
Because in C, and thus in C++, string literals are of type const char[], which is implicitly converted to const char*, a pointer to the first character, when you try to compare them. And pointer comparison is address comparison.
Whether the two string literals compare equal or not depends whether your compiler (using your current settings) pools string literals. It is allowed to do that, but it doesn't need to. .
To compare the strings in C, use strcmp() from the <string.h> header. (It's std::strcmp() from <cstring>in C++.)
To do so in C++, the easiest is to turn one of them into a std::string (from the <string> header), which comes with all comparison operators, including ==:
#include <string>
// ...
if (std::string("Maya") == "Maya")
std::cout << "Maya is Maya\n";
else
std::cout << "Maya is not Maya\n";
C and C++ do this comparison via pointer comparison; looks like your compiler is creating separate resource instances for the strings "Maya" and "Maya" (probably due to having an optimization turned off).
My compiler says they are the same ;-)
even worse, my compiler is certainly broken. This very basic equation:
printf("23 - 523 = %d\n","23"-"523");
produces:
23 - 523 = 1
Indeed, "because your compiler, in this instance, isn't using string pooling," is the technically correct, yet not particularly helpful answer :)
This is one of the many reasons the std::string class in the Standard Template Library now exists to replace this earlier kind of string when you want to do anything useful with strings in C++, and is a problem pretty much everyone who's ever learned C or C++ stumbles over fairly early on in their studies.
Let me explain.
Basically, back in the days of C, all strings worked like this. A string is just a bunch of characters in memory. A string you embed in your C source code gets translated into a bunch of bytes representing that string in the running machine code when your program executes.
The crucial part here is that a good old-fashioned C-style "string" is an array of characters in memory. That block of memory is often referred to by means of a pointer -- the address of the start of the block of memory. Generally, when you're referring to a "string" in C, you're referring to that block of memory, or a pointer to it. C doesn't have a string type per se; strings are just a bunch of chars in a row.
When you write this in your code:
"wibble"
Then the compiler provides a block of memory that contains the bytes representing the characters 'w', 'i', 'b', 'b', 'l', 'e', and '\0' in that order (the compiler adds a zero byte at the end, a "null terminator". In C a standard string is a null-terminated string: a block of characters starting at a given memory address and continuing until the next zero byte.)
And when you start comparing expressions like that, what happens is this:
if ("Maya" == "Maya")
At the point of this comparison, the compiler -- in your case, specifically; see my explanation of string pooling at the end -- has created two separate blocks of memory, to hold two different sets of characters that are both set to 'M', 'a', 'y', 'a', '\0'.
When the compiler sees a string in quotes like this, "under the hood" it builds an array of characters, and the string itself, "Maya", acts as the name of the array of characters. Because the names of arrays are effectively pointers, pointing at the first character of the array, the type of the expression "Maya" is pointer to char.
When you compare these two expressions using "==", what you're actually comparing is the pointers, the memory addresses of the beginning of these two different blocks of memory. Which is why the comparison is false, in your particular case, with your particular compiler.
If you want to compare two good old-fashioned C strings, you should use the strcmp() function. This will examine the contents of the memory pointed two by both "strings" (which, as I've explained, are just pointers to a block of memory) and go through the bytes, comparing them one-by-one, and tell you whether they're really the same.
Now, as I've said, this is the kind of slightly surprising result that's been biting C beginners on the arse since the days of yore. And that's one of the reasons the language evolved over time. Now, in C++, there is a std::string class, that will hold strings, and will work as you expect. The "==" operator for std::string will actually compare the contents of two std::strings.
By default, though, C++ is designed to be backwards-compatible with C, i.e. a C program will generally compile and work under a C++ compiler the same way it does in a C compiler, and that means that old-fashioned strings, "things like this in your code", will still end up as pointers to bits of memory that will give non-obvious results to the beginner when you start comparing them.
Oh, and that "string pooling" I mentioned at the beginning? That's where some more complexity might creep in. A smart compiler, to be efficient with its memory, may well spot that in your case, the strings are the same and can't be changed, and therefore only allocate one block of memory, with both of your names, "Maya", pointing at it. At which point, comparing the "strings" -- the pointers -- will tell you that they are, in fact, equal. But more by luck than design!
This "string pooling" behaviour will change from compiler to compiler, and often will differ between debug and release modes of the same compiler, as the release mode often includes optimisations like this, which will make the output code more compact (it only has to have one block of memory with "Maya" in, not two, so it's saved five -- remember that null terminator! -- bytes in the object code.) And that's the kind of behaviour that can drive a person insane if they don't know what's going on :)
If nothing else, this answer might give you a lot of search terms for the thousands of articles that are out there on the web already, trying to explain this. It's a bit painful, and everyone goes through it. If you can get your head around pointers, you'll be a much better C or C++ programmer in the long run, whether you choose to use std::string instead or not!

Overloading a method on default arguments

Is it possible to overload a method on default parameters?
For example, if I have a method split() to split a string, but the string has two delimiters, say '_' and "delimit". Can I have two methods something like:
split(const char *str, char delim = ' ')
and
split(const char *str, const char* delim = "delimit");
Or, is there a better way of achieving this? Somehow, my brain isn't working now and am unable to think of any other solution.
Edit: The problem in detail:
I have a string with two delimiters, say for example, nativeProbableCause_Complete|Alarm|Text. I need to separate nativeProbableCause and Complete|Alarm|Text and then further, I need to separate Complete|Alarm|Text into individual words and join them back with space as a separator sometime later (for which I already have written a utility and isn't a big deal). It is only the separation of the delimited string that is troubling me.
No, you cant - if you think about it, the notion of a default means 'use this unless I say otherwise'. If the compiler has 2 options for a default, which one will it choose?
How about implementing as 2 different methods like
split_with_default_delimiter_space
split_with_default_delimiter_delimit
Personally I'd prefer using something like this (more readable.. intent conveying) over the type of overloading that you mentioned... even if it was somehow possible for the compiler to do that.
Why not just call split() twice and explicitly pass the delimiter the second time? Will delimiters always be single characters?
Do you perform any other processing on the 2nd set of words before joining them? If not, then for the second task what you really want to do is replace substrings. This is most easily done with std::string::find and std::string::replace. If you must use c-strings, you could use strstr/strchr/strpbrk, strcpy and strcat, or use just strstr/strchr/strpbrk and join them in place.
You could use a version of split that accepts a variable number of delimiters (split(const char*,vector<string>), if you want to split(const char*, const char**)) or just use Boost Tokenizer.