C++: String Functions

I am quite new to C++ programming and am getting to know the basics by reading books.
I came across two interesting functions strcmpi() and stricmp().
I know that both the functions compare strings lexicographically by ignoring the case of the strings.
So I just wanted to know the differences between them.
Any help would be appreciated.

Both functions do the exact same thing (as long as you stick to comparing plain ASCII strings).
The problem is that neither is part of the ANSI C standard, so you can't be sure either will be available with a given compiler.
You can have yet other names for the same functionality. _strcmpi() for instance.
There is no standard case-insensitive comparison primitive in C/C++, so each compiler provides its own version with varying names.
The closest thing to a standard variant is POSIX's strcasecmp(), but I would not bet that every compiler on the planet currently supports it.
The reason behind it is that case sensitivity is not as trivial a problem as it might seem, what with all the diacritics of various languages and the extended character encodings.
While plain ASCII string will always be compared the same way, you can expect differences in implementation when trying to compare UTF16 strings or other extended character sets.
Judging by this article, some C++ geeks seem to get a big kick out of rewriting their own version of it, too.

strcmpi and stricmp are case-insensitive versions of strcmp; they work identically in all other respects. _strcmpi and _stricmp are alternate names for strcmpi and stricmp, and strcasecmp is the POSIX name for the same functionality.
int strcmp (const char *string1, const char *string2);
int strcmpi (const char *string1, const char *string2);
int stricmp (const char *string1, const char *string2);

When to use std::string vs char*? [duplicate]

Possible duplicate: C++ char* vs std::string
I'm new to C++ coming from C# but I really do like C++ much better.
I have an abstract class that defines two constant strings (not static). And I wondered if a const char* would be a better choice. I'm still getting the hang of the C++ standards, but I just figured that there really isn't any reason why I would need to use std::string in this particular case (no appending or editing the string, just writing to the console via printf).
Should I stick to std::string in every case?
Should I stick to std::string in every case?
Yes.
Except perhaps for a few edge cases, such as when you're writing a high-performance multi-threaded logging library and you really need to know when a memory allocation will take place, or when fiddling with individual bits in a packet header in some low-level protocol/driver.
The problem is that you start with a simple char*, and then you need to print it, so you use printf(); then perhaps an sprintf() to format it, because std::stringstream would be a pain for just one int-to-string conversion. And you end up with an unsafe, unmaintainable mix of C and C++.
I would stick to using std::string instead of const char*, simply because most of the built-in C++ libraries work with strings and not character arrays. std::string has a lot of built-in methods and facilities that give the programmer a lot of power when manipulating strings.
Should I stick to std::string in every case?
There are cases where std::string isn't needed and a plain char const* will do. However, you do get other functionality besides manipulation: you also get comparison with other strings and char arrays, and all the standard algorithms can operate on strings.
I would say go with std::string by default (for members and variables), and then change only if you see that it is the cause of a performance drop (which it won't be).
Use std::string when you need to store a value.
Use const char * when you want maximum flexibility, as almost everything can be easily converted to or from one.
This is like comparing apples to oranges. std::string is a container class, while char* is just a pointer to a character sequence.
It really all depends on what you want to do with the string.
std::string, on the other hand, gives you quick access to simple string calculation and manipulation functions; nothing fancy, really.
So it basically depends on your needs and how your functions are declared. The only advantage of std::string over a char pointer is that it doesn't require an explicit length declaration.

Correct use of string storage in C and C++

Popular software developers and companies (Joel Spolsky, Fog Creek Software) tend to use wchar_t for Unicode character storage when writing C or C++ code. When and how should one use char and wchar_t with respect to good coding practices?
I am particularly interested in POSIX compliance when writing software that leverages Unicode.
When using wchar_t, you can look up characters in an array of wide characters on a per-character or per-array-element basis:
/* C code fragment */
const wchar_t *overlord = L"ov€rlord";
if (overlord[2] == L'€')
wprintf(L"Character comparison on a per-character basis.\n");
How can you compare Unicode bytes (or characters) when using char?
So far my preferred way of comparing strings and characters of type char in C often looks like this:
/* C code fragment */
const char *mail[] = { "ov€rlord#masters.lt", "ov€rlord#masters.lt" };
if (mail[0][2] == mail[1][2] && mail[0][3] == mail[1][3] && mail[0][4] == mail[1][4])
printf("%s\n%zu", *mail, strlen(*mail));
This method scans for the byte equivalent of a Unicode character. The Unicode euro sign € takes up 3 bytes in UTF-8, so one needs to compare three char array elements to know whether the Unicode characters match. Often you need to know the size of the character or string you want to compare, and the bytes it encodes to, for the solution to work. This does not look like a good way of handling Unicode at all. Is there a better way of comparing strings and character elements of type char?
In addition, when using wchar_t, how can you scan the file contents to an array? The function fread does not seem to produce valid results.
If you know that you're dealing with Unicode, neither char nor wchar_t is appropriate on its own, as their sizes are compiler- and platform-defined. For example, wchar_t is 2 bytes on Windows (MSVC) but 4 bytes on Linux (GCC). The C11 and C++11 standards are a bit more rigorous, and define two new character types (char16_t and char32_t) with associated literal prefixes for creating UTF-{8, 16, 32} strings.
If you need to store and manipulate Unicode characters, you should use a library that is designed for the job, as neither the pre-C11 nor pre-C++11 language standards were written with Unicode in mind. There are a few to choose from, but ICU is quite popular (and supports C, C++, and Java).
I am particularly interested in POSIX compliance when writing software
that leverages Unicode.
In this case, you'll probably want to use UTF-8 (with char) as your preferred Unicode string type. POSIX doesn't have a lot of functions for working with wchar_t — that's mostly a Windows thing.
This method scans for the byte equivalent of a unicode character. The
Unicode Euro symbol € takes up 3 bytes. Therefore one needs to compare
three char array bytes to know if the Unicode characters match. Often
you need to know the size of the character or string you want to
compare and the bits it produces for the solution to work.
No, you don't. You just compare the bytes. Iff the bytes match, the strings match. strcmp works just as well with UTF-8 as it does with any other encoding.
Unless you want something like a case-insensitive or accent-insensitive comparison, in which case you'll need a proper Unicode library.
You should never, ever compare bytes, or even code points, to decide whether strings are equal. That's because many strings can be identical from the user's perspective without being identical at the code-point level (Unicode normalization forms, for example).

c++ Why is an array of characters used in tutorials about strings?

I have seen tutorials that are using an array of characters in order to demonstrate something with a string object. For example, these tutorials:
http://www.cplusplus.com/reference/string/string/copy/
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/
I HAVE seen tutorials that are not using a char array in order to demonstrate something. At school, the teacher also doesn't use any arrays. For me, using an array is a bit confusing at first when I'm reading the tutorial (knowing that I'm still a beginner at C++).
I'm just curious to know why are there tutorials that use a char array in order to show one or more of the things that string objects can do.
Storing strings in arrays of characters was the original way to represent a string in the C language. In C, a string is an array of type char. The size of the array is the number of characters plus 1; the extra element holds a character value of 0, called the NUL terminator (or just the terminator).
C-style strings are legal in C++ because C++ is intended to be backwards compatible with C. Also, many libraries and existing code bases depend on C-style strings.
Here is a tutorial on C-style strings. http://www.cprogramming.com/tutorial/c/lesson9.html
FYI: To convert a C++ String to a C-style string, call the method c_str().
In C and C++, strings are usually represented as '\0'-terminated arrays of char. With C++ you can use the standard string class, but that is by no means the only "natural" way to represent strings; many C and C++ programs are still quite content using an array of char.
See the Bjarne Stroustrup's paper "Learning Standard C++ as a New Language"
www2.research.att.com/~bs/new_learning.pdf
Strings appeared when the STL appeared, as the C++ standard was formed somewhere in the 1990s, if I remember correctly. Until then (for example in Turbo C++, which is still used at my school... unfortunately), there was no string class in C++, so everyone used char arrays. They are still widely used, because strings don't really introduce many new things that char arrays can't do, and many people don't like them. Strings essentially manage a char buffer for you, but they hide this behind a class.
One problem with strings is that not all library functions support them. For example, the printf family of functions and atoi (whose name comes from 'ASCII to integer'; likewise atof and the others) don't accept them directly. Also, in larger projects you sometimes need to work in C, and strings don't exist in C, only char arrays.
The good thing about strings is that they are implemented in such a way that it is very easy to convert from and to char arrays.

strcmp or string::compare?

I want to compare two strings. Is it possible with strcmp? (I tried and it does not seem to work). Is string::compare a solution?
Other than this, is there a way to compare a string to a char?
Thanks for the early comments. I was coding in C++ and yes it was std::string like some of you mentioned.
I didn't post the code because I wanted to learn the general knowledge and it is a pretty long code, so it was irrelevant for the question.
I think I learned the difference between C++ and C, thanks for pointing that out. And I will try to use overloaded operators now. And by the way string::compare worked too.
For C++, use std::string and compare using string::compare.
For C use strcmp. If your (I mean your program's) strings (for some weird reason) aren't NUL-terminated, use strncmp instead.
But why would someone not use something as simple as == for std::string?
Assuming you mean std::string, why not use the overloaded operators: str1 == str2, str1 < str2?
See std::basic_string::compare and std::basic_string operators reference (in particular, there exists operator==, operator!=, operator<, etc.). What else do you need?
When using C++, use C++ functions, viz. string::compare. When using C, where you are forced to use char* for strings, use strcmp.
By your question, "is there a way to compare a string to a char?", do you mean "How do I find out whether a particular char is contained in a string?" If so, then the C library function:
char *strchr(const char *s, int c);
will do it for you.
-- pete
std::string can contain (and compare!) embedded null characters.
The str*cmp(...) functions compare C-style strings, comparing up to the first null character (or the specified maximum number of bytes/characters).
string::compare is actually implemented on the basic_string template, so you can expect it to work for other instantiations such as wstring.
On the unclear phrase "compare a string to a char": you can compare the char to *string.begin(), or look up the first occurrence (string::find_first_of and string::find_first_not_of).
Disclaimer: typed on my HTC, typos reserved :)

Case-insensitive UTF-8 string collation for SQLite (C/C++)

I am looking for a method to compare and sort UTF-8 strings in C++ in a case-insensitive manner to use it in a custom collation function in SQLite.
The method should ideally be locale-independent. However I won't be holding my breath, as far as I know, collation is very language-dependent, so anything that works on languages other than English will do, even if it means switching locales.
Options include using standard C or C++ library or a small (suitable for embedded system) and non-GPL (suitable for a proprietary system) third-party library.
What I have so far:
strcoll with C locales and std::collate/std::collate_byname are case-sensitive. (Are there case-insensitive versions of these?)
I tried to use POSIX strcasecmp, but its behavior is only specified for the POSIX locale:
In the POSIX locale, strcasecmp() and strncasecmp() do upper to lower conversions, then a byte comparison. The results are unspecified in other locales.
And, indeed, the result of strcasecmp does not change between locales on Linux with GLIBC.
#include <clocale>
#include <cstdio>
#include <cassert>
#include <cstring>
#include <strings.h> /* strcasecmp (POSIX) */
static const char *s1 = "Äaa";
static const char *s2 = "äaa";
int main() {
    printf("strcasecmp('%s', '%s') == %d\n", s1, s2, strcasecmp(s1, s2));
    printf("strcoll('%s', '%s') == %d\n", s1, s2, strcoll(s1, s2));
    assert(setlocale(LC_ALL, "en_AU.UTF-8"));
    printf("strcasecmp('%s', '%s') == %d\n", s1, s2, strcasecmp(s1, s2));
    printf("strcoll('%s', '%s') == %d\n", s1, s2, strcoll(s1, s2));
    assert(setlocale(LC_ALL, "fi_FI.UTF-8"));
    printf("strcasecmp('%s', '%s') == %d\n", s1, s2, strcasecmp(s1, s2));
    printf("strcoll('%s', '%s') == %d\n", s1, s2, strcoll(s1, s2));
}
This is printed:
strcasecmp('Äaa', 'äaa') == -32
strcoll('Äaa', 'äaa') == -32
strcasecmp('Äaa', 'äaa') == -32
strcoll('Äaa', 'äaa') == 7
strcasecmp('Äaa', 'äaa') == -32
strcoll('Äaa', 'äaa') == 7
P.S. And yes, I am aware of ICU, but we can't use it on the embedded platform due to its enormous size.
What you really want is logically impossible: there is no locale-independent, case-insensitive way of sorting strings. The simple counter-example: is "i" equal to "I", ignoring case? The naive answer is yes, but in Turkish these strings are unequal, because "i" uppercases to "İ" (U+0130, Latin Capital Letter I with Dot Above), not to "I".
UTF-8 strings add extra complexity to the question. They're perfectly valid multi-byte char* strings, if you have an appropriate locale. But neither the C nor the C++ standard defines such a locale; check with your vendor (there are too many embedded vendors for a general answer here). So you HAVE to pick a locale whose multi-byte encoding is UTF-8 for the multi-byte string functions to work. This of course influences the sort order, which is locale-dependent. And if you have NO locale in which const char* is UTF-8, you can't use this trick at all. (As I understand it, Microsoft's CRT suffers from this: its multi-byte code only handles characters up to 2 bytes, while UTF-8 needs up to 4.)
wchar_t is not the standard solution either. It is supposedly wide enough that you don't have to deal with multi-byte encodings, but your collation will still depend on the locale (LC_COLLATE). On the other hand, using wchar_t means you can now choose locales that do not use UTF-8 for const char*.
With this done, you can basically write your own ordering by converting strings to lowercase and comparing them. It's not perfect, though. Do you expect L"ß" == L"ss"? They're not even the same length, yet for a German user you have to consider them equal. Can you live with that?
I don't think there's a standard C/C++ library function you can use. You'll have to roll your own or use a 3rd-party library. The full Unicode specification for locale-specific collation can be found here: http://www.unicode.org/reports/tr10/ (warning: this is a long document).
On Windows you can fall back on the OS function CompareStringW with the NORM_IGNORECASE flag. You'll have to convert your UTF-8 strings to UTF-16 first. Otherwise, take a look at IBM's International Components for Unicode.
I believe you will need to roll your own or use a third-party library. I recommend a third-party library, because there are a lot of rules that need to be followed to get true international support; it's best to let an expert deal with them.
I have no definitive answer in the form of example code, but I should point out that a UTF-8 byte stream contains, in fact, Unicode characters, and you have to use the wchar_t versions of the C/C++ runtime library.
You have to convert those UTF-8 bytes into wchar_t strings first, though. This is not very hard, as the UTF-8 encoding standard is very well documented. I know this, because I've done it, but I can't share that code with you.
If you are using it for searching and sorting in your own locale only, I suggest having your collation function call a simple replace function that converts both multi-byte strings into one-byte-per-char ones, using a table like:
A -> a
à -> a
á -> a
ß -> ss
Ç -> c
and so on
Then simply call strcmp and return the results.
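A minimal sketch of that idea (fold and fold_cmp are made-up names, and the tiny table covers only the few characters shown above; non-ASCII bytes are written as UTF-8 escapes to avoid source-encoding issues). A comparison like this could then be registered with SQLite through sqlite3_create_collation(), adapted to take explicit lengths rather than NUL-terminated strings:

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Fold a UTF-8 string using a tiny illustrative replacement table.
// A real table would cover the full character repertoire you care about.
std::string fold(const char *s) {
    std::string out;
    for (const char *p = s; *p; ) {
        if (std::strncmp(p, "\xC3\x84", 2) == 0 ||        // Ä -> a
            std::strncmp(p, "\xC3\xA4", 2) == 0) {        // ä -> a
            out += 'a';
            p += 2;
        } else if (std::strncmp(p, "\xC3\x9F", 2) == 0) { // ß -> ss
            out += "ss";
            p += 2;
        } else {                                          // ASCII fallback
            out += static_cast<char>(std::tolower(static_cast<unsigned char>(*p)));
            ++p;
        }
    }
    return out;
}

// Case-insensitive comparison via folding; strcmp-style result.
int fold_cmp(const char *a, const char *b) {
    return fold(a).compare(fold(b));
}
```

Note that folding before comparing (rather than comparing in place) is what lets one source character map to two, as with ß -> ss.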