C++: set of C-strings

I want to create one so that I can check whether a certain word is in the set using set::find.
However, C-strings are pointers, so the set would compare them by the pointer values by default. To function correctly, it would have to dereference them and compare the strings.
I could just pass the constructor a pointer to the strcmp() function as a comparator, but this is not exactly how I want it to work. The word I might want to check could be part of a longer string, and I don't want to create a new string due to performance concerns. If it weren't for the set, I would use strncmp(a1, a2, 3) to check the first 3 letters. In fact, 3 is probably the longest it could go, so I'm fine with having the third argument constant.
Is there a way to construct a set that would compare its elements by calling strncmp()? Code samples would be greatly appreciated.
Here's pseudocode for what I want to do:
bool WordInSet (string, set, length)
{
for (each word in set)
{
if strncmp(string, word, length) == 0
return true;
}
return false;
}
But I'd prefer to implement it using the standard library functions.

You could create a comparator function object.
struct set_object {
    // A set comparator must return "first sorts before second", not strncmp's raw int.
    bool operator()(const char* first, const char* second) const {
        return strncmp(first, second, 3) < 0;
    }
};
std::set<const char*, set_object> c_string_set;
However it would be far easier and more reliable to make a set of std::strings.

Make a wrapper function:
bool myCompare(const char * lhs, const char * rhs)
{
return strncmp(lhs, rhs, 3) < 0;
}
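One way such a wrapper could be plugged into a set, as a minimal sketch: the comparator type is a function pointer, and the function itself is passed to the set's constructor. The names WordSet, make_word_set and starts_with_known_word are hypothetical, and the sketch assumes the stored words are at most three characters long and are string literals (so the pointers never dangle).

```cpp
#include <cstring>
#include <set>

// Strict "less than" over the first three characters only.
bool myCompare(const char* lhs, const char* rhs)
{
    return std::strncmp(lhs, rhs, 3) < 0;
}

// The comparator type is a function pointer; the function itself is
// handed to the set's constructor.
typedef std::set<const char*, bool (*)(const char*, const char*)> WordSet;

// Hypothetical helper: builds a small set of three-letter words.
// String literals have static storage duration, so the pointers stay valid.
WordSet make_word_set()
{
    WordSet words(myCompare);
    words.insert("cat");
    words.insert("dog");
    words.insert("owl");
    return words;
}

// Hypothetical helper: true if the first three characters of text
// match one of the stored words. No temporary string is created.
bool starts_with_known_word(const WordSet& words, const char* text)
{
    return words.find(text) != words.end();
}
```

With this set, starts_with_known_word(words, "catalog") finds "cat" because only the first three characters take part in the comparison.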

Assuming a constant value as a word length looks like asking for trouble to me. I recommend against this solution.
Look: The strcmp solution doesn't work for you because it treats the const char* arguments as nul-terminated strings. You want a function which does exactly the same, but treats the arguments as words - which translates to "anything-not-a-letter"-terminated string.
One could define strcmp in a generic way as:
template<typename EndPredicate>
int generic_strcmp(const char* s1, const char* s2) {
    EndPredicate is_end; // EndPredicate is a type, so an object is needed to call it
    char c1;
    char c2;
    do {
        c1 = *s1++;
        c2 = *s2++;
        if (is_end(c1)) {
            return c1 - c2;
        }
    } while (c1 == c2);
    return c1 - c2;
}
If EndPredicate is a function object type whose call operator returns true iff its argument is equal to \0, then we obtain a regular strcmp which compares 0-terminated strings.
But in order to have a function which compares words, the only required change is the predicate. It's sufficient to use the inverted isalpha function from <cctype> header file to indicate that the string ends when a non-alphabetic character is encountered.
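So in your case, the comparison function would look like this (the set's comparator itself would then return wordcmp(a, b) < 0):
#include <cctype>
int wordcmp(const char* s1, const char* s2) {
    char c1;
    char c2;
    do {
        c1 = *s1++;
        c2 = *s2++;
        if (!isalpha(static_cast<unsigned char>(c1))) {
            // s1's word ended; the words are equal only if s2's word ended too.
            return isalpha(static_cast<unsigned char>(c2)) ? -1 : 0;
        }
    } while (c1 == c2);
    // c1 is a letter here; a non-letter c2 means s2's word is a proper prefix of s1's.
    return isalpha(static_cast<unsigned char>(c2)) ? c1 - c2 : 1;
}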
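A std::set needs a strict "less than" predicate rather than a three-way result. The following is a minimal sketch of such a predicate for words that end at the first non-letter; the names ends_word and WordLess are hypothetical, and it assumes plain ASCII letters in the default "C" locale.

```cpp
#include <cctype>
#include <set>

// Hypothetical helper: true when c terminates a word, i.e. it is
// anything that is not a letter (including '\0').
inline bool ends_word(char c)
{
    return !std::isalpha(static_cast<unsigned char>(c));
}

// Hypothetical comparator: orders C-strings by their leading word only.
struct WordLess {
    bool operator()(const char* a, const char* b) const
    {
        for (;; ++a, ++b) {
            const bool end_a = ends_word(*a);
            const bool end_b = ends_word(*b);
            if (end_a || end_b) {
                // A word that ends first sorts first; if both end here
                // the words are equal, so neither is less than the other.
                return end_a && !end_b;
            }
            if (*a != *b) {
                return static_cast<unsigned char>(*a) < static_cast<unsigned char>(*b);
            }
        }
    }
};
```

A std::set<const char*, WordLess> holding "dog" would then report a match for find("dog, barking"), because the comparison stops at the comma.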

Related

std::set find behavior with char * type

I have below code line:
const char *values[] = { "I", "We", "You", "We"};
std::set<const char*> setValues;
for( int i = 0; i < 3; i++ ) {
const char *val = values[i];
std::set<const char*>::iterator it = setValues.find( val );
if( it == setValues.end() ) {
setValues.insert( val );
}
else {
cout << "Existing value" << endl;
}
}
With this I am trying to insert only non-repeated values into a set, but the code never reaches the branch that prints for an existing element, and the duplicate value gets inserted.
What is wrong here?
By default, std::set<T>::find orders elements with std::less<T>, that is, operator< on the type T.
Your type is const char*. This is a pointer to an address in memory, so the find method just compares the address of the given string with the addresses of all strings in the set. These addresses are different for each string (unless the compiler merges identical literals).
You need to tell std::set how to compare strings correctly. I can see that AnatolyS already wrote how to do it in his answer.
You should define a less-than predicate for const char* and pass it to the set template so that the set works correctly with pointers:
struct cstrless {
bool operator()(const char* a, const char* b) const {
return strcmp(a, b) < 0;
}
};
std::set<const char*, cstrless> setValues;
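As a rough sketch of how the duplicate check could look with this comparator: insert() already reports through the .second member of its return value whether the element was new, so a separate find() is not needed. The helper name count_duplicates is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <set>

struct cstrless {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Hypothetical helper: counts how many of the n strings repeat an
// earlier one, comparing by content rather than by address.
std::size_t count_duplicates(const char* const* values, std::size_t n)
{
    std::set<const char*, cstrless> seen;
    std::size_t duplicates = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // .second is false when an equivalent string was already stored.
        if (!seen.insert(values[i]).second) {
            ++duplicates;
        }
    }
    return duplicates;
}
```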
Unless you use a custom comparison function object, std::set uses std::less<key_type> by default, which compares the pointer values themselves. Two pointers are equal if, and only if, they point to the same object.
Here is an example of three objects:
char a[] = "apple";
char b[] = "apple";
const char (&c)[6] = "apple";
The first two are arrays; the third is an lvalue reference bound to a string literal object, which is also an array. Being separate objects, their addresses are of course different. So, if you were to write:
setValues.insert(a);
bool is_in_map = setValues.find("apple") != setValues.end();
The value of is_in_map would be false, because the set contains only the address of the string in a, and not the address of the string in the literal - even though the contents of the strings are the same.
Solution: Don't use operator< to compare pointers to c strings. Use std::strcmp instead. With std::set, this means using a custom comparison object. However, you aren't done with caveats yet. You must still make sure that the strings stay in memory as long as they are pointed to by the keys in the set. For example, this would be a mistake:
char a[] = "apple";
setValues.insert(a);
return setValues; // oops, we returned setValues outside of the scope
// but it contains a pointer to the string that
// is no longer valid outside of this scope
Solution: Take care of scope, or just use std::string.
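A small sketch of the std::string alternative, assuming a hypothetical helper named make_fruit_set: the set stores copies of the characters, so returning it from the function is safe even though the local buffer is destroyed.

```cpp
#include <set>
#include <string>

// Hypothetical helper: the set owns copies of the characters,
// so nothing in it points into the local array.
std::set<std::string> make_fruit_set()
{
    char a[] = "apple";          // local buffer, destroyed on return
    std::set<std::string> fruits;
    fruits.insert(a);            // copies the characters into a std::string
    fruits.insert("pear");
    return fruits;               // safe: no pointer into a[] is kept
}
```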
(This answer plagiarises my own answer about std::map here)

Map C-style string to int using C++ STL?

Mapping of string to int is working fine.
std::map<std::string, int> // working
But I want to map C-style string to int
For example:
char A[10] = "apple";
map<char*,int> mapp;
mapp[A] = 10;
But when I try to access the value mapped to "apple" I get a garbage value instead of 10. Why doesn't it behave the same as std::string?
map<char*,int> mapp;
The key type here is not a "C string", at least not if we define a C string to be "an array of characters, with null terminator". The key type, which is char*, is a pointer to a character object. The distinction is important. You aren't storing strings in the map. You are storing pointers, and the strings live elsewhere.
Unless you use a custom comparison function object, std::map uses std::less<key_type> by default, which compares the pointer values themselves. Two pointers are equal if, and only if, they point to the same object.
Here is an example of three objects:
char A[] = "apple";
char B[] = "apple";
const char (&C)[6] = "apple";
The first two are arrays; the third is an lvalue reference bound to a string literal object, which is also an array. Being separate objects, their addresses are of course different. So, if you were to write:
mapp[A] = 10;
std::cout << mapp[B];
std::cout << mapp[C];
The output would be 0 for each, because you hadn't initialized mapp[B] nor mapp[C], so they will be value initialized by operator[]. The key values are different, even though each array contains the same characters.
Solution: Don't use operator< to compare pointers to C strings. Use std::strcmp instead. With std::map, this means using a custom comparison object. However, you aren't done with caveats yet. You must still make sure that the strings stay in memory as long as they are pointed to by the keys in the map. For example, this would be a mistake:
char A[] = "apple";
mapp[A] = 10;
return mapp; // oops, we returned mapp outside of the scope
// but it contains a pointer to the string that
// is no longer valid outside of this scope
Solution: Take care of scope, or just use std::string.
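The custom comparison object mentioned above could look roughly like this. It is only a sketch: CStrLess, CStrMap and value_or are hypothetical names, and the caller is still responsible for keeping the key strings alive.

```cpp
#include <cstring>
#include <map>

// Compares the characters the keys point to, not the pointer values.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

typedef std::map<const char*, int, CStrLess> CStrMap;

// Hypothetical helper: returns the mapped value, or fallback when the
// key is absent. It uses find() rather than operator[] so that a failed
// lookup does not insert a new entry.
int value_or(const CStrMap& m, const char* key, int fallback)
{
    CStrMap::const_iterator it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```

With this map, a value stored under one array holding "apple" can be found through any other pointer to the same characters.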
It can be done but you need a smarter version of string:
struct CString {
    CString(const char *str) {
        strncpy(string, str, sizeof(string) - 1); // truncate instead of overflowing
        string[sizeof(string) - 1] = '\0';
    }
    // The implicitly generated copy constructor is enough for a fixed-size array.
    char string[50]; // Or char * if you want to go that way, but you will need
                     // to be careful about memory so you can already see hardships ahead.
    bool operator<(const CString &rhs) const {
        return strcmp(string, rhs.string) < 0;
    }
};
map<CString,int> mapp;
mapp["someString"] = 5;
But as you can likely see, this is a huge hassle. There are probably some things that I have missed or overlooked as well.
You could also use a comparison function:
struct cmpStr{
bool operator()(const char *a, const char *b) const {
return strcmp(a, b) < 0;
}
};
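map<const char *, int, cmpStr> mapp; // the comparator must be named in the map type
char A[5] = "A";
mapp[A] = 5;
But this still requires a lot of external memory management: if A's memory goes away while the map remains, any lookup is undefined behaviour. This is still a nightmare.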
Just use a std::string.

C++ Substitute function

I'm having some problems with the following code:
/* replace c1 with c2 in s, returning s */
char *substitute(char *s, char c1, char c2)
{
char *r = s;
if (s == 0) return 0;
for (; *s; ++s)
if (*s == c1) *s = c2;
return r;
}
void substitute(char c1, char c2);
int main()
{
string s = "apples";
char a;
char b;
cout << "Before swap of Char : " << s << endl;
*substitute(&a, &b);
cout << "After swap of Char : " << s << endl;
system("pause");
}
The code above should replace any occurrences of char1 in the string with char2. I think I have the function down right but calling it is a bit of an issue as the Substitute part in main is showing errors.
My question is how do I continue on from here and call the function in main?
EDIT:
I've read through the answers that have been given but I'm still confused on what to do as I'm a beginner..
EDIT Again:
I've worked it out! :)
If you are using C++11, you might want to use the standard library and the language facilities (std::for_each needs <algorithm>):
std::string input = "apples";
const char from='a';
const char to='b';
std::for_each(input.begin(),input.end(),
[&](char& current) {
if(current==from)
current=to;
});
or even more concise
for (char& current : input) {
if(current==from)
current=to;
}
Here are the issues I see with the code:
substitute() takes three arguments (char*, char, char), and the later prototype declares substitute(char, char). However, you are passing (char*, char*), which matches neither, so the compiler cannot find a function to call (unless another function with that signature exists that is not shown here). This is the reason for the compile-time error.
You are trying to modify a string literal, which could cause a run-time error once the compile-time error is fixed. Note that the string "apples" should not be modified, as it is a string literal. You will need to copy it and then change the copy. The exact behavior of modifying it is undefined, as pointed out by #6502 (see the comments).
Your code is poorly indented (though the edit fixed this issue).
a and b are not initialized and contain 'junk' values.
You're passing two arguments while your function requires three, and the function itself will not work as intended.
Also, on a side note, use cin.get() instead of system("pause");
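Just use the std::replace algorithm from <algorithm> on the string. (The replace method of the string class replaces a range of positions, not every occurrence of a character.)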
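A minimal sketch of the standard-library route, using the std::replace algorithm on a std::string. The helper name substitute_copy is hypothetical; it works on a copy so that the caller's string is left untouched.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of s in which every occurrence
// of from has been replaced by to.
std::string substitute_copy(std::string s, char from, char to)
{
    std::replace(s.begin(), s.end(), from, to);
    return s;
}
```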
As is, you can call the function like this:
char a = 's', b='t';
char s[] = "some string";
substitute(s, a, b);
The second and third arguments are not pointers, so you can just pass a and b; you don't have to pass &a or &b.
Note that since the function modifies the string in the first argument in place, there's really no reason to assign the result to anything. In fact, s = substitute(s, a, b); would not even compile here, because s is an array and arrays cannot be assigned to.
And if you don't have to use your return value, there's really no reason to return it in the first place. You can change your function to this:
/* replace c1 with c2 in s, returning s */
void substitute(char *s, char c1, char c2)
{
if (s == 0) return;
for (; *s; ++s)
if (*s == c1) *s = c2;
}
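Initialize a and b, then call the function as substitute(s, a, b), where the first argument is a modifiable char array (not a std::string) and a and b are passed by value, not by address.
Remove the function prototype void substitute(char c1, char c2); as you don't need it.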

How to force std::map::find() to search by value

From what I have deduced, the std::map::find() method searches the map by comparing pointer addresses instead of values. Example:
std::string aa = "asd";
const char* a = aa.c_str();
const char* b = "asd";
// m_options is a std::map<const char*, int>
m_options.insert( std::make_pair( a, 0 ) );
if( m_options.find( b ) != m_options.end() ) {
// won't reach this place
}
I am kinda surprised (because I am using primitive types instead of some class) and I think that I have done something wrong, if not then how to force it to use value instead of address?
You are using char * as a key type for the map. For pointer types, comparison is performed on the addresses (as the map cannot know that these pointers refer to null-terminated 8-bit strings).
To achieve your goal, you could create the map with a custom compare function, e.g.:
bool MyStringCompare(const char *s1, const char *s2) {
    return strcmp(s1, s2) < 0;
}
...
// The third template argument must be a type, so name the function-pointer
// type there and pass the function itself to the constructor.
std::map<const char*, int, bool (*)(const char*, const char*)> m_options(MyStringCompare);
Or consider using std::string as the key type.
Actually, map uses a strict ordering comparison operator to look for values, not the equality operator. Anyway, you can achieve this by passing a custom functor that compares the values of the strings, or do the right thing and use std::string instead.
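For comparison, a short sketch of the std::string route: with std::string keys the lookup compares characters, whatever pointer the key came from. The names Options and has_option are hypothetical.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> Options;

// Hypothetical helper: looks a key up by its characters. The
// const char* is converted to a temporary std::string for the search.
bool has_option(const Options& options, const char* key)
{
    return options.find(key) != options.end();
}
```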

return char1 + char2? Isn't it possible?

I'm trying to return a string from a function, which basically adds some chars together and returns the string representation.
string toString() {
char c1, c2, c3;
// some code here
return c1 + c2; // Error: invalid conversion from `char' to `const char*'
}
It is possible to return boolean values, as in return c1 == 'x'. Isn't it possible to return string values? I know that it is possible to do it like this:
string result;
result += c1; // note: result.append(c1, c2) would append c2 repeated c1 times
result += c2;
return result;
I'm new to C++ so I thought that there must be more elegant solution around.
No, you can't do that because adding two chars together doesn't give you a string. Both operands are promoted to int and the result is their sum; in this case 'a'+'b' is 195, which, converted back to a char, actually displays as '├' (on Windows with the standard CP_ACP code page). char is an integral type, and the compiler only knows how to add such values arithmetically. Strings are a completely different beast.
You can do it, but you have to be explicit:
return string(1, c1) + string(1, c2)
This will construct two temporary strings, each initialized to one repetition of the character passed as the second parameter. Since operator+ is defined for strings to be a concatenation function, you can now do what you want.
char types in C++ (as well as in C) are integral types. They behave as integral types. Just like when you write 5 + 3 in your code, you expect to get integral 8 as the result (and not string "53"), when you write c1 + c2 in your code above you should expect to get an integral result - the arithmetic sum of c1 and c2.
If you actually want to concatenate two characters to form a string, you have to do it differently. There are many ways to do it. For example, you can form a C-style string
char str[] = { c1, c2, '\0' };
which will be implicitly converted to std::string by
return str;
Or you can build a std::string right away (which can also be done in several different ways).
You can convert each char to a string then use +:
return string(1, c1)+string(1, c2);
Alternately, string has the + operator overload to work with characters, so you can write:
return string(1, c1) + c2;
No matter what method you choose, you will need to convert the integral type char to either a C-style string (char*) or a C++ style string (std::string).
return string(1, c1) + c2;
This constructs a 1-character string, containing c1, then adds (overloaded to concatenate) c2 (creating another string), then returns it.
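The two behaviours can be put side by side in a short sketch. The function names concat and char_sum are hypothetical and only serve the illustration.

```cpp
#include <string>

// Hypothetical helper: builds the two-character string "c1c2".
// string(1, c1) is a one-character string; operator+ then appends c2.
std::string concat(char c1, char c2)
{
    return std::string(1, c1) + c2;
}

// Adding two chars promotes both to int and sums the character codes;
// no string is involved.
int char_sum(char c1, char c2)
{
    return c1 + c2;
}
```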
No, that just adds up the character codes. You need to convert them to strings.
You need to create a string from the chars.
And then return the string (actually a copy of the string)