Map C-style string to int using C++ STL? - c++

Mapping of string to int is working fine.
std::map<std::string, int> // working
But I want to map C-style string to int
For example:
char A[10] = "apple";
map<char*,int> mapp;
mapp[A] = 10;
But when I try to access the value mapped to "apple", I get a garbage value instead of 10. Why doesn't it behave the same as std::string?

map<char*,int> mapp;
The key type here is not "C string". At least not if we define a C string to be "an array of characters with a null terminator". The key type, which is char*, is a pointer to a character object. The distinction is important. You aren't storing strings in the map. You are storing pointers, and the strings live elsewhere.
Unless you use a custom comparison function object, std::map orders its keys with std::less<key_type>, which for pointers compares the addresses. Two pointer keys are therefore equivalent if, and only if, they point to the same object.
Here is an example of three objects:
char A[] = "apple";
char B[] = "apple";
const char (&C)[6] = "apple";
The first two are arrays; the third is an lvalue reference bound to a string literal object, which is also an array. Being separate objects, their addresses are of course different. So, if you were to write:
mapp[A] = 10;
std::cout << mapp[B];
std::cout << mapp[C];
The output would be 0 for each, because you hadn't initialized mapp[B] nor mapp[C], so they will be value initialized by operator[]. The key values are different, even though each array contains the same characters.
Solution: Don't use operator< to compare pointers to C strings. Use std::strcmp instead. With std::map, this means using a custom comparison object. However, you aren't done with caveats yet. You must still make sure that the strings stay in memory for as long as they are pointed to by the keys in the map. For example, this would be a mistake:
char A[] = "apple";
mapp[A] = 10;
return mapp; // oops, we returned mapp outside of the scope
// but it contains a pointer to the string that
// is no longer valid outside of this scope
Solution: Take care of scope, or just use std::string.
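As a minimal sketch of the comparator approach described above: the names CStrLess and lookup_through_other_array are made up for illustration, and the arrays are kept in the same scope as the map so the keys stay valid.

```cpp
#include <cstring>
#include <map>

// Hypothetical comparator: orders keys by the characters they point to,
// not by the pointer values themselves.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Inserts through one array and looks up through a different array
// that holds the same characters.
int lookup_through_other_array() {
    char A[] = "apple";
    char B[] = "apple";  // separate object, same contents

    // Declared after the arrays, so the map is destroyed before its keys.
    std::map<const char*, int, CStrLess> mapp;
    mapp[A] = 10;
    return mapp[B];      // compares contents, so this finds the entry for A
}
```

With the default comparison the same lookup would create a new, value-initialized entry for B instead.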

It can be done but you need a smarter version of string:
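struct CString {
CString(const char *str) {
strncpy(string, str, sizeof(string) - 1); // Copy at most 49 characters.
string[sizeof(string) - 1] = '\0'; // strncpy does not terminate on truncation.
}
// The implicit copy constructor is enough while the member is a fixed-size array.
char string[50]; // Or char * if you want to go that way, but you will need
// to be careful about memory so you can already see hardships ahead.
bool operator<(const CString &rhs) const { // const, so std::map can call it on its keys
return strcmp(string, rhs.string) < 0;
}
};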
map<CString,int> mapp;
mapp["someString"] = 5;
But as you can likely see, this is a huge hassle. There are probably some things that I have missed or overlooked as well.
You could also use a comparison function:
struct cmpStr{
bool operator()(const char *a, const char *b) const {
return strcmp(a, b) < 0;
}
};
map<const char *, int, cmpStr> mapp; // pass the comparator as the third template argument
char A[5] = "A";
mapp[A] = 5;
But there is a lot of external memory management: if A's memory goes away while the map remains, you get undefined behaviour. This is still a nightmare.
Just use a std::string.

Related

How does the '->' operator work and is it a good implementation to modify a large string?

I want to begin with saying that I have worked with pointers before and I assumed I understood how they worked. As in,
int x = 5;
int *y = &x;
*y = 3;
std::cout << x; // Would output 3
But then I wanted to write a function that modifies a rather large string, and I believe it would therefore be better to pass a pointer to the string in order to avoid passing the entire string back and forth. So I pass the address of my string to myFunc() and do the same thing as I did with the numbers above, which means I can modify *str as in the code below. But in order to call std::string member functions I need to use the -> operator.
#include <iostream>
#include <string>
int myFunc(std::string *str) { // Retrieve the address to which str will point to.
*str = "String from myFunc"; // This is how I would normally change the value of myString
str->replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
return 0;
}
int main() {
std::string myString = "String from main";
myFunc(&myString); // Pass address of myString to myFunc()
}
My questions are:
Since str in myFunc is an address, why can an address use an operator such as ->, and how does it work? Is it as simple as calling the method of the object at the address str? str->replace(); // str->myString.replace()?
Is this a good way to modify a large string, or would it be better to pass the string to the function and return it once it's modified?
ptr->x is identical to (*ptr).x unless -> is overloaded for the type you're dereferencing. On normal pointers, it works as you'd expect.
As for implementation, profile it when you implement it. You can't know what the compiler will do with this once you turn optimizations on. For example, if a given function gets inlined, you won't even have any extra indirection in the first place and it won't matter which way you do it. As long as you don't allocate a new string, differences should generally be negligible.
str is a pointer to a std::string object. The arrow operator, ->, dereferences the pointer and then accesses a member of the object. Alternatively, you can write (*str).replace(0,1,"s"); here, * dereferences the pointer and then . accesses the member function replace().
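To make the equivalence concrete, here is a small sketch; the function names size_via_arrow and size_via_deref are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Calls size() on the pointed-to string using the arrow operator.
std::size_t size_via_arrow(const std::string* str) {
    return str->size();
}

// The same call spelled out: dereference first, then member access.
std::size_t size_via_deref(const std::string* str) {
    return (*str).size();
}
```

For a plain pointer both functions do exactly the same thing; the arrow is just shorthand.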
Pointers are often confusing; it is better to use references when possible.
void myFunc(std::string &str) { // str is a reference to the caller's string.
str = "String from myFunc"; // This is how I would normally change the value of myString
str.replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
}
int main() {
std::string myString = "String from main";
myFunc(myString); // Pass myString by reference; no address-of operator needed.
}
Is this a good way to modify a large string, or would it be better to pass the string to the function and return it once it's modified?
If you don't want to change the original string then create a new string and return it.
If it's ok for your application to modify the original string then do it. Also you can return a reference to a modified string if you need to chain function calls.
std::string& myFunc(std::string &str) { // str is a reference to the caller's string.
str = "String from myFunc"; // This is how I would normally change the value of myString
return str.replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
}
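The two options mentioned above (modify the original, or return a new string) could look roughly like this. The function names are hypothetical, and the empty-string check is an addition of this sketch.

```cpp
#include <string>

// Modifies the caller's string in place; no new string is created.
void replace_first_in_place(std::string& str) {
    if (!str.empty()) {
        str.replace(0, 1, "s");
    }
}

// Leaves the argument untouched and returns a modified copy.
std::string replace_first_copy(const std::string& str) {
    std::string result = str;  // the one extra allocation and copy happens here
    replace_first_in_place(result);
    return result;
}
```

The copying version costs one extra string; whether that matters for a large string is something to measure rather than assume.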

Convert to std::string and get const char * in one line

I have a number that I need to convert to a const char * (an API I'm using requires const char * as input to many of its functions). The following works:
int num = 5;
std::string s = std::to_string(num);
const char * p = s.c_str();
as suggested by answers like those in how to convert from int to char*?, but it involves creating the seemingly unnecessary variable s, so I tried the following, but it doesn't work (p points to an empty string afterwards):
int num = 5;
const char * p = std::to_string(num).c_str();
Is there a clean way I can accomplish this? Why doesn't the second example work? The behavior is very similar to what happens if I made this obvious mistake:
const char * p;
{
std::string tempStr( "hi" );
p = tempStr.c_str( );
// p points to "hi" string.
}
// now p points to "" string.
Which makes me suspect that the issue is that std::to_string(num) immediately goes out of scope, or something similar, because it's not used to directly initialize anything.
std::string encapsulates managing dynamic memory (created with new[] and delete[]). Let's break it down.
const char * p = std::to_string(num).c_str();
Create a std::string (with a human-readable representation of num).
Get the new[]ly allocated const char* to the string.
Assign that value to p.
Destroy the std::string → delete[] the allocated const char*.
p points to... deallocated data
If you are using a pointer, the data that the pointer points to must exist throughout the lifetime of that pointer.
So, no, there is no way around this other than new[]ing a copy of the string, which you will have to explicitly delete[] later. And at that point, you've thrown the baby out with the bath and have no need to use std::string.
Create a string that lives at least as long as you want to refer to its internal data.
Just use std::string; it does everything you want, and everything you would otherwise have to do manually.
When you need to pass a const char* to a const char* function simply use std::string::c_str() like this:
some_api_function(mystring.c_str()); // passes a const char*
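A sketch of why the direct call is safe: a temporary std::string lives until the end of the full expression it appears in, so it outlasts a function call made in that same expression. The body of some_api_function below is a stand-in invented for this example, since the real API is not shown.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Placeholder for the C-style API; it just measures the text it receives.
std::size_t some_api_function(const char* text) {
    return std::strlen(text);
}

std::size_t call_with_named_string(int num) {
    std::string s = std::to_string(num);  // s lives until the function returns
    return some_api_function(s.c_str());  // pointer is valid during the call
}

std::size_t call_with_temporary(int num) {
    // The temporary is destroyed only after some_api_function has returned.
    return some_api_function(std::to_string(num).c_str());
}
```

The pointer only becomes dangling when it is stored and used after that expression has ended, as in `const char * p = std::to_string(num).c_str();`.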
What you need is a function which returns a char* which holds your value and can be used to manage its lifetime. The problematic version is broken because the char* points to memory which it does not manage.
For example:
std::unique_ptr<char[]> str(int32_t x)
{
std::unique_ptr<char[]> res(new char[12]);
snprintf(res.get(), 12, "%d", x);
return res;
}
Use std::string everywhere and don't use const char* when it is not necessary. For most purposes std::string can do the same job, and c_str() gives you a const char* when an API needs one. I use const char* only when I'm using a file path.
Use std::string everywhere and your program should work.

std::set find behavior with char * type

I have the code below:
const char *values[] = { "I", "We", "You", "We"};
std::set<const char*> setValues;
for( int i = 0; i < 4; i++ ) { // visit all four entries, including the repeated "We"
const char *val = values[i];
std::set<const char*>::iterator it = setValues.find( val );
if( it == setValues.end() ) {
setValues.insert( val );
}
else {
cout << "Existing value" << endl;
}
}
With this I am trying to insert only non-repeated values into a set, but somehow the code never reaches the branch that prints for an existing element, and the duplicate value gets inserted.
What is wrong here?
std::set<T>::find uses the default ordering of the type T, which is std::less<T>.
Your type is const char*. This is a pointer, so the find method just compares the address of the given string with the addresses of all strings in the set. These addresses are different for each string (unless the compiler merges identical literals into one object).
You need to tell std::set how to compare strings correctly. I can see that AnatolyS already wrote how to do it in his answer.
You should define a less-than predicate for const char* and pass it as a template argument so that the set works correctly with pointers:
struct cstrless {
bool operator()(const char* a, const char* b) const {
return strcmp(a, b) < 0;
}
};
std::set<const char*, cstrless> setValues;
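For illustration, here is how the predicate might be used to count distinct strings. The helper count_distinct is hypothetical, and the predicate is repeated so the sketch is self-contained.

```cpp
#include <cstddef>
#include <cstring>
#include <set>

struct cstrless {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// Counts how many different strings appear in values[0..n).
// The pointed-to strings must stay alive while the set is in use.
std::size_t count_distinct(const char* const* values, std::size_t n) {
    std::set<const char*, cstrless> seen;
    for (std::size_t i = 0; i < n; ++i) {
        seen.insert(values[i]);  // insert ignores keys that compare equivalent
    }
    return seen.size();
}
```

Because insert already skips equivalent keys, the separate find call from the question is not needed just to avoid duplicates.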
Unless you use a custom comparison function object, std::set orders its keys with std::less<key_type>, which for pointers compares the addresses. Two pointer keys are therefore equivalent if, and only if, they point to the same object.
Here is an example of three objects:
char a[] = "apple";
char b[] = "apple";
const char (&c)[6] = "apple";
The first two are arrays; the third is an lvalue reference bound to a string literal object, which is also an array. Being separate objects, their addresses are of course different. So, if you were to write:
setValues.insert(a);
bool is_in_map = setValues.find("apple") != setValues.end();
The value of is_in_map would be false, because the set contains only the address of the string in a, and not the address of the string in the literal, even though the contents of the strings are the same.
Solution: Don't use operator< to compare pointers to c strings. Use std::strcmp instead. With std::set, this means using a custom comparison object. However, you aren't done with caveats yet. You must still make sure that the strings stay in memory as long as they are pointed to by the keys in the set. For example, this would be a mistake:
char a[] = "apple";
setValues.insert(a);
return setValues; // oops, we returned setValues outside of the scope
// but it contains a pointer to the string that
// is no longer valid outside of this scope
Solution: Take care of scope, or just use std::string.
(This answer plagiarises my own answer about std::map here)

Character pointer access

I want to access the ith element through a character pointer. Below is the sample code:
string a_value = "abcd";
char *char_p=const_cast<char *>(a_value.c_str());
if(char_p[2] == 'b') //Is this safe to use across all platforms?
{
//do something
}
Thanks in advance
Array accessors [] are allowed for pointer types, and result in defined and predictable behaviors if the offset inside [] refers to valid memory.
const char* ptr = str.c_str();
if (ptr[2] == '2') {
...
}
Is correct on all platforms if the length of str is 3 characters or more.
In general, if you are not mutating the char* you are looking at, it is best to avoid a const_cast and work with a const char*. Also note that std::string provides operator[], which means that you do not need to call .c_str() on str to be able to index into it and look at a char. This will similarly be correct on all platforms if the length of str is 3 characters or more. If you do not know the length of the string in advance, use std::string::at(size_t pos), which performs bounds checking and throws an out_of_range exception if the check fails.
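A brief sketch of the bounds-checked access mentioned above; the helper char_at_or and its fallback parameter are made up for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the character at pos, or fallback when pos is past the end.
char char_at_or(const std::string& s, std::size_t pos, char fallback) {
    try {
        return s.at(pos);  // throws std::out_of_range if pos >= s.size()
    } catch (const std::out_of_range&) {
        return fallback;
    }
}
```

Unlike at(), operator[] performs no check, which is why it is only appropriate when the length is already known.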
You can access the ith element in a std::string using its operator[]() like this:
std::string a_value = "abcd";
if (a_value[2] == 'b')
{
// do stuff
}
If you use a C++11 conformant std::string implementation you can also use:
std::string a_value = "abcd";
char const * p = &a_value[0];
// or char const * p = a_value.data();
// or char const * p = a_value.c_str();
// or char * p = &a_value[0];
21.4.1/5
The char-like objects in a basic_string object shall be stored contiguously.
21.4.7.1/1: c_str() / data()
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
The question is essentially about querying characters in a string safely.
const char* a = a_value.c_str();
is safe unless some other operation modifies the string after it. If you can guarantee that no other code performs a modification prior to using a, then you have safely retrieved a pointer to a null-terminated string of characters.
char* a = const_cast<char *>(a_value.c_str());
is never safe. You have yielded a pointer to memory that is writeable. However, that memory was never designed to be written to. There is no guarantee that writing to that memory will actually modify the string (and actually no guarantee that it won't cause a core dump). It's undefined behaviour - absolutely unsafe.
reference here: http://en.cppreference.com/w/cpp/string/basic_string/c_str
addressing a[2] is safe provided you can prove that all possible code paths ensure that a represents a pointer to memory longer than 2 chars.
If you want safety, use either:
auto ch = a_string.at(2); // will throw an exception if a_string is too short.
or
if (a_string.length() > 2) {
auto ch = a_string[2];
}
else {
// do something else
}
Everyone explained very well for most how it's safe, but i'd like to extend a bit if that's ok.
Since you're in C++, and you're using a string, you can simply do the following to access a character (you won't have any trouble, and you still won't have to deal with C strings in C++):
std::string a_value = "abcd";
std::cout << a_value.at(2);
Which is in my opinion a better option rather than going out of the way.
string::at will return a char& or a const char& depending on whether your string object is const. (In this case a_value is not const, so it returns a char&.)
In this case you can treat the char* as an array of chars (a C string). Indexing with square brackets is allowed.

std::map.count using c-strings does not work?

I wish to use C strings instead of std::string for performance reasons. I have the following code:
std::map<const char*, int> myMap;
.
.
.
myMap.insert(std::pair<const char*, int>(str.c_str(), myint));
std::cout << myMap.count(str.c_str()) << std::endl;
Strangely enough the value I just entered returns 0 for count()?
By default, std::map uses std::less to compare the keys (which is the same as <, really, except it's guaranteed to work on unrelated pointers too). Which means it just does pointer comparison, definitely not what you want.
Just use the C++ string type (std::string) instead of a legacy type used for nul-terminated strings (const char*) and you'll be fine.
Why do you think using raw C strings will increase performance?
Anyway, std::map has no special treatment for char pointers. It treats them like any other kind of pointer and not like strings, which means that it simply compares the keys with std::less. Perhaps confusingly, this is different from the behaviour of C++ streams, which do behave in a special way when passed a char const *.
You'd get the same behaviour with something like std::map<double *, int>, std::map<long *, int> or std::map<MyClass *, int>. It's interesting to note that the pointer comparison works because std::less is guaranteed to work with pointers, even though pointer comparison with < is formally unspecified behaviour.
So, you are obviously not interested in comparing the pointer values directly. If you want lexicographical string comparison, you can specify the comparison for your map via the third template parameter:
std::map<char const *, int, RawPointerComparison>
What I called RawPointerComparison in this example must be a functor taking two pointers and returning whether the first is less than the second. You can use the strcmp C function for that. This should do the trick:
struct RawPointerComparison
{
bool operator()(char const *lhs, char const *rhs) const
{
return strcmp(lhs, rhs) < 0;
}
};
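Used with count(), the functor might behave as in the sketch below. The function count_through_other_buffer is hypothetical; the key is a string literal so that it stays valid for as long as the map does.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

struct RawPointerComparison {
    bool operator()(char const* lhs, char const* rhs) const {
        return std::strcmp(lhs, rhs) < 0;
    }
};

// Stores a key that points at a literal, then queries through the
// buffer of an unrelated std::string.
std::size_t count_through_other_buffer(const std::string& query) {
    std::map<char const*, int, RawPointerComparison> myMap;
    myMap.emplace("first", 1);          // literals have static storage duration
    return myMap.count(query.c_str());  // keys are compared by content
}
```

The addresses differ, but the contents match, so the key is found.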
It seems that you use variable str to enter different strings in the map. For example
str = "first";
myMap.insert( { str.c_str(), 1 } );
str = "second";
myMap.insert( { str.c_str(), 2 } );
str = "first";
std::cout << myMap.count(str.c_str()) << std::endl;
In this case the first str.c_str() is not equal to the last str.c_str() (where you compare pointers to allocated strings) because different memory regions were allocated in these cases.
If you would do the following
str = "first";
myMap.insert( { str.c_str(), 1 } );
std::cout << myMap.count(str.c_str()) << std::endl;
without intermediate statements then the result would be the output 1.
It seems that you are doing what you do not want.:)
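For comparison, here is the same sequence of operations with std::string keys, as a sketch (count_after_reuse is a name made up here). The map stores its own copy of each key, so reusing str between insertions is harmless.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Repeats the insert / reassign / count sequence from above,
// but with keys that own their characters.
std::size_t count_after_reuse() {
    std::map<std::string, int> myMap;
    std::string str = "first";
    myMap.emplace(str, 1);   // the map copies the characters
    str = "second";
    myMap.emplace(str, 2);
    str = "first";
    return myMap.count(str); // compares contents, finds the first entry
}
```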