Substring of an element in a set - c++

Is there a way to find and replace a substring of a char*/string stored in a set?
Example:
std::set<char*> myset;
myset.insert("catt");
myset.insert("world");
myset.insert("hello");
it = myset.subsetfind("tt");
myset.replace(it, "t");

There are at least three reasons why this won't work.
std::set only provides the means to search the set for a value that compares equal to the value being searched for, not for a value that matches some arbitrary portion of it.
The shown program is ill-formed (and modifying a string literal would be undefined behavior in any case). A string literal such as "hello" has type const char[6], which decays to const char *, not char *. No self-respecting C++ compiler will allow you to insert a const char * into a container of char *s. And you can't modify const values, by definition, anyway.
Values in std::set cannot be modified. To effect the modification of an existing value in a set, it must be erase()d, then the new value insert()ed.
std::set is simply not the right container for the goals you're trying to accomplish.

No, you can't (or at least shouldn't) modify the key while it's in the set. Doing so could change the relative order of the elements, in which case the modification would render the set invalid.
You need to start with a set of things you can modify. Then you need to search for the item, remove it from the set, modify it, then re-insert the result back into the set.
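#include <algorithm>
#include <set>
#include <string>

int main()
{
    std::set<std::string> myset {"catt", "world", "hello"};
    auto pos = std::find_if(myset.begin(), myset.end(),
        [](auto const &s) { return s.find("tt") != std::string::npos; });
    if (pos != myset.end()) {
        auto temp = *pos;          // copy the element out of the set
        myset.erase(pos);          // elements can't be modified in place
        auto p = temp.find("tt");
        temp.replace(p, 2, "t");   // "catt" -> "cat"
        myset.insert(temp);        // re-insert the modified copy
    }
}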

You cannot modify elements within a set.
You can find strings that contain the substring using std::find_if. Once you find matching elements, you can remove each from the set and add a modified copy of the string, with the substring replaced with something else.
PS. Remember that you cannot modify string literals. You will need to allocate some memory for the strings.
PPS. Implicit conversion of string literal to char* has been deprecated since C++ was standardized, and since C++11 such conversion is ill-formed.
PPPS. The default comparator compares the pointer values (addresses), not the strings they point to, so it will not be correct when you use pointers as the element type. I recommend using std::string instead. (A strcmp-based comparator would also be possible, although much more prone to memory bugs.)
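A minimal sketch of the approach described above, applied to every matching element rather than just the first. The helper name replace_substring_in_set is hypothetical; it assumes std::string elements (as recommended in the PPPS) and that only the first occurrence of the substring in each element should be replaced. Matches are collected before anything is modified, so the scan never revisits a string it has just re-inserted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: in every element of `s` that contains `from`,
// replaces the first occurrence of `from` with `to`.
// Returns the number of elements that were rewritten.
std::size_t replace_substring_in_set(std::set<std::string>& s,
                                     const std::string& from,
                                     const std::string& to)
{
    // 1. Collect the matching elements using a predicate (the find_if idea,
    //    applied to all elements via copy_if).
    std::vector<std::string> matches;
    std::copy_if(s.begin(), s.end(), std::back_inserter(matches),
                 [&from](const std::string& str) {
                     return str.find(from) != std::string::npos;
                 });

    // 2. Set elements are immutable: erase each match, then insert a
    //    modified copy. If the result equals an existing element, the set
    //    keeps only one of them.
    for (std::string str : matches) {   // copy, because we modify it
        s.erase(str);
        str.replace(str.find(from), from.size(), to);
        s.insert(str);
    }
    return matches.size();
}
```

Calling replace_substring_in_set(myset, "tt", "t") on {"catt", "world", "hello"} leaves {"cat", "hello", "world"}.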

You could use std::find_if with a predicate function/functor/lambda that searches for the substring you want.

Related

Redefine data area for faster access

New to c++. I've searched but probably using wrong terms.
I want to find which slot, in an array of many slots, holds a given literal value a few bytes long. Currently I check each slot sequentially.
If I could use a built-in function to scan the whole array as if it were one big string, I feel this would be much faster. (Old COBOL programmer.)
Any way I can do this please?
OK, I'm going to take a punt and infer that:
you want to store string literals of any length in some kind of container.
the container must be mutable (i.e. you can add literals at will)
there will not be duplicates in the container.
you want to know whether a string literal has been stored in the container previously, and what "position" it was at so that you can remove it if necessary.
the string literals will be inserted in random lexicographical order and need not be sorted.
The container that springs to mind is std::unordered_set:
#include <cassert>
#include <string>
#include <unordered_set>

std::unordered_set<std::string> tokens;

int main()
{
    tokens.emplace("foo");
    tokens.emplace("bar");
    auto it = tokens.find("baz");
    assert(it == tokens.end()); // not found
    it = tokens.find("bar");    // will be found
    assert(it != tokens.end());
    tokens.erase(it);           // remove the token
}
The average search time complexity of this container is O(1) (linear in the worst case).
As you have already found out from the comments, "scanning as one big string" is not the way to go in C++.
When using C-style arrays, the typical C++ approach, and normally fast enough, is a linear search:
// needs <algorithm>, <cstring> and <iterator>
const char* arr[] = { "first", "result", "last" };   // example contents (assumed)
auto myStr = "result";
auto it = std::find_if(std::begin(arr), std::end(arr),
    [myStr](const char* const str) { return std::strcmp(myStr, str) == 0; });
Remember that string comparison functions stop at the first mismatching character.
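If you also need the slot number rather than just an iterator, std::distance turns the iterator returned by std::find_if back into an index. A sketch, assuming the slots are a C array of const char* pointers; find_slot is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iterator>

// Hypothetical helper: returns the slot (index) in `arr` whose C string equals
// `key`, or -1 if no slot matches. Same linear find_if search as above.
template <std::size_t N>
long find_slot(const char* const (&arr)[N], const char* key)
{
    auto it = std::find_if(std::begin(arr), std::end(arr),
        [key](const char* const str) { return std::strcmp(key, str) == 0; });
    if (it == std::end(arr))
        return -1;
    // Convert the iterator back into a slot number.
    return static_cast<long>(std::distance(std::begin(arr), it));
}
```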
More C++ style:
std::vector<std::string> vec = { "res1", "res2", "res3" };
std::string myStr = "res2";
auto it = std::find(vec.begin(), vec.end(), myStr);
If you are interested in very fast lookup in a large container, std::unordered_set is the way to go, although the "slot" then loses its meaning; if you still need it, std::unordered_map can be used to map each string to its slot.
std::unordered_set<std::string> s= { "res1", "res2", "res3" };
std::string myStr = "res2";
auto it = s.find(myStr);
All code is written as an example, not compiled/tested.
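A sketch of the std::unordered_map idea, assuming the data lives in a std::vector<std::string> like vec above and the strings are unique; build_slot_index is a hypothetical helper. It trades memory for average O(1) lookups, and the index has to be rebuilt (or updated) whenever the vector changes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each string to its slot (index) in `vec`.
// If a string appears twice, the first slot is kept, because emplace
// does not overwrite an existing key.
std::unordered_map<std::string, std::size_t>
build_slot_index(const std::vector<std::string>& vec)
{
    std::unordered_map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < vec.size(); ++i)
        index.emplace(vec[i], i);
    return index;
}
```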

Applying c++ "lower_bound" on an array of char strings

I am trying the lower_bound function in C++.
I have used it multiple times with 1-D data types.
Now, I am trying it on a sorted array dict[5000][20] to find strings of size <=20.
The string to be matched is in str.
bool recurseSerialNum(char *name, int s, int l, char (*keypad)[3], string str, char (*dict)[20], int dictlen)
{
    char (*idx)[20] = lower_bound(&dict[0], &dict[0] + dictlen, str.c_str());
    int tmp = idx - dict;
    if (tmp != dictlen)
        printf("%s\n", *idx);
}
As per http://www.cplusplus.com/reference/algorithm/lower_bound/?kw=lower_bound , this function is supposed to return last (one past the end) in case no match is found, i.e. tmp should equal dictlen.
In my case, it always returns the beginning, i.e. tmp is 0 both (1) when passed a string that is in dict and (2) when passed a string that is not in dict.
I think the issue is in how the pointer is handled and passed. The default comparator should be available here just as it is for a vector. I also tried passing an explicit one, to no avail.
I tried this comparator:
bool compStr(const char *a, const char *b){
return strcmp(a,b)<0;
}
I know the alternative is to use vector etc., but I would like to know what the issue is in this case.
I searched for this on Google as well as SO, but did not find anything similar.
There are two misunderstandings here, I believe.
std::lower_bound does not check if an element is part of a sorted range. Instead it finds the leftmost place where an element could be inserted without breaking the ordering.
You're not comparing the contents of the strings but their memory addresses.
It is true that dict in your case is a sorted range, in the sense that the memory addresses of the inner arrays are ascending. Where str.c_str() lies in relation to these addresses is, of course, unspecified. In practice, if dict is a stack object, you will often find that the memory range of the heap (where str.c_str() often lies, although with the short-string optimization a short string is stored inside the std::string object itself) is below that of the stack, in which case lower_bound quite correctly tells you that if you wanted to insert this address into the sorted range of addresses as which you interpret dict, you'd have to do so at the beginning.
For a solution, since there is an operator<(char const *, std::string const &), you could simply write
char (*idx)[20] = lower_bound(&dict[0], &dict[0] + dictlen, str);
...but are you perhaps really looking for std::find?
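For completeness, here is a sketch of the lookup with a comparator that compares row contents with strcmp, plus the equality check that lower_bound itself does not perform. find_in_dict is a hypothetical helper; it assumes, like the question, a table of char[20] rows that are null-terminated and sorted in strcmp order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical helper: returns the index of `key` in the sorted table `dict`
// (dictlen rows of char[20]), or -1 if it is not present.
int find_in_dict(const char (*dict)[20], int dictlen, const char* key)
{
    const char (*first)[20] = dict;
    const char (*last)[20] = dict + dictlen;
    // Compare the contents of each row with strcmp, not their addresses.
    const char (*idx)[20] = std::lower_bound(first, last, key,
        [](const char (&row)[20], const char* k) { return std::strcmp(row, k) < 0; });
    // lower_bound only yields the insertion point; check for an exact match.
    if (idx != last && std::strcmp(*idx, key) == 0)
        return static_cast<int>(idx - dict);
    return -1;
}
```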

C string map key

Is there any issue with using a C string as a map key?
std::map<const char*, int> imap;
The order of the elements in the map doesn't matter, so it's okay if they are ordered using std::less<const char*>.
I'm using Visual Studio and according to MSDN (Microsoft specific):
In some cases, identical string literals can be "pooled" to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal.
It says that they are only pooled in some cases, so it seems like accessing the map elements using a string literal would be a bad idea:
//Could these be referring to different map elements?
int i = imap["hello"];
int j = imap["hello"];
Is it possible to overload operator== for const char* so that the actual C string and not the pointer values would be used to determine if the map elements are equal:
bool operator==(const char* a, const char* b)
{
return strcmp(a, b) == 0 ? true : false;
}
Is it ever a good idea to use a C string as a map key?
Is it possible to overload operator== for const char* so that the actual C string and not the pointer values would be used to determine if the map elements are equal
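No, it's not possible: an overloaded operator must have at least one parameter of class or enumeration type, so operator== cannot be overloaded for two const char* arguments. And no, using a C string as a map key is not a good idea, for exactly the reason pointed out in the question and because you don't need char*: you can use a std::string instead. (You can provide a custom compare function, as pointed out by simonc, but I'd advise against it.)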
//Could these be referring to different map elements?
int i = imap["hello"];
int j = imap["hello"];
Yes, and they can even refer to elements that don't exist yet, but they'll be created by operator[] and be value initialized. The same issue exists with assignment:
imap["hello"] = 0;
imap["hello"] = 1;
The map could now have 1 or 2 elements.
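A sketch of the std::string alternative: because the keys compare by contents, every "hello" refers to the same element whether or not the compiler pools identical literals. Using std::less<> as the comparator (C++14) makes it transparent, so find() accepts a const char* without constructing a temporary std::string. StringMap and lookup_or_minus_one are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// std::string keys compare by contents; std::less<> enables heterogeneous
// lookup, so find() can take a const char* directly.
using StringMap = std::map<std::string, int, std::less<>>;

// Hypothetical helper: returns the mapped value, or -1 if key is absent.
int lookup_or_minus_one(const StringMap& m, const char* key)
{
    auto it = m.find(key);
    return it == m.end() ? -1 : it->second;
}
```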
You can provide the map with a custom comparator which compares the C strings:
#include <cstring>
#include <map>

struct CstrCmp {
    bool operator()(const char* a, const char* b) const
    {
        return std::strcmp(a, b) < 0;   // order by contents, not by address
    }
};
std::map<const char*, YourType, CstrCmp> imap;
First, in order to introduce ordering over map keys you need to define a "less-than" comparison. A map says that two elements are "equivalent" if neither is less than the other. It's a bad idea to use char* for map keys because you will need to do memory management somewhere outside the map.
In most realistic scenarios when you query a map your keys will not be literals.
On the other hand, if you maintain a pool of string literals yourself and assign an ID to every literal you could use those IDs for map keys.
To summarize, I wouldn't rely on Microsoft saying "In some cases literals may be pooled". If you fill the map with literals and query the map with literals as keys, you might as well use an enum for keys.
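A minimal sketch of the pool-and-ID idea mentioned above, assuming C++17 for try_emplace; StringPool and id_of are hypothetical names. Each distinct string gets the next small integer, and equal strings always get the same ID, so the IDs can then be used as ordinary map keys.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical interning pool: maps each distinct string to a small integer ID.
class StringPool {
public:
    int id_of(const std::string& s)
    {
        // The candidate ID (current pool size) is computed before the call;
        // try_emplace only inserts it if s is not already in the pool.
        return ids_.try_emplace(s, static_cast<int>(ids_.size())).first->second;
    }

private:
    std::unordered_map<std::string, int> ids_;
};
```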

Is pointer arithmetic possible with C++ string class?

After programming a little in C, I decided to jump right into C++. At first I was pleased with the presence of the string class and being able to treat strings as whole units instead of arrays of characters. But I soon found that C-style strings had the advantage of letting the program move through them character by character, using pointer arithmetic, and carry out a desired logical operation.
I have now found myself in a situation that requires this, but the compiler tells me it is unable to convert from type string to a C-style string. So I was wondering: is there a way to use pointer arithmetic to reference single characters, or to pass arguments to a function as the address of the first character, while still using the string class and without having to create arrays of characters? Or do I just want to have my cake and eat it too?
String characters can be accessed by index, through pointers, and through the use of iterators.
If you wanted to use iterators, you could write a function that checks whether a string has a space in it or not:
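#include <cctype>
#include <string>
using std::string;

bool spacecheck(const string& s)
{
    string::const_iterator iter = s.begin();
    while (iter != s.end()) {
        // cast to unsigned char: passing a negative char to isspace is undefined
        if (std::isspace(static_cast<unsigned char>(*iter)))
            return true;
        else
            ++iter;
    }
    return false;   // reached end() without finding whitespace
}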
At the beginning of the function, I initialized an iterator to the beginning of the string s using the .begin() function, which in this case returns an iterator to the first character of the string. The condition of the while loop is iter != s.end(); end() returns an iterator referring to the position just past the last character of the string. In the body, *iter, the character iter refers to, is passed to isspace(), which checks whether a character is whitespace. If it is not, iter is incremented, which makes it refer to the next character of the string. If the loop reaches s.end() without finding any whitespace, the string contains no spaces and the function should return false.
I am learning C++ myself, and writing all of this out has helped my own understanding. I hope I did not offend you if this all seemed very simple to you; I was just trying to be concise.
I am currently learning from Accelerated C++, and I could not recommend it highly enough!
You can use &your_string[0] to get a pointer to the initial character in the string. You can also use your_string.begin() to get an iterator into the string that you can treat almost like a pointer (dereference it, do arithmetic on it, etc.)
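A sketch of both things the question asks for: pointer arithmetic over a std::string, and passing the address of the first character to a C-style function. It relies on std::string storing its characters contiguously, which is guaranteed since C++11; count_char and c_length are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// C-style pointer walk over a std::string: s.data() points at the first
// character, and incrementing the pointer moves to the next one.
std::size_t count_char(const std::string& s, char c)
{
    std::size_t n = 0;
    for (const char* p = s.data(); p != s.data() + s.size(); ++p)
        if (*p == c)
            ++n;
    return n;
}

// A C-style function that takes "the address of the first character" of a
// null-terminated string; c_str() supplies such a pointer.
std::size_t c_length(const char* str)
{
    const char* p = str;
    while (*p != '\0')
        ++p;
    return static_cast<std::size_t>(p - str);
}
```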
You might be better off telling us more about what you're trying to accomplish though. Chances are pretty good that there's a better way to do it than with a pointer.
Edit: For something like counting the number of vowels in a string, you almost certainly want to use an algorithm -- in this case, std::count_if is probably the most suitable:
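#include <algorithm>
#include <cstring>

struct is_vowel {
    bool operator()(char ch) const {
        static const char vowels[] = "aeiouAEIOU";
        // exclude '\0': strchr would otherwise find the terminating null
        return ch != '\0' && std::strchr(vowels, ch) != nullptr;
    }
};
int vowels = std::count_if(my_string.begin(), my_string.end(), is_vowel());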
We're still using begin(), but not doing any pointer(-like) arithmetic on it.

Is it possible to construct an "infinite" string?

Is there any real sequence of characters that always compares greater than any other string?
My first thought was that a string constructed like so:
std::basic_string<T>(std::basic_string<T>().max_size(), std::numeric_limits<T>::max())
Would do the trick, provided that the fact that it would almost definitely fail to work isn't such a big issue. So I presume this kind of hackery could only be accomplished in Unicode, if it can be accomplished at all. I've never heard of anything that would indicate that it really is possible, but neither have I heard tell that it isn't, and I'm curious.
Any thoughts on how to achieve this without a possibly_infinite<basic_string<T>>?
I assume that you compare strings by their character values, i.e. each character acts like a digit, a string is greater than any proper prefix of itself, etc.
Is there any real sequence of characters that always compares greater than any other string?
No, because:
Let's assume there is a string s that is always greater than any other string.
If you make a copy of s, the copy will be equal to s, and equal means "not greater". Therefore there is a string that s is not greater than.
If you make a copy of s and append one character at the end, it will be greater than original s. Therefore there can be a string that is greater than s.
Which means it is not possible to construct s.
I.e.
A string s that is always greater than any other string cannot exist. A copy of s (copy == other string) will be equal to s, and "equal" means "not greater".
A string s that is always greater than or equal to any other string can exist if the maximum string size has a reasonable limit. Without a size limit, it will always be possible to take a copy of s, append one character at the end, and get a string that is greater than s.
In my opinion, the proper solution would be to introduce some kind of special string object that represents an infinitely "large" string, and to write comparison operators between that object and the standard string. In this case you may also need a custom string class.
It is possible to make a string that is always less than or equal to any other string: the zero-length string is exactly that. It is smaller than every non-empty string and equal to other zero-length strings.
Or you could write counter-intuitive comparison routine where shorter string is greater than longer string, but in this case next code maintainer will hate you, so it is not a good idea.
Not sure why you would ever need something like that, though.
You probably need a custom comparator, for which you define a magic "infinite string" value and which will always treat that value as greater than any other.
Unicode solves a lot of problems, but not this one. Unicode is just a different way of encoding characters (in 1, 2 or 4 byte code units); they are still stored in a plain array. You can use infinite strings when you find a machine with infinite memory.
Yes. How you do it, I have no idea :)
You should try to state what you intend to achieve and what your requirements are. In particular: does it have to be a string? Is there any limitation on the domain? Do the values need to be compared with <?
You can use a non-string type:
struct infinite_string {};
bool operator<( std::string const & , infinite_string const & ) {
return true;
}
bool operator<( infinite_string const &, std::string const & ) {
return false;
}
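A sketch of how such a sentinel could be used, assuming C++14 for the transparent comparator std::less<>: heterogeneous lookups such as std::set::lower_bound accept the sentinel directly, and it always lands past every element. The type and operators are repeated so the block compiles on its own; StringSet and sentinel_is_past_every_element are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Sentinel type that compares greater than every std::string.
struct infinite_string {};
bool operator<(std::string const&, infinite_string const&) { return true; }
bool operator<(infinite_string const&, std::string const&) { return false; }

// std::less<> forwards to the operators above (found via ADL), so the set
// can be searched with an infinite_string without converting it.
using StringSet = std::set<std::string, std::less<>>;

// Hypothetical helper: the sentinel's lower bound is always end().
bool sentinel_is_past_every_element(const StringSet& s)
{
    return s.lower_bound(infinite_string{}) == s.end();
}
```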
If you can use std::lexicographical_compare and you don't need to store it as a string, then you can write an infinite iterator:
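// needs <algorithm>, <cassert>, <cstddef>, <iterator>, <limits>, <string>
template <typename CharT>
struct infinite_iterator
{
    using iterator_category = std::input_iterator_tag;
    using value_type = CharT;
    using difference_type = std::ptrdiff_t;
    using pointer = const CharT*;
    using reference = CharT;
    CharT operator*() const { return std::numeric_limits<CharT>::max(); }
    infinite_iterator& operator++() { return *this; }
    infinite_iterator operator++(int) { return *this; }
    // two infinite iterators never compare equal, so the range never ends
    bool operator==(const infinite_iterator&) const { return false; }
    bool operator!=(const infinite_iterator&) const { return true; }
};
std::string str = "hello";
assert( std::lexicographical_compare( str.begin(), str.end(),
        infinite_iterator<char>{}, infinite_iterator<char>{} ) );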
If you can use any other comparison functor and your domain has some invalid value, you can use that to your advantage:
namespace detail {
// assume that "\0\0\0\0\0" is not valid in your domain
std::string const infinite( 5, 0 );
}
bool compare( std::string const & lhs, std::string const & rhs ) {
if ( lhs == detail::infinite ) return false;
if ( rhs == detail::infinite ) return true;
return lhs < rhs;
}
If you need an artificial bound within a space of objects that isn't bounded, the standard trick is to add an extra element and define a new comparison operator that enforces your property.
Or implement lazy strings.
Well, if you were to dynamically construct a string of the same length as the one you are comparing to and fill it with the highest character code available (7F for normal ASCII or FF for extended), you would be guaranteed that this string compares equal to or greater than the one you compare it to.
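A sketch of that construction; make_upper_bound_for is a hypothetical name. For std::string, std::char_traits<char>::lt compares characters as unsigned char, so the fill value 0xFF is the largest character even where plain char is signed. The result is an upper bound only for strings no longer than s: a longer string that starts with the same run of 0xFF characters compares greater.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper: a string of the same length as `s` whose characters
// all have the largest value (0xFF). It compares >= every string whose
// length is <= s.size().
std::string make_upper_bound_for(const std::string& s)
{
    const char top = static_cast<char>(std::numeric_limits<unsigned char>::max());
    return std::string(s.size(), top);
}
```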
What's your comparator?
Based on that, you can construct something that is the 'top' of your lattice.