Is pointer arithmetic possible with C++ string class? - c++

After programming a little in C I decided to jump right into C++. At first I was pleased with the presence of the string class and being able to treat strings as whole units instead of arrays of characters. But I soon found that the C-style strings had the advantage of letting the program move through it character by character, using pointer arithmetic, and carry out a desired logical operation.
I have now found myself in a situation that requires this but the compiler tells me it is unable to convert from type string to the C-style strings. So I was wondering, is there a way to use pointer arithmetic to reference single characters or to pass arguments to a function as the address of the first character while still using the string class without having to create arrays of characters or do I just want to have my cake and eat it too?

string characters can be accessed by index, pointers, and through the use of iterators.
if you wanted to use iterators, you could make a function that checks whether a string has a space in it or not:
bool spacecheck(const string& s)
{
    string::const_iterator iter = s.begin();
    while (iter != s.end()) {
        if (isspace(*iter))
            return true;
        else
            ++iter;
    }
    return false; // reached the end without finding whitespace
}
At the beginning of the function, I initialized an iterator to the beginning of the string s by using the .begin() function, which in this case returns an iterator to the first character in the string. In the while loop, the condition is iter != s.end(). Here end() returns an iterator referring to the element after the last character of the string. In the body, (*iter), which is the value pointed to by iter, is sent to the function isspace(), which checks whether a character is whitespace. If the check fails, iter is incremented, which makes iter point to the next element of the string.
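The same check can also be written without an explicit loop. This is only a sketch, assuming C++11 (for the lambda and std::any_of); the name has_space is made up for illustration. The cast to unsigned char is there because std::isspace expects a value in that range.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the check in the usage note
#include <cctype>
#include <string>

// Hypothetical helper: true if s contains at least one whitespace character.
bool has_space(const std::string& s) {
    return std::any_of(s.begin(), s.end(), [](unsigned char ch) {
        // std::isspace requires a value representable as unsigned char
        return std::isspace(ch) != 0;
    });
}
```

For example, assert(has_space("a b")) holds, while has_space("ab") returns false.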
I am learning c++ myself and by writing all of this stuff out it has helped my own understanding some. I hope I did not offend you if this all seemed very simple to you, I was just trying to be concise.
I am currently learning from Accelerated c++ and I could not recommend it highly enough!

You can use &your_string[0] to get a pointer to the initial character in the string. You can also use your_string.begin() to get an iterator into the string that you can treat almost like a pointer (dereference it, do arithmetic on it, etc.)
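As a rough sketch of what pointer arithmetic over a string can look like, assuming C++11 or later (where the characters are guaranteed to be stored contiguously). The function name count_with_pointer is hypothetical; for read-only access, s.data() gives the same pointer as &s[0].

```cpp
#include <cassert>   // only needed for the check in the usage note
#include <cstddef>
#include <string>

// Hypothetical helper: counts occurrences of target by walking the
// string's buffer with a raw pointer.
std::size_t count_with_pointer(const std::string& s, char target) {
    const char* p = s.data();               // pointer to the first character
    const char* const end = p + s.size();   // one past the last character
    std::size_t n = 0;
    for (; p != end; ++p) {                 // plain pointer arithmetic
        if (*p == target) ++n;
    }
    return n;
}
```

For example, assert(count_with_pointer("banana", 'a') == 3) holds.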
You might be better off telling us more about what you're trying to accomplish though. Chances are pretty good that there's a better way to do it than with a pointer.
Edit: For something like counting the number of vowels in a string, you almost certainly want to use an algorithm -- in this case, std::count_if is probably the most suitable:
struct is_vowel {
bool operator()(char ch) {
static const char vowels[] = "aeiouAEIOU";
return strchr(vowels, ch) != NULL;
}
};
int vowels = std::count_if(my_string.begin(), my_string.end(), is_vowel());
We're still using begin(), but not doing any pointer(-like) arithmetic on it.

Related

Substring of an element in a set

Is there a way to find and replace subset of a char*/string in a set?
Example:
std::set<char*> myset;
myset.insert("catt");
myset.insert("world");
myset.insert("hello");
it = myset.subsetfind("tt");
myset.replace(it, "t");
There are at least three reasons why this won't work.
std::set provides only the means to search the set for a value that compares equally to the value being searched for, and not to a value that matches some arbitrary portion of the value.
The shown program is ill-formed. A string literal such as "hello" is an array of const char that decays to const char *, not char *. No self-respecting C++ compiler will allow you to insert a const char * into a container of char *s. And you can't modify const values, by definition, anyway.
Values in std::set cannot be modified. To effect the modification of an existing value in a set, it must be erase()d, then the new value insert()ed.
std::set is simply not the right container for the goals you're trying to accomplish.
No, you can't (or at least shouldn't) modify the key while it's in the set. Doing so could change the relative order of the elements, in which case the modification would render the set invalid.
You need to start with a set of things you can modify. Then you need to search for the item, remove it from the set, modify it, then re-insert the result back into the set.
std::set<std::string> myset {"catt", "world", "hello"};
auto pos = std::find_if(myset.begin(), myset.end(), [](auto const &s) { return s.find("tt") != std::string::npos; });
if (pos != myset.end()) {
    auto temp = *pos;         // copy the element; set elements are const
    myset.erase(pos);         // std::set has erase(), not remove()
    auto p = temp.find("tt");
    temp.replace(p, 2, "t");
    myset.insert(temp);
}
You cannot modify elements within a set.
You can find strings that contain the substring using std::find_if. Once you find matching elements, you can remove each from the set and add a modified copy of the string, with the substring replaced with something else.
PS. Remember that you cannot modify string literals. You will need to allocate some memory for the strings.
PPS. Implicit conversion of string literal to char* has been deprecated since C++ was standardized, and since C++11 such conversion is ill-formed.
PPPS. The default comparator will not be correct when you use pointers as the element type. I recommend you to use std::string instead. (A strcmp based comparator approach would also be possible, although much more prone to memory bugs).
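For completeness, here is roughly what the strcmp-based comparator approach could look like. This is a sketch only; CStrLess, CStrSet and contains are names invented for the example, and the set stores pointers without owning the characters, so the strings must outlive the set (string literals do).

```cpp
#include <cassert>   // only needed for the check in the usage note
#include <cstring>
#include <set>

// Hypothetical comparator: orders C strings by content, not by address.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

typedef std::set<const char*, CStrLess> CStrSet;

// True if some element has the same characters as key.
bool contains(const CStrSet& s, const char* key) {
    return s.find(key) != s.end();
}
```

With this comparator a lookup succeeds even when the key lives at a different address than the stored string, which is exactly what the default pointer comparison gets wrong. std::set<std::string> is still the simpler choice.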
You could use std::find_if with a predicate function/functor/lambda that searches for the substring you want.

Prefer Iterators Over Pointers?

This question is a bump of a question that had a comment here but was deleted as part of the bump.
For those of you who can't see deleted posts, the comment was on my use of const char*s instead of string::const_iterators in this answer: "Iterators may have been a better path from the get go, since it appears that is exactly how your pointers seems be treated."
So my question is this: do string::const_iterators hold any intrinsic value over const char*s, such that switching my answer over to string::const_iterators makes sense?
Introduction
There are many perks of using iterators instead of pointers, among them are:
different code-path in release vs debug, and;
better type-safety, and;
making it possible to write generic code (iterators can be made to work with any data-structure, such as a linked-list, whereas intrinsic pointers are very limited in this regard).
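To illustrate the last point, here is a small sketch of one function template that works unchanged with string iterators, linked-list iterators and raw pointers. All names (count_upper and the three wrappers) are invented for the example.

```cpp
#include <cassert>   // only needed for the check in the usage note
#include <cctype>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical generic helper: counts uppercase characters in [first, last).
template <class InputIt>
std::size_t count_upper(InputIt first, InputIt last) {
    std::size_t n = 0;
    for (; first != last; ++first) {
        if (std::isupper(static_cast<unsigned char>(*first))) ++n;
    }
    return n;
}

// The same template instantiated for three different iterator types.
std::size_t upper_in_string(const std::string& s) {
    return count_upper(s.begin(), s.end());
}
std::size_t upper_in_list(const std::list<char>& l) {
    return count_upper(l.begin(), l.end());
}
std::size_t upper_in_buffer(const char* p, std::size_t n) {
    return count_upper(p, p + n);   // raw pointers are iterators too
}
```

A version written against const char* would only support the last of these three callers.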
Debugging
Since, among other things, dereferencing an iterator that is past the end of a range is undefined behavior, an implementation is free to do whatever it feels necessary in such a case, including raising diagnostics saying that you are doing something wrong.
The standard library implementation provided by gcc, libstdc++, will issue diagnostics when it detects something faulty (if Debug Mode is enabled).
Example
#define _GLIBCXX_DEBUG 1 /* enable debug mode */
#include <vector>
#include <iostream>
int
main (int argc, char *argv[])
{
std::vector<int> v1 {1,2,3};
for (auto it = v1.begin (); ; ++it)
std::cout << *it;
}
/usr/include/c++/4.9.2/debug/safe_iterator.h:261:error: attempt to
dereference a past-the-end iterator.
Objects involved in the operation:
iterator "this" @ 0x0x7fff828696e0 {
type = N11__gnu_debug14_Safe_iteratorIN9__gnu_cxx17__normal_iteratorIPiNSt9__cxx19986vectorIiSaIiEEEEENSt7__debug6vectorIiS6_EEEE (mutable iterator);
state = past-the-end;
references sequence with type `NSt7__debug6vectorIiSaIiEEE' @ 0x0x7fff82869710
}
123
The above would not happen if we were working with pointers, no matter if we are in debug-mode or not.
If we don't enable debug mode for libstdc++, a more performance friendly version (without the added bookkeeping) implementation will be used - and no diagnostics will be issued.
(Potentially) better Type Safety
Since the actual type of an iterator is implementation-defined, this could be used to increase type-safety, but you will have to check the documentation of your implementation to see whether this is the case.
Consider the below example:
#include <vector>
struct A { };
struct B : A { };
// .-- oops
// v
void it_func (std::vector<B>::iterator beg, std::vector<A>::iterator end);
void ptr_func (B * beg, A * end);
// ^-- oops
int
main (int argc, char *argv[])
{
std::vector<B> v1;
it_func (v1.begin (), v1.end ()); // (A)
ptr_func (v1.data (), v1.data () + v1.size ()); // (B)
}
Elaboration
(A) could, depending on the implementation, be a compile-time error since std::vector<A>::iterator and std::vector<B>::iterator are potentially not the same type.
(B) would, however, always compile since there's an implicit conversion from B* to A*.
Iterators are intended to provide an abstraction over pointers.
For example, incrementing an iterator always manipulates the iterator so that if there's a next item in the collection, it refers to that next item. If it already referred to the last item in the collection, after the increment it'll be a unique value that can't be dereferenced, but will compare equal to another iterator pointing one past the end of the same collection (usually obtained with collection.end()).
In the specific case of an iterator into a string (or a vector), a pointer provides all the capabilities required of an iterator, so a pointer can be used as an iterator with no loss of required functionality.
For example, you could use std::sort to sort the items in a string or a vector. Since pointers provide the required capabilities, you can also use it to sort items in a native (C-style) array.
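A short sketch of that point: the same std::sort call accepts string iterators and plain pointers into a C-style array. The two wrapper names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the check in the usage note
#include <string>

// Sorts the characters of a string using string iterators.
std::string sorted_letters(std::string s) {
    std::sort(s.begin(), s.end());
    return s;
}

// Sorts a native array; the pointers themselves act as the iterators.
void sort_ints(int* first, int* last) {
    std::sort(first, last);
}
```

For example, assert(sorted_letters("pointer") == "einoprt") holds, and calling sort_ints(a, a + 4) on int a[] = {4, 1, 3, 2} leaves it as 1, 2, 3, 4.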
At the same time, yes, defining (or using) an iterator that's separate from a pointer can provide extra capabilities that aren't strictly required. Just for example, some iterators provide at least some degree of checking, to assure that (for example) when you compare two iterators, they're both iterators into the same collection, and that you aren't attempting an out of bounds access. A raw pointer can't (or at least normally won't) provide this kind of capability.
Much of this comes back to the "don't pay for what you don't use" mentality. If you really only need and want the capabilities of native pointers, they can be used as iterators, and you'll normally get code that's essentially identical to what you'd get by directly manipulating pointers. At the same time, for cases where you do want extra capabilities, such as traversing a threaded RB-tree or a B+ tree instead of a simple array, iterators allow you to do that while maintaining a single, simple interface. Likewise, for cases where you don't mind paying extra (in terms of storage and/or run-time) for extra safety, you can get that too (and it's decoupled from things like the individual algorithm, so you can get it where you want it without being forced to use it in other places that may, for example, have timing requirements too critical to support it).
In my opinion, many people kind of miss the point when it comes to iterators. Many people happily rewrite something like:
for (size_t i=0; i<s.size(); i++)
...into something like:
for (std::string::iterator i = s.begin(); i != s.end(); i++)
...and act as if it's a major accomplishment. I don't think it is. For a case like this, there's probably little (if any) gain from replacing an integer type with an iterator. Likewise, taking the code you posted and changing char const * to std::string::iterator seems unlikely to accomplish much (if anything). In fact, such conversions often make the code more verbose and less understandable, while gaining nothing in return.
If you were going to change the code, you should (in my opinion) do so in an attempt at making it more versatile by making it truly generic (which std::string::iterator really isn't going to do).
For example, consider your split (copied from the post you linked):
vector<string> split(const char* start, const char* finish){
const char delimiters[] = ",(";
const char* it;
vector<string> result;
do{
for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
it != finish && *it == '(';
it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
auto&& temp = interpolate(start, it);
result.insert(result.end(), temp.begin(), temp.end());
start = ++it;
} while (it <= finish);
return result;
}
As it stands, this is restricted to being used on narrow strings. If somebody wants to work with wide strings, UTF-32 strings, etc., it's relatively difficult to get it to do that. Likewise, if somebody wanted to match [ or '{' instead of (, the code would need to be rewritten for that as well.
If there were a chance of wanting to support various string types, we might want to make the code more generic, something like this:
template <class InIt, class OutIt, class charT>
void split(InIt start, InIt finish, charT paren, charT comma, OutIt result) {
typedef typename std::iterator_traits<OutIt>::value_type o_t;
charT delimiters[] = { comma, paren };
InIt it;
do{
for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
it != finish && *it == paren;
it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
auto&& temp = interpolate(start, it);
*result++ = o_t{temp.begin(), temp.end()};
start = ++it;
} while (it != finish);
}
This hasn't been tested (or even compiled) so it's really just a sketch of a general direction you could take the code, not actual, finished code. Nonetheless, I think the general idea should at least be apparent--we don't just change it to "use iterators". We change it to be generic, and iterators (passed as template parameters, with types not directly specified here) are only a part of that. To get very far, we also eliminated hard-coding the paren and comma characters. Although not strictly necessary, I also changed the parameters to fit more closely with the convention used by standard algorithms, so (for example) output is also written via an iterator rather than being returned as a collection.
Although it may not be immediately apparent, the latter does add quite a bit of flexibility. Just for example, if somebody just wanted to print out the strings after splitting them, he could pass an std::ostream_iterator, to have each result written directly to std::cout as it's produced, rather than getting a vector of strings, and then having to separately print them out.
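To make the output-iterator point concrete, here is a much simpler splitter than the one above (no parenthesis handling, no interpolate), written only to show the calling convention. split_on, split_to_vector and split_to_lines are hypothetical names, and the sketch assumes forward iterators.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the check in the usage note
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits [first, last) on delim and writes each
// piece through the output iterator out.
template <class FwdIt, class OutIt, class CharT>
OutIt split_on(FwdIt first, FwdIt last, CharT delim, OutIt out) {
    typedef std::basic_string<CharT> piece_t;
    for (;;) {
        FwdIt next = std::find(first, last, delim);
        *out++ = piece_t(first, next);   // may be an empty piece
        if (next == last) break;
        first = ++next;                  // skip the delimiter
    }
    return out;
}

// Caller 1: collect the pieces in a vector.
std::vector<std::string> split_to_vector(const std::string& s, char delim) {
    std::vector<std::string> pieces;
    split_on(s.begin(), s.end(), delim, std::back_inserter(pieces));
    return pieces;
}

// Caller 2: write each piece straight to a stream, one per line.
std::string split_to_lines(const std::string& s, char delim) {
    std::ostringstream os;
    split_on(s.begin(), s.end(), delim,
             std::ostream_iterator<std::string>(os, "\n"));
    return os.str();
}
```

The splitting code is identical in both callers; only the iterator passed as the last argument decides where the results go. Passing std::ostream_iterator<std::string>(std::cout, "\n") would print them directly.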

Applying c++ "lower_bound" on an array of char strings

I am trying the lower_bound function in C++.
Used it multiple times for 1 d datatypes.
Now, I am trying it on a sorted array dict[5000][20] to find strings of size <=20.
The string to be matched is in str.
bool recurseSerialNum(char *name,int s,int l,char (*keypad)[3],string str,char (*dict)[20],int dictlen)
{
char (*idx)[20]= lower_bound(&dict[0],&dict[0]+dictlen,str.c_str());
int tmp=idx-dict;
if(tmp!=dictlen)
printf("%s\n",*idx);
}
As per http://www.cplusplus.com/reference/algorithm/lower_bound/?kw=lower_bound , this function is supposed to return the index of 'last'(beyond end) in case no match is found i.e. tmp should be equal dictlen.
In my case, it always returns the beginning index, i.e. I get tmp equal to 0 both (1) when passed a string that is in the dict and (2) when passed a string that is not in the dict.
I think the issue is in handling and passing of the pointer. The default comparator should be available for this case as is available in case of vector. I also tried passing an explicit one, to no avail.
I tried this comparator -
bool compStr(const char *a, const char *b){
return strcmp(a,b)<0;
}
I know the alternative is to use vector, etc., but I would like to know the issue in this one.
Searched on this over google as well as SO, but did not find anything similar.
There are two misunderstandings here, I believe.
std::lower_bound does not check if an element is part of a sorted range. Instead it finds the leftmost place where an element could be inserted without breaking the ordering.
You're not comparing the contents of the strings but their memory addresses.
It is true that dict in your case is a sorted range in the sense that the memory addresses of the inner arrays are ascending. Where str.c_str() lies in relation to them is, of course, unspecified. In practice, if dict is a stack object, you will often find that the memory range for the heap (where str.c_str() typically lies) is below that of the stack, in which case lower_bound quite correctly tells you that if you wanted to insert this address into the sorted range of addresses that you are treating dict as, you'd have to do so at the beginning.
For a solution, since there is an operator<(char const *, std::string const &), you could simply write
char (*idx)[20] = lower_bound(&dict[0], &dict[0] + dictlen, str);
...but are you perhaps really looking for std::find?
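Putting both points together, a lookup in the sorted array could look something like the sketch below. It assumes C++11 (for the lambda) and that dict is sorted by strcmp order; find_in_dict is a made-up name. The comparison looks at string contents, and because lower_bound only returns the insertion position, the result is checked for an actual match before it is used.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the check in the usage note
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: index of key in the sorted dict, or -1 if absent.
std::ptrdiff_t find_in_dict(const char (*dict)[20], std::size_t n,
                            const std::string& key) {
    const char (*last)[20] = dict + n;
    const char (*pos)[20] = std::lower_bound(
        dict, last, key,
        [](const char (&entry)[20], const std::string& k) {
            return std::strcmp(entry, k.c_str()) < 0;   // compare contents
        });
    // lower_bound returns where key would be inserted; verify it is there.
    if (pos != last && key == *pos) return pos - dict;
    return -1;
}
```

With const char dict[3][20] = {"apple", "banana", "cherry"}, assert(find_in_dict(dict, 3, "banana") == 1) holds, while "blueberry" yields -1 even though lower_bound points at "cherry".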

C++ Deleting a specific value in a vector without knowing location

if (find(visitable.begin(), visitable.end(), ourstack.returnTop())) { ... }
I want to determine whether the top character in stack ourstack can be found in the vector visitable. If yes, I want this character to be deleted from visitable.
How would I code that? I know vectors use erase, but that requires the specific location of that character (which I don't know).
This is for my maze-path-finding assignment.
Also, my returnTop is giving me an error: class "std.stack<char..." has no member returnTop. I declared #include in the top of my program. What's happening here?
Thanks in advance!
If you are using find, then you already know the location of the character. find returns an iterator to the position where the character is found, or to the value used as end if it cannot find it.
vector<?>::const_iterator iter =
find(visitable.begin(), visitable.end(), ourstack.top());
if( iter != visitable.end() )
{
visitable.erase( iter );
}
As for stack, the function you are looking for is top(). The standard C++ library does not use camelCased identifiers; that looks more like a Java or C# thing.
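Here is a self-contained version of the same idea, assuming both containers hold char as in the question. The function name remove_top_from is hypothetical, and the empty-stack check is an addition, since calling top() on an empty stack is undefined.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the check in the usage note
#include <stack>
#include <vector>

// Hypothetical helper: removes the first occurrence of the stack's top
// element from visitable. Returns true if something was removed.
bool remove_top_from(std::vector<char>& visitable,
                     const std::stack<char>& path) {
    if (path.empty()) return false;   // top() on an empty stack is undefined
    std::vector<char>::iterator pos =
        std::find(visitable.begin(), visitable.end(), path.top());
    if (pos == visitable.end()) return false;   // not found
    visitable.erase(pos);             // find already gave us the location
    return true;
}
```

If the top of the stack is 'b' and visitable holds a, b, c, the call returns true and leaves a, c; a second call returns false.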
Just like this:
// Note assume C++0x notation for simplicity since I don't know the type of the template
auto character = ourstack.top();
auto iter = std::find(visitable.begin(), visitable.end(), character);
if (iter != visitable.end())
visitable.erase(iter);
returnTop does not exist in the stack class, but top does.
Alternatively if you want some generic (and rather flamboyant way) of doing it:
// Assume type of vector and stack are the same
template <class T>
void TryRemoveCharacter(std::vector<T>& visitable, const std::stack<T>& ourStack)
{
// Note, could have passed a ref to the character directly, which IMHO makes more sense
const T& ourChar = ourStack.top();
visitable.erase(std::remove_if(visitable.begin(), visitable.end(), [&ourChar](const T& character)
{
// Note: http://www.cplusplus.com/reference/algorithm/find/ says that std::find
// uses operator== for comparisons, and so does this lambda. Built-in types such
// as char already have it; for a user-defined T the compiler does not generate
// one by default.
// See http://stackoverflow.com/questions/217911/why-dont-c-compilers-define-operator-and-operator
// In that case, either overload operator== to do a true comparison or
// add a comparison method and invoke it here.
return ourChar == character;
}), visitable.end()); // erase needs the end of the range as well
}
Note: this alternative way may not be a good idea for an assignment, as your teacher will probably find it suspicious that you introduce advanced C++ features (C++0x) all of a sudden.
However for intellectual curiosity it could work ;)
Here's how you may use it:
TryRemoveCharacter(visitable, ourstack);

C++ Tokenizing using iterators in an eof() cycle

I'm trying to adapt this answer
How do I tokenize a string in C++?
to my current string problem which involves reading from a file till eof.
from this source file:
Fix grammatical or spelling errors
Clarify meaning without changing it
Correct minor mistakes
I want to create a vector with all the tokenized words. Example: vector<string> allTheText[0] should be "Fix"
I don't understand the purpose of istream_iterator<std::string> end; but I included it because it was in the original poster's answer.
So far, I've got this non-working code:
vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);
while (!streamOfText.eof()){
getline (streamOfText, readTextLine);
cout<<readTextLine<<endl;
stringstream strstr(readTextLine);
// how should I initialize the iterators it and end here?
}
Edit:
I changed the code to
vector<string> allTheText;
stringstream strstr;
istream_iterator<std::string> end;
istream_iterator<std::string> it(strstr);
while (getline(streamOfText, readTextLine)) {
cout << readTextLine << endl;
vector<string> vec((istream_iterator<string>(streamOfText)), istream_iterator<string>()); // generates RuntimeError
}
And got a RuntimeError, why?
Using a while (!….eof()) loop in C++ is broken because the loop will never be exited when the stream goes into an error state!
Rather, you should test the stream's state directly. Adapted to your code, this could look like this:
while (getline(streamOfText, readTextLine)) {
cout << readTextLine << endl;
}
However, you already have a stream. Why put it into a string stream as well? Or do you need to do this line by line for any reason?
You can directly initialize your vector with the input iterators. No need to build a string stream, and no need to use the copy algorithm either because there's an appropriate constructor overload.
vector<string> vec((istream_iterator<string>(cin)), istream_iterator<string>());
Notice the extra parentheses around the first argument which are necessary to disambiguate this from a function declaration.
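The same construction works for any input stream, not just cin. A minimal sketch, with read_words as an invented name; because the vector is built inside a return expression here, the extra parentheses are not needed.

```cpp
#include <cassert>   // only needed for the check in the usage note
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every whitespace-separated word from in.
std::vector<std::string> read_words(std::istream& in) {
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}
```

Given a stream over the two lines "Fix grammatical or spelling errors" and "Clarify meaning without changing it", the result holds ten words and its first element is "Fix". An std::ifstream can be passed in the same way.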
EDIT A small explanation what this code does:
C++ offers a unified way of specifying ranges. A range is just a collection of typed values, without going into details about how these values are stored. In C++, these ranges are denoted as half-open intervals [a, b[. That means that a range is delimited by two iterators (which are kind of like pointers but more general; pointers are a special kind of iterator). The first iterator, a, points to the first element of the range. The second, b, points behind the last element. Why behind? Because this allows iterating over the elements very easily:
for (Iterator i = a; i != b; ++i)
cout << *i;
Like pointers, iterators are dereferenced by applying * to them. This returns their value.
Container classes in C++ (e.g. vector, list) have a special constructor which allows easy copying of values from another range into the new container. Consequently, this constructor expects two iterators. For example, the following copies the C-style array into the vector:
int values[3] = { 1, 2, 3 };
vector<int> v(values, values + 3);
Here, values is synonymous with &values[0] which means that it points to the array's first element. values + 3, thanks to pointer arithmetic, is nearly equivalent to &values[3] (but this is invalid C++!) and points to the virtual element behind the array.
Now, my code above does the exact same as in this last example. The only difference is the type of iterator I use. Instead of using a plain pointer, I use a special iterator class that C++ provides. This iterator class wraps an input stream in such a way that ++ advances the input stream and * reads the next element from the stream. The kind of element is specified by the type argument (hence string in this case).
To make this work as a range, we need to specify a beginning and an end. Alas, we don't know the end of the input (this is logical, since the end of the stream may actually move over time as the user enters more input into a console!). Therefore, to create a virtual end iterator, we pass no argument to the constructor of istream_iterator. Conversely, to create a begin iterator, we pass an input stream. This then creates an iterator that points to the current position in the stream (here, cin).
My above code is functionally equivalent to the following:
istream_iterator<string> front(cin);
istream_iterator<string> back;
vector<string> vec;
for (istream_iterator<string> i = front; i != back; ++i)
vec.push_back(*i);
and this, in turn, is equivalent to using the following loop:
string word;
while (cin >> word)
vec.push_back(word);
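If the text really does need to be handled line by line, as in the question, the two approaches can be combined: getline drives the outer loop and a fresh string stream supplies the iterators for each line. This is a sketch under that assumption; words_per_line is a hypothetical name.

```cpp
#include <cassert>   // only needed for the check in the usage note
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one vector of words for each line read from in.
std::vector<std::vector<std::string> > words_per_line(std::istream& in) {
    std::vector<std::vector<std::string> > result;
    std::string line;
    while (std::getline(in, line)) {          // stops at EOF or on error
        std::istringstream line_stream(line); // a new stream for every line
        result.push_back(std::vector<std::string>(
            std::istream_iterator<std::string>(line_stream),
            std::istream_iterator<std::string>()));
    }
    return result;
}
```

The iterators are created from the per-line stream inside the loop, not from the file stream, so each pass tokenizes only the line that getline just read. An empty line simply produces an empty vector.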