String Reversal Memory Consumption Differences - c++

Suppose I implement the following two string reversal algorithms:
#include <iostream>
#include <string>
using namespace std;

void reverse(string &s) {
    if (s.size() <= 1) return;  // <= 1 also stops an empty string from reaching substr(1)
    string restOfS = s.substr(1);
    reverse(restOfS);
    s = restOfS + s.at(0);
}

string reverseString(string s) {
    if (s.size() <= 1) return s;
    return reverseString(s.substr(1)) + s.at(0);
}

int main() {
    string name = "Dominic Farolino";
    reverse(name);
    cout << name << endl;
    name = reverseString(name);
    cout << name << endl;
    return 0;
}
One of these obviously modifies the string given to it, and the other returns a new string. Since the first one modifies the given string and uses a reference parameter as its mode of communication to the next recursive stack frame, I at first assumed it would be more efficient, since using a reference parameter may help us avoid duplicating things in memory down the line. However, I don't believe that's the case. Obviously we have to use a reference parameter with this void function, but it seems that we are undoing any memory efficiency a reference parameter may give us, since we are just declaring a new variable on the stack every time.
In short, it seems that the first one is making a copy of the reference every call, and the second one is making a copy of the value each call and just returning its result, making them of equal memory consumption.
To make the first one more memory efficient I feel like you'd have to do something like this:
void reverse(string &s) {
    if(s.size() == 1) return;
    reverse(s.substr(1));
    s = s.substr(1) + s.at(0);
}
however the compiler won't let me:
error: invalid initialization of non-const reference of type 'std::string& {aka std::basic_string<char>&}' from an rvalue of type 'std::basic_string<char>'
6:6: note: in passing argument 1 of 'void reverse(std::string&)'
Is this analysis correct?

substr() returns a new string every time, complete with all the memory use that goes with that. So if you're going to do N-1 calls to substr(), that's O(N^2) extra memory you're using for no reason.
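To see that the extra memory comes from substr() rather than from the recursion itself, here is a sketch that keeps the asker's recursive shape but recurses on a pair of indices into the same string, swapping the two ends at each level. The names reverseRange and reverseInPlace are hypothetical. No substrings are allocated, although the recursion still uses about N/2 stack frames.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: reverses s[lo..hi] in place by swapping the ends
// and recursing on the inner range. No new strings are created.
void reverseRange(std::string &s, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;            // zero or one character left: done
    std::swap(s[lo], s[hi]);
    reverseRange(s, lo + 1, hi - 1); // hi >= 1 here because lo < hi
}

// Hypothetical entry point with the same shape as the asker's reverse().
void reverseInPlace(std::string &s) {
    if (s.empty()) return;           // avoids computing size() - 1 on an empty string
    reverseRange(s, 0, s.size() - 1);
}
```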
With std::string though, you can modify it in place, just by iterating over it with a simple for loop. Or just using std::reverse:
#include <algorithm>
#include <string>

void reverseString(std::string &s) {
    std::reverse(s.begin(), s.end());
}
Either way (for loop or algorithm) takes O(1) extra memory instead - it effectively is just a series of swaps, so you just need one extra char as the temporary. Much better.
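The hand-written for-loop variant might look like this sketch (the name reverseLoop is hypothetical): two indices start at the ends and move inward, swapping as they go, so the only temporary storage is inside std::swap.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Reverses s in place: swap the outermost pair, then move both indices inward.
void reverseLoop(std::string &s) {
    if (s.empty()) return;          // nothing to do, and avoids size() - 1 underflow
    std::size_t i = 0;
    std::size_t j = s.size() - 1;
    while (i < j) {
        std::swap(s[i], s[j]);
        ++i;
        --j;                        // safe: j >= 1 whenever i < j
    }
}
```

For any string, this should leave the same result as std::reverse(s.begin(), s.end()).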

Related

How to create a function that removes all of a selected character in a C-string?

I want to make a function that removes all the characters of ch in a c-string.
But I keep getting an access violation error.
Unhandled exception at 0x000f17ba in testassignments.exe: 0xC0000005: Access violation writing location 0x000f787e.
void removeAll(char* &s, const char ch)
{
    int len=strlen(s);
    int i,j;
    for(i = 0; i < len; i++)
    {
        if(s[i] == ch)
        {
            for(j = i; j < len; j++)
            {
                s[j] = s[j + 1];
            }
            len--;
            i--;
        }
    }
    return;
}
I expected the c-string to not contain the character "ch", but instead, I get an access violation error.
In the debug I got the error on the line:
s[j] = s[j + 1];
I tried to modify the function but I keep getting this error.
Edit--
Sample inputs:
s="abmas$sachus#settes";
ch='e' Output->abmas$sachus#settes, becomes abmas$sachus#stts
ch='t' Output-> abmas$sachus#stts, becomes abmas$sachus#ss.
Instead of producing those outputs, I get the access violation error.
Edit 2:
If it's any help, I am using Microsoft Visual C++ 2010 Express.
Apart from the inefficiency of your function shifting the entire remainder of the string whenever encountering a single character to remove, there's actually not much wrong with it.
In the comments, people have assumed that you are reading off the end of the string with s[j+1], but that is untrue. They are forgetting that s[len] is completely valid because that is the string's null-terminator character.
So I'm using my crystal ball now, and I believe that the error is because you're actually running this on a string literal.
// This is NOT okay!
char* str = "abmas$sachus#settes";
removeAll(str, 'e');
This code above is (sort of) not legal. The string literal "abmas$sachus#settes" has type const char[20] and should not be stored as a non-const char*. In C++03 this conversion was deprecated, and since C++11 it is ill-formed, but for backward compatibility with C, where it is allowed (provided you don't attempt to modify the string), many compilers still issue only a warning instead of an error.
However, you are really not allowed to modify the string. And your program is crashing the moment you try.
If you were to use the correct approach with a char array (which you can modify), then you have a different problem:
// This will result in a compiler error
char str[] = "abmas$sachus#settes";
removeAll(str, 'e');
Results in
error: invalid initialization of non-const reference of type ‘char*&’ from an rvalue of type ‘char*’
So why is that? Well, your function takes a char*& type that forces the caller to use pointers. It's making a contract that states "I can modify your pointer if I want to", even if it never does.
There are two ways you can fix that error:
The TERRIBLE PLEASE DON'T DO THIS way:
// This compiles and works but it's not cool!
char str[] = "abmas$sachus#settes";
char *pstr = str;
removeAll(pstr, 'e');
The reason I say this is bad is because it sets a dangerous precedent. If the function actually did modify the pointer in a future "optimization", then you might break some code without realizing it.
Imagine that you want to output the string with characters removed later, but the first character was removed and your function decided to modify the pointer to start at the second character instead. Now if you output str, you'll get a different result from using pstr.
And this example only assumes that you're storing the string in an array. Imagine if you had actually allocated the memory dynamically, like this:
char *str = new char[strlen("abmas$sachus#settes") + 1];
strcpy(str, "abmas$sachus#settes");
removeAll(str, 'e');
Then if removeAll changes the pointer, you're going to have a BAD time when you later clean up this memory with:
delete[] str; //<-- BOOM!!!
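To make the risk concrete, here is a hypothetical function (skipLeading is not from the question) with the same char*& parameter style that really does change the caller's pointer: it "optimizes" by advancing the pointer past leading occurrences of ch instead of shifting characters. After the call, pstr and str no longer point at the same place, and a pointer moved like this would no longer be valid to pass to delete[].

```cpp
// Hypothetical function with a char*& parameter, like removeAll.
// Instead of moving characters, it advances the caller's pointer
// past any leading occurrences of ch.
void skipLeading(char*& s, char ch) {
    while (ch != '\0' && *s == ch) {
        ++s;  // modifies the caller's pointer, not the characters
    }
}
```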
The I ACKNOWLEDGE MY FUNCTION DEFINITION IS BROKEN way:
Real simply, your function definition should take a pointer, not a pointer reference:
void removeAll(char* s, const char ch)
This means you can call it on any modifiable block of memory, including an array. And you can be comforted by the fact that the caller's pointer will never be modified.
Now, the following will work:
// This is now 100% legit!
char str[] = "abmas$sachus#settes";
removeAll(str, 'e');
Now that my free crystal-ball reading is complete, and your problem has gone away, let's address the elephant in the room:
Your code is needlessly inefficient!
You do not need to do the first pass over the string (with strlen) to calculate its length.
The inner loop effectively gives your algorithm a worst-case time complexity of O(N^2).
The little tricks modifying len and, worse than that, the loop variable i make your code more complex to read.
What if you could avoid all of these undesirable things!? Well, you can!
Think about what you're doing when removing characters. Essentially, the moment you have removed one character, then you need to start shuffling future characters to the left. But you do not need to shuffle one at a time. If, after some more characters you encounter a second character to remove, then you simply shunt future characters further to the left.
What I'm trying to say is that each character only needs to move once at most.
There is already an answer demonstrating this using pointers, but it comes with no explanation and you are also a beginner, so let's use indices because you understand those.
The first thing to do is get rid of strlen. Remember, your string is null-terminated. All strlen does is search through characters until it finds the null byte (otherwise known as 0 or '\0')...
[Note that real implementations of strlen are super smart (i.e. much more efficient than searching single characters at a time)... but of course, no call to strlen is faster]
All you need is for your loop to look for the null terminator, like this:
for(i = 0; s[i] != '\0'; i++)
Okay, and now to ditch the inner loop, you just need to know where to stick each new character. How about just keeping a variable new_size in which you are going to count up how long the final string is.
void removeAll(char* s, char ch)
{
    int new_size = 0;
    for(int i = 0; s[i] != '\0'; i++)
    {
        if(s[i] != ch)
        {
            s[new_size] = s[i];
            new_size++;
        }
    }
    // You must also null-terminate the string
    s[new_size] = '\0';
}
If you look at this for a while, you may notice that it might do pointless "copies". That is, if i == new_size there is no point in copying characters. So, you can add that test if you want. I will say that it's likely to make little performance difference, and potentially reduce performance because of additional branching.
But I'll leave that as an exercise. And if you want to dream about really fast code and just how crazy it gets, then go and look at the source code for strlen in glibc. Prepare to have your mind blown.
You can make the logic simpler and more efficient by writing the function like this:
void removeAll(char * s, const char charToRemove)
{
    const char * readPtr = s;
    char * writePtr = s;
    while (*readPtr) {
        if (*readPtr != charToRemove) {
            *writePtr++ = *readPtr;
        }
        readPtr++;
    }
    *writePtr = '\0';
}
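The read-pointer/write-pointer idea is also what the standard algorithm std::remove does: it moves the kept elements forward in a single pass and returns the new logical end. The sketch below swaps in that standard algorithm rather than a hand-written loop; the names removeAllStd and removeAllString are hypothetical. Note that the C-string version calls strlen first, so it makes two passes over the characters.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// C-string version: std::remove compacts the kept characters to the front
// and returns a pointer one past the last kept one; terminate the string there.
void removeAllStd(char* s, char ch) {
    char* newEnd = std::remove(s, s + std::strlen(s), ch);
    *newEnd = '\0';
}

// std::string version (erase-remove idiom): remove, then erase the leftover tail.
void removeAllString(std::string& s, char ch) {
    s.erase(std::remove(s.begin(), s.end(), ch), s.end());
}
```

With the question's sample input, removing 'e' and then 't' from "abmas$sachus#settes" should give "abmas$sachus#stts" and then "abmas$sachus#ss".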

String.length woes

Edit: Solutions must compile against Microsoft Visual Studio 2012.
I want to use a known string length to declare another string of the same length.
The reasoning is the second string will act as a container for operation done to the first string which must be non volatile with regards to it.
e.g.
const string messy = "a bunch of letters";
string dostuff(string sentence) {
    string organised NNN????? // Idk, just needs the same size.
    for ( x = 0; x < NNN?; x++) {
        organised[x] = sentence[x]++; // Doesn't matter what this does.
    }
}
In both cases above, the declaration and the exit condition, the NNN? stands for the length of 'messy'.
How do I discover the length at compile time?
std::string has two constructors which could fit your purposes.
The first, a copy constructor:
string organised(sentence);
The second, a constructor which takes a count and a character. You could initialize the string with a placeholder character:
string organised(sentence.length(), '_');
Alternatively, you can:
Use an empty string and append (+=) text to it as you go along, or
Use a std::stringstream for the same purpose.
A stringstream is sometimes suggested as the more efficient option, but that depends on the implementation, so measure before relying on it.
Overall, I would prefer the copy constructor if the length is known.
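At run time, sentence.length() already gives the length the question calls NNN?. A sketch using the copy constructor preferred above; the per-character work (adding 1 to each char, mirroring the question's sentence[x]++ without mutating sentence) is only a placeholder, and taking sentence by const reference is a choice rather than something the question required. It uses only C++98 features, so it should also compile with Visual Studio 2012.

```cpp
#include <cstddef>
#include <string>

// Builds a working copy with the same length as the input (known at run time).
std::string dostuff(const std::string& sentence) {
    std::string organised(sentence);  // copy constructor: same length and contents
    for (std::size_t x = 0; x < organised.length(); x++) {
        // Placeholder work, mirroring the question's sentence[x]++.
        organised[x] = static_cast<char>(sentence[x] + 1);
    }
    return organised;
}
```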
std::string isn't a compile time type (it can't be a constexpr), so you can't use it directly to determine the length at compile time.
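You could initialize a constexpr char[] and then use sizeof on that (Visual Studio 2012 does not support constexpr; there, a plain const array and a const size_t work the same way, because sizeof of an array is always a compile-time constant):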
constexpr char messychar[] = "a bunch of letters";
// - 1 to avoid including NUL terminator which std::string doesn't care about
constexpr size_t messylen = sizeof(messychar) / sizeof(messychar[0]) - 1;
const string messy(messychar);
and use that, but frankly, that's pretty ugly: the length would be known at compile time, but organised would still need the count-and-char constructor, which would run on each call, allocating and initializing only to have the contents replaced in the loop.
While it's not compile time, you'd avoid that initialization cost by just using reserve and += to build the new string, which with the compile-time length could be done in an ugly but likely efficient way as:
#include <cstddef>
#include <string>
using std::string;

constexpr char messychar[] = "a bunch of letters";
constexpr std::size_t messylen = sizeof(messychar) / sizeof(messychar[0]) - 1;
// messy itself may not be needed, but if it is, it's initialized optimally
// by using the compile time calculated length, so there is no need to scan for
// NUL terminators, and it can reserve the necessary space in the initial alloc
const string messy(messychar, messylen);
string dostuff(string sentence) {  // assumes sentence.size() >= messylen
    string organised;
    organised.reserve(messylen);
    for (std::size_t x = 0; x < messylen; x++) {
        organised += sentence[x]++; // Doesn't matter what this does.
    }
    return organised;
}
This avoids setting organised's values more than once, allocating more than once (well, possibly twice if initial construction performs it) per call, and only performs a single read/write pass of sentence, no full read followed by read/write or the like. It also makes the loop constraint a compile time value, so the compiler has the opportunity to unroll the loop (though there is no guarantee of this, and even if it happens, it may not be helpful).
Also note: In your example, you mutate sentence, but it's accepted by value, so you're mutating the local copy, not the caller copy. If mutation of the caller value is required, accept it by reference, and if mutation is not required, accept by const reference to avoid a copy on every call (I understand the example code was filler, just mentioning this).
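The three parameter-passing choices described above can be compared side by side. The function names are hypothetical and the bodies are placeholders: the first two bump the first character so the effect on the caller is visible, and the third only reads.

```cpp
#include <cstddef>
#include <string>

// By value: the function works on its own copy; the caller's string is untouched.
void bumpCopy(std::string s) { if (!s.empty()) s[0]++; }

// By reference: the caller's string is modified.
void bumpInPlace(std::string& s) { if (!s.empty()) s[0]++; }

// By const reference: no copy is made, and mutation is rejected at compile time.
std::size_t readOnly(const std::string& s) { return s.size(); }
```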

Storing const char* gives me random characters

Okay, so basically I have a struct like this:
struct person{
    const char* name;
    const char* about_me;
    const char* mom_name;
    const char* age;
};
And then, in order to make my code versatile, I have
struct Person PersonAsArray[MAX_ARRAY - 1];
And then I have a file that reads in a bunch of stuff, and eventually I parse it. But when I parse it I get a std::string, so I have to convert it to a const char*. Here's some more of my code:
getline(file, line);
//break the line up into 2 parts (because in the file its "name=John")
//these two parts are called id and value
if(id == "name"){
    const char* CCvalue = value.c_str();
    cout << CCvalue << endl; // its fine here
    PersonAsArray[i].name = CCvalue; //i is incremented each time i need a new struct
}
if(id == "age"){
    PersonAsArray[i].age = atoi(value.c_str());
}
//and some more of this stuff... eventually i have
cout << PersonAsArray[0].name << endl;
cout << PersonAsArray[0].about_me << endl;
cout << PersonAsArray[0].mom_name << endl;
cout << PersonAsArray[0].age << endl;
But when I finally cout everything, I end up with random symbols instead of the text. I'm just a little curious about what's going on and why it's giving me symbols, and it's not always the same symbols. Sometimes I get a smiley face; sometimes I don't even get the whole row of rectangles. I have no idea what I'm doing, and it's probably some major flaw in my coding. But this also happens when I do something like this:
string hi = "hello";
for(i = 0; hi[i] != '\0'; i++){
    char x = hi[i];
    string done = "";
    if(x == 'h') done += "abc";
    if(x == 'e') done += "zxc";
    if(x == 'l') done += "aer";
    if(x == 'o') done += "hjg";
    cout << done;
}
I think I remember getting these flower-like shapes, and I think I even saw Chinese characters, but again they were not consistent: even if I didn't change anything in the program, running it several times would show several different combinations of symbols, and sometimes no symbols would appear.
You did not read the documentation!
The value returned by std::string::c_str() does not live forever.
The pointer obtained from c_str() may be invalidated by:
Passing a non-const reference to the string to any standard library function, or
Calling non-const member functions on the string, excluding operator[], at(), front(), back(), begin(), rbegin(), end() and rend().
The destructor is one such "non-const member function".
Once the pointer is invalidated, you cannot use it. When you try, you either get the data stored at some arbitrary place in memory (your computer's futile attempts to make sense of that data, as if it were text, are resulting in the flowers and Chinese characters you describe) or other unpredictable, bizarre symptoms.
Unfortunately you did not present a complete, minimal testcase so we have no idea how value really fits into your code, but it's clear that it does not survive intact between your "its fine here" and your problematic code.
Don't store the result of std::string::c_str() long-term. There's no need to, and it's rarely useful to.
tl;dr Make person store std::strings, not dangling pointers.
The problem is that you have something like
{
    std::string value;
    // fill value
    PersonAsArray[i].name = value.c_str();
}
Now, value is a local variable which gets destroyed upon exiting the scope in which it is declared. You store the pointer to its internal data to a .name but you are not copying it so after destruction it points to garbage.
You should have a std::string name field instead of a const char*; std::string handles copying and retaining its content by itself, through its copy assignment operator. Alternatively, allocate memory for the const char* manually, for example through strdup (and free it when you're done).
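A sketch of the std::string approach: the struct owns copies of its text, so nothing dangles once the parsing variables go out of scope. Storing age as an int, the key names other than "name" and "age", and the helper applyLine are assumptions for illustration; std::stoi throws std::invalid_argument if the age value is not a number.

```cpp
#include <string>

// The struct owns its text, so nothing dangles when 'value' is destroyed.
struct Person {
    std::string name;
    std::string about_me;
    std::string mom_name;
    int age = 0;  // assumption: age stored as a number
};

// Hypothetical helper: applies one "id=value" line (e.g. "name=John") to p.
void applyLine(Person& p, const std::string& line) {
    std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) return;           // not an id=value line
    std::string id = line.substr(0, eq);
    std::string value = line.substr(eq + 1);
    if (id == "name")          p.name = value;     // copies the characters
    else if (id == "about_me") p.about_me = value;
    else if (id == "mom_name") p.mom_name = value;
    else if (id == "age")      p.age = std::stoi(value);
}
```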

Modifying the length and contents of the string?

To change the contents of a string in a function such that it reflects in the main function we need to accept the string as reference as indicated below.
Changing contents of a std::string with a function
But in the above code we are also changing the size of the string (i.e. making it hold more than it can), so why is that program not crashing?
Here is my program to convert decimal to binary. Mind you, the code is not complete; I am just testing the first part.
#include <iostream>
#include <string>
using namespace std;

void dectobin(string & bin, int n)
{
    int i=0;
    while(n!=0)
    {
        bin[i++]= (n % 2) + '0';
        n = n / 2;
    }
    cout << i << endl;
    cout << bin.size() << endl;
    cout << bin << endl;
}
int main()
{
    string s = "1";
    dectobin(s,55);
    cout << s << endl;
    return 0;
}
Output: 6 1 1, and then the program crashes in Code::Blocks, while the code in the linked question works perfectly fine.
It only outputs the correct result when I initialize the string in main with 6 characters (i.e. the length of the number after it is converted from decimal to binary).
http://www.cplusplus.com/reference/string/string/capacity/
Notice that this capacity does not suppose a limit on the length of the string. When this capacity is exhausted and more is needed, it is automatically expanded by the object (reallocating it storage space). The theoretical limit on the length of a string is given by member max_size
If the string resizes itself automatically then why do we need the resize function and then why is my decimal to binary code not working?
Your premise is wrong. You are thinking 1) if I access a string out of bound then my program will crash, 2) my program doesn't crash therefore I can't be accessing a string out of bounds, 3) therefore my apparently out of bounds string accesses must actually resize the string.
1) is incorrect. Accessing a string out of bounds results in undefined behaviour. This means exactly what it says: your program might crash, but it might not; its behaviour is undefined.
And in fact, accessing a string never changes its size; that's why we have the resize function (and push_back etc.).
We must get questions like yours several times a week. Undefined behaviour is clearly a concept that newbies find surprising.
Check this link about std::string:
char& operator[] (size_t pos);
const char& operator[] (size_t pos) const;
If pos is not greater than the string length, the function never
throws exceptions (no-throw guarantee). Otherwise, it causes
undefined behavior.
In your while loop you are writing to bin at indices that are not less than bin.size(), which is undefined behaviour (bin[bin.size()] may be read, but writing anything other than '\0' there is undefined too).
You aren't changing the size of the string anywhere. If the string you pass into the function has length one and you write to it at indices larger than 0, i.e. bin[1], bin[2], you are not modifying the string but other memory locations after its contents, where something else might be stored. Corrupting memory in this way does not necessarily lead to an immediate crash or an exception; the damage may only show up later, when those memory locations are used again.
Accepting a reference to a string makes it possible to change instances of strings from the calling code inside the called code:
void access(std::string & str) {
    // str is the same instance as the function
    // is called with.
    // without the reference, a copy would be made,
    // then there would be two distinct instances
}
// ...
std::string input = "test";
access(input);
// ...
So any function or operator that is called on a reference is effectively called on the referenced instance.
When, similar to your linked question, the code
str = " new contents";
is inside of the body of the access function, then operator= of the input instance is called.
This assignment operator discards the previous contents of the string and copies the characters of its argument into the string's storage, reallocating first if the current capacity is too small for the new length.
On the other hand, when you have code like
str[1] = 'a';
inside the access function, then this calls operator[] on the input instance. This operator is only providing access to the underlying storage of the string, and not doing any resizing.
So your issues aren't related to the reference, but to misusing the index operator[]:
Calling that operator with an argument that's not less than the string's size/length leads to undefined behaviour.
To fix that, you could resize the string manually before using the index operator.
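Resizing first might look like this sketch: it counts the binary digits of n, resizes bin to exactly that many characters, and then the original indexing loop stays within bounds. It assumes n is non-negative and, like the question's loop, stores the least-significant digit first.

```cpp
#include <string>

// Make room first, then write with operator[] (n assumed non-negative).
void dectobin(std::string& bin, int n) {
    int digits = 0;
    for (int m = n; m != 0; m /= 2) ++digits;  // number of binary digits in n
    bin.resize(digits);                        // every index below digits is now valid
    int i = 0;
    while (n != 0) {
        bin[i++] = static_cast<char>((n % 2) + '0');  // least-significant digit first
        n /= 2;
    }
}
```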
As a side note: IMO you should try to write your function in a more functional way:
std::string toOct(std::string const &);
That is, instead of modifying the passed-in string, create a new one.
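A functional version along those lines might take the number itself and return a fresh string. The name toBinary and the unsigned parameter are assumptions (the toOct signature above takes a string instead); the digits are collected least-significant first and then reversed with std::reverse.

```cpp
#include <algorithm>
#include <string>

// Returns a new string holding n in binary; nothing the caller owns is modified.
std::string toBinary(unsigned n) {
    if (n == 0) return "0";
    std::string digits;
    while (n != 0) {
        digits += static_cast<char>('0' + n % 2);  // least-significant digit first
        n /= 2;
    }
    std::reverse(digits.begin(), digits.end());   // most-significant digit first
    return digits;
}
```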
The bounds of the string are limited by its current content. That is why when you initialise the string with 6 characters you will stay inside bounds for conversion of 55 to binary and program runs without error.
The automatic expansion feature of strings can be utilised using
std::string::operator+=
to append characters at the end of current string. Changed code snippet will look like this:
void dectobin(string & bin, int n){
    while (n != 0) {
        bin += (n % 2) + '0';  // append: the string grows as needed
        n = n / 2;
    }
}
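Plus, you don't need to initialise the original string in main(), and your program should now run for arbitrary non-negative integers as well. Note that the digits are still appended least-significant first, so you would reverse the string at the end to get the usual binary notation.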
int main(){
    string s;           // empty is fine: += makes room
    dectobin(s, 55);
    cout << s << endl;  // prints 111011 (least-significant bit first)
    return 0;
}

Utility of returning a vector by reference that was passed by reference

I recently saw the following code-block as a response to this question: Split a string in C++?
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
Why is returning the passed-by-reference array "elems" so important here? Couldn't we make this a void function, or return an integer to indicate success/failure? We are editing the actual array anyway, right?
Thank you!
By returning a reference to the object you passed in, you can do some chaining or cascading in one expression and work with the same vector the whole time. Some people find this convenient, e.g.:
std::vector<std::string> elems;
std::cout << "Number of items:" << split("foo.cat.dog", '.', elems).size();
// get just foo
std::cout << "First item is:" << split("foo.cat.dog", '.', elems)[0];
// change first item to bar
split("foo.cat.dog", '.', elems)[0] = "bar";
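One consequence of the by-reference design is that split appends to elems rather than replacing its contents, so in the example above each call adds three more entries (with "foo" still first until the last line changes it). If a fresh result is wanted each time, a value-returning overload can sit on top of the reference version, as in this sketch; the two-argument overload is an addition, not part of the question's code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reference version from the question: appends pieces to elems and returns it.
std::vector<std::string>& split(const std::string& s, char delim,
                                std::vector<std::string>& elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

// Added overload: returns a new vector on every call, still chainable.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;
}
```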
It's not returning the memory address; it's actually returning the object by non-const reference, the same way it was passed in. This might seem like overkill, because the calling code can rely either on the third parameter, which is populated when the function returns, or on the return value.
The reason for doing it this way is to allow chaining. So you can do:
split(myString, ',', asAVector).size().
which will perform the function and allow you to chain the results by calling a function on the vector (in this case size)
Despite the neatness, there are some potential drawbacks to this approach. For example, no error code is present in the return value, so you are reliant on the function either working properly or throwing an exception; therefore you'd usually expect to wrap the above with try/catch. And the more chaining you do, the more different types of exception may come into play, so you may have to cover more catch blocks.
Mind you, passing back by reference is a whole lot better than passing back by pointer. Chaining with pointers is notorious for crashing when one of the functions in the chain decides to fail and return 0.