C++ substring from string

I'm pretty new to C++ and I need to create a MyString class with a method that creates a new MyString object from a substring of another one. However, the chosen substring changes while the object is being created, and again when I print it with my method.
Here is my code:
#include <iostream>
#include <cstring>
using namespace std;
class MyString {
public:
char* str;
MyString(char* str2create){
str = str2create;
}
MyString Substr(int index2start, int length) {
char substr[length];
int i = 0;
while(i < length) {
substr[i] = str[index2start + i];
i++;
}
cout<<substr<<endl; // prints normal string
return MyString(substr);
}
void Print() {
cout<<str<<endl;
}
};
int main() {
char str[] = {"hi, I'm a string"};
MyString myStr = MyString(str);
myStr.Print();
MyString myStr1 = myStr.Substr(10, 7);
cout<<myStr1.str<<endl;
cout<<"here is the substring I've done:"<<endl;
myStr1.Print();
return 0;
}
And here is the output:
hi, I'm a string
string
stri
here is the substring I've done:
в™¦

Have to walk this through to explain what's going wrong properly so bear with me.
int main() {
char str[] = {"hi, I'm a string"};
Allocated a temporary array of 17 characters (16 characters plus the terminating null), placed the characters "hi, I'm a string" in it, and ended it with a null. Temporary means what it sounds like: when the function ends, str is gone. Anything pointing at str is now pointing at garbage. It may shamble on for a while and give some semblance of life before it is reused and overwritten, but really it's a zombie and can only be trusted to kill your program and eat its brains.
MyString myStr = MyString(str);
Creates myStr, another temporary variable. Called the constructor with the array of characters. So let's take a look at the constructor:
MyString(char* str2create){
str = str2create;
}
Takes a pointer to a character; in this case it receives a pointer to the first element of main's str. This pointer is assigned to MyString's str. There is no copying of "hi, I'm a string". Both main's str and MyString's str point to the same place in memory. This is a dangerous condition because alterations to one will affect the other. If one str goes away, so goes the other. If one str is overwritten, so too is the other.
What the constructor should do is:
MyString(char* str2create){
size_t len = strlen(str2create); //
str = new char[len+1]; // create appropriately sized buffer to hold string
// +1 to hold the null
strcpy(str, str2create); // copy source string to MyString
}
A few caveats: This is really really easy to break. Pass in a str2create that never ends, for example, and the strlen will go spinning off into unassigned memory and the results will be unpredictable.
For now we'll assume no one is being particularly malicious and will only enter good values, but this has been shown to be a really bad assumption in the real world.
This also forces a requirement for a destructor to release the memory used by str:
virtual ~MyString(){
delete[] str;
}
It also adds a requirement for copy and move constructors and copy and move assignment operators to avoid violating the Rule of Three/Five.
Back to OP's Code...
str and myStr point at the same place in memory, but this isn't bad yet. Because this program is a trivial one, it never becomes a problem. myStr and str both expire at the same point and neither are modified again.
myStr.Print();
Will print correctly because nothing has changed in str or myStr.
MyString myStr1 = myStr.Substr(10, 7);
Requires us to look at MyString::Substr to see what happens.
MyString Substr(int index2start, int length) {
char substr[length];
Creates a temporary character array of size length. First off, this is non-standard C++ (a variable-length array). It won't compile under a lot of compilers, so just don't do this in the first place. Second, it's temporary. When the function ends, this value is garbage. Don't take any pointers to substr because it won't be around long enough to use them. Third, no space was reserved for the terminating null. This string will be a buffer overrun waiting to happen.
int i = 0;
while(i < length) {
substr[i] = str[index2start + i];
i++;
}
OK, that's pretty good. Copy from source to destination. What it is missing is the null termination, so users of the char array know where it ends.
cout<<substr<<endl; // prints normal string
And that buffer overrun waiting to happen? Just happened. Whups. You got unlucky because it looks like it worked rather than crashing and letting you know that it didn't. Must have been a null in memory at exactly the right place.
return MyString(substr);
And this created a new MyString that points to substr. Right before substr hit the end of the function and died. This new MyString points to garbage almost instantly.
}
What Substr should do:
MyString Substr(int index2start, int length)
{
std::unique_ptr<char[]> substr(new char[length + 1]);
// unique_ptr is probably paranoid overkill, but if something does go
// wrong, the array's destruction is virtually guaranteed
int i = 0;
while (i < length)
{
substr[i] = str[index2start + i];
i++;
}
substr[length] = '\0';// null terminate
cout<<substr.get()<<endl; // get() gets the array out of the unique_ptr
return MyString(substr.get()); // google "copy elision" for more information
// on this line.
}
Back in OP's code, once control returns to main, the memory that was substr starts to be reused and overwritten.
cout<<myStr1.str<<endl;
Prints myStr1.str and already we can see some of it has been reused and destroyed.
cout<<"here is the substring I've done:"<<endl;
myStr1.Print();
More death, more destruction, less string.
Things to not do in the future:
Sharing pointers where data should have been copied.
Pointers to temporary data.
Not null terminating strings.

Your function Substr returns the address of a local variable substr indirectly by storing a pointer to it in the return value MyString object. It's invalid to dereference a pointer to a local variable once it has gone out of scope.
I suggest you decide whether your class wraps an external string, or owns its own string data, in which case you will need to copy the input string to a member buffer.
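As one possible shape for the "owns its own string data" option, the sketch below lets a std::string member do the copying and the memory management. The member name data and the Text() accessor are illustrative assumptions, not part of the original class.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>

// Sketch: MyString owns its characters through a std::string member,
// so no manual new/delete or null termination is needed.
class MyString {
public:
    explicit MyString(std::string text) : data(std::move(text)) {}

    // std::string::substr copies the requested range. It clamps length to
    // the end of the string and throws std::out_of_range if start > size().
    MyString Substr(std::size_t start, std::size_t length) const {
        return MyString(data.substr(start, length));
    }

    void Print() const { std::cout << data << '\n'; }

    // Hypothetical accessor, added so the contents can be inspected.
    const std::string& Text() const { return data; }

private:
    std::string data;
};
```

With this version, myStr.Substr(10, 7) on "hi, I'm a string" yields an object that owns its own copy of "string", so nothing dangles when Substr returns.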

Related

How does the '->' operator work and is it a good implementation to modify a large string?

I want to begin with saying that I have worked with pointers before and I assumed I understood how they worked. As in,
int x = 5;
int *y = &x;
*y = 3;
std::cout << x; // Would output 3
But then I wanted to make a method which modifies a rather large string, and I believe it would therefore be better to pass a reference to the string in order to avoid passing the entire string back and forth. So I pass my string to myFunc() and I do the same thing as I did with the numbers above, which means I can modify *str as I do in the code below. But in order to use the methods of std::string I need to use the -> operator.
#include <iostream>
#include <string>
int myFunc(std::string *str) { // Retrieve the address to which str will point to.
*str = "String from myFunc"; // This is how I would normally change the value of myString
str->replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
return 0;
}
int main() {
std::string myString = "String from main";
myFunc(&myString); // Pass address of myString to myFunc()
}
My questions are:
Since str in myFunc is an address, why can an address use an operator such as ->, and how does it work? Is it as simple as calling the method of the object at the address str holds? str->replace(); // str->myString.replace()?
Is this a good implementation of modifying a large string, or would it be better to pass the string to the method and return the string when it's modified?

ptr->x is identical to (*ptr).x unless -> is overridden for a type you're dereferencing. On normal pointers, that works as you'd expect it to.
As for implementation, profile it when you implement it. You can't know what compiler will do with this once you turn optimizations on. For example, if given function gets inlined, you won't even have any extra indirection in the first place and it won't matter which way you do it. As long as you don't allocate a new string, differences should generally be negligible.
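A small sketch of the equivalence described above. The function names are made up for illustration; the last function shows a type from the standard library (std::unique_ptr) that overloads -> so that it still reaches the pointed-to object.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Both functions replace the first character of the pointed-to string.
void viaArrow(std::string* s) { s->replace(0, 1, "s"); }
void viaDeref(std::string* s) { (*s).replace(0, 1, "s"); }

// std::unique_ptr overloads operator->, so the same syntax forwards
// to the std::string it manages.
std::size_t sizeViaSmartPointer(const std::unique_ptr<std::string>& p) {
    return p->size();
}
```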

str is a pointer to a std::string object. The arrow operator, ->, is used to dereference the pointer and then access a member. Alternatively, you can also write (*str).replace(0,1,"s"); here, * dereferences the pointer and then . accesses the member function replace().
Pointers are often confusing; it is better to use references when possible.
void myFunc(std::string &str) { // str is a reference to the caller's string, not a copy.
str = "String from myFunc"; // Assigns directly to the caller's string.
str.replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
}
int main() {
std::string myString = "String from main";
myFunc(myString); // Pass myString by reference to myFunc()
}
Is this a good implementation of modifying a large string, or would it be better to pass the string to the method and return the string when it's modified?
If you don't want to change the original string then create a new string and return it.
If it's ok for your application to modify the original string then do it. Also you can return a reference to a modified string if you need to chain function calls.
std::string& myFunc(std::string &str) { // str is a reference to the caller's string.
str = "String from myFunc"; // This is how I would normally change the value of myString
return str.replace(0, 1, "s"); // Replacing index 0 with a lowercase s.
}

How could creating a string change the value pointed to by a const char*?

I've written a function that takes a string and returns a const char * which contains an encoded version of that string. I call this function, and then create a new string. In doing so, I am somehow inadvertently changing the value pointed to by my const char *, something which I thought was impossible.
However, when I don't use my own function, but just hardcode a value into my const char array, the value does not change when I create a string. Why is there a difference here, and why would I be able to change the value of a const char array anyways?
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <iostream>
using namespace std;
// returns "#username#FIN"
const char* encodeUsername(string username)
{
username = "#" + username + "#FIN";
return username.c_str();
}
int main(void)
{
string jack("jack");
const char* encodedUsername = "#jack#FIN";
string dummy("hi");
printf("%s\n", encodedUsername); //outputs "#jack#FIN", as expected.
string tim("tim");
const char* encodedUsername2 = encodeUsername(tim);
string dummy2("hi");
printf("%s\n", encodedUsername2); //outputs "hi". Why?
}

To understand why this happens you need to understand several intrinsic properties of C++.
In C++ pointers can point to areas of memory that were freed up. This is something you cannot do in many other languages, and it can hide some severe errors. For example, consider the following code:
char* moo()
{
char* a = new char[20];
strcpy(a, "hello");
delete[] a;
return a;
}
Note that even though I just deleted a, I can return a pointer to it. The calling side will receive that pointer and will have no idea that it points to freed memory. Moreover, if you immediately print the returned value, you will very likely see "hello", because delete usually does not zero out the memory it frees. Doing so is still undefined behaviour.
std::string is, roughly speaking, a wrapper around char* that hides all the allocations and deallocations behind a very nice interface, so that you don't need to care about memory management. The constructor of std::string and all operations on it allocate or reallocate the array, and the destructor deallocates it.
When you pass something into a function by value (as you do with the username parameter of your encodeUsername function), it creates a new object with a copy of what you are passing, which will be destroyed as soon as the function ends. So in this case, as soon as encodeUsername returns, username is destroyed, because it was passed by value and is contained within the function's scope. Since the object is destroyed, its destructor is called, and at that point the string is deallocated. The pointer to the raw data that you retrieved by calling c_str() now points to something that no longer exists.
When you allocate an object immediately following a deallocation, you are very likely to reuse the memory of the object that was just freed. In your case, as you create a new string, dummy2, it allocates memory at the same address that was just deallocated when encodeUsername returned, which is why "hi" is printed.
Now, how can you fix it?
First, if you don't care about the input string (that is, if you are OK with overwriting it), you can just pass it by reference:
const char* encodeUsername(string& username)
This will fix it, because username is not a copy, so it is not destroyed at the end of the function. The problem now, however, is that this function will change the value of the string you are passing in, which is very undesirable and creates an unintuitive interface.
Second, you can allocate a new char array before returning it, and then free it at the end of the calling function:
const char* encodeUsername(string username)
{
username = "#" + username + "#FIN";
return strdup(username.c_str());
}
and then at the end of main:
free(const_cast<char*>(encodedUsername2));
(note that you have to use free and not delete[] because the array was allocated using strdup; the cast is needed because free takes a non-const pointer. Do not free encodedUsername: it points to a string literal, not to memory returned by strdup.)
This will work because the char array we return is allocated on the heap right before we return and is not freed. It comes at a price: the calling function now needs to free it, which is, again, an unintuitive interface.
Finally, the proper solution would be to return a std::string instead of a char pointer, in which case the std::string will take care of all the allocations and deallocations for you:
string encodeUsername(string username)
{
username = "#" + username + "#FIN";
return username;
}
And then in the main function:
string encodedUsername2 = encodeUsername(tim);
printf("%s\n", encodedUsername2.c_str());
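A possible variant of that last fix, assuming the caller's string should stay untouched and no extra copy of the argument is wanted: take the parameter by const reference and build the result as a new std::string.

```cpp
#include <string>

// Returns "#username#FIN" as a new std::string; the argument is only read.
// The result owns its own buffer, so nothing dangles after the call.
std::string encodeUsername(const std::string& username) {
    return "#" + username + "#FIN";
}
```

The caller keeps the returned std::string alive for as long as it needs c_str(), for example by storing it in a local variable before calling printf.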

The lifetime of username terminates when encodeUsername returns, leaving the pointer returned by that function dangling. In other words, it is Undefined Behaviour, which in this case manifests itself in the reuse of the memory pointed to by encodeUsername's return value for the newly-created string.
That won't happen if you return the std::string itself.

C++ Swap string

I am trying to create a non-recursive method to swap a c-style string. It throws an exception in the Swap method. Could not figure out the problem.
void Swap(char *a, char* b)
{
char temp;
temp = *a;
*a = *b;
*b = temp;
}
void Reverse_String(char * str, int length)
{
for(int i=0 ; i <= length/2; i++) //do till the middle
{
Swap(str+i, str+length - i);
}
}
EDIT: I know there are fancier ways to do this. But since I'm learning, would like to know the problem with the code.

It throws an exception in the Swap method. Could not figure out the problem.
No it doesn't. Creating a temporary character and assigning characters can not possibly throw an exception. You might have an access violation, though, if your pointers don't point to blocks of memory you own.
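The Reverse_String() function should not crash by itself, assuming str points to at least length + 1 bytes of writable memory (its first iteration touches str[length]). Be aware, though, that if length is the string length, that first iteration swaps the terminator into position 0, so the bounds probably need adjusting as well. There's not enough context in your question to extrapolate past that. I suspect you are passing invalid parameters. You'll need to show how you call Reverse_String() for us to determine if the call is valid or not.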
If you are writing something like this:
char * str = "Foo";
Reverse_String(str, 3);
printf("Reversed: '%s'.\n", str);
Then you will definitely get an access violation, because str points to read-only memory. Try the following syntax instead:
char str[] = "Foo";
Reverse_String(str, 3);
printf("Reversed: '%s'.\n", str);
This will actually make a copy of the "Foo" string into a local buffer you can overwrite.

This answer refers to the comment by @user963018 made under @André Caron's answer (it's too long to be a comment).
char *str = "Foo";
The above declares a pointer to the first element of an array of char. The array is 4 characters long: 3 for F, o and o, and 1 for the terminating null character. The array itself is typically stored in memory marked as read-only, which is why you were getting the access violation. In fact, in C++ your declaration was deprecated in C++03 (it was allowed for backward compatibility with C) and is ill-formed since C++11, so your compiler should be warning you about it. If it isn't, try turning up the warning level. You should be using the following declaration:
const char *str = "Foo";
Now, the declaration indicates that str should not be used to modify whatever it is pointing to, and the compiler will complain if you attempt to do so.
char str[] = "Foo";
This declaration states that str is an array of 4 characters (including the null character). The difference here is that str is of type char[N] (where N == 4), not char *. However, str can decay to a pointer type if the context demands it, so you can pass it to the Swap function which expects a char *. Also, the memory containing Foo is no longer marked read-only, so you can modify it.
std::string str( "Foo" );
This declares an object of type std::string that contains the string "Foo". The memory that contains the string is dynamically allocated by the string object as required (some implementations may contain a small private buffer for small string optimization, but forget that for now). If you have a string whose size may vary, or whose size you do not know at compile time, it is best to use std::string.
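For completeness, a short sketch of how the reversal itself might look once the data lives in a std::string. It uses std::reverse from the standard library instead of the hand-written Swap loop, and the function name reversed is just an illustration.

```cpp
#include <algorithm>
#include <string>

// Takes the string by value, so the caller's string is left unchanged,
// and returns the reversed copy.
std::string reversed(std::string text) {
    std::reverse(text.begin(), text.end());
    return text;
}
```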

Reversing a string/sequence of characters using only pointers

I'm working on an assignment that requires me to reverse a sequence of characters, which can be of any given type, using a pointer to the "front" of the sequence and a pointer to the "end" of the sequence.
In my current build, I begin by first attempting to switch the "front" and "end" characters. However, I receive an "access violation" during runtime.
My code at the moment:
#include <cstdlib>
#include <string>
#include <iostream>
using namespace std;
class StrReverse
{
public:
StrReverse(); //default constructor
void revStr(); //reverses a given c-string
private:
typedef char* CharPtr;
CharPtr front;
CharPtr end;
CharPtr cStr;
};
int main()
{
StrReverse temp = StrReverse();
temp.revStr();
system("pause");
return 0;
}
//default constructor
StrReverse::StrReverse()
{
cStr = "aDb3EfgZ";
front = new char;
end = new char;
}
//reverses a given string
void StrReverse::revStr()
{
for(int i = 0;i < 4;i++)
{
front = (cStr + i);
end = (cStr + (7 - i));
*front = *end;
}
}
The key restriction with this problem is that the reversal must be done using pointers. I realize that simply reversing a string is trivial, but this restriction has me scratching my head. Any constructive comments would be greatly appreciated!

You assign the string literal "aDb3EfgZ" to cStr, and string literals can't be modified. Your compiler most likely stores the string literal in read-only memory, and when you try to write to *front you get an access violation because of that.
To get a modifiable string, make a copy of the literal. For example:
const char *cLit = "aDb3EfgZ";
cStr = new char[strlen(cLit)+1];
strcpy(cStr, cLit);
For further detail see for example this question and the ones mentioned there in the "Linked" section.

There are several problems with your code. For starters, why the class? This is something I'd expect to be done with a simple function:
void reverse( char* begin, char* end );
And you don't need an index, since you've got the pointers already; you can just increment and decrement the pointers.
Also, why do you allocate memory in your constructor? It is memory that you never use (or free).
Finally, you don't really reverse anything in your loop. You need to swap the characters, not just copy the one at the end into the one at the beginning.
And as for the access violation: a string literal is a constant. You can't modify it. If you want to do the reverse in place, you'll need to copy the string somewhere else (or use it to initialize an array).

Your constructor is going to leak memory because you lose the pointers you allocate for front and end; those allocations aren't even needed. As for your problem, you can loop through the string to find the end using while(*endptr) endptr++;. From there the size of the string is endptr - startptr, which you use to allocate a temp buffer so you can do while(startptr != endptr) *tempbuf++ = *--endptr;. Then null-terminate the buffer, free the old string (if it was heap-allocated), and set the start of the temp buffer as the new string.

The basic technique for an in-place reversal:
get a pointer (call it 'left') to the first character in the string.
get a pointer (call it 'right') to the last character in the string (not counting the trailing NUL character)
while the left pointer is less than the right pointer
    swap the characters located by each pointer
    increment the left pointer
    decrement the right pointer
That's about all there is to it. Production of a reversed copy of the string requires a bit more work.
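The steps above can be sketched roughly as follows. The function name reverseInPlace is made up for illustration, and the sketch assumes text points to a writable, null-terminated buffer (an array or heap copy, not a string literal).

```cpp
#include <cstring>

// In-place reversal using only two pointers.
void reverseInPlace(char* text) {
    if (text == nullptr || *text == '\0') {
        return;  // nothing to reverse
    }
    char* left = text;                           // first character
    char* right = text + std::strlen(text) - 1;  // last character before NUL
    while (left < right) {
        char tmp = *left;  // swap the characters located by each pointer
        *left = *right;
        *right = tmp;
        ++left;
        --right;
    }
}
```

For a buffer declared as char cStr[] = "aDb3EfgZ", the call reverseInPlace(cStr) leaves "ZgfE3bDa" in the array.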

C++ method parameter passed by reference - memory question

Assume,
void proc(CString& str)
{
str = "123";
}
void runningMethod()
{
CString str="ABC";
proc(str);
}
I understand that at the exit of runningMethod str will be deallocated automatically; in this case, how does C++ delete the old data ("ABC")?
Thanks,
Gil.

"ABC" was overwritten when you said = "123".
Internally, a string is an array of characters. At start, it made a new buffer that contained {'A', 'B', 'C', '\0'}. When you assigned, it just wrote '1' over the 'A', and so on.
When it destructed, it deleted the buffer.

The same happens as if you'd write:
CString foo = "ABC";
foo = "123";

The exact details depend on the implementation of CString, but the important bit is that you don't have to worry about allocation and deallocation now that the class takes care of it for you.

In most cases, when you do the assignment in proc(), the buffer holding "ABC" will be freed. This is usually done in the overloaded assignment operator. Here is an example of what such an overload can look like:
String& String::operator= (const String& other)
{
char* otherdata = other.data;
char* olddata = data;
if (otherdata != 0)
{
data = new char[other.length+1];
length = other.length;
memcpy(data,otherdata,other.length+1);
}
else
{
data = 0;
length = 0;
}
if (olddata != 0)
{
delete[] olddata;
}
return *this;
}

A couple of things to keep in mind here. First, the operator= of a class will generally take care of deleting anything it used to refer to before assigning the new data. Well, that's not entirely true: often a smart developer will implement operator= by first creating a copy of the incoming object and then swapping the current data with that temporary, which now owns the old data and deletes it when it is destroyed. The important part to remember, though, is that by the time the operator= function exits, the old data has, generally speaking, been discarded.
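The copy-then-swap approach can be sketched like this. The String class below is a made-up minimal example, not the real CString; the member names length and data and the c_str() accessor are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

class String {
public:
    explicit String(const char* text)
        : length(std::strlen(text)), data(new char[length + 1]) {
        std::memcpy(data, text, length + 1);  // includes the terminator
    }

    String(const String& other)
        : length(other.length), data(new char[other.length + 1]) {
        std::memcpy(data, other.data, length + 1);
    }

    ~String() { delete[] data; }

    // The parameter is taken by value, so `other` is already a copy.
    // After the swap, `other` owns the old buffer and frees it when
    // it is destroyed at the end of this function.
    String& operator=(String other) noexcept {
        swap(other);
        return *this;
    }

    void swap(String& other) noexcept {
        std::swap(length, other.length);
        std::swap(data, other.data);
    }

    const char* c_str() const { return data; }

private:
    std::size_t length;  // declared before data: it is initialized first
    char* data;
};
```

Because the copy is made before anything is released, a failed allocation leaves the target unchanged, and self-assignment needs no special case.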
The other thing to keep in mind is that "ABC" is a string literal. The standard doesn't really define how they have to be stored, it simply states limitations that allow certain usual implementations. Very often that string literal will appear as a read-only element within the program data. In that case it will never be deleted so long as the program's image is loaded into memory (when it's running basically). This is the whole reason why code like this is UB:
void f()
{
char * x = "hello"; // points to a string literal.
x[0] = 'H';
}
// correct implementation is:
void f()
{
char x[] = "hello"; // reserves an array of 6 characters and copies the string literal's content into it.
x[0] = 'H';
}
