Manipulating std::string

The code below compiles and runs without any fault, error, or warning (although I suspect some illegal memory access is happening). Strangely, the size of the string comes out differently depending on which of two methods (strlen and std::string::size()) I use to print it:
strlen(l_str.c_str()) gives the size as 1500, whereas
l_str.size() gives the size as 0.
#include <string.h>
#include <string>
#include <stdio.h>
#include<iostream>
using namespace std;
void strRet(void* data)
{
char ar[1500];
memset(ar,0,1500);
for(int i=0;i<1500;i++)
ar[i]='a';
memset(data,0,1500); // This might not be correct but it works fine
memcpy(data,ar,1500);
}
int main()
{
std::string l_str;
cout<<endl<<"size before: "<<l_str.length();
int var=10;
strRet((void *)l_str.c_str());
printf("Str after call: %s\n",l_str.c_str());
cout<<endl<<"size after(using strlen): "<<strlen(l_str.c_str());
cout<<endl<<"Size after(using size function): "<<l_str.size();
printf("var value after call: %d\n",var);
return 0;
}
Please suggest, if I'm doing something which I'm not supposed to do!
Also, I wanted to know which memory bytes are being set to 0 when I do memset(data,0,1500);? What I mean to ask is that if suppose, my string variable's starting address is 100, then does memset command sets the memory range [100,1600] as 0? Or is it setting some other memory range?

memset(data,0,1500); // This might not be correct but it works fine
It isn't correct, and it doesn't "work fine". This is Undefined Behaviour, and you're making the common mistake of assuming that if it compiles, and your computer doesn't instantly catch fire, everything is fine.
It really isn't.
I've done something which I wasn't supposed to do!
Yes, you have. You took a pointer to a std::string, a non-trivial object with its own state and behaviour, asked it for the address of some memory it controls, and cast that to void*.
There's no reason to do that, you should very rarely ever see void* in C++ code, and seeing C-style casts to any type is pretty worrying.
Don't take void* pointers into objects with state and behaviour like std::string until you understand what you're doing and why this is wrong. Then, when that day comes, you still won't do it because you'll know better.
We can look at the first problem in some fine detail, if it helps:
(void *)l_str.c_str()
what does c_str() return? A pointer to some memory owned by l_str
where is this memory? No idea, that's l_str's business. If this standard library implementation uses the small string optimization, it may be inside the l_str object. If not, it may be dynamically allocated.
how much memory is allocated at this location? No idea, that's l_str's business. All we can say for sure is that there is at least one legally-addressable char (l_str.c_str()[0] == '\0') and that it's legal to use the address l_str.c_str()+1 (but only as a one-past-the-end pointer, so you can't dereference it)
So, the statement
strRet((void *)l_str.c_str());
passes strRet a pointer to a location containing one or more addressable chars, of which the first is zero. That's everything we can say about it.
Now let's look again at the problematic line
memset(data,0,1500); // This might not be correct but it works fine
why would we expect there to be 1500 chars at this location? If you'd documented strRet as requiring a buffer of at least 1500 allocated chars, would it look reasonable to actually pass l_str.c_str() when you know l_str has just been default constructed as an empty string? It's not like you asked l_str to allocate that storage for you.
You could start to make this work by giving l_str a chance to allocate the memory you intend to write, by calling
l_str.reserve(1500);
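before calling strRet. This still won't notify l_str that you filled it with 'a's though, because you did that by changing the raw memory behind its back: size() would still report 0, and writing through the pointer returned by c_str() remains undefined behaviour.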
If you want this to work correctly, you could replace the entirety of strRet with
std::string l_str(1500, 'a');
or, if you want to change an existing string correctly, with
// needs <algorithm>, <iterator> and <string>
void strRet(std::string& out) {
// this just speeds it up, since we know the size in advance
out.reserve(1500);
// this is in case the string wasn't already empty
out.clear();
// and this actually does the work
std::fill_n(std::back_inserter(out), 1500, 'a');
}
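If a C-style routine really does have to write into the string's storage, one way to keep that well-defined is to size the string first and pass a pointer together with the length. The following is only a sketch, assuming C++11 or later (where std::string storage is contiguous); fill_buffer and make_filled are hypothetical names standing in for strRet and its caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style routine: writes exactly `size` bytes into `buf`.
void fill_buffer(char* buf, std::size_t size)
{
    std::memset(buf, 'a', size);
}

// Hypothetical wrapper: the string owns the storage and knows its size.
std::string make_filled(std::size_t size)
{
    std::string out;
    out.resize(size);                     // the string now holds `size` writable chars
    fill_buffer(&out[0], out.size());     // write only inside [0, size())
    assert(std::strlen(out.c_str()) == out.size());  // both views agree
    return out;
}
```

With this shape, make_filled(1500).size() and strlen on its c_str() both report 1500, because the string was told about its new length before anything wrote into it.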

Related

Char array returns four times more data than expected

Before I continue, here's the code:
#include <iostream>
using namespace std;
int main() {
char array[] = {'a','b','c'};
cout << array << endl;
return 0;
}
My system:
VisualStudio 2019, default C++ settings
Using Debug build instead of release
When I run this code sample, I get something like this in my console output:
abcXXXXXXXXX
Those X's represent seemingly random characters. I know they're from existing values in memory at that address, but I don't understand why I'm getting 12 bytes back instead of the three from my array.
Now, I know that if I were doing this with ints, which are four bytes long, maybe this would make sense, but sizeof(array) returns three (i.e. three bytes long; I know the sizeof(array) / sizeof(array[0]) trick). And when I do try it with ints, I'm even more confused because I get some four-byte hex number instead (maybe a memory address?)
This may be some trivial question, I'm sorry, but I'm just trying to figure out why it behaves like this. No vectors please, I'm trying to stay as non-STL as possible here.

cout takes this char array and treats it as a null-terminated string.
Since the last character in this array is not the null character (i.e., char(0)), it keeps printing until it encounters one.
To do so, it reads memory outside of the array which you have allocated, and technically, anything could happen.
For example, there can be different data in that memory every time the function is called, or the memory access operation may even be illegal, depending on the address where array was allocated at the time the function was called.
So the behavior of your program is undefined.
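Two ways to stay inside the array, sketched under the assumption that the goal is to print exactly the three characters: store a terminator so the array is a valid C string, or pass the length explicitly. print_terminated and print_counted are illustrative names, and the std::ostream parameter is there only so the output can be captured (for example in a std::ostringstream) as well as sent to cout.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Option 1: leave room for, and store, the terminating '\0'.
void print_terminated(std::ostream& os)
{
    char array[] = {'a', 'b', 'c', '\0'};   // same contents as: char array[] = "abc";
    os << array;                            // stops at the '\0'
}

// Option 2: keep the array unterminated, but say how many chars to write.
void print_counted(std::ostream& os)
{
    char array[] = {'a', 'b', 'c'};
    static_assert(sizeof(array) == 3, "three chars, no terminator");
    os.write(array, sizeof(array));         // writes exactly 3 bytes
}
```

Calling either function with std::cout writes abc and nothing more; neither version relies on whatever happens to follow the array in memory.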

Program creating file path using strdup and strcat crashes when fed more than 39 characters

I am trying to concatenate two char arrays using the function strcat(). However the program crashes.
#include <cstdio>
#include <cstring>
int main() {
const char *file_path = "D:/MyFolder/YetAnotherFolder/test.txt";
const char *file_bk_path = strcat(strdup(file_path), ".bk");
printf("%s\n", file_bk_path);
return 0;
}
The strangest thing to me is that the program indeed produces an output before crashing:
D:/MyFolder/YetAnotherFolder/test.txt.bk
What is the reason for this problem and how it can be fixed?
Error state is reproduced in Windows (MinGW 7.2.0).

strdup allocates new memory to hold a duplicate of the string. That memory is only strlen(file_path) + 1 bytes long. You then try to append three extra characters (".bk") into memory that you don't own, so you write past the end of the allocation, which is undefined behaviour. It might print because the write and the printf of the first part happen to succeed, but it is undefined and anything can happen. Also note that with strdup you need to call free on the memory it creates for you, or you are going to leak some memory.
Here is a much simpler way to do this, use a std::string:
const char *file_path = "D:/MyFolder/YetAnotherFolder/test.txt";
std::string file_bk_path = std::string(file_path) + ".bk";
std::cout << file_bk_path << "\n";
If it absolutely needs to be in C-style then you are better off controlling the memory yourself:
const char *file_path = "D:/MyFolder/YetAnotherFolder/test.txt";
const char *bk_string = ".bk";
// needs <cstdlib> for malloc/free/exit; the cast is required when this is compiled as C++
char *file_bk_path = (char *)malloc((strlen(file_path) + strlen(bk_string) + 1)*sizeof(char));
if (!file_bk_path) { exit(1); }
strcpy(file_bk_path, file_path);
strcat(file_bk_path, bk_string);
printf("%s\n", file_bk_path);
free(file_bk_path);

As mentioned in the comments and answers, strdup mallocs the length of your path string, plus an extra cell for the string end character '\0'. When you then concatenate ".bk" to it, you write past the end of the allocated area.
Following @Ben's comments, I'd like to elaborate some more:
To be clear, strcat does add a terminator, but it lands beyond the memory you were allocated.
In general, unless you specifically hit no-no addresses, the program will probably run fine; in fact this is a common, hard-to-find bug. If, for example, you allocate some more memory right after that address, you will overwrite said terminator (so printing the string will read further into memory).
So in general, you may be OK crash-wise. The crash (probably) occurs when the program ends and the heap is cleaned up: the out-of-bounds write has damaged memory next to the allocation, and that, rather than the leak from never calling free, is the likeliest trigger. So you do get a full print, and only afterwards a crash.
Of course all of this is undefined behavior, so may depend on the compiler and OS.

Cannot safely delete LPTSTR allocation

Consider:
CCustomDateTime::CCustomDateTime()
{
LPTSTR result = new TCHAR[1024];
time_t _currentTime_t = time(0);
tm now;
localtime_s(&now, &_currentTime_t);
_tasctime_s(result, _tcslen(result), &now);
_currentTime = result;
delete[] result; // Error occurs here
}
CCustomDateTime::~CCustomDateTime()
{
}
__int64 CCustomDateTime::CurrentTimeAsInt64()
{
return _currentTime_t;
}
LPTSTR CCustomDateTime::CurrentTimeAsString()
{
return _currentTime;
}
I am unable to figure out the safest place to call delete[] on result.
If delete[] is ignored everything is fine, but otherwise an error occurs:
HEAP CORRUPTION DETECTED at line delete[]

_tcslen(result) is not doing what you think it is.
Change
_tasctime_s(result, _tcslen(result), &now);
to
_tasctime_s(result, 1024, &now);

There are a few problems with your code that I can see:
You don't check any of the function calls for errors. Don't ignore the return value. Use it to check for errors.
The second argument to _tasctime_s is the number of elements in the buffer provided. In other words, 1024. But you pass _tcslen(result) which is the length of the null-terminated string. Not only is that the wrong value, but result is at that point not initialised, so your code has undefined behaviour.
You assign a value to _currentTime, and then immediately delete that memory. So, _currentTime is a stale pointer. Any attempt to read from that memory is yet more undefined behaviour.
I don't want to tell you what your code should be, because you have only given us a tiny window into what you are trying to achieve. Dynamically allocating a fixed length array seems pointless. You may as well use automatically allocated storage. Of course, if you do want to return the memory to the caller, then dynamic allocation makes sense, but in that case then surely the caller would be responsible for calling delete[]. Since this code is clearly C++ I have to wonder why you are using raw memory allocation. Why not use standard library classes like std::string?
Looking at your update to the question, you could deallocate the memory in the destructor of your class. Personally though, I would recommend learning about the standard library classes that will greatly simplify your code.

_tcslen maps to strlen or wcslen depending on whether you are using ANSI or Unicode, respectively.
Both these functions return the length of a string, not the size of the buffer. In other words, they take a pointer to the first character of a string and continuously increment the pointer in search of a null terminator.
Calling these functions on an uninitialized buffer is undefined behavior because there's a very good chance that the pointer will get incremented out of the array bounds and elsewhere into the process' memory.
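The difference between the two numbers can be seen in a portable sketch. It uses std::strftime in place of _tasctime_s (an assumption made only so the example is not Windows-specific), and format_time is a hypothetical helper, not part of the question's class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ctime>
#include <string>

// Hypothetical helper: formats `when` as local time into a std::string.
std::string format_time(std::time_t when)
{
    char buffer[64] = "";                          // initialised, so strlen is well-defined
    const std::size_t capacity = sizeof(buffer);   // size of the buffer: 64
    assert(std::strlen(buffer) == 0);              // length of the string in it: 0

    const std::tm* parts = std::localtime(&when);
    if (parts == nullptr) {
        return std::string();                      // time could not be converted
    }

    // Pass the capacity, not strlen(buffer), as the size argument.
    const std::size_t written =
        std::strftime(buffer, capacity, "%Y-%m-%d %H:%M:%S", parts);

    // strftime returns 0 if the result did not fit; that yields an empty string here.
    return std::string(buffer, written);
}
```

Returning a std::string also sidesteps the question of where to call delete[]: the automatic buffer disappears when the function returns and the string manages its own copy.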

Why does this work: returning C string literal from std::string function and calling c_str()

We recently had a lecture in college where our professor told us about different things to be careful about when programming in different languages.
The following is an example in C++:
std::string myFunction()
{
return "it's me!!";
}
int main(int argc, const char * argv[])
{
const char* tempString = myFunction().c_str();
char myNewString[100] = "Who is it?? - ";
strcat(myNewString, tempString);
printf("The string: %s", myNewString);
return 0;
}
The reasoning for why this should fail is that return "it's me!!" implicitly calls the std::string constructor with a char[]. This string gets returned from the function, and c_str() then returns a pointer to the data held by that std::string.
As the string returned from the function is not referenced anywhere, it should be deallocated immediately. That was the theory.
However, letting this code run works without problems.
Would be curious to hear what you think.
Thanks!

Your analysis is correct. What you have is undefined behaviour. This means pretty much anything can happen. It seems in your case the memory used for the string, although de-allocated, still holds the original contents when you access it. This often happens because the OS does not clear out de-allocated memory. It just marks it as available for future use. This is not something the C++ language has to deal with: it is really an OS implementation detail. As far as C++ is concerned, the catch-all "undefined behaviour" applies.

I guess deallocation does not imply memory clean-up or zeroing. And obviously this could lead to a segfault in other circumstances.

I think the reason is that the stack memory has not been overwritten yet, so the original data can still be read. To check, I created a test function and called it before the strcat.
std::string myFunction()
{
return "it's me!!";
}
void test()
{
std::string str = "this is my class";
std::string hi = "hahahahahaha";
return;
}
int main(int argc, const char * argv[])
{
const char* tempString = myFunction().c_str();
test();
char myNewString[100] = "Who is it?? - ";
strcat(myNewString, tempString);
printf("The string: %s\n", myNewString);
return 0;
}
And get the result:
The string: Who is it?? - hahahahahaha
This proved my idea.

As others have mentioned, according to the C++ standard this is undefined behavior.
The reason why this "works" is because the memory has been given back to the heap manager which holds on to it for later reuse. The memory has not been given back to the OS and thus still belongs to the process. That's why accessing freed memory does not cause a segmentation fault. The problem remains however that now two parts of your program (your code and the heap manager or new owner) are accessing memory that they think uniquely belongs to them. This will destroy things sooner or later.

The fact that the string is deallocated does not necessarily mean that the memory is no longer accessible. As long as you do nothing that could overwrite it, the memory is still usable.

As said above, it's undefined behaviour. It doesn't work for me (in Debug configuration).
The std::string destructor is called immediately after the assignment to tempString, when the full expression that uses the temporary string object finishes.
That leaves tempString pointing to released memory (which in your case still contains the characters of "it's me!!").
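A minimal sketch of one fix, assuming the goal is just to build the combined message: bind the returned std::string to a named variable so that it outlives every use of c_str(). build_message is an illustrative name; myFunction is the one from the question.

```cpp
#include <cassert>
#include <cstring>
#include <string>

std::string myFunction()
{
    return "it's me!!";
}

// Illustrative helper: a lifetime-safe version of the question's main().
std::string build_message()
{
    // Named object: lives until the end of this function, not just
    // until the end of the statement that created it.
    const std::string reply = myFunction();

    char myNewString[100] = "Who is it?? - ";
    // Check that the result (plus '\0') fits before calling strcat.
    assert(std::strlen(myNewString) + reply.size() < sizeof(myNewString));
    std::strcat(myNewString, reply.c_str());   // reply is still alive here

    return myNewString;
}
```

The pointer from c_str() is only used while reply exists, so there is no window in which it refers to freed memory.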

You cannot conclude that there are no problems just because you got the expected result by coincidence.
There are other means to detect such problems:
Static analysis.
Valgrind would catch the error, showing you both the offending action (strcat trying to copy from a freed block) and the deallocation that freed it.
Invalid read of size 1
at 0x40265BD: strcat (mc_replace_strmem.c:262)
by 0x80A5BDB: main() (valgrind_sample_for_so.cpp:20)
[...]
Address 0x5be236d is 13 bytes inside a block of size 55 free'd
at 0x4024B46: operator delete(void*) (vg_replace_malloc.c:480)
by 0x563E6BC: std::string::_Rep::_M_destroy(std::allocator<char> const&) (in /usr/lib/libstdc++.so.6.0.13)
by 0x80A5C18: main() (basic_string.h:236)
[...]
The one true way would be to prove the program correct. But that is really hard for a procedural language, and C++ makes it harder.

Actually, string literals have static storage duration. They are packed inside the executable itself. They are not on the stack, nor dynamically allocated. In the usual case, it is correct that this would be pointing to invalid memory and be undefined behavior, however for strings, the memory is in static storage, so it will always be valid.

Unless I'm missing something, I think this is an issue of scope. myFunction() returns a std::string. The string object is not directly assigned to a variable. But it remains in scope until the end of main(). So, tempString will point to perfectly valid and available space in memory until the end of the main() code block, at which time tempString will also fall out of scope.

How to prevent copying a wild pointer string

My program crashes intermittently when it tries to copy a character array that is not terminated by a null character ('\0').
class CMenuButton {
TCHAR m_szNode[32];
CMenuButton() {
memset(m_szNode, '\0', sizeof(m_szNode));
}
};
int main() {
....
CString szTemp = ((CMenuButton*)pButton)->m_szNode; // sometime it crashes here
...
return 0;
}
I suspect that some code copied characters into it without the terminating '\0', so that it ended up like this:
Stack
m_szNode $%#^&!&!&!*#*#&!(*#(!*##&#&*&##!^&*&#(*!#*((*&*SDFKJSHDF*(&(*&(()(**
Can you tell me what is happening and what I should do to prevent copying through a wild pointer? Help will be very much appreciated!
I guess I'm unable to check whether the character array is null-terminated before copying...

I suspect that your real problem could be that pButton is a bad pointer, so check that out first.
The only way to be 100% sure that a pointer is correct, and points to a correctly sized/allocated object is to never use pointers you didn't create, and never accept/return pointers. You would use cookies, instead, and look up your pointer in some sort of cookie -> pointer lookup (such as a hash table). Basically, don't trust user input.
If you are more concerned with finding bugs, and less about 100% safety against things like buffer overrun attacks, etc. then you can take a less aggressive approach. In your function signatures, where you currently take pointers to arrays, add a size parameter. E.g.:
void someFunction(char* someString);
Becomes
void someFunction(char* someString, size_t size_of_buffer);
Also, force the termination of arrays/strings in your functions. If you hit the end, and it isn't null-terminated, truncate it.
Make it so you can provide the size of the buffer when you call these, rather than calling strlen (or equivalent) on all your arrays before you call them.
This is similar to the approach taken by the "safe string functions" that were created by Microsoft (some of which were proposed for standardization). Not sure if this is the perfect link, but you can google for additional links:
http://msdn.microsoft.com/en-us/library/ff565508(VS.85).aspx
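The size-parameter idea with forced termination could look roughly like the sketch below. copy_terminated is a hypothetical helper (it is not one of the Microsoft safe string functions), it uses plain char rather than TCHAR, and it assumes src is itself a valid terminated string.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies at most size_of_buffer - 1 chars from src
// into dest and always terminates dest. Returns true if nothing was cut off.
bool copy_terminated(char* dest, std::size_t size_of_buffer, const char* src)
{
    assert(dest != nullptr && src != nullptr);
    if (size_of_buffer == 0) {
        return false;                   // no room even for the terminator
    }
    std::size_t i = 0;
    while (i + 1 < size_of_buffer && src[i] != '\0') {
        dest[i] = src[i];
        ++i;
    }
    dest[i] = '\0';                     // forced termination
    return src[i] == '\0';              // false means the text was truncated
}
```

A caller would pass sizeof on the destination array as size_of_buffer, so the function never has to guess how much room it has, and the false return value gives it a way to notice truncation.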

There are two possibilities:
pButton doesn't point to a CMenuButton like you think it does, and the cast is causing undefined behavior.
The code that sets m_szNode is incorrect, overflowing the given size of 32 characters.
Since you haven't shown us either piece of code, it's difficult to see what's wrong. Your initialization of m_szNode looks OK.
Is there any reason that you didn't choose a CString for m_szNode?

My approach would be to make m_szNode a private member in CMenuButton, and explicitly null-terminate it in the mutator method.
class CMenuButton {
private:
TCHAR m_szNode[32];
public:
void set_szNode( const TCHAR* x ) {
// set m_szNode appropriately
m_szNode[ 31 ] = 0;
}
};
