Copying C-Style String to Free Store Using Only Dereference - c++

As said in the title, the goal is to copy a C-style string into memory without using any standard library functions or subscripting.
Here is what I have so far [SOLVED]
#include "std_lib_facilities.h"
char* strdup(const char* p)
{
int count = 0;
while (*(p + count)) ++count; // find the length without subscripting
char* q = new char[count+1];
for (int i = 0; i < count + 1; ++i) *(q + i) = *(p + i);
return q; // hand the copy back to the caller
}
int main()
{
char word[] = "Happy";
char* heap_str = strdup(word);
}
Obviously the problem is that allocating just *p (which is equivalent to p[0]) only allocates the letter "H" to memory. I'm not sure how to go about allocating the C-style string without subscripting or STL functions.

A C-style string ends with '\0'. You need to traverse the string inside the function character by character until you encounter '\0' to know how long it is. (This is effectively what you would do by calling strlen() to work it out.) Once you know how long the string is, you can allocate the right amount of memory, which is the length+1 (because of the '\0').
To access the i'th element of an array p, one uses a subscript: p[i].
A subscript of the form p[i] is formally defined to be *((p)+(i)) by both the C standard (6.5.2.1 of C99) and the C++ standard (5.2.1 of C++03). Here, one of p or i is of the type pointer to T, and the other is of integral type (or enumeration in C++). Because an array name is converted automatically (in most types of use anyway) to a pointer to the first element of said array, p[i] is thus the i'th element of array p.
And just like basic arithmetic, ((p)+(i)) is equivalent to ((i)+(p)) in pointer arithmetic. This means *((p)+(i)) is equivalent to *((i)+(p)), which also means p[i] is equivalent to i[p].
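To make the equivalence concrete, here is a small sketch. The helper name same_element is made up for illustration; it just checks that the three spellings refer to the same character.

```cpp
#include <cassert>
#include <cstddef>

// same_element is a hypothetical helper, not part of the original answer.
// It returns true when p[i], *(p + i) and i[p] all name the same element.
bool same_element(const char* p, std::size_t i)
{
    assert(p != nullptr);
    // p[i] is defined as *((p) + (i)); addition commutes, so i[p] is the same object.
    return &p[i] == p + i && &i[p] == p + i && p[i] == *(p + i);
}
```

The caller has to keep i within the array (including the terminator), since the helper dereferences p + i.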

Well, since this is a self-teaching exercise, here's an alternative look at a solution that can be compared/contrasted with KTC's nice explanation of the equivalence between subscripting and pointer arithmetic.
The problem appears to be, "implement a strdup() function without using standard library facilities or subscripting".
I'm going to make an exception for malloc(), as there's no reasonable way to do the above without it, and I think that using it isn't detrimental to what's being taught.
First, let's do a basic implementation of strdup(), calling functions that are similar to the ones we might use from the library:
size_t myStrlen( char* s);
void myStrcpy( char* dst, char* src);
char* strdup( char* p)
{
size_t len = myStrlen( p);
char* dup = (char*) malloc( len + 1); /* include space for the termination character */
if (dup) {
myStrcpy( dup, p);
}
return dup;
}
Now let's implement the worker functions without subscripting:
size_t myStrlen( char* s)
{
size_t len = 0;
while (*s != '\0') { /* when s points to a '\0' character, we're at the end of the string */
len += 1;
s += 1; /* move the pointer to the next character */
}
return len;
}
void myStrcpy( char* dst, char* src)
{
while (*src != '\0') { /* when src points to a '\0' character, we're at the end of the string */
*dst = *src;
++dst; /* move both pointers to next character location */
++src;
}
*dst = '\0'; /* make sure the destination string is properly terminated */
}
And there you have it. I think this satisfies the condition of the assignment and shows how pointers can be manipulated to move through an array of data items instead of using subscripting. Of course, the logic for the myStrlen() and myStrcpy() routines can be moved inline if desired, and more idiomatic expressions where the pointer increment happens in the expression that copies the data can be used (but I think that's more confusing for beginners).
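For comparison, here is roughly what the inlined, more idiomatic form could look like. This is only a sketch: dup_cstr is a made-up name (chosen so it doesn't clash with the POSIX strdup()), and it allocates with new[] rather than malloc() because the question asks for the free store.

```cpp
#include <cassert>
#include <cstddef>

// dup_cstr is a hypothetical name, not from the original answer.
// The caller owns the result and must release it with delete[].
char* dup_cstr(const char* src)
{
    assert(src != nullptr);

    // Walk a second pointer to the terminator to measure the length.
    const char* end = src;
    while (*end) ++end;
    const std::size_t len = static_cast<std::size_t>(end - src);

    char* copy = new char[len + 1];   // +1 for the terminating '\0'

    // Copy and advance in one expression; the loop stops after '\0' is copied.
    char* dst = copy;
    while ((*dst++ = *src++) != '\0') {}

    return copy;
}
```

Because the copy loop also copies the terminator, no separate *dst = '\0' is needed afterwards.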

Related

Character pointer access

I want to access the ith element through a character pointer. Below is the sample code:
string a_value = "abcd";
char *char_p=const_cast<char *>(a_value.c_str());
if(char_p[2] == 'b') //Is this safe to use across all platforms?
{
//do something
}
Thanks in advance
Array accessors [] are allowed for pointer types, and result in defined and predictable behaviors if the offset inside [] refers to valid memory.
const char* ptr = str.c_str();
if (ptr[2] == '2') {
...
}
Is correct on all platforms if the length of str is 3 characters or more.
In general, if you are not mutating the char* you are looking at, it is best to avoid a const_cast and work with a const char*. Also note that std::string provides operator[], which means that you do not need to call .c_str() on str to be able to index into it and look at a char. This will similarly be correct on all platforms if the length of str is 3 characters or more. If you do not know the length of the string in advance, use std::string::at(size_t pos), which performs bounds checking and throws an out_of_range exception if the check fails.
You can access the ith element in a std::string using its operator[]() like this:
std::string a_value = "abcd";
if (a_value[2] == 'b')
{
// do stuff
}
If you use a C++11 conformant std::string implementation you can also use:
std::string a_value = "abcd";
char const * p = &a_value[0];
// or char const * p = a_value.data();
// or char const * p = a_value.c_str();
// or char * p = &a_value[0];
21.4.1/5
The char-like objects in a basic_string object shall be stored contiguously.
21.4.7.1/1: c_str() / data()
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
The question is essentially about querying characters in a string safely.
const char* a = a_value.c_str();
is safe unless some other operation modifies the string after it. If you can guarantee that no other code performs a modification prior to using a, then you have safely retrieved a pointer to a null-terminated string of characters.
char* a = const_cast<char *>(a_value.c_str());
is never safe. You have yielded a pointer to memory that is writeable. However, that memory was never designed to be written to. There is no guarantee that writing to that memory will actually modify the string (and actually no guarantee that it won't cause a core dump). It's undefined behaviour - absolutely unsafe.
reference here: http://en.cppreference.com/w/cpp/string/basic_string/c_str
addressing a[2] is safe provided you can prove that all possible code paths ensure that a represents a pointer to memory longer than 2 chars.
If you want safety, use either:
auto ch = a_string.at(2); // will throw an exception if a_string is too short.
or
if (a_string.length() > 2) {
auto ch = a_string[2];
}
else {
// do something else
}
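Both options can be wrapped up as small helpers. The names char_at_or and has_index are made up for this sketch; the only library behavior relied on is that std::string::at() throws std::out_of_range when pos >= size().

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// char_at_or is a hypothetical helper: it returns s[pos] when pos is in range
// and the supplied fallback otherwise, so it never reads out of bounds.
char char_at_or(const std::string& s, std::size_t pos, char fallback)
{
    return pos < s.size() ? s[pos] : fallback;
}

// has_index is a hypothetical helper that uses at() to report whether pos is valid.
bool has_index(const std::string& s, std::size_t pos)
{
    try {
        (void)s.at(pos);              // throws std::out_of_range when pos >= size()
        return true;
    } catch (const std::out_of_range&) {
        assert(pos >= s.size());      // at() only throws for an out-of-range index
        return false;
    }
}
```

The length check is usually the cheaper choice when a too-short string is an expected case; at() fits better when it would be a genuine error.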
Everyone explained very well for most how it's safe, but i'd like to extend a bit if that's ok.
Since you're in C++ and you're using a string, you can simply do the following to access a character (you won't have any trouble, and you won't have to deal with C strings in C++):
std::string a_value = "abcd";
std::cout << a_value.at(2);
Which is, in my opinion, a better option than going out of the way.
string::at will return a char& or a const char& depending on whether your string object is const. (In this case, a char&, because a_value is not const.)
In this case you can treat char* as an array of chars (C-string). Subscripting with square brackets is allowed.

Memory leak when using smart pointers

Consider the following function:
unique_ptr<char> f(const wstring key, const unsigned int length)
{
assert(length <= key.length());
const wstring suffix = key.substr(length, key.length() - length);
const size_t outputSize = suffix.length() + 1; // +1 for null terminator
char * output = new char[outputSize];
size_t charsConverted = 0;
const wchar_t * outputWide = suffix.c_str();
wcstombs_s(&charsConverted, output, outputSize, outputWide, suffix.length());
return unique_ptr<char>(output);
}
The intent here is to accept a wstring, select length characters from the end, and return them as a C-style string that's wrapped in a unique_ptr (as required by another library - I certainly didn't choose that type :)).
One of my peers said in passing that he thinks this leaks memory, but he didn't have time to elaborate, and I don't see it. Can anybody spot it, and if so explain how I ought to fix it? I probably have my blinders on.
It's not necessarily a leak, but it is undefined behavior. You created the char array using new[] but the unique_ptr<char> will call delete, and not delete[] to free the memory. Use unique_ptr<char[]> instead.
Also, your conversion may not always behave the way you want it to. You should make 2 calls to wcstombs_s, in the first one pass nullptr as the second argument. This will return the number of characters required in the output string.
wcstombs_s(&charsConverted, nullptr, 0, outputWide, suffix.length());
Check the return value, and then use the result stored in charsConverted to allocate the output buffer.
auto output = std::unique_ptr<char[]>(new char[charsConverted]);
// now use output.get() to get access to the raw pointer
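To show just the ownership part in isolation, here is a minimal sketch that leaves out the wide-to-narrow conversion and copies an ordinary std::string instead. The name copy_to_buffer is hypothetical. The point is that unique_ptr<char[]> releases the buffer with delete[], matching the new[] that created it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// copy_to_buffer is a hypothetical helper. It copies a narrow string into a
// new[]-allocated buffer owned by unique_ptr<char[]>, whose deleter calls delete[].
std::unique_ptr<char[]> copy_to_buffer(const std::string& text)
{
    const std::size_t size = text.size() + 1;          // +1 for the null terminator
    std::unique_ptr<char[]> buffer(new char[size]);
    text.copy(buffer.get(), text.size());              // copies the characters only
    buffer[text.size()] = '\0';                        // terminate explicitly
    assert(buffer[size - 1] == '\0');
    return buffer;
}
```

If the other library really insists on unique_ptr<char>, the array form cannot be passed as-is, so that interface would need a closer look.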

How to convert a vector<wstring> to a wchar_t**?

I need to create a C compatible (friendly) return type so that my C++ functions can be used to work with C-based functions.
How I can convert a vector of wstring to a wchar_t** array?
You can iterate through the wstring vector and add each wstring::c_str() to your wchar_t** array.
Far better to avoid doing this at all if you possibly can.
If you really have no choice, you'd basically do something like allocating an array of pointers, then allocating space for each string, and copying each individual string in the input to the buffer you allocated.
wchar_t *dupe_string(std::wstring const &input) {
wchar_t *ret = new wchar_t[input.size()+1];
wcscpy(ret, input.c_str());
return ret;
}
wchar_t **ruin(std::vector<std::wstring> const &input) {
wchar_t **trash = new wchar_t*[input.size()];
for (size_t i=0; i<input.size(); i++)
trash[i] = dupe_string(input[i]);
return trash;
}
Based on the comments, however, I have some misgivings about whether this applies to the current situation: it assumes the input is wide strings, which would typically mean UTF-16 or UTF-32/UCS-4. If the input is really in the form of UTF-8, then the storage elements you're dealing with will really be char, not wchar_t, so your input should be narrow strings (std::string) and the matching output char ** rather than wchar_t **.
wstring is an instantiation of the basic_string template, so its c_str() function returns const wchar_t*.
So, you can do something like
std::vector<const wchar_t*> pointers;
pointers.reserve(wstrVec.size());
for (auto it = wstrVec.begin(); it != wstrVec.end(); ++it) {
pointers.push_back(it->c_str());
}
const wchar_t** cptr = pointers.data();
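The same idea as a self-contained function, in case that is easier to try out. This is a sketch and make_pointer_view is a made-up name. Note that the result does not own anything: the pointers are only valid while the source vector is alive and unmodified.

```cpp
#include <cassert>
#include <string>
#include <vector>

// make_pointer_view is a hypothetical helper. It builds a non-owning array of
// C-string pointers into `strings`; no characters are copied.
std::vector<const wchar_t*> make_pointer_view(const std::vector<std::wstring>& strings)
{
    std::vector<const wchar_t*> pointers;
    pointers.reserve(strings.size());
    for (const std::wstring& s : strings) {
        pointers.push_back(s.c_str());
    }
    assert(pointers.size() == strings.size());
    return pointers;
}
```

Calling data() on the returned vector gives the const wchar_t** to hand to the C side, as long as both vectors outlive the call.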
Without more context it's difficult to advise the best way to deal with scope/lifetime issues. Are you writing a library (which suggests you have no control over scope) or providing an api for callbacks from C code you are supervising?
A common approach is to provide a sizing api so that the caller can provide a destination buffer of appropriate size:
size_t howManyWstrings()
{
return wstrVec.size();
}
bool getWstrings(const wchar_t** into, size_t intoSize /*in pointers*/)
{
const size_t vecSize = wstrVec.size();
if (intoSize < vecSize || into == nullptr)
return false;
for (size_t i = 0; i < vecSize; ++i) {
into[i] = wstrVec[i].c_str();
}
return true;
}
It sounds like your C function is expecting a pointer to a wchar_t buffer, and to be able to move this pointer around.
Well, this is mostly easy, though you'll have to manage the lifetime of the pointer. To that end, I suggest not doing this as a return type (and thus letting C ruin your API, not to mention your code's sanity), but performing this logic at the call site of the C function:
/** A function that produces your vector */
std::vector<wchar_t> foo();
/** The C function in question */
void theCFunction(wchar_t**);
int main()
{
std::vector<wchar_t> v = foo();
wchar_t* ptr = &v[0];
theCFunction(&ptr);
}
BTW from the question and some comments it sounds like you misunderstand what char and wchar_t are — they sit below the encoding layer and if you have UTF-8 then you should be storing each byte of your UTF-8 string as, well, as a single byte. This means using chars, as in a std::string. Sure, each individual byte in that string will not necessarily represent a single logical unicode character, but then that is not the point of it.
This function converts a vector of std::wstring to a wchar_t** array. Unlike the other answers, it does not leak memory when called repeatedly, because it calls DisposeBuffer() first.
wchar_t ** xGramManipulator::GetCConvertedString(vector< wstring> const &input)
{
DisposeBuffer(); //This is to avoid memory leak for calling this function multiple times
cStringArraybuffer = new wchar_t*[input.size()]; //cStringArraybuffer is a member variable of type wchar_t**
for (size_t i = 0; i < input.size(); i++)
{
cStringArraybuffer[i] = new wchar_t[input[i].size()+1];
wcscpy_s(cStringArraybuffer[i], input[i].size() + 1, input[i].c_str());
cStringArraySize++;
}
return cStringArraybuffer;
}
And this is the DisposeBuffer Helper Function to avoid memory leaks:
void xGramManipulator::DisposeBuffer(void)
{
for (size_t i = 0; i < cStringArraySize; i++)
{
delete [] cStringArraybuffer[i];
}
delete [] cStringArraybuffer;
cStringArraybuffer = nullptr;
cStringArraySize = 0;
}
And prior to these allocate a dummy space in your constructor:
xGramManipulator::xGramManipulator()
{
//allocating dummy array so that when we try to de-allocate it in GetCConvertedString(), dont encounter any undefined behavior
cStringArraybuffer = new wchar_t*[1];
cStringArraySize = 0;
for (int i = 0; i < 1; i++)
{
cStringArraybuffer[i] = new wchar_t[1 + 1];
cStringArraySize++;
}
}
And it's all done.

C++: How to use new to find store for function return value?

I'm reading the 3rd edition of The C++ Programming Language by Bjarne Stroustrup and attempting to complete all the exercises. I'm not sure how to approach exercise 13 from section 6.6, so I thought I'd turn to Stack Overflow for some insight. Here's the description of the problem:
Write a function cat() that takes two C-style string arguments and
returns a single string that is the concatenation of the arguments.
Use new to find store for the result.
Here's my code thus far, with question marks where I'm not sure what to do:
? cat(char first[], char second[])
{
char current = '';
int i = 0;
while (current != '\0')
{
current = first[i];
// somehow append current to whatever will eventually be returned
i++;
}
current = '';
i = 0;
while (current != '\0')
{
current = second[i];
// somehow append current to whatever will eventually be returned
i++;
}
return ?
}
int main(int argc, char* argv[])
{
char first[] = "Hello, ";
char second[] = "World!";
? = cat(first, second);
return 0;
}
And here are my questions:
How do I use new to find store? Am I expected to do something like std::string* result = new std::string; or should I be using new to create another C-style string somehow?
Related to the previous question, what should I return from cat()? I assume it will need to be a pointer if I must use new. But a pointer to what?
Although the problem doesn't mention using delete to free memory, I know I should because I will have used new to allocate. Should I just delete at the end of main, right before returning?
How do I use new to find store? Am I expected to do something like std::string* result = new std::string; or should I be using new to create another C-style string somehow?
The latter; the method takes C-style strings and nothing in the text suggests that it should return anything else. The prototype of the function should thus be char* cat(char const*, char const*). Of course this is not how you’d normally write functions; manual memory management is completely taboo in modern C++ because it’s so error-prone.
Although the problem doesn't mention using delete to free memory, I know I should because I will have used new to allocate. Should I just delete at the end of main, right before returning?
In this exercise, yes. In the real world, no: like I said above, this is completely taboo. In reality you would return a std::string and not allocate memory using new. If you find yourself manually allocating memory (and assuming it’s for good reason), you’d put that memory not in a raw pointer but a smart pointer – std::unique_ptr or std::shared_ptr.
In a "real" program, yes, you would use std::string. It sounds like this example wants you to use a C string instead.
So maybe something like this:
char * cat(char first[], char second[])
{
char *result = new char[strlen(first) + strlen(second) + 1];
...
Q: How do you "append"?
A: Just write everything in "first" to "result".
As soon as you're done, then continue by writing everything in "second" to result (starting where you left off). When you're done, make sure to append '\0' at the end.
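Put together, those steps might look like the sketch below. It measures the lengths by hand rather than calling strlen() so that it stands on its own, and it takes const char* so string literals can be passed. Treat it as one possible shape, not the book's answer.

```cpp
#include <cassert>
#include <cstddef>

// A sketch of cat(); the caller must release the result with delete[].
char* cat(const char* first, const char* second)
{
    assert(first != nullptr && second != nullptr);

    // Measure both inputs without relying on <cstring>.
    std::size_t n = 0;
    while (first[n] != '\0') ++n;
    std::size_t k = 0;
    while (second[k] != '\0') ++k;

    char* result = new char[n + k + 1];      // +1 for the terminating '\0'

    // Write "first", then continue with "second" where the first copy stopped.
    for (std::size_t i = 0; i < n; ++i) result[i] = first[i];
    for (std::size_t j = 0; j < k; ++j) result[n + j] = second[j];
    result[n + k] = '\0';

    return result;
}
```

In main you would keep the returned pointer, use it, and then delete[] it before returning.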
You are supposed to return a C style string, so you can't use std::string (or at least, that's not "in the spirit of the question"). Yes, you should use new to make a C-style string.
You should return the C-style string you generated... So, the pointer to the first character of your newly created string.
Correct, you should delete the result at the end. I expect it may be ignored, as in this particular case, it probably doesn't matter that much - but for completeness/correctness, you should.
Here's some old code I dug up from a project of mine a while back:
char* mergeChar(char* text1, char* text2){
//Find the length of the first text
int alen = 0;
while(text1[alen] != '\0')
alen++;
//Find the length of the second text
int blen = 0;
while(text2[blen] != '\0')
blen++;
//Copy the first text
char* newchar = new char[alen + blen + 1];
for(int a = 0; a < alen; a++){
newchar[a] = text1[a];
}
//Copy the second text
for(int b = 0; b < blen; b++)
newchar[alen + b] = text2[b];
//Null terminate!
newchar[alen + blen] = '\0';
return newchar;
}
Generally, in a 'real' program, you'll be expected to use std::string, though. Make sure you delete[] newchar later!
What the exercise means is to use new in order to allocate memory. "Find store" is phrased weirdly, but in fact that's what it does. You tell it how much store you need, it finds an available block of memory that you can use, and returns its address.
It doesn't look like the exercise wants you to use std::string. It sounds like you need to return a char*. So the function prototype should be:
char* cat(const char first[], const char second[]);
Note the const specifier. It's important so that you'll be able to pass string literals as arguments.
So without giving the code out straight away, what you need to do is determine how big the resulting char* string should be, allocate the required amount using new, copy the two source strings into the newly allocated space, and return it.
Note that you normally don't do this kind of memory management manually in C++ (you use std::string instead), but it's still important to know about it, which is the reason for this exercise.
It seems like you need to use new to allocate memory for a string, and then return the pointer. Therefore the return type of cat would be char*.
You could do something like this:
int n = 0;
int k = 0;
//also can use strlen
while( first[n] != '\0' )
n ++ ;
while( second[k] != '\0' )
k ++ ;
//now, the allocation
char* joint = new char[n+k+1]; //+1 for a '\0'
//and for example memcpy for joining
memcpy(joint, first, n );
memcpy(joint+n, second, k+1); //also copying the null
return joint;
It is telling you to do this the C way pretty much:
#include <cstring>
char *cat (const char *s1, const char *s2)
{
// Learn to explore your library a bit, and
// you'll see that there is no need for a loop
// to determine the lengths. Anything C string
// related is in <cstring>.
//
size_t len_s1 = std::strlen(s1);
size_t len_s2 = std::strlen(s2);
char *dst;
// You have the lengths.
// Now use `new` to allocate storage for dst.
/*
* There's a faster way to copy C strings
* than looping, especially when you
* know the lengths...
*
* Use a reference to determine what functions
* in <cstring> COPY values.
* Add code before the return statement to
* do this, and you will have your answer.
*
* Note: remember that C strings are zero
* terminated!
*/
return dst;
}
Don't forget to use the matching operator when you free the allocated memory: storage obtained with new[] must be released with delete[]. Mismatching them is undefined behavior, and never freeing the memory is a leak.
Happy coding! :-)

C++ error - returning a char array

Consider the following code:
char CeaserCrypt(char str[256],int key)
{
char encrypted[256],encryptedChar;
int currentAsci;
encrypted[0] = '\0';
for(int i = 0; i < strlen(str); i++)
{
currentAsci = (int)str[i];
encryptedChar = (char)(currentAsci+key);
encrypted[i] = encryptedChar;
}
return encrypted;
}
Visual Studio 2010 gives an error because the function returns an array. What should I do?
My friend told me to change the signature to void CeaserCrypt(char str[256], char encrypted[256], int key). But I don't think that is correct. How can I get rid of the compile error?
The return type should be char * but this'll only add another problem.
encrypted is "allocated" on the stack of CeaserCrypt and might not be valid when the function returns. Since encrypted would have the same length as the input, do:
int len = strlen(str);
char *encrypted = (char *) malloc(len+1);
encrypted[len] = '\0';
for (int i = 0; i < len; i++) {
// ...
}
Don't forget to deallocate the buffer later, though (with free()).
EDIT: @Yosy: don't feel obliged to just copy/paste. Use this as a pointer to improve your coding practice. Also, to satisfy the critics: pass an already allocated pointer to your encryption routine using the above example.
It wants you to return a char* rather than a char. Regardless, you shouldn't be returning a reference or a pointer to something you've created on the stack. Things allocated on the stack have a lifetime that corresponds with their scope. After the scope ends, those stack variables are allowed to go away.
Return a std::vector instead of an array.
std::vector<char> CeaserCrypt(char str[256],int key)
{
std::vector<char> encrypted(256);
char encryptedChar;
int currentAsci;
encrypted[0] = '\0';
for(int i = 0; i < strlen(str); ++i)
{
currentAsci = (int)str[i];
encryptedChar = (char)(currentAsci+key);
encrypted[i] = encryptedChar;
}
return encrypted;
}
There's another subtle problem there though: you're casting an integer to a character value. The max size of an int is much larger than a char, so your cast may truncate the value.
Since you're using C++ you could just use an std::string instead. But otherwise, what your friend suggested is probably best.
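A std::string version could look something like this sketch. The name caesar_shift is made up, and like the code in the question it simply adds key to each character with no alphabet wrap-around.

```cpp
#include <cassert>
#include <string>

// caesar_shift is a hypothetical name. It shifts every character by `key`
// with no alphabet wrap-around, mirroring the arithmetic in the question.
std::string caesar_shift(const std::string& text, int key)
{
    std::string encrypted = text;              // same length as the input
    for (char& c : encrypted) {
        // Converting the sum to unsigned char is well defined (it wraps modulo 256).
        const unsigned char shifted =
            static_cast<unsigned char>(static_cast<unsigned char>(c) + key);
        c = static_cast<char>(shifted);
    }
    assert(encrypted.size() == text.size());
    return encrypted;
}
```

Returning the string by value sidesteps the lifetime problem entirely, since the caller gets its own copy.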
There are a few problems here. First up:
char CeaserCrypt(char str[256],int key)
As others have pointed out, your return type is incorrect. You cannot return an entire array in a single character. You could return char*, but that would return a pointer to an array allocated locally on the stack, which is invalid once the stack frame is removed (after the function returns, basically). In plain English, you'll be accessing that memory address, but who knows what's going to be there...
As your friend suggested, a better signature would be:
void CeaserCrypt(char* encrypted, const char* str, const size_t length, int key)
I've added a few things - a size_t length so you can process any length string. This way, the size of str can be defined as needed. Just make sure char* encrypted is of the same size.
Then you can do:
for(size_t i = 0; i < length; i++)
{
// ...
}
For this to work, your caller is going to need to have allocated appropriately sized buffers of the same length, and you must pass that length in the length parameter. Look up malloc for C. If C++, use a std::string.
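Filled in, the caller-allocates version might look like the sketch below. One assumption is added here that the signature itself doesn't state: the output buffer has room for length + 1 characters so the result can be null-terminated.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the caller-allocates signature. Assumption: `encrypted` has room
// for length + 1 characters; the extra one holds the terminator.
void CeaserCrypt(char* encrypted, const char* str, std::size_t length, int key)
{
    assert(encrypted != nullptr && str != nullptr);
    for (std::size_t i = 0; i < length; ++i) {
        // Plain shift with no alphabet wrap-around, as in the question.
        encrypted[i] = static_cast<char>(str[i] + key);
    }
    encrypted[length] = '\0';
}
```

A caller would declare something like char out[256], pass the input length, and read the result from out afterwards.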
If you need C compatibility, make the encrypted string a function argument.
If not, then use a C++ std::string instead of a C-style string.
Also, in your code the encrypted string isn't terminated with '\0'.
The problem with the original code is that you are trying to return a char* pointer (to which your local array decayed) from a function that is prototyped as one returning a char. A function cannot return arrays in C, nor in C++.
Your friend probably suggested changing the function so that the caller is responsible for allocating the required buffer.
Do note that the following prototypes are completely equivalent. You can't pass an array by value to a function; the parameter is adjusted to a pointer.
int func(char array[256]);
int func(char* array);
OTOH, you should (if you can!) decide which language you are using. Here is a better version of the original (in C++):
std::vector<unsigned char> CeaserCrypt(const std::string& str, const int key)
{
std::vector<unsigned char> encrypted(str.begin(), str.end());
for (std::vector<unsigned char>::iterator iter = encrypted.begin();
iter != encrypted.end(); ++iter) {
*iter += key;
}
return encrypted;
}
Do note that overflowing a signed integer causes undefined behavior.
VS2010 is "yelling" at you because you are trying to return a value that is allocated on the stack, and is no longer valid once your function call returns.
You have two choices: 1) Allocate memory on the heap inside your function, or 2) use memory provided to you by the caller. Number 2 is what your friend is suggesting and is a very good way to do things.
For 1, you need to call malloc() or new depending on whether you are working in C or C++. In C, I'd have the following:
char* encrypted = malloc(256 * sizeof(char));
For C++, if you don't want to use a string, try
char* encrypted = new char[256];
Edit: facepalm Sorry about the C noise, I should have looked at the question more closely and realized you are working in C++.
You can just do your Caesar cipher in place; there is no need to pass arrays in and out.
char * CeaserCrypt(char str[256], int key)
{
for(unsigned i = 0; i < strlen(str); i++)
{
str[i] += key;
}
return str;
}
As a further simplification, skip the return value.
void CeaserCrypt(char str[256], int key)
{
for(unsigned i = 0; i < strlen(str); i++)
{
str[i] += key;
}
}
Well, what you're returning isn't a char, but a char array. Try changing the return type to char* (char* and a char array are ostensibly the same thing for the compiler).
char* CeaserCrypt(char str[256],int key)
EDIT: As said in other posts, the encrypted array will not be valid after the function returns. You could instead allocate encrypted with new[], remembering to delete[] it later on.