Passing character arrays in C++ - c++

While passing a character pointer used to reference a string by its address (i.e. directly via its name or &name[0]) the original string must get passed, since we are passing by address.
However, after executing the following code, I got two different values of address for the first element, which, surprisingly, are 2 bytes apart.
Also, modifying the contents of the string in the function, didn't change the content of the array passed, but this is because a new string will have generated a new address, right?
But about the address of the first element being different, how is that possible?
#include<conio.h>
#include<stdio.h>
#include<iostream.h>
void fn(char *arr)
{
cout<<endl<<&arr;
arr="hi";
}
void main()
{
clrscr();
char *arr="hey";
cout<<endl<<"main "<<&arr;//the address is different from that in fn
fn(arr);
cout<<endl<<arr;
}

You are passing a pointer by value, and then comparing the address of the pointer and the copy, which of course differ. If you want to check that they point to the same memory address you can do that:
std::cout << (void*)arr << std::endl;
modifying the contents of the string in the function, didnt change the content of the array passed
You are not modifying the contents of the string, but rather reassigning the copy of the pointer to point to a different string literal. Also note that modifying the pointed memory (the literal) would be undefined behavior.
The only reason that the compiler let the code through (i.e. compiled it) is that there is a backwards compatibility feature that allows you to have a char* that points to the contents of a string a literal (of type const char[]). You should have got a warning and you should avoid doing that.

Just an FYI. I was unable to comment on a similar question about passing character arrays because it was closed as a duplicate, but the issue is fairly important so hopefully you don't mind if i cross-post.
When using strings in a production application you usually go with UTF-8 because it significantly increases the market without a lot of effort.
http://www.joelonsoftware.com/articles/Unicode.html
Most applications also use a string class to encapsulate the characters. Then you can use something like:
void fn(..., const std::string &static_string, ...);
in your header. If you use a library like gettext, your code looks like:
printf(gettext("and suddenly there's one line which is good.."));
where the english strings act as intuitive indices into localization files and you can rapidly and easily switch languages at install or runtime.
If you can't use a class because you're using C then the gettext docs cover this case as well.

Related

why is this filename parameter a char pointer instead of string?

There's a function provided by my uni that is supposed to read in a file. In the parameter, there's the fileName parameter. What I don't understand is why they're using a character pointer instead of a simple string?
Also, how would I call this function with a string filename then?
I'm running C++14 on Visual studio 2017 community edition.
double* read_text(char *fileName, int sizeR, int sizeC)
{
double* data = new double[sizeR*sizeC];
int i = 0;
ifstream myfile(fileName);
if (myfile.is_open())
{
while (myfile.good())
{
if (i > sizeR*sizeC - 1) break;
myfile >> *(data + i);
i++;
}
myfile.close();
}
else cout << "Unable to open file";
//cout << i;
return data;
}
Edit: I get it guys, it's stupid. I'll post a separate question then. Thanks for the fast response!
why is this filename parameter a char pointer instead of string?
Because the designer of the API chose to do so. Most people at Stack Overflow are not the author of that API, so we cannot answer this question accurately.
However, there is another similar question that we can answer: What are reasons to use character pointer as a string argument? Let me answer that question instead:
Not using std::string as an argument allows the user of the API to not create a std::string object. This may be useful when:
The API needs to be used in "freestanding" implementations that don't provide the standard library and thus have no std::string. This probably doesn't apply to your particular example since the implementation of the function uses the standard library, but I include this argument for completeness.
The API needs to be used on systems that provide no dynamic memory allocation, which std::string requires.
The dynamic memory allocation that creation of the string may require may be too slow in the context where the API is used (this won't apply to an API that is going to read from disk, but I include the argument here for completeness).
(const) char* makes it possible to use the API from C. This may be relevant because:
The API may have originally been written for C, and has been inherited to a code base that now uses C++, but the API has not been changed in order to maintain backwards compatibility.
Providing a C compatible API allows using the API from other languages that are able to interface with C. Many languages have support for C interfaces while very few languages have support for C++ interfaces.
Also, how would I call this function with a string filename then?
You can get a pointer to a null terminated string using the c_str member function. Unfortunately the API is badly designed and the argument is non-const, while the pointer returned by c_str is const. You can const_cast the constness of the argument away in order to call this function. This is "OK" because the function doesn't actually modify the pointed string.
If you can require C++17 standard and the source string is non-const, then the data member function will be simpler as it doesn't require const_cast. Prior to C++17 there was no non-const overload, so it would require the same const casting, and prior to C++11 the pointed string was not guaranteed to be null terminated.
To make it clear: Using a non-const string argument for this function is bad design - whether that argument is a character pointer to null terminated string, or a reference to std::string.
P.S.There are other, more serious problems:
The caller of the function cannot possibly know how many numbers were read from the file. There is no way of avoiding UB in case the file has less than sizeR*sizeC values.
Returning a bare pointer that owns dynamic memory resource is a very bad design.
The loop that reads from the file checks whether the read was successful after the value has already been added to the array and the value is never overwritten, so the last element written into the array is always has an unspecified value.

Are the methods in the <cstring> applicable for string class too?

I've tried out using memcpy() method to strings but was getting a "no matching function call" although it works perfectly when I use an array of char[].
Can someone explain why?
www.cplusplus.com/reference/cstring/memcpy/
std::string is an object, not a contiguous array of bytes (which is what memcpy expects). std::string is not char*; std::string contains char* (somewhere really deep).
Although you can pull out the std::string inner byte array by using &str[0] (see note), I strongly encourage you not to. Almost anything you need to do already is implemented as a std::string method. Including appending, subtracting, transforming and anything that makes sense with a text object.
So yes, you can do something as stupid as:
std::string str (100,0);
memcpy(&str[0],"hello world", 11);
but you shouldn't.
Even if you do need memcpy behaviuor, try to use std::copy instead.
Note: this is often done with C functions that expects some buffer, while the developer wants to maintain a RAII style in his code. So he or she produces std::string object but passes it as C string. But if you do clean C++ code you don't need to.
Because there's no matching function call. You're trying to use C library functions with C++ types.

Where does #define or char* strings reside in memory? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Is a string literal in c++ created in static memory?
If I do:
const char* StringPtr = "string0",
then it is definitely somewhere in the memory, and I can get the address of StringPtr.
But if I do:
#define STRING0 "string0", then where does STRING0 reside?
Or, is STRING0 not existing in memory because compiler replace using of STRING0 by "string0"?
As far as I've known, whenever you write any string in your code, compiler must put it somewhere in the memory, but I don't know the exact behavior of it.
But I am not very sure about this.
Can anyone explain how strings that are #define-ed or declared as char* are manipulated by the compiler?
Also, which one is better? To #define, extern const char* or extern const std::stringin the header file for strings?
Thanks!
In almost all cases, the compiler is allowed to put a string literal wherever it wants. There might be one copy for each time the literal appears in source code, or one master copy shared among the instances.
This causes trouble sometimes in C where const doesn't mean the same thing and you are allowed to modify the memory. On one platform all the identical strings get changed, while on another changes don't propagate. As of C++11 string literals don't implicitly lose constness and the mistake is harder to make.
The strings will all be initialized before the program starts, so in effect they are part of the executable binary image. That much is certain.
What would be different is this:
const char StringPtr[] = "string0",
This defines a dedicated array object with a unique address.
stringPtr resides in the executable's data section. If you open your exe in a text editor you will be able to search for it.
Data Segment
The macro exists only for the duration of the preprocessing stage of building your program.
Depending on your compiler, if you use the macro method you can end up with several separate instances of an identical string in your exe, but if you use the char* method you can use just a single instance.
#define STRING0
STRING0 does NOT reside in memory. It does NOT even exist during compilation. In PRE-compilation all occurances of STRING0 are replaced with "string0" by the preprocessor. After this stage, none of the following stages or the compiled applications know of the existance of any symbol of the name STRING0
Once this happens, many of not all instances will end up as unique string literals(your const char* case) all over your code. The answer to where these are stored in memory is better answered by #Potatoswatter and the link provided by #silico
#define is a preprocessor macro. It will replace STRING0 with "string0" during the precompile stage before the code is then compiled.
"string0" resides in the executable's static read-only memory.
StringPtr is a variable, that is why you can take its address. It simply points at the memory address of "string0".
When you do the #define, there is not the compiler, but the preprocessor who replaces, textually, STRING0 with "string0" in the pre-processed source file, before passing it to the compiler proper.
The compiler never sees the STRING0, but only sees "string0" everywhere that you wrote STRING0.
edit:
Each instance of "string0" that replaces the STRING0 that you wrote in the source file is a string literals per se. If those string literals are guaranteed (or declared) as invariant, then the compiler might optimize memory allocation by storing a single copy of this "string0" and point other uses towards that copy (I rephrased this paragraph in edit).
(edit: those identical literal string constants might be merged into a singled copy, however this is is up to the compiler. THe standard does not require or enforce it: http://www.velocityreviews.com/forums/t946521-merging-of-string-literals-guaranteed-by-c-std.html )
As for your last question: the most portable is to declare those as: const char *
later edit: the best discussion about the string literals that I found so far is here: https://stackoverflow.com/a/2245983/1284631
Also, beware that a string literal can also appear in the initialization of statically-allocated char array, when it cannot be merged with other copies of it, since the content of the static array may be overwritten. See the example below, where the two identical string literals "hello" cannot be merged:
#include <stdio.h>
#include <string.h>
int main(){
char x[50]="hello";
printf("x=%s, &x[0]=%p\n",x,&x[0]);
const char *y="hello";
printf("y=%s, &y[0]=%p\n",y,&y[0]);
strcpy(&x[0],"zz");
printf("x=%s, &x[0]=%p\n",x,&x[0]);
return 0;
}
The output of this code is:
x=hello, &x[0]=0x7fff8a964370
y=hello, &y[0]=0x400714
x=zz, &x[0]=0x7fff8a964370

VC++2010 seems to only allocate a single std::string in an fixed-array

This seems simple to me but I'm having trouble actually finding anything explicitly stating this.
Does std::string stringArray[3] not create an array of std::string objects, the way that SomeType typeArray[3] would? The actual number is irrelevant; I just picked 3 arbitrarily.
In the Visual Studio 2010 debugger, it appears to create a single string as opposed to an array of strings. Why?
Wild guess: is it calling the default std::string constructor, and then invoking an unused access of index 3? If so, why doesn't that cause an out of bounds exception on an empty string?
Does this have something to do with overloading the [] operator?
There are plenty of ways to code the same things without specifically using an array of std::string without issue, but what is the explanation/justification for this behavior? It seems counterintuitive to me.
Edit: I found this thread std::string Array Element Access in which the comments on the answer appear to observe the same behavior.
Visual studio's debugger will often show you only the first thing an array or pointer points to. You will have to use stringArray[1] stringArray[2] etc to see farther than that.
The OP is NOT losing his mind. But Visual Studio's integrated debugger certainly is. I believe this is a bust in the detection of operator [] for std::string, which is sad However, it does work correctly for std::vector<std::string>
An excellent reference to how you can get around this using the watch-window (no way to do it using the Auto or Locals window afaik) can be found at this previously posted question. This is also an extremely helpful method for viewing pointer-based dynamic arrays (arrays allocated with Type *p = new Type[n];). Hint: use "varname,n" where varname is the variable (pointer or fixed array), and 'n' is the number of elements to expand.
A demonstration is in order to show a declared C-array of std::string and what the OP was observing, and a std::vector<std::string> to show what things should look like:
int main(int argc, char *argv[])
{
std::string stringArray[3];
std::vector<std::string> vecStrings(3);
return 0; // set bp *here*
}
It creates an array. Either you compiled something different or you misinterpreted the results.
It doesn't create an array in these contexts:
void function(std::string stringArray[3]) { }
This function parameter is a pointer to std::string, you can't have function parameters of array type.
extern std::string strnigArray[3];
This declares but doesn't define an array. It tells the compiler there is an array of three strings somewhere in the program call stringArray but doesn't actually create it.
Otherwise, it creates an array of three strings.
You can check that with:
assert( sizeof(stringArray) == 3*sizeof(std::string) );
or in C++11
static_assert( sizeof(stringArray) == 3*sizeof(std::string), "three strings" );
Yeah the debugger in VS 2010 doesn't do a great job here, but as others have said, it's only a display issue in the debugger, it does work as you'd expect.
In VS 2012 they've done a much better job :-

Why did this work with Visual C++, but not with gcc?

I've been working on a senior project for the last several months now, and a major sticking point in our team's development process has been dealing wtih rifts between Visual-C++ and gcc. (Yes, I know we all should have had the same development environment.) Things are about finished up at this point, but I ran into a moderate bug just today that had me wondering whether Visual-C++ is easier on newbies (like me) by design.
In one of my headers, there is a function that relies on strtok to chop up a string, do some comparisons and return a string with a similar format. It works a little something like the following:
int main()
{
string a, b, c;
//Do stuff with a and b.
c = get_string(a,b);
}
string get_string(string a, string b)
{
const char * a_ch, b_ch;
a_ch = strtok(a.c_str(),",");
b_ch = strtok(b.c_str(),",");
}
strtok is infamous for being great at tokenizing, but equally great at destroying the original string to be tokenized. Thus, when I compiled this with gcc and tried to do anything with a or b, I got unexpected behavior, since the separator used was completely removed in the string. Here's an example in case I'm unclear; if I set a = "Jim,Bob,Mary" and b="Grace,Soo,Hyun", they would be defined as a="JimBobMary" and b="GraceSooHyun" instead of staying the same like I wanted.
However, when I compiled this under Visual C++, I got back the original strings and the program executed fine.
I tried dynamically allocating memory to the strings and copying them the "standard" way, but the only way that worked was using malloc() and free(), which I hear is discouraged in C++. While I'm curious about that, the real question I have is this: Why did the program work when compiled in VC++, but not with gcc?
(This is one of many conflicts that I experienced while trying to make the code cross-platform.)
Thanks in advance!
-Carlos Nunez
This is an example of undefined behavior. You're passing the result of string::c_str(), a const char*, to strtok, which takes a char*. By modifying the contents of the std::string data, you're invoking undefined behavior (you should be getting warnings for this unless you're casting).
When are you checking the value of a and b? In get_string, or in main? get_string is passed copies of a and b, so strtok will most likely not alter the originals in main. However, it could, as you are invoking undefined behavior.
The "right way" to do this is to use malloc/free or new[]/delete[]. You're using a C function, so you're already guilty of the same crime as you would be using malloc/free. A relatively elegant yet safe way to approach this is:
char *ap = strdup(a.c_str());
const char *a_ch = strtok(ap, ",");
/* do whatever it is you do */
free(ap);
Also bear in mind that strtok uses global state, so it won't play well with threads.
Tokens will be automatically replaced by a null-character by function strtok. That is not what you can do with constant data.
To make your code safe and cross-platform consider using boost::tokenizer.
I think the code is working because of differences in string implementation. VC++ string implementation must be making copies when you pass them to a function that could potentially modify the string.