If I do:
const char* StringPtr = "string0";
then it is definitely somewhere in memory, and I can get the address of StringPtr.
But if I do:
#define STRING0 "string0"
then where does STRING0 reside?
Or does STRING0 not exist in memory at all, because the compiler replaces every use of STRING0 with "string0"?
As far as I know, whenever you write a string in your code, the compiler must put it somewhere in memory, but I don't know the exact behavior.
But I am not very sure about this.
Can anyone explain how strings that are #define-ed or declared as char* are handled by the compiler?
Also, which one is better for strings in a header file: #define, extern const char*, or extern const std::string?
Thanks!
In almost all cases, the compiler is allowed to put a string literal wherever it wants. There might be one copy for each time the literal appears in source code, or one master copy shared among the instances.
This causes trouble sometimes in C where const doesn't mean the same thing and you are allowed to modify the memory. On one platform all the identical strings get changed, while on another changes don't propagate. As of C++11 string literals don't implicitly lose constness and the mistake is harder to make.
The strings will all be initialized before the program starts, so in effect they are part of the executable binary image. That much is certain.
What would be different is this:
const char StringPtr[] = "string0";
This defines a dedicated array object with a unique address.
StringPtr resides in the executable's data section (the data segment). If you open your exe in a text editor you will be able to search for the string.
The macro exists only for the duration of the preprocessing stage of building your program.
Depending on your compiler, if you use the macro method you can end up with several separate instances of an identical string in your exe, whereas with the char* method every use refers to the same single instance.
#define STRING0
STRING0 does NOT reside in memory. It does NOT even exist during compilation. During preprocessing, all occurrences of STRING0 are replaced with "string0" by the preprocessor. After this stage, none of the following stages, nor the compiled application, know of the existence of any symbol named STRING0.
Once this happens, many if not all instances will end up as separate string literals (your const char* case) all over your code. The question of where these are stored in memory is better answered by @Potatoswatter and the link provided by @silico.
#define is a preprocessor macro. It will replace STRING0 with "string0" during the precompile stage before the code is then compiled.
"string0" resides in the executable's static read-only memory.
StringPtr is a variable, that is why you can take its address. It simply points at the memory address of "string0".
When you use #define, it is not the compiler but the preprocessor that textually replaces STRING0 with "string0" in the preprocessed source file, before passing it to the compiler proper.
The compiler never sees the STRING0, but only sees "string0" everywhere that you wrote STRING0.
edit:
Each instance of "string0" that replaces the STRING0 that you wrote in the source file is a string literal in its own right. If those string literals are guaranteed (or declared) to be invariant, then the compiler might optimize memory allocation by storing a single copy of this "string0" and pointing the other uses at that copy (I rephrased this paragraph in an edit).
(edit: those identical string literals might be merged into a single copy, but this is up to the compiler. The standard does not require or enforce it: http://www.velocityreviews.com/forums/t946521-merging-of-string-literals-guaranteed-by-c-std.html )
As for your last question: the most portable is to declare those as: const char *
later edit: the best discussion about the string literals that I found so far is here: https://stackoverflow.com/a/2245983/1284631
Also, beware that a string literal can also appear as the initializer of a char array. In that case it cannot be merged with other copies, since the contents of the array may be overwritten. In the example below, the two identical string literals "hello" cannot be merged:
#include <stdio.h>
#include <string.h>
int main(){
char x[50]="hello"; /* array: gets its own writable copy of the characters */
printf("x=%s, &x[0]=%p\n",x,(void*)&x[0]); /* %p expects a void* argument */
const char *y="hello"; /* pointer: refers to the literal itself */
printf("y=%s, &y[0]=%p\n",y,(void*)&y[0]);
strcpy(&x[0],"zz"); /* overwrites the array, not the literal */
printf("x=%s, &x[0]=%p\n",x,(void*)&x[0]);
return 0;
}
The output of this code is:
x=hello, &x[0]=0x7fff8a964370
y=hello, &y[0]=0x400714
x=zz, &x[0]=0x7fff8a964370
Is it possible to have a function like this:
const char* load(const char* filename_){
return
#include filename_
;
};
so you wouldn't have to hardcode the #include file?
Maybe with some macro?
I'm drawing a blank, guys. I can't tell if it's flat out not possible or if it just has a weird solution.
EDIT:
Also, the ideal is to have this as a compile time operation, otherwise I know there's more standard ways to read a file. Hence thinking about #include in the first place.
This is absolutely impossible.
The reason is, as Justin already said in a comment, that #include is processed at compile time (by the preprocessor).
Including files at run time would require a complete compiler "on board" the program. A lot of scripting languages support things like that, but C++ is a compiled language and works differently: compile time and run time are strictly separated.
You cannot use #include to do what you want to do.
The C++ way of implementing such a function is:
Find out the size of the file.
Allocate memory for the contents of the file.
Read the contents of the file into the allocated memory.
Return the contents of the file to the calling function.
It would be better to change the return type to std::string to ease the burden of dealing with dynamically allocated memory.
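#include <cstddef>
#include <fstream>
#include <string>

std::string load(const char* filename)
{
   std::string contents;
   // Open the file. Binary mode keeps the size from tellg() equal to the bytes read.
   std::ifstream in(filename, std::ios::binary);
   // If there is a problem in opening the file, deal with it.
   if ( !in )
   {
      // Problem. Figure out what to do with it. Here: return an empty string.
      return contents;
   }
   // Move to the end of the file.
   in.seekg(0, std::ifstream::end);
   const std::streamoff size = in.tellg();
   if ( size < 0 )
   {
      // tellg() failed. Figure out what to do with it.
      return contents;
   }
   // Allocate memory for the contents.
   // Add an additional character for the terminating null character.
   contents.resize(static_cast<std::size_t>(size) + 1);
   // Rewind the file.
   in.seekg(0);
   // Read the contents. read() returns the stream itself, so ask gcount()
   // how many characters were actually read.
   in.read(&contents[0], size);
   if ( in.gcount() != size )
   {
      // Problem. Figure out what to do with it.
   }
   contents[static_cast<std::size_t>(size)] = '\0';
   return contents;
}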
PS
Using a terminating null character in the returned object is necessary only if you need to treat the contents of the returned object as a null-terminated string for some reason. Otherwise, it may be omitted. Note that std::string::c_str() already supplies a terminator of its own.
I can't tell if it's flat out not possible
I can. It's flat out not possible.
The contents of the filename_ string are not determined until runtime, so they are unknown when the preprocessor runs. Preprocessor macros are processed before compilation (or as the first step of compilation, depending on your perspective).
When the choice of the filename is made at runtime, the file must also be read at runtime (for example using an fstream).
Also, the ideal is to have this as a compile time operation
The latest time you can affect the choice of included file is when the preprocessor runs. What you can use to affect the file is a pre-processor macro:
#define filename_ "path/to/file"
// ...
return
#include filename_
;
It is theoretically possible.
In practice, you're asking to write a PHP construct using C++. It can be done, as too many things can, but you need some awkward prerequisites:
A compiler has to be linked into your executable, because the operation you call "hardcoding" is essential for the code to be executed.
A (probably very fussy) linker has to be linked into your executable as well, to merge the new code and resolve any function calls etc. in both directions.
Also, the newly imported code would not be reachable by the rest of the program which was not written (and certainly not compiled!) with that information in mind. So you would need an entry point and a means of exchanging information. Then in this block of information you could even put pointers to code to be called.
Not all architectures and OSes will support this, because "data" and "code" are two concerns best left separate. Code is potentially harmful; think of it as nitric acid. External data is fluid and slippery, like glycerine. And handling nitroglycerine is, as I said, possible. Practical and safe are something completely different.
Once the prerequisites were met, you would have two or three nice extra functions and could write:
void *load(const char* filename, void *data) {
// some "don't load twice" functionality is probably needed
void *code = compile_source(filename);
if (NULL == code) {
// a get_last_compiler_error() would be useful
return NULL;
}
if (EXIT_SUCCESS != invoke_code(code, data)) {
// a get_last_runtime_error() would also be useful
release_code(code);
return NULL;
}
// it is now the caller's responsibility to release the code.
return code;
}
And of course it would be a security nightmare, with source code left lying around and being imported into a running application.
Maintaining the code would be a different, but equally scary nightmare, because you'd be needing two toolchains - one to build the executable, one embedded inside said executable - and they wouldn't necessarily be automatically compatible. You'd be crying loud for all the bugs of the realm to come and rejoice.
What problem would be solved?
Implementing require_once in C++ might be fun, but you thought it could answer a problem you have. Which is it exactly? Maybe it can be solved in a more C++ish way.
A better alternative, also considering performance etc., is to compile a loadable module beforehand and load it at runtime.
If you need to perform small tunings to the executable, place parameters into an external configuration file and provide a mechanism to reload it. Once the modules conform to a fixed specification, you can even provide "plugins" that weren't available when the executable was first developed.
Currently I'm writing C++ code for Arduino. In an example, I found this expression:
Serial.print(F("Disconnected from central:"));
It's obvious that this statement sends a string to the serial port, but why does it use F(string) instead of using the string directly?
I tried to google it but got no results. If someone knows, I would greatly appreciate an explanation.
This macro is Arduino-specific, it's not "C++" as such.
It places the string in flash memory, to conserve RAM.
It means the string cannot be modified when the program runs.
One current definition is:
#define F(slit) (reinterpret_cast<const __FlashStringHelper *>(PSTR(slit)))
See the source code for more.
F() is one of the most powerful functions, which was added with the 1.0 release of the IDE. I keep mixing the terms macro and function. F() really isn’t a function, it is a #define macro which lives in WString.h
#define F(string_literal) (reinterpret_cast<const __FlashStringHelper *>(PSTR(string_literal)))
It is a macro which is used for storing strings in flash memory rather than RAM.
For a program written in C++, I need two huge arrays of strings that contain data.
They are defined in a header file as follows:
#include <string>
static const string strdataA[30000]={"this is the first line of the data",
"the second line of data",
"other stuff in the third line",
down to
"last line."};
//second array strings
static const string strdataB[60000]={"this is the first line of the data",
"the second line of data",
"other stuff in the third line",
down to
"last line."};
But when I compile this with g++, it takes so long that I have not seen it complete. It also uses about 2 GB of virtual memory. So I commented out strdataB[], and then the program did compile, but still only after a long while. The executable was only about 8 MB and worked fine.
What can I do in order to speed up the compiling process? I don't mind if I have to change the code, but I don't want to use an external file to load from. I would like an array because it works extremely well for me inside the program.
I read on the net somewhere that "static const" should do the trick, but I learned by experience that it doesn't.
Thanks a lot in advance for any suggestions!
You should not use std::string for that. Use instead plain old const char*:
const char * const strdataA[30000] = {
"one",
"two",
//...
};
The static keyword shouldn't make much of a difference here.
This way, the strings themselves will be stored in the read-only data section as simple literals, and the array itself will be simply an array of pointers. Also, you avoid running the strings constructors/destructors at runtime.
I believe these are known issues in GCC. You do not say what version of GCC you are using, maybe you should try with the newest stable release of GCC, to see if it does or does not improve things.
You probably should not keep all of your strings in source code anyway. You should probably load them from an external file at startup or similar.
What can I do in order to speed up the compiling process?
const char* strdataA ... should speed up the compilation process, because in your current version g++ must create a huge list of constructor calls, one for every single string.
When a character pointer that refers to a string is passed to a function (i.e. directly via its name or &name[0]), the original string must get passed, since we are passing by address.
However, after executing the following code, I got two different values of address for the first element, which, surprisingly, are 2 bytes apart.
Also, modifying the contents of the string in the function, didn't change the content of the array passed, but this is because a new string will have generated a new address, right?
But about the address of the first element being different, how is that possible?
#include<conio.h>
#include<stdio.h>
#include<iostream.h>
void fn(char *arr)
{
cout<<endl<<&arr;
arr="hi";
}
void main()
{
clrscr();
char *arr="hey";
cout<<endl<<"main "<<&arr;//the address is different from that in fn
fn(arr);
cout<<endl<<arr;
}
You are passing a pointer by value, and then comparing the address of the original pointer with the address of the copy, which of course differ. If you want to check that they point to the same memory address, print the pointer's value instead:
std::cout << (void*)arr << std::endl;
modifying the contents of the string in the function, didnt change the content of the array passed
You are not modifying the contents of the string, but rather reassigning the copy of the pointer to point to a different string literal. Also note that modifying the pointed memory (the literal) would be undefined behavior.
The only reason that the compiler let the code through (i.e. compiled it) is that there is a backwards-compatibility feature that allows you to have a char* that points to the contents of a string literal (of type const char[]). You should have got a warning, and you should avoid doing that.
Just an FYI. I was unable to comment on a similar question about passing character arrays because it was closed as a duplicate, but the issue is fairly important, so hopefully you don't mind if I cross-post.
When using strings in a production application you usually go with UTF-8, because it significantly increases your market without a lot of effort.
http://www.joelonsoftware.com/articles/Unicode.html
Most applications also use a string class to encapsulate the characters. Then you can use something like:
void fn(..., const std::string &static_string, ...);
in your header. If you use a library like gettext, your code looks like:
printf(gettext("and suddenly there's one line which is good.."));
where the English strings act as intuitive indices into localization files, and you can rapidly and easily switch languages at install time or runtime.
If you can't use a class because you're using C then the gettext docs cover this case as well.
In my application I'm declaring a string variable near the top of my code to define the name of my window class, which I use in my calls to RegisterClassEx, CreateWindowEx etc. Now, I know that an LPCTSTR is a typedef that eventually resolves to a pointer to TCHAR (well, a CHAR or WCHAR depending on whether UNICODE is defined), but I was wondering whether it would be better to use this:
static LPCTSTR szWindowClass = TEXT("MyApp");
Or this:
static const TCHAR szWindowClass[] = TEXT("MyApp");
I personally prefer the use of the LPCTSTR as coming from a JavaScript, PHP, C# background I never really considered declaring a string as an array of chars.
But are there actually any advantages of using one over the other, or does it in fact not even make a difference as to which one I choose?
Thank you, in advance, for your answers.
The two declarations are not identical. The first creates a pointer, the second an array of TCHAR. The difference might not be apparent, because an array will decay into a pointer in most expressions, but you'll notice it instantly if you try to put them into a structure, for example.
The equivalent declaration to LPCTSTR is:
static const TCHAR * szWindowClass = TEXT("MyApp");
The "L" in LPCTSTR stands for "Long", which hasn't been relevant since 16-bit Windows programming and can be ignored.
Since Unicode strings are native from Windows NT, unless you want your application to run on ANSI-native Windows 9x, always use wide-character strings (WCHAR or wchar_t types).
As for your question, both forms may seem equal. But although both strings are expected to end up in the constant string section of your executable, a string literal is not necessarily modifiable unless it is used as an array initializer.
e.g: (from C Faq, 16.6) The following code can crash:
char *p = "HELLO";
p[0] = 'H';
So it's better always to use:
char a[] = "HELLO";
Again, from C Faq:
A string literal can be used in two slightly different ways.
As an array initializer (as in the declaration of char a[]), it specifies the initial values of the characters in that array. Anywhere else, it turns into an unnamed, static array of characters, which may be stored in read-only memory, which is why you can't safely modify it. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the second declaration initializes p to point to the unnamed array's first element.
http://linuxdude.com/Steve_Sumit/C-faq/q1.32.html
The array form is preferable for string literals of this type. The data and code involved takes up (very, very marginally) less space, and the variable can't be changed to point at a different string. (In fact, there's no variable at all, just a name for the address of the first char in the string -- it behaves very much like a literal value in this respect.)