Why do fixed length char arrays in D allow direct string assignment? - d

Strings in D are just immutable arrays of chars. For example.
immutable(char[]) baba = "Baba Yaga";
such that trying something like
char[] baba = "Baba Yaga"; // ERROR!
would throw an exception Error: cannot implicitly convert expression "Baba Yaga" of type string to char[]. You need to copy the string first using its .dup property.
char[] baba = "Baba Yaga".dup;
However, why does the following work?
char[9] baba = "Baba Yaga";
Can you please explain what is going on behind the scenes?

It is actually pretty simple: the compiler knows the size of the string literal and knows the size of the static array and can see that they can simply be copied over, so it does. Then since it knows it is a new copy, immutable need not apply since that only applies to references to the original.
With char[] baba =, the compiler doesn't know all that - in your case it is trying to assign a reference to the original string, meaning the mutable/immutable mismatch applies, so it makes you explicitly clarify that you wanted a copy.

Related

Why is wrong to modify the contents of a pointer to a string litteral?

If I write:
char *aPtr = "blue"; //would be better const char *aPtr = "blue"
aPtr[0]='A';
I have a warning. The code above can work but isn't standard, it has a undefined behavior because it's read-only memory with a pointer at string litteral. The question is:
Why is it like this?
with this code rather:
char a[]="blue";
char *aPtr=a;
aPtr[0]='A';
is ok. I want to understand under the hood what happens
The first is a pointer to a read-only value created by the compiler and placed in a read-only section of the program. You cannot modify the characters at that address because they are read-only.
The second creates an array and copies each element from the initializer (see this answer for more details on that). You can modify the contents of the array, because it's a simple variable.
The first one works the way it does because doing anything else would require dynamically-allocating a new variable, and would require garbage collection to free it. That is not how C and C++ work.
The primary reason that string literals can't be modified (without undefined behavior) is to support string literal merging.
Long ago, when memory was much tighter than today, compiler authors noticed that many programs had the same string literals repeated many times--especially things like mode strings being passed to fopen (e.g., f = fopen("filename", "r");) and simple format strings being passed to printf (e.g., printf("%d\n", a);).
To save memory, they'd avoid allocating separate memory for each instance of these strings. Instead, they'd allocate one piece of memory, and point all the pointers at it.
In a few cases, they got even trickier than that, to merge literals that were't even entirely identical. For example consider code like this:
printf("%s\t%d\n", a);
/* ... */
printf("%d\n", b);
In this case, the string literals aren't entirely identical, but the second one is identical part of the end of the first. In this case, they'd still allocate one piece of memory. One pointer would point to the beginning of the memory, and the other to the position of the %d in that same block of memory.
With a possibility (but no requirement for) string literal merging, it's essentially impossible to say what behavior you'll get when you modify a string literal. If string literals are merged, modifying one string literal might modify others that are identical, or end identically. If string literals are not merged, modifying one will have no effect on any other.
MMUs added another dimension: they allowed memory to be marked as read-only, so attempting to modify a string literal would result in a signal of some sort--but only if the system had an MMU (which was often optional at one time) and also depending on whether the compiler/linker decided to put the string literals in memory they'd marked constant or not.
Since they couldn't define what the behavior would be when you modified a string literal, they decided that modifying a string literal would produce undefined behavior.
The second case is entirely different. Here you've defined an array of char. It's clear that if you define two separate arrays, they're still separate, regardless of content, so modifying one can't possibly affect the other. The behavior is clear and always has been, so doing so gives defined behavior. The fact that the array in question might be initialized from a string literal doesn't change that.

converting a string to a c string

m working on some homework but don't even know where to start on this one. If you could can you throw me in the right direction. This is what i'm suppose to do
Write your own version of the str_c function that takes a C++ string as an argument (with the parameter set as a constant reference variable) and returns a pointer to the equivalent C-string. Be sure to test it with an appropriate driver.
There are different possibilities to write such a function.
First, take a look at the C++ reference for std::string, which is the starting point for your problem.
In the Iterator section on that page, you might find some methods which can help you to get the string character by character.
It can also help to read the documentation for the std::string::c_str method, you'd like to imitate: string::c_string. It's important to understand, how the system works with normal C-strings (char*):
Due to the fact, that a C-string has now length- or size-attribute, a trick is used to determine the end of the string: The last character in the string has to be a '\0'.
Make sure you understand, that a char* string can also be seen as array of characters (char[]). This might help you, when understanding and solving your problem.
as we know, C-string is null-terminated array of char. you can put char by char from std::string to an array of char, and then closed with '\0'. and remember a pointer to a char (char*) is also representation of array of char. you can use this concept

c++ best way to call function with const char* parameter type

what is the best way to call a function with the following declaration
string Extract(const char* pattern,const char* input);
i use
string str=Extract("something","input text");
is there a problem with this usage
should i use the following
char pattern[]="something";
char input[]="input";
//or use pointers with new operator and copy then free?
the both works but i like the first one but i want to know the best practice.
A literal string (e.g. "something") works just fine as a const char* argument to a function call.
The first method, i.e. passing them literally in, is usually preferable.
There are occasions though where you don't want your strings hard-coded into the text. In some ways you can say that, a bit like magic numbers, they are magic words / phrases. So you prefer to use constant identifier to store the values and pass those in instead.
This would happen often when:
1. a word has a special meaning, and is passed in many times in the code to have that meaning.
or
2. the word may be cryptic in some way and a constant identifier may be more descriptive
Unless you plain to have duplicates of the same strings, or alter those strings, I'm a fan of the first way (passing the literals directly), it means less dotting about code to find what the parameters actually are, it also means less work in passing parameters.
Seeing as this is tagged for C++, passing the literals directly allows you to easily switch the function parameters to std::string with little effort.

How do I find the memory address of a string?

I am having a mental block and I know I should know this but I need a little help.
If I declare a string variable like this:
string word = "Hello";
How do I find the memory address of "Hello"?
Edit: This is what I am trying to do...
Write a function that takes one argument, the address of a string, and prints that string once. (Note: you will need to use a pointer to complete this part.)
However, if a second argument, type int, is provided and is nonzero, the function should print the string a number of times equal to the number of times that function has been called at that point. (Note that the number of times the string is printed is not equal to the value of the second argument; it is equal to the number of times the function has been called so far.)
Use either:
std::string::data() if your data isn't null-terminated c-string like.
or
std::string::c_str() if you want the data and be guaranteed to get the null-termination.
Note that the pointer returned by either of these calls doesn't have to be the underlying data the std::string object is manipulating.
Take the address of the first character is the usual way to do it. &word[0]. However, needing to do this if you're not operating with legacy code is usually a sign that you're doing something wrong.
I guess you want a pointer to a plain old C-string? Then use word.c_str(). Note that this is not guaranteed to point to the internal storage of the string, it's just a (constant) C-string version you can work with.
You can use the c_str() function to get a pointer to the C string (const char *); however note that the pointer is invalidated whenever you modify the string; you have to invoke c_str() again as the old string may have been deallocated.
OK, so, I know this question is old but I feel like the obvious answer here is actually:
std::string your_string{"Hello"};
//Take the address of the beginning
auto start_address = &(*your_string.begin())
//Take the address at the end
auto end_address = &(*your_string.end())
In essence this will accomplish the same thing as using:
auto start_address = your_string.c_str();
auto end_address = your_string.c_str() + strlen(your_string.c_str());
However I would prefer the first approach (taking the address of the dereferenced iterator) because:
a) Guaranteed to work with begin/end compatible containers which might not have the c_str method. So for example, if you decided you wanted a QString (the string QT uses) or AWS::String or std::vector to hold your characters, the c_str() approach wouldn't work but the one above would.
b) Possibly not as costly as c_str()... which generally speaking should be implemented similarly to the call I made in the second line of code to get the address but isn't guaranteed to be implemented that way (E.g. if the string you are using is not null terminated it might require the reallocation and mutation of your whole string in order to add the null terminator... which will suddenly make it thread unsafe and very costly, this is not the case for std::string but might be for other types of strings)
c) It communicates intent better, in the end what you want is an address and the first bunch of code expresses exactly that.
So I'd say this approach is better for:
Clarity, compatibility and efficiency.
Edit:
Note that with either approach, if you change your initial string after taking the address the address will be invalidated (since the string might be rellocated). The compiler will not warn you against this and it could cause some very nasty bugs :/
Declare a pointer to the variable and then view it how you would.

initializing char arrays in a way similar to initializing string literals

Suppose I've following initialization of a char array:
char charArray[]={'h','e','l','l','o',' ','w','o','r','l','d'};
and I also have following initialization of a string literal:
char stringLiteral[]="hello world";
The only difference between contents of first array and second string is that second string's got a null character at its end.
When it's the matter of initializing a char array, is there a macro or something that allows us to put our initializing text between two double quotation marks but where the array doesn't get an extra null terminating character?
It just doesn't make sense to me that when a terminating null character is not needed, we should use syntax of first mentioned initialization and write two single quotation marks for each character in the initializer text, as well as virgule marks to separate characters.
I should add that when I want to have a char array, it should also be obvious that I don't want to use it with functions that rely on string literals along with the fact that none of features in which using string literals results, is into my consideration.
I'm thankful for your answers.
It's allowed in C to declare the array as follows, which will initialize it without copying the terminating '\0'
char c[3] = "foo";
But it's illegal in C++. I'm not aware of a trick that would allow it for C++. The C++ Standard further says
Rationale: When these non-terminated arrays are manipulated by standard string routines, there is potential for major catastrophe.
Effect on original feature: Deletion of semantically well-defined feature.
Difficulty of converting: Semantic transformation. The arrays must be declared one element bigger to contain the string terminating ’\0’.
How widely used: Seldom. This style of array initialization is seen as poor coding style.
There is no way of doing what you want. The first way of initializing the array specifies separate initializers for each character, which allows to explicitly leave off the '\0'. The second is initializing a character array from a character string, which in C/C++ is always terminated by a null character.
EDIT: corrected: 'character pointer' --> 'character array'
litb has the technically correct answer.
As for an opinion - I say just live with the 'waste' of the extra '\0'. So many bugs are the result of code expecting a terminating null where one isn't (this advice may seem to go directly against some other advice I gave just a day or two ago about not bothering to zero an entire buffer. I claim there's no contradiction - I still advocated null terminating the string in the buffer).
If you really can't live with the '\0' terminator because of some semantics in the data structure you're dealing with, such as it might be part of some larger packed structure, you can always init the array yourself (which I think should be no less efficient than what the compiler might have done for you):
#define MY_STRING_LITERAL "hello world"
char stringLiteral[sizeof(MY_STRING_LITERAL) - 1];
memcpy( stringLiteral, MY_STRING_LITERAL, sizeof(stringLiteral));
The basic answer is that the vast majority of char arrays are strings - in C, strings are null terminated. C++ inherited that convention. Even when that null isn't needed, most of the time it isn't a problem just to leave it there anyway.
Macros aren't powerful enough to do what you want. Templates would be, except they don't have any compile-time string handling.
Usually, when people want to mix numeric bytes and string literals in the same char-array sequence, they use a string literal but use hex character escapes such as \xFF.
I might have found a way to do what i want though it isn't directly what I wanted, but it likely has the same effect.
First consider two following classes:
template <size_t size>
class Cont{
public:
char charArray[size];
};
template <size_t size>
class ArrayToUse{
public:
Cont<size> container;
inline ArrayToUse(const Cont<size+1> & input):container(reinterpret_cast<const Cont<size> &>(input)){}
};
Before proceeding, you might want to go here and take a look at constant expression constructors and initialization types.
Now look at following code:
const Cont<12> container={"hello world"};
ArrayToUse<11> temp(container);
char (&charArray)[11]=temp.container.charArray;
Finally initializer text is written between two double quotations.