Generate all permutations of a given string in D

I'm trying to write a program in D that generates all permutations for a given string. I've been trying to use the function nextPermutation, but it's only compatible with ints. I can't get it to work with a char array. I was wondering if someone could help point me in the right direction? This is what I have so far:
import std.stdio;
import std.algorithm.sorting: nextPermutation;
void main()
{
    char array[] = {'a','b','c'};
    do
    {
        writeln(array);
    } while (nextPermutation(array));
}

It isn't only compatible with ints; it works with anything that Phobos considers "bidirectional" and "swappable", that is, a range it can reverse easily and whose individual elements it can swap. It considers a plain string (or char[]) to be non-swappable because of UTF-8 encoding: since elements are encoded with variable length, swapping two characters may require reshuffling the entire array, which would be far more expensive than the function allows.
Thus, the easiest way to make this work is to use a type which Phobos considers to be swappable: a UTF-32 string, aka dchar[].
If you just change your char to dchar, it will work.
You might also want to change your array syntax from C style to D style:
dchar[] array = ['a','b','c'];
There you go.
So, I said "it considers" because this is kinda a controversial library decision. I'd argue UTF-32 isn't really swappable for a similar reason that UTF-8 isn't - there can be paired elements and changing their order can corrupt the data. But you don't need to worry about that for simple cases like you have.

Related

Confusion about the necessity of the null-character?

I am reading about why exactly null characters are needed, and I found this answer, which made some sense to me. It states that they are needed because char arrays (for C strings) are often allocated much larger than the actual strings, so you need a way to mark the end.
But why aren't these arrays just constructed with a size deduced from the initializer (without the null character that is implicitly added when initializing from a string literal)? If the arrays holding the strings were constructed using size deduction, there would be no need for the null character, because the array would be no bigger than the string, so of course the string would end at the end of that array.
I am reading about why exactly null characters are needed, and I found this answer, which made some sense to me. It states that they are needed because char arrays (for C strings) are often allocated much larger than the actual strings, so you need a way to mark the end.
The answer is misleading. That's not really the reason for why null termination is needed. The accepted answer with more upvotes is better.
there would not be a need for the null-character because the array was not any bigger than the string, so of course, it would end at the end of that array.
Let us remind ourselves, that we cannot use arrays as function arguments. Even if we could, we wouldn't want to, because it would be slow to copy an entire array into the argument.
Therefore, there is a need to refer to an array indirectly. Indirection is commonly achieved using pointers (or references). Now, we could have a "pointer to character array of size 42", but that is not very useful because then the argument can only point to strings of one particular size.
Instead, the common approach is to use a pointer to the first element of the array. This is such a common pattern that the language has a rule that allows the name of the array to implicitly decay into a pointer to the first element.
But can you tell how big an array is, based on a pointer to an element of that array? You cannot. You need extra information. The accepted answer of the linked question explains the options that are available for representing the size, and that the designer of C chose the option that uses a terminating character (which was already the convention used by the BCPL language which C is based on).
TL;DR Size information is needed because there is a need to refer to the string indirectly, and that indirection hides the knowledge about the size of the array. Null termination is one way to encode the size information within the content of the string, and it is the way that was chosen by the designer of the C language.
Historically, string arrays are provided with a termination symbol (or symbols). The reason is simple: instead of passing two values (the head of the array and the array length), you need to pass just one value, the head of the array. This simplifies the calling signature but places some requirements on the caller.
In C/C++ itself, the null character is the termination symbol, so all runtime functions work on the assumption that the very first null character they meet is the end of the string. At the same time, in application-level logic, the termination symbol(s) may be different: for example, in HTTP headers a CR-LF-CR-LF sequence marks the end of the header, while a single CR-LF sequence is just the start of the next line.
But why aren't these arrays just constructed with a size deduced from the initializer (without the null character that is implicitly added when initializing from a string literal)?
I suppose you mean why you can't write:
char t[] = "abracadabra";
and the compiler would deduce a size of 11?
Because you have 12 characters, not 11. If the array had size 11, something would be lost: the byte that contains the NUL would not be part of the array, and the compiler could not distinguish between:
char t[] = "abracadabra"; // an array deduced from a C-string literal
and
char t[11] = { 'a', 'b', 'r', 'a', 'c', 'a', 'd', 'a', 'b', 'r', 'a' }; // a "real" array not a C-string!
The first would have to release 12 bytes at the end of scope and the second 11.
Historically, arrays are just a kind of syntactic sugar on top of pointer arithmetic.
... because that char arrays ... are often allocated much larger than the actual strings
That answer is awful.
C strings can be dynamically allocated, meaning you don't know, before runtime, how long they should be. Instead of pre-allocating a massive array and filling most of it with zeroes, you can just malloc(required_size+1) and stick a single nul character at the end.
Conversely, string literals, which are known at compile time, are definitely not "allocated much larger than the actual strings". There wouldn't be any point, since you know exactly how much space is needed in advance.
But why aren't these arrays just constructed with a size deduced from the initializer?
size_t expected;
if (read(fd, &expected, sizeof(expected)) == sizeof(expected)) {
    char *buf = malloc(expected + 1);
    if (buf && read(fd, buf, expected) == expected) {
        buf[expected] = 0;
        /* now do something with buf */
    }
}
There you go, a dynamically-sized string. What would your "size deduction" be? What is the "initializer"?
I could have written a less-ugly example using std::string, since the question is tagged C++, but it's actually C strings you're specifically asking about, and it doesn't make any real difference.
Strings are often manipulated by creating a char array to hold intermediate results and modifying its contents:
char buffer[128];
strcpy(buffer, "Hello, ");
strcat(buffer, "world");
std::cout << buffer << '\n';
After the call to strcpy the buffer has 7 characters that we care about; after the call to strcat it has 12. So the number of characters in the buffer can change, and we need to have a way of indicating how many characters there are that matter. One convention is to put a character count in the first location in the array, and the actual characters after that. Another convention is to put a marker at the end of the characters that matter. There are tradeoffs here, but the decision in C, which was carried through into C++ was to go with an end marker.

How to read a string character by character as a range in D?

How to read a line as a range in D?
I know there are ranges in D, but I just wondered how to simply iterate over each character of a string using this concept.
To show what I'm after, the similar code in Go is:
for _, someChar := range someString {
    // Do something
}
That would depend on whether you want to iterate over code units or code points. The language itself iterates over arrays by array elements, and strings are arrays of code units, so if you simply use foreach with type inference, then with
foreach(c; "La Verité")
    writeln(c);
the last two characters printed would be gibberish, because é is a code point made up of two UTF-8 code units, and you're printing out individual code units (since char is a UTF-8 code unit). Whereas, if you do
foreach(dchar c; "La Verité")
    writeln(c);
then the runtime will decode the code units to code points, and é will be printed as the last character. But none of this is really operating on strings as ranges. foreach operates on arrays natively without having to use the input range API. However, for all string types, the range API looks like
@property bool empty();
@property dchar front();
void popFront();
It operates on strings as ranges of dchar - not their code unit type. This avoids issues with functions like std.algorithm.filter operating on individual code units, since that would make no sense. Operating on code points isn't 100% correct either, since Unicode gets very complicated with regards to combining code points and graphemes and whatnot, but operating on code points is far closer to being correct (and I believe there's work being done on adding range support for graphemes into the standard library for the cases where you need that and are willing to pay the performance hit). So, having the range API for strings operate on them as ranges of dchar is far more correct, and if you did something like
foreach(c; filter!"true"("La Verité"))
    writeln(c);
you would be iterating over dchar, and é would print correctly. The downside to all of this, of course, is that foreach on strings operates on the code unit level by default, whereas the range API for strings operates on them as code points, so you have to be careful when mixing array operations and range-based operations on strings. That's also why string and wstring are not considered random-access ranges, just bidirectional ranges. You can't do random access in O(1) on code points when they're made up of varying numbers of code units (whereas dstring is a random-access range, because with UTF-32, every code unit is a code point).
foreach(ch; str)
    do_something(ch);
A string is an InputRange. An InputRange implements three things:
empty; is it empty?
front; give me the next item.
popFront; advance the range, otherwise front will return the same.
foreach "understands" how to work with ranges, so it "just works".
But I don't speak Go, so I'm not entirely sure we're speaking the same language.

std.algorithm.joiner(string[],string) - why result elements are dchar and not char?

I'm trying to compile the following code:
import std.algorithm;
void main()
{
    string[] x = ["ab", "cd", "ef"]; // 'string' is same as 'immutable(char)[]'
    string space = " ";
    char z = joiner( x, space ).front(); // error
}
Compilation with dmd ends with error:
test.d(8): Error: cannot implicitly convert expression (joiner(x,space).front()) of type dchar to char
Changing char z to dchar z does fix the error message, but I'm interested in why it appears in the first place.
Why is the result of joiner(string[],string).front() a dchar and not a char?
(There is nothing on this in documentation http://dlang.org/phobos/std_algorithm.html#joiner)
All strings are treated as ranges of dchar. That's because a dchar is guaranteed to be a single code point, since in UTF-32, every code unit is a code point, whereas in UTF-8 (char) and UTF-16 (wchar), the number of code units per code point varies. So, if you were operating on individual chars or wchars, you'd be operating on pieces of characters rather than whole characters, which would be very bad. If you don't know much about Unicode, I'd advise reading this article by Joel Spolsky. It explains things fairly well.
In any case, because operating on individual chars and wchars doesn't make sense, strings of char and wchar are treated as ranges of dchar (ElementType!string is dchar), meaning that as far as ranges are concerned, they don't have length (hasLength!string is false - walkLength needs to be used to get their length), aren't sliceable (hasSlicing!string is false), and aren't indexable (isRandomAccessRange!string is false). This also means that anything which builds a new range from any kind of string is going to result in a range of dchar. joiner is one of those. There are some functions which understand Unicode and special-case strings for efficiency, taking advantage of length, slicing, and indexing where they can, but unless their result is ultimately a slice of the original, any range they return is going to have to be made of dchars.
So, front on any range of characters will always be dchar, and popFront will always pop off a full code point.
If you don't know much about ranges, I'd advise reading this. It's a chapter in a book on D which is online and is currently the best tutorial on ranges that we have. We really should get a proper article on ranges (including on how they work with strings) onto dlang.org, but no one's gotten around to writing it yet. Regardless, you're going to need to have at least a basic grasp of ranges to be able to use a lot of D's standard library (especially std.algorithm), because it uses them very heavily.

Initializing a char array in C. Which way is better?

The following are the two ways of initializing a char array:
char charArray1[] = "foo";
char charArray2[] = {'f','o','o','\0'};
If both are equivalent, one would expect everyone to use the first option above (since it requires fewer key strokes). But I've seen code where the author takes the pain to always use the second method.
My guess is that in the first case the string "foo" is stored in the data segment and copied into the array at runtime, whereas in the second case the characters are stored in the code segment and copied into the array at runtime. And for some reason, the author is allergic to having anything in the data segment.
Edit: Assume the arrays are declared local to a function.
Questions: Is my reasoning correct? Which is your preferred style and why?
What about another possibility:
char charArray3[] = {102, 111, 111, 0};
You shouldn't forget that the C char type is a numeric type; it just happens that the value is often used as a character code. But if I use an array for something not related to text at all, I would definitely prefer to initialize it with the above syntax rather than encode it as letters and put them between quotes.
If you don't want the terminal 0 you also have to use the second form or in C use:
char charArray3[3] = "foo";
It is a C feature that nearly nobody knows: if the array does not have enough room to hold the final 0 when it is initialized from a string literal, the compiler omits it, and the code is still legal. However, this should be avoided, because C++ does not allow it, and a C++ compiler would yield an error.
I checked the assembly code generated by gcc, and all the different forms are equivalent. The only difference is that it uses either the .string or the .byte pseudo-instruction to declare the data. But that's just a readability issue and does not make a bit of difference in the resulting program.
I think the second method is used mostly in legacy code where compilers didn't support the first method. Both methods should store the data in the data segments. I prefer the first method due to readability. Also, I needed to patch a program once (can't remember which, it was a standard UNIX tool) to not use /etc (it was for an embedded system). I had a very hard time finding the correct place because they used the second method and my grep couldn't find "etc" anywhere :-)

Why isn't ("Maya" == "Maya") true in C++?

Any idea why I get "Maya is not Maya" as a result of this code?
if ("Maya" == "Maya")
    printf("Maya is Maya \n");
else
    printf("Maya is not Maya \n");
Because you are actually comparing two pointers - use e.g. one of the following instead:
if (std::string("Maya") == "Maya") { /* ... */ }
if (std::strcmp("Maya", "Maya") == 0) { /* ... */ }
This is because C++03, §2.13.4 says:
An ordinary string literal has type “array of n const char”
... and in your case a conversion to pointer applies.
See also this question on why you can't provide an overload for == for this case.
You are not comparing strings, you are comparing pointer address equality.
To be more explicit:
"foo baz bar" implicitly defines an anonymous const char[m]. It is implementation-defined whether identical anonymous const char[m] arrays will occupy the same location in memory (a concept referred to as interning).
The function you want, in C, is strcmp(const char*, const char*), which returns 0 on equality.
Or, in C++, what you might do is
#include <string>
std::string s1 = "foo";
std::string s2 = "bar";
and then compare s1 vs. s2 with the == operator, which is defined in an intuitive fashion for strings.
The output of your program is implementation-defined.
A string literal has the type const char[N] (that is, it's an array). Whether or not each string literal in your program is represented by a unique array is implementation-defined. (§2.13.4/2)
When you do the comparison, the arrays decay into pointers (to the first element), and you do a pointer comparison. If the compiler decides to store both string literals as the same array, the pointers compare true; if they each have their own storage, they compare false.
To compare strings, use std::strcmp(), like this:
if (std::strcmp("Maya", "Maya") == 0) // same
Typically you'd use the standard string class, std::string. It defines operator==. You'd need to make one of your literals a std::string to use that operator:
if (std::string("Maya") == "Maya") // same
What you are doing is comparing the address of one string with the address of another. Depending on the compiler and its settings, sometimes the identical literal strings will have the same address, and sometimes they won't (as apparently you found).
Any idea why i get "Maya is not Maya" as a result
Because in C, and thus in C++, string literals are arrays of characters (const char[N] in C++, char[N] in C), which are implicitly converted to a pointer to the first character when you try to compare them. And pointer comparison is address comparison.
Whether the two string literals compare equal or not depends on whether your compiler (using your current settings) pools string literals. It is allowed to do that, but it doesn't need to.
To compare the strings in C, use strcmp() from the <string.h> header. (It's std::strcmp() from <cstring> in C++.)
To do so in C++, the easiest is to turn one of them into a std::string (from the <string> header), which comes with all comparison operators, including ==:
#include <iostream>
#include <string>
// ...
if (std::string("Maya") == "Maya")
    std::cout << "Maya is Maya\n";
else
    std::cout << "Maya is not Maya\n";
C and C++ do this comparison via pointer comparison; looks like your compiler is creating separate resource instances for the strings "Maya" and "Maya" (probably due to having an optimization turned off).
My compiler says they are the same ;-)
Even worse, my compiler is certainly broken. This very basic equation:
printf("23 - 523 = %d\n","23"-"523");
produces:
23 - 523 = 1
Indeed, "because your compiler, in this instance, isn't using string pooling," is the technically correct, yet not particularly helpful answer :)
This is one of the many reasons the std::string class in the C++ Standard Library exists to replace this earlier kind of string when you want to do anything useful with strings in C++, and it is a problem pretty much everyone who's ever learned C or C++ stumbles over fairly early on in their studies.
Let me explain.
Basically, back in the days of C, all strings worked like this. A string is just a bunch of characters in memory. A string you embed in your C source code gets translated into a bunch of bytes representing that string in the running machine code when your program executes.
The crucial part here is that a good old-fashioned C-style "string" is an array of characters in memory. That block of memory is often referred to by means of a pointer -- the address of the start of the block of memory. Generally, when you're referring to a "string" in C, you're referring to that block of memory, or a pointer to it. C doesn't have a string type per se; strings are just a bunch of chars in a row.
When you write this in your code:
"wibble"
Then the compiler provides a block of memory that contains the bytes representing the characters 'w', 'i', 'b', 'b', 'l', 'e', and '\0' in that order. (The compiler adds a zero byte at the end, a "null terminator". In C a standard string is a null-terminated string: a block of characters starting at a given memory address and continuing until the next zero byte.)
And when you start comparing expressions like that, what happens is this:
if ("Maya" == "Maya")
At the point of this comparison, the compiler -- in your case, specifically; see my explanation of string pooling at the end -- has created two separate blocks of memory, to hold two different sets of characters that are both set to 'M', 'a', 'y', 'a', '\0'.
When the compiler sees a string in quotes like this, "under the hood" it builds an array of characters, and the string itself, "Maya", acts as the name of the array of characters. Because the name of an array is, in most expressions, implicitly converted to a pointer to the array's first character, the expression "Maya" ends up being treated as a pointer to char in the comparison.
When you compare these two expressions using "==", what you're actually comparing is the pointers, the memory addresses of the beginning of these two different blocks of memory. Which is why the comparison is false, in your particular case, with your particular compiler.
If you want to compare two good old-fashioned C strings, you should use the strcmp() function. This will examine the contents of the memory pointed to by both "strings" (which, as I've explained, are just pointers to a block of memory) and go through the bytes, comparing them one-by-one, and tell you whether they're really the same.
Now, as I've said, this is the kind of slightly surprising result that's been biting C beginners on the arse since the days of yore. And that's one of the reasons the language evolved over time. Now, in C++, there is a std::string class, that will hold strings, and will work as you expect. The "==" operator for std::string will actually compare the contents of two std::strings.
By default, though, C++ is designed to be backwards-compatible with C, i.e. a C program will generally compile and work under a C++ compiler the same way it does in a C compiler, and that means that old-fashioned strings, "things like this in your code", will still end up as pointers to bits of memory that will give non-obvious results to the beginner when you start comparing them.
Oh, and that "string pooling" I mentioned at the beginning? That's where some more complexity might creep in. A smart compiler, to be efficient with its memory, may well spot that in your case, the strings are the same and can't be changed, and therefore only allocate one block of memory, with both of your names, "Maya", pointing at it. At which point, comparing the "strings" -- the pointers -- will tell you that they are, in fact, equal. But more by luck than design!
This "string pooling" behaviour will change from compiler to compiler, and often will differ between debug and release modes of the same compiler, as the release mode often includes optimisations like this, which will make the output code more compact (it only has to have one block of memory with "Maya" in, not two, so it's saved five -- remember that null terminator! -- bytes in the object code.) And that's the kind of behaviour that can drive a person insane if they don't know what's going on :)
If nothing else, this answer might give you a lot of search terms for the thousands of articles that are out there on the web already, trying to explain this. It's a bit painful, and everyone goes through it. If you can get your head around pointers, you'll be a much better C or C++ programmer in the long run, whether you choose to use std::string instead or not!