D2: empty string in a conditional statement

In the following code, why does 2 give output but not 3? The removechars statement returns a string with length 0
import std.stdio, std.string;
void main() {
string str = null;
if (str) writeln(1); // no
str = "";
if (str) writeln(2); // yes
if (",&%$".removechars(r"^a-z")) writeln(3); // no
}
Edit: Ok, it may return null, but I'm still a bit puzzled because all of these print true
writeln(",&%$".removechars(r"^a-z") == "");
writeln(",&%$".removechars(r"^a-z") == null);
writeln(",&%$".removechars(r"^a-z").length == 0);
Edit 2: This also prints true, but put either of them in a conditional and you get a different result
writeln("" == null);
Edit 3: Alright, I understand that I cannot test for an empty string the way I did. What led to this question is the following situation. I want to remove chars from a word, but don't want to store an empty string:
if (auto w = word.removechars(r"^a-z"))
wordcount[w]++;
This works when I try it, but that must be because removechars is returning null rather than ""

Because removechars returns null when every character is removed and nothing is left.
(This happens because .dup of an empty string will always be null.)

D arrays, or slices if you prefer, are interesting beasts.
In D, a null array compares equal to an empty array, which is why assert("" == null) and assert([] == null) both pass. With a bare if (str), however, you are asking whether there is a string at all, and for null there is no array. A null array is equivalent to an empty one, but it does not exist.
The proper way to check whether something is null is assert(str is null). I'm not sure which is the best way to convert a string to a bool, but there can't really be a perfect solution because a string isn't a boolean.

Always use is and !is (is not) to compare with null. If you want to check whether a string is empty, check its length property:
string str;
assert(str is null); // str is null
assert(!str); // str is null
str = "";
assert(str !is null); // no longer null
assert(str); // no longer null
assert(!str.length); // but it's zero length

if(!str.length) {
//dosomething ...
}

Related

Returning an empty string in C++ using getcwd()

I have a question on how to properly return an empty string in C++; see the code below:
string Options::getCwd()
{
const char *buffer = getcwd(NULL, 0);
if (buffer == NULL)
{
buffer = "";
}
string base(buffer);
base += '/';
return base;
}
The problem is that if getcwd() fails, the returned buffer pointer is NULL, and setting buffer to the empty string works around that. However, I would like base itself to be the empty string. The current code returns base="/" because '/' is appended to the empty string taken from buffer. Additionally, attempting to free() buffer makes the compiler complain, since strings in C++ have to be declared const.
Any other ideas on how to solve this in a better way?
You can simply do that:
if (buffer == NULL)
{
return "";
}
The empty string literal will get implicitly converted into an empty std::string which will then get returned.
However you also have other problems, e.g. that getcwd expects a valid buffer as an input and not NULL (edited: clarification at the bottom).
The following complete code should work:
string Options::getCwd()
{
char buffer[1024]; // Or optionally PATH_MAX
char *result = getcwd(buffer, 1024);
if (result == NULL)
{
// Optionally do something with errno here
return "";
}
string base(buffer);
base += '/';
return base;
}
EDIT:
OK, it seems that it is indeed possible to pass NULL to getcwd and let it allocate the buffer. In that case, call free(result) after base has been constructed. I wouldn't recommend this approach, since the dynamic allocation is not needed here.
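For reference, a minimal sketch of the NULL variant described in the edit. It assumes a platform where getcwd(NULL, 0) allocates the buffer with malloc (glibc does this as an extension; POSIX leaves the behavior unspecified). The free-function name current_dir_with_slash is only an illustration and is not part of the original Options class:

```cpp
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical helper: returns the working directory with a trailing '/',
// or an empty string if getcwd() fails.
std::string current_dir_with_slash() {
    // Assumption: getcwd(nullptr, 0) returns a malloc'ed buffer (glibc extension).
    char *raw = getcwd(nullptr, 0);
    if (raw == nullptr) {
        return std::string();  // failure: return "" rather than "/"
    }
    std::string base(raw);     // copy the path into the std::string
    std::free(raw);            // the copy is done, so the buffer can be released
    base += '/';
    return base;
}
```

Holding the result in a non-const char* is what makes the free() call compile; the early return is what keeps the failure case from turning into "/".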

Using strtok tokens in if statements

I'm trying to get my head around how to split arrays and use the tokens in an if statement, however I'm not having much luck.
The code below is for an Arduino. I am passing the function receivedChars, which will be something like:
token0,token1,token2
When I print out func, it reads c, so I figured that if I compared func to c it should evaluate to true. Unfortunately, this doesn't happen.
I'm quite new to C++ and Arduino and mainly have a web development background, so I might be misinterpreting something.
const byte numChars = 32;
char receivedChars[numChars];
char *chars_array = strtok(receivedChars, ",");
char *func = chars_array;
Serial.println(func);
if(func == 'c') {
Serial.println("It works");
}
Could someone help me with where I am going wrong please?
First of all, strtok works iteratively. This means that to split a string into tokens you have to call it until it returns NULL:
char* token = strtok(input, ",");
while (token)
{
...
token = strtok(NULL, ",");
}
And the second thing to know is that char * is just a pointer to a block of memory treated as a string. So when you write something like:
char* str = ...;
if (str == 'c')
{
...
}
This actually means "compare the address held in the variable str with the ASCII code of the character 'c' (which is 0x63 in hex)". Your condition will therefore be true only if the pointer returned by strtok equals 0x63, and that is definitely not what you want.
What you really need is the strcmp function, which compares two null-terminated strings character by character:
char* chars_array = strtok(receivedChars, ",");
if (strcmp(chars_array, "bla") == 0)
{
// a first token is "bla"
}
Swap
if(func == 'c') {
to
if(func[0] == 'c') {
if you want to check if first char is 'c'
'func' is a pointer to the start of an array of characters; comparing it to a character value will almost never yield true. Perhaps you want to compare the character in that array instead.
The main issue is that you should use if(*func == 'c') {, i.e. dereference pointer func, instead of if(func == 'c') {.
Note that you should additionally consider that receivedChars might be an empty string or might consist only of ',' characters; in that case strtok will return NULL, which will probably crash your app. Hence, the code should look as follows:
if (func != nullptr) {
Serial.println(func);
if(*func == 'c') {
Serial.println("It works");
}
}
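Putting the answers above together, here is a rough sketch in plain C++ (not Arduino-specific; the helper names first_token_equals and count_tokens are made up for illustration). It checks for NULL before comparing and uses strcmp to compare contents rather than addresses:

```cpp
#include <cstring>

// Hypothetical helper: true when the first comma-separated token of `input`
// has the same contents as `expected`. strtok modifies `input` in place.
bool first_token_equals(char *input, const char *expected) {
    char *token = std::strtok(input, ",");
    // strtok returns NULL for an empty or delimiter-only input.
    return token != nullptr && std::strcmp(token, expected) == 0;
}

// Hypothetical helper: calls strtok until it returns NULL and counts the tokens.
int count_tokens(char *input) {
    int count = 0;
    for (char *token = std::strtok(input, ","); token != nullptr;
         token = std::strtok(nullptr, ",")) {
        ++count;
    }
    return count;
}
```

Because strtok overwrites each delimiter with '\0', each helper should be given its own writable copy of the input.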

C++ tolower/toupper char pointer

Do you guys know why the following code crashes at runtime?
char* word;
word = new char[20];
word = "HeLlo";
for (auto it = word; it != NULL; it++){
*it = (char) tolower(*it);
I'm trying to lowercase a char* (string). I'm using visual studio.
Thanks
You cannot compare it to NULL. Instead you should be comparing *it to '\0'. Or better yet, use std::string and never worry about it :-)
In summary, when looping over a C-style string, you should loop until the character you see is '\0'. The iterator itself will never be NULL, since it simply points to a place in the string. The fact that the iterator has a type which can be compared to NULL is an implementation detail that you shouldn't touch directly.
Additionally, you are trying to write to a string literal, which is a no-no :-).
EDIT:
As noted by @Cheers and hth. - Alf, tolower can break if given negative values. So sadly, we need to add a cast to make sure this won't break if you feed it Latin-1 encoded data or similar.
This should work:
char word[] = "HeLlo";
for (auto it = word; *it != '\0'; ++it) {
*it = tolower(static_cast<unsigned char>(*it));
}
You're setting word to point to the string literal, but literals are read-only, so this results in undefined behavior when you assign to *it. You need to make a copy of it in the dynamically-allocated memory.
char *word = new char[20];
strcpy(word, "HeLlo");
Also in your loop you should compare *it != '\0'. The end of a string is indicated by the character being the null byte, not the pointer being null.
Given code (as I'm writing this):
char* word;
word = new char[20];
word = "HeLlo";
for (auto it = word; it != NULL; it++){
*it = (char) tolower(*it);
This code has Undefined Behavior in 2 distinct ways, and would also have UB in a third way if only the text data were slightly different:
1. Buffer overrun. The continuation condition it != NULL will not be false until the pointer it has wrapped around at the end of the address range, if it does.
2. Modifying read-only memory. The pointer word is set to point to the first char of a string literal, and then the loop iterates over that string and assigns to each char.
3. Passing a possibly negative value to tolower. The char classification functions require a non-negative argument, or else the special value EOF. This works fine with the string "HeLlo" under an assumption of ASCII or an unsigned char type. But in general, e.g. with the string "Blåbærsyltetøy", directly passing each char value to tolower will result in negative values being passed; a correct invocation with ch of type char is (char) tolower( (unsigned char)ch ).
Additionally, the code has a memory leak: it allocates some memory with new and then just forgets about it.
A correct way to code the apparent intent:
using Byte = unsigned char;
auto to_lower( char const c )
-> char
{ return Byte( tolower( Byte( c ) ) ); }
// ...
string word = "Hello";
for( char& ch : word ) { ch = to_lower( ch ); }
There are already two nice answers on how to solve your issues using null-terminated C strings and pointers. For the sake of completeness, here is an approach using C++ strings:
string word; // instead of char*
//word = new char[20]; // no longer needed: strings take care of themselves
word = "HeLlo"; // no worry about deallocating previous values: strings take care of themselves
for (auto &it : word) // range-based for, iterating over every character of the string
it = (char) tolower((unsigned char) it); // cast first: tolower needs a non-negative value
It's crashing because you are modifying a string literal.
There are dedicated functions for this: strupr makes a string uppercase and strlwr makes it lowercase. Note that both are non-standard extensions, so they are not available with every compiler.
Here is a usage example:
char upper[] = "make me upper";
printf("%s\n", strupr(upper));
char lower[] = "make me lower";
printf("%s\n", strlwr(lower));
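Since strupr and strlwr are not part of standard C or C++, here is a portable sketch of the same idea using std::transform. The function names to_lower_copy and to_upper_copy are my own, and the lambda takes an unsigned char so that tolower/toupper never see a negative value:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lowercased copy of `text`.
std::string to_lower_copy(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Hypothetical helper: returns an uppercased copy of `text`.
std::string to_upper_copy(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return text;
}
```

Taking the argument by value means the caller's string is left alone, and passing a string literal is safe because the literal is copied into a std::string first.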

How to check the contents of a LPTSTR string?

I'm trying to understand why a segmentation fault (SIGSEGV) occurs during the execution of this piece of code. This error occurs when testing the condition specified in the while instruction, but it does not occur at the first iteration, but at the second iteration.
LPTSTR arrayStr[STR_COUNT];
LPTSTR inputStr;
LPTSTR str;
// calls a function from external library
// in order to set the inputStr string
set_input_str(param1, (char*)&inputStr, param3);
str = inputStr;
while( *str != '\0' )
{
if( debug )
printf("String[%d]: %s\n", i, (char*)str);
arrayStr[i] = str;
str = str + strlen((char*)str) + 1;
i++;
}
After reading this answer, I have done some research on the internet and found this article, so I tried to modify the above code, using this piece of code read in this article (see below). However, this change did not solve the problem.
for (LPTSTR pszz = pszzStart; *pszz; pszz += lstrlen(pszz) + 1) {
... do something with pszz ...
}
As assumed in this answer, it seems that the code expects double null terminated arrays of string. Therefore, I wonder how I could check the contents of the inputStr string, in order to check if it actually contains only one null terminator char.
NOTE: the number of characters in the string printed from printf instruction is twice the value returned by the lstrlen(str) function call at the first iteration.
OK, now that you've included the rest of the code it is clear that it is indeed meant to parse a set of consecutive strings. The problem is that you're mixing narrow and wide string types. All you need to do to fix it is change the variable definitions (and remove the casts):
char *arrayStr[STR_COUNT];
char *inputStr;
char *str;
// calls a function from external library
// in order to set the inputStr string
set_input_str(param1, &inputStr, param3);
str = inputStr;
while( *str != '\0' )
{
if( debug )
printf("String[%d]: %s\n", i, str);
arrayStr[i] = str;
str = str + strlen(str) + 1;
i++;
}
Specifically, the issue was occurring on this line:
while( *str != '\0' )
since you hadn't cast str to char * the comparison was looking for a wide nul rather than a narrow nul.
str = str + strlen(str) + 1;
You go out of bounds, change to
str = str + 1;
or simply:
str++;
Of course you are inconsistently using TSTR and strlen, the latter assuming TCHAR = char
In any case, strlen returns the length of the string, which is the number of characters it contains not including the nul character.
Your arithmetic is out by one but you know you have to add one to the length of the string when you allocate the buffer.
Here however you are starting at position 0 and adding the length which means you are at position len which is the length of the string. Now the string runs from offset 0 to offset len - 1 and offset len holds the null character. Offset len + 1 is out of bounds.
Sometimes you might get away with reading it, if there is extra padding, but it is undefined behaviour and here you got a segfault.
This looks to me like code that expects double null terminated arrays of strings. I suspect that you are passing a single null terminated string.
So you are using something like this:
const char* inputStr = "blah";
but the code expects two null terminators. Such as:
const char* inputStr = "blah\0";
or perhaps an input value with multiple strings:
const char* inputStr = "foo\0bar\0";
Note that these final two strings are indeed double null terminated. Although only one null terminator is written explicitly at the end of the string, the compiler adds another one implicitly.
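To make the expected layout concrete, here is a small sketch that walks such a block of consecutive narrow strings and collects them. It assumes plain char data terminated by an empty string (i.e. two consecutive null characters); split_multi_string is a name I made up for illustration:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: collects every string in a double-null-terminated block.
std::vector<std::string> split_multi_string(const char *block) {
    std::vector<std::string> parts;
    // Each step skips the current string plus its terminator; the loop stops
    // at the empty string that marks the end of the block.
    for (const char *p = block; *p != '\0'; p += std::strlen(p) + 1) {
        parts.emplace_back(p);
    }
    return parts;
}
```

With a block that has only a single terminator, the step past the first string lands on memory that is not part of the data, which is the failure mode described above.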
Your question edit throws a new spanner in the works. The cast in
strlen((char*)str)
is massively dubious. If you need to cast then the cast must be wrong. One wonders what LPTSTR expands to for you. Presumably it expands to wchar_t* since you added that cast to make the code compile. And if so, then the cast does no good. You are lying to the compiler (str is not char*) and lying to the compiler never ends well.
The reason for the segmentation fault is already given by Alter's answer. However, I'd like to add that the usual style of parsing a C-style string is more elegant and less verbose
while (char ch = *str++)
{
// other instructions
// ...
}
The scope of ch is limited to the body of the loop.
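As a small illustration of that loop style on an ordinary single null-terminated string (count_char is a made-up helper, not something from the question):

```cpp
#include <cstddef>

// Hypothetical helper: counts how often `wanted` occurs in a C-style string.
std::size_t count_char(const char *str, char wanted) {
    std::size_t hits = 0;
    // The loop ends when the character just read is the terminating '\0'.
    while (char ch = *str++) {
        if (ch == wanted) {
            ++hits;
        }
    }
    return hits;
}
```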
Aside: Either tag the question as C or C++ but not both, they're different languages.

Converting Zero-Terminated String To D String

Is there a function in Phobos for converting a zero-terminated string into a D-string?
So far I've only found the reverse case toStringz.
I need this in the following snippet
// Lookup user name from user id
passwd pw;
passwd* pw_ret;
immutable size_t bufsize = 16384;
char* buf = cast(char*)core.stdc.stdlib.malloc(bufsize);
getpwuid_r(stat.st_uid, &pw, buf, bufsize, &pw_ret);
if (pw_ret != null) {
// TODO: The following loop maybe can be replace by some Phobos function?
size_t n = 0;
string name;
while (pw.pw_name[n] != 0) {
name ~= pw.pw_name[n];
n++;
}
writeln(name);
}
core.stdc.stdlib.free(buf);
which I use to lookup the username from a user id.
I assume UTF-8 compatibility for now.
There are two easy ways to do it: slice or std.conv.to:
const(char)* foo = c_function();
string s = to!string(foo); // done!
Or you can slice it if you are going to use it temporarily or otherwise know it won't be written to or freed elsewhere:
immutable(char)* foo = c_function();
string s = foo[0 .. strlen(foo)]; // make sure foo doesn't get freed while you're still using it
If you think it can be freed, you can also copy it by slicing then duping: foo[0..strlen(foo)].dup;
Slicing pointers works the same way in all array cases, not just strings:
int* foo = get_c_array(&c_array_length); // assume this returns the length in a param
int[] foo_a = foo[0 .. c_array_length]; // because you need length to slice
Just slice the original string (no copying). The $ inside [] is translated to str.length. If the zero is not at the end, just replace the "$ - 1" expression with its position.
import std.stdio;
void main() {
auto str = "abc\0";
str.trimLastZero();
write(str);
}
void trimLastZero (ref string str) {
if (str.length && str[$ - 1] == 0) // guard against an empty string
str = str[0 .. $ - 1];
}
You can do the following to strip away the trailing zeros and convert it to a string:
char[256] name;
getNameFromCFunction(name.ptr, 256);
string s = to!string(cast(char*)name); //<-- this is the important bit
If you just pass in name you will convert it to a string but the trailing zeroes will still be there. So you cast it to a char pointer and voila std.conv.to will convert whatever it meets until a '\0' is encountered.