How to check if char* p reached end of a C string? - c++

template<class IntType>
IntType atoi_unsafe(const char* source)
{
IntType result = IntType();
while (source)
{
auto t = *source;
result *= 10;
result += (*source - 48);
++source;
}
return result;
}
and in main() I have:
char* number = "14256";
atoi_unsafe<unsigned>(number);
but the condition while (source) does not seem to recognize that source has iterated over the entire C string. How should it correctly check for the end of the string?

while(source) is true until the pointer wraps around to 0, but will probably crash well before that in modern systems. You need to dereference the pointer to find a null byte, while(*source).

The pointer does not go to zero at the end of the string; the end of the string is found when the value pointed at becomes zero. Hence:
while (*source != '\0')
You might, more compactly, write the whole function as:
template<class IntType>
IntType atoi_unsafe(const char* source)
{
IntType result = IntType();
char c;
while ((c = *source++) != '\0')
result = result * 10 + (c - '0');
return result;
}
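Granted, it does not use the auto keyword. Also note carefully the difference between '\0' (the null terminator) and '0' (the digit zero). The parentheses around c - '0' in the loop body are not strictly necessary; the ones around the assignment in the loop condition are.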
Your code only handles strings without a sign - and should arguably validate that the characters are actually digits too (maybe raising an exception if the input is invalid). The 'unsafe' appellation certainly applies. Note, too, that if you instantiate the template for a signed integer type and the value overflows, you invoke undefined behaviour. At least with unsigned types, the arithmetic is defined, even if probably not what is expected.

You need to look for the null-terminator at the end of the string. Waiting for the pointer to wrap around to 0 is probably never going to happen. Use while (*source) for your loop.
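As a rough illustration of the validation and overflow points raised in the answers above, here is a minimal sketch of a checked variant. The name parse_unsigned and its bool-plus-output-parameter interface are assumptions made for this example, not part of the original code, and it only accepts unsigned types.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical checked variant of atoi_unsafe: parses a string made only of
// decimal digits into an unsigned integer and reports failure instead of
// wrapping around on overflow.
template <class UIntType>
bool parse_unsigned(const char* source, UIntType& out)
{
    static_assert(std::is_unsigned<UIntType>::value, "unsigned types only");
    if (source == nullptr || *source == '\0')
        return false;                      // null pointer or empty string

    const UIntType limit = std::numeric_limits<UIntType>::max();
    UIntType result = 0;
    for (; *source != '\0'; ++source)      // test the pointed-to char, not the pointer
    {
        if (*source < '0' || *source > '9')
            return false;                  // not a decimal digit
        const UIntType digit = static_cast<UIntType>(*source - '0');
        if (result > (limit - digit) / 10)
            return false;                  // result * 10 + digit would not fit
        result = static_cast<UIntType>(result * 10 + digit);
    }
    out = result;
    return true;
}
```

A call such as parse_unsigned("14256", value) leaves 14256 in value and returns true, while input like "12a" or an empty string returns false and leaves value untouched.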

Related

How to check an empty array returned from a function in C++

I was asked to write a function that accepts a character array, a zero-based start position, and a length.
It should return a character array containing length characters (len) starting with the start character of the input array
#include<iostream>
#include<vector>
#include<iterator>
using namespace std;
char* lengthChar(char c[],int array_len, int start, int len)
{
char* v = new char[len];
if(start < 0 || len > array_len || (start + len - 1) >= array_len){
return NULL;
}
if((start + len) == start)
{
return v;
}
copy(&c[start], &c[len+start], &v[0]);
return v;
}
My question is when I call my function as
char* r = lengthChar(t,3, 1, 0);
Normally based on my implementation, it should return a pointer pointing to an empty array. However, I can't seem to verify this. When I do if(!r[0]), it doesn't detect it. I even did
char s[] = {};
char* tt = &s[0];
if(r[0] == *tt)
Still nothing. The strange thing is when I cout the value of r[0], nothing is printed. So I don't know what actually is return. How do I verify that it is empty?
Don't use if(!r[0]) to check if NULL was returned. You want to compare directly to NULL using if(!r) or if(r == NULL). This tells you whether the function failed, not whether the string is empty. Doing if(!r[0]) when NULL was returned is undefined behavior, so make sure the address is valid before you try to access what it points to.
Another thing to note is that in the case where you return NULL, your function has a memory leak. You need to move char* v = new char[len]; to after you decide whether you are going to return NULL. You could call delete [] v; in the if statement, but that makes the code more brittle.
There are a few things going on here. Firstly, I would replace that
if((start+len) == start)
with just
if(len == 0) // if(!len) works too
And also note that you don't need to take the address of an index, so
&c[start] is the same thing as c + start
I would read http://www.cplusplus.com/reference/algorithm/copy/ to make sure you understand that the value being passed is an iterator.
But secondly, reading from the array returned by char* v = new char[len] before writing to it is undefined behavior. When you call
new char[len];
you are only asking for storage for len characters; the characters themselves are left uninitialized (and when len is 0 there are no elements at all, so even r[0] is out of bounds). Remember that std::cout's operator<< treats a char* as a C string, which means the array needs to be null-terminated. If it is not, you are invoking undefined behavior, because the output code reads memory that has been allocated but not initialized, or runs past the end of the allocation.
When you call
if(!r[0])
This doesn't really mean anything at all. r itself is a valid, non-null pointer, but r[0] has never been given a value, so reading it is undefined behavior and the test can go either way.
Now, if you want to make this more concrete, I would fill the array with zeros
char* v = new char[len];
memset(v, 0, len);
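Now every element of the array is '\0', so r[0] reads as an empty C string whenever len is at least 1. With len == 0 there is still no element to read. (memset is declared in <cstring>.)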
I think it's just a misunderstanding of what an "empty" array actually means.
Finally, don't listen to the guys who say just use std::vector. They're absolutely right, it's better practice, but it's better to understand how those classes work before you pull out the real power of the standard library. Just saying.
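To make the points from both answers concrete, here is a minimal sketch that validates before allocating and always reserves one extra element for a terminator, so an empty result can be detected with r[0] == '\0'. The name copy_range and the choice to null-terminate are assumptions for this example; the original assignment may not require a terminator.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical rework of lengthChar: returns nullptr for an invalid range,
// otherwise a null-terminated copy that the caller must release with delete[].
char* copy_range(const char* c, int array_len, int start, int len)
{
    // Validate first, so nothing is allocated (and leaked) on the error path.
    if (c == nullptr || start < 0 || len < 0 ||
        start > array_len || len > array_len - start)
        return nullptr;

    // The trailing () value-initializes every element to '\0',
    // including the extra terminator slot.
    char* v = new char[static_cast<std::size_t>(len) + 1]();
    std::copy(c + start, c + start + len, v);
    return v;
}
```

With this version, a caller can first test r against nullptr to detect failure, and then test r[0] == '\0' to detect an empty result.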

What does `(c = *str) != 0` mean?

int equiv (char, char);
int nmatches(char *str, char comp) {
char c;
int n=0;
while ((c = *str) != 0) {
if (equiv(c,comp) != 0) n++;
str++;
}
return (n);
}
What does "(c = *str) != 0" actually mean?
Can someone please explain it to me or help give me the correct terms to search for an explanation myself?
This expression has two parts:
c = *str - this is a simple assignment of c from dereferencing a pointer,
val != 0 - this is a comparison to zero.
This works, because assignment is an expression, i.e. it has a value. The value of the assignment is the same as the value being assigned, in this case, the char pointed to by the pointer. So basically, you have a loop that traces a null-terminated string to the end, assigning each individual char to c as it goes.
Note that the != 0 part is redundant in C, because the control expression of a while loop is implicitly compared to zero:
while ((c = *str)) {
...
}
The second pair of parentheses is optional from the syntax perspective, but it's kept in assignments like that in order to indicate that the assignment is intentional. In other words, it tells the readers of your code that you really meant to write an assignment c = *str, and not a comparison c == *str, which is a lot more common inside loop control blocks. The second pair of parentheses also suppresses the compiler warning.
Confusingly,
while ((c = *str) != 0) {
is equivalent to the more compact
while (c = *str) {
This also has the effect of assigning the character at *str to c, and the loop will terminate once *str is \0; i.e. when the end of the string has been reached.
Assignments within conditionals such as the above can be confusing at first glance (compare the very different c == *str), but they are such a common part of C and C++ that you need to get used to them.
(c = *str) is an expression, and it has a value in itself. It is an assignment, and the value of an assignment is the assigned value. So the value of (c = *str) is the value of *str.
The code basically checks whether the value of *str, which has just been assigned to c, is not 0. If it isn't, it calls the function equiv with that value.
Once the 0 is assigned, this is the end of the string. The function has to stop reading from the memory, which it does.
It's looping over every character in the string str, assigning them to c and then seeing if c is equal to 0 which would indicate the end of the string.
Although really the code should use '\0' as that is more obviously a NUL character.
The while loop walks through str one character at a time until it reaches a character equal to zero, which is what marks the end of a C string.
Here is a 'for' loop that visits the same characters (it prints each one instead of counting matches):
for (size_t i = 0, len = strlen(str); i < len; ++i)
std::cout << str[i];
It is just sloppily written code. The intention is to copy a character from the string str into c and then check if it was the null terminator.
The idiomatic way to check for the null terminator in C is an explicit check against '\0':
if(c != '\0')
This is so-called self-documenting code, since the de facto standard way to write the null terminator in C is by using the octal escape sequence \0.
Another mistake is to use assignment inside conditions. This has been regarded as bad practice since the 1980s, and most compilers can warn about such code with a message like "possibly incorrect assignment". It is bad practice because assignment is a side effect, and expressions with side effects should be kept as simple as possible. It is also bad practice because it is easy to mix up = and ==.
The code could easily be rewritten as something more readable and safe:
c = *str;
while (c != '\0')
{
if(equiv(c, comp) != 0)
{
n++;
}
str++;
c = *str;
}
You don't need char c since you already have the pointer char *str, also you can replace != 0 with != '\0' for better readability (if not compatibility)
while (*str != '\0')
{
if (equiv(*str, comp) != 0)
{
n++;
}
str++;
}
To understand what the code does, you can read it like this
while ( <str> pointed-to value is-not <end_of_string> )
{
if (function <equiv> with parameters( <str> pointed-to value, <comp> )
returned non-zero integer value)
then { increment <n> by 1 }
increment pointer <str> by 1 x sizeof(char) so it points to next adjacent char
}
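For comparison, here is a small self-contained sketch of the same assign-then-compare idiom. The function count_char is a made-up stand-in for nmatches that uses plain equality instead of the unknown equiv function.

```cpp
// Counts occurrences of `target` in a null-terminated string using the
// assign-then-compare idiom discussed above.
int count_char(const char* str, char target)
{
    int n = 0;
    char c;
    // Copy the current char into c, advance the pointer, stop at the terminator.
    while ((c = *str++) != '\0')
    {
        if (c == target)
            ++n;
    }
    return n;
}
```

The parentheses around the assignment matter here: without them, != would bind more tightly than =, and c would receive the result of the comparison rather than the character.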

Character pointer access

I want to access the ith element through a character pointer. Below is the sample code:
string a_value = "abcd";
char *char_p=const_cast<char *>(a_value.c_str());
if(char_p[2] == 'b') //Is this safe to use across all platform?
{
//do something
}
Thanks in advance
Array accessors [] are allowed for pointer types, and result in defined and predictable behaviors if the offset inside [] refers to valid memory.
const char* ptr = str.c_str();
if (ptr[2] == '2') {
...
}
Is correct on all platforms if the length of str is 3 characters or more.
In general, if you are not mutating the char* you are looking at, it is best to avoid a const_cast and work with a const char*. Also note that std::string provides operator[], which means that you do not need to call .c_str() on str to be able to index into it and look at a char. This will similarly be correct on all platforms if the length of str is 3 characters or more. If you do not know the length of the string in advance, use std::string::at(size_t pos), which performs bounds checking and throws an out_of_range exception if the check fails.
You can access the ith element in a std::string using its operator[]() like this:
std::string a_value = "abcd";
if (a_value[2] == 'b')
{
// do stuff
}
If you use a C++11 conformant std::string implementation you can also use:
std::string a_value = "abcd";
char const * p = &a_value[0];
// or char const * p = a_value.data();
// or char const * p = a_value.c_str();
// or char * p = &a_value[0];
21.4.1/5
The char-like objects in a basic_string object shall be stored contiguously.
21.4.7.1/1: c_str() / data()
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
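The guarantee quoted above can be checked directly. The following is only a sketch for a C++11 or later implementation, and shares_storage is a name invented for this example.

```cpp
#include <cstddef>
#include <string>

// Returns true when c_str() and operator[] refer to the same storage and the
// element one past the last character is the terminator (C++11 and later).
bool shares_storage(const std::string& s)
{
    const char* p = s.c_str();
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (p + i != &s[i])
            return false;   // would contradict the quoted requirement
    }
    return p[s.size()] == '\0';
}
```

On a conforming implementation this should return true for any string, including an empty one.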
The question is essentially about querying characters in a string safely.
const char* a = a_value.c_str();
is safe unless some other operation modifies the string after it. If you can guarantee that no other code performs a modification prior to using a, then you have safely retrieved a pointer to a null-terminated string of characters.
char* a = const_cast<char *>(a_value.c_str());
is never safe. You have obtained a writable pointer to memory that was never designed to be written to. There is no guarantee that writing to that memory will actually modify the string (and no guarantee that it won't cause a core dump). Writing through it is undefined behaviour - absolutely unsafe.
reference here: http://en.cppreference.com/w/cpp/string/basic_string/c_str
addressing a[2] is safe provided you can prove that all possible code paths ensure that a represents a pointer to memory longer than 2 chars.
If you want safety, use either:
auto ch = a_string.at(2); // will throw an exception if a_string is too short.
or
if (a_string.length() > 2) {
auto ch = a_string[2];
}
else {
// do something else
}
Everyone has explained well when this is safe, but I'd like to add a bit.
Since you're in C++ and already using a string, you can simply do the following to access a character, without having to deal with C strings at all:
std::string a_value = "abcd";
std::cout << a_value.at(2);
Which is in my opinion a better option rather than going out of the way.
string::at returns a char& for a non-const string and a const char& for a const one. (Here a_value is non-const, so it returns a char&.)
In this case you can treat the char* as an array of chars (a C string), so indexing it with square brackets is allowed.
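Putting the advice from these answers together, a bounds-checked comparison that avoids both const_cast and exceptions could look like the sketch below. The helper name char_at_is is an assumption made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when s has a character at pos and that character
// equals expected. The length check runs first, so s[pos] is never read
// out of range.
bool char_at_is(const std::string& s, std::size_t pos, char expected)
{
    return pos < s.size() && s[pos] == expected;
}
```

Note that for the string "abcd" from the question, index 2 holds 'c', so a comparison against 'b' is false.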

atof and non-null terminated character array

using namespace std;
int main(int argc, char *argv[]) {
char c[] = {'0','.','5'};
//char c[] = "0.5";
float f = atof(c);
cout << f*10;
if(c[3] != '\0')
{
cout << "YES";
}
}
OUTPUT: 5YES
Does atof work with non-null terminated character arrays too? If so, how does it know where to stop?
Does atof work with non-null terminated character arrays too?
No, it doesn't. std::atof requires a null-terminated string in input. Failing to satisfy this precondition is Undefined Behavior.
Undefined Behavior means that anything could happen, including the program seeming to work fine. What is happening here is that by chance you have a byte in memory right after the last element of your array which cannot be interpreted as part of the representation of a floating-point number, which is why your implementation of std::atof stops. But that's something that cannot be relied upon.
You should fix your program this way:
char c[] = {'0', '.', '5', '\0'};
// ^^^^
No, atof does not work with non-null terminated arrays: it stops whenever it discovers zero after the end of the array that you pass in. Passing an array without termination is undefined behavior, because it leads the function to read past the end of the array. In your example, the function has likely accessed bytes that you have allocated to f (although there is no certainty there, because f does not need to follow c[] in memory).
char c[] = {'0','.','5'};
char d[] = {'6','7','8'};
float f = atof(c); // << Undefined behavior!!!
float g = atof(d); // << Undefined behavior!!!
cout << f*10;
The above prints 5.678, pointing out the fact that a read past the end of the array has been made.
No... atof() requires a null terminated string.
If you have a string you need to convert that is not null terminated, you could try copying it into a target buffer based on the value of each char being a valid digit. Something to the effect of...
char buff[64] = { 0 };
for( size_t i = 0; i < sizeof( buff )-1; i++ )
{
char input = input_string[i];
if( isdigit( static_cast<unsigned char>( input ) ) || input == '-' || input == '.' )
buff[i] = input;
else
break;
}
double result = atof( buff );
From the description of the atof() function on MSDN (probably applies to other compilers) :
The function stops reading the input string at the first character that it cannot recognize as part of a number. This character may be the null character ('\0' or L'\0') terminating the string.
It must either be 0 terminated or the text must contain characters that do not belong to the number.
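To see where parsing stops on a properly terminated string, std::strtod can be used instead of atof, since it reports the end position through its second argument. This is a small sketch; the wrapper name parse_prefix is made up for this example.

```cpp
#include <cstddef>
#include <cstdlib>

// Parses a leading floating-point number from a null-terminated string and
// reports how many characters were consumed (0 means no number was found).
double parse_prefix(const char* text, std::size_t& consumed)
{
    char* end = nullptr;
    const double value = std::strtod(text, &end);   // end points at the first unparsed char
    consumed = static_cast<std::size_t>(end - text);
    return value;
}
```

The input must still be null-terminated; the stopping behaviour described above only decides where parsing ends inside a valid string, it does not make reading past the end of an array safe.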
std::string already gives you a null-terminated string through c_str()!
So why not
std::string number = "7.6";
double temp = ::atof(number.c_str());
You can also do it with the stringstream or boost::lexical_cast
http://www.boost.org/doc/libs/1_53_0/doc/html/boost_lexical_cast.html
http://www.cplusplus.com/reference/sstream/stringstream/
Since C++11, we have std::stof. By replacing atof with std::stof, it would be easier to handle.
I made a handy wrapper if you always pass a known size of char array.
#include <fmt/core.h>
#include <cstddef>
#include <string>
#include <type_traits>
// SFINAE fallback
template<typename T, typename =
std::enable_if< std::is_pointer<T>::value >
>
float charArrayToFloat(const T arr){ // Fall back for user friendly compiler errors
static_assert(false == std::is_pointer<T>::value, "`charArrayToFloat()` doesn't allow conversion from pointer!");
return -1;
}
// Valid for both null or non-null-terminated char array
template<size_t sz>
float charArrayToFloat(const char(&arr)[sz]){
// It doesn't matter whether it's null terminated or not
std::string str(arr, sz);
return std::stof(str);
}
int main() {
char number[4] = {'0','.','4','2'};
float ret = charArrayToFloat(number);
fmt::print("The answer is {}. ", ret);
return 0;
}
Output: The answer is 0.42.
Does atof work with non-null terminated character arrays too?
No, this function expects a pointer to a null terminated string. Failing to do so, say for example by passing a pointer to a non-null terminated string(or a non-null terminated character array) is undefined behavior.
Undefined behavior means anything [1] can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior.
So the output that you're seeing is a result of undefined behavior. As I said, don't rely on the output of a program that has UB; the program may just as well crash.
So the first step to make the program correct is to remove the UB. Then and only then can you start reasoning about the output of the program.
[1] The more technically accurate definition of undefined behavior states that there are no restrictions on the behavior of the program.

Different results using atoi

Could someone explain why those calls are not returning the same expected result?
unsigned int GetDigit(const string& s, unsigned int pos)
{
// Works as intended
char c = s[pos];
return atoi(&c);
// doesn't give expected results
return atoi(&s[pos]);
return atoi(&static_cast<char>(s[pos]));
return atoi(&char(s[pos]));
}
Remark: I'm not looking for the best way to convert a char to an int.
None of your attempts are correct, including the "works as intended" one (it just happened to work by accident). For starters, atoi() requires a NUL-terminated string, which you are not providing.
How about the following:
unsigned int GetDigit(const string& s, unsigned int pos)
{
return s[pos] - '0';
}
This assumes that you know that s[pos] is a valid decimal digit. If you don't, some error checking is in order.
What you are doing is taking a std::string, getting one character from its internal representation and feeding a pointer to it into atoi, which expects a const char* that points to a null-terminated string. Before C++11, a std::string was not guaranteed to store its characters with a terminating zero, so it is just luck that your C++ implementation seems to do this.
The correct way would be to ask std::string for a zero-terminated version of its contents using s.c_str(), then call atoi using a pointer to it.
Your code contains another problem: you are converting the result of atoi to an unsigned int, while atoi returns a signed int. What if your string is "-123"?
Since int atoi(const char* s) accepts a pointer to an array of characters, your last three uses return a number corresponding to the consecutive digits beginning with &s[pos], e.g. it can give 123 for a string like "123", starting at position 0. Since the data inside a std::string were not required to be null-terminated before C++11, the answer can be anything else on some implementations, i.e. undefined behaviour.
Your "working" approach also uses undefined behaviour.
It's different from the other attempts since it copies the value of s[pos] to another location.
It seems to work only as long as the adjacent byte in memory next to character c accidentally happens to be a zero or a non-digit character, which is not guaranteed. So follow the advice given by @aix.
To make it work really you could do the following:
char c[2] = { s[pos], '\0' };
return atoi(c);
If you want to access the data as a C string, use s.c_str(), and then pass it to atoi.
atoi expects a C-style string; std::string is a C++ class with different behavior and characteristics. For starters, its internal buffer was not required to be null-terminated before C++11.
atoi takes a pointer to char as its argument. In the first try, where you use the char c, it gets a pointer to only one character, hence you get the answer you want. In the other attempts, however, what it gets is a pointer to a char which happens to be the beginning of a string of chars, so I assume what you are getting from atoi in the later attempts is a number converted from the chars in positions pos, pos+1, pos+2 and so on up to the end of the string s.
If you really want to convert just a single char in the string at the position (as opposed to a substring starting at that position and ending at the end of the string), you can do it these ways:
int GetDigit(const string& s, const size_t& pos) {
return atoi(string(1, s[pos]).c_str());
}
int GetDigit2(const string& s, const size_t& pos) {
const char n[2] = {s[pos], '\0'};
return atoi(n);
}
for example.
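If error checking is wanted, as one of the answers above suggests, a variant that avoids atoi altogether could look like this sketch. The name digit_at and the use of -1 as an error value are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical checked variant of GetDigit: returns the digit value at pos,
// or -1 when pos is out of range or the character is not a decimal digit.
int digit_at(const std::string& s, std::size_t pos)
{
    if (pos >= s.size())
        return -1;
    const char c = s[pos];
    if (c < '0' || c > '9')
        return -1;
    return c - '0';   // the characters '0'..'9' are guaranteed to be contiguous
}
```

Because only a single character is inspected, no null terminator is needed and nothing beyond s[pos] is ever read.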