Finding the length of a character array in c++ [duplicate] - c++

This question already has answers here:
How do I use arrays in C++?
(5 answers)
Closed 8 years ago.
I have a character array of the form
char x[] = "asdasdadsadasdas";
int p = sizeof(x)/sizeof(*x);
gives me correct result but when I pass this as an argument in another function like
void X(char* a)
int i = sizeof(a)/sizeof(*a)
and I call it
X(x)
p and i are not equal. How is that possible?

When the char array gets passed to a function accepting a char*, the array is said to 'decay' to a pointer. This conversion is automatic, and the length information which was previously statically available to the compiler is lost.
Possible solutions are:
Pass the size of the array as an additional parameter.
Make the char array 0-terminated (as a string literal is) and use strlen. This is the common way of operating on strings in C and C++. It introduces a runtime cost that is not strictly necessary, but it provides some convenience (the same API for statically and dynamically sized strings) and error resilience (the length is always correct).
Use templates to capture the length of the array. This is explained here.
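The three options above could be sketched roughly as follows. The function names are made up for illustration and are not from the question.

```cpp
#include <cstddef>
#include <cstring>

// Inside this function the parameter is only a pointer, so sizeof reports
// the size of a char* (commonly 4 or 8 bytes), not the size of the array.
std::size_t size_seen_through_pointer(char* a) {
    return sizeof(a);
}

// Option 1: the caller passes the element count as an extra parameter.
std::size_t length_from_parameter(const char*, std::size_t n) {
    return n;
}

// Option 2: rely on the terminating '\0' and scan for it at run time.
std::size_t length_from_terminator(const char* a) {
    return std::strlen(a);
}

// Option 3: a reference-to-array parameter lets the compiler deduce N.
template <std::size_t N>
std::size_t length_from_template(const char (&)[N]) {
    return N;
}
```

Note that options 1 and 3 report the array size including the terminator, while strlen reports the number of characters before it.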

void X(char* a)
int i = sizeof(a)/sizeof(*a)
Here, a is a pointer and *a is a char. So sizeof(a) won't return the size of the array, but the size of the pointer to the first element in the array.
In order to find the length of the string (which isn't the same as "the size of the array"), you have a few options.
One, don't use a char* at all, but a std::string (may create a temporary):
void X (const std::string& s)
{
size_t i = s.length();
}
Two, scan the string for the null-terminator (linear complexity):
void X (const char* p)
{
size_t i = strlen (p);
}
Three, use a template (needlessly complex code):
template <size_t N> void X (const char (&arr)[N])
{
size_t i = N;
}
Each of the above has its own set of cons. But this is all best avoided if you take a broader look at your program and see where you can make improvements. Here's one that stands out to me:
char x[] = "asdasdadsadasdas";
C-style arrays present their own problems and are best avoided altogether. Instead of using a C-style array, use a tool from the StdLib designed for just this problem:
std::string x = "asdasdadsadasdas";

sizeof(char *)
Gives you the size of a pointer. Eight bytes on my system.
char x[] = "fred";
sizeof(x);
Returns 5. The size of the string with the null termination.
void x(char * c) {
sizeof (*c);
}
Returns the size of a char, which is always 1.
This is true no matter what the length or original type of the array passed to void x() is. Note that sizeof is evaluated at compile time, and at compile time the compiler cannot normally know the length of the array the function will be passed. For a run-time evaluation of string length, as said above, use strlen if you actually want the string's length in characters. (Or a template, but that's probably a more unusual solution.)
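To make the compile-time versus run-time distinction concrete, here is a small sketch. The names fred and runtime_length are only illustrative.

```cpp
#include <cstddef>
#include <cstring>

char fred[] = "fred";

// sizeof is a compile-time constant, so it can be checked with static_assert.
static_assert(sizeof(fred) == 5, "four characters plus the terminating '\\0'");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// strlen walks the characters at run time and does not count the terminator.
std::size_t runtime_length(const char* c) {
    return std::strlen(c);
}
```

For a buffer that is larger than its contents, such as char buf[32] = "fred";, sizeof(buf) is 32 while runtime_length(buf) is 4.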

Related

Supply maximum string length to a function

When defining a function which has a char array as one of its arguments, is it recommended to always pass the maximum array length as well as an extra argument? For example:
int somefunction(char* inputarray, int maxsizeofarray)
C-style strings are usually terminated by the \0 character. Most, if not all, of the functions from the standard library, such as strcpy or strcmp expect (and honor!) this convention. I'd suggest that any new function you write adhere to the same convention.
Having said that, in C++ (as opposed to C), I wouldn't use char* at all. Instead, I'd use the standard std::string class.
It totally depends on the function. For example, when you are traversing a char array from the last element to the first, it is wise to pass the size. But that applies to C only; in C++, you have std::string.
It's normal to pass the array size when the function is writing to the array, or when it's problematic for the function to determine the size thereof (because it's not necessarily following a convention such as having a NUL terminator). For example:
int read(char* buffer, size_t n);
char *strncat(char *dest, const char *src, size_t n);
struct BinaryBlob
{
BinaryBlob(const char* buffer, size_t n);
...
};
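As a sketch of why a writing function wants the size, here is a hypothetical helper (copy_truncated is not a standard function) that never writes past the end of the destination buffer:

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dest, writing at most dest_size bytes including the
// terminating '\0'. Returns the number of characters copied.
std::size_t copy_truncated(char* dest, std::size_t dest_size, const char* src) {
    if (dest_size == 0) {
        return 0;  // no room even for the terminator
    }
    std::size_t n = std::strlen(src);
    if (n > dest_size - 1) {
        n = dest_size - 1;  // truncate so the terminator still fits
    }
    std::memcpy(dest, src, n);
    dest[n] = '\0';
    return n;
}
```

Without the dest_size parameter the function would have no way to know where the caller's buffer ends.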

What is the difference between int and char arrays?

What is the difference between int and char arrays below:
int main()
{
int numbers[] = {2,1,3};
char letter[] = {'a','b','\0'};
cout<< numbers<<endl;
cout<< letter<<endl;
}
Output:
0x22ff12 // an address
ab
Why isn't the 213 displayed ?
I know the name of an array will point to the address of its first element, but why
does a char array display different behavior?
There is no operator<< overload that takes arrays, exactly, so the arguments you pass (e.g. numbers and letter) undergo array-to-pointer conversion, to int* and char* respectively. The int* is then implicitly converted to const void*.
There is an overload of operator<<() that takes a const void*, and another that takes a const char*. When you call:
cout<< numbers<<endl;
the const void* version is matched, but when you call:
cout<< letter<<endl;
the const char* version is matched.
In the const void* version, the pointer is displayed, while with the const char* version, the string is displayed up to the null terminator.
When you print an array with cout it will print the base address of the array.
The exception is char arrays, for which operator<< is overloaded to print them as C strings.
If you want to print the elements of the int array, you need to do it element-by-element.
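A minimal sketch of the three cases, writing into a std::ostringstream so the result can be inspected. The helper names are made up for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// A char array decays to char*, which selects the C-string overload.
std::string stream_chars(const char* s) {
    std::ostringstream out;
    out << s;
    return out.str();
}

// Any other pointer selects the const void* overload, which prints an address.
std::string stream_address(const void* p) {
    std::ostringstream out;
    out << p;
    return out.str();
}

// To see the int values, stream them one element at a time.
std::string stream_ints(const int* values, std::size_t count) {
    std::ostringstream out;
    for (std::size_t i = 0; i < count; ++i) {
        if (i != 0) {
            out << ' ';
        }
        out << values[i];
    }
    return out.str();
}
```

Passing a char array through static_cast<const void*> is also the usual way to print its address instead of its contents.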
The reason is that operator<< is overloaded for const char*, and that overload prints each character until it encounters \0.
There is no such overload for int[N] that prints each element. Instead, when you write cout << numbers, it invokes the operator<< that is overloaded for const void*, which prints the address.
However, if you overload operator<< for T[N], then you can print it like that as well.
Here is a simple illustration:
template<typename T, size_t N>
std::ostream & operator<<(std::ostream & out, const T (&a)[N])
{
for(size_t i = 0 ; i < N ; ++i)
out << a[i] << ' ';
return out;
}
int main()
{
int numbers[] = {2,1,3};
char letter[] = {'a','b','\0'};
cout<< numbers<<endl;
cout<< letter<<endl;
}
Output:
2 1 3
a b
Demo : http://ideone.com/O4T9N
In C, and therefore in C++, a string is often represented by an array of chars terminated by a 0. Therefore an overloaded operator<< is provided for the class std::ostream, of which std::cout is an instance, which prints the char* as a string. There is no such common use of int arrays, nor any convention by which the operator would 'know' how many elements to output, so the pointer to the array is matched to the version of operator<< which outputs any other pointer by printing its address.
char arrays are special because there is an overload for operator << that displays the content as a string.
All other arrays will have the address displayed by default.
In C/C++, an array is converted to a pointer to its first element in most expressions. A pointer holds the address where a value is stored. Therefore, if you print numbers, you will get the address where the first value (2) is stored in memory.
char* is an exception, as it will behave as a string when you try to print it.
Your code mostly refers to C. An array of char is the de facto representation of strings in C. On the other hand, an array in C also converts to a pointer to the memory cell (the address of the cell) that holds the first element of the array.
So, when you print out an array of characters, you in fact print out a string (because C treats it that way). When you're printing an array of integers, you're printing out the address of the first element of the array.
numbers is converted to a pointer here. In most expressions, arrays in C++ decay to pointers: numbers[2] just means "the value at the memory address numbers + 2", so you're outputting the memory address of the first element in numbers.
There is no reason the compiler should know where your int[] array ends, but tradition and standard libraries dictate that C strings are null terminated char[] arrays. There is no such tradition or library support for null terminated int[] arrays.
There are C++ pretty printers templates available if you need this functionality. I vaguely recall that one employs an array's bound when the type actually knows the bound, i.e. your code still won't work since you use [] not [3].
Just fyi, your code cannot be fixed by replacing the [] with a [3] inside the STL, although perhaps operator<< could be overloaded.
A char array contains characters.
It can be initialized like:
char arr[4]={'a','b','c','\0'};
char arr[4]={"abc"};
An integer array contains integers.
It can be initialized like:
int arr[4]={1,2,3,4};

Verify type of string (e.g. literal, array, pointer) passed to a function [duplicate]

This question already has answers here:
Restrict passed parameter to a string literal
(6 answers)
Closed 6 years ago.
I have a function, which takes a character array and its size:
void store_string (const char *p, size_t size); // 'p' & 'size' stored in map
The function is wrapped by a macro as:
#define STORE_STRING(X) store_string(X, sizeof(X))
My problem is that I want to somehow prohibit or inform user that only string literal should be passed to this function. Because, if any pointer or local array are stored inside the map then they might go out of scope and create a havoc!
Is there any compile-time way (preferred) or at least run-time way to do it?
You can know it compile time, by simply changing your macro to:
#define STORE_STRING(X) store_string("" X, sizeof(X))
Usage:
char a[] = "abcd", *p;
STORE_STRING(a); // error
STORE_STRING(p); // error
STORE_STRING("abcd"); // ok
This does not work, sizeof(char *) returns the size of the pointer not the memory allocated or string size.
strlen(x) returns the size of a string by looking for the '\0'.
If you need/want a non-macro solution there is one template trick that can help. You can have a function signature that looks like this:
template<std::size_t len>
void store_string( char const (&str)[len] ) { store_string_internal( str, len ); }
template<std::size_t len>
void store_string( char (&str)[len] ) = delete; // rejects non-const arrays at compile time
The first form accepts string literals and calls the target function. The second form prevents non-const character arrays from being passed. And yeah, don't offer a char const* version.
This doesn't guarantee that the string is a string literal, but the one syntax needed to bypass it is extremely rare (I've never used it).
Otherwise the macro version from iammilind is nice.

Passing an array as a function parameter in C++

In C++, arrays cannot be passed simply as parameters. Meaning if I create a function like so:
void doSomething(char charArray[])
{
// if I want the array size
int size = sizeof(charArray);
// NO GOOD, will always get the size of a pointer (4 bytes on a 32-bit system)
}
I have no way of knowing how big the array is, since I have only a pointer to the array.
What way do I have, without changing the method signature, to get the size of the array and iterate over its data?
EDIT: just an addition regarding the solution. If the char array, specifically, was initialized like so:
char charArray[] = "i am a string";
then the \0 is already appended to the end of the array. In this case the answer (marked as accepted) works out of the box, so to speak.
Use templates. This technically doesn't fit your criteria, because it changes the signature, but calling code does not need to be modified.
void doSomething(char charArray[], size_t size)
{
// do stuff here
}
template<size_t N>
inline void doSomething(char (&charArray)[N])
{
doSomething(charArray, N);
}
This technique is used by Microsoft's Secure CRT functions and by STLSoft's array_proxy class template.
Without changing the signature? Append a sentinel element. For char arrays specifically, it could be the null-terminating '\0' which is used for standard C strings.
void doSomething(char charArray[])
{
char* p = charArray;
for (; *p != '\0'; ++p)
{
// if '\0' happens to be valid data for your app,
// then you can (maybe) use some other value as
// sentinel
}
int arraySize = p - charArray;
// now we know the array size, so we can do some thing
}
Of course, then your array itself cannot contain the sentinel element as content.
For other kinds of (i.e., non-char) arrays, it could be any value which is not legal data. If no such value exists, then this method does not work.
Moreover, this requires co-operation on the caller side. You really have to make sure that the caller reserves an array of arraySize + 1 elements, and always sets the sentinel element.
However, if you really cannot change the signature, your options are rather limited.
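For a non-char array the same idea might look like the sketch below. The function name and the choice of sentinel are assumptions made for illustration.

```cpp
#include <cstddef>

// Counts the elements that precede the sentinel. This only works if the
// caller has stored `sentinel` after the last valid element; otherwise the
// loop reads past the end of the array.
std::size_t count_until_sentinel(const int* values, int sentinel) {
    std::size_t n = 0;
    while (values[n] != sentinel) {
        ++n;
    }
    return n;
}
```

A caller holding three values would then declare something like int data[] = {4, 8, 15, -1}; and pass -1 as the sentinel, which is why -1 can no longer appear as real data.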
In general when working with C or low-level C++, you might consider retraining your brain to never consider writing array parameters to a function, because the C compiler will always treat them as pointers anyway. In essence, by typing those square brackets you are fooling yourself in thinking that a real array is being passed, complete with size information. In reality, in C you can only pass pointers. The function
void foo(char a[])
{
// Do something...
}
is, from the point of view of the C compiler, exactly equivalent to:
void foo(char * a)
{
// Do something
}
and obviously that nekkid char pointer contains no length information.
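The equivalence can be checked at compile time. In this sketch the three function names are made up, and the functions are only declared, never called.

```cpp
#include <type_traits>

void with_brackets(char a[]);
void with_bound(char a[10]);  // the 10 is ignored in a parameter declaration
void with_pointer(char* a);

// All three declarations have exactly the same function type.
static_assert(std::is_same<decltype(with_brackets), decltype(with_pointer)>::value,
              "char a[] is adjusted to char*");
static_assert(std::is_same<decltype(with_bound), void(char*)>::value,
              "char a[10] is adjusted to char* as well");
```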
If you're stuck in a corner and can't change the function signature, consider using a length prefix as suggested above. A non-portable but compatible hack is to specify the array length in a size_t field located before the array, something like this:
void foo(char * a)
{
int cplusplus_len = reinterpret_cast<std::size_t *>(a)[-1];
int c_len = ((size_t *)a)[-1];
}
Obviously your caller needs to create the arrays in the appropriate way before passing them to foo.
Needless to say this is a horrible hack, but this trick can get you out of trouble in a pinch.
It actually used to be a quite common solution to pass the length in the first element of the array. This kind of structure is often called BSTR (for “BASIC string”), even though this also denoted different (but similar) types.
The advantage over the accepted solution is that determining the length using a sentinel is slow for large strings. The disadvantage is obviously that this is a rather low-level hack that respects neither types nor structure.
In the form given below it also only works for strings of length <= 255. However, this can easily be expanded by storing the length in more than one byte.
void doSomething(char* charArray)
{
// Cast unnecessary but I prefer explicit type conversions.
std::size_t length = static_cast<std::size_t>(static_cast<unsigned char>(charArray[0]));
// … do something.
}
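The caller's side of this convention is not shown above. One possible sketch, with made-up helper names and strings capped at 255 characters as described:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Builds a buffer whose first byte holds the length and whose remaining
// bytes hold the characters. Longer input is truncated to 255 characters.
std::vector<char> make_length_prefixed(const char* text) {
    std::size_t n = std::strlen(text);
    if (n > 255) {
        n = 255;
    }
    std::vector<char> buffer(n + 1);
    buffer[0] = static_cast<char>(static_cast<unsigned char>(n));
    std::memcpy(buffer.data() + 1, text, n);
    return buffer;
}

// Reads the length back in constant time, without scanning for a sentinel.
std::size_t prefixed_length(const char* charArray) {
    return static_cast<std::size_t>(static_cast<unsigned char>(charArray[0]));
}
```

The cast through unsigned char matters because plain char may be signed, in which case lengths above 127 would otherwise read back as negative values.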
If it's null-terminated, strlen() would work.
You can't determine the size from charArray alone. That information is not automatically passed to the function.
Of course if it's a null-terminated string you can use strlen(), but you have probably considered that already!
Consider passing a std::vector<char> & parameter, or a pair of pointers, or a pointer plus a size parameter.
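Two of these alternatives, sketched with illustrative function names:

```cpp
#include <cstddef>
#include <vector>

// A vector carries its own size, so no extra parameter is needed.
std::size_t size_of_vector(const std::vector<char>& v) {
    return v.size();
}

// A pair of pointers marks the half-open range [first, last).
std::size_t size_of_range(const char* first, const char* last) {
    return static_cast<std::size_t>(last - first);
}
```

For a local array char a[5], the pointer pair would be passed as size_of_range(a, a + 5).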
This is actually more C than C++; in C++ you'd probably rather use a std::vector. However, in C there's no way to know the size of an array passed to a function. The compiler will allow you to do a sizeof only if the array was declared in the current scope with a known size (EDIT: by "with a size", I mean that it was either declared with an integer size or initialized at declaration, as opposed to being passed as a parameter, thanks for the downvote).
The common solution in C is to pass a second parameter describing the number of elements in the array.
EDIT:
Sorry, missed the part about not wanting to change the method signature. Then there's no solution except as described by others as well, if there's some data that is not allowed within the array, it can be used as a terminator (0 in C-strings, -1 is also fairly common, but it depends on your actual data-type, assuming the char array is hypothetical)
In order for a function to know the number of items in an array that has been passed to it, you must do one of two things:
Pass in a size parameter
Put the size information in the array somehow.
You can do the latter in a few ways:
Terminate it with a NULL or some other sentinel that won't occur in normal data.
Store the item count in the first entry if the array holds numbers.
Store a pointer to the last entry if the array contains pointers.
Try using strlen(charArray);
from the cstring header. It returns the number of characters, including spaces, up to (but not including) the terminating '\0'.
You are guaranteed to receive 4 on a 32-bit PC, and that's the correct answer, for the reason explained here and here.
The short answer is, you are actually testing the sizeof a pointer rather than an array, because "the array is implicitly converted, or decays, into a pointer. The pointer, alas, doesn't store the array's dimension; it doesn't even tell you that the variable in question is an array."
Now that you are using C++, boost::array is a better choice than raw arrays. Because it's an object, you won't lose the dimension info.
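Since C++11 the standard library offers std::array, which follows the same idea as boost::array. A minimal sketch (element_count is a made-up name):

```cpp
#include <array>
#include <cstddef>

// The size is part of the type, so it survives being passed to a function.
template <std::size_t N>
std::size_t element_count(const std::array<char, N>& a) {
    return a.size();
}
```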
I think you can do this:
size_t size = sizeof(array)/sizeof(array[0]);
PS: I think that the title of this topic isn't correct, too.
Dude you can have a global variable to store the size of the array which will be accessible throughout the program. At least you can pass the size of the array from the main() function to the global variable and you will not even have to change the method signature as the size will be available globally.
Please see example:
#include<...>
using namespace std;
int size; //global variable
//your code
void doSomething(char charArray[])
{
//size available
}

C++ strings: [] vs. *

Been thinking, what's the difference between declaring a variable with [] or * ? The way I see it:
char *str = new char[100];
char str2[] = "Hi world!";
.. should be the main difference, though I'm unsure if you can do something like
char *str = "Hi all";
.. since the pointer would then refer to static data, and I don't know if it can?
Anyways, what's really bugging me is knowing the difference between:
void upperCaseString(char *_str) {};
void upperCaseString(char _str[]) {};
So, would be much appreciated if anyone could tell me the difference? I have a hunch that both might be compiled down the same, except in some special cases?
Ty
Let's look into it (for the following, note char const and const char are the same in C++):
String literals and char *
"hello" is an array of 6 const characters: char const[6]. As every array, it can convert implicitly to a pointer to its first element: char const * s = "hello"; For compatibility with C code, C++ allows one other conversion, which would be otherwise ill-formed: char * s = "hello"; it removes the const!. This is an exception, to allow that C-ish code to compile, but it is deprecated to make a char * point to a string literal. So what do we have for char * s = "foo"; ?
"foo" -> array-to-pointer -> char const* -> qualification-conversion -> char *. A string literal is read-only, and won't be allocated on the stack. You can freely make a pointer point to them, and return that one from a function, without crashing :).
Initialization of an array using a String literal
Now, what is char s[] = "hello"; ? It's a whole other thing. That will create an array of characters, and fill it with the String "hello". The literal isn't pointed to. Instead it is copied to the character-array. And the array is created on the stack. You cannot validly return a pointer to it from a function.
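The difference between pointing at a literal and copying it into an array can be sketched like this. Both function names are illustrative.

```cpp
#include <cstring>

// Returning a pointer to a string literal is fine: the literal has static
// storage duration and outlives the function call.
const char* greeting() {
    return "hello";
}

// The array is a separate, modifiable copy of the literal's characters.
bool copy_is_independent() {
    char s[] = "hello";
    s[0] = 'j';  // changes the copy only
    return std::strcmp(s, "jello") == 0 &&
           std::strcmp(greeting(), "hello") == 0;
}
```

Returning s itself from copy_is_independent would be the invalid case described above, because the array stops existing when the function returns.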
Array Parameter types.
How can you make your function accept an array as parameter? You just declare your parameter to be an array:
void accept_array(char foo[]);
but you omit the size. Actually, any size would do it, as it is just ignored: The Standard says that parameters declared in that way will be transformed to be the same as
void accept_array(char * foo);
Excursion: Multi Dimensional Arrays
Substitute char by any type, including arrays itself:
void accept_array(char foo[][10]);
accepts a two-dimensional array, whose last dimension has size 10. The first element of a multi-dimensional array is its first sub-array of the next dimension! Now, let's transform it. It will be a pointer to its first element again. So, actually it will accept a pointer to an array of 10 chars: (remove the [] in head, and then just make a pointer to the type you see in your head then):
void accept_array(char (*foo)[10]);
As arrays implicitly convert to a pointer to their first element, you can just pass a two-dimensional array to it (whose last dimension size is 10), and it will work. Indeed, that's the case for any n-dimensional array, including the special case of n = 1.
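A small sketch of the two-dimensional case. The function total_bytes is made up for illustration. Only the first dimension is lost, so each row keeps its full type char[10].

```cpp
#include <cstddef>
#include <type_traits>

// rows is really a char (*)[10]; the number of rows must be passed separately.
std::size_t total_bytes(char rows[][10], std::size_t row_count) {
    return row_count * sizeof(rows[0]);  // sizeof(rows[0]) is sizeof(char[10])
}

// The parameter type is adjusted exactly as described above.
static_assert(std::is_same<decltype(total_bytes),
                           std::size_t(char (*)[10], std::size_t)>::value,
              "char rows[][10] is adjusted to char (*)[10]");
```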
Conclusion
void upperCaseString(char *_str) {};
and
void upperCaseString(char _str[]) {};
are the same: the array parameter in the second is adjusted to the pointer to char written out in the first. But note if you want to pass a string literal to that (say it doesn't change its argument), then you should change the parameter to char const* _str so you don't do deprecated things.
The three different declarations let the pointer point to different memory segments:
char* str = new char[100];
lets str point to the heap.
char str2[] = "Hi world!";
puts the string on the stack.
char* str3 = "Hi world!";
points to the data segment.
The two declarations
void upperCaseString(char *_str) {};
void upperCaseString(char _str[]) {};
are equal, the compiler complains about the function already having a body when you try to declare them in the same scope.
Okay, I had left two negative comments. That's not really useful; I've removed them.
The following code initializes a char pointer, pointing to the start of a dynamically allocated memory portion (in the heap.)
char *str = new char[100];
This block can be freed using delete [].
The following code creates a char array in the stack, initialized to the value specified by a string literal.
char str2[] = "Hi world!";
This array can be modified without problems, which is nice. So
str2[0] = 'N';
cout << str2;
should print Ni world! to the standard output, making certain knights feel very uncomfortable.
The following code creates a char pointer in the stack, pointing to a string literal... The pointer can be reassigned without problems, but the pointed block cannot be modified (this is undefined behavior; it segfaults under Linux, for example.)
char *str = "Hi all";
str[0] = 'N'; // ERROR!
The following two declarations
void upperCaseString(char *_str) {};
void upperCaseString(char _str[]) {};
look the same to me, and in your case (you want to uppercase a string in place) it really doesn't matter.
However, all this begs the question: why are you using char * to express strings in C++?
As a supplement to the answers already given, you should read through the C FAQ regarding arrays vs. pointers. Yes it's a C FAQ and not a C++ FAQ, but there's little substantial difference between the two languages in this area.
Also, as a side note, avoid naming your variables with a leading underscore. That's reserved for symbols defined by the compiler and standard library.
Please also take a look at http://c-faq.com/aryptr/aryptr2.html. The C-FAQ might prove to be an interesting read in itself.
The first option dynamically allocates 100 bytes.
The second option statically allocates 10 bytes (9 for the string + nul character).
Your third example shouldn't work - you're trying to statically-fill a dynamic item.
As to the upperCaseString() question, once the C-string has been allocated and defined, you can iterate through it either by array indexing or by pointer notation, because an array is really just a convenient way to wrap pointer arithmetic in C.
(That's the simple answer - I expect someone else will have the authoritative, complicated answer out of the spec :))