Size of an Array.... in C/C++? - c++

Okay so you have an array A[]... that is passed to you in some function, say with the following function prototype:
void foo(int A[]);
Okay, as you know it's kind of hard to find the size of that array without knowing some sort of ending variable or knowing the size already...
Well here is the deal though. I have seen some people figure it out on a challenge problem, and I don't understand how they did it. I wasn't able to see their source code of course, that is why I am here asking.
Does anyone know how it would even be remotely possible to find the size of that array?? Maybe something like what the free() function does in C??
What do you think of this??
template<typename E, int size>
int ArrLength(E(&)[size]){return size;}
void main()
{
int arr[17];
int sizeofArray = ArrLength(arr);
}

The signature of that function is not that of a function taking an array, but rather a pointer to int. You cannot obtain the size of the array within the function, and will have to pass it as an extra argument to the function.
If you are allowed to change the signature of the function there are different alternatives:
C/C++ (simple):
void f( int *data, int size ); // function
f( array, sizeof array/sizeof array[0] ); // caller code
C++:
template <int N>
void f( int (&array)[N] ); // Inside f, size N embedded in type
f( array ); // caller code
C++ (through a dispatcher):
void f( int *array, int size ); // Actual function, as per option 1; declared first so the dispatcher can find it
template <int N>
void f( int (&array)[N] ) { // Dispatcher
f( array, N );
}
f( array ); // Compiler processes the type as per 2

You cannot do that. Either you have a convention to signal the end of the array (e.g. that it is made of non-zero integers followed by a 0), or you transmit the size of the array (usually as an additional argument).
If you use the Boehm garbage collector (which has a lot of benefits; in particular you allocate with GC_malloc and friends and don't need to free memory explicitly), you could use the GC_size function to get the size of a GC_malloc-ed memory zone, but standard malloc doesn't have this feature.
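The sentinel convention mentioned above could look roughly like this. It is only a sketch: the names lengthUntilZero and sentinelExample are made up for illustration, and it assumes the caller really does terminate the data with a 0.

```cpp
#include <cstddef>

// Counts elements up to, but not including, the first 0.
// Assumes the caller guarantees that a 0 terminator is present;
// without one, this reads past the end of the array.
std::size_t lengthUntilZero(const int* data) {
    std::size_t n = 0;
    while (data[n] != 0) {
        ++n;
    }
    return n;
}

// Example: four non-zero values followed by the terminating 0.
std::size_t sentinelExample() {
    const int A[] = {4, 8, 15, 16, 0};
    return lengthUntilZero(A);
}
```

The obvious cost is that 0 can no longer appear as a real value in the data.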

You're asking what we think of the following code:
template<typename E, int size>
int ArrLength(E(&)[size]){return size;}
void main()
{
int arr[17];
int sizeofArray = ArrLength(arr);
}
Well, void main has never been standard, neither in C nor in C++.
It's int main.
Regarding the ArrLength function, a proper implementation does not work for local types in C++98. It does work for local types by C++11 rules. But in C++11 you can write just end(a) - begin(a).
The implementation you show is not proper: it should absolutely not have an int template argument. Make that a ptrdiff_t. For example, in 64-bit Windows the type int is still 32-bit.
Finally, as general advice:
Use std::vector and std::array.
One relevant benefit of this approach is that it avoids throwing away the size information, i.e. it avoids creating the problem you're asking about. There are also many other advantages. So, try it.
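As a rough illustration of that advice, here is the question's foo() reworked to take a std::vector. The return value and the helper fooExample are assumptions added only to make the sketch checkable.

```cpp
#include <cstddef>
#include <vector>

// foo() rewritten to take a std::vector, so the size travels with the data.
std::size_t foo(const std::vector<int>& A) {
    return A.size();
}

// Example: a vector of 17 value-initialized ints.
std::size_t fooExample() {
    std::vector<int> A(17);
    return foo(A);
}
```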

The first element could be a count, or the last element could be a sentinel. That's about all I can think of that could work portably.
In new code, for container-agnostic code prefer passing two iterators (or pointers in C) as a much better solution than just passing a raw array. For container-specific code use the C++ containers like vector.
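A minimal sketch of the two-iterator style, assuming C++11 for std::begin and std::end. The function names are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Container-agnostic: works with raw pointers and with container iterators.
template <typename Iterator>
std::size_t countElements(Iterator first, Iterator last) {
    return static_cast<std::size_t>(std::distance(first, last));
}

// With a raw array, std::begin/std::end yield plain pointers.
std::size_t rawArrayExample() {
    int arr[17] = {};
    return countElements(std::begin(arr), std::end(arr));
}

// The same function works unchanged for a std::vector.
std::size_t vectorExample() {
    std::vector<int> v(5);
    return countElements(v.begin(), v.end());
}
```

Note that std::begin(arr) only works where arr is still an array; once it has decayed to a pointer inside foo(int A[]), the end position has to be passed in by the caller.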

No you can't. Your prototype is equivalent to
void foo(int * A);
there is obviously no size information. Also implementation dependent tricks can't help:
the array variable can be allocated on the stack or be static, so there is no information provided by malloc or friends
if allocated on the heap, a user of that function is not forced to call it with the first element of an allocation.
e.g the following are valid
int B[22];
foo(B);
int * A = new int[33];
foo(A + 25);

This is something that I would not suggest doing, however if you know the address of the beginning of the array and the address of the next variable/structure defined, you could subtract the address. Probably not a good idea though.

An array allocated at compile time probably has information on its size in the debug information of the executable. Moreover, one could search the code for all the addresses corresponding to compile-time allocated variables and assume the size of the array is the difference between its starting address and the next closest starting address of any variable.
For a dynamically allocated variable it should be possible to get its size from the heap data structures.
It is hacky and system dependent, but it is still a possible solution.

One estimate is as follows: if you have for instance an array of ints but know that they are between (stupid example) 0..80000, the first array element that's either negative or larger than 80000 is potentially right past the end of the array.
This can sometimes work because the memory right past the end of the array (I'm assuming it was dynamically allocated) won't have been initialized by the program (and thus might contain garbage values), but might still be part of the allocated pages, depending on the size of the array. In other cases it will crash or fail to provide meaningful output.

All of the other answers are probably better, i.e. you either have to pass the length of the array or terminate it with a special byte sequence.
The following method is not portable, but it works for me in VS2005:
int getSizeOfArray( int* ptr )
{
int size = 0;
void* ptrToStruct = ptr;
long adr = (long)ptrToStruct;
adr = adr - 0x10;
void* ptrToSize = (void*)adr;
size = *(int*)ptrToSize;
size /= sizeof(int);
return size;
}
This is entirely dependent on the memory model of your compiler and system so, again, it is not portable. I bet there are equivalent methods for other platforms. I would never use this in a production environment, merely stating this as an alternative.

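You can use this, but only where A is still a real array in scope (inside foo(int A[]) it is a pointer, and the expression gives the wrong answer): int n = sizeof(A) / sizeof(A[0]);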

Related

Pascal and Delphi Arrays to C/C++ Arrays

In pascal and delphi, arrays have their lengths stored at some offset in memory from the array's pointer. I found that the following code works for me and it gets the length of an array:
type PInt = ^Integer; //pointer to integer.
Function Length(Arr: PInt): Integer;
var
Ptr: PInt;
Begin
Ptr := Arr - sizeof(Integer);
Result := Ptr^ + 1;
End;
Function High(Arr: PInt): Integer; //equivalent to length - 1.
Begin
Result := (Arr - sizeof(Integer))^;
End;
I translated the above code into C++ and it thus becomes:
int Length(int* Arr)
{
int* Ptr = (Arr - sizeof(int));
return *reinterpret_cast<char*>(Ptr) + 1;
}
int High(int* Arr)
{
return *(Arr - sizeof(int));
}
Now assuming the above are equivalent to the Pascal/Delphi versions, how can I write a struct to represent a Pascal Array?
In other words, how can I write a struct such that the following is true:
Length(SomeStructPointer) = SomeStructPointer->size
I tried the following:
typedef struct
{
unsigned size;
int* IntArray;
} PSArray;
int main()
{
PSArray ps;
ps.IntArray = new int[100];
ps.size = 100;
std::cout<<Length((int*) &ps); //should print 100 or the size member but it doesn't.
delete[] ps.IntArray;
}
In Pascal and Delphi, arrays have their lengths stored at
some offset in memory from the array's pointer.
This is not so. The entire premise of your question is wrong. The Delphi functions you present do not work in general. They might work for dynamic arrays. But it is certainly not the case that you can pass a pointer to an array and be sure that the length is stored before it.
And in fact the Delphi code in the question does not even work for dynamic arrays. Your pointer arithmetic is all wrong. You read a value 16 bytes to the left rather than 4 bytes. And you fail to check for nil. So it's all a bit of a disaster really.
Moving on to your C++ code, you are reaping the result of this false premise. You've allocated an array. There's no reason to believe that the int to the left of the array holds the length. Your C++ code is also very broken. But there's little point attempting to fix it because it can never be fixed. The functions you define cannot be implemented. It is simply not the case that an array is stored adjacent to a variable containing the length.
What you are looking for in your C++ code is std::vector. That offers first class support for obtaining the length of the container. Do not re-invent the wheel.
If interop is your goal, then you need to use valid interop types. And Delphi managed dynamic arrays do not qualify. Use a pointer to an array, and a separately passed length.
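On the C++ side, "a pointer to an array, and a separately passed length" might look something like the sketch below. The function sumInts and its signature are hypothetical, not an existing API, and the Delphi-side declaration is not shown.

```cpp
#include <cstddef>

// Hypothetical interop entry point: the caller passes the data pointer and
// the element count separately, so no hidden length prefix is needed.
extern "C" long long sumInts(const int* data, std::size_t length) {
    long long total = 0;
    for (std::size_t i = 0; i < length; ++i) {
        total += data[i];
    }
    return total;
}

// Example call from C++, where the array is still in scope and sizeof works.
long long sumIntsExample() {
    const int values[] = {1, 2, 3, 4};
    return sumInts(values, sizeof values / sizeof values[0]);
}
```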
Why? I can see no good reason to do this. Use idiomatic Pascal in Pascal, use idiomatic C++ in C++. Using sizeof like that also ignores padding, and so your results may vary from platform to platform.
If you want a size, store it in the struct. If you want a non-member length function, just write one that works with the way you wrote the struct. Personally, I suggest using std::array if the size won't change and std::vector if it will. If you absolutely need a non-member length function, try this:
template<typename T>
auto length(const T& t) -> decltype(t.size()) {
return t.size();
}
That will work with both std::array and std::vector.
PS: If you're doing this for "performance reasons", please profile your code and prove that there is a bottleneck before doing something that will become a maintenance hazard.

Defining Array C/C++

What is the difference between these two array definitions and which one is more correct and why?
#include <stdio.h>
#define SIZE 20
int main() {
// definition method 1:
int a[SIZE];
// end definition method 1.
// definition method 2:
int n;
scanf("%d", &n);
int b[n];
// end definition method 2.
return 0;
}
I know if we read size, variable n, from stdin, it's more correct to define our (block of memory we'll be using) array as a pointer and use stdlib.h and array = malloc(n * sizeof(int)), rather than declaring it as int array[n], but again why?
It's not "more correct" or "less correct". It either is xor isn't correct. In particular, this works in C, but not in C++.
You are declaring a dynamically sized array. A more portable way to get one is:
int *arr; // int is used just for simplicity
arr = malloc(n * sizeof(int)); // size of the element type, not of a pointer
This is because variable-length arrays are only allowed in C99; you can't use them in C89/C90.
In (pre-C99) C and C++, all types are statically sized. This means that arrays must be declared with a size that is both constant and known to the compiler.
Now, many C++ compilers offer dynamically sized arrays as a nonstandard extension, and C99 explicitly permits them. So int b[n] will most likely work if you try it. But in some cases, it will not, and the compiler is not wrong in those cases.
If you know SIZE at compile-time:
int ar[SIZE];
If you don't:
std::vector<int> ar;
I don't want to see malloc anywhere in your C++ code. However, you are fundamentally correct and for C that's just what you'd do:
int* ptr = malloc(sizeof(int) * SIZE);
/* ... */
free(ptr);
In C++, variable-length arrays are available only as a compiler extension (GCC supports them, for example); they let you write:
int ar[n];
but I've had issues where VLAs were disabled but GCC didn't successfully detect that I was trying to use them. Chaos ensues. Just avoid it.
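For the run-time-sized case in C++, a sketch along these lines avoids both VLAs and malloc. The function name makeSquares and the values it stores are made up for the example.

```cpp
#include <cstddef>
#include <vector>

// Builds an array whose length n is known only at run time.
std::vector<int> makeSquares(std::size_t n) {
    std::vector<int> b(n);                 // n elements, value-initialized to 0
    for (std::size_t i = 0; i < n; ++i) {
        b[i] = static_cast<int>(i * i);
    }
    return b;                              // storage is released automatically
}
```

The elements live on the heap, so a large n does not threaten the stack the way int b[n] can.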
Q1: The first definition is a fixed-size array declaration. It is perfectly correct.
It applies when the size is known in advance, so there is no comparison to make with a VLA or malloc().
Q2: Which is better when taking the size as input from the user: a VLA or malloc()?
VLA: VLAs are limited by the environment's bounds on the size of automatic allocation. Automatic variables are usually allocated on the stack, which is relatively small. The limit is platform specific. Also, VLAs exist only in C99 and above. In return, they make declaring multidimensional arrays somewhat easier.
Malloc: It allocates from the heap, so it is definitely better for large sizes. For multidimensional arrays, pointers are involved, so the implementation is a bit more complex.
Check http://bytes.com/topic/c/answers/578354-vla-feature-c99-vs-malloc
I think that method 1 could be a little bit faster, but both of them are correct in C.
In C++ the first will work, but if you want to use the second you should use:
int size = 5;
int * array = new int[size];
and remember to delete it:
delete [] array;
I think it gives you more option to use while coding.
If you use malloc or another dynamic allocation to get a pointer, you will typically access elements like *(p+n) (p[n] works too), whereas with an array you write array[n]. Also, a pointer obtained this way must be freed, but an array does not need to be freed.
And in C++, we can define a user-defined class to do such things, and the STL has std::vector, which does what arrays do and much more.
Both are correct. The declaration you use depends on your code.
The first declaration, i.e. int a[SIZE];, creates an array with a fixed size of 20 elements.
It is helpful when you know the exact size of the array that will be used in the code. For example, you are generating
the table of a number n up to its 20th multiple.
The second declaration allows you to make an array of the size that you desire.
It is helpful when you need an array of a different size each time the code is executed. For example, you want to generate the Fibonacci series up to n. In that case, the size of the array must be n for each value of n. So say you have n = 5: in this case int a[20] will waste memory, because only the first five slots will be used for the Fibonacci series and the rest will be empty. Similarly, if n = 25 then your array int a[20] will be too small.
The difference if you define the array using malloc is that you can choose the size of the array dynamically, i.e. at run time, from a value your program receives while it runs.
One more difference is that arrays created using malloc are allocated on the heap, so they persist across function calls until freed, unlike local (automatic) arrays.
example-
#include<stdio.h>
#include<stdlib.h>
int main()
{
int n;
int *a;
scanf("%d",&n);
a=(int *)malloc(n*sizeof(int));
free(a); /* release the block once you are done with it */
return 0;
}

Is it valid to access a multi dimensional C++ array as one contiguous block (on heap) [duplicate]

Related thread here: Does C99 guarantee that arrays are contiguous?
Apparently answer() isn't valid below, but could be re-written to use char * or cast to int[nElements] (possibly). I'll admit I don't understand the standard references and why a contiguous block of int couldn't be accessed via int* if properly aligned.
First is the following code block valid on most C++ platforms?
void answer(int *pData, size_t nElements)
{
for( size_t i=0; i<nElements; ++i )
pData[i] = 42;
}
void random_code()
{
int arr1[1][2][3][4]; // local allocation
answer(arr1, sizeof(arr1) / sizeof(int));
int arr2[20][15];
answer(arr2, sizeof(arr2) / sizeof(int));
}
Second does answer() remain valid for all allocation types (global, local, heap(hopefully correct!))?
int g_arr[20][15]; // global
void foo() {
int (*pData)[10] = new int[50][10]; // heap allocation, at least partially
answer(&pData[0][0], 50*10);
// not even sure if delete[] will free pData correctly, but...
}
Yes, most platforms will indeed pack the elements of an N-dimensional array in such a way that linear addressing on a pointer to the first element will find all of the elements.
It is actually hard (as in, I cannot figure it out) to come up with a standards compliant implementation that does not do this, as an array of arrays must pack said arrays, and the size of the array of arrays is the size of each sub array times the number of arrays of arrays. There does not seem to be room for it not to work. Even the ordering of each element seems to be well defined.
Despite this, no clause in the standard I am aware of lets you actually reinterpret the pointer to the first element of a multi dimensional array as a pointer to an array of the product. Many clauses talk about how you can only access the elements of the array, or one-past-the-end.
The code in answer() is fine. The code in random_code() is misusing answer() (or not calling the overload of answer() shown in the question). It should be:
void random_code()
{
int arr1[1][2][3][4];
answer(&arr1[0][0][0][0], sizeof(arr1) / sizeof(int));
int arr2[20][15];
answer(&arr2[0][0], sizeof(arr2) / sizeof(int));
}
The code in answer() expects an int *; you were passing an int (*)[2][3][4] and an int (*)[15], neither of which looks like int *.
This remains valid for other allocation mechanisms that allocate a single contiguous block of data, such as the ones shown.
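The layout claim can be checked empirically with a sketch like the following. It compares addresses as integers (which assumes a flat address space and that std::uintptr_t is available) rather than indexing past the end of a row, so it tests the layout without relying on the disputed pointer arithmetic. The names isRowMajorContiguous and contiguityExample are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if element [i][j] sits exactly (i * COLS + j) elements
// after the first element, for every valid i and j.
template <typename T, std::size_t ROWS, std::size_t COLS>
bool isRowMajorContiguous(T (&arr)[ROWS][COLS]) {
    static_assert(sizeof(T[ROWS][COLS]) == ROWS * COLS * sizeof(T),
                  "an array of arrays has no padding between rows");
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&arr[0][0]);
    for (std::size_t i = 0; i < ROWS; ++i) {
        for (std::size_t j = 0; j < COLS; ++j) {
            const std::uintptr_t addr =
                reinterpret_cast<std::uintptr_t>(&arr[i][j]);
            if (addr - base != (i * COLS + j) * sizeof(T)) {
                return false;
            }
        }
    }
    return true;
}

// Example using the 20 x 15 array from the question.
bool contiguityExample() {
    int arr2[20][15] = {};
    return isRowMajorContiguous(arr2);
}
```

This only shows where the elements are stored; whether the standard permits walking them through a single int* is the separate question discussed above.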
As the previous answer said, there's a type error in your code. You're trying to pass an actual argument of type int (*)[X] for a formal argument of type int *. So to make your code work, you need a cast (or you pass the address of the first int).
C and C++ use the same memory layout for a data type regardless of which section of memory the object is allocated in, so the same code can operate on values wherever they live. So the answer to your second question is: if your code works on stack-allocated values, it is going to work with heap-allocated values too.

Memset Definition and use

What's the usefulness of the function memset()?.
Definition: Sets the first num bytes of the block of memory pointed by ptr to the
specified value (interpreted as an unsigned char).
Does this mean it hard codes a value in a memory address?
memset(&serv_addr,0,sizeof(serv_addr)) is the example that I'm trying to understand.
Can someone please explain in a VERY simplified way?
memset() is a very fast version of a relatively simple operation:
void* memset(void* b, int c, size_t len) {
char* p = (char*)b;
for (size_t i = 0; i != len; ++i) {
p[i] = c;
}
return b;
}
That is, memset(b, c, l) sets the l bytes starting at address b to the value c. It just does it much faster than the above implementation.
memset() is usually used to initialise values. For example consider the following struct:
struct Size {
int width;
int height;
};
If you create one of these on the stack like so:
struct Size someSize;
Then the values in that struct are going to be undefined. They might be zero, they might be whatever values happened to be there from when that portion of the stack was last used. So usually you would follow that line with:
memset(&someSize, 0, sizeof(someSize));
Of course it can be used for other scenarios, this is just one of them. Just think of it as a way to simply set a portion of memory to a certain value.
memset is a common way to set a memory region to 0 regardless of the data type. One can say that memset doesn't care about the data type and just sets all bytes to zero.
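To make the byte-wise behaviour concrete, here is a minimal sketch. The function names are made up for illustration, and the 0x01010101 result assumes a 4-byte unsigned int.

```cpp
#include <cstring>

// Fills every byte of an unsigned int with the byte value 1.
// The result is NOT 1: with a 4-byte unsigned int it is 0x01010101.
unsigned int fillWithOnes() {
    unsigned int value = 0;
    std::memset(&value, 1, sizeof value);
    return value;
}

// Zero is the fill value for which "all bytes zero" and "integer zero" agree.
int fillWithZero() {
    int value = 12345;
    std::memset(&value, 0, sizeof value);
    return value;
}
```

This is why memset is used almost exclusively with 0 (or with byte-sized elements such as char buffers).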
IMHO in C++ one should avoid doing memset when possible since it circumvents the type safety that C++ provides, instead one should use constructor or initialization as means of initializing. memset done on a class instance may also destroy something unintentionally:
e.g.
class A
{
public:
shared_ptr<char*> _p;
};
a memset on an instance of the above would not do a reference counter decrement properly.
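In C++ the same zeroing can usually be had through initialization instead. The sketch below assumes C++11; Size mirrors the struct from earlier in this thread, the member of A is simplified to shared_ptr<char>, and the two helper functions are hypothetical.

```cpp
#include <memory>

struct Size {
    int width;
    int height;
};

class A {
public:
    std::shared_ptr<char> p;   // manages its own state; never memset this
};

// Value-initialization zeroes both members without touching raw bytes.
Size makeZeroedSize() {
    Size s{};
    return s;
}

// shared_ptr's default constructor already leaves it empty (null).
bool defaultConstructedPointerIsNull() {
    A a;
    return a.p == nullptr;
}
```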
I guess that serv_addr is some local or global variable of some struct type -perhaps struct sockaddr- (or maybe a class).
&serv_addr is taking the address of that variable. It is a valid address, given as first argument to memset. The second argument to memset is the byte to be used for filling (zero byte). The last argument to memset is the size, in bytes, of that memory zone to fill, which is the size of that serv_addr variable in your example.
So this call to memset clears a global or local variable serv_addr containing some struct.
In practice, the GCC compiler, when it is optimizing, will generate clever code for that, usually unrolling and inlining it (actually, it is often a builtin, so GCC can generate very clever code for it).
It is nothing but setting the memory to a particular value.
Here is example code.
Memset(uint8_t *p, uint8_t V, uint8_t L): here p is the pointer to the target memory, V is the value each byte of the target buffer will be set to, and L is the length of the data.
while(L --> 0)
{
*p++ = V;
}
memset- set bytes in memory
Synopsis-
#include<string.h>
void *memset(void *s, int c, size_t n);
Description- The memset() function shall copy c (converted to an unsigned char) into each of the first n bytes of the object pointed to by s.
Return value- The memset() function shall return the value of s.

C++ Why is this passed-by-reference array generating a runtime error?

void pushSynonyms (string synline, char matrizSinonimos [1024][1024]){
stringstream synstream(synline);
vector<int> synsAux;
int num;
while (synstream >> num) {synsAux.push_back(num);}
int index=0;
while (index<(synsAux.size()-1)){
int primerSinonimo=synsAux[index];
int segundoSinonimo=synsAux[++index];
matrizSinonimos[primerSinonimo][segundoSinonimo]='S';
matrizSinonimos [segundoSinonimo][primerSinonimo]='S';
}
}
and the call..
char matrizSinonimos[1024][1024];
pushSynonyms("1 7", matrizSinonimos)
It's important for me to pass matrizSinonimos by reference.
Edit: took away the & from &matrizSinonimos.
Edit: the runtime error is:
An unhandled win32 exception occurred in program.exe [2488]
What's wrong with it
The code as you have it there - i can't find a bug. The only problem i spot is that if you provide no number at all, then this part will cause harm:
(synsAux.size()-1)
It will subtract one from 0u. That will wrap around, because size() returns an unsigned integer type. You will end up with a very big value, somewhere around 2^32 or 2^64 depending on the platform. You should change the whole while condition to
while ((index+1) < synsAux.size())
You can try looking for a bug around the call side. Often it happens there is a buffer overflow or heap corruption somewhere before that, and the program crashes at a later point in the program as a result of that.
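The wrap-around and the safer loop condition can be seen in a small sketch (the function names are made up for the example):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Counts adjacent pairs (v[i], v[i+1]) without ever computing size() - 1.
std::size_t countAdjacentPairs(const std::vector<int>& v) {
    std::size_t pairs = 0;
    for (std::size_t index = 0; index + 1 < v.size(); ++index) {
        ++pairs;
    }
    return pairs;
}

// What the original condition computes for an empty vector:
// 0 - 1 in unsigned arithmetic wraps to the largest representable value.
bool emptySizeMinusOneWraps() {
    std::vector<int> empty;
    return empty.size() - 1 ==
           std::numeric_limits<std::vector<int>::size_type>::max();
}
```

With index + 1 < v.size(), an empty vector simply produces zero iterations.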
The argument and parameter stuff in it
Concerning the array and how it's passed, i think you do it alright. Although, you still pass the array by value. Maybe you already know it, but i will repeat it. You really pass a pointer to the first element of this array:
char matrizSinonimos[1024][1024];
A 2d array really is an array of arrays. The first element of that array is an array, and a pointer to it is a pointer to an array. In that case, it is
char (*)[1024]
Even though in the parameter list you said that you accept an array of arrays, the compiler, as always, adjusts that and make it a pointer to the first element of such an array. So in reality, your function has the prototype, after the adjustments of the argument types by the compiler are done:
void pushSynonyms (string synline, char (*matrizSinonimos)[1024]);
Although it is often suggested, you cannot pass that array as a char**, because the called function needs the size of the inner dimension to address sub-dimensions at the right offsets. If the called function works with a char** and writes something like matrizSinonimos[0][1], it will try to interpret the first sizeof(char*) bytes of that array as a pointer and then dereference whatever random memory location that yields. Don't do that. It's also not relevant which size you wrote in the outer dimension of that array; it is rationalized away. Now, it's not really important to pass the array by reference. But if you want to, you have to change the whole thing to
void pushSynonyms (string synline, char (&matrizSinonimos)[1024][1024]);
Passing by reference does not pass a pointer to the first element: All sizes of all dimensions are preserved, and the array object itself, rather than a value, is passed.
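Because a reference keeps both dimensions in the type, a template can deduce them. A rough sketch, with hypothetical names and a deliberately small array:

```cpp
#include <cstddef>

// Both dimensions are deduced from the argument's type, so the function can
// range-check its indices. Returns the total number of cells.
template <std::size_t ROWS, std::size_t COLS>
std::size_t markSymmetric(char (&m)[ROWS][COLS], std::size_t a, std::size_t b) {
    if (a < ROWS && b < COLS && b < ROWS && a < COLS) {
        m[a][b] = 'S';
        m[b][a] = 'S';
    }
    return ROWS * COLS;
}

// Example with an 8 x 8 array; static keeps it off the stack.
bool markExample() {
    static char m[8][8] = {};
    const std::size_t cells = markSymmetric(m, 1, 7);
    return cells == 64 && m[1][7] == 'S' && m[7][1] == 'S';
}
```

Passing an array of any other shape still compiles here, since the template adapts; with the fixed char (&)[1024][1024] signature above, a mismatched array is rejected at compile time.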
Arrays are passed as pointers - there's no need to do a pass-by-reference to them. If you declare your function to be:
void pushSynonyms(string synline, char matrizSinonimos[][1024]);
Your changes to the array will persist - arrays are never passed by value.
The exception is probably 0xC00000FD, or a stack overflow!
The problem is that you are creating a 1 MB array on the stack, which probably is too big.
try declaring it as:
void pushSynonyms (const string & synline, char *matrizSinonimos[1024] )
I believe that will do what you want to do. The way you have it, as others have said, creates a 1MB array on the stack. Also, changing synline from string to const string & eliminates pushing a full string copy onto the stack.
Also, I'd use some sort of class to encapsulate matrizSinonimos. Something like:
class ms
{
char m_matrix[1024][1024];
public:
void pushSynonyms( const string & synline );
};
then you don't have to pass it at all.
I'm at a loss for what's wrong with the code above, but if you can't get the array syntax to work, you can always do this:
void pushSynonyms (string synline, char *matrizSinonimos, int rowsize, int colsize )
{
// for row a and column b, the code below is equivalent to
// char c = matrizSinonimos[a][b]; on the original 2D array
char c = matrizSinonimos[ a*colsize + b ];
// you could also Assert( a < rowsize && b < colsize );
}
pushSynonyms( "1 7", &matrizSinonimos[0][0], 1024, 1024 );
You could also replace rowsize and colsize with a #define SYNONYM_ARRAY_DIMENSION 1024 if it's known at compile time, which will make the multiplication step faster.
(edit 1) I forgot to answer your actual question. Well: after you've corrected the code to pass the array in the correct way (no incorrect indirection anymore), it seems most probable to me that you did not check your inputs correctly. You read from a stream, save it into a vector, but you never checked whether all the numbers you get there are actually in the correct range. (end edit 1)
First:
Using raw arrays may not be what you actually want. There are std::vector, or boost::array. The latter one is compile-time fixed-size array like a raw-array, but provides the C++ collection type-defs and methods, which is practical for generic (read: templatized) code.
And, using those classes there may be less confusion about type-safety, pass by reference, by value, or passing a pointer.
Second:
Arrays are passed as pointers, the pointer itself is passed by value.
Third:
You should allocate such big objects on the heap. The overhead of the heap-allocation is in such a case insignificant, and it will reduce the chance of running out of stack-space.
Fourth:
void someFunction(int array[10][10]);
really is:
void someFunction(int (*array)[10]);
(edit 2) Thanks to the comments: this originally said int** array, which is wrong. Hopefully I didn't screw up elsewhere.... (end edit 2)
The type information that this is a 10x10 array is lost; only the inner dimension of 10 is kept. To get what you've probably meant, you need to write:
void someFunction(int (&array)[10][10]);
This way the compiler can check that on the caller side the array is actually a 10x10 array. You can then call the function like this:
int main() {
int array[10][10] = { 0 };
someFunction(array);
return 0;
}
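Putting the heap-allocation and input-checking advice together, one possible shape is sketched below. The class SynonymMatrix, its members, and kDim are all hypothetical names, and storing the matrix in a flat std::vector is just one of several reasonable choices.

```cpp
#include <cstddef>
#include <vector>

const std::size_t kDim = 1024;

// A 1024 x 1024 matrix stored as one heap-allocated, row-major block,
// so the 1 MB of data never lives on the stack.
class SynonymMatrix {
public:
    SynonymMatrix() : cells_(kDim * kDim, '\0') {}

    // Returns false instead of writing when an index is out of range.
    bool mark(std::size_t a, std::size_t b) {
        if (a >= kDim || b >= kDim) {
            return false;
        }
        cells_[a * kDim + b] = 'S';
        cells_[b * kDim + a] = 'S';
        return true;
    }

    // Caller must pass indices below kDim.
    char at(std::size_t a, std::size_t b) const {
        return cells_[a * kDim + b];
    }

private:
    std::vector<char> cells_;
};

// Example: mark the pair (1, 7) and reject an out-of-range index.
bool matrixExample() {
    SynonymMatrix m;
    return m.mark(1, 7) && m.at(7, 1) == 'S' && !m.mark(5000, 0);
}
```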