Declaring and initializing very large arrays on the heap


I have an array of about 66,000 elements, with each element being a POD struct of integral data types. This array is constant and will never change, so my original thought was to just put it in as a constant global variable.
I declared it in a header as extern and initialized it in a cpp file, like (obviously simplified here):
const PODStruct bigArray[] =
{
{1,2,3,4} , {1,2,3,5} , /* ... */
};
with some editing in a text editor so it wasn't just one continuous line.
EDIT: I was reminded that global variables are of course not stack-allocated, so I removed the paragraph that was here. However, what if I would still rather have the data in a vector?
Since C++11 allows the same syntax for std::vector initialization, I thought I could make a simple edit and have a vector instead. However, when I compile it with MSVC++ 2013, it reports that I've hit a compiler limit. I looked through the implementation limits suggested by the C++ standard and MSVC++ 2013's deviations from them, but nothing seemed to be the direct cause. I'm guessing it has to do with how the initializer-list syntax is actually implemented.
I can get the array itself into a vector using the constructor in the answer here: How to initialize std::vector from C-style array?
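For reference, the iterator-range constructor approach looks roughly like this. PODStruct's four int fields, the two sample rows and makeTable are hypothetical placeholders for the real ~66,000-entry table; only the std::vector(first, last) constructor and std::begin/std::end are standard.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the question's POD struct (field layout assumed).
struct PODStruct {
    int a, b, c, d;
};

// Constant table with static storage duration (the real one is much larger).
const PODStruct bigArray[] = {
    {1, 2, 3, 4},
    {1, 2, 3, 5},
};

// Copies the array into a vector with the iterator-range constructor.
// std::begin/std::end deduce the bounds from the array type, so no
// hand-written element count is needed.
std::vector<PODStruct> makeTable() {
    return std::vector<PODStruct>(std::begin(bigArray), std::end(bigArray));
}
```

The vector is a copy, so the data does end up in memory twice: once in the array and once on the heap.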
But then I'd have the data in memory twice, right? It's not on the stack as I originally feared, and it isn't that big, so it's not a huge deal, but it seems like a sloppy solution.
I'm thinking I could create a class with a default constructor and declare and initialize the typed-out table in there. The constructor can then build the vector from that array, and the class's only member could be the vector.
I could then create a global instance of that class, right? That would behave much like the global array I had. If I wanted to get away from that, is the best approach to create an instance first thing in main and pass it to the functions and methods that need the table?
Should I even want to get away from that? Despite its size, this data is along the lines of PI = 3.14.
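A minimal sketch of the class idea, assuming the same hypothetical PODStruct layout. LookupTable, lookupTable and sumOfD are illustrative names, not from the question. The raw array is a static local so that the large table is not built on the stack, and a function-local static instance is shown as one alternative to a plain global: it is constructed on first use, which avoids initialization-order problems between translation units.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct PODStruct {  // hypothetical field layout
    int a, b, c, d;
};

// The typed-out array lives inside the constructor and is copied into
// the class's only data member.
class LookupTable {
public:
    LookupTable() {
        // static: the large array is not placed on the stack.
        static const PODStruct raw[] = {
            {1, 2, 3, 4},
            {1, 2, 3, 5},
            // ... remaining entries ...
        };
        data_.assign(std::begin(raw), std::end(raw));
    }
    const PODStruct& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<PODStruct> data_;
};

// Accessor for a single shared instance, constructed on first call.
const LookupTable& lookupTable() {
    static const LookupTable table;
    return table;
}

// Functions that need the table can still take it explicitly.
int sumOfD(const LookupTable& t) {
    int sum = 0;
    for (std::size_t i = 0; i < t.size(); ++i) sum += t[i].d;
    return sum;
}
```

Passing the table by const reference, as sumOfD does, keeps the dependency visible and makes functions easier to test with a smaller table.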

Your idea of storing your "huge constant array" as a compile-time generated constant is OK; that's what I'd do.
If you try to move all this to a vector or other variant of heap-allocated array, then you'd simply duplicate the data, since the initialization data resides in your executable image anyway.
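To illustrate using the constant array directly, here is a sketch that runs a standard algorithm over it in place, so no heap copy is made. The struct layout, the sample rows and findByD are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

struct PODStruct {  // hypothetical field layout
    int a, b, c, d;
};

const PODStruct bigArray[] = {
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {2, 0, 0, 7},
};

// Searches the const array in place through std::begin/std::end;
// returns nullptr when no entry matches.
const PODStruct* findByD(int d) {
    const PODStruct* it = std::find_if(std::begin(bigArray), std::end(bigArray),
                                       [d](const PODStruct& p) { return p.d == d; });
    return it == std::end(bigArray) ? nullptr : it;
}
```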
To work around the (idiotic) MSVC 2013 compiler limit, here's what I'd try:
Switch to the MSVC 2010 compiler. In MSVC 2013 you can set the project's "Platform Toolset" property to the Visual Studio 2010 toolset, provided VS 2010 is installed.
Try to redefine your data type. For instance, instead of an array of structs, try an array of (constant) pointers to structs. All of this should still be generated at compile time.
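The "array of constant pointers to structs" suggestion might look like the following sketch; entry0, entry1 and table are hypothetical names, and whether this layout actually avoids the MSVC 2013 limit is untested here. The pointer table is initialized with address constants, so it is still static data with no run-time heap allocation.

```cpp
#include <cassert>
#include <cstddef>

struct PODStruct {  // hypothetical field layout
    int a, b, c, d;
};

// Each entry is a separate named constant...
const PODStruct entry0 = {1, 2, 3, 4};
const PODStruct entry1 = {1, 2, 3, 5};

// ...and the table is an array of constant pointers to them.
const PODStruct* const table[] = {&entry0, &entry1};

const std::size_t tableSize = sizeof table / sizeof table[0];
```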
With some effort you can most likely work around it. Good luck.

It seems to me that you have hit some unknown file-size barrier in MSVC and/or your computer hardware. Maybe try loading the data from an external file instead (preferably using mmap(2)? On Windows the counterpart is CreateFileMapping/MapViewOfFile).
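As a portable alternative to mmap(2), which is POSIX-only, here is a sketch that reads the table from a binary file with std::ifstream. The file format is an assumption: the structs stored back to back in the native layout of the machine that wrote them. saveTable, loadTable and the struct layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct PODStruct {  // hypothetical field layout
    int a, b, c, d;
};

// Writes the rows as raw bytes; returns false if any write or the close fails.
bool saveTable(const std::string& path, const std::vector<PODStruct>& rows) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(rows.data()),
              static_cast<std::streamsize>(rows.size() * sizeof(PODStruct)));
    out.close();
    return !out.fail();
}

// Reads the whole file back into a vector; returns an empty vector on error.
std::vector<PODStruct> loadTable(const std::string& path) {
    std::vector<PODStruct> rows;
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at end
    if (!in) return rows;
    std::streamoff bytes = in.tellg();  // file size, since we opened at the end
    if (bytes < 0) return rows;
    in.seekg(0);
    rows.resize(static_cast<std::size_t>(bytes) / sizeof(PODStruct));
    in.read(reinterpret_cast<char*>(rows.data()),
            static_cast<std::streamsize>(rows.size() * sizeof(PODStruct)));
    if (!in) rows.clear();
    return rows;
}
```

Because the bytes are raw struct memory, the file is only portable between builds with the same struct layout and endianness.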
Not really relevant, but since this is a huge amount of data, you could look into things like OpenCL or CUDA to let the GPU help you crunch the numbers, if that applies. It could make things a lot faster.

Related

What does "-flax-vector-conversions" exactly mean for ARM compiler?

I am trying to write a xxx.toolchain.cmake for the arm-linux-gnueabihf gcc/g++ compiler.
What confuses me is whether I should use the -flax-vector-conversions compilation flag or not. I read the doc/man page of the compiler, and it says:
-flax-vector-conversions
Allow implicit conversions between vectors with differing numbers of elements and/or incompatible element types. This option should not be used for new code.
(via https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html)
I have two points of confusion:
What does "vectors" mean in this explanation? Is there an example that illustrates this?
What does "new code" mean? Why shouldn't "new code" use this compilation option?

GCC offers vector extensions that are meant to provide a way to access SIMD instructions in a machine-independent way (as opposed to intrinsics). This involves special vector types defined with __attribute__((vector_size(n))) to help the compiler understand the packed multiple-element data types that SIMD instructions use. Note that this has nothing to do with C++'s std::vector container.
Consider the following code:
typedef short eight_short __attribute__((vector_size(16)));
typedef int four_int __attribute__((vector_size(16)));
eight_short v1;
four_int v2;
void foo(void) {
v2 = v1;
}
Here four_int and eight_short are vectors of the corresponding number of elements and types. They are both 16 bytes and thus suitable to store in a 128-bit SIMD register. Assigning one to the other is clearly meant to "reinterpret" (aka bit-cast), but it also violates type safety. Presumably older versions of the compiler used to accept such code, and there may be code like this out there, but the compiler authors want to discourage it. So such code now causes an error by default, but they provide the option -flax-vector-conversions that you can use when compiling old code like this to suppress the error.
"New code" means code you are writing for the first time, where you have a choice as to how to write it. For such code you are most likely expected to write the cast explicitly, v2 = (four_int)v1; in C or v2 = reinterpret_cast<four_int>(v1); in C++, and not use -flax-vector-conversions. Then the compiler will flag any place where you forgot to cast (since it could be a bug where you actually meant something else).
If you're compiling legacy code, your best bet would be to first try building without this option. If it builds successfully, then the option is not needed, so don't use it. If it gets errors about conversions of vector types, you could consider using this option, or else rewrite the code with explicit casts where needed.
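A small sketch of the explicit-cast style, compiled as C++ with GCC or Clang (both support vector_size). It uses a C-style cast between same-size vector types, which GCC documents for vector extensions and which works in C++ as well as C. as_four_int and same_bytes are hypothetical helpers; the point is that the cast keeps the 16 bytes unchanged rather than converting element values.

```cpp
#include <cassert>
#include <cstring>

// Same vector types as in the example above (GCC/Clang extension).
typedef short eight_short __attribute__((vector_size(16)));
typedef int four_int __attribute__((vector_size(16)));

// The conversion is written out, so this compiles without
// -flax-vector-conversions. Casting between vector types of the same
// size reinterprets the bits; it does not convert each element.
four_int as_four_int(eight_short v) {
    return (four_int)v;
}

// True if both vectors hold exactly the same 16 bytes.
bool same_bytes(eight_short a, four_int b) {
    return std::memcmp(&a, &b, sizeof a) == 0;
}
```

The resulting int values depend on the target's endianness, which is one more reason to make such reinterpretations visible in the source.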

Visual Studio 2012 C++ arrays initialization using { }

I've just started programming in Visual Studio 2012 Express, and from the beginning I'm having problems with arrays.
The environment says that this code is invalid:
int a[10] = {5,1,8,9,7, 2,3,11, 20,15};
First I had to declare that this array has a fixed size using the fixed keyword, but after that the compiler still wants a ; after a[10]. Filling up this array one number at a time would be a waste of time. Is it possible to work around this? I couldn't find any solution on Google, so I decided to post my problem here.

There's no fixed keyword in C++; there is one in C#.
The code you posted is perfectly valid in VS2012 Ultimate (and probably also Express)
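To confirm the declaration is ordinary C++, here it is unchanged in a small translation unit; sumOfA is a hypothetical helper added only to use the array.

```cpp
#include <cassert>

// The declaration from the question, unchanged: valid C++ (and valid C).
int a[10] = {5, 1, 8, 9, 7, 2, 3, 11, 20, 15};

// Sums the elements; the range-based for loop takes its bound from the array type.
int sumOfA() {
    int sum = 0;
    for (int x : a) sum += x;
    return sum;
}
```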
From the above I'd conclude that you picked the wrong project type and are trying to compile C++ code in a C# project.
Another reason I think so is the following error, which you get in a C# project if you try to compile the snippet above:
error CS0650: Bad array declarator: To declare a managed array the
rank specifier precedes the variable's identifier. To declare a fixed
size buffer field, use the fixed keyword before the field type.
which refers exactly to the fixed keyword you were trying to use.
Short story: you're trying to compile C++ code in a C# project. Paste that code into a C++ project, not a C# one. Those are two different languages.

Maybe it's too late, but you can use std::array for fixed-size arrays:
#include <array>
std::array<int, 5> ary { 1,2,3,4,5 };
This is a fixed-size array.
As Marco A. mentioned, there is no "fixed" keyword in C++.
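One practical benefit: std::array carries its size in the type, so it survives being passed to a function by reference, unlike a C array that decays to a pointer. countAbove is a hypothetical example function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Counts elements greater than threshold; the array's size is part of
// its type, so the loop needs no separate length parameter.
std::size_t countAbove(const std::array<int, 5>& values, int threshold) {
    std::size_t n = 0;
    for (int v : values)
        if (v > threshold) ++n;
    return n;
}
```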

C++ : how to make sure all variables are initialized?

Recently I had lots of trouble with a non initialized variable.
In Java, the default value of a variable is null, so an exception is likely to be thrown if the uninitialized variable is used. If I understand correctly, in C++ the variable is initialized with whatever data happens to be in memory. That means the program is likely to run, and it might be hard to even know there is something wrong with it.
What would be the clean way to deal with this? Is there a good programming habit that would reduce the risk? In my case, the variable was declared in the header file and should have been initialized in the cpp file, which is an example of the kind of thing that makes errors more likely.
thx
Edit after receiving a few answers:
My apologies, my question was not specific enough.
The answer suggesting compiler flags that warn about uninitialized variables will be useful.
But there are rare cases where a variable cannot be initialized at the beginning, because its value depends on the behavior of your system.
In the header file:
double learnedValue;
In the cpp file:
/* code that has nothing to do with learnedValue
...
*/
learnedValue = a*b*c; // values of a, b and c computed in the code above
/*code making use of learned value
...
*/
What happened is that I forgot the line "learnedValue = a*b*c".
But the program still seemed to work, just with learnedValue holding whatever was in memory when it was declared.
In Java, such an error is not an issue, because the code making use of the learned value is likely to crash or throw an exception (at least you get to know what was wrong).
In C++, you can apparently be happy and never find out there is a problem at all. Or is there a way?

Please make sure you have appropriate warning levels set when compiling your program.
Compilers can warn when they detect that an uninitialized variable is used (this mostly applies to local variables).
On g++, the -Wall option enables many warnings (not all of them), including -Wuninitialized; -Wextra turns on more.
On Visual Studio, you might have to use warning level 4 (/W4).
There are also static code analysis tools available.
Cppcheck is one such tool, available for free.

You should not define a variable in a header (only declare it). Otherwise you will get other errors when you include the header in several .cpp files.
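When actually defining a variable, you can also give it an initial value (like 0). A variable at namespace scope, like this one, is zero-initialized anyway, but local variables are not. In C++ it is also common to defer the definition of (local) variables until you have a value to assign to them.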
In the header file
extern double learnedValue;   // note the extern: a declaration, not a definition
In the cpp file
double learnedValue = 0;
/* code that has nothing to do with learnedValue
...
*/
learnedValue = a*b*c; // values of a, b and c computed in the code above
/*code making use of learned value
...
*/
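The "defer the definition" advice might look like this for a local variable. computeLearnedValue and useLearnedValue are hypothetical names standing in for the elided code that computes a, b and c and then uses the result.

```cpp
#include <cassert>

// Hypothetical helper standing in for the computation from the question.
double computeLearnedValue(double a, double b, double c) {
    return a * b * c;
}

double useLearnedValue(double a, double b, double c) {
    // Defined only once a value exists, so it can never be read
    // uninitialized; const also prevents accidental reassignment later.
    const double learnedValue = computeLearnedValue(a, b, c);
    return learnedValue * 2.0;  // "code making use of learned value"
}
```

With this shape, forgetting the computation is a compile error (the variable would not exist) rather than a silent bug.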

You can initialize variables at the point where you define them.

C++11 allows you to initialize non-static data members inside the class definition. If your compiler doesn't implement that yet, the constructor's member initializer list is the place to do it.
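A minimal sketch of C++11 default member initializers; Learner is a hypothetical class. Any constructor that does not mention a member in its initializer list uses the value given at the member's declaration.

```cpp
#include <cassert>

class Learner {
public:
    Learner() = default;                                      // uses the defaults below
    explicit Learner(double start) : learnedValue_(start) {}  // overrides one default

    double learnedValue() const { return learnedValue_; }
    const int* buffer() const { return buffer_; }

private:
    double learnedValue_ = 0.0;     // default member initializer
    const int* buffer_ = nullptr;   // pointers start out null instead of garbage
};
```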

C# initializes variables to default values, but C++ does not, so using an uninitialized pointer is undefined behavior: it may crash, or it may appear to work. You should make a habit of initializing all the member variables in the class constructor.

Fortran90 and size of arrays created in C++

I'm trying to call some Fortran 90 code from a C++ main program. The Fortran subroutine takes an array of double (call it X) as a parameter, then uses size(X) in many places in the code. I call the routine with a C array created with
double *x = new double[21];
but when I print the result of size(X) in the Fortran code I get 837511505, or some other big number.
I can modify the Fortran code, so the worst case is to rewrite the function and pass the size as a parameter. But I'd rather not do that.
Does anyone know if there's a way I can create the C array in such a way that the Fortran routine can figure out its size?

This is an implementation-specific feature. Many implementations (RSX and OpenVMS, for example) define a structure for passing a pointer to the data as well as a description of the dimensions, types, etc. Other implementations pass no such thing unless the external declaration explicitly invokes a mechanism to generate a descriptor. Most others provide no such mechanism.
Without knowing which implementation is in use:
a) read the compiler's documentation
b) have the compiler generate assembly, and inspect it to see what it expects

Intel F95 uses an array descriptor structure, which, apart from the array pointer, also stores the bounds and dimension information. size() gets the information from the descriptor.
Since you're passing only a pointer from C, no descriptor information is available, so size() returns gibberish.
Generally, you're in the rough territory of mixed-language programming, where arrays and structures are often a programmer's pain. The Intel compiler user documentation has a separate section about C<=>F95 mixed calling.
In particular, read about interfaces and binding (BIND(C) and the ISO_C_BINDING module, standardized in Fortran 2003), a nice feature that helps with inter-language calls.
The good news, C<=>F95 calling works very well once you stick to the conventions.

I personally use a ton of mixed code calling from C++ into Fortran 90/95/2003. I typically use gfortran as my compiler, and to avoid this issue it is common practice to always pass the size of the arrays. This even allows you to change the shape. Consider a two-dimensional array containing x,y points:
double* x = new double[2*21];
real(8),intent(in),dimension(2,21)::x
This is a very handy feature and then allows you to use the size intrinsic. The answers about compiler specifics are correct. To make your code usable with most compilers, you should pass the length explicitly in multi-language interfaces.
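The C++ side of "always pass the size" might look like the sketch below. fortran_sum and sumViaFortran are hypothetical: in a real build fortran_sum would be a Fortran subroutine or function declared with bind(C) and taking x(n) plus an integer n by reference, and the C++ body here is only a stub so the sketch compiles and runs on its own. Using std::vector<double> instead of new double[21] also removes the need for delete[].

```cpp
#include <cassert>
#include <vector>

// Stub standing in for a Fortran routine with C binding. In a mixed build
// you would keep only the declaration and link the Fortran object instead.
extern "C" double fortran_sum(const double* x, const int* n) {
    double s = 0.0;
    for (int i = 0; i < *n; ++i) s += x[i];
    return s;
}

// Passes the length alongside the pointer, so the Fortran side can declare
// dimension(n) and size(x) gives the right answer.
double sumViaFortran(const std::vector<double>& values) {
    int n = static_cast<int>(values.size());
    return fortran_sum(values.data(), &n);  // Fortran passes by reference by default
}
```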

why does C++ forbid the declaration of a parameter with no type?

I would like to have the following method as a generic method for any array,
int arrayLength(`anyType` array[])
{
return sizeof(array) / sizeof(array[0]);
}
However it appears C++ doesn't allow any ambiguity of types at all,
why is this, and how should I go about getting around it?

Because arguments have to be pushed onto the stack and then popped back off, and the size of one type is, in general, not equal to the size of another.
If the size of the types being passed on the stack between functions is not fixed or known in advance, how can the compiler compile a function?
The solutions to this problem, as others have noted, are templates and macros. Both generate code at compile time, which the compiler then compiles in turn. That appears to "solve" the problem, but it really only hides it from you by offloading the work onto the compiler.
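The template solution for the question's arrayLength can be sketched like this: taking the array by reference keeps its bound N in the parameter type, so the length is deduced at compile time, and passing a pointer fails to compile instead of silently returning sizeof(pointer) / sizeof(element).

```cpp
#include <cassert>
#include <cstddef>

// Works for an array of any element type T and any length N.
template <typename T, std::size_t N>
constexpr std::size_t arrayLength(const T (&)[N]) {
    return N;
}
```

Since C++17, std::size in <iterator> provides the same thing for arrays and containers.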

In Visual C++ there's a _countof macro that does the same thing. It's implemented as a template for C++ compilation and as a plain macro for C. The C++ version errors out if used on a pointer (as opposed to a true array); the C version does not.

I think what you're really asking is "Why does C++ insist on static typing?"
The answer: because it's easier to write a compiler that generates small, fast programs if the language uses static typing. And that's the purpose of C++: creating small, fast programs whose complexity would be relatively unmanageable if written in C.
When I say "small", I'm including the size of any required runtime libraries.
