I'm currently trying to write a program that processes a fairly large file (~16GB) and then performs analysis upon it. Ideally, I would do the data processing in C/C++ (I already have an efficient implementation written) and then do the analysis in Matlab to make use of its efficient algorithms and ease of use.
My natural inclination is to use MEX to call routines written in C at the beginning of the program and then continue in Matlab. What I want to know (and what I for whatever reason can't seem to find online) is the way in which memory would be shared if I were to use this method:
Say that I were to make a single large heap-allocated array in C to pass to Matlab. Would this array need to be copied in memory before my Matlab functions could work on it, or would Matlab be able to access the original array directly, with no extra copying? I assume and hope that this will work in the second way, but I would rather make sure before spending my time and effort.
Memory can indeed be shared if you use the functions provided by Matlab for this purpose. For example, to create a matrix that is passed back to Matlab, you can use something like this:
plhs[0] = mxCreateNumericArray(2, out_dims, mxDOUBLE_CLASS, mxREAL);
double *result = mxGetPr(plhs[0]);
That creates the array in place for Matlab to use later. You fill it in through the result pointer, and since the memory was allocated with the mx functions, Matlab will free it when appropriate.
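For illustration, here is a minimal sketch of a complete MEX gateway built around those two lines; the output size and the fill loop are placeholders for whatever the real C processing code produces:

#include "mex.h"

/* Minimal sketch of a MEX gateway. The output buffer is allocated with the
 * mx functions and filled in place, so Matlab uses it without an extra copy. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize out_dims[2] = {1000000, 1};              /* example size */

    /* Allocate the output array with Matlab's allocator. */
    plhs[0] = mxCreateNumericArray(2, out_dims, mxDOUBLE_CLASS, mxREAL);
    double *result = mxGetPr(plhs[0]);

    /* Fill the Matlab-owned buffer directly; no separate C array, no copy. */
    for (mwSize i = 0; i < out_dims[0]; ++i) {
        result[i] = (double)i;                      /* placeholder for real processing */
    }
    /* Matlab frees plhs[0] when the corresponding variable is cleared. */
}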
I'm about to use the fftw3 library for a specific task.
I have a high-load packet stream with variable frame sizes, which I currently process like this:
while (thereIsStillData) {
    copyDataToInputArray();
    createFFTWPlan();
    performExecution();
    destroyPlan();
}
Since creating plans is rather expensive, I want to modify my code to something like this:
while (thereIsStillData) {
    if (inputArraySizeDiffers()) destroyOldAndCreateNewPlan();
    copyDataToInputArray(); // e.g. `memcpy` or `std::copy`
    performExecution();
}
Can I do this? In other words, does a plan contain information derived from the data itself, such that a plan created for one array of size N would give incorrect results when executed on a different array of the same size N?
The fftw_execute() function does not modify the plan presented to it, and can be called multiple times with the same plan. Note, however, that the plan contains pointers to the input and output arrays, so if copyDataToInputArray() involves creating a different input (or output) array then you cannot afterwards use the old plan in fftw_execute() to transform the new data.
FFTW does, however, have a set of "New-array Execute Functions" that could help here, supposing that the new arrays satisfy some additional similarity criteria with respect to the old (see linked docs for details).
The docs do recommend:
If you are tempted to use the new-array execute interface because you want to transform a known bunch of arrays of the same size, you should probably go use the advanced interface instead
but that's talking about transforming multiple arrays that are all in memory simultaneously, and arranged in a regular manner.
Note, too, that if your variable frame size is not too variable -- that is, if it is always one of a relatively small number of choices -- then you could consider keeping a separate plan in memory for each frame size instead of recomputing a plan every time one frame's size differs from the previous one's.
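As an illustration, here is a minimal sketch of that plan-cache idea for one-dimensional complex-to-complex transforms; the reuse of a single pair of input/output buffers and the choice of FFTW_MEASURE are assumptions of the example:

#include <fftw3.h>
#include <map>
#include <cstring>

// Cache of plans keyed by frame size, so a plan is created at most once per size.
// A single pair of in/out buffers is reused; the new-array execute functions
// (e.g. fftw_execute_dft) would also work if each frame had its own buffers
// with the required alignment.
struct PlanCache {
    std::map<int, fftw_plan> plans;
    fftw_complex *in;
    fftw_complex *out;

    explicit PlanCache(int max_n) {
        in  = fftw_alloc_complex(max_n);
        out = fftw_alloc_complex(max_n);
    }

    ~PlanCache() {
        for (auto &kv : plans) fftw_destroy_plan(kv.second);
        fftw_free(in);
        fftw_free(out);
    }

    // Transform one frame of n complex samples; the result is left in `out`.
    void execute(const fftw_complex *frame, int n) {
        auto it = plans.find(n);
        if (it == plans.end()) {
            // The expensive step, done once per distinct frame size.
            fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);
            it = plans.insert({n, p}).first;
        }
        std::memcpy(in, frame, sizeof(fftw_complex) * n);  // copyDataToInputArray()
        fftw_execute(it->second);                          // performExecution()
    }
};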
DISCLAIMER: I am at a very entry level in C++ (or any language)... I searched for similar questions but found none.
I am trying to write a simple program which should perform some operations on an array as big as int pop[100000000][4] (10^8 rows); however, my compiler crashes even for an int pop[130000][4] array... Is there any way out? Am I using the wrong approach?
(For now I am limiting myself to a very simple program; my aim is to generate random numbers in array[][0] every "turn" to simulate a population and work with that.)
Thanks for your time and attention
An array of 130000 * 4 ints is going to be large (roughly 2 MiB with 4-byte ints), and is likely not something you want stored locally (in reality, on the stack, where it generally won't fit).
Instead you can use dynamic allocation to get heap storage; the recommended approach is a vector of vectors:
std::vector<std::vector<int>> pop(130000, std::vector<int>(4));
pop[12000][1] = 9; // expected syntax
Vectors are dynamic, so they can be resized and modified later with the usual member functions; a short sketch follows.
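For instance (the sizes and values here are arbitrary):

#include <vector>

int main() {
    std::vector<std::vector<int>> pop(130000, std::vector<int>(4));

    pop[12000][1] = 9;                        // element access, as above
    pop.resize(200000, std::vector<int>(4));  // grow to 200000 rows later
    pop.push_back({1, 2, 3, 4});              // append one more row
    pop.shrink_to_fit();                      // optionally give unused memory back
}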
If you're a new programmer trying to write a simple program, you should also consider whether you really need about 2 MiB of ints.
I'm using the Armadillo library in C++ for storing / calculating large matrices. It is my understanding that one should store large arrays / matrices dynamically (on the heap).
Suppose I declare a matrix
mat X;
and set the size to be (say) 500 rows, 500 columns with random entries:
X.randn(500,500);
Does Armadillo store X dynamically (i.e. on the heap) despite not using new or delete? The reason I ask is because it seems Armadillo allows me to declare a variable as:
mat::fixed<n_rows, n_cols>
which, I quote: "is generally faster than dynamic memory allocation, but the size of the matrix can't be changed afterwards (directly or indirectly)".
Regardless of the above -- should I use this:
mat A;
A.set_size(n-1,n-1);
or this:
mat *A = new mat;
(*A).set_size(n-1,n-1);
where n is between 1000 and 100000 and not known in advance.
Does Armadillo store X dynamically (i.e. on the heap) despite not using new or delete?
Yes. There will be some form of new or delete in the library code. You just don't notice it from the outside.
The reason I ask is because it seems Armadillo allows me to declare a variable as (mat::fixed ...)
You'd have to look into the source code to see what's going on exactly here. My guess is that it has some kind of internal logic that decides how to deal with things based on size. You would normally use mat::fixed for small matrices, though.
Following that, you should use
mat A(n-1,n-1);
if you know the size at that point already. In some cases,
mat A;
A.set_size(n-1,n-1);
might also be okay.
I can't think of a good reason to use your second option with the mat * pointer. First of all, libraries like Armadillo handle their memory allocations internally, and the developers take great care to get it right. Also, even if the memory code in the library were broken, your idea of new mat wouldn't fix it: you would allocate memory for a mat object, but that object itself is rather small. The big part is probably hidden behind something like a member variable T* data in the class mat, and you cannot influence how this is allocated from the outside.
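For completeness, here is a small sketch of the plain-value approach with no new/delete in user code; the size and the use of trace() are just examples:

#include <armadillo>

int main() {
    const arma::uword n = 1000;      // example size

    arma::mat A(n - 1, n - 1);       // element storage is allocated on the heap by Armadillo
    A.randn();                       // fill with random normal entries

    double t = arma::trace(A);       // use it like any other value
    (void)t;

    // A's heap storage is released automatically when A goes out of scope.
}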
I initially missed your comment on the size of n. As Mikhail says, dealing with 100000x100000 matrices will require much more care than simply thinking about the way you instantiate them.
I was told about memory-mapped files as a possible way to get fast file I/O to store a 2D game tile map. The game will have frequent updates to the data, where I will know the row/col to update, so I can get direct access that way in the array. However, looking at some examples I don't understand how this would work.
Does anyone have a small example of creating, reading, and writing to a memory-mapped file of a struct, where the result would be a 1D array so I can access it for my game as map[row * MAX_ROW + col].tileID = x;, for example? Boost or Win32 would be fine; I don't have a preference, but I find the examples online to be somewhat confusing and often have a hard time converting them to my desired result.
There's an example here that looks somewhat understandable: Problem with boost memory mapped files: they go to disk instead of RAM
Note the .data() member that gives you a char*; you could cast this to a pointer to an array of whatever you want, given enough memory, and go wild (a rough sketch follows).
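As a rough sketch of that idea using Boost.Iostreams' mapped_file (the Tile struct, file name, and map dimensions below are made up for the example, and the file-creation details may need adjusting for your setup):

#include <boost/iostreams/device/mapped_file.hpp>
#include <cstddef>

struct Tile {
    int tileID;
    int flags;
};

int main() {
    namespace io = boost::iostreams;

    const std::size_t MAX_ROWS = 256, MAX_COLS = 256;

    // Create (or open) a file large enough to hold the whole map and map it read/write.
    io::mapped_file_params params;
    params.path          = "map.dat";
    params.new_file_size = MAX_ROWS * MAX_COLS * sizeof(Tile);  // used when creating the file
    params.flags         = io::mapped_file::readwrite;

    io::mapped_file file(params);

    // Treat the mapped bytes as a flat 1D array of Tile.
    Tile *map = reinterpret_cast<Tile *>(file.data());

    std::size_t row = 10, col = 20;
    map[row * MAX_COLS + col].tileID = 42;   // the change ends up in the file

    file.close();
}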
That said, I highly suspect that memory-mapped files are the wrong solution here. Why not just load in your level using normal C++ (vector, classes, ifstreams, etc.), modify it however you like, and write it out again when you're done if you want the changes saved to disk?
I have a little problem here: I wrote C++ code to create an array, but when I set the array size to 100,000,000 or more I get an error.
This is my code:
int i = 0;
double *a = new double[n*n];  // with n*n around 100,000,000, this is ~800 MB of doubles
This part is very important for my project.
When you think you need an array of 100,000,000 elements, what you actually need is a different data structure that you probably have never heard of before. Maybe a hash map, or maybe a sparse matrix.
If you tell us more about the actual problem you are trying to solve, we can provide better help.
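For instance, if most of the 100,000,000 entries would keep a default value, a hash map keyed by the flat index can stand in for the array; a rough sketch (the names and sizes are illustrative):

#include <unordered_map>
#include <cstdint>

// Sparse stand-in for double a[n*n]: only entries that were actually written
// consume memory; everything else reads as 0.0.
struct SparseArray {
    std::unordered_map<std::uint64_t, double> values;

    double get(std::uint64_t i) const {
        auto it = values.find(i);
        return it == values.end() ? 0.0 : it->second;
    }
    void set(std::uint64_t i, double v) { values[i] = v; }
};

int main() {
    const std::uint64_t n = 10000;   // n*n = 100,000,000 logical elements
    SparseArray a;

    a.set(5 * n + 7, 3.14);          // behaves like a[5*n + 7] = 3.14
    double x = a.get(5 * n + 7);
    (void)x;
}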
In general, the only reason that would fail is lack of memory, memory fragmentation, or available address space -- you are, after all, trying to allocate 800 MB of memory. Granted, I have no idea why your system's virtual memory can't handle that, but maybe you allocated a bunch of other stuff. It doesn't matter.
Your alternatives are tricks like memory-mapped files, sparse arrays, and so forth, instead of an explicit C-style array.
If you do not have sufficient memory, you may need to use a file to store your data and process it in smaller chunks.
I don't know if IMSL provides what you are looking for; however, if you want to work on smaller chunks, you might devise an algorithm that calls IMSL functions on these small chunks and later merges the results. For example, you can do matrix multiplication by combining multiplications of sub-matrices, as in the sketch below.
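As an illustration of that last point, here is a minimal sketch of blocked matrix multiplication in plain C++ (no IMSL calls; the block size is arbitrary); in a real program, each block product could be handed to a library routine and the blocks could be streamed from disk one at a time:

#include <vector>
#include <cstddef>
#include <algorithm>

// C = A * B for n-by-n matrices stored row-major in std::vector<double>,
// computed one bs-by-bs block at a time so only small chunks are hot at once.
void blocked_multiply(const std::vector<double> &A,
                      const std::vector<double> &B,
                      std::vector<double> &C,
                      std::size_t n, std::size_t bs)
{
    std::fill(C.begin(), C.end(), 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs)
                // Multiply the (ii,kk) block of A by the (kk,jj) block of B
                // and accumulate into the (ii,jj) block of C.
                for (std::size_t i = ii; i < std::min(ii + bs, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + bs, n); ++k)
                        for (std::size_t j = jj; j < std::min(jj + bs, n); ++j)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}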