3D lookup table to discretize the volume - C++

I have a depth camera that returns measured distance values of the volume in millimeters. I need to create a 3D lookup table that stores all possible distance values for each pixel in the image, so I end up with an array of size 640x480x2048. This approach is very memory-consuming: with int in C++ it takes about 2.5 GB of RAM. Additionally, I also have some parameters for each item in the volume, so altogether it reaches the maximum capacity of my 4 GB of memory.
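For reference, a back-of-the-envelope sketch of the memory footprint (the flat-array layout and the 16-bit variant below are only illustrations, not my actual code):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // 640 x 480 pixels, 2048 possible distance values (mm), as described above.
    const std::size_t W = 640, H = 480, D = 2048;
    const std::size_t cells = W * H * D;                                    // ~629 million entries

    std::printf("entries        : %zu\n", cells);
    std::printf("as 32-bit ints : %.2f GB\n", cells * sizeof(std::int32_t) / 1e9);  // ~2.52 GB
    std::printf("as 16-bit ints : %.2f GB\n", cells * sizeof(std::int16_t) / 1e9);  // ~1.26 GB

    // A flat vector indexed as x + W * (y + H * d) would hold the table contiguously:
    // std::vector<std::int16_t> table(cells);
    return 0;
}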
My question is: is there a good way to optimally store and manage the data set described above?
P.S. Please don't suggest file storage; it doesn't fit my use case.
Thanks in advance

Related

Perform multi-scale training (YOLOv2)

I am wondering how the multi-scale training in YOLOv2 works.
In the paper, it is stated that:
The original YOLO uses an input resolution of 448 × 448. With the addition of anchor boxes we changed the resolution to 416 × 416. However, since our model only uses convolutional and pooling layers it can be resized on the fly. We want YOLOv2 to be robust to running on images of different sizes so we train this into the model. Instead of fixing the input image size we change the network every few iterations. Every 10 batches our network randomly chooses a new image dimension size. Since our model downsamples by a factor of 32, we pull from the following multiples of 32: {320, 352, ..., 608}. Thus the smallest option is 320 × 320 and the largest is 608 × 608. We resize the network to that dimension and continue training.
I don't get how a network with only convolutional and pooling layers allows inputs of different resolutions. From my experience of building neural networks, if you change the input resolution, the number of parameters of the network changes, i.e. the structure of the network changes.
So, how does YOLOv2 change this on the fly?
I read the configuration file for YOLOv2, but all I found was a random=1 statement...
If you only have convolutional layers, the number of weights does not change with the size of the 2D (spatial) part of the layers (it would only change if you also resized the number of channels).
For example (an imagined network): if you have 224x224x3 input images and a 3x3x64 convolutional layer, you have 64 different 3*3*3 convolutional filter kernels = 1728 weights. This number does not depend on the size of the image at all, since a kernel is applied at each position of the image independently. This is the most important property of convolution and convolutional layers, the reason why CNNs can go so deep, and the reason why in Faster R-CNN you can simply crop the regions out of your feature map.
If there were any fully connected layers or the like, it would not work this way, since there a bigger 2D layer dimension would lead to more connections and more weights.
In YOLOv2 there is one thing that might still look like it doesn't fit: if you double the image size in each dimension, you end up with twice the number of features in each dimension right before the final 1x1xN filter. For example, if your grid was 7x7 for the original network size, the resized network might have 14x14. But then you simply get 14x14 * B*(5+C) regression results, which works just fine.
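Here is that argument as a tiny sketch (the 224x224x3 / 3x3x64 numbers are just the imagined example from above):

#include <cstdio>

// Weights of a conv layer: kernel_h * kernel_w * in_channels * out_channels.
// The spatial size of the input never appears in this formula.
long convWeights(int kh, int kw, int inCh, int outCh) {
    return static_cast<long>(kh) * kw * inCh * outCh;
}

int main() {
    // 3x3 kernels, 3 input channels, 64 filters -> 1728 weights,
    // no matter whether the input is 224x224, 416x416 or 608x608.
    std::printf("weights: %ld\n", convWeights(3, 3, 3, 64));   // 1728
    return 0;
}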
In YOLO, if you are only using convolutional layers, the size of the output grid changes.
For example, for an input size of:
320x320, output size is 10x10
608x608, output size is 19x19
You then calculate the loss on these w.r.t. the ground-truth grid, which is adjusted in the same way.
Thus you can backpropagate the loss without adding any more parameters.
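As a small sketch of that relationship (the downsampling factor of 32 and the S*S*B*(5+C) output count come from the YOLO papers; the B and C values below are just example numbers):

#include <cstdio>

int main() {
    const int stride = 32;              // total downsampling factor of the network
    const int B = 5, C = 20;            // example: 5 anchor boxes, 20 classes

    const int sizes[] = {320, 416, 608};
    for (int size : sizes) {
        int grid = size / stride;                     // 10, 13, 19
        int outputs = grid * grid * B * (5 + C);      // number of regression values
        std::printf("%dx%d input -> %dx%d grid, %d outputs\n",
                    size, size, grid, grid, outputs);
    }
    return 0;
}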
Refer to the YOLOv1 paper for the loss function:
Loss Function from the paper
In theory you thus only need to adjust this function, which depends on the grid size and not on any model parameters, and you should be good to go.
Paper Link: https://arxiv.org/pdf/1506.02640.pdf
In the video explanation, the author mentions the same.
Time: 14:53
Video Link

Memory saving system for molecule calculations

I am currently working on an MD simulation. It stores the molecule positions in a vector. For each time step, that vector is stored for display in a second vector, resulting in
std::vector<std::vector<molecule> > data;
The size of data is <time steps> * <number of molecules> * sizeof(molecule), where sizeof(molecule) is (already reduced to) 3*sizeof(double), i.e. just the position vector. Still, I run into memory problems for larger numbers of time steps and molecules.
Is there any further possibility to decrease the amount of data? My current workflow is that I calculate all molecules first, store them, and then render them using the data of each molecule for each step; the rendering is done with Irrlicht (maybe later with Blender).
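To put rough numbers on that formula, a back-of-the-envelope sketch (the step and molecule counts here are made-up placeholders, not my actual values):

#include <cstddef>
#include <cstdio>

struct molecule {
    double x, y, z;                       // position only, as described above
};

int main() {
    static_assert(sizeof(molecule) == 3 * sizeof(double), "no padding expected");

    // Hypothetical sizes, purely for illustration.
    const std::size_t steps = 100000;
    const std::size_t molecules = 10000;

    double bytes = static_cast<double>(steps) * molecules * sizeof(molecule);
    std::printf("%.1f GB for the full trajectory\n", bytes / 1e9);   // ~24 GB
    return 0;
}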
If the trajectories are smooth, you can consider compressing the data by storing only every Nth step and reconstructing the intermediate positions by interpolation.
If the time step is small, linear interpolation will do. The best quality is provided by cubic splines; however, the computation of the spline coefficients is a global operation that you can only perform at the end and that requires extra storage (!), so you might prefer cardinal splines, which can be built locally from four consecutive positions.
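A minimal sketch of that keyframe-plus-linear-interpolation idea (the class, the names and the fixed N are mine, not from your code):

#include <cstddef>
#include <vector>

struct molecule { double x, y, z; };

// Keep only every Nth time step; reconstruct the rest on the fly while rendering.
struct Trajectory {
    std::size_t N;                                    // keyframe spacing
    std::vector<std::vector<molecule>> keyframes;     // one stored snapshot per Nth step

    // Linearly interpolated positions at an arbitrary (possibly fractional) step.
    std::vector<molecule> at(double step) const {
        double k = step / N;
        std::size_t i = static_cast<std::size_t>(k);
        if (i + 1 >= keyframes.size()) return keyframes.back();
        double t = k - i;                             // 0..1 between keyframes i and i+1

        std::vector<molecule> out(keyframes[i].size());
        for (std::size_t m = 0; m < out.size(); ++m) {
            const molecule& a = keyframes[i][m];
            const molecule& b = keyframes[i + 1][m];
            out[m] = { a.x + t * (b.x - a.x),
                       a.y + t * (b.y - a.y),
                       a.z + t * (b.z - a.z) };
        }
        return out;
    }
};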
You could gain a factor of 2 improvement by storing the positions in single precision rather than double - it will be sufficient for rendering, if not for the simulation.
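For example, a rendering-only copy of the positions in single precision could look like this (the type name is just for illustration):

struct moleculeF {                        // single-precision copy used only for display
    float x, y, z;
};

static_assert(sizeof(moleculeF) == 3 * sizeof(float), "3 floats, no padding");
// 12 bytes instead of the 24 bytes of 3 * sizeof(double) on the simulation side.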
But ultimately you will need to store the results in a file and render offline.

How to handle a large object size in C++ when doing image/video processing

I am using OpenCV to parse videos with >10,000 frames. In each frame, I need to detect up to 10 markers, save their positions in the current image, and calculate and save an object's position based on the markers, along with some more primitive-type variables per frame. I do not need to keep the image itself as an OpenCV matrix once my calculation is done.
My idea for this was a state-machine object that contains a vector of n frame objects (each holding the marker positions of that specific frame along with some more data) and a vector of up to 10 marker objects (each holding the marker's ID and its real-world position), and that loads the next frame, detects the markers, runs the calculations, and so on.
However, the state-machine object's size would be at least 697,600 bytes (the sizeof the state-machine object at roughly 10,000 frames).
I am wondering what the best practice would be in this case. Is it bad practice to have objects this large in general? Is it acceptable to have an object of this size on the heap? Should I save each frame's data straight to a file (and read it back for a specific frame later on), or will the large number of read and write accesses slow the program down too much? Are there other, better-suited approaches for my case?
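To make the numbers concrete, here is a hypothetical sketch of the kind of per-frame record I have in mind (field names and types are illustrative only, not my actual classes):

#include <cstdio>
#include <vector>

// Hypothetical per-frame record.
struct MarkerObservation {
    int id = -1;              // which marker was detected
    float u = 0, v = 0;       // pixel position in the image
};

struct FrameData {
    std::vector<MarkerObservation> markers;          // up to 10 per frame
    double objectX = 0, objectY = 0, objectZ = 0;    // computed object position
    int frameIndex = 0;
};

int main() {
    std::vector<FrameData> frames;
    frames.reserve(10000);    // roughly one record per parsed frame

    // With 10 markers * 12 bytes plus a few scalars, each record is on the order of
    // 100-200 bytes, so ~10,000 of them stay in the low megabytes; the cv::Mat of a
    // frame can be released as soon as that frame has been processed.
    std::printf("reserved frame records: %zu\n", frames.capacity());
    return 0;
}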
Thanks in advance for any help!

Scaling a Dijkstra's Algorithm implementation

I have a graph in which each edge has a weight.
I have implemented Dijkstra's algorithm to find the shortest path from vertex A to B.
Weights for the graph are read from a key/value DB (redis.io).
Each weights DB is around 2 GB.
There are 50 weight DBs (i.e. 50 different files of 2 GB each, holding the weight values I stored in Redis).
To find the shortest path, the function FindPath(Start, End, DB_name) is used.
Dijkstra's reads the weight values from memory (Redis is an in-memory key/value store), but my RAM is only 6 GB. It is not possible to hold 2 GB * 50 DBs in memory at the same time.
The path requests can be random and concurrent.
What is the best way to store the weight DBs?
Is increasing the RAM the only option to speed up program execution?
EDIT
Number of edges: 462,505
If speed is the concern, the main option is to increase RAM. You cannot achieve similar performance with a NoSQL DB (e.g. MongoDB). Another option would be to try to parallelize the algorithm on a multi-core system, but this is very hard since the final solution is global.
[EDIT]
The fastest way to store the weights is a contiguous array of weights indexed by edge number, one array per DB. If all arrays cannot fit in your RAM, you can design a basic caching mechanism, swapping DBs from file to array (hoping not all DBs are accessed simultaneously).
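A minimal sketch of that caching idea (the loader, the file format and the LRU eviction policy are my assumptions; Redis is left out of the sketch entirely):

#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

// One contiguous weight array per DB, loaded on demand and evicted LRU-style,
// so only a few of the 50 arrays live in RAM at any given time.
class WeightCache {
public:
    explicit WeightCache(std::size_t maxLoaded) : maxLoaded_(maxLoaded) {}

    const std::vector<double>& get(const std::string& dbName) {
        auto it = cache_.find(dbName);
        if (it != cache_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second.lruPos);   // mark as recently used
            return it->second.weights;
        }
        if (cache_.size() >= maxLoaded_) {            // evict the least recently used DB
            cache_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(dbName);
        Entry& e = cache_[dbName];
        e.lruPos = lru_.begin();
        e.weights = loadFromFile(dbName);             // hypothetical loader
        return e.weights;
    }

private:
    struct Entry {
        std::vector<double> weights;                  // indexed by edge number
        std::list<std::string>::iterator lruPos;
    };

    // Placeholder: read the weight file of this DB into one flat array.
    static std::vector<double> loadFromFile(const std::string& /*dbName*/) {
        return std::vector<double>(462505, 1.0);      // edge count from the question
    }

    std::size_t maxLoaded_;
    std::list<std::string> lru_;
    std::unordered_map<std::string, Entry> cache_;
};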

Why do rendering engines subdivide an image into small squares?

Imagine that I have an image in memory represented by an array or an std::vector; for the sake of this example, I'm also assuming that my image is 400x300 pixels and I want to subdivide this structure into squares (or tiles) that are at most 64x64 pixels.
The array that I'm considering is declared like this
int a[400*300];
and not like
int a[400][300];
It's one nice contiguous chunk of memory.
My point is that you always try to keep the data structure, and the access to that data structure, as linear as possible. Subdividing the image into squares involves jumping from one row to another or from one column to another, depending on how the image is laid out in memory. I have no problem computing the boundaries of the squares given a tile size and the image dimensions, but things get a little too complicated when expressing the iteration over these squares (see the sketch at the end of this question), without me seeing any real benefit in this approach.
So why is this kind of subdivision so popular? Why not just render one row or one column at a time?
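Here is the kind of tile iteration I mean, over the same row-major buffer (just the loop structure, with the per-pixel work left as a placeholder):

#include <vector>

int main() {
    const int W = 400, H = 300, TILE = 64;
    std::vector<int> a(W * H);                     // same layout as int a[400*300]

    for (int ty = 0; ty < H; ty += TILE) {         // walk the tiles...
        for (int tx = 0; tx < W; tx += TILE) {
            const int tileH = (ty + TILE < H) ? TILE : H - ty;   // edge tiles are smaller
            const int tileW = (tx + TILE < W) ? TILE : W - tx;
            for (int y = 0; y < tileH; ++y) {      // ...then the pixels inside one tile
                for (int x = 0; x < tileW; ++x) {
                    int& pixel = a[(ty + y) * W + (tx + x)];
                    pixel = 0;                     // placeholder for the real per-pixel work
                }
            }
        }
    }
    return 0;
}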
Memory locality / cache coherency. Most image processing operations operate in 2D and for efficient memory access you want pixels that are close to each other in 2D to be close to each other in memory. Arranging the data in blocks like this means that 2 pixels that have the same x coordinate and adjacent y coordinates will on average have closer memory addresses than if you used a simple linear layout.
There are more complex ways of laying out the image that are often used for textures when rendered by GPUs which give even better memory locality on average.
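A small sketch of that point: the distance in memory between a pixel and its vertical neighbour under a plain row-major layout versus a 64x64-tiled layout (the tile-internal ordering here is row-major within the tile, which is just one possible choice):

#include <cstdio>

const int W = 400, TILE = 64;

// Plain row-major index.
int linearIndex(int x, int y) { return y * W + x; }

// Index when the image is stored tile by tile, row-major inside each tile.
int tiledIndex(int x, int y) {
    const int tilesPerRow = (W + TILE - 1) / TILE;
    const int tileId = (y / TILE) * tilesPerRow + (x / TILE);
    return tileId * TILE * TILE + (y % TILE) * TILE + (x % TILE);
}

int main() {
    // Vertical neighbours (10,10) and (10,11):
    std::printf("linear layout: %d elements apart\n", linearIndex(10, 11) - linearIndex(10, 10)); // 400
    std::printf("tiled layout : %d elements apart\n", tiledIndex(10, 11) - tiledIndex(10, 10));   // 64
    return 0;
}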