Every day, I process the DimDate dimension before processing the cube. If I process the cube without processing the dimension, it's very likely the cube will generate date errors.
Regardless of how I process the dimension (ProcessFull or ProcessData), the cube becomes unavailable in Excel or Power BI.
On the other hand, if I process the cube without ProcessData, the cube remains available on the server while it's being processed.
How can I process a Dimension without affecting the usability of the cube?
You need to use ProcessUpdate for dimensions and ProcessFull/ProcessData for the cube (potentially for individual measure groups and/or individual partitions, depending on the amount of your fact data).
It's still good practice to run ProcessFull on the dimensions from time to time as well (for example, on a weekend or once a month), because ProcessUpdate performs soft (logical) deletes of dimension members that no longer exist, which makes your dimensions grow larger over time.
Also, please check out the documentation.
I am currently working on a MD simulation. It stores the molecule positions in a vector. For each time step, that vector is stored for display in a second vector, resulting in
std::vector<std::vector<molecule> > data;
The size of data is time steps*<number of molecules>*sizeof(molecule), where sizeof(molecule) is (already reduced) 3*sizeof(double), for the position vector. Still, I run into memory problems for larger numbers of time steps and molecules.
Is there a further way to reduce the amount of data? My current workflow is that I calculate all molecules first, store them, and then render them using the data of each molecule for each step; the rendering is done with Irrlicht (maybe later with Blender).
If the trajectories are smooth, you can consider compressing the data by storing only every Nth step and restoring the intermediate positions by interpolation.
If the time step is small, linear interpolation will do. The best quality is provided by cubic splines. However, computing the spline coefficients is a global operation that you can only perform at the end, and it requires extra storage (!); you might prefer cardinal splines, which can be built locally from four consecutive positions.
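A minimal sketch of the every-Nth-step idea with linear interpolation (the `Molecule` struct here is a stand-in for the one in the question, reduced to a position):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the reduced molecule record: just a position.
struct Molecule { double x, y, z; };

// Linearly interpolate between two stored frames at parameter t in [0, 1].
// Only every Nth frame needs to be kept; intermediate frames are rebuilt
// on the fly for rendering.
std::vector<Molecule> interpolateFrame(const std::vector<Molecule>& a,
                                       const std::vector<Molecule>& b,
                                       double t) {
    assert(a.size() == b.size());
    std::vector<Molecule> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i].x = a[i].x + t * (b[i].x - a[i].x);
        out[i].y = a[i].y + t * (b[i].y - a[i].y);
        out[i].z = a[i].z + t * (b[i].z - a[i].z);
    }
    return out;
}
```

Cardinal splines would replace the per-component formula above with a weighted sum of four consecutive stored frames, at the cost of a slightly wider window.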
You could gain a factor of 2 improvement by storing the positions in single precision rather than double - it will be sufficient for rendering, if not for the simulation.
But ultimately you will need to store the results in a file and render offline.
I am working on a project to simulate a hard sphere model of a gas. (Similar to the ideal gas model.)
I have written my entire project, and it is working. To give you an idea of what I have done, there is a loop which does the following: (Pseudo code)
Get_Next_Collision(); // Figure out when the next collision will occur
Step_Time_Forwards(); // Step to time of collision
Process_Collision(); // Process collision between 2 particles
(Repeat)
For a large number of particles (say N particles), O(N*N) checks must be made to figure out when the next collision occurs. It is clearly inefficient to follow the above procedure, because in the vast majority of cases, collisions between pairs of particles are unaffected by the processing of a collision elsewhere. Therefore it is desirable to have some form of priority queue which stores the next event for each particle. (Actually, since a collision involves 2 particles, only half that number of events will be stored, because if A collides with B then B also collides with A, and at exactly the same time.)
I am finding it difficult to write such an event/collision priority queue.
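To illustrate, the event queue described above could be sketched with `std::priority_queue` and lazy invalidation via per-particle collision counters (all names here are illustrative, not taken from any existing simulator):

```cpp
#include <queue>
#include <vector>

// A predicted collision between particles a and b at absolute time 'time'.
// countA/countB snapshot how many collisions each particle had suffered when
// the event was predicted; if either count has changed by the time the event
// is popped, the prediction is stale and is discarded.
struct Event {
    double time;
    int a, b;
    int countA, countB;
};

struct LaterEvent {
    bool operator()(const Event& x, const Event& y) const {
        return x.time > y.time;  // min-heap on collision time
    }
};

using EventQueue = std::priority_queue<Event, std::vector<Event>, LaterEvent>;

// Pop events until one is still valid. collisions[i] counts how often
// particle i has collided so far.
bool nextValidEvent(EventQueue& q, const std::vector<int>& collisions,
                    Event& out) {
    while (!q.empty()) {
        Event e = q.top();
        q.pop();
        if (e.countA == collisions[e.a] && e.countB == collisions[e.b]) {
            out = e;
            return true;
        }
        // Stale: one participant collided since this event was predicted.
    }
    return false;
}
```

After processing a valid event, both participants' counters are incremented and new predictions for just those two particles are pushed, avoiding the full O(N²) rescan.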
I would like to know if there are any Molecular Dynamics simulators which have been written and which I can go and look at the source code in order to understand how I might implement such a priority queue.
Having done a Google search, it is clear to me that many MD programs have been written; however, most of them are either far too complex or unsuitable.
This may be because they have huge functionality, including the ability to produce visualizations or ability to compute the simulation for particles which have interacting forces acting between them, etc.
Some simulators are not suitable because they implement a different model, i.e. something other than the energy-conserving hard sphere model with elastic collisions: for example, particles interacting through potentials, or non-spherical particles.
I have tried looking at the source code for LAMMPS, but it's vast and I struggle to make any sense of it.
I hope that is enough information about what I am trying to do. If not I can probably add some more info.
A basic version of a locality-aware system could look like this:
1. Divide the universe into a cubic grid (where each cube has side A and volume A^3), with each cube sufficiently large, but sufficiently smaller than the total volume of the system. Each grid cube is further divided into 4 sub-cubes whose particles it can theoretically hand over to its neighboring cubes (and lend for calculations).
2. Each grid cube registers the particles contained within it and is aware of the particles contained in its neighboring grid cubes.
3. Define a particle's observable universe to have a radius of (grid dimension/2). Define timestep = (griddim/2) / max_speed. This means that only particles from at most four adjacent grid cubes can theoretically interact in that time period.
4. For every particle in every grid cube, run your traditional collision detection algorithm (with mini_timestep < timestep), where each particle is checked for possible collisions with the other particles in its observable universe. Store the collisions in any structure sorted by time of collision, even just a sorted array.
5. The first collision that happens within a mini_timestep resets your universe (and universe clock) to (last_time + time_to_collide), where time_to_collide < mini_timestep. I suppose that does not differ from your current algorithm. Important note: the particles' absolute coordinates are updated, but which grid cube and sub-cube they belong to is not updated.
6. Repeat step 5 until the large timestep has passed, then update the ownership of particles by each grid cube.
The advantage of this system is that for each time window, we have (assuming uniform distribution of particles) O(universe_particles * grid_size) instead of O(universe_particles * universe_size) checks for collision. In good conditions (depending on universe size, speed and density of particles), you could improve the computation efficiency by orders of magnitude.
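As a rough illustration of the registration step, sorting particles into grid cells can be done in O(n); the names and the key-packing scheme below are just assumptions for the sketch (the multiplier-XOR packing can in principle map two distant cells to the same key, which a real implementation would guard against):

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Particle { double x, y, z; };

// Map a position to a cell key on a uniform grid of spacing 'cell'.
long long cellKey(const Particle& p, double cell) {
    long long ix = (long long)std::floor(p.x / cell);
    long long iy = (long long)std::floor(p.y / cell);
    long long iz = (long long)std::floor(p.z / cell);
    // Pack the three indices into one key (fine for a sketch).
    return (ix * 73856093LL) ^ (iy * 19349663LL) ^ (iz * 83492791LL);
}

// Sort particle indices into grid cells: O(n) bucketing, after which
// collision checks only need a cell and its neighbors.
std::unordered_map<long long, std::vector<std::size_t>>
buildGrid(const std::vector<Particle>& ps, double cell) {
    std::unordered_map<long long, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < ps.size(); ++i)
        grid[cellKey(ps[i], cell)].push_back(i);
    return grid;
}
```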
I didn't understand how the 'priority queue' approach would work, but I have an alternative approach that may help you. It is what I think @Boyko Perfanov meant by 'make use of locality'.
You can sort the particles into 'buckets', so that you don't have to check each particle against every other (O(n²)). This uses the fact that particles can only collide if they are already quite close to each other. Create buckets that represent a small area/volume, and fill in all particles that are currently in the area/volume of the bucket (O(n) worst case). Then check all particles inside a bucket against the other particles in the bucket (O(m*(n/m)²) average case, m = number of buckets). The buckets need to overlap for this to work; alternatively, you can also check particles from neighboring buckets.
Update: If the particles can travel for a longer distance than the bucket size, an obvious 'solution' is to decrease the time-step. However this will increase the running time of the algorithm again, and it works only if there is a maximum speed.
Another solution, applicable even when there is no maximum speed, would be to create an additional 'high velocity' bucket. Since the velocity distribution is usually a Gaussian curve, not many particles would have to be placed into that bucket, so the 'bucket approach' would still be more efficient than O(n²).
I have a file with a table containing 23 million records of the following form: {atomName, x, y, z, transparence}. I decided to use OpenGL for this.
My task is to render it. In the first iteration, I used a glBegin/glEnd block and drew every atom as a point of some color. This solution worked, but I got 0.002 fps.
Then I tried using VBOs. I created three buffers: vertex, color and index. This solution worked. I got 60 fps, but binding the buffers is uncomfortable, and I am drawing points, not spheres.
Then I read about VAOs, which can simplify buffer binding. OK, it worked; the binding is comfortable now.
Now I want to draw spheres, not points. My idea was to generate, for each point, a set of vertices from which a sphere can be built (with some accuracy). But with 23 million points I would have to compute at least ~12 additional vertices per point; 23 000 000 × 12 vertices × 3 floats × 4 bytes ≈ 3.3 GB of data, so that is probably not a good solution.
What is the best next move? I cannot fully work out whether shaders are applicable to this task, or whether there are other ways.
About your drawing process
My task is to render it. In the first iteration, I used a glBegin/glEnd block and drew every atom as a point of some color. This solution worked, but I got 0.002 fps.
Think about it: for every one of your 23 million records you make at least one function call directly (glVertex) and probably several more implicitly. Even worse, glVertex likely causes a context switch. What this means is that your CPU hits several speed bumps for every vertex it has to process. A top-notch CPU these days has a clock rate of about 3 GHz and a pipeline length on the order of 10 instructions. When you make a context switch, that pipeline gets stalled; in the worst case it then takes one pipeline length to actually process a single instruction. Let's assume you have to perform at least 1000 instructions to process a single glVertex call (which is actually a rather optimistic estimate). That alone means you're limited to processing at most 3 million vertices per second. So at 23 million vertices, that's already less than one FPS.
But you also get context switches in there, which add a further penalty, and probably a lot of branching, which causes further pipeline flushes.
And that's just the glVertex call. You also have colors in there.
And you wonder that immediate mode is slow?
Of course it's slow. Immediate mode has been discouraged for well over 15 years. Vertex arrays have been available since OpenGL-1.1.
This solution worked. I got 60 fps,
Yes, because all the data resides in the GPU's own memory now. GPUs are massively parallel and optimized to crunch exactly this kind of data.
but binding the buffers is uncomfortable
Well, OpenGL is not a high level scene graph library. It's a mid to low level drawing API. You use it like a sophisticated pencil to draw on a digital canvas.
Then I read about VAOs
Well, VAOs are meant to coalesce buffer objects that belong together so it makes sense using them.
Now I want to draw spheres, not points.
You have two options:
Using point sprite textures. This means that your points will get area when drawn, and that area gets a texture applied. I think this is the best method for you. Given the right shader you can even give your point sprite the right kind of depth values, so that your "spheres" will actually intersect like spheres in the depth buffer.
The other option is instancing a single sphere geometry, using your atom records as control data for the instancing process. This would produce real sphere geometry. However, I fear that implementing an instanced drawing process might be a bit too advanced for your skill level at the moment.
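For reference, generating the single base sphere mesh to be instanced could look roughly like the sketch below. The mesh helper itself is illustrative; the comments mention the standard instancing entry points (glDrawArraysInstanced, glVertexAttribDivisor), which are the usual way to draw it once per atom:

```cpp
#include <cmath>
#include <vector>

// Build the vertex positions of a unit UV sphere as a triangle list.
// This one mesh would be uploaded to a VBO once and drawn 23 million
// times with glDrawArraysInstanced, with per-atom position/color in a
// second VBO advanced per instance via glVertexAttribDivisor(attrib, 1).
std::vector<float> makeUnitSphere(int stacks, int slices) {
    const double PI = 3.14159265358979323846;
    std::vector<float> verts;
    auto push = [&](double phi, double theta) {
        verts.push_back((float)(std::sin(phi) * std::cos(theta)));
        verts.push_back((float)(std::sin(phi) * std::sin(theta)));
        verts.push_back((float)(std::cos(phi)));
    };
    for (int i = 0; i < stacks; ++i) {
        double p0 = PI * i / stacks, p1 = PI * (i + 1) / stacks;
        for (int j = 0; j < slices; ++j) {
            double t0 = 2 * PI * j / slices, t1 = 2 * PI * (j + 1) / slices;
            // Two triangles per quad of the latitude/longitude grid.
            push(p0, t0); push(p1, t0); push(p1, t1);
            push(p0, t0); push(p1, t1); push(p0, t1);
        }
    }
    return verts;
}
```

With, say, 8×8 subdivisions the base mesh is a few kilobytes, so the memory cost no longer scales with the atom count.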
About drawing 23 million points
Seriously, what kind of display do you have available that can show 23 million distinguishable points? Your typical computer screen has somewhere around 2000×1500 pixels. The highest-resolution displays you can buy these days have about 4k×2.5k pixels, i.e. 10 million individual pixels. Assuming your atoms are evenly distributed in a plane, at 23 million atoms each pixel will get overdrawn several times. You simply can't display 23 million individual atoms that way. Another way to look at this is that the display's pixel grid implies a spatial sampling, and you can't reproduce anything smaller than twice the average sampling distance (sampling theorem).
So it absolutely makes sense to draw only a subset of the data, namely the subset that's actually in view. Also, if you're zoomed very far out (i.e. you have the full dataset in view), it makes sense to coalesce nearby atoms.
It definitely makes sense to sort your data into a spatial subdivision structure. In your case I think an octree would be a good choice.
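A minimal sketch of such a point octree (illustrative names, no particular library assumed): each node covers an axis-aligned cube and subdivides once it holds more than a few points, so a renderer can draw whole coarse nodes when zoomed out, or descend only into the nodes in view.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct Point { float x, y, z; };

class Octree {
public:
    Octree(Point center, float halfSize) : center_(center), half_(halfSize) {}

    void insert(const Point& p) {
        if (!children_[0] && points_.size() < kLeafMax) {
            points_.push_back(p);  // still a leaf with room
            return;
        }
        if (!children_[0]) subdivide();
        children_[childIndex(p)]->insert(p);
    }

    std::size_t count() const {
        std::size_t n = points_.size();
        if (children_[0])
            for (const auto& c : children_) n += c->count();
        return n;
    }

private:
    static constexpr std::size_t kLeafMax = 8;

    // One bit per axis selects the octant.
    int childIndex(const Point& p) const {
        return (p.x >= center_.x ? 1 : 0)
             | (p.y >= center_.y ? 2 : 0)
             | (p.z >= center_.z ? 4 : 0);
    }

    void subdivide() {
        float h = half_ / 2;
        for (int i = 0; i < 8; ++i) {
            Point c = { center_.x + ((i & 1) ? h : -h),
                        center_.y + ((i & 2) ? h : -h),
                        center_.z + ((i & 4) ? h : -h) };
            children_[i] = std::make_unique<Octree>(c, h);
        }
        for (const auto& p : points_) children_[childIndex(p)]->insert(p);
        points_.clear();  // points now live in the children
    }

    Point center_;
    float half_;
    std::vector<Point> points_;
    std::array<std::unique_ptr<Octree>, 8> children_;
};
```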
I am working on a face detection/recognition project in OpenCV C++. The code runs really slowly; there is a lag between the real camera feed and the processed feed, and I don't want that lag to be visible to the user.
So can I have a function which just reads a frame from the camera and displays it, while all the detection/recognition work is done in other functions running in parallel?
I also want the result to be visible on the screen (a box around the face with the necessary details), so can I transfer this data across functions? Can I create a vector of the Rect datatype which contains all this rectangle data, accessible by all the functions, to push new faces and to display them?
I am just searching for a solution to this problem. I know little about parallel computing, so if there is any other alternative, please give details.
Yes, you need to run the face detection and recognition code in a separate thread. First you need to copy the frame to use it on the other thread.
Using a vector of Rect will be convenient. But you need to lock a mutex when you use the vector, to prevent problems with parallel access to the same data, and you need to lock a mutex while copying the frame.
I should note that if your face detection and recognition code runs very slowly, it will never give you an up-to-date result: the rectangles will be displaced.
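A minimal sketch of the shared, mutex-protected rectangle list could look like this (with a stand-in `Rect` instead of OpenCV's `cv::Rect`, so the sketch has no OpenCV dependency):

```cpp
#include <mutex>
#include <vector>

// Stand-in for cv::Rect.
struct Rect { int x, y, w, h; };

// Detection results shared between the detector thread and the display
// thread. Every access to the vector goes through the mutex.
class SharedFaces {
public:
    void replace(std::vector<Rect> faces) {
        std::lock_guard<std::mutex> lock(m_);
        faces_ = std::move(faces);
    }
    std::vector<Rect> snapshot() const {
        std::lock_guard<std::mutex> lock(m_);
        return faces_;  // copy out under the lock; draw without holding it
    }
private:
    mutable std::mutex m_;
    std::vector<Rect> faces_;
};
```

The detector thread (a `std::thread`) calls `replace()` whenever it finishes a frame, while the display loop calls `snapshot()` every frame and draws the boxes over the latest camera image.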
First of all, note one thing: there will always be some lag. Even if you just display the video from the camera (without any processing), it will be slightly delayed.
It's also important to optimize the face detection process itself; parallel computing won't fix all your problems. Here I've written a bit about that (but it's mostly about eye detection within a face).
Another technique worth trying is checking whether the region (part of the image) in which you found a face in the last frame has changed or not. The general idea is quite simple: subtract the region of the new (current) frame from the same region of the old (previous) frame. Then apply a binary threshold operation to the result image (you need to find the threshold value on your own by trying different values; I'm not sure, but I think I used something around 30. Don't use too small a value, because there is always some difference between two frames due to noise and small changes in lighting, etc.). Then count all non-zero pixels, divide this number by the number of all pixels of the region (= width * height) and multiply by 100. This number is the percentage of changed pixels. If this value is small, you don't have to analyze the current frame; you can just assume that the results of the analysis from the previous frame are still valid. Note that this technique only works well if the background isn't changing quickly (like, for example, trees or water).
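The changed-pixels check described above can be sketched on raw grayscale buffers like this (an illustrative helper, not an OpenCV API; with OpenCV you would use absdiff, threshold and countNonZero on the region of interest):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// Percentage of pixels whose grayscale value changed by more than
// 'threshold' between two frames (regions) of equal size. The threshold
// (e.g. around 30) has to be tuned by hand, as described above.
double changedPercent(const std::vector<std::uint8_t>& prev,
                      const std::vector<std::uint8_t>& cur,
                      int threshold) {
    std::size_t changed = 0;
    for (std::size_t i = 0; i < prev.size(); ++i)
        if (std::abs((int)cur[i] - (int)prev[i]) > threshold)
            ++changed;
    return 100.0 * changed / prev.size();
}
```

If the returned percentage is below some small cutoff, the detection results from the previous frame can simply be reused for this frame.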
I have a process that accumulates mostly static data over time--and a lot of it, millions of data elements. It is possible that small parts of the data may change occasionally, but mostly, it doesn't change.
However, I want to allow the user the freedom to change how this data is viewed, both in shape and color.
Is there a way that I could store the data on the GPU just as data, and then have a number of ways to convert that data to something renderable on the GPU? The user could then choose between those algorithms, and we swap the algorithm in efficiently without having to touch the data at all. Also, color ids would be in the data, but the user could change which color each id maps to, again without touching the data.
So, for example, maybe there are the following data:
[1000, 602, 1, 1]
[1003, 602.5, 2, 2]
NOTE: the data is NOT vertices, but rather may require some computation or lookup to be converted to vertices.
The user can choose between visualization algorithms. Let's say one would display 2 cubes each at (0, 602, 0) and (3, 602.5, 100). The user chooses that color id 1 = blue and 2 = green. So the origin cube is shown as blue and the other as green.
Then, without any modification to the data at all, the user chooses a different visualization, and now spheres are shown at (10, 602, 10) and (13, 602.5, 20), and the colors are different because the user changed the color mapping.
Yet another visualization might show lines between all the data elements, or a rectangle for each set of 4, etc.
Is the above description something that can be done in a straightforward way? How would it best be done?
Note that we would be adding new data, appending to the end, a lot. Bursts of thousands per second are likely. Modifications of existing data would be more rare and taking a performance hit for those cases is acceptable. User changing algorithm and color mapping would be relatively rare.
I'd prefer to do this using a cross-platform API (across OSes and GPUs), so I'm assuming OpenGL.
You can store your data in a VBO (in GPU memory) and update it when it changes.
On the GPU side, you can use a geometry shader to generate more geometry. Not sure how to switch from line to cube to sphere, but if you are drawing a triangle at each location, your GS can output "extra" triangles (ditto for lines and points).
As for the color change feature, you can bake that logic into the vertex shader. The idx (1, 2, ...) should be a vertex attribute; have the VS look up a table mapping idx -> color (this could be stored as a small texture). You can update the texture to change the color mapping on the fly.
For applications like yours there are special GPGPU programming infrastructures: CUDA and OpenCL. OpenCL is the cross-vendor system. CUDA is cross-platform but supports only NVIDIA GPUs. OpenGL also introduced general-purpose compute functionality (compute shaders) in OpenGL 4.3.
and a lot of it, millions of data elements
Millions is not very much: even if a single element consumed 100 bytes, that would be only some 100 MiB to transfer. Modern GPUs can transfer about 10 GiB/s from/to host system memory.
Is the above description something that can be done in a straightforward way? How would it best be done?
Yes, it can be done. However, you'll only really see performance if you can parallelize your problem and make its memory access pattern cater to what GPUs prefer. Bad memory access patterns in particular can cause a performance loss of several orders of magnitude.