Nvidia flex data transfer - opengl

So I'm trying to use NVIDIA's Flex API in my game engine (as a core gameplay mechanic) and I'm now arranging my data structures. I've already read the Flex manual, but the descriptions are rather sparse. Because I'm also using CUDA, I need to know whether Flex API calls like flexSetParticles also accept device pointers as input. It would also be nice if someone could tell me what exactly flexUpdateSolver does. Does it compute the velocities itself? Does it apply gravity? If not, and you have to calculate the updated velocities yourself, what does the solver even do?
At the moment, I calculate the new positions and velocities myself (without Flex), like this:
void updateParticle(int i, float deltaTime)
{
    // getVelocity() returns the currently prescribed velocity at the given time
    velocities[i] = types[i].getVelocity(deltaTime);
    positions[i] = positions[i] + velocities[i];
}
All the arrays in the function above are device pointers, and the function is actually a CUDA kernel. If I now have to calculate the velocities myself, I would have to:
1.) update the arrays, adding new particles if necessary (host to device), and calculate the velocities (device)
2.) copy the new positions (and velocities) back to the CPU and hand them over to Flex
3.) after Flex has finished, copy the new positions from flexGetParticles back to the GPU (into an OpenGL buffer for rendering)
This seems pretty inefficient, so I would like to know if there is an easier solution.

Yes, flexUpdateSolver calculates the positions and velocities of the particles internally, so you must not do that yourself. Remember that you have to call NvFlexGetParticles(particleBuffer, n) to read back the updated positions and velocities after each time step.
As for flexSetParticles, it accepts either a host or a device buffer pointer. You can create the buffer with NvFlexAllocBuffer, passing the appropriate NvFlexBufferType enum.
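Putting the answer's pieces together, a typical frame might look roughly like this, assuming the later NvFlex 1.1-style buffer API (the one NvFlexAllocBuffer belongs to). Exact signatures vary between Flex releases, so treat this as an outline rather than buildable code:

```cpp
// Sketch only: per-frame data flow with NvFlex, assuming the 1.1-style
// buffer API. Setup details, error handling, and the remaining solver
// state (velocities, phases, active set) are omitted.
NvFlexLibrary* lib    = NvFlexInit();
NvFlexSolver*  solver = NvFlexCreateSolver(lib, maxParticles, 0);

// One buffer of float4 (x, y, z, inverse mass); host- or device-backed
// depending on the NvFlexBufferType passed here.
NvFlexBuffer* particles =
    NvFlexAllocBuffer(lib, maxParticles, sizeof(float) * 4, eNvFlexBufferHost);

// Fill the initial particle data while the buffer is mapped.
float* p = (float*)NvFlexMap(particles, eNvFlexMapWait);
// ... write positions ...
NvFlexUnmap(particles);

// Per frame: hand data to the solver, let it integrate, read results back.
NvFlexSetParticles(solver, particles, NULL);
NvFlexUpdateSolver(solver, dt, substeps, false); // integrates velocities/positions
NvFlexGetParticles(solver, particles, NULL);     // updated positions, ready to map
```

The point of the buffer objects is that the mapping above can avoid the host round trip the question describes: if the buffer is device-backed, the data never needs to visit the CPU.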

Related

DXR Descriptor Heap management for raytracing

After watching videos and reading the documentation on DXR and DX12, I'm still not sure how to manage resources for DX12 raytracing (DXR).
There is quite a difference between rasterization and raytracing in terms of resource management, the main difference being that rasterization uses many transient resources that can be bound on the fly, while raytracing needs all resources ready at the moment the rays are cast. The reason is obvious: a ray can hit anything in the whole scene, so every shader, every texture, and every heap must be ready and filled with data before we cast a single ray.
So far so good.
My first test was adding all resources to a single heap, based on some DXR tutorials. The problem with this approach arises with objects that share shaders but use different textures. I defined one root signature for my single hit group, which I had to prepare before raytracing. But when creating a root signature, we have to specify exactly which position in the heap corresponds to the SRV where the texture is located. Since there are many textures at different positions in the heap, I would need to create one root signature per differently-textured object. This is of course not preferred, since according to the documentation and common sense we should keep the number of root signatures as small as possible.
Therefore, I discarded this test.
My second approach was creating a descriptor heap per object, which contained all the local descriptors for that particular object (textures, constants, etc.). The global resources (the TLAS (Top Level Acceleration Structure), the output buffer, and the camera constant buffer) were kept in a separate global heap. In this approach, I think I misunderstood the documentation by assuming I could attach multiple heaps to a root signature. As I'm writing this post, I cannot find a way of adding two separate heaps to a single root signature. If this is possible, I would love to know how, so any help is appreciated.
Here is the code I'm using for my root signature (using the DX12 helpers):
bool PipelineState::CreateHitSignature(Microsoft::WRL::ComPtr<ID3D12RootSignature>& signature)
{
    const auto device = RaytracingModule::GetInstance()->GetDevice();
    if (device == nullptr)
    {
        return false;
    }
    nv_helpers_dx12::RootSignatureGenerator rsc;
    rsc.AddRootParameter(D3D12_ROOT_PARAMETER_TYPE_SRV, 0); // "t0": vertices and colors
    // Add ranges pointing into the heap
    rsc.AddHeapRangesParameter({
        {2 /*t2*/, 1, 0, D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1}, /* 2nd slot of the first heap */
        {3 /*t3*/, 1, 0, D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 3}, /* 4th slot of the first heap; per-instance data */
    });
    signature = rsc.Generate(device, true);
    return signature.Get() != nullptr;
}
My last approach would be to create a heap containing all necessary resources per object
-> TLAS, CBVs, SRVs (textures), etc., which effectively means one heap per object. Again, as I read the documentation, this was not advised; the documentation states that we should group resources into global heaps. At this point I have a feeling I'm mixing DX12 and DXR documentation and best practices, applying DX12 proposals in the DXR domain, which is probably wrong.
I also read part of the Nvidia Falcor source code, and they seem to use one resource heap per descriptor type, effectively limiting the number of descriptor heaps to a minimum (makes total sense), but I did not yet find how a root signature is created with multiple separate heaps.
I feel like I'm missing one last puzzle piece to this mystery before it all falls into place and creates a beautiful image. So if anyone could explain how resource management (heaps, descriptors, etc.) should be handled in DXR when we want many objects with different resources, it would help me a lot.
So thanks in advance!
Jakub
With DXR you need to start at shader model 6.2, where dynamic indexing gained much more official support than the "secret" approach of 5.1, in which the last descriptor of a range was free to be indexed past its declared bounds.
Now you have full "bindless" access using a declarative syntax like Texture2D var[] : register(t4, space1); and you can index freely: var[1] will access register t5 in space1, and so on.
You can set up register ranges in the descriptor table, so if you have 100 textures the range can span all 100.
You can even declare other resources after the array variable, as long as you remember to jump over all the registers it occupies. But it's easier to use different virtual register spaces:
float4 ambiance : register(b0, space0);
Texture2D all_albedos[] : register(t0, space1);
float4x4 world : register(b1, space0);
Now you can go up to t100 with no disturbance to the following space0 declarations.
The limit on the register value is lifted in SM6: it goes up to the maximum supported heap allocation. So all_albedos[3400].Sample(..) is a perfectly acceptable call (provided your heap has the corresponding views bound).
Unfortunately, DX12 gives you the feeling you can bind multiple heaps with the CommandList::SetDescriptorHeaps function, but if you try you'll get runtime errors:
D3D12 ERROR: ID3D12CommandList::SetDescriptorHeaps: pDescriptorHeaps[1] sets a descriptor heap type that appears earlier in the pDescriptorHeaps array.
Only one of any given descriptor heap type can be set at a time. [ EXECUTION ERROR #554: SET_DESCRIPTOR_HEAP_INVALID]
It's misleading, so don't trust the plural 's' in that method name.
Really, the only reasons to have multiple heaps are a triple-buffered circular update/usage scheme, or an upload heap alongside a shader-visible one, I suppose. Just put everything in your one heap, and let descriptor tables index into it as needed.
A descriptor table is a very lightweight element; it's just three integers: a descriptor start, a span, and a virtual space. Just use that: you can span 1000 textures if you have 1000 textures in your scene. You can get the material ID by embedding it in an indirection texture with unique UVs (like a lightmap), or in the vertex data, or simply from the hit group (if you set up one hit group per object). Your hit-group index, which is given by a system value in the shader, will then be your texture index.
Dynamic indexing of HLSL 5.1 might be the solution to this issue.
https://learn.microsoft.com/en-us/windows/win32/direct3d12/dynamic-indexing-using-hlsl-5-1
With dynamic indexing, we can create one heap containing all materials and give each object an index that the shader uses to pick the correct material at run time.
Therefore we do not need multiple heaps of the same type, which is not possible anyway: only one heap per heap type can be bound at a time.
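To make the bindless pattern from both answers concrete, here is a minimal HLSL sketch; the constant-buffer layout and all names are hypothetical. One descriptor table spans a single unbounded SRV range, and a per-object index selects the texture:

```hlsl
// space1 holds one unbounded SRV range covering every texture in the heap.
Texture2D    all_albedos[] : register(t0, space1);
SamplerState linearSamp    : register(s0);

cbuffer PerObject : register(b0) // hypothetical per-object constants
{
    uint materialId; // index of this object's albedo within the big heap
};

float4 SampleAlbedo(float2 uv)
{
    // NonUniformResourceIndex is required when the index diverges per ray;
    // SampleLevel is used because Sample needs derivatives, which raytracing
    // shaders don't have.
    return all_albedos[NonUniformResourceIndex(materialId)].SampleLevel(linearSamp, uv, 0);
}
```

With this layout the root signature stays fixed for every object; only the index in the constant buffer (or the hit-group index) changes.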

Where to alter reference code to extract motion vectors from HEVC encoded video

So this question has been asked a few times, but I think my C++ skills are too deficient to really appreciate the answers. What I need is a way to start with an HEVC-encoded video and end with a CSV that holds all the motion vectors. So far, I've compiled and run the reference decoder, and everything seems to be working fine. I'm not sure if this matters, but I'm interested in the motion vectors as a convenient way to analyze motion in a video. My plan at first is to average the MVs in each frame to get a single value expressing something about the average amount of movement in that frame.
The discussion here tells me about the TComDataCU class methods I need to interact with to get the MVs and talks about how to iterate over CTUs. But I still don't really understand the following:
1) What information is returned by these MV methods, and in what format? With my limited knowledge, I assume there will be something like seven values associated with each MV: the frame number, an index identifying a macroblock in that frame, the size of the macroblock, the x coordinate of the macroblock (probably the top left corner?), the y coordinate of the macroblock, the x component of the vector, and the y component of the vector.
2) Where in the code do I need to put new statements that save the data? I thought there must be some spot in TComDataCU.cpp where I can add lines that print the data I want to a file, but I'm confused about when the values are actually determined and what they are. The variable declarations look like this:
// create motion vector fields
m_pCtuAboveLeft = NULL;
m_pCtuAboveRight = NULL;
m_pCtuAbove = NULL;
m_pCtuLeft = NULL;
But I can't make much sense of those names. AboveLeft, AboveRight, Above, and Left seem like an asymmetric mix of directions?
Any help would be great! I think I would most benefit from seeing some example code. An explanation of the variables I need to pay attention to would also be very helpful.
In TEncSlice.cpp, you can access every CTU in a loop:
for( UInt ctuTsAddr = startCtuTsAddr; ctuTsAddr < boundingCtuTsAddr; ++ctuTsAddr )
Then you can pick a specific CTU by its address:
pCtu->getCtuRsAddr() (pCtu is of the TComDataCU class).
After that,
pCtu->getCUMvField()
will return the CTU's motion vector field, and you can extract the MVs of the CTU from that object.
For example,
TComMvField->getMv(g_auiRasterToZscan[y * 16 + x])->getHor()
returns the horizontal component of the MV of a specific 4x4 block.
You can save this data after m_pcCuEncoder->compressCtu( pCtu ), because compressCtu determines all the data of the CTU, such as the CU partitioning and the motion estimation.
I hope this information helps you and other people!

C++ Maya - Getting mesh vertices from frame and subframe

I'm writing a mesh deformer plugin that gets info about the mesh from past frames to perform some calculations. In the past, to get past mesh info, I did the following:
MStatus MyClass::deform(MDataBlock& dataBlock, MItGeometry& itGeo,
                        const MMatrix& localToWorldMatrix, unsigned int index)
{
    MFnPointArrayData fnPoints;
    // ... other init code
    MPlug meshPlug = nodeFn.findPlug(MString("inputMesh"));
    // gets the mesh connection from the previous frame
    MPlug meshPositionPlug = meshPlug.elementByLogicalIndex(0);
    MObject objOldMesh;
    meshPositionPlug.getValue(objOldMesh);
    fnPoints.setObject(objOldMesh);
    // previous frame's vertices
    MPointArray oldMeshPositionVertices = fnPoints.array();
    // ... calculations
    return MS::kSuccess;
}
If I needed more than one frame, I'd run for-loops over the logical indices and repeat the process. Since creating this, however, I've found that my plugin needs not just past frames but also future frames and subframes (times between integer frames). My current code relies on elementByLogicalIndex() to get past-frame info; that method only takes unsigned integers, with the 0th index referring to the previous frame, so I can't get subframe information. I haven't tried getting future-frame info yet, but I don't think that's possible either.
How do I query mesh vertex positions in an array for past/future/sub-frames? Is my current method inflexible and, if so, how else could I do this?
So, the "intended" way to accomplish this is with an MDGContext, either via an MDGContextGuard or via the versions of MPlug::asMObject that explicitly take a context (though those are deprecated).
Having said that - in the past when I've tried to use MDGContexts to query values at other times, I've found them either VERY slow, unstable, or both. So use with caution. It's possible that things will work better if, as you say, you're dealing purely with objects coming straight from an alembic mesh. However, if that's the case, you may have better luck reading the cache path from the node, and querying through the alembic API directly yourself.
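As a rough sketch of the MDGContext route (assuming Maya 2018+ for MDGContextGuard; nodeFn is the question's dependency-node function set, and all error checking is omitted):

```cpp
// Evaluate the inputMesh plug at an arbitrary time, including sub-frames.
MTime evalTime(14.5, MTime::uiUnit());   // fractional frame values are allowed
MDGContext ctx(evalTime);

MObject meshObj;
{
    MDGContextGuard guard(ctx);          // evaluation inside this scope uses ctx
    MPlug meshPlug = nodeFn.findPlug("inputMesh", false);
    meshObj = meshPlug.asMObject();      // the mesh as it was at evalTime
}

MFnMesh fnMesh(meshObj);
MPointArray pastVertices;
fnMesh.getPoints(pastVertices, MSpace::kObject);
```

The same pattern works for future times; whether the evaluation is fast or stable is exactly the caveat raised above.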

Matlab griddata equivalent in C++

I am looking for a C++ equivalent to Matlab's griddata function, or any 2D global interpolation method.
I have C++ code that uses Eigen 3. I will have an Eigen vector that contains x, y, and z values, and two Eigen matrices equivalent to those produced by meshgrid in Matlab. I would like to interpolate the z values from the vectors onto the grid points defined by the meshgrid equivalents (which will extend a bit past the outside of the original points, so minor extrapolation is required).
I'm not too bothered by accuracy--it doesn't need to be perfect. However, I cannot accept NaN as a solution--the interpolation must be computed everywhere on the mesh regardless of data gaps. In other words, staying inside the convex hull is not an option.
I would prefer not to write an interpolation from scratch, but if someone wants to point me to pretty good (and explicit) recipe I'll give it a shot. It's not the most hateful thing to write (at least in an algorithmic sense), but I don't want to reinvent the wheel.
Effectively what I have is scattered terrain locations, and I wish to define a rectilinear mesh that nominally follows some distance beneath the topography for use later. Once I have the node points, I will be good.
My research so far:
The question asked here, MATLAB functions in C++, produced a close answer, but unfortunately the suggestion (SciMath) was not free.
I have tried understanding the interpolation function used in Generic Mapping Tools, and was rewarded with a headache.
I briefly looked into the Grid Algorithms library (GrAL). If anyone has commentary I would appreciate it.
Eigen has an unsupported interpolation package, but it seems to just be for curves (not surfaces).
Edit: VTK has a matplotlib functionality. Presumably there must be an interpolation used somewhere in that for display purposes. Does anyone know if that's accessible and usable?
Thank you.
This is probably a little late, but hopefully it helps someone.
Method 1.) Octave: If you're coming from Matlab, one way is to embed the GNU Matlab clone Octave directly into the C++ program. I don't have much experience with it, but you can call the Octave library functions directly from a .cpp file.
See here, for instance. http://www.gnu.org/software/octave/doc/interpreter/Standalone-Programs.html#Standalone-Programs
griddata is included in octave's geometry package.
Method 2.) PCL: The way I do it is to use the Point Cloud Library (http://www.pointclouds.org) and its VoxelGrid filter. You can set the x and y bin sizes as you please, then set a really large z bin size, which gets you one z value for each x,y bin. The catch is that the x, y, and z values are the centroid of the points averaged into the bin, not the bin centers (which is also why it works for this), so you need to massage the x,y values when you're done.
Ex:
// read in a list of comma-separated values (x, y, z)
FILE* fp = fopen("points.xyz", "r");

// store them in PCL's point cloud format
pcl::PointCloud<pcl::PointXYZ>::Ptr basic_cloud_ptr(new pcl::PointCloud<pcl::PointXYZ>);
double x, y, z;
while (fscanf(fp, "%lg, %lg, %lg", &x, &y, &z) != EOF)
{
    pcl::PointXYZ basic_point;
    basic_point.x = x; basic_point.y = y; basic_point.z = z;
    basic_cloud_ptr->points.push_back(basic_point);
}
fclose(fp);
basic_cloud_ptr->width = (int)basic_cloud_ptr->points.size();
basic_cloud_ptr->height = 1;

// create the object for the result
pcl::PointCloud<pcl::PointXYZ>::Ptr cloud_filtered(new pcl::PointCloud<pcl::PointXYZ>());

// create the filtering object and process
pcl::VoxelGrid<pcl::PointXYZ> sor;
sor.setInputCloud(basic_cloud_ptr);
// set the bin sizes here (dx, dy, dz); for 2D results, make one bin larger
// than the data span in that axis
sor.setLeafSize(0.1, 0.1, 1000);
sor.filter(*cloud_filtered);
So cloud_filtered is now a point cloud that contains one point per bin. Then I just make a 2D matrix and go through the point cloud, assigning points to their x,y bins if I want an image, etc., as would be produced by griddata. It works pretty well, and it's much faster than Matlab's griddata for large datasets.

Slow C++ DirectX 2D Game

I'm new to C++ and DirectX; I come from XNA.
I have developed a game like Fly The Copter.
What I've done is create a class named Wall.
While the game is running I draw all the walls.
In XNA I stored the walls in an ArrayList, and in C++ I use a std::vector.
In XNA the game runs fast, but in C++ it is really slow.
Here's the C++ code:
void GameScreen::Update()
{
    // Update walls
    int len = walls.size();
    for (int i = wallsPassed; i < len; i++)
    {
        walls.at(i).Update();
        if (walls.at(i).pos.x <= -40)
            wallsPassed += 2;
    }
}

void GameScreen::Draw()
{
    // Draw walls
    int len = walls.size();
    for (int i = wallsPassed; i < len; i++)
    {
        if (walls.at(i).pos.x < 1280)
            walls.at(i).Draw();
        else
            break;
    }
}
In the Update method I decrease the X value by 4.
In the Draw method I call sprite->Draw (ID3DXSprite).
That's the only code that runs in the game loop.
I know this is bad code; if you have an idea on how to improve it, please help.
Thanks, and sorry about my English.
Try replacing all occurrences of at() with the [] operator, for example:
walls[i].Draw();
and then turn on all optimisations. Both [] and at() are function calls; to get maximum performance you need to make sure they are inlined, which is what raising the optimisation level will do.
You can also do some minimal caching of a wall object - for example:
for (int i = wallsPassed; i < len; i++)
{
    Wall& w = walls[i];
    w.Update();
    if (w.pos.x <= -40)
        wallsPassed += 2;
}
Try to narrow down the cause of the performance problem (also termed profiling). I would try drawing only one object while continuing to update all the objects. If it's suddenly faster, then it's a DirectX drawing problem.
Otherwise try drawing all the objects but updating only one wall. If that's faster, then your Update() function may be too expensive.
How fast is 'fast'?
How slow is 'really slow'?
How many sprites are you drawing?
How big is each one as an image file, and in pixels drawn on-screen?
How does performance scale (in XNA/C++) as you change the number of sprites drawn?
What difference do you get if you draw without updating, or vice versa?
Maybe you just forgot to turn on release mode :) I had some problems with that in the past; I thought my code was very slow when it was really just debug mode. If that's not it, you may have a problem in the rendering part, or with a huge number of objects. The code you provided looks fine...
Have you tried multiple buffers (a.k.a. double buffering) for the bitmaps?
The typical scenario is to draw into one buffer, then, while the first buffer is being copied to the screen, draw into a second buffer.
Another technique is to keep a huge "logical" screen in memory. The portion drawn on the physical display is a viewport into a small area of the logical screen, so moving the background (or screen) just requires a copy on the part of the graphics processor.
You can aid batching of sprite draw calls. Presumably your Draw call invokes your single instance of ID3DXSprite::Draw with the relevant parameters.
You can get much better performance by calling ID3DXSprite::Begin (with the D3DXSPRITE_SORT_TEXTURE flag set) and then ID3DXSprite::End when you've done all your rendering. ID3DXSprite will then sort your sprite calls by texture to reduce the number of texture switches and batch the relevant calls together. This can improve performance massively.
It's difficult to say more, however, without seeing the internals of your Update and Draw calls. The above is only a guess...
Drawing every single wall with a separate draw call is a bad idea. Try to batch the data into a single vertex buffer/index buffer and submit it in one draw call; that's a saner approach.
Anyway, to get an idea of WHY it runs slowly, profile with some CPU and GPU tools (PerfHUD, Intel GPA, etc.) to find out first of all WHAT the bottleneck is (the CPU or the GPU). Then you can work to alleviate the problem.
The lookups into your list of walls are unlikely to be the source of your slowdown. The cost of drawing objects in 3D will typically be the limiting factor.
The important parts are your draw code, the flags you used to create the DirectX device, and the flags you use to create your textures. My stab in the dark: check that you initialize the device as HAL (hardware 3D) rather than REF (software 3D).
Also, how many sprites are you drawing? Each draw call has a fair amount of overhead; if you make more than a couple hundred per frame, that will be your limiting factor.