Store rendered layer - OpenGL

I'm using OpenTK (an OpenGL wrapper for .NET) to draw 2D objects.
Generally speaking, I'm drawing two elements (each of which consists of smaller objects):
public void Draw()
{
    DrawElement1(); // Element1 changes every 300ms
    DrawElement2(); // Element2 changes every 50ms
}
In the current implementation I must call Draw every 50ms to keep Element2 up to date. In this situation I pointlessly refresh Element1 five times (Draw runs six times per 300ms window, but Element1 only needs one of them).
So I need some way to store the rendered state of Element1 in order to speed up my drawing:
public void Draw()
{
    if (needUpdateElement1)
        DrawElement1();
    else
        DrawRenderedElement1();
    DrawElement2();
}

What you want to do is render "Element1" to an offscreen rendertarget and refresh that rendertarget only every 300ms. Each frame, simply bind the rendertarget as a texture and render a quad with it, so it composites correctly with "Element2".
Here is a nice tutorial for offscreen rendering using OpenTK.
The key thing to remember is that if the cost of switching rendertargets outweighs the cost of rendering the element, this technique can even slow things down. But if "Element1" is complex/heavy enough, it will help a lot.
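For concreteness, here is a minimal sketch of the idea in raw OpenGL calls (OpenTK's GL class exposes the same entry points); the FBO setup, the texture size and the DrawTexturedQuad helper are illustrative assumptions, not your actual code:

// Sketch only: cache Element1 in a texture via a framebuffer object.
GLuint fbo, cachedTexture;

void CreateCache(int width, int height)
{
    glGenTextures(1, &cachedTexture);
    glBindTexture(GL_TEXTURE_2D, cachedTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, cachedTexture, 0);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}

void Draw()
{
    if (needUpdateElement1) // true roughly every 300ms
    {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        // match the viewport to the texture if its size differs from the window
        glClear(GL_COLOR_BUFFER_BIT);
        DrawElement1(); // rendered into cachedTexture instead of the screen
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
    }
    DrawTexturedQuad(cachedTexture); // cheap: one quad with the cached Element1
    DrawElement2();                  // still redrawn every 50ms
}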
Hope this helps!

Related

Is using Scale effect in render loop faster than pre-scaling bitmap?

Currently I draw images the following way:
1. During load, using WIC, I obtain the original bitmap and store it as a property in the object that represents an image (an ID2D1Bitmap *imageOriginal property).
2. Still at load time, I create a compatible render target with the size I need the image to be.
3. I draw the image to the compatible target using the Scale effect.
4. I allocate a new bitmap as a property of the object that represents an image (an ID2D1Bitmap *imageScaled property).
5. I copy from the compatible target to imageScaled.
6. I free the compatible target. Here the image load ends.
When an already created image object needs to be resized, I repeat steps 2-6. As a result, in the render loop I only have to draw imageScaled.
I am currently thinking about removing steps 2-6 and instead just drawing the Scale effect, with imageOriginal passed from each image object, in the render loop every time.
I do not know what exactly the Direct2D Scale effect does. If it actually does something similar to steps 2-6 every time, then I probably don't need to do it myself.
On the other hand, my render loop has a basic skip algorithm for objects that are outside the parent view, so they are not drawn at all. In the current implementation I may waste time pre-scaling objects that are out of view and will not be drawn. With the Scale effect applied in the render loop, this problem disappears.
Does anyone know which solution will be the fastest?
After rewriting my code, it currently seems that using the Scale effect in the render loop is faster, at least for a single image.
For comparison, before the rewrite, when the setImage method of the object that represents a UI image was called, something like this happened:
void ImageObject::setImage(const wchar_t *path)
{
    if (!wcscmp(this->path, path))
        return;
    SafeRelease(&this->originalImage);
    // Load the original image via WIC into this->originalImage
    this->scaledImage = RescaleImage(this->originalImage, this->width, this->height);
}
And in the main render loop:
// The render loop iterates through the array of ImageObject objects
// and calls each object's Render method.
void ImageObject::Render()
{
    // skip is a cached flag, roughly equal to
    // (this->x > this->parent->width || this->y > this->parent->height || etc.)
    if (skip)
        return;
    renderTarget->DrawBitmap(this->scaledImage, rectangle);
}
Now it is like this:
void ImageObject::setImage(const wchar_t *path)
{
    if (!wcscmp(this->path, path))
        return;
    SafeRelease(&this->originalImage);
    // Obtain this->originalImage via WIC and that's it -- no pre-scaling
}
void ImageObject::Render()
{
    if (skip)
        return;
    globalScale->SetInput(0, this->originalImage);
    globalScale->SetValue(D2D1_SCALE_PROP_SCALE, ...);
    renderTarget->DrawImage(globalScale, point);
}
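For context, globalScale above would be an ID2D1Effect created once up front. A minimal sketch, assuming a Direct2D 1.1 device context (the name deviceContext and the helper are hypothetical):

#include <d2d1_1.h>
#include <d2d1effects.h>

ID2D1Effect *globalScale = nullptr;

HRESULT CreateGlobalScale(ID2D1DeviceContext *deviceContext)
{
    // One-time creation of the shared scale effect.
    HRESULT hr = deviceContext->CreateEffect(CLSID_D2D1Scale, &globalScale);
    if (SUCCEEDED(hr))
    {
        // Optional: choose the speed/quality trade-off once, up front.
        hr = globalScale->SetValue(D2D1_SCALE_PROP_INTERPOLATION_MODE,
                                   D2D1_SCALE_INTERPOLATION_MODE_LINEAR);
    }
    return hr;
}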
The first method was actually supposed to be faster, because in the render loop I only need to draw a plain, already scaled bitmap.
As I wrote in the post, I thought the second method would only win with a large number of images, when part of them are off screen; but at the moment, even drawing a single image is faster with the Scale effect than with the pre-scaling method.

Is it okay to call SDL_RenderCopy() for each sprite?

This is a followup to my question here: Is it okay to have a SDL_Surface and SDL_Texture for each sprite?
I made a class called Entity, each instance holding an SDL_Texture that is set in the constructor; a member function render() is then called for every onscreen entity in a vector, and it uses SDL_RenderCopy() to draw to the renderer.
This render() function includes generating rectangles for each sprite based on its position and camera data.
Is this okay? Is there a faster way?
I made a test level with 96 sprites that each take up 2% of the screen, with tons of overdraw, and the frame time is 15 ms (~65 fps) at a resolution of 1600x900. That seems a little slow for just some sprites, and my computer breathes much heavier than when playing a full game such as Spelunky or Isaac.
Prefer frame time over FPS
You want to measure and judge your performance based on frame time, not FPS, because the relation between the two is not linear (frame time = 1000 / FPS). Going from 20 FPS to 30 FPS needs about 16.7 ms worth of optimization (50 ms down to 33.3 ms), which is the same gain it takes to get from 30 FPS to 60 FPS (33.3 ms down to 16.7 ms). So if you judge performance by FPS, you would conclude that an "optimization" that takes a scene from 30 FPS to 60 FPS is better than one that takes a 20 FPS scene to 31 FPS, while the latter (50 ms down to about 32.3 ms, a saving of 17.7 ms) is actually the bigger optimization.
Batch your draws
If you pack all your textures into one and store each individual image's coordinates, you can use the same texture to draw many of your objects. This is limited by the size and number of your textures and by the maximum texture size supported in your environment. In my experience 4096x4096 is safe, but I prefer to use 2048x2048 "texture atlases". There are many utility programs that produce such textures; you can easily find a suitable one by doing a Google search.
In this setup, in addition to an SDL texture, each sprite also stores the x, y, width and height of the region of the "big" texture containing its particular image. You can make a TextureRegion class and give each sprite one (see the sketch below). This whole process is often referred to as batching; look it up. The whole idea is to minimize state changes. I am not sure whether this applies to the software renderer or to all SDL2 backends.
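A minimal sketch of that idea (TextureRegion and RenderSprite are hypothetical names; the SDL2 calls are real):

#include <SDL.h>

struct TextureRegion
{
    SDL_Texture *atlas; // the shared atlas texture (e.g. 2048x2048)
    SDL_Rect     src;   // where this sprite's image lives inside the atlas
};

void RenderSprite(SDL_Renderer *renderer, const TextureRegion &region,
                  int x, int y)
{
    SDL_Rect dst = { x, y, region.src.w, region.src.h };
    // All sprites sharing the atlas reuse the same texture, which
    // minimizes state changes and lets the backend batch the copies.
    SDL_RenderCopy(renderer, region.atlas, &region.src, &dst);
}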
Cache your transformations
Batching your sprites will increase performance on the GPU side. The CPU-bound code is another optimization opportunity. Instead of calculating the parameters of SDL_RenderCopy every frame, calculate them once and cache them; then, when the position/rotation of the camera or an object changes, recalculate the cache. You can do this in the "accessors" of your entity class (like setPosition, setRotation, etc.). Note that instead of recalculating the transform immediately whenever a position or rotation changes, you want to flag the object as "dirty", check the dirty flag in your render function, and only there recalculate and cache the transform (a sketch follows the example below). This prevents redundant recalculations when you do something like this:
// If the dirty flag were not used, each of the following calls would
// trigger a recalculation of the transforms. With the dirty flag they
// are calculated only once, right before the next frame is rendered
// in the render() function.
player->setPosition(start_x, start_y);
player->setRotation(0);
camera->reset();
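A small sketch of the dirty-flag pattern itself (member and method names are hypothetical):

void Entity::setPosition(float x, float y)
{
    m_x = x;
    m_y = y;
    m_isDirty = true; // defer recalculation until render time
}

void Entity::render(SDL_Renderer *renderer)
{
    if (m_isDirty)
    {
        recalculateTransform(); // refresh the cached SDL_RenderCopy parameters
        m_isDirty = false;
    }
    SDL_RenderCopy(renderer, m_texture, &m_srcRect, &m_dstRect);
}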
So, I've done some more testing, examining the memory/CPU usage of this program at full screen with a "demanding" level, and managed to make it comparable to other games by enforcing a framerate cap with SDL_Delay():
float g_max_framerate = 60;
float g_max_frametime = 1 / g_max_framerate * 1000; // target frame time in ms
...
while (!quit) {
    lastticks = ticks;
    ticks = SDL_GetTicks();
    elapsed = ticks - lastticks;
    ...
    SDL_RenderPresent(renderer);
    // lock the framerate
    if (elapsed < g_max_frametime) {
        SDL_Delay(g_max_frametime - elapsed);
    }
}
With this limitation in place it is appropriately low-spec.

Is there any way to save the path and restore it in Cairo?

I have two graphs drawing signals in a gtkmm application.
The problem comes when I have to paint a graph with many points (around 300-350k) and lines to the following points, since it slows down a lot to paint all the points each iteration.
bool DrawArea::on_draw(const Cairo::RefPtr<Cairo::Context>& c)
{
cairo_t* cr = c->cobj();
//xSignal.size() = ySignal.size() = 350000
for (int j = 0; j < xSignal.size() - 1; ++j)
{
cairo_move_to(cr, xSignal[j], ySignal[j]);
cairo_line_to(cr, xSignal[j + 1], ySignal[j + 1]);
}
cairo_stroke(cr);
return true;
}
I know that cairo_stroke_preserve exists, but I think it is not valid for me, because when I switch between graphs the preserved path disappears.
I've been researching how to save the path and restore it in the Cairo documentation, but I don't see anything. In 2007, a Cairo user suggested the same thing in the documentation's 'to do' section, but apparently it has not been done.
Any suggestion?
It's not necessary that you draw everything in on_draw. What I understand from your post is that you have a real-time waveform drawing application where samples arrive at a fixed period (every few milliseconds, I presume). There are three approaches you can follow.
Approach 1
This is good particularly when you have limited memory and do not care about retaining the plot when the window is resized or uncovered. The following could be the function that receives samples (one by one).
NOTE: Variables prefixed with m_ are class members.
void DrawingArea::PlotSample(int nSample)
{
    Cairo::RefPtr<Cairo::Context> refCairoContext;
    double dNewY;

    // Get the window's cairo context
    refCairoContext = get_window()->create_cairo_context();

    // TODO Scale and transform the sample to a new Y coordinate
    dNewY = nSample;

    // Clear the area for the new waveform segment
    // (see the note below on m_dPreviousX + 1)
    refCairoContext->rectangle(m_dPreviousX + 1, m_dPreviousY,
                               ERASER_WIDTH, get_allocated_height());
    refCairoContext->set_source_rgb(0, 0, 0);
    refCairoContext->fill();

    // Set up the cairo context for the trace
    refCairoContext->set_source_rgb(1, 1, 1);
    refCairoContext->set_antialias(Cairo::ANTIALIAS_SUBPIXEL); // this is up to you
    refCairoContext->set_line_width(1); // it's 2 by default; 1 looks better with anti-aliasing

    // Add the sub-path and stroke it
    refCairoContext->move_to(m_dPreviousX, m_dPreviousY);
    m_dPreviousX += m_dXStep;
    refCairoContext->line_to(m_dPreviousX, dNewY);
    refCairoContext->stroke();

    // Update the coordinates
    if (m_dPreviousX >= get_allocated_width())
    {
        m_dPreviousX = 0;
    }
    m_dPreviousY = dNewY;
}
While clearing the area, the X coordinate has to be offset by 1, because otherwise the 'eraser' would clear off the anti-aliasing on the last column and your trace would have jagged edges. It may need to be more than 1 depending on your line thickness.
Like I said before, with this method your trace will get cleared if the widget is resized or 'revealed'.
Approach 2
Here too the samples are plotted the same way as before. The only difference is that each received sample is also pushed into a buffer. When the window is resized or 'revealed', the widget's on_draw is called, and there you replot all the samples in one go. Of course you'll need some memory (quite a lot if you keep 350K samples in the queue), but the trace stays on screen no matter what.
Approach 3
This one also takes up some memory (probably much more, depending on the size of your widget) and uses an off-screen buffer. Here, instead of storing samples, we store the rendered result. Override the widget's on_map and on_size_allocate methods to create an off-screen buffer.
void DrawingArea::CreateOffscreenBuffer(void)
{
    Glib::RefPtr<Gdk::Window> refWindow = get_window();
    Gtk::Allocation oAllocation = get_allocation();

    if (refWindow)
    {
        Cairo::RefPtr<Cairo::Context> refCairoContext;

        m_refOffscreenSurface =
            refWindow->create_similar_surface(Cairo::CONTENT_COLOR,
                                              oAllocation.get_width(),
                                              oAllocation.get_height());
        refCairoContext = Cairo::Context::create(m_refOffscreenSurface);
        // TODO paint the background (grids, maybe?)
    }
}
Now, when you receive samples, instead of drawing into the window directly, draw into the off-screen surface. Then blit the off-screen surface into the window by setting it as the source of your window's cairo context and drawing a rectangle covering the newly plotted segment. In your widget's on_draw, just set this surface as the source of the widget's cairo context and do a Cairo::Context::paint() (see the sketch below). This approach is particularly useful if your widget doesn't get resized, and the advantage is that blitting (transferring the contents of one surface to another) is much faster than plotting individual line segments.
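A minimal sketch of the matching on_draw, assuming the m_refOffscreenSurface member created above:

bool DrawingArea::on_draw(const Cairo::RefPtr<Cairo::Context>& refCairoContext)
{
    // Blit the cached rendering instead of replotting every sample
    refCairoContext->set_source(m_refOffscreenSurface, 0, 0);
    refCairoContext->paint();
    return true;
}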
To answer your question:
There are cairo_copy_path() and cairo_append_path() (and also cairo_copy_path_flat() and cairo_path_destroy()).
Thus, you can save a path with cairo_copy_path() and later append it to the current path with cairo_append_path().
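A minimal sketch of that save/replay cycle, using the plain C API as in your on_draw:

// Snapshot the current path, then replay it later.
cairo_path_t *saved = cairo_copy_path(cr);

// ... later, on the same or another context ...
cairo_new_path(cr);
cairo_append_path(cr, saved); // becomes the current path again
cairo_stroke(cr);
cairo_path_destroy(saved);    // free the copy when done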
To answer your not-question:
I doubt that this will speed up your drawing. Appending these lines to the current path is unlikely to be slow. Rather, I would expect the actual drawing of these lines to be slow.
You write "it slows down a lot to paint all the points each iteration.". I am not sure what "each iteration" refers to, but why are you drawing all these points all the time? Wouldn't it make more sense to only draw them once and then to re-use the drawn result?

OpenGL: How to undo scaling?

I'm new to OpenGL. I'm using JOGL.
I have a WorldEntity class that represents a thing that can be rendered. It has attributes like position and size. To render, I've been using this method:
/**
 * Renders the object in the world.
 */
public void render() {
    gl.glTranslatef(getPosition().x, getPosition().y, getPosition().z);
    gl.glRotatef(getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
    // gl.glScalef(size, size, size);
    gl.glCallList(drawID);
    // gl.glScalef(1/size, 1/size, 1/size);
    gl.glRotatef(-getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
    gl.glTranslatef(-getPosition().x, -getPosition().y, -getPosition().z);
}
The pattern I've been using is applying each attribute of the entity (like position or rotation) and then undoing it, to avoid corrupting the state for the next entity to be rendered.
Uncommenting the scaling lines makes the app much more sluggish as it renders a modest scene on my modest computer. I'm guessing that the float division is too much to handle thousands of times per second. (?)
What is the correct way to go about this? Can I find a less computationally intensive way to undo a scaling transformation? Do I need to sort objects by scale and draw them in that order, to reduce the number of scaling transformations required?
Thanks.
This is where you use matrices (bear with me, I come from an OpenGL/C programming background):
glMatrixMode(GL_MODELVIEW); // set the matrix mode to manipulate models
glPushMatrix();             // push a copy of the current matrix onto the stack
// apply transformations
glTranslatef(getPosition().x, getPosition().y, getPosition().z);
glRotatef(getRotationAngle(), getRotation().x, getRotation().y, getRotation().z);
glScalef(size, size, size);
glCallList(drawID); // drawing here
glPopMatrix();      // get your original matrix back
... at least, that's what I think it is.
It's very unlikely the divisions cause any perf issue. rfw gave you the usual way of implementing this, but my guess is that your "sluggish" rendering is mostly due to the GPU being the bottleneck, so using the matrix stack will not improve perf.
When you increase the size of your drawn objects, more pixels have to be processed, and the GPU has to work significantly harder. What your CPU does at this point (the divisions) is irrelevant.
To prove my point, try keeping the scaling code in, but with sizes around 1.

Slow C++ DirectX 2D Game

I'm new to C++ and DirectX; I come from XNA.
I have developed a game like Fly The Copter.
What I've done is create a class named Wall.
While the game is running I draw all the walls.
In XNA I stored the walls in an ArrayList, and in C++ I use a vector.
In XNA the game runs fast, but in C++ it is really slow.
Here's the C++ code:
void GameScreen::Update()
{
    // Update walls
    int len = walls.size();
    for (int i = wallsPassed; i < len; i++)
    {
        walls.at(i).Update();
        if (walls.at(i).pos.x <= -40)
            wallsPassed += 2;
    }
}

void GameScreen::Draw()
{
    // Draw walls
    int len = walls.size();
    for (int i = wallsPassed; i < len; i++)
    {
        if (walls.at(i).pos.x < 1280)
            walls.at(i).Draw();
        else
            break;
    }
}
In the Update method I decrease the X value by 4.
In the Draw method I call sprite->Draw (ID3DXSprite).
That is all the code that runs in the game loop.
I know this is bad code; if you have an idea how to improve it, please help.
Thanks, and sorry about my English.
Try replacing all occurrences of at() with the [] operator. For example:
walls[i].Draw();
Then turn on all optimisations. Both [] and at() are function calls; to get maximum performance you need to make sure they are inlined, which is what raising the optimisation level will do.
You can also do some minimal caching of a wall object, for example:
for (int i = wallsPassed; i < len; i++)
{
    Wall &w = walls[i];
    w.Update();
    if (w.pos.x <= -40)
        wallsPassed += 2;
}
Try to narrow down the cause of the performance problem (also termed profiling). I would try drawing only one object while continuing to update all the objects. If it is suddenly faster, then it is a DirectX drawing problem.
Otherwise, try drawing all the objects but updating only one wall. If that is faster, then your Update() function may be too expensive.
How fast is 'fast'?
How slow is 'really slow'?
How many sprites are you drawing?
How big is each one as an image file, and in pixels drawn on-screen?
How does performance scale (in XNA/C++) as you change the number of sprites drawn?
What difference do you get if you draw without updating, or vice versa?
Maybe you have just forgotten to turn on Release mode :) I had some problems with that in the past - I thought my code was very slow, when it was just Debug mode. If that's not it, the problem may be in the rendering part, or in a huge number of objects. The code you provided looks fine...
Have you tried multiple buffers (a.k.a. double buffering) for the bitmaps?
The typical scenario is to draw into one buffer while the other buffer is being copied to the screen, then swap.
Another technique is to keep a huge "logical" screen in memory, where the portion drawn on the physical display is a viewport into a small area of the logical screen. Moving the background (or screen) then only requires a copy on the part of the graphics processor.
You can aid batching of sprite draw calls. Presumably your Draw call invokes your only instance of ID3DXSprite::Draw with the relevant parameters.
You can get much better performance by calling ID3DXSprite::Begin (with the D3DXSPRITE_SORT_TEXTURE flag set) and then calling ID3DXSprite::End when you've done all your rendering (see the sketch below). ID3DXSprite will then sort all your sprite calls by texture to decrease the number of texture switches and batch the relevant calls together. This can improve performance massively.
It's difficult to say more without seeing the internals of your Update and Draw calls, however. The above is only a guess ...
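In sketch form, the Begin/End batching could look like this (assuming sprite is your single ID3DXSprite instance and each Wall::Draw calls sprite->Draw internally):

// Batch all wall sprites between one Begin/End pair.
sprite->Begin(D3DXSPRITE_ALPHABLEND | D3DXSPRITE_SORT_TEXTURE);
for (int i = wallsPassed; i < (int)walls.size(); i++)
{
    if (walls[i].pos.x >= 1280)
        break;
    walls[i].Draw();
}
sprite->End(); // flush: draws are sorted by texture and batched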
Drawing every single wall with a separate draw call is a bad idea. Try to batch the data into a single vertex buffer/index buffer and submit it in a single draw call; that is a much saner approach.
In any case, to find out WHY it runs slowly, profile with CPU and GPU tools (PerfHUD, Intel GPA, etc.) to determine first of all WHAT the bottleneck is (the CPU or the GPU). Then you can work on alleviating it.
The lookups into your list of walls are unlikely to be the source of your slowdown. The cost of drawing objects in 3D will typically be the limiting factor.
The important parts are your draw code, the flags you used to create the DirectX device, and the flags you use to create your textures. My stab in the dark... check that you initialize the device as HAL (hardware 3D) rather than REF (software 3D).
Also, how many sprites are you drawing? Each draw call has a fair amount of overhead. If you make more than a couple hundred per frame, that will be your limiting factor.