Custom Fragment Shaders with SkiaSharp - glsl

I'm evaluating SkiaSharp for use in a project which allows for rendering some vector elements on top of a photo. SkiaSharp looks well suited to rendering the vector elements, however I also want to apply transformations to the image -- tone-curve adjustments like contrast or exposure, for example.
The ideal scenario would be a way to execute custom GLSL fragment shader code when rendering the rectangle containing the image (eg an SKColorFilter subclass that just wraps GLSL code, to be applied when calling DrawBitmap, or similar).
It would probably also work to have my existing GLSL code render to a texture, and then have SkiaSharp draw a rectangle using that texture as its contents. However, I can't see a way to do this without a GPU read-back, which feels like it would be prohibitively slow.
What's the best way forward here? More precisely: either, how can I apply transformations to the pixel data of an image, on the GPU, in SkiaSharp, or, what else could I use to provide SkiaSharp like vector rendering primitives and also allow GLSL fragment-shader-like pixel transformations?

You could have a look at some really early previews of the v2.84 series. They have a new SKRuntimeEffect that you can make use of.
Check out the preview feed: https://aka.ms/skiasharp-eap/index.json
You must have a look at the builds for the https://github.com/mono/SkiaSharp/pull/1321 PR:
v2.84.0-pr.1321.*
See this example: https://github.com/mono/SkiaSharp/issues/1319#issuecomment-640529824
// input values
SKCanvas canvas = ...;
float threshold = 1.05f;
float exponent = 1.5f;
// shader
var src = #"
in fragmentProcessor color_map;
uniform float scale;
uniform half exp;
uniform float3 in_colors0;
void main(float2 p, inout half4 color) {
half4 texColor = sample(color_map, p);
if (length(abs(in_colors0 - pow(texColor.rgb, half3(exp)))) < scale)
discard;
color = texColor;
}";
using var effect = SKRuntimeEffect.Create(src, out var errorText);
// input values
var inputs = new SKRuntimeEffectInputs(effect);
inputs.Set("scale", threshold);
inputs.Set("exp", exponent);
inputs.Set("in_colors0", new[] { 1f, 1f, 1f });
// shader values
using var blueShirt = SKImage.FromEncodedData(Path.Combine(PathToImages, "blue-shirt.jpg"));
using var textureShader = blueShirt.ToShader();
var children = new SKRuntimeEffectChildren(effect);
children.Set("color_map", textureShader);
// create actual shader
using var shader = effect.ToShader(inputs, children, true);
// draw as normal
canvas.Clear(SKColors.Black);
using var paint = new SKPaint { Shader = shader };
canvas.DrawRect(SKRect.Create(400, 400), paint);

Related

Attempt at making a Displacement Filter

So, basically, I'm trying to make a OBS Filter that displaces the pixels based on a lightmap/luminance map. I decided to learn how to make a filter by following this tutorial. But, in this tutorial, they don't explain much in terms of pixel displacement. So, I made a function that basically gets the brightness value of a texture I input and tested it by changing the pixel's alpha value with the red value of the texture:
float4 get_displacement(float2 position)
{
float2 pattern_uv = position / pattern_size;
float4 pattern_sample = pattern_texture.Sample(linear_wrap, pattern_uv / scale);
return pattern_sample;
}
float4 pixel_shader(pixel_data pixel) : TARGET
{
float4 source_sample = image.Sample(linear_wrap, pixel.uv);
if (pattern_size.x <= 0){
return source_sample;
}
float2 position = pixel.uv * float2(width, height);
float4 lightmap = get_displacement(position);
return float4(source_sample.rgb, lightmap.r);
return source_sample;
}
Which results to this (Note: The green is from a colour source that's behind the image to show the alpha value)
But, for some reason, when I try it with the vertex_shader, the function that decides where the pixel is rendered at, it seems to not work:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
if (pattern_size.x <= 0){
pixel.pos = mul(float4(vertex.pos.xyz, 1.0), ViewProj);
return pixel;
}
float2 position = vertex.uv * float2(width, height);
float4 lightmap = get_displacement(position);
pixel.pos = mul(float4(vertex.pos.x + (lightmap.r * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
(Note: testRamp1 is used as a value that I can change from a slider inside of OBS via some filter Properties)
The result that I'm expecting is something similar to this
To see if the issue was from me changing the XY position, I tested it using this function:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + 100, vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
And it gave me an expected result.
I also changed the 100 with the testRamp1 value, and it works just the same based on the value of the slider.
So, I then tested if it was from the pixels needing to all move the same distance as each other. So, I change the function to this:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + (vertex.uv.x * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
Which then gives me either a squashed image when testRamp1 is set to a negative value, and it gives me a stretched image when it's set it to a positive value.
But as soon as I try to get the value of an image, may it be the pattern or from the source image, it no longer works(not even the filter parameters appear). For example, I used, this function to use the values of the source image:
pixel_data vertex_shader(vertex_data vertex)
{
pixel_data pixel;
float4 source_sample = image.Sample(linear_wrap, vertex.uv);
pixel.uv = vertex.uv;
pixel.pos = mul(float4(vertex.pos.x + (source_sample.r * testRamp1), vertex.pos.yz, 1.0), ViewProj);
return pixel;
}
At this point, I'm at a loss of words as to what could be causing this issue
First of all, the vertex shader is not what you want to use for this kind of effect. What you actually want to do, is to sample the image in the pixel shader, but offset the UV values slightly by your displacement before you pass them to the Sample function.
The primary reason, why you don't want to do this in the vertex shader, is, that the number of vertices is usually much smaller than the number of pixels - in the worst case, you only have 4 vertices in total (one for each corner of your screen), so the granuality of things you can do in the vertex shader is rather coarse. (Note: I'm not too familiar with OBS filters, and don't know how many vertices OBS dispatches, but certainly much less than the number of pixels you have on your screen).
Now, the reason why your vertex shader didn't work at all, is a bit more technical. In short, you can't use Sample in a vertex shader, you'd have to use SampleLevel or SampleGrad instead (note that these functions require more parameters). This is because Sample automatically calculates a UV gradient between adjacent pixels, to figure out the level of detail that is needed for your texture (whether or not it actually has multiple levels of detail). But the vertex shader operates on vertices, not on pixels, so the concept of an "adjacent pixel" doesn't make sense in a vertex shader - thus, the Sample method doesn't work.

Stuck trying to optimize complex GLSL fragment shader

So first off, let me say that while the code works perfectly well from a visual point of view, it runs into very steep performance issues that get progressively worse as you add more lights. In its current form it's good as a proof of concept, or a tech demo, but is otherwise unusable.
Long story short, I'm writing a RimWorld-style game with real-time top-down 2D lighting. The way I implemented rendering is with a 3 layered technique as follows:
First I render occlusions to a single-channel R8 occlusion texture mapped to a framebuffer. This part is lightning fast and doesn't slow down with more lights, so it's not part of the problem:
Then I invoke my lighting shader by drawing a huge rectangle over my lightmap texture mapped to another framebuffer. The light data is stored in an array in an UBO and it uses the occlusion mapping in its calculations. This is where the slowdown happens:
And lastly, the lightmap texture is multiplied and added to the regular world renderer, this also isn't affected by the number of lights, so it's not part of the problem:
The problem is thus in the lightmap shader. The first iteration had many branches which froze my graphics driver right away when I first tried it, but after removing most of them I get a solid 144 fps at 1440p with 3 lights, and ~58 fps at 1440p with 20 lights. An improvement, but it scales very poorly. The shader code is as follows, with additional annotations:
#version 460 core
// per-light data
struct Light
{
vec4 location;
vec4 rangeAndstartColor;
};
const int MaxLightsCount = 16; // I've also tried 8 and 32, there was no real difference
layout(std140) uniform ubo_lights
{
Light lights[MaxLightsCount];
};
uniform sampler2D occlusionSampler; // the occlusion texture sampler
in vec2 fs_tex0; // the uv position in the large rectangle
in vec2 fs_window_size; // the window size to transform world coords to view coords and back
out vec4 color;
void main()
{
vec3 resultColor = vec3(0.0);
const vec2 size = fs_window_size;
const vec2 pos = (size - vec2(1.0)) * fs_tex0;
// process every light individually and add the resulting colors together
// this should be branchless, is there any way to check?
for(int idx = 0; idx < MaxLightsCount; ++idx)
{
const float range = lights[idx].rangeAndstartColor.x;
const vec2 lightPosition = lights[idx].location.xy;
const float dist = length(lightPosition - pos); // distance from current fragment to current light
// early abort, the next part is expensive
// this branch HAS to be important, right? otherwise it will check crazy long lines against occlusions
if(dist > range)
continue;
const vec3 startColor = lights[idx].rangeAndstartColor.yzw;
// walk between pos and lightPosition to find occlusions
// standard line DDA algorithm
vec2 tempPos = pos;
int lineSteps = int(ceil(abs(lightPosition.x - pos.x) > abs(lightPosition.y - pos.y) ? abs(lightPosition.x - pos.x) : abs(lightPosition.y - pos.y)));
const vec2 lineInc = (lightPosition - pos) / lineSteps;
// can I get rid of this loop somehow? I need to check each position between
// my fragment and the light position for occlusions, and this is the best I
// came up with
float lightStrength = 1.0;
while(lineSteps --> 0)
{
const vec2 nextPos = tempPos + lineInc;
const vec2 occlusionSamplerUV = tempPos / size;
lightStrength *= 1.0 - texture(occlusionSampler, vec2(occlusionSamplerUV.x, 1 - occlusionSamplerUV.y)).x;
tempPos = nextPos;
}
// the contribution of this light to the fragment color is based on
// its square distance from the light, and the occlusions between them
// implemented as multiplications
const float strength = max(0, range - dist) / range * lightStrength;
resultColor += startColor * strength * strength;
}
color = vec4(resultColor, 1.0);
}
I call this shader as many times as I need, since the results are additive. It works with large batches of lights or one by one. Performance-wise, I didn't notice any real change trying different batch numbers, which is perhaps a bit odd.
So my question is, is there a better way to look up for any (boolean) occlusions between my fragment position and light position in the occlusion texture, without iterating through every pixel by hand? Could render buffers perhaps help here (from what I've read they're for reading data back to system memory, I need it in another shader though)?
And perhaps, is there a better algorithm for what I'm doing here?
I can think of a couple routes for optimization:
Exact: apply a distance transform on the occlusion map: this will give you the distance to the nearest occluder at each pixel. After that you can safely step by that distance within the loop, instead of doing baby steps. This will drastically reduce the number of steps in open regions.
There is a very simple CPU-side algorithm to compute a DT, and it may suit you if your occluders are static. If your scene changes every frame, however, you'll need to search the literature for GPU side algorithms, which seem to be more complicated.
Inexact: resort to soft shadows -- it might be a compromise you are willing to make, and even seen as an artistic choice. If you are OK with that, you can create a mipmap from your occlusion map, and then progressively increase the step and sample lower levels as you go farther from the point you are shading.
You can go further and build an emitters map (into the same 4-channel map as the occlusion). Then your entire shading pass will be independent of the number of lights. This is an equivalent of voxel cone tracing GI applied to 2D.

Rendering point cloud data with draw instancing from OSG Cookbook not working

I am rendering a point cloud using OSG. I followed the example in the OSG cookbook titled "Rendering point cloud data with draw instancing" that shows how to make one point with many instances and then transfer the point locations to the graphics card via a texture. It then uses a shader to pull the points out of the texture and move each instance to the right location. There appear to be two problems with what is getting rendered.
First, the points aren't in the right location compared to a more straight forward, working approach to rendering. It looks like they are roughly scaled from zero wrong, some kind of multiplicative factor on position.
Second, the imagery is blurry. Points tend to be generally in the right place; there are many points in the place where a large object should be. However, I can't tell what the object. Data rendered with my working (but slower) rendering method looks sharp.
I have verified that I have the same input data going into the texture and draw list in both methods so it seems it has to be something with the rendering.
Here is the code to set up the Geometry which is nearly directly copied from the text book.
osg::Geometry* geo = new osg::Geometry;
osg::ref_ptr<osg::Image> img = new osg::Image;
img->allocateImage(w,h, 1, GL_RGBA, GL_FLOAT);
osg::BoundingBox box;
float* data = (float*)img->data();
for (unsigned long int k=0; k<NPoints; k++)
{
*(data++) = cloud->x[k];
*(data++) = cloud->y[k];
*(data++) = cloud->z[k];
*(data++) = cloud->meta[0][k];
box.expandBy(cloud->x[k],cloud->y[k],cloud->z[k]);
}
geo->setUseDisplayList(false);
geo->setUseVertexBufferObjects(true);
geo->setVertexArray( new osg::Vec3Array(1));
geo->addPrimitiveSet( new osg::DrawArrays(GL_POINTS, 0, 1, stop) );
geo->setInitialBound(box);
osg::ref_ptr<osg::Texture2D> tex = new osg::Texture2D;
tex->setImage( img);
tex->setInternalFormat( GL_RGBA32F_ARB );
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::LINEAR);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::LINEAR);
And here is the shader code.
void main () {
float row;
row = float(gl_InstanceID) / float(width);
vec2 uv = vec2( fract(row), floor(row) / float(height) );
vec4 texValue = texture2D(defaultTex,uv);
vec4 pos = gl_Vertex + vec4(texValue.xyz, 1.0);
gl_Position = gl_ModelViewProjectionMatrix * pos;
}
After a bunch of experimenting, I found that the example code from the OSG Cookbook has some problems.
The scale issue (the first problem) is in the shader.
vec4 pos = gl_Vertex + vec4(texValue.xyz, 1.0);
Should be
vec4 pos = gl_Vertex + vec4(texValue.xyz, 0.0);
This is because the gl_Vertex is a 3-vector with an extra 1 element to aide with matrix transformation. That element should always be 1. The example created another 3+1 vector and added it to gl_Vertex making it a 2. Replace the 1 with a zero and the scale problem goes away.
The blurriness (the second problem) was caused by texture interpolation.
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::LINEAR);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::LINEAR);
needs to be
tex->setFilter( osg::Texture2D::MIN_FILTER, osg::Texture2D::NEAREST);
tex->setFilter( osg::Texture2D::MAG_FILTER, osg::Texture2D::NEAREST);
so that the interpolator will just take the values from the texture instead of interpolating them from neighboring texture pixels which may be points on the other side of the point cloud. After fixing these two issues, the example works as advertised and seems to be a bit faster in my limited testing.

Can you modify a uniform from within the shader? If so. how?

So I wanted to store all my meshes in one large VBO. The problem is, how do you do have just one draw call, but let every mesh have its own model to world matrix?
My idea was to submit an array of matrices to a uniform before drawing. In the VBO I would make the color of every first vertex of a mesh negative (So I'd be using the signing bit to check whether a vertex was the first of a mesh).
Okay, so I can detect when a new mesh has started and I have an array of matrices ready and probably a uniform called 'index'. But how do I increase this index by one every time I encounter a new mesh?
Can you modify a uniform from within the shader? If so, how?
Can you modify a uniform from within the shader?
If you could, it wouldn't be uniform anymore, would it?
Furthermore, what you're wanting to do cannot be done even with Image Load/Store or SSBOs, both of which allow shaders to write data. It won't work because vertex shader invocations are not required to be executed sequentially. Many happen at the same time, and there's no way for any shader invocation to know that it will happen "after" the "first vertex" in a mesh.
The simplest way to deal with this is the obvious solution. Render each mesh individually, but set the uniforms for each mesh before each draw call. Without changing buffers between draws, of course. Uniform changes, while not exactly cheap, aren't the most expensive state changes that exist.
There are more complicated drawing methods that could allow you more performance. But that form is adequate for most needs. You've already done the hard part: you removed the need for any state change (textures, buffers, vertex formats, etc) except uniform state.
There are two approaches to minimize draw calls - instancing and batching. The first (instancing) allows you to draw multiple copies of same meshes in one draw call, but it depends on the API (is available from OpenGL 3.1). Batching is similar to instancing but allows you to draw different meshes. Both of these approaches have restrictions - meshes should be with the same materials and shaders.
If you would to draw different meshes in one VBO then instancing is not an option. So, batching requires keeping all meshes in 'big' VBO with applied world transform. It not a problem with static meshes, but have some discomfort with animated. I give you some pseudocode with batching implementation
struct SGeometry
{
uint64_t offsetVB;
uint64_t offsetIB;
uint64_t sizeVB;
uint64_t sizeIB;
glm::mat4 oldTransform;
glm::mat4 transform;
}
std::vector<SGeometry> cachedGeometries;
...
void CommitInstances()
{
uint64_t vertexOffset = 0;
uint64_t indexOffset = 0;
for (auto instance in allInstances)
{
Copy(instance->Vertexes(), VBO);
for (uint64_t i = 0; i < instances->Indices().size(); ++i)
{
auto index = instances->Indices()[i];
index += indexOffset;
IBO[i] = index;
}
cachedGeometries.push_back({vertexOffset, indexOffset});
vertexOffset += instance->Vertexes().size();
indexOffset += instance->Indices().size();
}
Commit(VBO);
Commit(IBO);
}
void ApplyTransform(glm::mat4 modelMatrix, uint64_t instanceId)
{
const SGeometry& geom = cachedGeometries[i];
glm::mat4 inverseOldTransform = glm::inverse(geom.oldTransform);
VertexStream& stream = VBO->GetStream(Position, geom.offsetVB);
for (uint64_t i = 0; i < geom.sizeVB; ++i)
{
glm::vec3 pos = stream->Get(i);
// We need to revert absolute transformation before applying new
pos = glm::vec3(inverseOldNormalTransform * glm::vec4(pos, 1.0f));
pos = glm::vec3(normalTransform * glm::vec4(pos, 1.0f));
stream->Set(i);
}
// .. Apply normal transformation
}
GPU Gems 2 has a good article about geometry instancing http://www.amazon.com/GPU-Gems-Programming-High-Performance-General-Purpose/dp/0321335597

Precise Texture Overlay

I'm trying to set up a two-stage render of objects in a 3D engine I'm working on written in C++ with DirectX9 to facilitate transparency (and other things). I thought it was all working nicely until I noticed some dodgyness on the edge of objects rendered before objects using this two stage method.
The two stage method is simple:
Draw model to off-screen ("side") texture of same size using same zbuffer (no MSAA is used anywhere)
Draw off-screen ("side") texture over the top of the main render target with a suitable blend and no alpha test or write
In the image below the left view is with the two stage render of the gray object (a lamppost) with the body in-front of it rendered directly to the target texture. The right view is with the two-stage render disabled, so both are rendered directly onto the target surface.
On close inspection it is as if the side texture is offset by exactly 1 pixel "down" and 1 pixel "right" when rendered over the target surface (but is rendered correctly in-place). This can be seen in an overlay of the off screen texture (which I get my program to write out to a bitmap file via D3DXSaveTextureToFile) over a screen shot below.
One last image so you can see where the edge in the side texture is coming from (it's because rendering to the side texture does use z test). Left is screen short, right is side texture (as overlaid above).
All this leads me to believe that my "overlaying" isn't very effective. The code that renders the side texture over the main render target is shown below (note that the same viewport is used for all scene rendering (on and off screen)). The "effect" object is an instance of a thin wrapper over LPD3DXEFFECT, with the "effect" field (sorry about shoddy naming) being a LPD3DXEFFECT itself.
void drawSideOver(LPDIRECT3DDEVICE9 dxDevice, drawData* ddat)
{ // "ddat" drawdata contains lots of render state information, but all we need here is the handles for the targetSurface and sideSurface
D3DXMATRIX idMat;
D3DXMatrixIdentity(&idMat); // create identity matrix
dxDevice->SetRenderTarget(0, ddat->targetSurface); // switch to targetSurface
dxDevice->SetRenderState(D3DRS_ZENABLE, false); // disable z test and z write
dxDevice->SetRenderState(D3DRS_ZWRITEENABLE, false);
vertexOver overVerts[4]; // create square
overVerts[0] = vertexOver(-1, -1, 0, 0, 1);
overVerts[1] = vertexOver(-1, 1, 0, 0, 0);
overVerts[2] = vertexOver(1, -1, 0, 1, 1);
overVerts[3] = vertexOver(1, 1, 0, 1, 0);
effect.setTexture(ddat->sideTex); // use side texture as shader texture ("tex")
effect.effect->SetTechnique("over"); // change to "over" technique
effect.setViewProj(&idMat); // set viewProj to identity matrix so 1/-1 map directly
effect.effect->CommitChanges();
setAlpha(dxDevice); // this sets up the alpha blending which works fine
UINT numPasses, pass;
effect.effect->Begin(&numPasses, 0);
effect.effect->BeginPass(0);
dxDevice->SetVertexDeclaration(vertexDecOver);
dxDevice->DrawPrimitiveUP(D3DPT_TRIANGLESTRIP, 2, overVerts, sizeof(vertexOver));
effect.effect->EndPass();
effect.effect->End();
dxDevice->SetRenderState(D3DRS_ZENABLE, true); // revert these so we don't mess everything up drawn after this
dxDevice->SetRenderState(D3DRS_ZWRITEENABLE, true);
}
The C++ side definition for the VertexOver struct and constructor (HLSL side shown below somewhere):
struct vertexOver
{
public:
float x;
float y;
float z;
float w;
float tu;
float tv;
vertexOver() { }
vertexOver(float xN, float yN, float zN, float tuN, float tvN)
{
x = xN;
y = yN;
z = zN;
w = 1.0;
tu = tuN;
tv = tvN;
}
};
Inefficiency in re-creating and passing the vertices down to the GPU each draw aside, what I really want to know is why this method doesn't quite work, and if there are any better methods for overlaying textures like this with an alpha blend that won't exhibit this issue
I figured that the texture sampling may matter somewhat in this matter, but messing about with options didn't seem to help much (for example, using a LINEAR filter just makes it fuzzy as you might expect implying that the offset isn't as clear-cut as a 1 pixel discrepancy). Shader code:
struct VS_Input_Over
{
float4 pos : POSITION0;
float2 txc : TEXCOORD0;
};
struct VS_Output_Over
{
float4 pos : POSITION0;
float2 txc : TEXCOORD0;
float4 altPos : TEXCOORD1;
};
struct PS_Output
{
float4 col : COLOR0;
};
Texture tex;
sampler texSampler = sampler_state { texture = <tex>;magfilter = NONE; minfilter = NONE; mipfilter = NONE; AddressU = mirror; AddressV = mirror;};
// side/over shaders (these make up the "over" technique (pixel shader version 2.0)
VS_Output_Over VShade_Over(VS_Input_Over inp)
{
VS_Output_Over outp = (VS_Output_Over)0;
outp.pos = mul(inp.pos, viewProj);
outp.altPos = outp.pos;
outp.txc = inp.txc;
return outp;
}
PS_Output PShade_Over(VS_Output_Over inp)
{
PS_Output outp = (PS_Output)0;
outp.col = tex2D(texSampler, inp.txc);
return outp;
}
I've looked about for a "Blended Blit" or something but I can't find anything, and other related searches have only brought up forums implying that rendering a quad with an orthographic projection is the way to go about doing this.
Sorry if I've given far too much detail for this issue but it's both interesting and infuriating and any feedback would be greatly appreciated.
It looks for me that you problem is the mapping of texels to pixels. You must offset a screen-aligned quad with a half pixel to match the texels direct to the screenpixels. This issue is explaines here: Directly Mapping Texels to Pixels (MSDN)
For anyone else hitting a similar wall, my specific problem solved by adjusting the U and V values of the verticies sent to the GPU for the overlaid texture triangles thus:
for (int i = 0; i < 4; i++)
{
overVerts[i].tu += 0.5 / (float)ddat->targetVp->Width; // ddat->targetVp is the viewport in use, and the viewport is the same size as the texture
overVerts[i].tv += 0.5 / (float)ddat->targetVp->Height;
}
See Directly Mapping Texels to Pixels as provided by Gnietschow's answer for an explanation as to why this makes sense.