Display CPU-writable RGB data on the screen - directx-12

This should be a rather simple task, but I'm unsure of the best way to approach it: given a pointer to m * n * 3 floats (interpreted as RGB data), all I want to do is display them on screen at a resolution of m * n pixels and allow the CPU to update them. How can this be done? Do I need to upload the data as a texture?

Related

Compression of an image

I have been calculating the uncompressed and compressed file sizes of an image. For me this has always resulted in the compressed image being smaller than the uncompressed one, which I would expect. If an image contains a large number of different colours, then storing the palette takes up a significant amount of space, and more bits are also needed to store each code. However, my question is: could the compression method potentially result in a larger file than the uncompressed RGB image? What would be the size (in pixels) of the smallest square RGB image, containing a total of k different colours, for which this compression method is still useful? In other words, for a given value of k, find the smallest integer n for which an image of size n×n takes up less storage space after compression than the original RGB image.
Let's begin by making a small simplification -- the size of the encoded output depends on the number of pixels (the actual proportion of width vs. height doesn't really matter). Hence, let's generalize the problem to number of pixels N, from which we can always calculate n by taking a square root.
To further simplify the problem, we will also ignore the overhead of any image headers/metadata, such as width, height, size of the palette, etc. In practice, this would generally be some relatively small constant.
Problem Statement
Given that we have
N representing the number of pixels in an image
k representing the number of distinct colours in an image
24 bits per pixel RGB encoding
L_RGB representing the length (in bits) of an RGB image
L_P representing the length (in bits) of a palette image
our goal is to solve the following inequality
    L_P < L_RGB
in terms of N.
Size of RGB Image
An RGB image is just an array of N pixels, each pixel taking up a fixed number of bits given by the RGB encoding. Hence,
    L_RGB = 24 * N
Size of Palette Image
A palette image consists of two parts: a palette, and the pixels.
The palette is an array of k colours, each colour taking up a fixed number of bits given by the RGB encoding. Therefore, the palette takes up
    24 * k
bits. In this case, each pixel holds an index into the palette rather than an actual RGB colour. The number of bits required to represent k values is
    log2(k)
However, unless we can encode fractional bits (which I consider outside the scope of this question), we need to round this up. Therefore, the number of bits required to encode a palette index is
    ceil(log2(k))
Since there are N such palette indices, the size of the pixel data is
    N * ceil(log2(k))
and the total size of the palette image is
    L_P = 24 * k + N * ceil(log2(k))
Solving the Inequality
Substituting the two sizes into L_P < L_RGB gives
    24 * k + N * ceil(log2(k)) < 24 * N
    24 * k < N * (24 - ceil(log2(k)))
    N > (24 * k) / (24 - ceil(log2(k)))
And finally
    n = ceil(sqrt(N))
In Python, we could express this in the following way:
import math

def limit_size(k):
    # Threshold for N: the palette image becomes smaller than the
    # 24-bit RGB image once N exceeds this value.
    return (k * 24.) / (24. - math.ceil(math.log(k, 2)))

def size_rgb(N):
    # Size (in bits) of a 24-bit RGB image with N pixels.
    return N * 24.

def size_pal(N, k):
    # Size (in bits) of a palette image: N indices plus a k-entry
    # 24-bit palette.
    return (N * math.ceil(math.log(k, 2))) + (k * 24.)
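As a quick check of these formulas (a worked example): for k = 16 colours we get ceil(log2(16)) = 4, so limit_size(16) = (16 * 24) / (24 - 4) = 19.2. The smallest integer N above that is 20, where size_pal(20, 16) = 464 bits beats size_rgb(20) = 480 bits, and n = ceil(sqrt(20)) = 5, so even a 5x5 image with 16 colours already benefits from the palette encoding.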
In general, no. But your question is not precise.
If we compress ordinary files, they can get larger. E.g. if you compress a randomly generated sequence of bytes, there is not much to compress, so what you gain is the header of the compression program (which records which compression method and version were used), plus possibly some escaping. This enlarges the file. A good compression program will detect that compression will not shrink the data, skip compressing it, and record in the header that the data is stored flat. Possibly this is done per region of the file.
But your question is about images. Compression is done inside the file, and often not over the whole file but just over the image bits. In this case the program will see that there is no need to compress and will keep the pixel data uncompressed. Because the image headers are always present, only a flag changes, so there is no increase in size.
This can also depend on the file format. You wrote about a "palette", but palettes are not much used nowadays: compression is usually done by finding similar patterns in the file. Again, this depends on the image format. If you look at Wikipedia for a particular file format, you may find a table of header parameters (e.g. bit depth, number of colours (palette), colour definitions, and the compression methods used).
Then, for palette-like images, the answer of Dan Mašek (https://stackoverflow.com/a/58683948/2758823) has a nice mathematical explanation, but one should not forget that compression is largely heuristic and tuned on real examples: real images have patterns.

Is there a way to normalize a Caffe Blob along a certain dimension?

I've got a blob with the shape n * t * h * w (Batch number, features, height, width). Within a Caffe layer I want to do an L1 normalization along the t axis, i.e. for fixed n, h and w the sum of values along t should be equal to 1. Normally this would be no big deal, but since it's within a Caffe layer it should happen very quickly, preferably through the Caffe math functions (based on BLAS). Is there a way to achieve this in an efficient manner?
I unfortunately can't change the order of the shape parameters due to later processing, but I can remove the batch number (have a vector of blobs with just t * h * w) or I could convert the blob to an OpenCV Mat, if it makes things easier.
Edit 1: I'm starting to suspect I might be able to solve my task with the help of caffe_gpu_gemm, where I'd multiply a vector of ones of length t with a blob from one batch of shape t * h * w, which should theoretically give me the sums along the feature axis. I'll update if I figure out the next step.
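A rough sketch of that caffe_gpu_gemm idea for a single batch item, inside a custom layer's Forward_gpu (assuming non-negative inputs, so the plain sum equals the L1 norm; sum_blob_ and ones_blob_ are hypothetical pre-allocated helper blobs):

// View the (t, h, w) data of batch item 0 as a t x (h*w) row-major
// matrix; loop over n analogously for the remaining batch items.
const int t  = bottom[0]->shape(1);
const int hw = bottom[0]->shape(2) * bottom[0]->shape(3);
const Dtype* data = bottom[0]->gpu_data();
Dtype* ones = ones_blob_.mutable_gpu_data();  // length t
Dtype* sums = sum_blob_.mutable_gpu_data();   // length h*w
caffe_gpu_set(t, Dtype(1), ones);
// (1 x t) * (t x hw) -> (1 x hw): sums along the t axis for each (h, w).
caffe_gpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, 1, hw, t,
                      Dtype(1), ones, data, Dtype(0), sums);
// Divide each of the t planes elementwise by the per-location sums.
Dtype* top_data = top[0]->mutable_gpu_data();
for (int c = 0; c < t; ++c) {
  caffe_gpu_div(hw, data + c * hw, sums, top_data + c * hw);
}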

how go get RGB values of ROI selected in depth stream

I wrote a simple Kinect application where I'm accessing the depth values to detect some objects. I use the following code to get the depth value:
depth = NuiDepthPixelToDepth(pBufferRun);
This gives me the depth value for each pixel. Now I want to select a region of the image and get the RGB camera values of that corresponding region.
What I'm not sure about:
do I need to open a color image stream?
or is it enough to just convert the depth into color?
how do I use NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution?
I'm fine with the simplest solution where I have a depth frame and a color frame, so that I can select a ROI with opencv and then crop the color frame accordingly.
do I need to open a color image stream?
Yes. You can get the coordinates in the colour frame without opening the stream, but you won't be able to do anything useful with them because you'll have no colour data to index into!
or is it enough to just convert the depth into color?
There's no meaningful conversion of distance into colour. You need two image streams, and a co-ordinate conversion function.
how do I use NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution?
That's a terribly documented function. Go take a look at NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution instead, because the function arguments and documentation actually make sense! Depth value and depth (x,y) coordinate in, RGB (x,y) coordinate out. Simple.
To get the RGB data at some given coordinates, you must first grab an RGB frame using NuiImageStreamGetNextFrame to get an INuiFrameTexture instance. Call LockRect on this to get a NUI_LOCKED_RECT. The pBits property of this object is a pointer to the first pixel of the raw XRGB image. The image is stored row-wise, in top-to-bottom, left-to-right order, with each pixel represented by 4 sequential bytes: a padding byte, then R, G and B following it.
The pixel at position (100, 200) is therefore at
lockedRect->pBits[(200 * width * 4) + (100 * 4)];
and the byte representing the red channel should be at
lockedRect->pBits[(200 * width * 4) + (100 * 4) + 1];
This is a standard 32-bit RGB image format, and the buffer can be freely passed to your image manipulation library of choice... GDI, WIC, OpenCV, IPL, whatever.
(caveat... I'm not totally certain I have the pixel byte ordering correct. I think it is XRGB, but it could be XBGR or BGRX, for example. Testing for which one is actually being returned should be trivial)
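A rough sketch tying the two steps together (untested; based on the Kinect SDK 1.x API, with depthX, depthY and packedDepth as placeholder variables, and lockedRect obtained from LockRect as described above):

LONG colorX = 0, colorY = 0;
// Map a depth pixel and its packed depth value to colour-frame coordinates.
HRESULT hr = NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution(
    NUI_IMAGE_RESOLUTION_640x480,  // resolution of the colour stream
    NULL,                          // default view area
    depthX, depthY,                // pixel position in the depth frame
    packedDepth,                   // packed depth value of that pixel
    &colorX, &colorY);
if (SUCCEEDED(hr))
{
    const BYTE* pixel = lockedRect.pBits
                      + (colorY * lockedRect.Pitch) + (colorX * 4);
    BYTE r = pixel[1];  // assuming the XRGB order described above
    BYTE g = pixel[2];  // (verify the byte order, per the caveat)
    BYTE b = pixel[3];
}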

scanline function in qimage class

I'm developing an application for editing raster graphics. In this application I have to create a scanline function which does the same thing as the scanLine function in the QImage class.
But I'm a little confused about the way the scanline function works, and about scanlines in general.
For example, when I call bytesPerLine() for an image whose height is 177px, I was expecting the value to be 531 (3 bytes for each pixel), but this function returns 520.
Also, when I use
uchar data = image->scanLine(y)[x]
for a pixel with R=249 G=249 B=249, the value in variable data is 255.
I really don't understand this value.
Thanks in advance :)
For reliable behavior you should check the return value of QImage::format() to see what underlying format is used before accessing the raw image data.
Qt seems to prefer the RGB32/ARGB32 formats for true-colour images, where each pixel takes 4 bytes whether an alpha channel exists or not (for the RGB32 format the extra byte is simply filled with 0xff). If you load a true-colour image, it's probably in one of these two formats.
Besides, the byte order can differ across platforms, so use QRgb to access 32-bit pixels whenever possible. Also note that scanLine() returns a pointer to bytes, so scanLine(y)[x] reads the x-th byte of the row rather than the x-th pixel; the 255 you saw is most likely that 0xff fill byte.
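For illustration, a minimal sketch of format-safe pixel access (converting to Format_RGB32 first so the layout is known):

// Convert once so the layout is known, then read pixels as QRgb values.
QImage img = image->convertToFormat(QImage::Format_RGB32);
const QRgb* line = reinterpret_cast<const QRgb*>(img.constScanLine(y));
QRgb pixel = line[x];    // one whole pixel, not one byte
int r = qRed(pixel);     // 249 for the pixel in your example
int g = qGreen(pixel);
int b = qBlue(pixel);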
BTW, shouldn't a scanline be horizontal? I think you should use width() instead of height() to calculate the length of a scanline.

Draw sound wave with possibility to zoom in/out

I'm writing a sound editor for my graduation. I'm using BASS to extract samples from MP3, WAV, OGG etc. files and to add DSP effects like echo, flanger etc. Simply speaking, I made a framework that applies an effect from position1 to position2 and handles cut/paste management.
Now my problem is that I want to create a control similar to the one from Cool Edit Pro that draws a waveform representation of the song and has the ability to zoom in/out, select portions of the waveform, etc. After a selection I can do something like:
TInterval EditZone = WaveForm->GetSelection();
where TInterval has this form:
struct TInterval
{
    long Start;
    long End;
};
I'm a beginner when it comes to sophisticated drawing, so any hint on how to create a waveform representation of a song, using the sample data returned by BASS, with the ability to zoom in/out, would be appreciated.
I'm writing my project in C++, but I can understand C# and Delphi code, so if you want you can post snippets in those two languages as well :)
Thanx DrOptix
By zoom, I presume you mean horizontal zoom rather than vertical. The way audio editors do this is to scan the waveform, breaking it up into time windows where each pixel in X represents some number of samples. It can be a fractional number, but you can get away with disallowing fractional zoom ratios without annoying the user too much. Once you zoom out a bit, the max value is always a positive integer and the min value is always a negative integer.
For each pixel on the screen, you need to know the minimum sample value for that pixel and the maximum sample value. So you need a function that scans the waveform data in chunks and keeps track of the accumulated max and min for each chunk.
This is a slow process, so professional audio editors keep a pre-calculated table of min and max values at some fixed zoom ratio. It might be at 512/1 or 1024/1. When you are drawing at a zoom ratio of more than 1024 samples/pixel, you use the pre-calculated table. If you are below that ratio, you get the data directly from the file. If you don't do this, you will find that your drawing code gets too slow when you zoom out.
It's worthwhile to write code that handles all of the channels of the file in a single pass when doing this scanning; slowness here will make your whole program feel sluggish. It's the disk IO that matters here; the CPU has no trouble keeping up, so straightforward C++ code is fine for building the min/max tables, but you don't want to go through the file more than once and you want to do it sequentially.
Once you have the min/max tables, keep them around. You want to go back to the disk as little as possible, and many of the reasons for repainting your window will not require you to rescan your min/max tables. The memory cost of holding on to them is low compared to the disk IO cost of building them in the first place.
Then you draw the waveform by drawing a series of 1-pixel-wide vertical lines between the max value and the min value for the time represented by that pixel. This should be quite fast if you are drawing from pre-built min/max tables.
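A minimal sketch of that scanning step, assuming 16-bit mono samples already in memory (names are illustrative):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// For each pixel column, find the min and max sample in its window.
// The caller draws one vertical line from min to max per column.
std::vector<std::pair<short, short>> BuildColumnExtents(
    const short* samples, std::size_t sampleCount, std::size_t samplesPerPixel)
{
    std::vector<std::pair<short, short>> columns;
    for (std::size_t start = 0; start < sampleCount; start += samplesPerPixel)
    {
        const std::size_t end = std::min(start + samplesPerPixel, sampleCount);
        short lo = samples[start];
        short hi = samples[start];
        for (std::size_t i = start + 1; i < end; ++i)
        {
            lo = std::min(lo, samples[i]);
            hi = std::max(hi, samples[i]);
        }
        columns.push_back(std::make_pair(lo, hi));
    }
    return columns;
}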
I've recently done this myself. As Marius suggests, you need to work out how many samples fall on each column of pixels. You then work out the minimum and maximum and plot a vertical line from the maximum to the minimum.
As a first pass this seemingly works fine. The problem you'll hit is that as you zoom out, it starts to take too long to retrieve the samples from disk. As a solution to this I built a "peak" file alongside the audio file. The peak file stores the minimum/maximum pairs for groups of n samples. Playing with n until you get the right amount is up to you; personally I found 128 samples to be a good tradeoff between size and speed. It's also worth remembering that, unless you are drawing a control larger than 65536 pixels, you needn't store this peak information as anything more than 16-bit values, which saves a bit of space.
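As an illustration of that layout (hypothetical names; one record per group of 128 samples, 16-bit values as suggested above):

#include <cstdint>

// One record of the "peak" file: the min/max pair for a group of
// kSamplesPerPeak consecutive samples.
#pragma pack(push, 1)
struct PeakRecord
{
    int16_t min;
    int16_t max;
};
#pragma pack(pop)

const unsigned kSamplesPerPeak = 128;  // tradeoff between size and speed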
Wouldn't you just plot the sample points on a 2D canvas? You should know how many samples there are per second for a file (read it from the header), and then plot the value on the y axis. Since you want to be able to zoom in and out, you need to control the number of samples per pixel (the zoom level). Next you take the average of those sample points per pixel (for example, the average of every 5 points if you have 5 samples per pixel). Then you can use a 2D drawing API to draw lines between the points.
Using the open source NAudio Package -
using System;
using System.Collections.Generic;
using System.Linq;
using NAudio.Wave;

public class WavReader2
{
    private readonly WaveFileReader _objStream;

    public WavReader2(String sPath)
    {
        _objStream = new WaveFileReader(sPath);
    }

    public List<SampleRangeValue> GetPixelGraph(int iSamplesPerPixel)
    {
        List<SampleRangeValue> colOutputValues = new List<SampleRangeValue>();
        if (_objStream != null)
        {
            int iBytesPerSample = (_objStream.WaveFormat.BitsPerSample / 8) * _objStream.WaveFormat.Channels;
            int iNumPixels = (int)Math.Ceiling(_objStream.SampleCount / (double)iSamplesPerPixel);
            byte[] aryWaveData = new byte[iSamplesPerPixel * iBytesPerSample];
            _objStream.Position = 0; // startPosition + (e.ClipRectangle.Left * iBytesPerSample * iSamplesPerPixel);

            for (float iPixelNum = 0; iPixelNum < iNumPixels; iPixelNum += 1)
            {
                int iBytesRead = _objStream.Read(aryWaveData, 0, iSamplesPerPixel * iBytesPerSample);
                if (iBytesRead == 0)
                    break;

                // Interpret the chunk as 16-bit samples and keep its extremes.
                List<short> colValues = new List<short>();
                for (int n = 0; n < iBytesRead; n += 2)
                {
                    short iSampleValue = BitConverter.ToInt16(aryWaveData, n);
                    colValues.Add(iSampleValue);
                }
                float fLowPercent = (float)((float)colValues.Min() / ushort.MaxValue);
                float fHighPercent = (float)((float)colValues.Max() / ushort.MaxValue);
                colOutputValues.Add(new SampleRangeValue(fHighPercent, fLowPercent));
            }
        }
        return colOutputValues;
    }
}

public struct SampleRangeValue
{
    public float HighPercent;
    public float LowPercent;

    public SampleRangeValue(float fHigh, float fLow)
    {
        HighPercent = fHigh;
        LowPercent = fLow;
    }
}