How to convert RGB -> YUV -> RGB (both ways) - c++

I want a pair of conversion algorithms, one from RGB to YUV, the other from YUV to RGB, that are inverses of each other. That is, a round-trip conversion should leave the value unchanged. (If you like, replace YUV with Y'UV, YCbCr, or YPbPr.)
Does such a thing exist? If so, what is it?
Posted solutions (How to perform RGB->YUV conversion in C/C++?, http://www.fourcc.org/fccyvrgb.php, http://en.wikipedia.org/wiki/YUV) are inverses of each other (the two 3x3 matrices are inverses) only when the clamping to [0,255] is omitted. But omitting that clamping allows things like negative luminance, which plays merry havoc with image processing in YUV space. Retaining the clamping makes the conversion nonlinear, which makes it tricky to define an inverse.

Yes, invertible transformations exist.
equasys GmbH posted invertible transformations from RGB to YUV, YCbCr, and YPbPr, along with explanations of which situation each is appropriate for, what this clamping is really about, and links to references. (Like a good SO answer.)
For my own application (jpg images, not analog voltages) YCbCr was appropriate, so I wrote code for those two transformations. Indeed, there-and-back-again values differed by less than 1 part in 256, for many images; and the before-and-after images were visually indistinguishable.
PIL's colour space conversion YCbCr -> RGB gets credit for mentioning equasys's web page.
Other sources, which are unlikely to improve on equasys's precision and concision:
https://code.google.com/p/imagestack/ includes rgb_to_x and x_to_rgb
functions, but I didn't try to compile and test them.
Cory Nelson's answer links to code with similar functions, but it says that
inversion's not possible in general, contradicting equasys.
The source code of FFmpeg, OpenCV, VLFeat, or ImageMagick.
2019 Edit: Here's the C++ code from GitHub mentioned in my comment.
void YUVfromRGB(double& Y, double& U, double& V, const double R, const double G, const double B)
{
Y = 0.257 * R + 0.504 * G + 0.098 * B + 16;
U = -0.148 * R - 0.291 * G + 0.439 * B + 128;
V = 0.439 * R - 0.368 * G - 0.071 * B + 128;
}
void RGBfromYUV(double& R, double& G, double& B, double Y, double U, double V)
{
Y -= 16;
U -= 128;
V -= 128;
R = 1.164 * Y + 1.596 * V;
G = 1.164 * Y - 0.392 * U - 0.813 * V;
B = 1.164 * Y + 2.017 * U;
}
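A quick way to see how close these two functions are to being inverses is to push a grid of colours through both and record the largest deviation. This is only a sketch: it assumes the functions exactly as posted, works in double precision with no rounding or clamping, and `maxRoundTripError` with its `step` parameter is a name made up for this illustration.

```cpp
#include <algorithm>
#include <cmath>

// Conversion functions as posted above.
void YUVfromRGB(double& Y, double& U, double& V, const double R, const double G, const double B)
{
    Y =  0.257 * R + 0.504 * G + 0.098 * B +  16;
    U = -0.148 * R - 0.291 * G + 0.439 * B + 128;
    V =  0.439 * R - 0.368 * G - 0.071 * B + 128;
}

void RGBfromYUV(double& R, double& G, double& B, double Y, double U, double V)
{
    Y -= 16;
    U -= 128;
    V -= 128;
    R = 1.164 * Y             + 1.596 * V;
    G = 1.164 * Y - 0.392 * U - 0.813 * V;
    B = 1.164 * Y + 2.017 * U;
}

// Hypothetical helper (not part of the original code): largest absolute
// per-channel error after RGB -> YUV -> RGB, sampling each channel every `step`.
double maxRoundTripError(int step)
{
    double worst = 0.0;
    for (int r = 0; r <= 255; r += step)
        for (int g = 0; g <= 255; g += step)
            for (int b = 0; b <= 255; b += step) {
                double Y, U, V, R, G, B;
                YUVfromRGB(Y, U, V, r, g, b);
                RGBfromYUV(R, G, B, Y, U, V);
                worst = std::max({worst, std::fabs(R - r), std::fabs(G - g), std::fabs(B - b)});
            }
    return worst;
}
```

Because the coefficients are rounded to three decimals, the two matrices are only approximate inverses, so the error is small but not zero. Rounding the intermediate YUV values to integers would add further error on top of this.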

RGB to YUV and back again
There is a nice diagram on Wikipedia's YUV page that depicts the layout of YUV420p. However, if you're like me, you want NV21, sometimes called YUV420sp, which interleaves the V and U components in a single plane. For NV21 that diagram is wrong, but it still gives you the intuition for how the format works.
This format (NV21) is the standard picture format on Android camera
preview. YUV 4:2:0 planar image, with 8 bit Y samples, followed by
interleaved V/U plane with 8bit 2x2 subsampled chroma samples.
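To make that layout concrete, here is a small sketch of the index arithmetic it implies. The function names are invented for this example, and it assumes even width and height with no row padding, which the description above does not spell out.

```cpp
#include <cassert>
#include <cstddef>

// Total bytes in an NV21 frame: a full-resolution Y plane followed by a
// half-resolution plane of interleaved V,U byte pairs.
// Assumes even dimensions and no row padding (assumptions of this sketch).
std::size_t nv21BufferSize(int width, int height)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    return pixels + pixels / 2;
}

// Index of the Y sample for pixel (x, y).
std::size_t nv21LumaIndex(int x, int y, int width)
{
    return static_cast<std::size_t>(y) * width + x;
}

// Index of the V sample shared by the 2x2 block containing pixel (x, y).
// The matching U sample sits at the next byte (index + 1).
std::size_t nv21ChromaIndex(int x, int y, int width, int height)
{
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    return pixels + static_cast<std::size_t>(y / 2) * width
                  + static_cast<std::size_t>(x / 2) * 2;
}
```

Each chroma row holds width / 2 pairs of two bytes, so it is width bytes long, the same stride as a luma row.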
A lot of code I've seen just starts coding literally to this specification without taking endianness into account. Furthermore, it tends to support only YUV to RGB, and only one or two formats. I, however, wanted something a little more trustworthy, and it turns out that C++ code taken from the Android source code repository does the trick. It is pretty much straight C++ and should be easy to use in any project.
JNI/C++ code that takes an RGB565 image and converts it to NV21
The function is called from Java in this case, but it is easily adapted to plain C or C++: you pass in an array of bytes containing the RGB565 image, and it fills an NV21 byte array.
#include <jni.h>
#include <cstring>
#include <cstdint>
#include "Converters.h"
#define JNI(X) JNIEXPORT Java_algorithm_ImageConverter_##X
#ifdef __cplusplus
extern "C" {
#endif
void JNI(RGB565ToNV21)(JNIEnv *env, jclass *, jbyteArray aRGB565in, jbyteArray aYUVout, jint width, jint height) {
//get jbyte array into C space from JVM
jbyte *rgb565Pixels = env->GetByteArrayElements(aRGB565in, NULL);
jbyte *yuv420sp = env->GetByteArrayElements(aYUVout, NULL);
size_t pixelCount = width * height;
uint16_t *rgb = (uint16_t *) rgb565Pixels;
// This format (NV21) is the standard picture format on Android camera preview. YUV 4:2:0 planar
// image, with 8 bit Y samples, followed by interleaved V/U plane with 8bit 2x2 subsampled
// chroma samples.
int uvIndex = pixelCount;
for (int row = 0; row < height; row++) {
for (int column = 0; column < width; column++) {
int pixelIndex = row * width + column;
uint8_t y = 0;
uint8_t u = 0;
uint8_t v = 0;
chroma::RGB565ToYUV(rgb[pixelIndex], &y, &u, &v);
yuv420sp[pixelIndex] = y;
if (row % 2 == 0 && pixelIndex % 2 == 0) {
#if __BYTE_ORDER == __LITTLE_ENDIAN
yuv420sp[uvIndex++] = u;
yuv420sp[uvIndex++] = v;
#else
yuv420sp[uvIndex++] = v;
yuv420sp[uvIndex++] = u;
#endif
}
}
}
//release temp reference of jbyte array
env->ReleaseByteArrayElements(aYUVout, yuv420sp, 0);
env->ReleaseByteArrayElements(aRGB565in, rgb565Pixels, 0);
}
#ifdef __cplusplus
}
#endif
Converters.h
As you will see in the header there are many different conversion options available to/from any number of formats.
/*
* Copyright (C) 2011 The Android Open Source Project
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef HW_EMULATOR_CAMERA_CONVERTERS_H
#define HW_EMULATOR_CAMERA_CONVERTERS_H
#include <endian.h>
#ifndef __BYTE_ORDER
#error "could not determine byte order"
#endif
/*
* Contains declaration of framebuffer conversion routines.
*
* NOTE: RGB and big/little endian considerations. Wherever in this code RGB
* pixels are represented as WORD, or DWORD, the color order inside the
* WORD / DWORD matches the one that would occur if that WORD / DWORD would have
* been read from the typecasted framebuffer:
*
* const uint32_t rgb = *reinterpret_cast<const uint32_t*>(framebuffer);
*
* So, if this code runs on the little endian CPU, red color in 'rgb' would be
* masked as 0x000000ff, and blue color would be masked as 0x00ff0000, while if
* the code runs on a big endian CPU, the red color in 'rgb' would be masked as
* 0xff000000, and blue color would be masked as 0x0000ff00,
*/
namespace chroma {
/*
* RGB565 color masks
*/
#if __BYTE_ORDER == __LITTLE_ENDIAN
static const uint16_t kRed5 = 0x001f;
static const uint16_t kGreen6 = 0x07e0;
static const uint16_t kBlue5 = 0xf800;
#else // __BYTE_ORDER
static const uint16_t kRed5 = 0xf800;
static const uint16_t kGreen6 = 0x07e0;
static const uint16_t kBlue5 = 0x001f;
#endif // __BYTE_ORDER
static const uint32_t kBlack16 = 0x0000;
static const uint32_t kWhite16 = kRed5 | kGreen6 | kBlue5;
/*
* RGB32 color masks
*/
#if __BYTE_ORDER == __LITTLE_ENDIAN
static const uint32_t kRed8 = 0x000000ff;
static const uint32_t kGreen8 = 0x0000ff00;
static const uint32_t kBlue8 = 0x00ff0000;
#else // __BYTE_ORDER
static const uint32_t kRed8 = 0x00ff0000;
static const uint32_t kGreen8 = 0x0000ff00;
static const uint32_t kBlue8 = 0x000000ff;
#endif // __BYTE_ORDER
static const uint32_t kBlack32 = 0x00000000;
static const uint32_t kWhite32 = kRed8 | kGreen8 | kBlue8;
/*
* Extracting, and saving color bytes from / to WORD / DWORD RGB.
*/
#if __BYTE_ORDER == __LITTLE_ENDIAN
/* Extract red, green, and blue bytes from RGB565 word. */
#define R16(rgb) static_cast<uint8_t>((rgb) & kRed5)
#define G16(rgb) static_cast<uint8_t>(((rgb) & kGreen6) >> 5)
#define B16(rgb) static_cast<uint8_t>(((rgb) & kBlue5) >> 11)
/* Make 8 bits red, green, and blue, extracted from RGB565 word. */
#define R16_32(rgb) static_cast<uint8_t>((((rgb) & kRed5) << 3) | (((rgb) & kRed5) >> 2))
#define G16_32(rgb) static_cast<uint8_t>((((rgb) & kGreen6) >> 3) | (((rgb) & kGreen6) >> 9))
#define B16_32(rgb) static_cast<uint8_t>((((rgb) & kBlue5) >> 8) | (((rgb) & kBlue5) >> 14))
/* Extract red, green, and blue bytes from RGB32 dword. */
#define R32(rgb) static_cast<uint8_t>((rgb) & kRed8)
#define G32(rgb) static_cast<uint8_t>((((rgb) & kGreen8) >> 8) & 0xff)
#define B32(rgb) static_cast<uint8_t>((((rgb) & kBlue8) >> 16) & 0xff)
/* Build RGB565 word from red, green, and blue bytes. */
#define RGB565(r, g, b) static_cast<uint16_t>((((static_cast<uint16_t>(b) << 6) | (g)) << 5) | (r))
/* Build RGB32 dword from red, green, and blue bytes. */
#define RGB32(r, g, b) static_cast<uint32_t>((((static_cast<uint32_t>(b) << 8) | (g)) << 8) | (r))
#else // __BYTE_ORDER
/* Extract red, green, and blue bytes from RGB565 word. */
#define R16(rgb) static_cast<uint8_t>(((rgb) & kRed5) >> 11)
#define G16(rgb) static_cast<uint8_t>(((rgb) & kGreen6) >> 5)
#define B16(rgb) static_cast<uint8_t>((rgb) & kBlue5)
/* Make 8 bits red, green, and blue, extracted from RGB565 word. */
#define R16_32(rgb) static_cast<uint8_t>((((rgb) & kRed5) >> 8) | (((rgb) & kRed5) >> 14))
#define G16_32(rgb) static_cast<uint8_t>((((rgb) & kGreen6) >> 3) | (((rgb) & kGreen6) >> 9))
#define B16_32(rgb) static_cast<uint8_t>((((rgb) & kBlue5) << 3) | (((rgb) & kBlue5) >> 2))
/* Extract red, green, and blue bytes from RGB32 dword. */
#define R32(rgb) static_cast<uint8_t>(((rgb) & kRed8) >> 16)
#define G32(rgb) static_cast<uint8_t>(((rgb) & kGreen8) >> 8)
#define B32(rgb) static_cast<uint8_t>((rgb) & kBlue8)
/* Build RGB565 word from red, green, and blue bytes. */
#define RGB565(r, g, b) static_cast<uint16_t>((((static_cast<uint16_t>(r) << 6) | g) << 5) | b)
/* Build RGB32 dword from red, green, and blue bytes. */
#define RGB32(r, g, b) static_cast<uint32_t>((((static_cast<uint32_t>(r) << 8) | g) << 8) | b)
#endif // __BYTE_ORDER
/* An union that simplifies breaking 32 bit RGB into separate R, G, and B colors.
*/
typedef union RGB32_t {
uint32_t color;
struct {
#if __BYTE_ORDER == __LITTLE_ENDIAN
uint8_t r; uint8_t g; uint8_t b; uint8_t a;
#else // __BYTE_ORDER
uint8_t a; uint8_t b; uint8_t g; uint8_t r;
#endif // __BYTE_ORDER
};
} RGB32_t;
/* Clips a value to the unsigned 0-255 range, treating negative values as zero.
*/
static __inline__ int
clamp(int x)
{
if (x > 255) return 255;
if (x < 0) return 0;
return x;
}
/********************************************************************************
* Basics of RGB -> YUV conversion
*******************************************************************************/
/*
* RGB -> YUV conversion macros
*/
#define RGB2Y(r, g, b) (uint8_t)(((66 * (r) + 129 * (g) + 25 * (b) + 128) >> 8) + 16)
#define RGB2U(r, g, b) (uint8_t)(((-38 * (r) - 74 * (g) + 112 * (b) + 128) >> 8) + 128)
#define RGB2V(r, g, b) (uint8_t)(((112 * (r) - 94 * (g) - 18 * (b) + 128) >> 8) + 128)
/* Converts R8 G8 B8 color to YUV. */
static __inline__ void
R8G8B8ToYUV(uint8_t r, uint8_t g, uint8_t b, uint8_t* y, uint8_t* u, uint8_t* v)
{
*y = RGB2Y((int)r, (int)g, (int)b);
*u = RGB2U((int)r, (int)g, (int)b);
*v = RGB2V((int)r, (int)g, (int)b);
}
/* Converts RGB565 color to YUV. */
static __inline__ void
RGB565ToYUV(uint16_t rgb, uint8_t* y, uint8_t* u, uint8_t* v)
{
R8G8B8ToYUV(R16_32(rgb), G16_32(rgb), B16_32(rgb), y, u, v);
}
/* Converts RGB32 color to YUV. */
static __inline__ void
RGB32ToYUV(uint32_t rgb, uint8_t* y, uint8_t* u, uint8_t* v)
{
RGB32_t rgb_c;
rgb_c.color = rgb;
R8G8B8ToYUV(rgb_c.r, rgb_c.g, rgb_c.b, y, u, v);
}
/********************************************************************************
* Basics of YUV -> RGB conversion.
* Note that due to the fact that guest uses RGB only on preview window, and the
* RGB format that is used is RGB565, we can limit YUV -> RGB conversions to
* RGB565 only.
*******************************************************************************/
/*
* YUV -> RGB conversion macros
*/
/* "Optimized" macros that take specially prepared Y, U, and V values:
* C = Y - 16
* D = U - 128
* E = V - 128
*/
#define YUV2RO(C, D, E) clamp((298 * (C) + 409 * (E) + 128) >> 8)
#define YUV2GO(C, D, E) clamp((298 * (C) - 100 * (D) - 208 * (E) + 128) >> 8)
#define YUV2BO(C, D, E) clamp((298 * (C) + 516 * (D) + 128) >> 8)
/*
* Main macros that take the original Y, U, and V values
*/
#define YUV2R(y, u, v) clamp((298 * ((y)-16) + 409 * ((v)-128) + 128) >> 8)
#define YUV2G(y, u, v) clamp((298 * ((y)-16) - 100 * ((u)-128) - 208 * ((v)-128) + 128) >> 8)
#define YUV2B(y, u, v) clamp((298 * ((y)-16) + 516 * ((u)-128) + 128) >> 8)
/* Converts YUV color to RGB565. */
static __inline__ uint16_t
YUVToRGB565(int y, int u, int v)
{
/* Calculate C, D, and E values for the optimized macro. */
y -= 16; u -= 128; v -= 128;
const uint16_t r = (YUV2RO(y,u,v) >> 3) & 0x1f;
const uint16_t g = (YUV2GO(y,u,v) >> 2) & 0x3f;
const uint16_t b = (YUV2BO(y,u,v) >> 3) & 0x1f;
return RGB565(r, g, b);
}
/* Converts YUV color to RGB32. */
static __inline__ uint32_t
YUVToRGB32(int y, int u, int v)
{
/* Calculate C, D, and E values for the optimized macro. */
y -= 16; u -= 128; v -= 128;
RGB32_t rgb;
rgb.r = YUV2RO(y,u,v) & 0xff;
rgb.g = YUV2GO(y,u,v) & 0xff;
rgb.b = YUV2BO(y,u,v) & 0xff;
return rgb.color;
}
/* YUV pixel descriptor. */
struct YUVPixel {
uint8_t Y;
uint8_t U;
uint8_t V;
inline YUVPixel()
: Y(0), U(0), V(0)
{
}
inline explicit YUVPixel(uint16_t rgb565)
{
RGB565ToYUV(rgb565, &Y, &U, &V);
}
inline explicit YUVPixel(uint32_t rgb32)
{
RGB32ToYUV(rgb32, &Y, &U, &V);
}
inline void get(uint8_t* pY, uint8_t* pU, uint8_t* pV) const
{
*pY = Y; *pU = U; *pV = V;
}
};
/* Converts an YV12 framebuffer to RGB565 framebuffer.
* Param:
* yv12 - YV12 framebuffer.
* rgb - RGB565 framebuffer.
* width, height - Dimensions for both framebuffers.
*/
void YV12ToRGB565(const void* yv12, void* rgb, int width, int height);
/* Converts an YV12 framebuffer to RGB32 framebuffer.
* Param:
* yv12 - YV12 framebuffer.
* rgb - RGB32 framebuffer.
* width, height - Dimensions for both framebuffers.
*/
void YV12ToRGB32(const void* yv12, void* rgb, int width, int height);
/* Converts an YU12 framebuffer to RGB32 framebuffer.
* Param:
* yu12 - YU12 framebuffer.
* rgb - RGB32 framebuffer.
* width, height - Dimensions for both framebuffers.
*/
void YU12ToRGB32(const void* yu12, void* rgb, int width, int height);
/* Converts an NV12 framebuffer to RGB565 framebuffer.
* Param:
* nv12 - NV12 framebuffer.
* rgb - RGB565 framebuffer.
* width, height - Dimensions for both framebuffers.
*/
void NV12ToRGB565(const void* nv12, void* rgb, int width, int height);
/* Converts an NV12 framebuffer to RGB32 framebuffer.
* Param:
* nv12 - NV12 framebuffer.
* rgb - RGB32 framebuffer.
* width, height - Dimensions for both framebuffers.
*/
void NV12ToRGB32(const void* nv12, void* rgb, int width, int height);
/* Converts an NV21 framebuffer to RGB565 framebuffer.
* Param:
* nv21 - NV21 framebuffer.
* rgb - RGB565 framebuffer.
* width, height - Dimensions for both framebuffers.
*/
void NV21ToRGB565(const void* nv21, void* rgb, int width, int height);
/* Converts an NV21 framebuffer to RGB32 framebuffer.
* Param:
* nv21 - NV21 framebuffer.
* rgb - RGB32 framebuffer.
* width, height - Dimensions for both framebuffers.
*/
void NV21ToRGB32(const void* nv21, void* rgb, int width, int height);
}; /* namespace chroma */
#endif /* HW_EMULATOR_CAMERA_CONVERTERS_H */
Converters.cpp
/*
* Copyright (C) 2011 The Android Open Source Project
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Contains implementation of framebuffer conversion routines.
*/
#define LOG_NDEBUG 0
#define LOG_TAG "EmulatedCamera_Converter"
#include "Converters.h"
namespace chroma {
static void _YUV420SToRGB565(const uint8_t* Y,
const uint8_t* U,
const uint8_t* V,
int dUV,
uint16_t* rgb,
int width,
int height)
{
const uint8_t* U_pos = U;
const uint8_t* V_pos = V;
for (int y = 0; y < height; y++) {
for (int x = 0; x < width; x += 2, U += dUV, V += dUV) {
const uint8_t nU = *U;
const uint8_t nV = *V;
*rgb = YUVToRGB565(*Y, nU, nV);
Y++; rgb++;
*rgb = YUVToRGB565(*Y, nU, nV);
Y++; rgb++;
}
if (y & 0x1) {
U_pos = U;
V_pos = V;
} else {
U = U_pos;
V = V_pos;
}
}
}
static void _YUV420SToRGB32(const uint8_t* Y,
const uint8_t* U,
const uint8_t* V,
int dUV,
uint32_t* rgb,
int width,
int height)
{
const uint8_t* U_pos = U;
const uint8_t* V_pos = V;
for (int y = 0; y < height; y++) {
for (int x = 0; x < width; x += 2, U += dUV, V += dUV) {
const uint8_t nU = *U;
const uint8_t nV = *V;
*rgb = YUVToRGB32(*Y, nU, nV);
Y++; rgb++;
*rgb = YUVToRGB32(*Y, nU, nV);
Y++; rgb++;
}
if (y & 0x1) {
U_pos = U;
V_pos = V;
} else {
U = U_pos;
V = V_pos;
}
}
}
void YV12ToRGB565(const void* yv12, void* rgb, int width, int height)
{
const int pix_total = width * height;
const uint8_t* Y = reinterpret_cast<const uint8_t*>(yv12);
const uint8_t* U = Y + pix_total;
const uint8_t* V = U + pix_total / 4;
_YUV420SToRGB565(Y, U, V, 1, reinterpret_cast<uint16_t*>(rgb), width, height);
}
void YV12ToRGB32(const void* yv12, void* rgb, int width, int height)
{
const int pix_total = width * height;
const uint8_t* Y = reinterpret_cast<const uint8_t*>(yv12);
const uint8_t* V = Y + pix_total;
const uint8_t* U = V + pix_total / 4;
_YUV420SToRGB32(Y, U, V, 1, reinterpret_cast<uint32_t*>(rgb), width, height);
}
void YU12ToRGB32(const void* yu12, void* rgb, int width, int height)
{
const int pix_total = width * height;
const uint8_t* Y = reinterpret_cast<const uint8_t*>(yu12);
const uint8_t* U = Y + pix_total;
const uint8_t* V = U + pix_total / 4;
_YUV420SToRGB32(Y, U, V, 1, reinterpret_cast<uint32_t*>(rgb), width, height);
}
/* Common converter for YUV 4:2:0 interleaved to RGB565.
* y, u, and v point to Y,U, and V panes, where U and V values are interleaved.
*/
static void _NVXXToRGB565(const uint8_t* Y,
const uint8_t* U,
const uint8_t* V,
uint16_t* rgb,
int width,
int height)
{
_YUV420SToRGB565(Y, U, V, 2, rgb, width, height);
}
/* Common converter for YUV 4:2:0 interleaved to RGB32.
* y, u, and v point to Y,U, and V panes, where U and V values are interleaved.
*/
static void _NVXXToRGB32(const uint8_t* Y,
const uint8_t* U,
const uint8_t* V,
uint32_t* rgb,
int width,
int height)
{
_YUV420SToRGB32(Y, U, V, 2, rgb, width, height);
}
void NV12ToRGB565(const void* nv12, void* rgb, int width, int height)
{
const int pix_total = width * height;
const uint8_t* y = reinterpret_cast<const uint8_t*>(nv12);
_NVXXToRGB565(y, y + pix_total, y + pix_total + 1,
reinterpret_cast<uint16_t*>(rgb), width, height);
}
void NV12ToRGB32(const void* nv12, void* rgb, int width, int height)
{
const int pix_total = width * height;
const uint8_t* y = reinterpret_cast<const uint8_t*>(nv12);
_NVXXToRGB32(y, y + pix_total, y + pix_total + 1,
reinterpret_cast<uint32_t*>(rgb), width, height);
}
void NV21ToRGB565(const void* nv21, void* rgb, int width, int height)
{
const int pix_total = width * height;
const uint8_t* y = reinterpret_cast<const uint8_t*>(nv21);
_NVXXToRGB565(y, y + pix_total + 1, y + pix_total,
reinterpret_cast<uint16_t*>(rgb), width, height);
}
void NV21ToRGB32(const void* nv21, void* rgb, int width, int height)
{
const int pix_total = width * height;
const uint8_t* y = reinterpret_cast<const uint8_t*>(nv21);
_NVXXToRGB32(y, y + pix_total + 1, y + pix_total,
reinterpret_cast<uint32_t*>(rgb), width, height);
}
}; /* namespace chroma */
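Since the original question was about round trips, it may help to see the integer formulas from Converters.h exercised on their own. The sketch below copies the constants from the RGB2Y/RGB2U/RGB2V and YUV2RO/YUV2GO/YUV2BO macros into plain functions so that it compiles without the header; the struct and function names are invented for this example.

```cpp
#include <cstdint>

// Same clamp and fixed-point constants as the macros in Converters.h,
// written as functions so the sketch compiles on its own.
// Like the macros, this relies on >> being an arithmetic shift for negative ints.
static int clampByte(int x) { return x > 255 ? 255 : (x < 0 ? 0 : x); }

struct Yuv { std::uint8_t y, u, v; };
struct Rgb { std::uint8_t r, g, b; };

Yuv rgbToYuv(Rgb c)
{
    const int r = c.r, g = c.g, b = c.b;
    Yuv out;
    out.y = static_cast<std::uint8_t>((( 66 * r + 129 * g +  25 * b + 128) >> 8) +  16);
    out.u = static_cast<std::uint8_t>(((-38 * r -  74 * g + 112 * b + 128) >> 8) + 128);
    out.v = static_cast<std::uint8_t>(((112 * r -  94 * g -  18 * b + 128) >> 8) + 128);
    return out;
}

Rgb yuvToRgb(Yuv p)
{
    const int c = p.y - 16, d = p.u - 128, e = p.v - 128;
    Rgb out;
    out.r = static_cast<std::uint8_t>(clampByte((298 * c           + 409 * e + 128) >> 8));
    out.g = static_cast<std::uint8_t>(clampByte((298 * c - 100 * d - 208 * e + 128) >> 8));
    out.b = static_cast<std::uint8_t>(clampByte((298 * c + 516 * d           + 128) >> 8));
    return out;
}

// Hypothetical helper: true when a colour survives RGB -> YUV -> RGB unchanged.
bool roundTripsExactly(Rgb c)
{
    const Rgb back = yuvToRgb(rgbToYuv(c));
    return back.r == c.r && back.g == c.g && back.b == c.b;
}
```

Neutral colours such as black, white and mid-grey come back unchanged, but saturated colours need not, so these integer conversions should not be treated as exact inverses.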

Once you clamp, you're done. The clamped values become a different color and you can't go back. I've written some of my own code to convert between all of those and more if you'd like to see it, but it won't help invert clamped colors back to their originals.

The conversion is lossy by necessity. Since 8-bit YUV only uses Y values in [16, 235] and U, V values in [16, 240], it has fewer possible colors than RGB using [0, 255]. However, far more colors than this are lost in the conversion because the results are rounded.
I ran all 16.8 million colors through the conversion code posted by Camille Goudeseune, applying rounding and integer conversion to the result (this seems slightly better than truncation without rounding). All Y values came out within [16, 235] and the U/V values within [16, 240], as they are supposed to; no clipping was required (there were no out-of-range values, and the whole limited range was used). When converting back to RGB, a range of [-2, 257] is produced because of rounding errors, which can be fixed by clipping.
Of the 16.8M RGB colors only 15.9 % were present after the round trip, with 15.7 % having been restored to the same color that they were.
| System | How to determine | Number of colors | Proportion | Accuracy |
|---|---|---|---|---|
| 8-bit RGB | 256 * 256 * 256 | 16 777 216 | 100.0 % | 100.0 % |
| 8-bit YUV | 220 * 225 * 225 | 11 137 500 | 66.4 % | - |
| YUV from RGB | Count unique YUV of all RGB colors | 2 666 665 | 15.9 % | - |
| RGB-YUV-RGB | Count unique RGB after roundtrip, clipping | 2 666 625 | 15.9 % | 15.7 % |
| All YUV in RGB | 8-bit YUV converted to RGB with clipping | 2 956 551 | 17.6 % | 15.7 % |
Note: In the last conversion, over 8 million colors turned into invalid RGB triplets, with red ranging over [-179, 434], green over [-135, 390] and blue over [-226, 481]. Real-world SDR YUV material (movies) contains such "out of range" values, which would possibly be better displayed as intended in HDR10, with brighter saturated colors, rather than in SDR with clipping (although the latter is standard practice).
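The round-trip experiment can be reproduced roughly as follows. This is a sketch rather than the exact program behind the table: it inlines the floating-point coefficients from the earlier answer, rounds with `std::lround`, and only counts exact restorations. `RoundTripStats`, `measureRoundTrip` and the `step` parameter (1 for all 16.8 million colors, larger for a quick sample) are names invented here.

```cpp
#include <cmath>

struct RoundTripStats {
    long total = 0;          // colours tested
    long exact = 0;          // colours restored to the same 8-bit RGB value
    bool yuvInRange = true;  // every rounded Y in [16, 235], U/V in [16, 240]
};

static int clip8(long v) { return v < 0 ? 0 : (v > 255 ? 255 : static_cast<int>(v)); }

// Hypothetical helper: run every `step`-th value per channel through the
// floating-point formulas, rounding YUV to integers in between.
RoundTripStats measureRoundTrip(int step)
{
    RoundTripStats s;
    for (int r = 0; r <= 255; r += step)
        for (int g = 0; g <= 255; g += step)
            for (int b = 0; b <= 255; b += step) {
                // RGB -> YUV, rounded to the nearest integer.
                const long Y = std::lround( 0.257 * r + 0.504 * g + 0.098 * b +  16);
                const long U = std::lround(-0.148 * r - 0.291 * g + 0.439 * b + 128);
                const long V = std::lround( 0.439 * r - 0.368 * g - 0.071 * b + 128);
                if (Y < 16 || Y > 235 || U < 16 || U > 240 || V < 16 || V > 240)
                    s.yuvInRange = false;

                // YUV -> RGB, rounded and clipped to [0, 255].
                const double c = Y - 16, d = U - 128, e = V - 128;
                const int R = clip8(std::lround(1.164 * c + 1.596 * e));
                const int G = clip8(std::lround(1.164 * c - 0.392 * d - 0.813 * e));
                const int B = clip8(std::lround(1.164 * c + 2.017 * d));

                ++s.total;
                if (R == r && G == g && B == b) ++s.exact;
            }
    return s;
}
```

Counting the unique YUV or RGB values, as the table does, would additionally need a set or a bitmap with one bit per possible color.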
Since YUV has so many fewer colors than RGB, it is recommended to add quantization noise to reduce visible banding. Random [-.5, .5) is to be added to each YUV value while they are in float format in [16, 235]/[16, 240] range. This should be done in conversions both ways, although some visible noise will then be produced on otherwise solid surfaces. The better option is to use 10 bit (HDR) formats where banding is far less visible and no quantization noise is necessary.
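A minimal sketch of that dithering step, assuming the sample is still a floating-point value when it is quantized. `quantizeWithDither` is a name made up for this example, and a Mersenne Twister is used only because it is in the standard library; any uniform noise source would do.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Quantize a floating-point sample to an integer code in [lo, hi] after
// adding uniform noise in [-0.5, 0.5).
int quantizeWithDither(double value, int lo, int hi, std::mt19937& rng)
{
    std::uniform_real_distribution<double> noise(-0.5, 0.5);
    const long rounded = std::lround(value + noise(rng));
    return static_cast<int>(std::min<long>(std::max<long>(rounded, lo), hi));
}
```

For luma one would call it with the range 16 to 235, and for chroma with 16 to 240. A value such as 100.3 then comes out as 100 most of the time and as 101 occasionally, which is what breaks up the banding.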

Related

Convert Pixels Buffer type from 1555 to 5551 (C++, OpenGL ES)

I'm having a problem converting an OpenGL video plugin to support GLES 3.0.
So far everything has gone well, except for glTexSubImage2D: the original code uses GL_UNSIGNED_SHORT_1_5_5_5_REV as the pixel type, which is not supported in GLES 3.0.
The type that worked is GL_UNSIGNED_SHORT_5_5_5_1, but the colors and pixels are broken,
so I thought converting the pixel buffer would be fine.
But due to my limited understanding of GL and C++, I didn't succeed in doing that.
Pixels process:
the pixels will be converted internally to 16 bit ABGR as in the Shader comments:
// Take a normalized color and convert it into a 16bit 1555 ABGR
// integer in the format used internally by the Playstation GPU.
uint rebuild_psx_color(vec4 color) {
uint a = uint(floor(color.a + 0.5));
uint r = uint(floor(color.r * 31. + 0.5));
uint g = uint(floor(color.g * 31. + 0.5));
uint b = uint(floor(color.b * 31. + 0.5));
return (a << 15) | (b << 10) | (g << 5) | r;
}
it will be received by this method after processing by vGPU:
static void Texture_set_sub_image_window(struct Texture *tex, uint16_t top_left[2], uint16_t resolution[2], size_t row_len, uint16_t* data)
{
uint16_t x = top_left[0];
uint16_t y = top_left[1];
/* TODO - Am I indexing data out of bounds? */
size_t index = ((size_t) y) * row_len + ((size_t) x);
uint16_t* sub_data = &( data[index] );
glPixelStorei(GL_UNPACK_ROW_LENGTH, (GLint) row_len);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glBindTexture(GL_TEXTURE_2D, tex->id);
glTexSubImage2D(GL_TEXTURE_2D, 0,
(GLint) top_left[0], (GLint) top_left[1],
(GLsizei) resolution[0], (GLsizei) resolution[1],
GL_RGBA, GL_UNSIGNED_SHORT_1_5_5_5_REV /* Not supported in GLES */,
(void*)sub_data);
glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
}
As for row_len, it gets its value from #define VRAM_WIDTH_PIXELS 1024
What I tried to do:
1st I replaced the type with another one:
glTexSubImage2D(GL_TEXTURE_2D, 0,
(GLint) top_left[0], (GLint) top_left[1],
(GLsizei) resolution[0], (GLsizei) resolution[1],
GL_RGBA, GL_UNSIGNED_SHORT_5_5_5_1 /* <- Here new type */,
(void*)sub_data);
2nd converted sub_data using this method:
uint16_t* ABGRConversion(const uint16_t* pixels, int row_len, int x, int y, int width, int height) {
uint16_t *frameBuffer = (uint16_t*)malloc(width * row_len * height);
signed i, j;
for (j=0; j < height; j++)
{
for (i=0; i < width; i++)
{
int offset = j * row_len + i;
uint16_t pixel = pixels[offset];
frameBuffer[offset] = Convert1555To5551(pixel); //<- stuck here
}
}
return frameBuffer;
}
I have no idea what Convert1555To5551 should look like.
Note: Sorry if some of the descriptions are wrong; I don't fully understand the whole process.
Performance is not a major problem; I just need to know how to deal with the current pixel buffer.
Side note: I had to replace glFramebufferTexture with glFramebufferTexture2D so I hope it's not involved in the issue.
Thanks.
This should be what you're looking for.
uint16_t Convert1555To5551(uint16_t pixel)
{
// extract rgba from 1555 (1 bit alpha, 5 bits blue, 5 bits green, 5 bits red)
uint16_t a = pixel >> 15;
uint16_t b = (pixel >> 10) & 0x1f; // mask lowest five bits
uint16_t g = (pixel >> 5) & 0x1f;
uint16_t r = pixel & 0x1f;
// compress rgba into 5551 (5 bits red, 5 bits green, 5 bits blue, 1 bit alpha)
return (r << 11) | (g << 6) | (b << 1) | a;
}
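To apply this to the sub-image from the question, one option is to copy the window into a tightly packed temporary buffer while converting. This is a sketch only: `ConvertWindow1555To5551` is a made-up name, and it uses a `std::vector` instead of the `malloc` in the question so that the size is counted in pixels rather than bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same bit shuffling as the function above, repeated so the sketch compiles alone.
uint16_t Convert1555To5551(uint16_t pixel)
{
    uint16_t a = pixel >> 15;
    uint16_t b = (pixel >> 10) & 0x1f;
    uint16_t g = (pixel >> 5) & 0x1f;
    uint16_t r = pixel & 0x1f;
    return static_cast<uint16_t>((r << 11) | (g << 6) | (b << 1) | a);
}

// Hypothetical helper: copy a width x height window starting at (x, y) out of
// a buffer whose rows are rowLen pixels long, converting each pixel on the way.
// The result is tightly packed (width pixels per row).
std::vector<uint16_t> ConvertWindow1555To5551(const uint16_t* pixels, std::size_t rowLen,
                                              std::size_t x, std::size_t y,
                                              std::size_t width, std::size_t height)
{
    std::vector<uint16_t> out(width * height);
    for (std::size_t j = 0; j < height; ++j)
        for (std::size_t i = 0; i < width; ++i)
            out[j * width + i] = Convert1555To5551(pixels[(y + j) * rowLen + (x + i)]);
    return out;
}
```

Because the returned buffer has no padding between rows, GL_UNPACK_ROW_LENGTH would presumably be left at 0 when passing `out.data()` to glTexSubImage2D with GL_UNSIGNED_SHORT_5_5_5_1.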

Change DWORD color alpha channel value

I have a starting color: 0xffff00ff, which is a:255, r:255, g:0, b:255.
The goal is to change the alpha channel of the color to be less opaque based on a percentage. i.e. 50% opacity for that color is roughly 0x80ff00ff.
How I've tried to reach the solution:
DWORD cx = 0xffff00ff;
DWORD cn = .5;
DWORD nc = cx*cn;
DWORD cx = 0xffff00ff;
float cn = .5;
DWORD alphaMask=0xff000000;
DWORD nc = (cx|alphaMask)&((DWORD)(alphaMask*cn)|(~alphaMask));
This should do the trick. All I'm doing here is setting the top 8 bits of the DWORD to 1s with the OR (the '|' operator) and then ANDing those bits with the value you want them to have, which is the alpha mask times cn. The result of the multiplication is cast to make it a DWORD again.
This is tested code (in linux). However, you might find a simpler answer. Note: this is RGBA, not ARGB as you have referenced in your question.
double transparency = 0.500;
unsigned char *current_image_data_iterator = reinterpret_cast<unsigned char*>( const_cast<char *>( this->data.getCString() ) );
unsigned char *new_image_data_iterator = reinterpret_cast<unsigned char*>( const_cast<char *>( new_image_data->data.getCString() ) );
size_t x;
//cout << "transparency: " << transparency << endl;
for( x = 0; x < data_length; x += 4 ){
//rgb data is the same
*(new_image_data_iterator + x) = *(current_image_data_iterator + x);
*(new_image_data_iterator + x + 1) = *(current_image_data_iterator + x + 1);
*(new_image_data_iterator + x + 2) = *(current_image_data_iterator + x + 2);
//multiply the current opacity by the applied transparency
*(new_image_data_iterator + x + 3) = uint8_t( double(*(current_image_data_iterator + x + 3)) * transparency );
//cout << "Current Alpha: " << dec << static_cast<int>( *(current_image_data_iterator + x + 3) ) << endl;
//cout << "New Alpha: " << double(*(current_image_data_iterator + x + 3)) * transparency << endl;
//cout << "----" << endl;
}
// Assumes a little-endian CPU: the bytes of 0xAARRGGBB are stored as B, G, R, A.
union ARGB
{
std::uint32_t Colour;
struct { std::uint8_t B, G, R, A; };
};
int main()
{
DWORD cx = 0xffff00ff;
reinterpret_cast<ARGB*>(&cx)->A = reinterpret_cast<ARGB*>(&cx)->A / 2;
std::cout<<std::hex<<cx;
}
The solution I chose to go with:
DWORD changeOpacity(DWORD color, float opacity) {
int alpha = (color >> 24) & 0xff;
int r = (color >> 16) & 0xff;
int g = (color >> 8) & 0xff;
int b = color & 0xff;
int newAlpha = ceil(alpha * opacity);
UINT newColor = r << 16;
newColor += g << 8;
newColor += b;
newColor += (newAlpha << 24);
return (DWORD)newColor;
}
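A more compact variant of the same idea, as a sketch: it uses `std::uint32_t` instead of DWORD so that it compiles outside Windows, rounds to nearest instead of using ceil, and clamps the opacity to [0, 1]. Those three choices and the name `scaleAlpha` are not taken from the answers above.

```cpp
#include <cmath>
#include <cstdint>

// Scale the alpha byte of a 0xAARRGGBB colour by `opacity` (clamped to [0, 1]),
// leaving the red, green and blue bytes untouched.
std::uint32_t scaleAlpha(std::uint32_t argb, float opacity)
{
    if (opacity < 0.0f) opacity = 0.0f;
    if (opacity > 1.0f) opacity = 1.0f;
    const std::uint32_t alpha = argb >> 24;
    const std::uint32_t scaled = static_cast<std::uint32_t>(std::lround(alpha * opacity));
    return (scaled << 24) | (argb & 0x00ffffffu);
}
```

With the color from the question, `scaleAlpha(0xffff00ff, 0.5f)` gives 0x80ff00ff, since 255 * 0.5 = 127.5 rounds up to 128.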
I understand your question as: I wish to change a given rgba color component by a certain factor while keeping the same overall transparency.
For a color with full alpha (1.0 or 255), this is trivial: simply multiply the component without touching the others:
//typedef unsigned char uint8
enum COMPONENT {
RED,
GREEN,
BLUE,
ALPHA
};
struct rgba {
uint8 components[4];
// uint8 alpha, blue, green, red; // little endian
uint8 &operator[](int index){
return components[index];
}
};
rgba color;
if (color[ALPHA] == 255)
color[RED] *= factor;
else
ComponentFactor(color, RED, factor);
There's probably not a single answer to that question in the general case. Consider that colors may alternatively be encoded in HSL or HSV. You might want to keep some of these parameters fixed and allow others to change.
My approach to this problem would be to first try to find the hue distance between the source and target colors at full alpha, and then convert the real source color to HSV, apply the change in hue, then convert back to RGBA. Obviously, that second step is not necessary if the alpha is actually 1.0.
In pseudo code:
rgba ComponentFactor(rgba color, int component, double factor){
rgba fsrc = color, ftgt;
fsrc.alpha = 1.0; // set full alpha
ftgt = fsrc;
ftgt[component] *= factor; // apply factor
hsv hsrc = fsrc, htgt = ftgt; // convert to hsv color space
int distance = htgt.hue - hsrc.hue; // find the hue difference
hsv tmp = color; // convert actual color to hsv
tmp.hue += distance; // apply change in hue
rgba res = tmp; // convert back to RGBA space
return res;
}
Note how the above relies on the types rgba and hsv having implicit conversion constructors. Algorithms for the conversion may easily be found with a web search. It should also be easy to derive a struct definition for hsv from the rgba one, or to include individual component access as field members (rather than using the [] operator).
For instance:
//typedef DWORD uint32;
struct rgba {
union {
uint8 components[4];
struct {
uint8 alpha,blue,green,red; // little endian platform
};
uint32 raw;
};
uint8 &operator[](int index){
return components[3 - index]; // RED (0) maps to components[3], ALPHA (3) to components[0]
}
rgba (uint32 raw_):raw(raw_){}
rgba (uint8 r, uint8 g, uint8 b, uint8 a):
red(r), green(g), blue(b),alpha(a){}
};
Perhaps you will have to find a hue factor rather than a distance, or tweak other HSV components to achieve the desired result.
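For the conversions the pseudocode relies on, here is one common formulation of RGB to HSV and back, as a sketch. It works on doubles in [0, 1] and leaves alpha out entirely, so it is not a drop-in for the rgba struct above; `RgbF`, `Hsv` and `shiftHue` are names invented for this example.

```cpp
#include <algorithm>
#include <cmath>

struct RgbF { double r, g, b; };   // each component in [0, 1]
struct Hsv  { double h, s, v; };   // h in [0, 360), s and v in [0, 1]

Hsv rgbToHsv(RgbF c)
{
    const double maxc = std::max({c.r, c.g, c.b});
    const double minc = std::min({c.r, c.g, c.b});
    const double delta = maxc - minc;
    Hsv out{0.0, maxc > 0.0 ? delta / maxc : 0.0, maxc};
    if (delta > 0.0) {
        if (maxc == c.r)      out.h = 60.0 * std::fmod((c.g - c.b) / delta, 6.0);
        else if (maxc == c.g) out.h = 60.0 * ((c.b - c.r) / delta + 2.0);
        else                  out.h = 60.0 * ((c.r - c.g) / delta + 4.0);
        if (out.h < 0.0) out.h += 360.0;
    }
    return out;
}

// Expects a non-negative hue.
RgbF hsvToRgb(Hsv p)
{
    const double chroma = p.v * p.s;
    const double hp = std::fmod(p.h, 360.0) / 60.0;  // sector position in [0, 6)
    const double x = chroma * (1.0 - std::fabs(std::fmod(hp, 2.0) - 1.0));
    const double m = p.v - chroma;
    double r = 0, g = 0, b = 0;
    switch (static_cast<int>(hp)) {
        case 0: r = chroma; g = x; break;
        case 1: r = x; g = chroma; break;
        case 2: g = chroma; b = x; break;
        case 3: g = x; b = chroma; break;
        case 4: r = x; b = chroma; break;
        default: r = chroma; b = x; break;
    }
    return RgbF{r + m, g + m, b + m};
}

// Hypothetical helper: rotate the hue by `degrees`, keeping saturation and value.
RgbF shiftHue(RgbF c, double degrees)
{
    Hsv p = rgbToHsv(c);
    p.h = std::fmod(p.h + degrees, 360.0);
    if (p.h < 0.0) p.h += 360.0;
    return hsvToRgb(p);
}
```

Shifting pure red by 120 degrees gives pure green, which is a handy sanity check for the sector logic.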

Implement a near real-time CPU capability like glAlphaFunc(GL_GREATER) with RGB source and RGBA overlay

Latency is the biggest concern here. I have found that trying to render three 1920x1080 video feeds with RGBA overlays to individual windows via OpenGL has limits. I am able to render two windows with overlays or three windows without overlays just fine, but when the third window is introduced, rendering stalls are obvious. I believe that the issue is due to the overuse of glAlphaFunc() to overlay an RGBA-based texture on an RGB video texture. In order to reduce the overuse, my thought is to move some of the overlay function onto the CPU (as I have lots of CPU: dual hex-core Xeon). The ideal place to do this would be when copying the source RGB image to the mapped PBO, replacing the RGB values with the ones from the RGBA overlay where A > 0.
I have tried using Intel IPP methods, but there is no method available that doesn't involve multiple calls and results in too much latency. I've tried straight C code, but this takes longer than the 33 ms that I am allowed. I need help with creating an optimized assembly or SSE based routine that will provide minimal latency.
Compile the below code with > g++ -fopenmp -O2 -mtune=native
Basic C function for clarity:
void copyAndOverlay(const uint8_t* aSourceRGB, const uint8_t* aOverlayRGBA, uint8_t* aDestinationRGB, int aWidth, int aHeight) {
int i;
#pragma omp parallel for
for (i=0; i<aWidth*aHeight; ++i) {
if (0 == aOverlayRGBA[i*4+3]) {
aDestinationRGB[i*3] = aSourceRGB[i*3]; // R
aDestinationRGB[i*3+1] = aSourceRGB[i*3+1]; // G
aDestinationRGB[i*3+2] = aSourceRGB[i*3+2]; // B
} else {
aDestinationRGB[i*3] = aOverlayRGBA[i*4]; // R
aDestinationRGB[i*3+1] = aOverlayRGBA[i*4+1]; // G
aDestinationRGB[i*3+2] = aOverlayRGBA[i*4+2]; // B
}
}
}
uint64_t getTime() {
struct timeval tNow;
gettimeofday(&tNow, NULL);
return (uint64_t)tNow.tv_sec * 1000000 + (uint64_t)tNow.tv_usec;
}
int main(int argc, char **argv) {
int pixels = _WIDTH_ * _HEIGHT_ * 3;
uint8_t *rgba = new uint8_t[_WIDTH_ * _HEIGHT_ * 4];
uint8_t *src = new uint8_t[pixels];
uint8_t *dst = new uint8_t[pixels];
uint64_t tStart = getTime();
for (int t=0; t<1000; ++t) {
copyAndOverlay(src, rgba, dst, _WIDTH_, _HEIGHT_);
}
printf("delta: %lu\n", (getTime() - tStart) / 1000);
delete [] rgba;
delete [] src;
delete [] dst;
return 0;
}
Here is an SSE4 implementation that is a little more than 5 times faster than the code you posted with the question (without parallelization of the loop). As written it only works on RGBA buffers that are 16-byte aligned and sized in multiples of 64 bytes, and on RGB buffers that are 16-byte aligned and sized in multiples of 48 bytes (that is, a pixel count that is a multiple of 16). The size requirements fit your 1920x1080 resolution perfectly, but you may need to add code to ensure your buffers are 16-byte aligned.
void copyAndOverlay(const uint8_t* aSourceRGB, const uint8_t* aOverlayRGBA, uint8_t* aDestinationRGB, int aWidth, int aHeight) {
__m128i const ocmp = _mm_setzero_si128();
__m128i const omskshf1 = _mm_set_epi32(0x00000000, 0x0F0F0F0B, 0x0B0B0707, 0x07030303);
__m128i const omskshf2 = _mm_set_epi32(0x07030303, 0x00000000, 0x0F0F0F0B, 0x0B0B0707);
__m128i const omskshf3 = _mm_set_epi32(0x0B0B0707, 0x07030303, 0x00000000, 0x0F0F0F0B);
__m128i const omskshf4 = _mm_set_epi32(0x0F0F0F0B, 0x0B0B0707, 0x07030303, 0x00000000);
__m128i const ovalshf1 = _mm_set_epi32(0x00000000, 0x0E0D0C0A, 0x09080605, 0x04020100);
__m128i const ovalshf2 = _mm_set_epi32(0x04020100, 0x00000000, 0x0E0D0C0A, 0x09080605);
__m128i const ovalshf3 = _mm_set_epi32(0x09080605, 0x04020100, 0x00000000, 0x0E0D0C0A);
__m128i const ovalshf4 = _mm_set_epi32(0x0E0D0C0A, 0x09080605, 0x04020100, 0x00000000);
__m128i const blndmsk1 = _mm_set_epi32(0xFFFFFFFF, 0x00000000, 0x00000000, 0x00000000);
__m128i const blndmsk2 = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0x00000000, 0x00000000);
__m128i const blndmsk3 = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000000);
__m128i a, b, c, x, y, z, w, p, q, r, s;
uint8_t const *const aSourceRGBPast = aSourceRGB + 3 * aWidth * aHeight;
while (aSourceRGB != aSourceRGBPast) {
// source:
// aaabbbcccdddeeef
// ffggghhhiiijjjkk
// klllmmmnnnoooppp
//
// overlay:
// aaaabbbbccccdddd
// eeeeffffgggghhhh
// iiiijjjjkkkkllll
// mmmmnnnnoooopppp
// load source
a = _mm_load_si128((__m128i const*)(aSourceRGB ));
b = _mm_load_si128((__m128i const*)(aSourceRGB + 16));
c = _mm_load_si128((__m128i const*)(aSourceRGB + 32));
// load overlay
x = _mm_load_si128((__m128i const*)(aOverlayRGBA ));
y = _mm_load_si128((__m128i const*)(aOverlayRGBA + 16));
z = _mm_load_si128((__m128i const*)(aOverlayRGBA + 32));
w = _mm_load_si128((__m128i const*)(aOverlayRGBA + 48));
// compute blend mask, put 0xFF in bytes equal to zero
p = _mm_cmpeq_epi8(x, ocmp);
q = _mm_cmpeq_epi8(y, ocmp);
r = _mm_cmpeq_epi8(z, ocmp);
s = _mm_cmpeq_epi8(w, ocmp);
// align overlay to be condensed to 3-byte color
x = _mm_shuffle_epi8(x, ovalshf1);
y = _mm_shuffle_epi8(y, ovalshf2);
z = _mm_shuffle_epi8(z, ovalshf3);
w = _mm_shuffle_epi8(w, ovalshf4);
// condense overlay to 3-byte color
x = _mm_blendv_epi8(x, y, blndmsk1);
y = _mm_blendv_epi8(y, z, blndmsk2);
z = _mm_blendv_epi8(z, w, blndmsk3);
// align blend mask to be condensed to 3-byte color
p = _mm_shuffle_epi8(p, omskshf1);
q = _mm_shuffle_epi8(q, omskshf2);
r = _mm_shuffle_epi8(r, omskshf3);
s = _mm_shuffle_epi8(s, omskshf4);
// condense blend mask to 3-byte color
p = _mm_blendv_epi8(p, q, blndmsk1);
q = _mm_blendv_epi8(q, r, blndmsk2);
r = _mm_blendv_epi8(r, s, blndmsk3);
// select from overlay and source based on blend mask
x = _mm_blendv_epi8(x, a, p);
y = _mm_blendv_epi8(y, b, q);
z = _mm_blendv_epi8(z, c, r);
// write colors to destination
_mm_store_si128((__m128i*)(aDestinationRGB ), x);
_mm_store_si128((__m128i*)(aDestinationRGB + 16), y);
_mm_store_si128((__m128i*)(aDestinationRGB + 32), z);
// update pointers
aSourceRGB += 48;
aOverlayRGBA += 64;
aDestinationRGB += 48;
}
}
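The alignment and size requirements stated above could be met with something like the following sketch. It assumes C++17 and a platform that provides std::aligned_alloc (it is not available with MSVC; _mm_malloc/_mm_free or posix_memalign are possible substitutes). The names FreeDeleter, AlignedBuffer, allocAligned16, isAligned16, and sizesFitSseKernel are hypothetical helpers, not part of the routine above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter for memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(std::uint8_t* p) const { std::free(p); }
};
using AlignedBuffer = std::unique_ptr<std::uint8_t[], FreeDeleter>;

// True when the pointer sits on a 16-byte boundary.
bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Allocate at least `bytes` bytes on a 16-byte boundary.
AlignedBuffer allocAligned16(std::size_t bytes) {
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    const std::size_t rounded = (bytes + 15) / 16 * 16;
    void* p = std::aligned_alloc(16, rounded);
    if (p == nullptr) throw std::bad_alloc();
    assert(isAligned16(p));
    return AlignedBuffer(static_cast<std::uint8_t*>(p));
}

// The SSE loop consumes 16 pixels per iteration (48 RGB bytes, 64 RGBA bytes),
// so the total pixel count must be a multiple of 16.
bool sizesFitSseKernel(int width, int height) {
    return (static_cast<long long>(width) * height) % 16 == 0;
}
```

The three buffers would then be created with allocAligned16(width * height * 3) for the RGB images and allocAligned16(width * height * 4) for the overlay, after checking sizesFitSseKernel(width, height). If the check fails, the remaining pixels need a scalar tail loop.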

How to set a pixel in a SDL_surface?

I need to use the following function from this page. The SDL_Surface structure is defined as
typedef struct SDL_Surface {
Uint32 flags; /* Read-only */
SDL_PixelFormat *format; /* Read-only */
int w, h; /* Read-only */
Uint16 pitch; /* Read-only */
void *pixels; /* Read-write */
SDL_Rect clip_rect; /* Read-only */
int refcount; /* Read-mostly */
} SDL_Surface;
The function is:
void set_pixel(SDL_Surface *surface, int x, int y, Uint32 pixel)
{
Uint8 *target_pixel = (Uint8 *)surface->pixels + y * surface->pitch + x * 4;
*(Uint32 *)target_pixel = pixel;
}
I have a few doubts here, maybe because I lack a clear picture of what is going on.
Why do we need to multiply surface->pitch by y, and x by 4?
Why is target_pixel first declared as an 8-bit integer pointer and then cast to a 32-bit integer pointer later?
How does target_pixel retain the pixel value after the set_pixel function returns?
x is multiplied by 4 because each pixel is 4 bytes wide (the surface uses Uint32-valued pixels) while the address computation is done in Uint8 units, and y is multiplied by surface->pitch because pitch is the length of one row in bytes. The hard-coded 4 is ugly; see below.
The pointer is declared as Uint8 * so that the address calculation is done in bytes, which it has to be since the surface's pitch field is in bytes.
Since the pixel to be written really is 32-bit, the pointer must then be cast to a 32-bit pointer to make it a single write.
Here's a (less aggressive than my initial attempt) re-write:
void set_pixel(SDL_Surface *surface, int x, int y, Uint32 pixel)
{
Uint32 * const target_pixel = (Uint32 *) ((Uint8 *) surface->pixels
+ y * surface->pitch
+ x * surface->format->BytesPerPixel);
*target_pixel = pixel;
}
Note how we use surface->format->BytesPerPixel to factor out the 4. Magic constants are not a good idea. Also note that the above assumes that the surface really is using 32-bit pixels.
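To make the role of pitch concrete without needing SDL itself, here is a minimal sketch that uses a hypothetical SurfaceSketch struct as a stand-in for the few SDL_Surface fields involved. The struct, setPixel32, and getPixel32 are illustrative names, not SDL API. The row is deliberately allowed to be longer than w * 4 bytes to show why pitch, not width, must drive the row offset.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the SDL_Surface fields used by set_pixel.
struct SurfaceSketch {
    int w, h;              // size in pixels
    int pitch;             // bytes per row; may exceed w * bytesPerPixel
    int bytesPerPixel;     // plays the role of format->BytesPerPixel
    std::uint8_t* pixels;  // start of the pixel buffer
};

// Write one 32-bit pixel; the address is computed in bytes.
void setPixel32(SurfaceSketch& s, int x, int y, std::uint32_t pixel) {
    assert(s.bytesPerPixel == 4);                    // only valid for 32-bit surfaces
    assert(x >= 0 && x < s.w && y >= 0 && y < s.h);  // stay inside the surface
    std::uint8_t* target = s.pixels + y * s.pitch + x * s.bytesPerPixel;
    std::memcpy(target, &pixel, sizeof pixel);       // sidesteps alignment concerns
}

// Read the pixel back using the same byte-based address calculation.
std::uint32_t getPixel32(const SurfaceSketch& s, int x, int y) {
    assert(s.bytesPerPixel == 4);
    assert(x >= 0 && x < s.w && y >= 0 && y < s.h);
    std::uint32_t pixel;
    std::memcpy(&pixel, s.pixels + y * s.pitch + x * s.bytesPerPixel, sizeof pixel);
    return pixel;
}
```

The written value persists after the function returns because it lives in the surface's pixel buffer; the local pointer is only used to find the address.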
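You can use the code below (it assumes a 32-bit surface):
unsigned char* pixels = (unsigned char*)surface->pixels;
pixels[y * surface->pitch + 4 * x + c] = 255;
Here x and y are the coordinates of the point you want, and c selects which channel to write. The row offset uses surface->pitch rather than 4 * surface->w because rows may be padded. For a surface whose bytes are stored in blue, green, red, alpha order: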
c=0 corresponds to blue
c=1 corresponds to green
c=2 corresponds to red
c=3 corresponds to alpha(opacity)

SDL return code 3 from SDL at strange place in code

I am getting exit code 3 from an SDL executable. It seems to happen at a place where I pass an SDL_Color by value, and I don't understand the reason.
void Map::draw(SDL_Surface *surface, int level){
//the surface is locked
if ( SDL_MUSTLOCK(surface) )
SDL_LockSurface(surface);
long start= (long)level * this->xmax * this->ymax;
long end= (long)(level+1) * this->xmax * this->ymax;
for(long n=start; n<end; ++n){
Node *pn= this->nodes+n;
//exit(18); //exit code is 18
draw_pixel_nolock(surface, pn->location.x, pn->location.y, colors[pn->content]);
}
//the surface is unlocked
if ( SDL_MUSTLOCK(surface) )
SDL_UnlockSurface(surface);
}
And the graphics function called is:
SDL_Color colors[]= { {0,0,0}, {0xFF,0,0}, {0,0xFF,0}, {0,0,0xFF} };
void PutPixel32_nolock(SDL_Surface * surface, int x, int y, Uint32 color)
{
Uint8 * pixel = (Uint8*)surface->pixels;
pixel += (y * surface->pitch) + (x * sizeof(Uint32));
*((Uint32*)pixel) = color;
}
void PutPixel24_nolock(SDL_Surface * surface, int x, int y, Uint32 color)
{
Uint8 * pixel = (Uint8*)surface->pixels;
pixel += (y * surface->pitch) + (x * sizeof(Uint8) * 3);
#if SDL_BYTEORDER == SDL_BIG_ENDIAN
pixel[0] = (color >> 24) & 0xFF;
pixel[1] = (color >> 16) & 0xFF;
pixel[2] = (color >> 8) & 0xFF;
#else
pixel[0] = color & 0xFF;
pixel[1] = (color >> 8) & 0xFF;
pixel[2] = (color >> 16) & 0xFF;
#endif
}
void PutPixel16_nolock(SDL_Surface * surface, int x, int y, Uint32 color)
{
Uint8 * pixel = (Uint8*)surface->pixels;
pixel += (y * surface->pitch) + (x * sizeof(Uint16));
*((Uint16*)pixel) = color & 0xFFFF;
}
void PutPixel8_nolock(SDL_Surface * surface, int x, int y, Uint32 color)
{
Uint8 * pixel = (Uint8*)surface->pixels;
pixel += (y * surface->pitch) + (x * sizeof(Uint8));
*pixel = color & 0xFF;
}
//this function draws a pixel of wanted color on a surface at (x,y) coordinate
void draw_pixel_nolock(SDL_Surface *surface, int x, int y, SDL_Color s_color)
{ exit(19);//exit code is 3
//SDL_MapRGB return a color map depending on bpp (definition)
Uint32 color = SDL_MapRGB(surface->format, s_color.r, s_color.g, s_color.b);
//byte per pixel
int bpp = surface->format->BytesPerPixel;
//here is checked the number of byte used by our surface
switch (bpp)
{
case 1: // 1 byte => 8-bpp
PutPixel8_nolock(surface, x, y, color);
break;
case 2: // 2 byte => 16-bpp
PutPixel16_nolock(surface, x, y, color);
break;
case 3: // 3 byte => 24-bpp
PutPixel24_nolock(surface, x, y, color);
break;
case 4: // 4 byte => 32-bpp
PutPixel32_nolock(surface, x, y, color);
break;
}
}
The program exits with code 18 when I exit there, but it never exits with code 19 and gives exit code 3 instead. What could possibly go wrong?
Without seeing the entire code it's hard to tell, but as a general practice:
Validate that
long start= (long)level * this->xmax * this->ymax;
long end= (long)(level+1) * this->xmax * this->ymax;
start and end are valid offsets into your nodes array; otherwise this->nodes + n yields a garbage pointer.
Validate that
Node *pn= this->nodes+n;
is non-null and points to a valid Node object.
Validate that
pn->content
is within the bounds of your colors array.
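The checks above could be sketched as two small helpers. The names levelRangeIsValid and indexInRange are hypothetical, and since the real Map and Node classes are not shown, the node count is passed in as an assumed parameter. Arithmetic overflow for very large maps is not handled here.

```cpp
#include <cassert>
#include <cstddef>

// True when index is a valid position in an array of `count` elements.
bool indexInRange(long index, std::size_t count) {
    return index >= 0 && static_cast<std::size_t>(index) < count;
}

// True when the half-open range [start, end) computed for `level`
// lies entirely inside an array of `nodeCount` nodes.
bool levelRangeIsValid(long level, long xmax, long ymax, std::size_t nodeCount) {
    if (level < 0 || xmax <= 0 || ymax <= 0) return false;
    const long end = (level + 1) * xmax * ymax;   // one past the last node used
    assert(end > 0);
    return static_cast<std::size_t>(end) <= nodeCount;
}
```

In Map::draw, one might check levelRangeIsValid before the loop and indexInRange(pn->content, 4) before indexing colors (4 being the number of entries in the colors array as posted), logging and returning early when a check fails instead of drawing.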