Image sticking issue when using iOS ReplayKit Broadcast Upload Extension

I am testing a Broadcast Upload Extension on an iPad 6 running iOS 11.4.1 or 12.0.
After extracting the YUV data from the CMSampleBufferRef and saving it to a file, I get some bad images. It looks as if part of the previous frame remains in the new frame and is not refreshed. Maybe the data is not ready when the callback fires. How can I avoid this issue?
one bad image here
another bad image
- (void)processSampleBuffer:(CMSampleBufferRef)sampleBuffer withType:(RPSampleBufferType)sampleBufferType {
switch (sampleBufferType) {
case RPSampleBufferTypeVideo:
{
CFRetain(sampleBuffer);
size_t bytes = 0;
char* data = NULL;
size_t bufwidth, bufheight, bufstride;
CVPixelBufferRef pixelbuf = CMSampleBufferGetImageBuffer(sampleBuffer);
CVReturn cr = CVPixelBufferLockBaseAddress(pixelbuf, kCVPixelBufferLock_ReadOnly);
for (size_t i = 0; i < CVPixelBufferGetPlaneCount(pixelbuf); i++)
{
bufwidth = CVPixelBufferGetWidthOfPlane(pixelbuf, i);
bufheight = CVPixelBufferGetHeightOfPlane(pixelbuf, i);
bufstride = CVPixelBufferGetBytesPerRowOfPlane(pixelbuf,i);
data = (char*)CVPixelBufferGetBaseAddressOfPlane(pixelbuf,i);
if(bufwidth == bufstride)
{
size_t ylen = bufwidth*bufheight;
fwrite(data, ylen, 1, _file_yuv);
}
else
{
size_t factor = bufstride/bufwidth;
bytes = bufwidth * factor;
for (size_t j = 0; j < bufheight; j++)
{
fwrite(data, bytes, 1, _file_yuv);
data += bufstride;
}
}
}
CVPixelBufferUnlockBaseAddress(pixelbuf, kCVPixelBufferLock_ReadOnly);
CFRelease(sampleBuffer);
}
break;
case RPSampleBufferTypeAudioApp:
break;
default:
break;
}
}
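For reference, the row-by-row copy in the handler above can be isolated into a small helper. This is only a sketch in plain C++, not ReplayKit code: `pack_plane` is a hypothetical name, and the caller is assumed to pass the number of meaningful bytes per row explicitly instead of deriving it from `bufstride / bufwidth`. It only shows how to skip row padding; it does not by itself explain the stale-frame artifacts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy one image plane into a tightly packed buffer, skipping row padding.
// row_bytes is the number of meaningful bytes in each row; stride is the
// distance in bytes between the starts of two consecutive rows.
std::vector<std::uint8_t> pack_plane(const std::uint8_t* base,
                                     std::size_t row_bytes,
                                     std::size_t height,
                                     std::size_t stride) {
    assert(base != nullptr && stride >= row_bytes);
    std::vector<std::uint8_t> packed;
    packed.reserve(row_bytes * height);
    for (std::size_t row = 0; row < height; ++row) {
        const std::uint8_t* line = base + row * stride;
        packed.insert(packed.end(), line, line + row_bytes);
    }
    return packed;
}
```

For a biplanar YUV buffer, the luma plane would typically use `row_bytes = width` and the interleaved chroma plane `row_bytes = 2 * chroma_width`, but that layout is an assumption about the pixel format in use.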

Related

Is it possible to find a small image in a big image faster than this?

I need to get the coordinates of an image on the screen.
I use GDI to capture the screen to get the big image data, and load the small image data from a file.
I compare the two images with the following code, but it is too slow.
Is there a faster way that does not lose accuracy?
I tried a grayscale transform, comparing the hash of every column, and other approaches; they are faster but not precise.
// input: big image, small image, sim, dfcolor, rc
// output: a POINT, {-1,-1} means not found
PBYTE pSrc = _src.getBytes(); // _src is big image, pSrc is big image data pointer
PBYTE pPic = pic->getBytes(); // pic is small image, pPic is small image data pointer
int max_error = (1. - sim) * pic->width() * pic->height();
int error_count = 0;
bool bad = false;
// rc is a rect,because use multithreading,every thread handle a block of big image
for (int i = rc.y1; i < rc.y2; ++i) {
for (int j = rc.x1; j < rc.x2; ++j) {
// stop is a std::atomic_bool variable,to notify other threads to stop if found
if (stop) {
return { -1, -1 };
}
// image data is stored as bgra,i just compare rgb
// dfcolor is color deviation
for (int y1 = 0; y1 < pic->height() && !bad; ++y1) {
for (int x1 = 0; x1 < pic->width(); ++x1) {
int index1 = ((i + y1) * _src.width() + j + x1) << 2;
int index2 = (y1 * pic->width() + x1) << 2;
if (abs(*(pSrc + index1) - *(pPic + index2)) >= dfcolor.b ||
abs(*(pSrc + index1 + 1) - *(pPic + index2 + 1)) >= dfcolor.g ||
abs(*(pSrc + index1 + 2) - *(pPic + index2 + 2)) >= dfcolor.r) {
++error_count;
if (error_count > max_error) {
bad = true;
break;
}
}
}
}
// not found,continue
if (bad) {
error_count = 0;
bad = false;
continue;
}
// found
stop = true;
return { i, j };
}
}
return { -1,-1 };
I'm not sure how smart your compiler is, but index1 and index2 in the innermost loop (which is deeply nested) advance by 4 bytes in each iteration.
You could simply calculate pointers into your Src and Pic images and advance those instead of redoing that fairly complicated index math.
The effect is at least twofold: you may save some time on that math and, more importantly, the compiler may notice that it can vectorize that if statement.
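A minimal sketch of that idea, assuming both images are tightly packed BGRA as in the question. The function name `count_mismatches` and its parameter list are made up for illustration; the comparison itself mirrors the original `>=` tolerance test and the early exit on `max_error`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Count mismatching pixels between a patch of a BGRA image at (x, y) and a
// BGRA template, walking both with pointers instead of recomputing indices
// for every pixel. Stops early once the count exceeds max_error.
int count_mismatches(const std::uint8_t* src, int src_width,
                     const std::uint8_t* pic, int pic_width, int pic_height,
                     int x, int y, int tol_b, int tol_g, int tol_r,
                     int max_error) {
    assert(src && pic && src_width >= pic_width);
    int errors = 0;
    for (int row = 0; row < pic_height; ++row) {
        // One multiplication per row; inside the row the pointers just step.
        const std::uint8_t* s =
            src + (static_cast<std::size_t>(y + row) * src_width + x) * 4;
        const std::uint8_t* p =
            pic + static_cast<std::size_t>(row) * pic_width * 4;
        for (int col = 0; col < pic_width; ++col, s += 4, p += 4) {
            if (std::abs(s[0] - p[0]) >= tol_b ||
                std::abs(s[1] - p[1]) >= tol_g ||
                std::abs(s[2] - p[2]) >= tol_r) {
                if (++errors > max_error) return errors;
            }
        }
    }
    return errors;
}
```

The outer search loop would call this for each candidate position and treat a result of at most `max_error` as a match. Whether the compiler actually vectorizes the inner loop depends on the compiler and flags, so it is worth measuring.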

What's the correct way to assign one GPU memory buffer value from another GPU memory buffer with some arithmetic on each source buffer's element?

I'm new to GPU programming with the CUDA toolkit, and I have to write some code that does what the title describes.
Here is the code, to show exactly what I want to do.
void CTrtModelWrapper::forward(void **bindings,
unsigned height,
unsigned width,
short channel,
ColorSpaceFmt colorFmt,
PixelDataType pixelType) {
uint16_t *devInRawBuffer_ptr = (uint16_t *) bindings[0];
uint16_t *devOutRawBuffer_ptr = (uint16_t *) bindings[1];
const unsigned short bit = 16;
float *devInputBuffer_ptr = nullptr;
float *devOutputBuffer_ptr = nullptr;
unsigned volume = height * width * channel;
common::cudaCheck(cudaMalloc((void **) &devInputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
common::cudaCheck(cudaMalloc((void **) &devOutputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
unsigned short npos = 0;
switch (pixelType) {
case PixelDataType::PDT_INT8: // high 8bit
npos = bit - 8;
break;
case PixelDataType::PDT_INT10: // high 10bit
npos = bit - 10;
break;
default:
break;
}
switch (colorFmt) {
case CFMT_RGB: {
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); // SEGMENTATION Fault at this line
}
}
break;
default:
break;
}
void *rtBindings[2] = {devInputBuffer_ptr, devOutputBuffer_ptr};
// forward
this->_forward(rtBindings);
// convert output
unsigned short ef_bit = bit - npos;
switch (colorFmt) {
case CFMT_RGB: {
for (unsigned i = 0; i < volume; ++i) {
devOutRawBuffer_ptr[i] = clip< uint16_t >((uint16_t) devOutputBuffer_ptr[i],
0,
(uint16_t) pow(2, ef_bit)) << npos;
}
}
break;
default:
break;
}
}
bindings is a pointer to an array. The first element of the array is a device pointer to a buffer allocated on the GPU with cudaMalloc; each element in that buffer is a 16-bit integer. The second element is the same kind of buffer and is used to store the output data.
height, width, channel, colorFmt (RGB here), and pixelType (PDT_INT8, i.e. 8-bit) correspond respectively to the image height, width, channel count, color space, and number of bits used to store one pixel value.
The _forward function requires a pointer to an array similar to bindings, except that each element in the buffers should be a 32-bit float.
So I do the conversion with a loop:
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); // SEGMENTATION Fault at this line
}
The >> operation is there because the actual 8-bit data is stored in the high 8 bits.
A segmentation fault occurs at the line devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos); when i equals 0.
I tried separating this code into several lines:
uint16_t value = devInRawBuffer_ptr[i];
float transferd = float(value >> npos);
devInputBuffer_ptr[i] = transferd;
and the segmentation fault occurs at the line uint16_t value = devInRawBuffer_ptr[i];
Is this a valid way to assign values to an allocated GPU memory buffer?
PS: the buffers given in bindings are fine. They are filled from host memory using cudaMemcpy before the call to the forward function, but I am still pasting the code below:
nvinfer1::DataType type = nvinfer1::DataType::kHALF;
HostBuffer hostInputBuffer(volume, type);
DeviceBuffer deviceInputBuffer(volume, type);
HostBuffer hostOutputBuffer(volume, type);
DeviceBuffer deviceOutputBuffer(volume, type);
// HxWxC --> WxHxC
auto *hostInputDataBuffer = static_cast<unsigned short *>(hostInputBuffer.data());
for (unsigned w = 0; w < W; ++w) {
for (unsigned h = 0; h < H; ++h) {
for (unsigned c = 0; c < C; ++c) {
hostInputDataBuffer[w * H * C + h * C + c] = (unsigned short )(*(ppm.buffer.get() + h * W * C + w * C + c));
}
}
}
auto ret = cudaMemcpy(deviceInputBuffer.data(), hostInputBuffer.data(), volume * getElementSize(type),
cudaMemcpyHostToDevice);
if (ret != 0) {
std::cout << "CUDA failure: " << ret << std::endl;
return EXIT_FAILURE;
}
void *bindings[2] = {deviceInputBuffer.data(), deviceOutputBuffer.data()};
model->forward(bindings, H, W, C, sbsisr::ColorSpaceFmt::CFMT_RGB, sbsisr::PixelDataType::PDT_INT8);
In CUDA, it's generally not advisable to dereference a device pointer in host code. For example, you are creating a "device pointer" when you use cudaMalloc:
common::cudaCheck(cudaMalloc((void **) &devInputBuffer_ptr, volume * getElementSize(nvinfer1::DataType::kFLOAT)));
From the code you have posted, it's not possible to deduce that for devInRawBuffer_ptr but I'll assume it also is a device pointer.
In that case, to perform this operation:
for (unsigned i = 0; i < volume; ++i) {
devInputBuffer_ptr[i] = float((devInRawBuffer_ptr[i]) >> npos);
}
You would launch a CUDA kernel, something like this:
// put this function definition at file scope
__global__ void shift_kernel(float *dst, uint16_t *src, size_t sz, unsigned short npos){
for (size_t idx = blockIdx.x*blockDim.x+threadIdx.x; idx < sz; idx += gridDim.x*blockDim.x) dst[idx] = (float)((src[idx]) >> npos);
}
// call it like this in your code:
shift_kernel<<<160, 1024>>>(devInputBuffer_ptr, devInRawBuffer_ptr, volume, npos);
(coded in browser, not tested)
If you'd like to learn more about what's going on here, you may wish to study CUDA. For example, you can get most of the basic concepts here and by studying the CUDA sample code vectorAdd. The grid-stride loop is discussed here.
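To see why the grid-stride loop touches every element exactly once, here is a host-only simulation in plain C++. It is not CUDA code and does not run on the GPU: `shift_grid_stride`, `grid_dim` and `block_dim` are hypothetical names, and the outer loop over `tid` stands in for the threads that the hardware would run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU stand-in for the grid-stride loop: every simulated "thread" starts at
// its global index and jumps ahead by the total number of threads.
std::vector<float> shift_grid_stride(const std::vector<std::uint16_t>& src,
                                     unsigned npos,
                                     std::size_t grid_dim,
                                     std::size_t block_dim) {
    const std::size_t total_threads = grid_dim * block_dim;
    assert(total_threads > 0 && npos < 16);
    std::vector<float> dst(src.size(), -1.0f);
    for (std::size_t tid = 0; tid < total_threads; ++tid) {
        // tid plays the role of blockIdx.x * blockDim.x + threadIdx.x.
        for (std::size_t idx = tid; idx < src.size(); idx += total_threads) {
            dst[idx] = static_cast<float>(src[idx] >> npos);
        }
    }
    return dst;
}
```

A function like this can also serve as a reference result to compare against the kernel's output after copying it back to the host.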

Receiving a large binary file from a server

I am trying to receive and save a file over 5 GB in size using C++. During the process, the memory used by the application keeps increasing and sometimes the application crashes. Is there something I am doing wrong, or is there a better way to do this?
char * buffer = channel->cread(&len);
__int64_t lengthFile = *(__int64_t * ) buffer;
__int64_t received = 0;
__int64_t offset = 0;
__int64_t length;
string filename = "received/testFile";
ofstream ifs;
ifs.open(filename.c_str(),ios::binary | ios::out);
int len = 0;
while(1){
length = 256;
if(offset + MAX_MESSAGE >= lengthFile){
length = lengthFile - offset;
breakCondition = true;
}
file = new filemsg(offset,length);
channel->cwrite((char *)file,sizeof (*file));
buffer = channel->cread(&len);
ifs.write(buffer,length);
received = received + length;
offset = offset + MAX_MESSAGE;
if(breakCondition)
break;
}
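One thing visible in the loop above is that every iteration allocates a filemsg with new and the snippet never deletes it; whether the buffer returned by cread must also be freed by the caller is not shown. The sketch below illustrates a loop whose memory use stays bounded by the chunk size. It is not the poster's API: `FileRequest` and `FakeChannel` are hypothetical stand-ins for filemsg and the channel, and the "server" is just an in-memory string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>  // only needed to try this out with std::ostringstream
#include <string>
#include <vector>

// Hypothetical request record standing in for the post's filemsg.
struct FileRequest {
    std::int64_t offset;
    std::int64_t length;
};

// Hypothetical in-memory server standing in for the post's channel.
struct FakeChannel {
    std::string file;  // the "remote" file contents
    std::vector<char> read_chunk(const FileRequest& req) const {
        auto first = file.begin() + req.offset;
        return std::vector<char>(first, first + req.length);
    }
};

// Pull `total` bytes in chunks of at most chunk_size and stream them to out.
// The request lives on the stack and each chunk is destroyed at the end of
// its iteration, so nothing accumulates across iterations.
std::int64_t receive_file(const FakeChannel& channel, std::int64_t total,
                          std::int64_t chunk_size, std::ostream& out) {
    assert(chunk_size > 0 && total >= 0);
    std::int64_t offset = 0;
    while (offset < total) {
        const FileRequest req{offset, std::min(chunk_size, total - offset)};
        const std::vector<char> chunk = channel.read_chunk(req);
        out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        offset += req.length;  // advance by what was actually requested
    }
    return offset;
}
```

Note that the offset advances by the same length that was requested, whereas the posted loop requests 256 bytes but advances by MAX_MESSAGE; those only agree if MAX_MESSAGE is 256.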

Buffer Overflow in C++ while reading virtual memory

I've got a program which reads a process's virtual memory and some registers for some data, then makes amendments to it.
Here I pass the contents of the eax register to my function (this seems to work fine, but I thought it might show what types of data are involved):
case EXCEPTION_SINGLE_STEP: // EXCEPTION_SINGLE_STEP = 0x80000004
bl_flag = TRUE;
memset((void *)&context, 0, 0x2CC);
context.ContextFlags = 0x10017;
thread = OpenThread(0x1FFFFF, 0, debug_event.dwThreadId);
GetThreadContext(thread, &context);
context.Eip = context.Eip + 1;
// sub_FD4BF0((HANDLE)(*((DWORD *)(lpThreadParameter))), context.Eax);
StringToHtml((HANDLE)(dwArray[0]), context.Eax);
SetThreadContext(thread, &context);
CloseHandle(thread);
break;
void StringToHtml(HANDLE hProcess, DWORD address)
{
WCHAR buff[0x100];
WCHAR html[0x100];
DWORD oldProt = 0, real = 0;
int len = 0;
VirtualProtectEx(hProcess, (LPVOID)address, 0x200, PAGE_READWRITE, &oldProt);
ReadProcessMemory(hProcess, (LPCVOID)address, (LPVOID)buff, 0x200, &real);
len = wcslen(buff);
int k = 0, j = 0;
wprintf(L"Found out chat string : \"%s\" \n", buff);
for (int pp = 0; pp < 0x100; pp++)
html[pp] = NULL;
while(j < len)
{
if (buff[j] == L'&')
{
if (wcsncmp((const WCHAR *)(buff + j + 1), L"lt;", 3) == 0)
{
//html[k] = L'<';
html[k] = L'<font color="#00FF10">';
k++;
j = j + 4;
continue;
}
I am aware this is an incomplete function snippet. However, the issue arises at my for loop here:
for (int pp = 0; pp < 0x100; pp++)
If I enter more than 256 characters (at first I thought this would be enough), it crashes. I have clearly missed something obvious: I tried pp < len, which I thought would use the buffer size, but I still get the same crash.
How can I get the total size of the string entered in the chat and make the loop iterate over the whole thing? Or, at the very least, catch this error?
Did you change the sizes of html and buff to match the upper bound of your for loop? Maybe that is already the solution.
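Two details in the posted function are worth noting. With 2-byte WCHAR, ReadProcessMemory with 0x200 bytes fills all 0x100 elements of buff, so a string of 256 or more characters leaves no terminator inside the buffer for wcslen to find. Also, html[k] is a single WCHAR and cannot hold a whole tag. A sketch that avoids both, in portable C++ with std::wstring for the output, could look like this; `expand_lt` is a hypothetical helper, not part of the original program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Expand "&lt;" into a longer HTML tag. The input length is bounded by
// capacity even if no terminator was read, and the output is a std::wstring,
// so it grows as needed instead of overflowing a fixed-size array.
std::wstring expand_lt(const wchar_t* buff, std::size_t capacity,
                       const std::wstring& replacement) {
    assert(buff != nullptr);
    // Length up to the first L'\0', but never past the end of the buffer.
    const std::size_t len = static_cast<std::size_t>(
        std::find(buff, buff + capacity, L'\0') - buff);
    const std::wstring in(buff, len);
    std::wstring html;
    std::size_t j = 0;
    while (j < in.size()) {
        if (in.compare(j, 4, L"&lt;") == 0) {
            html += replacement;  // append the whole tag, not one character
            j += 4;
        } else {
            html += in[j];
            ++j;
        }
    }
    return html;
}
```

Called as `expand_lt(buff, 0x100, L"<font color=\"#00FF10\">")`, this handles whatever fits in the buffer. Strings longer than the buffer would still need a larger read or several reads.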

SDL_GetPixel pointer problems

This is my very first question.
The first of the two functions below works fine to some extent:
Uint32 AWSprite::get_pixelColor_location(SDL_Surface * surface, int x, int y) {
int bpp = surface->format->BytesPerPixel;
/* Here p is the address to the pixel we want to retrieve */
Uint8 *p = (Uint8 *)surface->pixels + y * surface->pitch + x * bpp;
switch (bpp) {
case 1:
return *p;
case 2:
return *(Uint16 *)p;
case 3:
if (SDL_BYTEORDER == SDL_BIG_ENDIAN)
return p[0] << 16 | p[1] << 8 | p[2];
else
return p[0] | p[1] << 8 | p[2] << 16;
case 4:
return *(Uint32 *)p;
default:
return 0;
}
}
void AWSprite::set_all_frame_image_actual_size() {
/* This function finds an entire rows that has transparency
then stores the amount of rows to a Frame_image_absolute structure
*/
absolute_sprite = new Frame_image_absolute*[howManyFrames];
for (int f = 0; f < howManyFrames; f++) {
SDL_LockSurface(frames[f]);
int top_gap = 0; int bottom_gap = 0;
int per_transparent_px_count = 1;
for (int i = 0; i < frames[f]->h; i++) {
int per_transparent_px_count = 1;
if (this->get_pixelColor_location(frames[f], j, i) == transparentColour) per_transparent_px_count++;
if (per_transparent_px_count >= frames[f]->w) {
if (i < frames[f]->h / 2) {
per_transparent_px_count = 1;
top_gap++;
} else {
per_transparent_px_count = 1;
bottom_gap++;
}
}
}
}
int realHeight = frames[f]->h - (top_gap + bottom_gap);
absolute_sprite[f] = new Frame_image_absolute();
absolute_sprite[f]->offset_y = top_gap;
absolute_sprite[f]->height = realHeight;
}
}
When I run this I get:
Unhandled exception at 0x00173746 in SE Game.exe: 0xC0000005: Access violation reading location 0x03acc0b8.
When I went through it in the debugger, I found that it crashes when the loop variables are f == 31, i == 38, j == 139.
It stops in AWSprite::get_pixelColor_location() at the line return *(Uint32 *)p;
If I run it again and step through it line by line, it sometimes works and sometimes doesn't. In other words, it crashes at random values of i and j once f > 30.
What is going on?
I cannot comment on the question yet, but here are some questions:
Where does j come from? Based on the get_pixelColor_location function I would assume that you're iterating over the width of the surface. This part seems to be missing from the code you posted.
Did you validate that i and j are within the bounds of your surface?
Also, you don't seem to unlock the surface.
Running your function seems to work adequately here so I suspect you're reading outside of your buffer with invalid parameters.
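To check the out-of-bounds theory, one option is a pixel read that refuses invalid coordinates. The sketch below is plain C++ without SDL so it can be compiled on its own: `SurfaceView` is a hypothetical struct that mirrors only the SDL_Surface fields the question uses, and only the 4-bytes-per-pixel case is covered.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the SDL_Surface fields used in the question.
struct SurfaceView {
    const std::uint8_t* pixels;
    int w, h;   // size in pixels
    int pitch;  // bytes per row, may be larger than w * bpp
    int bpp;    // bytes per pixel
};

// Read the 32-bit pixel at (x, y) into out. Returns false, leaving out
// untouched, when the coordinates fall outside the surface.
bool get_pixel_checked(const SurfaceView& s, int x, int y, std::uint32_t& out) {
    assert(s.pixels != nullptr && s.bpp == 4);
    if (x < 0 || y < 0 || x >= s.w || y >= s.h) return false;
    // memcpy avoids an unaligned or type-punned read through a cast pointer.
    std::memcpy(&out, s.pixels + y * s.pitch + x * s.bpp, sizeof out);
    return true;
}
```

If a check like this starts returning false for some f, i, j, the loop bounds (for example the missing loop over j) are the place to look.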