How to create a custom winrt::Microsoft::AI::MachineLearning::TensorFloat16Bit? - c++

How do I create a TensorFloat16Bit when manually doing a tensorization of the data?
We tensorized our data based on this Microsoft example, where we convert the 0-255 byte values to 0-1 floats and change the RGBA channel order.
...
std::vector<int64_t> shape = { 1, channels, height , width };
float* pCPUTensor;
uint32_t uCapacity;
// The channels of the image stored in the buffer are in BGRA-BGRA-BGRA-BGRA order.
// We then transform them to BBBB....GGGG....RRRR.... order (alpha dropped).
TensorFloat tf = TensorFloat::Create(shape);
com_ptr<ITensorNative> itn = tf.as<ITensorNative>();
CHECK_HRESULT(itn->GetBuffer(reinterpret_cast<BYTE**>(&pCPUTensor), &uCapacity));
// 2. Transform the data in buffer to a vector of float
if (BitmapPixelFormat::Bgra8 == pixelFormat)
{
    for (UINT32 i = 0; i < size; i += 4)
    {
        // Suppose the model expects a BGR image.
        // Index 0 is B, 1 is G, 2 is R, 3 is alpha (dropped).
        UINT32 pixelInd = i / 4;
        pCPUTensor[pixelInd] = (float)pData[i];
        pCPUTensor[(height * width) + pixelInd] = (float)pData[i + 1];
        pCPUTensor[(height * width * 2) + pixelInd] = (float)pData[i + 2];
    }
}
ref: https://github.com/microsoft/Windows-Machine-Learning/blob/2179a1dd5af24dff4cc2ec0fc4232b9bd3722721/Samples/CustomTensorization/CustomTensorization/TensorConvertor.cpp#L59-L77
I just converted our .onnx model to float16 to check whether that would improve inference performance when the available hardware supports float16. However, the binding fails, and the suggestion here is to pass a TensorFloat16Bit instead.
So if I swap the TensorFloat for a TensorFloat16Bit, I get an access violation exception at pCPUTensor[(height * width * 2) + pixelInd] = (float)pData[i + 2]; because pCPUTensor is now half the size it was. It seems like I should be reinterpret_cast-ing to uint16_t** or something along those lines, so pCPUTensor would have the same number of elements as when it was a TensorFloat, but then I get further errors that it can only be uint8_t** or BYTE**.
Any ideas on how I can modify this code so I can get a custom TensorFloat16Bit?

Try the factory methods on TensorFloat16Bit.
However, you will need to convert your data to float16:
https://stackoverflow.com/a/60047308/11998382
Also, I might recommend that you instead do the conversion within the ONNX model.
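For example, something along these lines should work (an untested sketch; it assumes the CreateFromArray factory accepts regular float values and performs the float16 conversion internally, and "input" is a placeholder for your actual input name):
std::vector<int64_t> shape = { 1, channels, height, width };
std::vector<float> cpuTensor(1 * channels * height * width);
// ... fill cpuTensor with the BBBB...GGGG...RRRR... data exactly as in your loop ...
// CreateFromArray copies the float data and converts it to float16 for you.
TensorFloat16Bit tf16 = TensorFloat16Bit::CreateFromArray(shape, cpuTensor);
binding.Bind(L"input", tf16); // "input" stands in for your model's input name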

Related

How to convert CMSampleBufferRef/CIImage/UIImage into pixels e.g. uint8_t[]

I have input from a captured camera frame as a CMSampleBufferRef, and I need to get the raw pixels, preferably as a C-type uint8_t[].
I also need to find the color scheme of the input image.
I know how to convert a CMSampleBufferRef to a UIImage and then to NSData in PNG format, but I don't know how to get the raw pixels from there. Perhaps I could get them directly from the CMSampleBufferRef/CIImage?
This code shows the need and the missing bits.
Any thoughts where to start?
int convertCMSampleBufferToPixelArray (CMSampleBufferRef sampleBuffer)
{
    // inputs
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CIImage *ciImage = [CIImage imageWithCVPixelBuffer:imageBuffer];
    CIContext *imgContext = [CIContext new];
    CGImageRef cgImage = [imgContext createCGImage:ciImage fromRect:ciImage.extent];
    UIImage *uiImage = [UIImage imageWithCGImage:cgImage];
    NSData *nsData = UIImagePNGRepresentation(uiImage);

    // Need to fill this gap
    uint8_t* data = XXXXXXXXXXXXXXXX;
    ImageFormat format = XXXXXXXXXXXXXXXX; // one of: GRAY8, RGB_888, YV12, BGRA_8888, ARGB_8888

    // sample showing expected data values
    // this routine converts the image data to gray
    int width = uiImage.size.width;
    int height = uiImage.size.height;
    const int size = width * height;
    std::unique_ptr<uint8_t[]> new_data(new uint8_t[size]);
    for (int i = 0; i < size; ++i) {
        new_data[i] = uint8_t(data[i * 3] * 0.299f + data[i * 3 + 1] * 0.587f +
                              data[i * 3 + 2] * 0.114f + 0.5f);
    }
    return 1;
}
Here are some pointers you can use to search for more info. It's nicely documented and you shouldn't have an issue.
int convertCMSampleBufferToPixelArray (CMSampleBufferRef sampleBuffer) {
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    if (imageBuffer == NULL) {
        return -1;
    }

    // Lock and get the address of the image buffer
    CVPixelBufferLockBaseAddress(imageBuffer, 0);
    uint8_t* data = (uint8_t*)CVPixelBufferGetBaseAddress(imageBuffer);

    // Get size
    size_t width = CVPixelBufferGetWidth(imageBuffer);
    size_t height = CVPixelBufferGetHeight(imageBuffer);

    // Get bytes per row
    size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer);

    // At `data` you have bytesPerRow * height bytes of image data.
    // To get pixel info you can call CVPixelBufferGetPixelFormatType, ...
    // you can call CVImageBufferGetColorSpace and inspect it, ...

    // When you're done, unlock the base address
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
    return 0;
}
There are a couple of things you should be aware of.
The first is that the buffer can be planar. Check CVPixelBufferIsPlanar, CVPixelBufferGetPlaneCount, CVPixelBufferGetBytesPerRowOfPlane, etc.
The second is that you have to calculate the pixel size based on CVPixelBufferGetPixelFormatType. Something like:
OSType pixelFormat = CVPixelBufferGetPixelFormatType(imageBuffer);
size_t pixelSize;
switch (pixelFormat) {
    case kCVPixelFormatType_32BGRA:
    case kCVPixelFormatType_32ARGB:
    case kCVPixelFormatType_32ABGR:
    case kCVPixelFormatType_32RGBA:
        pixelSize = 4;
        break;
    // + other cases
}
Let's say that the buffer is not planar and:
CVPixelBufferGetWidth returns 200 (pixels)
Your pixelSize is 4 (the calculated bytes per row is 200 * 4 = 800)
CVPixelBufferGetBytesPerRow can return anything >= 800
In other words, the pointer you have does not point at a tightly packed buffer; rows may be padded. If you need the row data you have to do something like this:
uint8_t* data = (uint8_t*)CVPixelBufferGetBaseAddress(imageBuffer);

// Get size
size_t width = CVPixelBufferGetWidth(imageBuffer);
size_t height = CVPixelBufferGetHeight(imageBuffer);
size_t pixelSize = 4; // Let's pretend it's the calculated pixel size
size_t realRowSize = width * pixelSize;
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer);

for (int row = 0; row < height; row++) {
    // bytesPerRow acts like an offset where the next row starts;
    // bytesPerRow can be >= realRowSize
    uint8_t *rowData = data + row * bytesPerRow;
    // realRowSize = how many bytes are available for this row
    // copy them somewhere
}
You have to allocate a buffer and copy the row data into it if you'd like to have a contiguous, tightly packed buffer. How many bytes to allocate? CVPixelBufferGetDataSize.
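To make that concrete, a minimal sketch of the packed copy could look like this (it reuses the variables from the non-planar, 4-bytes-per-pixel example above; <vector> and <cstring> assumed included):
// Tightly packed destination: realRowSize bytes per row, no padding.
std::vector<uint8_t> packed(realRowSize * height);
for (size_t row = 0; row < height; row++) {
    memcpy(packed.data() + row * realRowSize,  // destination row (packed)
           data + row * bytesPerRow,           // source row (possibly padded)
           realRowSize);
}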

(C++)(Visual Studio) Change RGB to Grayscale

I am accessing the image like so:
pDoc = GetDocument();
int iBitPerPixel = pDoc->_bmp->bitsperpixel; // used to see if grayscale(8 bits) or RGB (24 bits)
int iWidth = pDoc->_bmp->width;
int iHeight = pDoc->_bmp->height;
BYTE *pImg = pDoc->_bmp->point; // pointer used to point at pixels in the image
int Wp = iWidth;
const int area = iWidth * iHeight;
int r; // red pixel value
int g; // green pixel value
int b; // blue pixel value
int gray; // gray pixel value
BYTE *pImgGS = pImg; // grayscale image pixel array
and attempting to change the rgb image to gray like so:
// convert RGB values to grayscale at each pixel, then put in grayscale array
for (int i = 0; i<iHeight; i++)
for (int j = 0; j<iWidth; j++)
{
r = pImg[i*iWidth * 3 + j * 3 + 2];
g = pImg[i*iWidth * 3 + j * 3 + 1];
b = pImg[i*Wp + j * 3];
r * 0.299;
g * 0.587;
b * 0.144;
gray = std::round(r + g + b);
pImgGS[i*Wp + j] = gray;
}
finally, this is how I try to draw the image:
//draw the picture as grayscale
for (int i = 0; i < iHeight; i++) {
for (int j = 0; j < iWidth; j++) {
// this should set every corresponding grayscale picture to the current picture as grayscale
pImg[i*Wp + j] = pImgGS[i*Wp + j];
}
}
}
Original image and the resulting image that I get: (images omitted).
First, check if the image type is 24 bits per pixel.
Second, allocate memory for pImgGS:
BYTE* pImgGS = (BYTE*)malloc(sizeof(BYTE) * iWidth * iHeight);
Please refer to this article to see how BMP data is stored. BMP images are stored upside down. Also, the first 54 bytes are header information (BITMAPFILEHEADER plus BITMAPINFOHEADER).
Hence you should access the values in the following way:
double r, g, b;
unsigned char gray;
for (int i = 0; i < iHeight; i++)
{
    for (int j = 0; j < iWidth; j++)
    {
        r = (double)pImg[(i*iWidth + j)*3 + 2];
        g = (double)pImg[(i*iWidth + j)*3 + 1];
        b = (double)pImg[(i*iWidth + j)*3 + 0];
        r = r * 0.299;
        g = g * 0.587;
        b = b * 0.114;   // 0.114 (not 0.144) is the standard BT.601 blue weight
        gray = floor((r + g + b + 0.5));
        pImgGS[(iHeight-i-1)*iWidth + j] = gray;
    }
}
If there is padding present, then first determine the pitch and access the data differently. Refer to this to understand pitch and padding.
double r, g, b;
unsigned char gray;
long pitch = ((iWidth * 3) + 3) & ~3;  // 24bpp BMP rows are padded to a 4-byte boundary
long index = 0;
for (int i = 0; i < iHeight; i++)
{
    for (int j = 0; j < iWidth; j++)
    {
        r = (double)pImg[index + j*3 + 2];
        g = (double)pImg[index + j*3 + 1];
        b = (double)pImg[index + j*3 + 0];
        r = r * 0.299;
        g = g * 0.587;
        b = b * 0.114;
        gray = floor((r + g + b + 0.5));
        pImgGS[(iHeight-i-1)*iWidth + j] = gray;
    }
    index = index + pitch;
}
While drawing the image,
since pImg is 24bpp, you need to copy the gray value into each of the R, G, B channels. If you ultimately want to save the grayscale image in BMP format, then you again have to write the BMP data upside down, or you can simply skip that flip while converting to gray here:
pImgGS[(iHeight-i-1)*iWidth + j] = gray;
tl;dr:
Make one common path. Convert everything to 32-bit in a well-defined manner, and do not use image dimensions or coordinates. Refactor the YCbCr conversion ( = grey value calculation) into a separate function; this is easier to read and runs at exactly the same speed.
The lengthy stuff
First, you seem to have been confused by strides and offsets. The artefact that you see is there because you accidentally wrote out one value (and in total only one third of the data) when you should have written three values.
One can easily get confused by this, but here it happened because you are doing work you did not need to do in the first place. You are iterating over coordinates left-to-right, top-to-bottom and painstakingly calculating the correct byte offset in the data for each location.
However, you're doing a full-screen effect, so what you really want is iterate over the complete image. Who cares about the width and height? You know the beginning of the data, and you know the length. One loop over the complete blob will do the same, only faster, with less obscure code, and fewer opportunities of getting something wrong.
Next, 24-bit bitmaps are common as files, but they are rather unusual for in-memory representation because the format is nasty to access and unsuitable for hardware. Drawing such a bitmap will require a lot of work from the driver or the graphics hardware (it will work, but it will not work well). Therefore, 32-bit depth is usually a much better, faster, and more comfortable choice. It is much more "natural" to access program-wise.
You can rather trivially convert 24-bit to 32-bit. Iterate over the complete bitmap data and write out a complete 32-bit word for each 3 byte-tuple read. Windows bitmaps ignore the A channel (the highest-order byte), so just leave it zero, or whatever.
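A minimal sketch of that 24-to-32-bit widening (names like srcData, dstData, and pixelCount are illustrative; it assumes the source rows are tightly packed BGR with no row padding):
const uint8_t* src = srcData;         // 24bpp source, B G R per pixel
uint32_t* dst = (uint32_t*)dstData;   // 32bpp destination
for (size_t px = 0; px < pixelCount; ++px, src += 3)
{
    // Pack B, G, R into one 32-bit word; leave the top (A) byte zero.
    *dst++ = (uint32_t)src[0] | ((uint32_t)src[1] << 8) | ((uint32_t)src[2] << 16);
}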
Also, there is no such thing as an 8-bit greyscale bitmap. This simply doesn't exist. Although there are bitmaps that look like greyscale bitmaps, they are in reality paletted 8-bit bitmaps where (incidentally) the bmiColors member contains all greyscale values.
Therefore, unless you can guarantee that you will only ever process images that you have created yourself, you cannot simply rely on, e.g., the values 5 and 73 corresponding to 5/255 and 73/255 greyscale intensity, respectively. That may be the case, but it is in general a wrong assumption.
In order to be on the safe side as far as correctness goes, you must convert your 8-bit greyscale bitmaps to real colors by looking up the indices (the bitmap's grey values are really indices) in the palette. Otherwise, you could be loading a greyscale image where the palette is the other way around (so 5 would mean 250 and 250 would mean 5), or a bitmap which isn't greyscale at all.
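A sketch of that palette lookup (assuming you have the bitmap's RGBQUAD palette from its BITMAPINFO at hand; srcIndices, dstData, and pixelCount are illustrative names):
// 8bpp paletted source: each byte is an index into the palette.
const RGBQUAD* palette = bmpInfo->bmiColors;   // up to 256 entries
uint32_t* dst = (uint32_t*)dstData;
for (size_t px = 0; px < pixelCount; ++px)
{
    RGBQUAD c = palette[srcIndices[px]];
    *dst++ = (uint32_t)c.rgbBlue | ((uint32_t)c.rgbGreen << 8) | ((uint32_t)c.rgbRed << 16);
}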
So... you want to convert 24-bit and you want to convert 8-bit bitmaps, both to 32-bit depth. That means you do all the annoying what-if stuff once at the beginning, and the rest is one identical common path. That's a good thing.
What you will be showing on-screen is always a 32-bit bitmap where the topmost byte is ignored, and the lower three are all the same value, resulting in what looks like a shade of grey. That's simple, and simple is good.
Note that if you do a BT.601-style YCbCr conversion (as indicated by your use of the constants 0.299, 0.587, and 0.114; the 0.144 in your code looks like a typo), and if your 8-bit greyscale images are perceptually weighted (this is something you must know; there is no way of telling from the file!), then for 100% correctness you need to do the inverse transformation when converting from paletted 8-bit to RGB. Otherwise, your final result will look almost right, but not quite. If your 8-bit greyscales are linear, i.e. were created without using the above constants (again, you must know; you cannot tell from the image), you need to copy everything as-is (here, doing the conversion would make it look almost-but-not-quite right).
About the RGB-to-greyscale conversion, you do not need an extra greyscale bitmap just to hold values that you never need again afterwards. You can read the three color values from the loaded bitmap, calculate Y, and directly build the 32-bit ARGB word, which you then write out to the final bitmap. This saves one entirely unnecessary round-trip through memory.
Something like this:
const uint8_t* in = (const uint8_t*) input_bitmap_data;
uint32_t* out = (uint32_t*) output_bitmap_data;
for (int i = 0; i < inputSize; i += 3)
{
    uint8_t Y = calc_greyscale(in[i], in[i + 1], in[i + 2]);
    *out++ = (Y << 16) | (Y << 8) | Y;
}
Alternatively, you can also do the from-whatever-to-32 conversion, and then do the to-greyscale conversion in-place there. This, in turn, introduces an extra round-trip to memory, but the code becomes much, much easier overall.
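For that in-place variant, a rough sketch over a buffer that has already been widened to 32-bit BGRX (bitmap_data_32 and pixelCount are illustrative names; calc_greyscale takes (B, G, R) as above) could be:
uint32_t* px32 = (uint32_t*) bitmap_data_32;
for (size_t px = 0; px < pixelCount; ++px)
{
    uint32_t v = px32[px];
    uint8_t Y = calc_greyscale( v        & 0xFF,   // B
                               (v >> 8)  & 0xFF,   // G
                               (v >> 16) & 0xFF);  // R
    px32[px] = (Y << 16) | (Y << 8) | Y;
}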

C++ - Convert uint8_t* image data to double** image data

I am working on a C++ function (inside my iOS app) where I have image data in the form uint8_t*.
I obtained the image data using the CVPixelBufferGetBaseAddress() function of the iOS SDK:
uint8_t *bPixels = (uint8_t *)CVPixelBufferGetBaseAddress(imageBuffer);
I have another function (from a third-party source) that performs some of the image-processing operations I would like to use on my image data, but the input for these functions is double**.
Does anyone have any idea how to go about converting this?
What other information can I provide?
The constructor prototype for the class that uses double** looks like:
Image(double **iPixels, unsigned int iWidth, unsigned int iHeight);
Your uint8_t *bPixels seems to hold the image data as a 1-dimensional contiguous array of height*width length. So to access the pixel in the x-th row and y-th column you have to write bPixels[x*width+y].
Image() seems to work on 2-dimensional arrays. To access the same pixel there you would have to write iPixels[x][y].
So you need to copy your existing 1-dimensional array into a 2-dimensional one:
double **mypixels = new double* [height];
for (int x = 0; x < height; x++)
{
    mypixels[x] = new double [width];
    for (int y = 0; y < width; y++)
        mypixels[x][y] = bPixels[x*width+y]; // attention here, maybe normalization is necessary
                                             // e.g. mypixels[x][y] = bPixels[x*width+y] / 255.0
}
Because your 1-dimensional array has pixels of type uint8_t and the 2-dimensional one pixels of type double, you must allocate new memory. If both had the same pixel type, the more elegant solution (a simple row-pointer map) would be:
uint8_t **mypixels = new uint8_t* [height];
for (int x = 0; x < height; x++)
    mypixels[x] = bPixels + x*width;
Attention: besides the possibly necessary normalization, there is also a question of index compatibility! My examples assume that the 1-dimensional array is stored row-by-row and that the functions working on the 2-dimensional array index it with [x][y] (that is, row first, then column). The declaration of Image(), however, could also mean that it expects its arrays to be indexed with [y][x].
I'm going to take a giant bunch of guesses here in hopes that this will lead you towards getting at the documentation and answering back. If there's no further documentation, well, here's a starting point.
Guess 1) The Image constructor requires a doubly dimensioned array where each component is an R,G,B,Alpha channel in that order. So iPixels[0] is the red data, iPixels[1] is the green data, etc.
Guess 2) Because it's not integer data, the values range from 0 to 1.
Guess 3) All of this must be pre-allocated.
Guess 4) Image data is row-major
Guess 5) Source data is BGRA
So with that in mind, starting with bPixels:
double *redData = new double[width*height];
double *greenData = new double[width*height];
double *blueData = new double[width*height];
double *alphaData = new double[width*height];
double **iPixels = new double*[4];
iPixels[0] = redData;
iPixels[1] = greenData;
iPixels[2] = blueData;
iPixels[3] = alphaData;
for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; x++)
    {
        int alpha = bPixels[(y*width + x)*4 + 3];
        int red   = bPixels[(y*width + x)*4 + 2];
        int green = bPixels[(y*width + x)*4 + 1];
        int blue  = bPixels[(y*width + x)*4];
        redData[y*width + x]   = red / 255.0;
        greenData[y*width + x] = green / 255.0;
        blueData[y*width + x]  = blue / 255.0;
        alphaData[y*width + x] = alpha / 255.0;
    }
}
Image newImage(iPixels, width, height);
Some of the things that can go wrong:
The source is not BGRA but RGBA, which will make the colors all wrong.
The source is not row-major, or the destination does not expect per-channel planes, which will make things look all screwed up and/or segfault.

array, copy pixels to correct index, algorithm

I have an image of size 2x2, so the pixel count is 4.
One pixel is 4 bytes,
so I have an array of 16 bytes: mas[16] (width * height * 4 = 16).
I want to make the same image but scaled up by a factor of 2, which means that instead of one pixel there will be four.
The new array will have a size of 64 bytes: newMas[64] (width*2 * height*2 * 4).
The problem is that I can't work out how to copy the pixels into newMas correctly for the larger image size.
This code copies the pixels into mas[16]:
size_t width = CGImageGetWidth(imgRef);
size_t height = CGImageGetHeight(imgRef);
const size_t bytesPerRow = width * 4;
const size_t bitmapByteCount = bytesPerRow * height;
UInt8 mas[bitmapByteCount];   // 16 bytes for a 2x2 image
UInt8* data = (UInt8*)CGBitmapContextGetData(bmContext);
for (size_t i = 0; i < bitmapByteCount; i += 4)
{
    UInt8 a = data[i];
    UInt8 r = data[i + 1];
    UInt8 g = data[i + 2];
    UInt8 b = data[i + 3];
    mas[i]   = a;
    mas[i+1] = r;
    mas[i+2] = g;
    mas[i+3] = b;
}
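What I'm essentially trying to get right is the index mapping into newMas, something like this nearest-neighbour doubling sketch (assuming 4 bytes per pixel, no row padding, and newMas sized at width*2 * height*2 * 4 bytes):
// For each source pixel (x, y), write its 4 bytes into the 2x2 block at (2x, 2y) in the new image.
const size_t newBytesPerRow = bytesPerRow * 2;
for (size_t y = 0; y < height; y++)
{
    for (size_t x = 0; x < width; x++)
    {
        const UInt8* srcPx = data + y * bytesPerRow + x * 4;
        for (size_t dy = 0; dy < 2; dy++)
            for (size_t dx = 0; dx < 2; dx++)
                memcpy(newMas + (y*2 + dy) * newBytesPerRow + (x*2 + dx) * 4, srcPx, 4);
    }
}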
In general, using the built-in image drawing API will be faster and less error-prone than writing your own image-manipulation code. There are at least three potential errors in the code above:
It assumes that there's no padding at the end of rows (iOS seems to pad up to a multiple of 16 bytes); you need to use CGImageGetBytesPerRow().
It assumes a fixed pixel format.
It gets the width/height from a CGImage but the data from a CGBitmapContext.
Assuming you have a UIImage,
CGRect r = {{0,0}, img.size};
r.size.width *= 2;
r.size.height *= 2;
UIGraphicsBeginImageContext(r.size);
// This turns off interpolation in order to do pixel-doubling.
CGContextSetInterpolationQuality(UIGraphicsGetCurrentContext(), kCGInterpolationNone);
[img drawInRect:r];
UIImage *bigImg = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();

Applying Matrix To Image, Seeking Performance Improvements

Edited: Working on Windows platform.
Problem: Less of a problem, more a request for advice. I'm currently not incredibly versed in low-level programming, but I am attempting to optimize the code below to increase the performance of my overall application, which depends on extremely high-speed image processing.
Current Performance: On my computer, this currently computes at about 4-6ms for a 512x512 image. I'm trying to cut that in half if possible.
Limitations: Due to this project's massive size, fundamental changes to the application are very difficult, so things such as porting to DirectX or other GPU methods aren't much of an option. The project currently works; I'm simply trying to figure out how to make it work faster.
Specific information about my use case: Images going into this method are always exactly square, with dimensions that are a multiple of 128 (most likely 512 x 512), and they always come out the same size. Other than that, there is not much else to it. The matrix is calculated elsewhere, so this is just the application of the matrix to my image. The original image and the new image are both used afterwards, so copying the image is necessary.
Here is my current implementation:
void ReprojectRectangle( double *mpProjMatrix, unsigned char *pDstScan0, unsigned char *pSrcScan0,
    int NewBitmapDataStride, int SrcBitmapDataStride, int YOffset, double InversedAspect, int RectX, int RectY, int RectW, int RectH)
{
    int i, j;
    double Xnorm, Ynorm;
    double Ynorm_X_ProjMatrix4, Ynorm_X_ProjMatrix5, Ynorm_X_ProjMatrix7;
    double SrcX, SrcY, T;
    int SrcXnt, SrcYnt;
    int SrcXec, SrcYec, SrcYnvDec;
    unsigned char *pNewPtr, *pSrcPtr1, *pSrcPtr2, *pSrcPtr3, *pSrcPtr4;
    int RectX2, RectY2;

    /* Compensate (or re-center) the Y-coordinate regarding the aspect ratio */
    RectY -= YOffset;

    /* Compute the second point of the rectangle for the loops */
    RectX2 = RectX + RectW;
    RectY2 = RectY + RectH;

    /* Clamp values (be careful with the aspect ratio) */
    if (RectY < 0) RectY = 0;
    if (RectY2 < 0) RectY2 = 0;
    if ((double)RectY > (InversedAspect * 512.0)) RectY = (int)(InversedAspect * 512.0);
    if ((double)RectY2 > (InversedAspect * 512.0)) RectY2 = (int)(InversedAspect * 512.0);

    /* Iterate through each pixel of the scaled re-Proj */
    for (i=RectY; i<RectY2; i++)
    {
        /* Normalize Y-coordinate and take the ratio into account */
        Ynorm = InversedAspect - (double)i / 512.0;

        /* Pre-compute some matrix coefficients */
        Ynorm_X_ProjMatrix4 = Ynorm * mpProjMatrix[4] + mpProjMatrix[12];
        Ynorm_X_ProjMatrix5 = Ynorm * mpProjMatrix[5] + mpProjMatrix[13];
        Ynorm_X_ProjMatrix7 = Ynorm * mpProjMatrix[7] + mpProjMatrix[15];

        for (j=RectX; j<RectX2; j++)
        {
            /* Get a pointer to the pixel on (i,j) */
            pNewPtr = pDstScan0 + ((i+YOffset) * NewBitmapDataStride) + j;

            /* Normalize X-coordinate */
            Xnorm = (double)j / 512.0;

            /* Compute the corresponding coordinates in the source image, before Proj, and normalize source coordinates */
            T = (Xnorm * mpProjMatrix[3] + Ynorm_X_ProjMatrix7);
            SrcY = (Xnorm * mpProjMatrix[0] + Ynorm_X_ProjMatrix4)/T;
            SrcX = (Xnorm * mpProjMatrix[1] + Ynorm_X_ProjMatrix5)/T;

            // Compute the integer and fractional parts of the coordinates in the source image
            SrcXnt = (int) SrcX;
            SrcYnt = (int) SrcY;
            SrcXec = 64 - (int) ((SrcX - (double) SrcXnt) * 64);
            SrcYec = 64 - (int) ((SrcY - (double) SrcYnt) * 64);

            // Get the values of the four neighbouring pixels
            pSrcPtr1 = pSrcScan0 + (SrcXnt * SrcBitmapDataStride) + SrcYnt;
            pSrcPtr2 = pSrcPtr1 + 1;
            pSrcPtr3 = pSrcScan0 + ((SrcXnt+1) * SrcBitmapDataStride) + SrcYnt;
            pSrcPtr4 = pSrcPtr3 + 1;
            SrcYnvDec = (64-SrcYec);

            (*pNewPtr) = (unsigned char)(((SrcYec * (*pSrcPtr1) + SrcYnvDec * (*pSrcPtr2)) * SrcXec +
                                          (SrcYec * (*pSrcPtr3) + SrcYnvDec * (*pSrcPtr4)) * (64 - SrcXec)) >> 12);
        }
    }
}
Two things that could help: multiprocessing and SIMD. With multiprocessing you could break the output image into tiles and have each processor work on the next available tile. With SIMD instructions (SSE, AVX, AltiVec, etc.) you can calculate multiple things at the same time, such as applying the same matrix math to multiple coordinates at once. You can even combine the two: use multiple processors, each running SIMD instructions, to do as much work as possible. You didn't mention which platform you're working on.
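As a rough illustration of the multiprocessing idea (a sketch only; it assumes ReprojectRectangle is safe to call concurrently on disjoint output bands, which appears plausible since each call only writes its own rectangle):
#include <thread>
#include <vector>

void ReprojectParallel(double *m, unsigned char *dst, unsigned char *src,
                       int dstStride, int srcStride, int yOffset, double invAspect,
                       int width, int height, int numThreads)
{
    std::vector<std::thread> workers;
    int rowsPerTile = height / numThreads;
    for (int t = 0; t < numThreads; ++t)
    {
        int y0 = t * rowsPerTile;
        int h  = (t == numThreads - 1) ? (height - y0) : rowsPerTile;
        // Each worker reprojects one horizontal band of the output image.
        workers.emplace_back(ReprojectRectangle, m, dst, src,
                             dstStride, srcStride, yOffset, invAspect,
                             0, y0, width, h);
    }
    for (auto &w : workers) w.join();
}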