Kinect for Windows v2 depth to color image misalignment - c++

Currently I am developing a tool for the Kinect for Windows v2 (similar to the sensor in the Xbox One). I tried to follow some examples and have a working sample that shows the camera image, the depth image, and an image that maps the depth to the RGB image using OpenCV. But I see that my hand is duplicated when doing the mapping, and I think it is due to something wrong in the coordinate mapper part.
Here is an example of it:
And here is the code snippet that creates the image (the rgbd image in the example):
void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    // Map every depth pixel to a coordinate in the color image
    HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    for (int i = 0; i < cDepthHeight; i++){
        for (int j = 0; j < cDepthWidth; j++){
            // Keep only depth values inside the [min_z, max_z] percentage band
            if (depth_im.at<UINT16>(i, j) > 0 && depth_im.at<UINT16>(i, j) < maxVal * (max_z / 100) && depth_im.at<UINT16>(i, j) > maxVal * min_z / 100){
                ColorSpacePoint colorPoint = m_pColorCoordinates[i * cDepthWidth + j];
                int colorX = (int)(floor(colorPoint.X + 0.5));
                int colorY = (int)(floor(colorPoint.Y + 0.5));
                if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
                {
                    rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
                }
            }
        }
    }
}
Does anyone have a clue how to solve this? How can I prevent this duplication?
Thanks in advance.
UPDATE:
If I do a simple depth image thresholding I obtain the following image:
This is more or less what I expected to happen, without a duplicate hand in the background. Is there a way to prevent this duplicate hand from appearing?

I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that CoordinateMapper is lying.
A few notes:
Include the BodyIndexFrame source in your frame reader
Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY
Here is my approach when a frame arrives (it's in C#):
depthFrame.CopyFrameDataToArray(_depthData);
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
bodyIndexFrame.CopyFrameDataToArray(_bodyData);
_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);
Array.Clear(_displayPixels, 0, _displayPixels.Length);
for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
    DepthSpacePoint depthPoint = _depthPoints[colorIndex];
    if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
    {
        int depthX = (int)(depthPoint.X + 0.5f);
        int depthY = (int)(depthPoint.Y + 0.5f);
        if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
        {
            int depthIndex = (depthY * _depthWidth) + depthX;
            byte player = _bodyData[depthIndex];
            // Identify whether the point belongs to a player (0xff means "no player")
            if (player != 0xff)
            {
                int sourceIndex = colorIndex * BYTES_PER_PIXEL;
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // B
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // G
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // R
                _displayPixels[sourceIndex] = 0xff;                      // A
            }
        }
    }
}
Here is the initialization of the arrays:
BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];
Notice that the _depthPoints array has a 1920x1080 size.
Once again, the most important thing is to use the BodyIndexFrame source.
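Since the question's code is in C++, here is a rough, untested C++ sketch of the same approach (MapColorFrameToDepthSpace plus the body index data), mirroring the C# above. The names create_foreground, bodyIndex_im and m_pDepthSpacePoints are my own assumptions following the question's naming, and std::isinf needs <cmath>:
// Rough sketch (untested): build an HD "foreground only" image.
// Assumes m_pDepthSpacePoints is a DepthSpacePoint[cColorWidth * cColorHeight],
// bodyIndex_im is the CV_8UC1 body index frame, and rgb_im is the CV_8UC3 color image.
void KinectViewer::create_foreground(cv::Mat& depth_im, cv::Mat& bodyIndex_im,
                                     cv::Mat& rgb_im, cv::Mat& foreground_im)
{
    HRESULT hr = m_pCoordinateMapper->MapColorFrameToDepthSpace(
        cDepthWidth * cDepthHeight, (UINT16*)depth_im.data,
        cColorWidth * cColorHeight, m_pDepthSpacePoints);
    if (FAILED(hr)) return;

    foreground_im = cv::Mat::zeros(cColorHeight, cColorWidth, CV_8UC3);
    for (int i = 0; i < cColorHeight; i++) {
        for (int j = 0; j < cColorWidth; j++) {
            DepthSpacePoint dp = m_pDepthSpacePoints[i * cColorWidth + j];
            if (std::isinf(dp.X) || std::isinf(dp.Y)) continue; // no depth for this color pixel
            int depthX = (int)(dp.X + 0.5f);
            int depthY = (int)(dp.Y + 0.5f);
            if (depthX < 0 || depthX >= cDepthWidth || depthY < 0 || depthY >= cDepthHeight) continue;
            // 0xff in the body index frame means "no player"
            if (bodyIndex_im.at<uchar>(depthY, depthX) != 0xff)
                foreground_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(i, j);
        }
    }
}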

Finally I got some time to write the long-awaited answer.
Let's start with some theory to understand what is really happening, and then a possible answer.
We should start by knowing how to go from a 3D point cloud, which has the depth camera as its coordinate-system origin, to an image in the image plane of the RGB camera. To do that, it is enough to use the pinhole camera model:
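(The original equation image is missing here; for reference, the standard pinhole projection the text describes is:)

$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R \mid t \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$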
Here, u and v are the coordinates in the image plane of the RGB camera. The first matrix on the right side of the equation is the camera matrix, a.k.a. the intrinsics of the RGB camera. The following matrix contains the rotation and translation of the extrinsics, or better said, the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.
Basically, something like this is what the Kinect SDK does. So, what could go wrong that makes the hand get duplicated? Well, actually, more than one point projects to the same pixel...
To put it in other words, in the context of the problem in the question:
The depth image is a representation of an ordered point cloud, and you are querying the u,v values of each of its pixels, which in reality can easily be converted to 3D points. The SDK gives you the projection, but several points can end up on the same pixel (usually, the larger the distance along the z axis between two neighboring points, the more easily this problem appears).
Now, the big question: how can you avoid this? Well, I am not sure it can be done using only the Kinect SDK, since you do not know the Z value of the points AFTER the extrinsics are applied, so it is not possible to use a technique like Z-buffering directly. However, you may assume the Z value will stay quite similar and use the values from the original point cloud (at your own risk).
If you were doing it manually, and not with the SDK, you could apply the extrinsics to the points and then project them into the image plane, marking in another matrix which point is mapped to which pixel; if a point has already been mapped there, compare the z values and always keep the point closest to the camera. Then you will have a valid mapping without any problems. This is a rather naive way; you can probably find better ones, since the problem is now clear :)
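Here is a rough, untested sketch of that z-buffer idea applied to the question's create_rgbd function, using the SDK mapping and the original depth values as the approximate Z (the min_z/max_z thresholding is omitted for brevity; the function name and the zbuffer/owner buffers are my own additions):
// Sketch: resolve collisions in the depth->color mapping with a z-buffer over
// the color image, keeping only the depth pixel closest to the camera when two
// depth pixels land on the same color pixel. Uses the raw depth as approximate Z.
void KinectViewer::create_rgbd_zbuffered(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im)
{
    m_pCoordinateMapper->MapDepthFrameToColorSpace(
        cDepthWidth * cDepthHeight, (UINT16*)depth_im.data,
        cDepthWidth * cDepthHeight, m_pColorCoordinates);

    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    // One z-buffer entry per *color* pixel: the closest depth seen so far.
    cv::Mat zbuffer(cColorHeight, cColorWidth, CV_16UC1, cv::Scalar(0xFFFF));
    // Which depth pixel currently "owns" each color pixel, so the farther one
    // can be erased when a closer point arrives.
    cv::Mat owner(cColorHeight, cColorWidth, CV_32SC1, cv::Scalar(-1));

    for (int i = 0; i < cDepthHeight; i++) {
        for (int j = 0; j < cDepthWidth; j++) {
            UINT16 z = depth_im.at<UINT16>(i, j);
            if (z == 0) continue;
            ColorSpacePoint cp = m_pColorCoordinates[i * cDepthWidth + j];
            int colorX = (int)(cp.X + 0.5f);
            int colorY = (int)(cp.Y + 0.5f);
            if (colorX < 0 || colorX >= cColorWidth || colorY < 0 || colorY >= cColorHeight) continue;

            if (z < zbuffer.at<UINT16>(colorY, colorX)) {
                // This depth pixel is closer: it wins the color pixel.
                int prev = owner.at<int>(colorY, colorX);
                if (prev >= 0) // erase the farther depth pixel that used this color
                    rgbd_im.at<cv::Vec3b>(prev / cDepthWidth, prev % cDepthWidth) = cv::Vec3b(0, 0, 0);
                zbuffer.at<UINT16>(colorY, colorX) = z;
                owner.at<int>(colorY, colorX) = i * cDepthWidth + j;
                rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
            }
            // else: a closer depth pixel already claimed this color; leave this
            // rgbd pixel black instead of copying a color that belongs to it.
        }
    }
}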
I hope it is clear enough.
P.S.:
I do not have a Kinect 2 at the moment, so I can't check whether there has been an update related to this issue or whether the same thing still happens. I used the first released version (not the pre-release) of the SDK, so a lot may have changed since then. If someone knows whether this was solved, just leave a comment :)

Related

Parallel Bundle Adjustment (PBA)

I'm trying to perform Bundle Adjustment (BA) on a sequence of stereo images (class Step) taken with the same camera.
Each Step has left & right images (rectified and synchronized), the generated depth map, keypoints + descriptors of the left image, and two 4x4 matrices: one for local (image plane) to global (3D world), and its inverse (T_L2G and T_G2L respectively).
The steps are registered with respect to the 1st image.
I'm trying to run BA on the result to refine the transformation and I'm trying to use PBA (https://grail.cs.washington.edu/projects/mcba/)
Code for setting up the cameras:
for (int i = 0; i < steps.size(); i++)
{
    Step& step = steps[i];
    cv::Mat& T_G2L = step.T_G2L;
    cv::Mat R;
    cv::Mat t;
    T_G2L(cv::Rect(0, 0, 3, 3)).copyTo(R);
    T_G2L(cv::Rect(3, 0, 1, 3)).copyTo(t);
    CameraT camera;
    // Camera Parameters
    camera.SetFocalLength((double)m_focalLength); // Same camera, global focal length
    camera.SetTranslation((float*)t.data);
    camera.SetMatrixRotation((float*)R.data);
    if (i == 0)
    {
        camera.SetConstantCamera();
    }
    camera_data.push_back(camera);
}
Then I generate global keypoints by running over all image pairs and matching them (currently using SURF).
Then I generate the BA point data:
for (size_t i = 0; i < globalKps.size(); i++)
{
    cv::Point3d& globalPoint = globalKps[i].AbsolutePoint;
    cv::Point3f globalPointF((float)globalPoint.x, (float)globalPoint.y, (float)globalPoint.z);
    int num_obs = 0;
    std::vector<std::pair<int/*stepID*/, int/*KP_ID*/>>& localKps = globalKps[i].LocalKeypoints;
    if (localKps.size() >= 2)
    {
        Point3D pointData;
        pointData.SetPoint((float*)&globalPointF);
        // For this point, set all the measurements
        for (size_t j = 0; j < localKps.size(); j++)
        {
            int& stepID = localKps[j].first;
            int& kpID = localKps[j].second;
            int cameraID = stepsLUT[stepID];
            Step& step = steps[cameraID];
            cv::Point3d p3d = step.KeypointToLocal(kpID);
            Point2D measurement = Point2D(p3d.x, p3d.y);
            measurements.push_back(measurement);
            camidx.push_back(cameraID);
            ptidx.push_back((int)point_data.size());
        }
        point_data.push_back(pointData);
    }
}
Then I run BA:
ParallelBA pba(ParallelBA::PBA_CPU_FLOAT);
pba.SetFixedIntrinsics(true); // Same camera with known intrinsics
pba.SetCameraData(camera_data.size(), &camera_data[0]); //set camera parameters
pba.SetPointData(point_data.size(), &point_data[0]); //set 3D point data
pba.SetProjection(measurements.size(), &measurements[0], &ptidx[0], &camidx[0]);//set the projections
pba.SetNextBundleMode(ParallelBA::BUNDLE_ONLY_MOTION);
pba.RunBundleAdjustment(); //run bundle adjustment, and camera_data/point_data will be modified
Then comes the part where I'm facing the problem: extracting the data back from PBA:
for (int i = 1 /*First camera is stationary*/; i < camera_data.size(); i++)
{
    Step& step = steps[i];
    CameraT& camera = camera_data[i];
    int type = CV_32F;
    cv::Mat t(3, 1, type);
    cv::Mat R(3, 3, type);
    cv::Mat T_L2G = cv::Mat::eye(4, 4, type);
    cv::Mat T_G2L = cv::Mat::eye(4, 4, type);
    camera.GetTranslation((float*)t.data);
    camera.GetMatrixRotation((float*)R.data);
    t.copyTo(T_G2L(TranslationRect));
    R.copyTo(T_G2L(RotationRect));
    cv::invert(T_G2L, T_L2G);
    step.SetTransformation(T_L2G); // Step expects local 2 global transformation
}
Everything runs the way I expect it to. PBA reports a relatively small initial error (I'm currently testing with a small number of pair-wise registered images, so the error shouldn't be too large), and after the run it reports a smaller one (it converges quickly, usually in fewer than 3 iterations).
However, when I dump the keypoints using the newly found transformations, the clouds seem to have moved further apart from each other.
(I've also tried switching between T_G2L & T_L2G to "bring them closer". It doesn't work.)
I'm wondering if there's something I'm missing in using it.
the clouds seem to have moved further apart from each other
This appears not to be a PBA-specific problem, but a general bundle adjustment problem.
When performing bundle adjustment, you need to constrain the cloud: at least 7 constraints for the 7 degrees of freedom. If not, your cloud will drift along the 3 axes, in the 3 rotations, and in scale.
In local BA, border points are kept fixed. In full BA there are usually designated points, such as the origin plus an extra pair, which fix the scale and orientation.
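For example, with the setup from the question, one pragmatic (if slightly over-constraining) option is to fix a second camera as well, so that the scale can no longer drift. This is my own suggestion, not a PBA-documented recipe:
// Sketch: removing the 7-dof gauge freedom before running PBA.
// Fixing camera 0 pins the 3 translations + 3 rotations; also fixing camera 1
// pins the remaining scale (at the cost of trusting its initial pose).
camera_data[0].SetConstantCamera();
camera_data[1].SetConstantCamera(); // assumption: the first pair's relative pose is reliable
pba.SetCameraData(camera_data.size(), &camera_data[0]);
Alternatively, leave only the first camera fixed and, after the run, rescale the whole solution so that the camera-0 to camera-1 baseline keeps its pre-BA length.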

Find rectangular object quality with perspective

I get an image from a camera (calibrated and without lens distortions) and I need to detect a rectangular object. Markers are a good example. For markers I check corner count, minimum size, board contrast and convexity. I had an idea on how to improve this in cases where there is a large number of false rectangles.
Here is an example image:
Normally all of these are valid, because without knowing anything about the camera we cannot determine whether perspective allows these kinds of shapes. I know the size (or at least the ratio) of the rectangle in real life. So I had an idea that I should be able to disregard many of these shapes just by reprojecting them and checking for error.
For example, if I use solvePnPRansac, it should not be able to converge if the shape is not possible; if it doesn't converge, I just disregard that shape. Sadly, none of the OpenCV solve functions let me check for error or convergence. I actually need some ratio or quality measure, because it is possible that some of the rectangles overlap. For example, my object finder identifies these rectangles:
One of the three is actually correct, or at least "the best". But I need some way to know which one it is. I cannot use things like line lengths because of the camera perspective. So I just thought I could solve and see which has the smallest error.
There are no lens distortions in the image, but even if there were, solvePnP allows passing the distortion coefficients D as well.
Is this even possible or am I missing something?
I guess I could try hacking around solvePnPRansac just to return convergence, but maybe there is a simpler way?
I figured I could do something like what is done during calibration with a grid: calculate the reprojection error. So first I solve to get the transformation matrix, then I transform the object points into 3D using that matrix, and afterwards use projectPoints to project them back into 2D. Then I check the distance between the original 2D points and the projected 2D points, and use that as a quality measure. Objects that are not possible often have a reprojection error of 100 pixels or more in my images, while possible objects have less than 20 px, so I used a 25-pixel cutoff and it seems to work fine.
Note that more transformations are possible than I thought. In my original image maybe two are not possible with my current camera, but it still rejected a lot of fakes.
If nobody else has some ideas I will accept this as answer.
Here is some code for the method I use:
//This is the object in 3D
double width = 50.0;  //Object is 50mm wide
double height = 30.0; //Object is 30mm tall
cv::Mat object_points(4, 3, CV_64FC1);
object_points.at<double>(0,0) = 0;
object_points.at<double>(0,1) = 0;
object_points.at<double>(0,2) = 0;
object_points.at<double>(1,0) = width;
object_points.at<double>(1,1) = 0;
object_points.at<double>(1,2) = 0;
object_points.at<double>(2,0) = width;
object_points.at<double>(2,1) = height;
object_points.at<double>(2,2) = 0;
object_points.at<double>(3,0) = 0;
object_points.at<double>(3,1) = height;
object_points.at<double>(3,2) = 0;

//Check all rectangles for error
cv::Mat image_points(4, 2, CV_64FC1);
for (size_t i = 0; i < rectangles_to_test.size(); i++) {
    // Get rectangle points
    for (size_t c = 0; c < 4; ++c) {
        image_points.at<double>(c,0) = (rectangles_to_test[i].points[c].x);
        image_points.at<double>(c,1) = (rectangles_to_test[i].points[c].y);
    }
    // Calculate transformation matrix
    cv::Mat rvec, tvec;
    cv::solvePnP(object_points, image_points, M1, D1, rvec, tvec);
    cv::Mat rotation;
    Matrix4<double> transform;
    transform.init_identity();
    cv::Rodrigues(rvec, rotation);
    for (size_t row = 0; row < 3; ++row) {
        for (size_t col = 0; col < 3; ++col) {
            transform.set(row, col, rotation.at<double>(row, col));
        }
        transform.set(row, 3, tvec.at<double>(row, 0));
    }
    // Calculate projection
    std::vector<cv::Point3f> p3(4);
    std::vector<cv::Point2f> p2;
    Vector4<double> p = transform * Vector4<double>(0, 0, 0, 1);
    p3[0] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(width, 0, 0, 1);
    p3[1] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(width, height, 0, 1);
    p3[2] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(0, height, 0, 1);
    p3[3] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    cv::projectPoints(p3, cv::Mat::zeros(1, 3, CV_64FC1), cv::Mat::zeros(1, 3, CV_64FC1), M1, D1, p2);
    // Calculate reprojection error
    rectangles_to_test[i].reprojection_error = 0.0;
    for (size_t c = 0; c < 4; ++c) {
        double dx = p2[c].x - rectangles_to_test[i].points[c].x;
        double dy = p2[c].y - rectangles_to_test[i].points[c].y;
        rectangles_to_test[i].reprojection_error += std::sqrt(dx*dx + dy*dy);
    }
    if (rectangles_to_test[i].reprojection_error > reprojection_error_threshold) {
        //rectangle is no good
    }
}

OpenCV 3.1 Stitch images in order they were taken

I am building an Android app to create panoramas. The user captures a set of images and those images
are sent to my native stitch function that was based on https://github.com/opencv/opencv/blob/master/samples/cpp/stitching_detailed.cpp.
Since the images are in order, I would like to match each image only to the next image in the vector.
I found an Intel article that was doing just that with the following code:
vector<MatchesInfo> pairwise_matches;
BestOf2NearestMatcher matcher(try_gpu, match_conf);
Mat matchMask(features.size(), features.size(), CV_8U, Scalar(0));
for (int i = 0; i < num_images - 1; ++i)
{
    matchMask.at<char>(i, i + 1) = 1;
}
matcher(features, pairwise_matches, matchMask);
matcher.collectGarbage();
The problem is, this won't compile. I'm guessing it's because I'm using OpenCV 3.1.
Then I found somewhere that this code would do the same:
int range_width = 2;
BestOf2NearestRangeMatcher matcher(range_width, try_cuda, match_conf);
matcher(features, pairwise_matches);
matcher.collectGarbage();
And for most of my samples this works fine. However, sometimes, especially when I'm stitching a large set of images (around 15), some objects appear on top of each other and in places they shouldn't.
I've also noticed that the "beginning" (left side) of the end result is not the first image in the vector either, which is strange.
I am using "orb" as features_type and "ray" as ba_cost_func. It seems I can't use SURF on OpenCV 3.1.
The rest of my initial parameters look like this:
bool try_cuda = false;
double compose_megapix = -1; //keeps resolution for final panorama
float match_conf = 0.3f; //0.3 default for orb
string ba_refine_mask = "xxxxx";
bool do_wave_correct = true;
WaveCorrectKind wave_correct = detail::WAVE_CORRECT_HORIZ;
int blend_type = Blender::MULTI_BAND;
float blend_strength = 5;
double work_megapix = 0.6;
double seam_megapix = 0.08;
float conf_thresh = 0.5f;
int expos_comp_type = ExposureCompensator::GAIN_BLOCKS;
string seam_find_type = "dp_colorgrad";
string warp_type = "spherical";
So could anyone enlighten me as to why this is not working and how I should match my features? Any help or direction would be much appreciated!
TL;DR: I want to stitch the images in the order they were taken, but the code above is not working for me. How can I do that?
So I found out that the issue here is not the order in which the images are stitched, but rather the rotation that is estimated for the camera parameters in the homography-based estimator and the bundle ray adjuster.
Those rotation angles are estimated assuming a purely self-rotating camera, while my use case involves a user rotating the camera (which means there will be some translation too).
Because of that (I guess), the horizontal angles (around the Y axis) are highly overestimated, which means the algorithm considers that the set of images covers >= 360 degrees, resulting in some overlapped areas that shouldn't be overlapped.
I still haven't found a solution for that problem, though.
matcher() takes a UMat as the mask instead of a Mat object, so try the following code:
vector<MatchesInfo> pairwise_matches;
BestOf2NearestMatcher matcher(try_gpu, match_conf);
Mat matchMask(features.size(), features.size(), CV_8U, Scalar(0));
for (int i = 0; i < num_images - 1; ++i)
{
    matchMask.at<char>(i, i + 1) = 1;
}
UMat umask = matchMask.getUMat(ACCESS_READ);
matcher(features, pairwise_matches, umask);
matcher.collectGarbage();

Isometric Collision - 'Diamond' shape detection

My project uses an isometric perspective; for the time being I am showing the coordinates in grid format above the tiles for debugging. However, when it comes to collision/grid-locking of the player, I have an issue.
Due to the nature of sprite drawing, my maths is creating some issues with the 'triangular' empty corner areas of the textures. I think the issue is something like below (blue is the way I think my tiles are being detected, whereas red is how they ideally should be detected for accurate roaming movement on the tiles):
As you can see, the boolean check for the tile I am standing on (which takes the pixel central to the player's feet; the player will later be a car and take a pixel based on the direction of movement) is returning false and denying movement in several scenarios, as well as letting the player move in some places that shouldn't be allowed.
I think it's because the cut-off areas of each texture are being considered part of the grid area, so when the player is in one of these corner areas it is not truly checking the correct tile, and so returns the wrong results.
The code I'm using for creating the grid is this:
int VisualComponent::TileConversion(Tile* tileToConvert, bool xOrY)
{
    int X = (tileToConvert->x - tileToConvert->y) * 64; //change 64 to TILE_WIDTH_HALF
    int Y = (tileToConvert->x + tileToConvert->y) * 25;
    /*int X = (tileToConvert->x * 128 / 2) + (tileToConvert->y * 128 / 2) + 100;
    int Y = (tileToConvert->y * 50 / 2) - (tileToConvert->x * 50 / 2) + 100;*/
    if (xOrY)
    {
        return X;
    }
    else
    {
        return Y;
    }
}
and the code for checking the player's movement is:
//check if the movement will end on a legitimate road tile
//UNOPTIMISED AS RUNS EVERY FRAME FOR EVERY TILE
bool Clsentity::CheckMovementTile(int xpos, int ypos, ClsMapData* mapData)
{
    int x = xpos + 7;  //get the center-bottom pixel as this is more suitable than the first on an iso grid (more realistic 'foot' placement)
    int y = ypos + 45;
    int mapX = (x / 64 + y / 25) / 2; //64 is TILE-WIDTH HALF and 25 is TILE HEIGHT
    int mapY = (y / 25 - (x / 64)) / 2;
    for (int i = 0; i < mapData->tilesList.size(); i++) //for each tile of the map
    {
        if (mapData->tilesList[i]->x == mapX && mapData->tilesList[i]->y == mapY) //if there is an existing tile that will be entered
        {
            if (mapData->tilesList[i]->movementTile)
            {
                HAPI->DebugText(std::to_string(mapX) + " is the x and the y is " + std::to_string(mapY));
                return true;
            }
        }
    }
    return false;
}
I'm a little stuck on making progress until this is fixed in the game-loop side of things. If anyone recognizes the issue from this, or might be able to help, it would be great and I would appreciate it. For reference, my tile textures are 128x64 pixels and the math behind drawing them to screen treats them as 128x50 (so they link together cleanly).
Rather than writing specific routines for rendering and click mapping, seriously consider thinking of these as two views on the data, which can be transformed in terms of matrix transformations of a coordinate space. You can have two coordinate spaces - one is a nice rectangular grid that you use for positioning and logic. The other is the isometric view that you use for display and input.
If you're not familiar with linear algebra, it'll take a little bit to wrap your head around it, but once you do, it makes everything trivial.
So, how does that work? Your isometric view is merely a rotation of a bog standard grid view, right? Well, close. Isometric view also changes the dimensions if you're starting with a square grid. Anyhow: can we just do a simple coordinate transformation?
Logical coordinate system -> display system (e.g. for rendering)
Texture point => Rotate 45 degrees => Scale by sqrt(2) because a 45 degree rotation changes the dimension of the block by sqrt(1 * 1 + 1 * 1)
Display system -> logical coordinate system (e.g. for mapping clicks into logical space)
Click point => descale by sqrt(2) to unsquish => unrotate by 45 degrees
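Concretely, for the tile metrics in the question (half-width 64, half-height 25), the two transforms boil down to a few lines. A minimal sketch with made-up function names, assuming the map origin sits at screen (0, 0):
#include <cmath>

const int TILE_WIDTH_HALF  = 64;
const int TILE_HEIGHT_HALF = 25;

// logical tile (tileX, tileY) -> screen position of that tile's origin
void tileToScreen(int tileX, int tileY, int& screenX, int& screenY)
{
    screenX = (tileX - tileY) * TILE_WIDTH_HALF;
    screenY = (tileX + tileY) * TILE_HEIGHT_HALF;
}

// screen position -> logical tile; the exact inverse of the transform above
void screenToTile(float screenX, float screenY, int& tileX, int& tileY)
{
    float fx = screenX / TILE_WIDTH_HALF;   // = tileX - tileY
    float fy = screenY / TILE_HEIGHT_HALF;  // = tileX + tileY
    tileX = (int)std::floor((fx + fy) / 2.0f);
    tileY = (int)std::floor((fy - fx) / 2.0f);
}
Note that the question's CheckMovementTile does the integer divisions x / 64 and y / 25 separately before combining them; that truncation is likely what turns the diamond-shaped cells into rectangles and produces the corner misdetections, so keep the division in floating point (or at least combine before dividing) as above. Depending on where your tile anchor point and map origin sit, you may also need a small constant offset.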
Why?
If you can do coordinate transformations, then you'd be dealing with a pretty bog-standard rectangular grid for everything else you write, which will make any other logic you write MUCH simpler. Your calculations there won't involve computing angles or slopes. E.g. now your "can I move 'down'" logic is much simpler.
Let's say you have 64 x 64 tiles, for simplicity. Now transforming a screen space click to a logical tile is simply:
(int, int) whichTile(clickX, clickY) {
    logicalX, logicalY = transform(clickX, clickY)
    return (logicalX / 64, logicalY / 64)
}
You can do checks like seeing whether x0,y0 and x1,y1 are on the same tile, in the logical space, by something as simple as:
bool isSameTile(x0, y0, x1, y1) {
    return floor(x0/64) == floor(x1/64) && floor(y0/64) == floor(y1/64)
}
Everything gets much simpler once you define the transforms and work in the logical space.
http://en.wikipedia.org/wiki/Rotation_matrix
http://en.wikipedia.org/wiki/Scaling_%28geometry%29#Matrix_representation
http://www.alcove-games.com/advanced-tutorials/isometric-tile-picking/
If you don't want to deal with some matrix library, you can do the equivalent math pretty straightforwardly, but if you separate concerns of logic management from display / input through these transformations, I suspect you'll have a much easier time of it.

C++/SDL: Fading out a surface already having per-pixel alpha information

Suppose we have a 32-bit PNG file of some ghostly/incorporeal character, which is drawn in a semi-transparent fashion. It is not equally transparent in every place, so we need the per-pixel alpha information when loading it to a surface.
For fading in/out, setting the alpha value of an entire surface is a good way; but not in this case, as the surface already has the per-pixel information and SDL doesn't combine the two.
What would be an efficient workaround (instead of asking the artist to provide some awesome fade in/out animation for the character)?
I think the easiest way for you to achieve the result you want is to start by loading the source surface containing your character sprites and then, for every instance of your ghost, create a working copy of the surface. Then, every time the alpha value of an instance changes, SDL_BlitSurface (doc) your source into your working copy and apply your transparency (which you should probably keep as a float between 0 and 1) to every pixel's alpha channel.
In the case of a 32-bit surface, assuming that you initially loaded a source surface and allocated a working SDL_Surface, you can probably do something along the lines of:
SDL_BlitSurface(source, NULL, working, NULL);
if (SDL_MUSTLOCK(working))
{
    if (SDL_LockSurface(working) < 0)
    {
        return -1;
    }
}
Uint8 * pixels = (Uint8 *)working->pixels;
int pitch_padding = (working->pitch - (4 * working->w));
pixels += 3; // Big Endian will have an offset of 0, otherwise it's 3 (R, G and B)
for (int row = 0; row < working->h; ++row)
{
    for (int col = 0; col < working->w; ++col)
    {
        *pixels = (Uint8)(*pixels * character_transparency); // Could be optimized but probably not worth it
        pixels += 4;
    }
    pixels += pitch_padding;
}
if (SDL_MUSTLOCK(working))
{
    SDL_UnlockSurface(working);
}
This code was inspired by SDL_gfx (here), but if that is all you're doing, I wouldn't bother linking against a library just for that.
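For completeness, here is a minimal sketch of how a per-frame fade might drive the routine above; fade_start_ticks, FADE_DURATION_MS, apply_ghost_fade (the code above wrapped in a function) and dest_rect are all names I'm assuming for the example:
// Sketch: fade the ghost out over FADE_DURATION_MS milliseconds.
Uint32 elapsed = SDL_GetTicks() - fade_start_ticks;
float character_transparency = 1.0f - (float)elapsed / FADE_DURATION_MS;
if (character_transparency < 0.0f) character_transparency = 0.0f;

// Re-blit the source and re-apply the per-pixel modulation (the loop above),
// then draw the working copy as usual.
apply_ghost_fade(source, working, character_transparency);
SDL_BlitSurface(working, NULL, screen, &dest_rect);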