How to use PCA on data whose values all have the same units

My data consists of 400,000 samples, each with 3,200 values in the same units (a 400000x3200 matrix). I know that when the values have different scales, you first have to standardize or normalize the data so that every value is on the same scale.
But how do I calculate PCA for data whose values all have the same units?
Is it necessary to normalize, standardize, or center the data first? And what about the covariance matrix for values with the same units?
I hope somebody can help. Thanks!
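A minimal NumPy sketch of the usual approach, assuming the 3,200 columns are the variables: center each column (PCA needs this), but skip dividing by the standard deviation, because the columns already share units. The function name `pca_same_units` is hypothetical.

```python
import numpy as np


def pca_same_units(X, n_components):
    """PCA on columns that share units: center each column, do not rescale."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean                                  # centering is still required
    # Covariance matrix of the columns (d x d); no standardization step
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]                 # one principal axis per column
    scores = Xc @ components                       # samples projected onto the axes
    return scores, components, eigvals[order], mean


# Usage on small synthetic data (a stand-in for the real 400000 x 3200 matrix)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5)) + 10.0
scores, components, variances, mean = pca_same_units(X, n_components=3)
print(scores.shape)  # (200, 3)
```

Standardizing instead would give PCA on the correlation matrix; with shared units that is a modelling choice (it stops high-variance columns from dominating), not a requirement. At full size, X in float64 needs about 10 GB (400000 x 3200 x 8 bytes), so you may prefer to accumulate Xc.T @ Xc over chunks of rows; the covariance matrix itself is only 3200 x 3200.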

Related

Extract region from a curvilinear satellite dataset

I have satellite swath data from MODIS and need to extract a subset (region) of the data to analyze (NOT to plot). I am trying to find the best way to do this without loops, which can be slow. In the past I have used set.intersect, but this does not work on 2D data.
My issue is that both Lat and Lon are 2D, and I need to find the indices where my conditions are met, (lat>=x1)&(lat<=x2) and similarly for lon, and then use those 2D indices to slice my main dataset (Aerosol Optical Depth).
Normally (for 1D lat/lon) I would use Opt_Depth_Land[:,goodlat,goodlon] to extract my data, but this does not work for this type of dataset.
Code so far:
valid_lat = (lat >= user_lat - radius) & (lat <= user_lat + radius)
valid_lon = (lon >= user_lon - radius) & (lon <= user_lon + radius)
Valid_Coord = np.where(valid_lat & valid_lon)
Any help would be greatly appreciated.
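A NumPy sketch of one loop-free option, assuming Opt_Depth_Land is laid out as (bands, rows, cols) with lat/lon shaped (rows, cols), as the [:, goodlat, goodlon] indexing in the question suggests. The function name is hypothetical, and longitude wrap-around at ±180° is not handled.

```python
import numpy as np


def extract_region(data, lat, lon, user_lat, user_lon, radius):
    """Select the swath cells inside a lat/lon box.

    data: array shaped (bands, rows, cols); lat, lon: 2D arrays shaped (rows, cols).
    Returns (points, box): points is (bands, n_selected); box is the smallest
    rectangular index window holding the selection, with cells outside the
    lat/lon box set to NaN (None if nothing matched).
    """
    mask = ((lat >= user_lat - radius) & (lat <= user_lat + radius) &
            (lon >= user_lon - radius) & (lon <= user_lon + radius))
    # A 2D boolean mask can index the last two axes in one go
    points = data[:, mask]
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return points, None
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    # Keep the 2D layout inside the bounding window and blank out the rest
    box = np.where(mask[r0:r1, c0:c1], data[:, r0:r1, c0:c1], np.nan)
    return points, box


lat = np.array([[0.0, 1.0, 2.0], [0.5, 1.5, 2.5]])
lon = np.array([[10.0, 11.0, 12.0], [10.5, 11.5, 12.5]])
data = np.arange(12, dtype=float).reshape(2, 2, 3)
points, box = extract_region(data, lat, lon, user_lat=1.0, user_lon=11.0, radius=0.6)
# points holds [[1., 3., 4.], [7., 9., 10.]]
```

The flat `points` form is usually enough for statistics; the `box` form keeps neighbours adjacent if you need the 2D structure.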

How to find out whether an image is more or less homogeneous with respect to color (hue)?

UPDATE:
I have segmented the image into different regions. For each region, I need to know whether it is more or less homogeneous in terms of color.
What could be the possible strategies to do so?
Previous version of the question:
I want to check the color variance (preferably the hue variance) of an image to find the images made up of homogeneous colors (i.e., images that have only one or two colors).
I understand that one strategy would be to create a hue histogram and then find the count of each color, but I have many images in total, and creating a 180-bin hue histogram for each one would make the whole program computationally expensive.
Is there any built-in OpenCV method, or another simpler method, to find out whether an image consists of a single homogeneous color or of several colors?
Something that can calculate the variance of the hue image would also be fine. I could not find anything like variance(image);
PS: I am writing the code in C++.
The variance can be computed without a histogram, as the mean of the squared values minus the square of the mean. It takes a single pass over the image with two accumulators. Choose a data type that will not overflow.
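A sketch of the single-pass idea in Python (plain loops for clarity; the same accumulators carry over directly to a C++ loop over the pixels). Both function names are hypothetical. The second function is a swapped-in technique, not part of the answer above: hue is an angle, so in OpenCV's 8-bit HSV (hue 0..179) the values 0 and 179 are both red, and the plain variance of such a region comes out huge. The circular variance (1 minus the mean resultant length) avoids that.

```python
import math


def single_pass_variance(values):
    """Population variance as mean(x^2) - mean(x)^2, one pass, two accumulators."""
    n = 0
    total = 0
    total_sq = 0
    for v in values:
        v = int(v)           # e.g. from numpy.uint8; Python ints cannot overflow
        n += 1
        total += v
        total_sq += v * v
    if n == 0:
        raise ValueError("no values")
    mean = total / n
    return total_sq / n - mean * mean


def circular_hue_variance(hues, period=180):
    """Circular variance in [0, 1]: treats hue as an angle with the given period."""
    sx = sy = 0.0
    n = 0
    for h in hues:
        angle = 2.0 * math.pi * h / period
        sx += math.cos(angle)
        sy += math.sin(angle)
        n += 1
    if n == 0:
        raise ValueError("no values")
    return 1.0 - math.hypot(sx, sy) / n


reds = [0, 1, 178, 179]                 # all red in OpenCV's 8-bit hue
print(single_pass_variance(reds))       # 7921.25, although every hue is red
print(circular_hue_variance(reds))      # close to 0
```

For the segmented regions in the UPDATE, keep one set of accumulators per region label. In C++, OpenCV's cv::meanStdDev (with its optional mask argument, one mask per region) gives the plain mean and standard deviation directly, but not the circular version.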

3D lookup table to discretize the volume

I have a depth camera that returns the measured distances of the volume in millimeters. I need to create a 3D lookup table that stores all possible distance values for each pixel in the image, so I get an array of size 640x480x2048. This approach is very memory-consuming: with C++ ints it takes about 2.5 GB of RAM. I also have some parameters for each item in the volume, so altogether it reaches the maximum capacity of my 4 GB of memory.
My question is: does anyone have good experience with how to store and manage a dataset like the one described above efficiently?
P.S. Please don't suggest file storage; it doesn't fit my needs.
Thanks in advance
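The 2.5 GB figure checks out: 640 x 480 x 2048 = 629,145,600 cells at 4 bytes per int. A quick Python calculation of the same dense table with narrower cell types follows; whether a narrower type is enough depends on what each cell actually stores, which the question does not say, so treat the smaller rows as assumptions.

```python
def lut_bytes(width=640, height=480, depth=2048, bits_per_cell=32):
    """Bytes needed for a dense width x height x depth lookup table."""
    cells = width * height * depth
    return cells * bits_per_cell // 8


for label, bits in [("int32", 32), ("uint16", 16), ("uint8", 8), ("1-bit flag", 1)]:
    size = lut_bytes(bits_per_cell=bits)
    print(f"{label:>10}: {size:,} bytes ({size / 2**30:.2f} GiB)")
```

Halving the cell width halves the table; a single yes/no flag per cell would need 78,643,200 bytes (75 MiB).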

DCT based Video Encoding Process

I am having some issues that I hope you can clarify. I have taught myself a video encoding process similar to MPEG-2. The process is as follows:
1) Split an RGBA image into 4 separate channel memory blocks: an array of all R values, a separate array of G values, etc.
2) Take the array and grab an 8x8 block of pixel data, then transform it using the Discrete Cosine Transform (DCT).
3) Quantize this 8x8 block using a pre-calculated quantization matrix.
4) Zigzag encode the output of the quantization step, so I should get a trail of consecutive repeated numbers.
5) Run-length encode (RLE) the output from the zigzag algorithm.
6) Huffman code the data after the RLE stage, substituting values from a pre-computed Huffman table.
7) Go back to step 2 and repeat until all of the channel's data has been encoded.
8) Go back to step 2 and repeat for each channel.
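Steps 2 to 4 for a single 8x8 block can be sketched in NumPy as below. Assumptions: an orthonormal DCT-II, a JPEG-style level shift of 128 for 8-bit samples, and a caller-supplied quantization matrix (the flat matrix in the usage line is a placeholder, not a real encoder table). The function names are hypothetical.

```python
import numpy as np

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k holds the k-th cosine."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] = np.sqrt(1.0 / n)
    return c


def zigzag_order(n=N):
    """(row, col) pairs in zigzag order: walk anti-diagonals, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[1] if (rc[0] + rc[1]) % 2 == 0 else rc[0]))


def encode_block(block, qmatrix):
    """Steps 2-4: level shift, 2D DCT, quantize, then zigzag scan to 64 values."""
    C = dct_matrix()
    coeffs = C @ (np.asarray(block, dtype=np.float64) - 128.0) @ C.T
    q = np.round(coeffs / qmatrix).astype(np.int32)
    return [int(q[r, c]) for r, c in zigzag_order()]


flat_block = np.full((8, 8), 136)
print(encode_block(flat_block, np.full((8, 8), 16.0))[:4])  # [4, 0, 0, 0]
```

The DC coefficient comes first, followed by the mostly-zero AC coefficients that step 5 then run-length encodes.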
First question: do I need to convert the RGBA values to YUV+A (YCbCr+A) for the process to work, or can it continue using RGBA? I ask because the RGBA->YUVA conversion is a heavy workload that I would like to avoid if possible.
Next question: should the RLE store runs of just 0s, or can it be extended to all the values in the array? See the examples below:
440000000111 == [2,4][7,0][3,1] // RLE for all values
or
440000000111 == 44[7,0]111 // RLE for 0's only
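Both variants can be checked against the examples above with a small Python sketch (the function names are hypothetical; pairs are written as [count, value], matching the examples):

```python
from itertools import groupby


def rle_all(values):
    """Run-length encode every value as [count, value] pairs."""
    return [[len(list(run)), v] for v, run in groupby(values)]


def rle_zeros(values):
    """Run-length encode only runs of zeros; other values pass through unchanged."""
    out = []
    for v, run in groupby(values):
        n = len(list(run))
        if v == 0:
            out.append([n, 0])
        else:
            out.extend([v] * n)
    return out


digits = [int(ch) for ch in "440000000111"]
print(rle_all(digits))    # [[2, 4], [7, 0], [3, 1]]
print(rle_zeros(digits))  # [4, 4, [7, 0], 1, 1, 1]
```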
The final question: what would a single symbol be in the Huffman stage? Would a symbol to be replaced be a value like 2 or 4, or would it be a run-level pair like [2,4]?
Thanks for taking the time to read this and help me out. I have read many papers and watched many YouTube videos, which have aided my understanding of the individual algorithms, but not of how they all link together to form the encoding process in code.
(This seems more like JPEG than MPEG-2; video formats are more about compressing the differences between frames than about compressing individual images.)
If you work in RGB rather than YUV, you're probably not going to get the same compression ratio and/or quality, but you can do that if you want. Colour-space conversion is hardly a heavy workload compared to the rest of the algorithm.
Typically in this sort of application you RLE the zeros, because that's the element you get lots of repetitions of (and hopefully also a good number at the end of each block, which can be replaced with a single marker value), whereas other coefficients are not so repetitive. If you expect repetitions of other values, though, YMMV.
And yes, you can encode the RLE pairs as single symbols in the Huffman encoding.
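A sketch of what "RLE pairs as single symbols" can look like, loosely modelled on JPEG's AC coding: each nonzero coefficient becomes one (zero_run, level) symbol, trailing zeros collapse into a single end-of-block symbol, and a Huffman table is built over those symbols. This is a simplification (JPEG additionally caps runs and codes a magnitude category rather than the raw level); the names and the "EOB" marker are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

EOB = "EOB"  # hypothetical end-of-block marker


def run_level_symbols(coeffs):
    """Turn zigzag-ordered coefficients into (zero_run, level) symbols."""
    symbols, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            symbols.append((run, c))
            run = 0
    if run:
        symbols.append(EOB)  # all remaining coefficients are zero
    return symbols


def huffman_table(symbols):
    """Build a prefix-free code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


block = [4, 4, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1] + [0] * 52
symbols = run_level_symbols(block)
print(symbols)  # [(0, 4), (0, 4), (7, 1), (0, 1), (0, 1), 'EOB']
table = huffman_table(symbols)
encoded = "".join(table[s] for s in symbols)
```

In a real codec the table would be pre-computed from typical data (as in the question) rather than built per block.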
1) Yes, you'll want to convert to YUV. To achieve higher compression ratios, you need to take advantage of the human eye's ability to "overlook" significant loss in color. Typically you'll keep the Y plane at full resolution (presumably the A plane as well), but downsample the U and V planes by 2x2. E.g. if you're doing 640x480, the Y plane is 640x480 and the U and V planes are 320x240. You might also choose different quantization for the U/V planes. The cost of this conversion is small compared to the DCT or DFT.
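A NumPy sketch of the conversion and 2x2 chroma downsampling described above, assuming 8-bit RGB and the full-range BT.601 coefficients used by JPEG/JFIF (the function names are hypothetical). Per pixel it is just a 3x3 matrix multiply plus an offset, which is why its cost is small next to the DCT.

```python
import numpy as np

# Full-range BT.601 (JFIF) RGB -> YCbCr coefficients
RGB_TO_YCBCR = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])
OFFSET = np.array([0.0, 128.0, 128.0])


def rgb_to_ycbcr(rgb):
    """rgb: (H, W, 3) array of 0..255 values -> (H, W, 3) float Y, Cb, Cr."""
    return np.asarray(rgb, dtype=np.float64) @ RGB_TO_YCBCR.T + OFFSET


def downsample_2x2(plane):
    """Average each 2x2 block (H and W assumed even), e.g. 480x640 -> 240x320."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


rgba = np.zeros((480, 640, 4), dtype=np.uint8)    # stand-in frame
ycc = rgb_to_ycbcr(rgba[..., :3])
y = ycc[..., 0]                                    # full resolution, like A
cb = downsample_2x2(ycc[..., 1])
cr = downsample_2x2(ycc[..., 2])
print(y.shape, cb.shape)  # (480, 640) (240, 320)
```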
2) You don't have to RLE it; you could just Huffman code it directly.

CUDA cufftPlan2d plan size question

I'm studying the code behind the convolutionFFT2D example in the NVIDIA CUDA SDK, but I don't get the point of this line:
cufftPlan2d(&fftPlan, fftH, fftW/2, CUFFT_C2C);
Apparently this initializes a complex-to-complex plan for the FFT to run in, but I don't see the point of dividing the plan width by 2.
Just to be precise: fftH and fftW are rounded values of the imageX+kernelX+1 and imageY+kernelY+1 dimensions (just for speed reasons). I know that in the frequency domain you usually have a positive component and a symmetric negative component of the same frequency, but this sounds like cutting away half of my image data.
Can someone explain this to me a little better? I've never used an FFT (I just know the theory behind the Fourier transform).
When you perform a real-to-complex FFT, half of the frequency-domain data is redundant due to symmetry. In a 2D FFT this is only the case along one axis, though. You can think of a 2D FFT as two passes of 1D FFTs: the first operates on all the rows, and for a real-valued image this gives you complex row values. In the second pass you apply a 1D FFT to every column, but since the row values are now complex, this is a complex-to-complex FFT with no redundancy in its output. Hence you only need about width / 2 points along the horizontal axis (strictly, width / 2 + 1 outputs for a real-to-complex transform), but you still need height points along the vertical axis.
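The redundancy can be checked numerically with NumPy (standing in for cuFFT here): the real-to-complex 2D transform of an H x W image has H x (W/2 + 1) outputs, and the dropped columns are complex conjugates of kept ones. The function below shows the standard trick of computing a real FFT of length 2N with one complex FFT of length N, by packing even samples into the real part and odd samples into the imaginary part. A half-width complex plan like the one in the sample is presumably paired with a post-processing step of this kind, but check the sample's source to confirm. The function name is hypothetical.

```python
import numpy as np


def real_fft_via_half_complex(x):
    """rfft of an even-length real signal using one complex FFT of half the length."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size // 2                        # x.size must be even
    z = x[0::2] + 1j * x[1::2]             # pack even/odd samples as one complex signal
    Z = np.fft.fft(z)
    Zr = np.conj(Z[(-np.arange(n)) % n])   # conj(Z[N - k]), using Z[N] == Z[0]
    even = (Z + Zr) / 2                    # FFT of the even samples
    odd = (Z - Zr) / 2j                    # FFT of the odd samples
    twiddle = np.exp(-2j * np.pi * np.arange(n) / (2 * n))
    X = even + twiddle * odd               # first N bins of the length-2N FFT
    return np.append(X, even[0] - odd[0])  # bin N (the Nyquist bin)


rng = np.random.default_rng(1)
img = rng.random((6, 8))                   # H = 6, W = 8
full = np.fft.fft2(img)
half = np.fft.rfft2(img)
print(half.shape)                          # (6, 5): W/2 + 1 columns, all H rows
# Dropped columns are conjugates of kept ones: F[k, l] == conj(F[-k, -l])
H, W = img.shape
k, l = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
print(np.allclose(full, np.conj(full[(-k) % H, (-l) % W])))  # True
```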