Image compression using PCA

I'm not sure if this is the place to ask this question.
I have a question about PCA with regard to storage space.
If we were to use PCA to compress images, we would at least have to store
1) the number of principal components
2) the mean-subtracted numpy array
Since the original image array and the mean-subtracted array are the same size, the amount of storage required will be the same, so where is the compression?

First: using PCA to compress images is possible, but not without loss (lossless PCA compression doesn't make sense). The basic idea is to minimize the number of dimensions while maximizing the retained variance.
Assume you have n images of size x*y.
Then you would compute a single mean image of size x * y, which you would have to store.
Further, you could use the top k eigenvectors/principal components to reduce dimensionality. Thereby you would reduce each image (based on your choice of how much variance is to be kept) from x * y dimensions to k dimensions.
Finally you would need to store the top k eigenvectors/principal components which is a matrix of size k * (x*y).
To sum up: You could reduce n images of size x * y to
a) n arrays of size k
b) a single mean image of size x * y
c) a matrix of size k * ( x * y) containing the top k principal components
Whether or not this does actually result in a compression depends on your choice of k and on the number of images.
Although theoretically possible, this compression is inherently lossy.
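To make this concrete, here is a minimal numpy sketch of the scheme above; the image count, image size, and choice of k are hypothetical, and a real implementation would still need to quantize and serialize the stored arrays:

import numpy as np

n, x, y, k = 100, 32, 32, 20           # hypothetical: 100 images of 32x32, keep 20 components
images = np.random.rand(n, x * y)      # each image flattened to x*y dimensions

mean = images.mean(axis=0)             # (b) the single mean image, size x*y
centered = images - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:k]                    # (c) top k principal components, size k * (x*y)
codes = centered @ components.T        # (a) n arrays of size k

reconstructed = codes @ components + mean   # lossy reconstruction

stored = codes.size + mean.size + components.size
print(stored, "stored values vs", images.size, "original values")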

Related

Compression of an image

I have been calculating the uncompressed and compressed file sizes of an image. For me this has always resulted in the compressed image being smaller than the uncompressed one, which I would expect. If an image contains a large number of different colours, then storing the palette takes up a significant amount of space, and more bits are also needed to store each code. My question is: could the compression method potentially result in a larger file than the uncompressed RGB image? What would be the size (in pixels) of the smallest square RGB image, containing a total of k different colours, for which this compression method is still useful? In other words: for a given value of k, find the smallest integer n for which an n×n image takes up less storage space after compression than the original RGB image.
Let's begin by making a small simplification -- the size of the encoded output depends on the number of pixels (the actual proportion of width vs. height doesn't really matter). Hence, let's generalize the problem to number of pixels N, from which we can always calculate n by taking a square root.
To further simplify the problem, we will also ignore the overhead of any image headers/metadata, such as width, height, size of the palette, etc. In practice, this would generally be some relatively small constant.
Problem Statement
Given that we have
N representing the number of pixels in an image
k representing the number of distinct colours in an image
24 bits per pixel RGB encoding
L_RGB representing the length of an RGB image in bits
L_P representing the length of a palette image in bits
our goal is to solve the following inequality in terms of N:

L_P < L_RGB
Size of RGB Image
An RGB image is just an array of N pixels, each pixel taking up a fixed number of bits given by the RGB encoding. Hence

L_RGB = 24 * N
Size of Palette Image
Palette image consists of two parts: a palette, and the pixels.
A palette is an array of k colours, each colour taking up a fixed number of bits given by the RGB encoding. Therefore, the palette takes up

24 * k

bits. In this case, each pixel holds an index to a palette entry, rather than an actual RGB colour. The number of bits required to represent k values is

log2(k)

However, unless we can encode fractional bits (which I consider outside the scope of this question), we need to round this up. Therefore, the number of bits required to encode a palette index is

ceil(log2(k))

Since there are N such palette indices, the size of the pixel data is

N * ceil(log2(k))

and the total size of the palette image is

L_P = N * ceil(log2(k)) + 24 * k
Solving the Inequality
Substituting the two sizes into L_P < L_RGB gives

N * ceil(log2(k)) + 24 * k < 24 * N

And finally

N > (24 * k) / (24 - ceil(log2(k)))
In Python, we could express this in the following way:
import math

def limit_size(k):
    # Pixel count N above which the palette image beats plain RGB
    return (k * 24.) / (24. - math.ceil(math.log2(k)))

def size_rgb(N):
    # Size of an uncompressed 24-bit RGB image, in bits
    return N * 24.

def size_pal(N, k):
    # Size of a palette image, in bits: N indices plus a k-entry palette
    return (N * math.ceil(math.log2(k))) + (k * 24.)
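For example, with k = 16 colours, limit_size(16) = 384 / 20 = 19.2, so the palette encoding wins once N > 19.2 pixels; the smallest useful square image is therefore 5×5, since size_pal(25, 16) = 484 bits versus size_rgb(25) = 600 bits, whereas at 4×4 we get 448 versus 384 bits in favour of plain RGB.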
In general no, but your question is not precise.
If we compress ordinary files, they can get larger. E.g. if you compress a randomly generated sequence of bytes, there is not much to compress, so you end up with the header of the compression program (which tells which compression method is used, plus some versioning) and possibly some escaping, which enlarges the file. A good compression program will detect that compression will not shrink the size, refrain from compressing, and flag in the header that the data is stored flat. Possibly this is done per region of the file.
But your question is about images. Compression happens inside the file, and often not on the whole file but just on the image bits. In this case the program will see that there is no need to compress, and keep the data uncompressed. Because the image headers are always present, this only changes a flag, so there is no increase in size.
But this can also depend on the file format. You wrote about a "palette", but palettes are not much used nowadays: compression works by finding similar patterns in the file. Again, this depends on the image format. If you look in Wikipedia for a particular file format, you may see a table with header parameters (e.g. bit depth, number of colours (palette), colour definitions, and the methods used to compress).
Then, for palette-like images, the answer of Dan Mašek (https://stackoverflow.com/a/58683948/2758823) has a nice mathematical explanation, but one should not forget that compression is largely heuristic and should be tested on real examples: real images have patterns.

Histogram Binning of Gradient Vectors

I am working on a project that has a small component requiring the comparison of distributions over image gradients. Assume I have computed the image gradients in the x and y directions using a Sobel filter and have for each pixel a 2-vector (gx, gy). Obviously getting the magnitude and direction is reasonably trivial and is as follows:

mag = sqrt(gx^2 + gy^2)
dir = atan2(gy, gx)
However, what is not clear to me is how to bin these two components into a two-dimensional histogram for an arbitrary number of bins.
I had considered something along these lines (written in browser):
//Assuming normalised magnitudes.
//Histogram dimensions are bins * bins.
int getHistIdx(float mag, float dir, int bins) {
const int magInt = reinterpret_cast<int>(mag);
const int dirInt = reinterpret_cast<int>(dir);
const int magMod = reinterpret_cast<int>(static_cast<float>(1.0));
const int dirMod = reinterpret_cast<int>(static_cast<float>(TWO_PI));
const int idxMag = (magInt % magMod) & bins;
const int idxDir = (dirInt % dirMod) & bins;
return idxMag * bins + idxDir;
}
However, I suspect that the mod operation will introduce a lot of incorrect overlap, i.e. completely different gradients getting placed into the same bin.
Any insight in to this problem would be very much appreciated.
I would like to avoid using any off the shelf libraries as I want to keep this project as dependency light as possible. Also I intend to implement this in CUDA.
This is more of a "what is a histogram?" question than one about your tags. Two things:
In a 2D plane, two directions that are equal modulo 2*pi are in fact the same, so it makes sense to take the direction modulo 2*pi.
I see no practical or logical reason to take the norms modulo anything.
Next, you say you want a "two dimensional histogram", but you return a single number. A 2D histogram, which is what would make sense in your context, is a 3D plot: the plane is theta/R, indexed by two values, and the third axis is the count.
So, first suggestion: return
return std::make_pair(idxMag, idxDir);
Then you can make a 2D histogram, or two 1D histograms.
Regarding the "number of bins"
this is use case dependent. You need to define the number of bins you want (maybe different for theta and R). Maybe just some constant 10 bins? Maybe it should depend on the amount of vectors? In any case, you need a function that receives either the number of vectors, or the total set of vectors, and returns the number of bins for each axis. This could be a constant (10 bins) initially, and you can play with it. Once you decide on the number of bins:
Determine the bins
For a bounded case such as 0 <= theta < 2*pi, this is easy. Divide the interval equally into the number of bins, assuming a flat distribution. Your modulation would actually handle this well, had you actually taken the value modulo 2*pi, which you didn't. You would still need to determine the bin bounds, though.
For R this gets trickier, as it is unbounded. There are two options here, and both rely on the same tactic: choose a maximal bin. Either pick it arbitrarily (say R = 10), so that any vector longer than that is placed in a "longer than max" bin, and divide the rest equally (for example; you could choose other distributions). Or let the longest vector determine the edge of the maximal bin.
Getting the index
Once you have the bins, you need to search the magnitude/direction of the current vector in your bins. If bins are pairs representing min/max of bin (and maybe an index), say in a linked list, then it would be something like (for mag for example):
bin = histogram.first;
while ( mag > bin.max ) bin = bin.next;
magIdx = bin.index;
If the bin does not hold the index, you can just use a counter and increase it in the while loop. Also, for the magnitude, the final bin should hold "infinity" or some large number as its upper limit. Note this has nothing to do with modulation, though that would work for your direction, as you have coded. I don't see how it makes sense for the norm.
Bottom line though, you have to think a bit about what you want. In any case, all the "objects" here are trivial enough to write yourself, or you could even use small arrays. A sketch of the overall idea follows.
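For illustration, here is a minimal numpy sketch of such a 2D (magnitude, direction) histogram; the bin count and the R = 10 cap are the arbitrary choices discussed above, and gx/gy are assumed to be the Sobel responses:

import numpy as np

def grad_histogram(gx, gy, bins=10, r_max=10.0):
    mag = np.hypot(gx, gy).ravel()
    # atan2 gives values in (-pi, pi]; shift into [0, 2*pi) so bins tile the circle
    direction = np.mod(np.arctan2(gy, gx), 2 * np.pi).ravel()
    # Clamp magnitudes so anything longer than r_max lands in the last bin
    mag = np.minimum(mag, r_max)
    hist, _, _ = np.histogram2d(mag, direction, bins=bins,
                                range=[[0.0, r_max], [0.0, 2 * np.pi]])
    return hist  # hist[i, j] counts vectors in magnitude bin i, direction bin j

gx, gy = np.random.randn(64, 64), np.random.randn(64, 64)  # stand-ins for Sobel output
h = grad_histogram(gx, gy)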
I think you should arrange your bins in a square array, and then bin by vx and vy independently.
If your gradients are reasonably even you just need to scan the data first to accumulate the min and max in x and y, and then split the gradients evenly.
If the gradients are very unevenly distributed, you might want to sort the (e.g.) vx values first and choose the boundaries between bins so that the values are divided evenly among them.
An intermediate solution might be to obtain the min and max while ignoring the (e.g.) 10% most extreme values; a sketch of this follows.
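Under the same assumption (gx/gy from the Sobel filter), a numpy sketch of this component-wise binning, with the optional trimming of the 10% most extreme values, might look like this:

import numpy as np

def grad_histogram_xy(gx, gy, bins=10, trim=5.0):
    vx, vy = gx.ravel(), gy.ravel()
    # Pick the bin range while ignoring the most extreme values
    # (trim is per tail, so trim=5.0 discards 10% of the data in total)
    x_lo, x_hi = np.percentile(vx, [trim, 100.0 - trim])
    y_lo, y_hi = np.percentile(vy, [trim, 100.0 - trim])
    hist, _, _ = np.histogram2d(np.clip(vx, x_lo, x_hi),
                                np.clip(vy, y_lo, y_hi),
                                bins=bins,
                                range=[[x_lo, x_hi], [y_lo, y_hi]])
    return hist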

Is there a way to normalize a Caffe Blob along a certain dimension?

I've got a blob with the shape n * t * h * w (Batch number, features, height, width). Within a Caffe layer I want to do an L1 normalization along the t axis, i.e. for fixed n, h and w the sum of values along t should be equal to 1. Normally this would be no big deal, but since it's within a Caffe layer it should happen very quickly, preferably through the Caffe math functions (based on BLAS). Is there a way to achieve this in an efficient manner?
I unfortunately can't change the order of the shape parameters due to later processing, but I can remove the batch number (have a vector of blobs with just t * h * w) or I could convert the blob to an OpenCV Mat, if it makes things easier.
Edit 1: I'm starting to suspect I might be able to solve my task with the help of caffe_gpu_gemm, where I'd multiply a vector of ones of length t with a blob from one batch of shape t * h * w, which should theoretically give me the sums along the feature axis. I'll update if I figure out the next step.
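As a plain-numpy sanity check of what the layer needs to compute (this is just the target semantics, not Caffe/BLAS code):

import numpy as np

blob = np.random.rand(2, 5, 4, 4)               # hypothetical n, t, h, w
sums = np.abs(blob).sum(axis=1, keepdims=True)  # L1 norms, shape n x 1 x h x w
normalized = blob / sums
# For fixed n, h, w the values along t now sum to 1 (blob is non-negative here)
assert np.allclose(normalized.sum(axis=1), 1.0)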

FFT of large data (16 GB) using Matlab

I am trying to compute a fast Fourier transform of a large chunk of data imported from a text file that is around 16 GB in size. I was trying to think of a way to compute its FFT in Matlab, but due to my computer memory (8 GB) it is giving me an out-of-memory error. I tried using memmap and textscan, but was not able to apply them to get the FFT of the combined data.
Can anyone kindly guide me on how I should approach getting the Fourier transform? I am also trying to compute the Fourier transform (using the definition) with C++ code on a remote server, but it's taking a long time to execute. Can anyone give me proper insight into how I should handle this large data?
It depends on the resolution of the FFT that you require. If you only need an FFT of, say, 1024 points, then you can reshape your data to, or sequentially read it as, N x 1024 blocks. Once you have it in this format, you can add the output of each FFT to a 1024-point complex accumulator, as sketched below.
If you need the same resolution after the FFT, then you need more memory, or a special fft routine that is not included in Matlab (but I'm not sure if it is even mathematically possible to do a partial FFT by buffering small chunks through for full resolution).
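A minimal numpy sketch of the block accumulation described above; the in-memory data array stands in for sequential reads from the 16 GB file:

import numpy as np

nfft = 1024
data = np.random.rand(64 * nfft)     # placeholder for data streamed from disk
acc = np.zeros(nfft, dtype=complex)
for start in range(0, len(data), nfft):
    acc += np.fft.fft(data[start:start + nfft])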
It may be better to implement the FFT with your own code.
The FFT algorithm has a "butterfly" operation, so you can split the whole computation into smaller blocks.
The file is too large for a typical PC to handle in memory, but the FFT doesn't need all the data at once. You can always start with a 2-point (maybe an 8-point is better) FFT and build up by cascading the stages. That means you can read only a few points at a time, do some calculation, and save your results to disk. The next time you do another iteration, you read the saved data back from disk.
Depending on how you build the data structure, you can either store all the data in one single file and read/save it with pointers (in Matlab that's merely a number), or you can store every single point in an individual file, generating billions of files and distinguishing them by file name.
The idea is that you dump your intermediate results to disk instead of memory. Of course this requires a corresponding amount of disk space, but that is more feasible.
I can show you a piece of pseudo-code. Depending on the structure of your original data (that 16 GB text file) the implementation will differ, but since you own the file you can operate on it freely. I will start with a 2-point FFT, following the 8-point example in the Wikipedia picture.
1. Do a 2-point FFT on x, generating y, the 3rd column of white circles from the left.
read x[0], x[4] from file 'origin'
y[0] = x[0] + x[4]*W(N,0);
y[1] = x[0] - x[4]*W(N,0);
save y[0], y[1] to file 'temp'
remove x[0], x[4], y[0], y[1] from memory
read x[2], x[6] from file 'origin'
y[2] = x[2] + x[6]*W(N,0);
y[3] = x[2] - x[6]*W(N,0);
save y[2], y[3] to file 'temp'
remove x[2], x[6], y[2], y[3] from memory
....
2. Do a 2-point FFT on y, generating z, the 5th column of white circles.
3. Do a 2-point FFT on z, generating the final result, X.
Basically, the Cooley–Tukey FFT algorithm is designed to let you cut up the data and calculate piece by piece, so it is possible to handle large amounts of data. I know it's not the usual way, but if you look at the Chinese version of that Wikipedia page, you may find a number of pictures that help you understand how it splits up the points.
I've encountered this same problem. I ended up finding a solution in a paper:
Extending sizes of effective convolution algorithms. It essentially involves loading shorter chunks, multiplying by a phase factor and FFT-ing, then loading the next chunk in the series. This gives a subsampling of the total FFT of the full signal. The process is then repeated a number of times with different phase factors to fill in the remaining points. I will attempt to summarize here (adapted from Table II in the paper); a code sketch follows the steps:
For a total signal f(j) of length N, decide on a number m of shorter chunks, each of length N/m, that you can store in memory (if needed, zero-pad the signal so that N is a multiple of m).
For beta = 0, 1, 2, ..., m-1, do the following:
Divide the series into m subintervals of N/m successive points.
For each subinterval, multiply each jth element by exp(i*2*pi*j*beta/N). Here, j is indexed according to the position of the point relative to the first in the whole data stream.
Sum the first elements of each subinterval to produce a single number, sum the second elements, and so forth. This can be done as points are read from file, so there is no need to have the full set of N points in memory.
Fourier transform the resultant series, which contains N/m points.
This will give F(k) for k = ml + beta, for l = 0, ..., N/m-1. Save these values to disk.
Go to 2, and proceed with the next value of beta.
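Here is a minimal numpy sketch of those steps. For simplicity the signal sits in memory, standing in for chunked reads from disk, and the sign of the phase factor follows numpy's forward-FFT convention:

import numpy as np

def blocked_fft(f, m):
    # Computes the full N-point FFT while only ever FFT-ing N/m points
    N = len(f)
    L = N // m                      # chunk length (assumes m divides N)
    F = np.empty(N, dtype=complex)
    for beta in range(m):
        g = np.zeros(L, dtype=complex)
        for s in range(m):          # one pass per subinterval (chunk read)
            j = np.arange(s * L, (s + 1) * L)   # global sample indices
            g += f[s * L:(s + 1) * L] * np.exp(-2j * np.pi * j * beta / N)
        F[beta::m] = np.fft.fft(g)  # fills F(k) for k = m*l + beta
    return F

x = np.random.rand(32)
assert np.allclose(blocked_fft(x, 4), np.fft.fft(x))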

Examples of Matlab to OpenCV conversions

From time to time I have to port some Matlab Code to OpenCV.
Almost always there is a way to do it with an appropriate function in OpenCV. Nevertheless, it's not always easy to find.
Therefore I would like to start this summary to find and gather some equivalents between Matlab and OpenCV.
I use the Matlab function as a heading and append its description from the Matlab help. Afterwards, an OpenCV example or links to solutions are appreciated.
Repmat
Replicate and tile an array. B = repmat(A,M,N) creates a large matrix B consisting of an M-by-N tiling of copies of A. The size of B is [size(A,1)*M, size(A,2)*N]. The statement repmat(A,N) creates an N-by-N tiling.
B = cv::repeat(A, M, N)
OpenCV Docs
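For instance, through the Python bindings (a minimal sketch):

import cv2
import numpy as np

A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = cv2.repeat(A, 2, 3)        # equivalent to Matlab's repmat(A, 2, 3)
print(B.shape)                 # (4, 9)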
Find
Find indices of nonzero elements. I = find(X) returns the linear indices corresponding to the nonzero entries of the array X. X may be a logical expression. Use IND2SUB(SIZE(X),I) to calculate multiple subscripts from the linear indices I.
Similar to Matlab's find
Conv2
Two dimensional convolution. C = conv2(A, B) performs the 2-D convolution of matrices A and B. If [ma,na] = size(A), [mb,nb] = size(B), and [mc,nc] = size(C), then mc = max([ma+mb-1,ma,mb]) and nc = max([na+nb-1,na,nb]).
Similar to Conv2
Imagesc
Scale data and display as image. imagesc(...) is the same as IMAGE(...) except the data is scaled to use the full colormap.
SO Imagesc
Imfilter
N-D filtering of multidimensional images. B = imfilter(A,H) filters the multidimensional array A with the multidimensional filter H. A can be logical or it can be a nonsparse numeric array of any class and dimension. The result, B, has the same size and class as A.
SO Imfilter
Imregionalmax
Regional maxima. BW = imregionalmax(I) computes the regional maxima of I. imregionalmax returns a binary image, BW, the same size as I, that identifies the locations of the regional maxima in I. In BW, pixels that are set to 1 identify regional maxima; all other pixels are set to 0.
SO Imregionalmax
Ordfilt2
2-D order-statistic filtering. B=ordfilt2(A,ORDER,DOMAIN) replaces each element in A by the ORDER-th element in the sorted set of neighbors specified by the nonzero elements in DOMAIN.
SO Ordfilt2
Roipoly
Select polygonal region of interest. Use roipoly to select a polygonal region of interest within an image. roipoly returns a binary image that you can use as a mask for masked filtering.
SO Roipoly
Gradient
Approximate gradient. [FX,FY] = gradient(F) returns the numerical gradient of the matrix F. FX corresponds to dF/dx, the differences in x (horizontal) direction. FY corresponds to dF/dy, the differences in y (vertical) direction. The spacing between points in each direction is assumed to be one. When F is a vector, DF = gradient(F) is the 1-D gradient.
SO Gradient
Sub2Ind
Linear index from multiple subscripts. sub2ind is used to determine the equivalent single index corresponding to a given set of subscript values.
SO sub2ind
backslash operator or mldivide
solves the system of linear equations A*x = B. The matrices A and B must have the same number of rows.
cv::solve
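For example, via the Python bindings (a minimal sketch; the default LU decomposition expects a square A):

import cv2
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([[3.],
              [5.]])
ok, x = cv2.solve(A, b)        # equivalent to Matlab's A \ b
print(ok, x.ravel())           # True [0.8 1.4]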