I extracted video frames and MFCCs from a video. I got video frames of shape (524, 64, 64) and MFCCs of shape (80, 525). The frame counts nearly match, but the MFCC dimensions are swapped relative to the frames. How can I align the MFCCs so that they have shape (525, 80)?
And will permuting the dimensions distort the audio information?
Swapping the dimensions of a multidimensional array does not alter the values at all, only their locations.
To swap the axes so that the time axis comes first in your MFCC array, use NumPy's .T (transpose) attribute:
mfcc_timefirst = mfcc.T
I have a stream of raw images that comes from a grayscale network camera that we are developing. In this case, our images are arrays of 8-bit pixels (640x480). Since this camera outputs more than 200 frames per second, I need to store these images as a WebM video as quickly as possible, in order not to lose any frames.
What is the best way of doing that, using libvpx?
The fastest and easiest thing to do is to feed the grayscale plane directly into libvpx's compression function vpx_codec_encode as the luma (Y) plane of a VPX_IMG_FMT_I420 image. You will also have to supply the two 2x2-subsampled chroma planes (320x240 in your case); set every octet of those planes to the value 128, which means "no color".
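A rough sketch of that per-frame step (untested; encodeGrayFrame and writePacket are placeholder names, encoder setup with vpx_codec_enc_init is omitted, and the WebM container itself has to be written by a separate muxer such as libwebm):

#include <vpx/vpx_encoder.h>
#include <vpx/vp8cx.h>
#include <cstdint>
#include <cstring>

// Wrap one 640x480 grayscale frame as I420 and hand it to the encoder.
void encodeGrayFrame(vpx_codec_ctx_t *codec, const uint8_t *gray, int64_t pts,
                     void (*writePacket)(const void *buf, size_t len))
{
    vpx_image_t img;
    vpx_img_alloc(&img, VPX_IMG_FMT_I420, 640, 480, 1);

    // Copy the camera's gray plane into Y, row by row (strides may differ).
    for (int y = 0; y < 480; ++y)
        std::memcpy(img.planes[VPX_PLANE_Y] + y * img.stride[VPX_PLANE_Y],
                    gray + y * 640, 640);

    // Fill both 2x2-subsampled chroma planes (320x240) with 128 = "no color".
    for (int y = 0; y < 240; ++y) {
        std::memset(img.planes[VPX_PLANE_U] + y * img.stride[VPX_PLANE_U], 128, 320);
        std::memset(img.planes[VPX_PLANE_V] + y * img.stride[VPX_PLANE_V], 128, 320);
    }

    vpx_codec_encode(codec, &img, pts, 1, 0, VPX_DL_REALTIME);

    // Drain the compressed packets and pass them to the muxer.
    vpx_codec_iter_t iter = nullptr;
    const vpx_codec_cx_pkt_t *pkt;
    while ((pkt = vpx_codec_get_cx_data(codec, &iter)) != nullptr)
        if (pkt->kind == VPX_CODEC_CX_FRAME_PKT)
            writePacket(pkt->data.frame.buf, pkt->data.frame.sz);

    vpx_img_free(&img);
}

In a 200 fps loop you would also want to allocate the vpx_image_t once up front and reuse it, rather than allocating and freeing per frame.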
I have an array of RGB values with a different raw size each time. I'm trying to determine which width/height would be most suitable for it.
The idea is that I'm getting raw files and I want to display the file data as a BMP image (e.g. Hex Workshop has that feature, called Data Visualizer).
Any suggestions?
Regards.
Find the divisors of the pixel array size.
For instance, if your array contains 243 pixels, divisors are 1, 3, 9, 27, 81 and 243. It means that your image is either 1x243, 3x81, 9x27, 27x9, 81x3 or 243x1.
You can only guess which is the right one by analyzing the image content: vertical or horizontal features, recurring patterns, common aspect ratios, etc.
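For illustration, a small sketch that enumerates the candidate dimensions (listCandidateDimensions is a made-up name; for 24-bit RGB data the pixel count would be the raw byte count divided by 3):

#include <cstdio>
#include <cstddef>

// Print every (width, height) pair whose product equals the pixel count.
void listCandidateDimensions(std::size_t pixelCount)
{
    for (std::size_t w = 1; w <= pixelCount; ++w)
        if (pixelCount % w == 0)
            std::printf("%zu x %zu\n", w, pixelCount / w);
}

For the 243-pixel example this prints exactly the six candidates above; you would then eyeball each rendering, or prefer candidates near a common aspect ratio, to pick the right one.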
I'm trying to filter the same point through multiple frames. Basically, I want to take a single pixel (say at position (0,0)) and run a filter at that position across multiple frames.
I'm getting a frame (type Mat) from the webcam. I want to buffer about 30 frames from the camera and build vectors that represent the same position across those 30 frames. For example, if the input is 640x480 @ 30 fps, I want to have 640x480 = 307,200 vectors that are 30 points long. In MATLAB, this would basically be a matrix of vectors (a 3D matrix), where each vector is 30 elements long. I want this so that I can apply temporal filters to each pixel.
I think I need to make a 3D Mat (CvMatND) whose third dimension has size 30. Then I will put each new frame into a new slice until my matrix is 640x480x30. Then I can filter the vectors
(0, 0, :)
(0, 1, :)
(0, 2, :)
...
(639, 479, :)
Once I've applied the filter to each vector, I will have 30 frames of video to output.
My question is: what is the best way to buffer 30 frames? Once I have the 30 frames, what is the best way to apply a filter (say, a low-pass filter) to each pixel?
Thanks for your help.
This is what I came up with, with Øystein W.'s help.
Create a Mat for the new frame and a vector of mats for the buffer:
Mat frame; // grab the newest frame
std::vector <cv::Mat> buffer; // buffer for frames
Since I am getting frames from the webcam (the newest one is in 'frame'), I have to fill up the buffer before moving forward:
if (buffer.size() < 30)
{
    buffer.push_back(frame.clone()); // clone() gives the buffer its own copy; cv::Mat is a shared header, so pushing 'frame' itself would make every slot point at the same pixels
    continue; // go back to the beginning of the loop; processing can't start until the buffer is full
}
else
{
    buffer.erase(buffer.begin());    // drop the oldest frame from the front
    buffer.push_back(frame.clone()); // append the newest frame at the back
}
This keeps the newest frame at the back of the vector and the oldest frame at the front.
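Once the buffer is full, one simple temporal low-pass is the per-pixel mean over the 30 buffered frames. A sketch, assuming the frames are single-channel 8-bit (temporalMean is my own name for it):

#include <opencv2/opencv.hpp>
#include <vector>

// Temporal mean (a simple low-pass) across the buffered frames.
// Assumes all frames in 'buffer' are CV_8UC1 and the same size.
cv::Mat temporalMean(const std::vector<cv::Mat>& buffer)
{
    cv::Mat acc = cv::Mat::zeros(buffer[0].size(), CV_32FC1);
    for (const cv::Mat& f : buffer)
        cv::accumulate(f, acc);                          // acc += f, element-wise, in float
    cv::Mat result;
    acc.convertTo(result, CV_8UC1, 1.0 / buffer.size()); // divide by N, back to 8-bit
    return result;
}

Each output pixel is the average of that pixel's 30-sample time vector, i.e. a box low-pass along the time axis; for a weighted kernel you could scale each frame before accumulating, or keep a running average with cv::accumulateWeighted.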
I'm using a
std::vector <cv::Mat*> images
as a buffer. It's easy to iterate through the vector, and you can push and pop at the back and front. I have no problems with real-time processing.
I am fairly new to image processing using Visual C++. I am looking for a way to read black-and-white TIFF files and write the image as an array of values representing 0 or 1, then get the location information of either the 0s (black) or the 1s (white).
After a bit of research on Google and the article at https://www.ibm.com/developerworks/linux/library/l-libtiff/#resources I have the following trivial questions. Please do point me to relevant articles; there are so many that I couldn't wrap my head around them. Meanwhile I will keep on searching and reading.
Is it possible to extract pixel location information from TIFF using LIBTIFF?
Again, being new to all the image formats, I can't help thinking that an image is made up of a 2D array of pixel values. Is the bitmap format more appropriate? Thinking that way makes me wonder whether I can just write the location of "white" or "black", mapped from its height, into a 2D array. I want the locations of either "black" or "white"; is that possible? I don't really want an array of 0s and 1s that represents the whole image, and I'm not sure I even need one.
Example of a 4-row by 2-column black-and-white TIFF image:
black white
white black
black black
white white
Output to something like ptLocArray[imageHeight][imageWidth] from a function named mappingNonPrint() (per column, the row indices of the white pixels):
2 1
4 4
or from a function named mappingPrint() (per column, the row indices of the black pixels):
1 2
3 3
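For reference, a sketch of what mappingPrint() might look like if the image were already available as a row-major array of 0s and 1s (0 = black; row indices 1-based to match the example; this is only an illustration, not tested):

#include <vector>
#include <cstdint>

// For each column, collect the 1-based row indices of black pixels (value 0),
// matching the mappingPrint() example above. White lists would be analogous.
std::vector<std::vector<int>> mappingPrint(const std::vector<uint8_t>& pixels,
                                           int imageWidth, int imageHeight)
{
    std::vector<std::vector<int>> blackRowsPerColumn(imageWidth);
    for (int col = 0; col < imageWidth; ++col)
        for (int row = 0; row < imageHeight; ++row)        // scan each column top to bottom
            if (pixels[row * imageWidth + col] == 0)       // 0 = black
                blackRowsPerColumn[col].push_back(row + 1); // 1-based, as in the example
    return blackRowsPerColumn;
}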
In what direction/order does LIBTIFF read TIFF files?
Ideally I would like the TIFF file to be read vertically from top to bottom, then shift to the next column and start again from top to bottom. But my gut tells me that's a fairy tale. Being very new to image processing, I'd like to know how TIFF files are read, for example single-strip ones.
Additional info
I think I should add why I want the pixel locations. I am trying to build a cylindrical roll printer, using an optical rotary encoder to provide location feedback. The height of the document represents the circumference of the roll surface, and the width of the document represents the number of revolutions.
The following is the grossly untested logic of my ENC_reg(), pretty much unrelated to the image-processing part, but it may help readers see what I am trying to do with the processed array of a TIFF image. i is indexed from 0 to imageHeight; however, each element may hold an arbitrary number from 0 to imageHeight (could be 112, 354, etc.) corresponding to whatever location contains black in the TIFF image. j is indexed from 0 to imageWidth, and each element there also runs from 0 to imageWidth. For example, ptLocArray[1][2] means the first location that has a black pixel in column 2; the value stored there could be 231, i.e. the 231st pixel counting from the top of column 2.
I realized that the array should be array[maxnum_blackPixelPerColumn][imageWidth] instead of array[imageHeight][imageWidth], but I don't know how to count the number of 1s or 0s per column, because I don't know how to read out the 1s and 0s in the right order, as I mentioned earlier.
void ENC_reg()
{
    while (true) // loop instead of the original recursive ENC_reg() call, which would overflow the stack
    {
        if (inputPort_ENC_reg == true) // true if an encoder pulse was received
        {
            ENC_regTemp++; // count one encoder pulse per signal on the port
            if (ENC_regTemp == ptLocArray[i][j])
            {
                Rollprint(); // print it
                i++; // advance to the next "black" location in this column
            }
            if (ENC_regTemp == imageHeight) // end of column reached: end of a revolution (checked on every pulse, not only after a print)
            {
                j++; // jump to the next column of the image (start the next revolution)
                i = 0; // reset i to the top of the next column
                ENC_regTemp = 0; // reset the encoder count for the new revolution
            }
        }
    }
}
As I read your example, you want two arrays for each column of the image: one containing the indices of rows with a black pixel in that column, the other the indices of rows with a white pixel. Right so far?
This is certainly possible, but it would take more memory than the image itself. Uncompressed, a single pixel of a grayscale image takes one byte, and with bit packing you can cram 8 pixels into a byte. On the other hand, unless you restrict yourself to images with no more than 255 rows, you'll need multiple bytes to represent each row index, plus extra logic to mark the end of each column's list.
My point is: try working with the "array of 0s and 1s", as it is easier to produce, less demanding in terms of memory, and more efficient to process.
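As a starting point, a minimal libtiff sketch that produces that array of 0s and 1s (untested; it assumes 8 bits per sample, skips most error checks, and readBilevel is a made-up name). It also answers the read-order question: TIFFReadScanline hands you rows from top to bottom, each row running left to right:

#include <tiffio.h>
#include <vector>
#include <cstdint>

// Read a grayscale TIFF into a row-major array of 0s (black) and 1s (white).
std::vector<uint8_t> readBilevel(const char* path, uint32_t& w, uint32_t& h)
{
    TIFF* tif = TIFFOpen(path, "r");
    if (!tif) return {}; // open failed

    TIFFGetField(tif, TIFFTAG_IMAGEWIDTH, &w);
    TIFFGetField(tif, TIFFTAG_IMAGELENGTH, &h);

    std::vector<uint8_t> pixels(w * h);
    std::vector<uint8_t> scanline(TIFFScanlineSize(tif));

    for (uint32_t row = 0; row < h; ++row)     // rows come back top to bottom
    {
        TIFFReadScanline(tif, scanline.data(), row, 0);
        for (uint32_t col = 0; col < w; ++col) // each row runs left to right
            pixels[row * w + col] = (scanline[col] > 127) ? 1 : 0; // threshold to 0/1
    }
    TIFFClose(tif);
    return pixels;
}

Note that real black-and-white TIFFs are often 1 bit per sample; in that case each scanline byte holds 8 pixels and you would unpack bits instead of thresholding bytes, so check TIFFTAG_BITSPERSAMPLE and TIFFTAG_PHOTOMETRIC first.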
I'm working in Quartz/Core Graphics. I'm trying to create a black-and-white, 1-bit-per-pixel graphics context.
I currently have a CGImageRef holding a grayscale image (which is really black and white). I want to draw it into a black-and-white bitmap context so I can get the bitmap out and compress it with CCITT Group 4. (For some reason Quartz won't let you save TIFF with any compression other than LZW.)
So I need the 1-bit-per-pixel data, and I figured that drawing into a 1 bpp context would give me that. However, it won't let me create the context with:
context = CGBitmapContextCreate(data,
                                pixelsWide,
                                pixelsHigh,
                                1,              // bits per component
                                pixelsWide / 8, // bytes per row
                                CGColorSpaceCreateDeviceGray(),
                                kCGImageAlphaNone);
Is there a colorspace smaller than gray?
Even if 1-bit bitmaps were supported, if pixelsWide is not a multiple of 8, then the number of bytes per row is not an integer: for example, if your image is 12 pixels wide, a row needs one and a half bytes. Your integer division truncates that to one byte per row, which is wrong; the usual fix is to round up, (pixelsWide + 7) / 8.
But that's if 1-bit bitmaps were supported, which they aren't.
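One workaround, sketched below under the assumption that an 8-bit grayscale context is acceptable as an intermediate (packTo1bpp is a made-up helper, the 128 threshold is arbitrary, and error handling is omitted): render into a supported 8-bit gray context, then pack the bits yourself with the row stride rounded up to whole bytes.

#include <CoreGraphics/CoreGraphics.h>
#include <cstdlib>

// Render 'image' into an 8-bit gray context, then pack it to 1 bit per pixel.
// Caller frees the returned buffer; bytesPerRow1 receives the packed stride.
uint8_t *packTo1bpp(CGImageRef image, size_t pixelsWide, size_t pixelsHigh,
                    size_t *bytesPerRow1)
{
    // 1. Draw into a supported 8-bit grayscale context.
    CGColorSpaceRef gray = CGColorSpaceCreateDeviceGray();
    uint8_t *gray8 = (uint8_t *)malloc(pixelsWide * pixelsHigh);
    CGContextRef ctx = CGBitmapContextCreate(gray8, pixelsWide, pixelsHigh,
                                             8, pixelsWide, gray, kCGImageAlphaNone);
    CGContextDrawImage(ctx, CGRectMake(0, 0, pixelsWide, pixelsHigh), image);
    CGContextRelease(ctx);
    CGColorSpaceRelease(gray);

    // 2. Pack each row to 1 bpp, rounding the stride up to whole bytes.
    size_t stride = (pixelsWide + 7) / 8;
    uint8_t *packed = (uint8_t *)calloc(stride * pixelsHigh, 1);
    for (size_t y = 0; y < pixelsHigh; y++)
        for (size_t x = 0; x < pixelsWide; x++)
            if (gray8[y * pixelsWide + x] >= 128)              // threshold gray to b/w
                packed[y * stride + x / 8] |= 0x80 >> (x % 8); // MSB-first bit packing

    free(gray8);
    *bytesPerRow1 = stride;
    return packed;
}

The packed buffer (MSB-first within each byte, set bits meaning white under this thresholding) is then ready to hand to a CCITT Group 4 encoder.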