I have a glvideomixer sink that shows 16 720p 60 Hz videos simultaneously in a 4x4 matrix. When the sources of all 16 videos are 16 different H.264 main profile files, everything runs smoothly, but when I acquire the videos from 4 grabber cards (4 cards x 4 HDMI input ports, set to 1280x720 at 60 Hz, the same as the video files), the output stutters.
The pipeline is very simple:
glvideomixer(name=vmix)
ksvideosrc(device-index=0...15)->capsfilter(video/x-raw,format=YV12,framerate=60/1,height=720,width=1280)->vmix.sink_0...15
Note: The ksvideosrc element is only available on the Windows platform.
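For completeness, a minimal self-contained version of that pipeline built with gst_parse_launch (a sketch, not the production code: only two of the sixteen branches are shown, and depending on the GStreamer version an explicit glupload/glcolorconvert may be needed in front of each mixer pad):

#include <gst/gst.h>

int main(int argc, char *argv[])
{
    gst_init(&argc, &argv);

    /* Two of the sixteen branches, for brevity; the real pipeline repeats the
       ksvideosrc -> capsfilter -> vmix.sink_N branch with device-index 0..15. */
    const gchar *desc =
        "glvideomixer name=vmix ! glimagesink "
        "ksvideosrc device-index=0 ! "
        "video/x-raw,format=YV12,framerate=60/1,width=1280,height=720 ! vmix.sink_0 "
        "ksvideosrc device-index=1 ! "
        "video/x-raw,format=YV12,framerate=60/1,width=1280,height=720 ! vmix.sink_1";

    GError *error = NULL;
    GstElement *pipeline = gst_parse_launch(desc, &error);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        return -1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    g_main_loop_run(g_main_loop_new(NULL, FALSE));
    return 0;
}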
AFAIK the pipeline is GL based, so all the video streams are implicitly uploaded to a GL context, where glvideomixer treats them as GL textures. Am I right?
But I don't understand why everything runs smoothly when I use 16 video files, even though, in theory, that path is more complex because the computer must decode the streams before sending them to the GPU, while with the grabber cards the output stutters.
I'm pretty sure that the stream format of the cards is raw YV12, because I set the capsfilter element to explicitly request that format. Here is the link to the grabbers: http://www.yuan.com.tw/en/products/capture/capture_sc510n4_hdmi_spec.htm
I think the bottleneck is in the PCIe bus, but I'm not sure, because the GPU is an AMD FirePro W7100 running at x16 and the 4 grabber cards are PCIe x4 cards running at x4.
It should be noted that everything runs smoothly with up to 13 video signals from the grabbers; adding one more, the stutter shows up.
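To put rough numbers on the bandwidth theory (a back-of-the-envelope estimate, assuming YV12 at 1.5 bytes per pixel; these are not measurements):

1280 * 720 * 1.5 (bytes per YV12 pixel) * 60 (frames per second) ≈ 83 MB/s per stream
83 MB/s * 4 (streams per card) ≈ 332 MB/s per grabber card
83 MB/s * 16 (streams) ≈ 1.33 GB/s total into system memory, and roughly the same again up to the GPU

On paper even a PCIe 2.0 x4 link is good for about 2 GB/s per direction, so no single card's link should be saturated by ~332 MB/s; if that reasoning holds, the suspicion shifts towards system-memory copies, the upload path or CPU scheduling rather than the grabber slots themselves.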
So: how can I find out where the bottleneck is?
Many thanks in advance.
Edit:
The rig is:
MB: Asus x99-deluxe USB 3.1:
http://www.asus.com/Motherboards/X99DELUXEU31/
CPU: Hexacore i7-5930K Haswell 40 PCIe lanes:
http://ark.intel.com/es/products/82931/Intel-Core-i7-5930K-Processor-15M-Cache-up-to-3_70-GHz
RAM: Kingston Hyperx PC4 21300 HX426C15FBK2/8 dual channel setup:
http://www.kingston.com/dataSheets/HX426C15FBK2_8.pdf
GPU: AMD FirePro W7100 8 GB GDDR5 256-bit:
http://www.amd.com/en-us/products/graphics/workstation/firepro-3d/7100
HDD: Kingston SSD Now V300:
http://www.kingston.com/datasheets/sv300s3_en.pdf
Related
I am working with a DMD (digital micromirror device) from TI (DLP3010EVM-LC). It is basically a projector that can be controlled through USB and HDMI. The HDMI input is used to feed external image data to the projector, and thus to the DMD. I am planning to control the pixels of the DMD by sending an image of 1-bit data. My target frame rate is about 1000 fps; this seems to be possible with 1-bit pixel data, even though the HDMI port has a maximum of 60 fps. I am looking for a way to send image data from my Windows 10 PC's HDMI output using C/C++/OpenCV. I looked at a lot of different sites and forums but couldn't find a concrete way, or documentation, of how to program the HDMI port (in fact, I just need image output with no sound data). Some people in the Raspberry Pi community simply use imshow() from OpenCV for this, but it doesn't seem to work on a Windows PC.
Any help on this matter is appreciated.
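For reference, the kind of approach I have seen suggested is to treat the projector as a second monitor and draw a borderless full-screen window on it with OpenCV. A minimal sketch (the window position 1920,0 for the secondary display and the test pattern are assumptions; this path is still capped by the display refresh rate, so nowhere near 1000 fps):

#include <opencv2/opencv.hpp>

int main()
{
    // Hypothetical 1-bit pattern, stored as an 8-bit image (values 0 or 255).
    cv::Mat pattern(720, 1280, CV_8UC1, cv::Scalar(0));
    cv::circle(pattern, cv::Point(640, 360), 200, cv::Scalar(255), cv::FILLED);

    cv::namedWindow("dmd", cv::WINDOW_NORMAL);
    cv::moveWindow("dmd", 1920, 0); // assumed x offset of the HDMI/secondary display
    cv::setWindowProperty("dmd", cv::WND_PROP_FULLSCREEN, cv::WINDOW_FULLSCREEN);

    cv::imshow("dmd", pattern);
    cv::waitKey(0); // frames are only presented at the display refresh rate (typically 60 Hz)
    return 0;
}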
Back in the days of DOS, the way we showed graphics on computers was simply to copy raw image data into video memory for every frame.
Because of the limited bandwidth between the CPU and the GPU, this proves to be very inefficient. To send every frame to the screen at modern resolutions, we would need something like
1080 * 1200 * 4(color data) * 60(frames per second) = 311 megabytes every second.
So we preload the textures and vertices into GPU memory and just send the transformations.
So, how is HD video playback handled on modern hardware? Is there a way to compress every frame that is sent to the GPU, or do we just send the raw 311 MB/s like in the old days?
Assuming decompression is not being done, at least in part, on the GPU itself, then yes, you play video by uploading each frame's image to graphics memory.
Your math is off though. 1080p is 1920x1080. 30fps video at 1080p requires ~238MB/sec. And... that's perfectly doable. Even PCIe 1.0 x1 could handle that (though barely), and GPUs tend to use x16 slots, so 16x more bandwidth. And PCIe is at version 4.0 (on most machines), so it's much faster today.
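Spelling that out (a rough calculation for uncompressed RGBA frames, decimal megabytes):

1920 * 1080 * 4 (bytes per pixel) * 30 (frames per second) ≈ 249 MB/s (about 238 MiB/s)
1920 * 1080 * 4 * 60 (frames per second) ≈ 498 MB/s, still far below the ~4 GB/s of even a PCIe 1.x x16 link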
I'm developing a graphics application (a racing game) on a phytec phyBOARD iMX-6, with Qt 5.9 and OpenGL ES 2.0. I create the OpenGL context through Qt modules. My problem is that my game gets 40 fps when running from the SD card and 20 fps when running from flash. Why is the OpenGL ES frame rate so low on flash? The operating systems on the flash and the SD card are identical.
My first thought was that the performance decreased due to the low read/write speed of the flash. But my game only reads data from disk during the boot phase, and in the remaining stages it exchanges data with the disk in a very limited way. Therefore, it isn't very likely that the low performance is caused by disk read and write speeds.
Have you ever encountered a problem like this, where the OpenGL ES frame rate is low when the application is running from flash? Maybe a similar solution could help me.
I managed to solve it by pure luck. I added the line
PREFERRED_VERSION_mesa = "git"
to the local.conf file, and now I get the same fps on flash (40 fps) and on the SD card (40 fps).
My machine has one Intel graphics card and one Nvidia 1060 card.
I use the Nvidia GPU for object detection (YOLO).
Pipeline
---Stream--->Intel gpu (decode)----> Nvidia Gpu (Yolo)---->Renderer
I want to utilize both of my GPU cards: one for decoding frames (hardware acceleration via ffmpeg) and the other for YOLO. (Nvidia restricts the number of streams you can decode at one time to 1, but I don't see such a restriction with Intel.)
Has anyone tried something like this? Any pointers on how to do inter-GPU frame transfer?
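For what it's worth, a common pattern is to let the FFmpeg backend decode on the Intel GPU, bring each decoded frame into system memory, and then upload it to the Nvidia card for inference. A rough sketch with OpenCV (assuming OpenCV 4.5.2 or newer, built with FFmpeg and CUDA support; the stream URL and the YOLO call are placeholders):

#include <opencv2/opencv.hpp>
#include <opencv2/core/cuda.hpp>
#include <vector>

int main()
{
    // Ask the FFmpeg backend for any available hardware decoder; on a machine
    // like this it should pick up the Intel GPU (D3D11/QSV) rather than NVDEC.
    std::vector<int> params = { cv::CAP_PROP_HW_ACCELERATION, cv::VIDEO_ACCELERATION_ANY };
    cv::VideoCapture cap("rtsp://example/stream", cv::CAP_FFMPEG, params);
    if (!cap.isOpened())
        return -1;

    cv::Mat frame;             // decoded frame in system memory
    cv::cuda::GpuMat d_frame;  // same frame on the Nvidia GPU

    while (cap.read(frame) && !frame.empty())
    {
        d_frame.upload(frame); // host -> Nvidia GPU copy over PCIe
        // run_yolo(d_frame);  // placeholder: hand the GpuMat to the detector
    }
    return 0;
}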
I am trying to use OpenCV to read and display a 4K video file. The same program, a very simple one shown in Appendix A, works fine when displaying 1080p videos, but there is noticeable lag when upgrading to the 4K video.
Obviously there are now four times as many pixels in every operation.
I am generally running this on a PC with modest specifications: integrated graphics, 4 GB RAM, an i3 CPU and an HDD (not an SSD). I have also tested it on a PC with 8 GB RAM, an i5 and an SSD; although 3.x GB of RAM is used, it seems to be mainly a CPU-intensive program and it maxes all my cores out at 100% even on the better PC.
My questions are (to make this post specific):
Is this something that would be helped by using GPU operations?
Is this a problem that would be solved by upgrading to a PC with a better CPU? In practice this application will at most be run on an i7, as I don't imagine we are going to buy a server CPU...
Is it the drawing-to-screen operation or simply reading from disk that is causing the slowdown?
If anyone has any past experience of using 4K with OpenCV, that would also be useful information.
Appendix A
#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>

using namespace cv;

int main()
{
    const std::string m_selected_video = "video_4k.mp4"; // placeholder path
    bool _continue = true;

    VideoCapture cap(m_selected_video);
    if (!cap.isOpened()) // check if we succeeded
    {
        std::cout << "Video ERROR";
        return -1;
    }
    while (_continue)
    {
        Mat window1;
        cap >> window1;       // get a new frame from the video
        if (window1.empty())  // end of the video
            break;
        imshow("Window1", window1);
        if (waitKey(30) >= 0) break;
    }
    return 0;
}
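If it helps, here is a variant of the same loop that I could use to separate the decode time from the display time (a rough sketch; the path is a placeholder and the printed numbers are only indicative):

#include <opencv2/opencv.hpp>
#include <cstdint>
#include <iostream>

int main()
{
    cv::VideoCapture cap("video_4k.mp4"); // placeholder path
    if (!cap.isOpened())
        return -1;

    cv::Mat frame;
    while (true)
    {
        int64_t t0 = cv::getTickCount();
        cap >> frame;                      // read from disk + decode
        int64_t t1 = cv::getTickCount();
        if (frame.empty())
            break;
        cv::imshow("Window1", frame);      // draw to the screen
        int64_t t2 = cv::getTickCount();

        double freq = cv::getTickFrequency();
        std::cout << "decode: " << (t1 - t0) * 1000.0 / freq << " ms, "
                  << "display: " << (t2 - t1) * 1000.0 / freq << " ms\n";

        if (cv::waitKey(1) >= 0)
            break;
    }
    return 0;
}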
The answer to this question is an interesting one, but I think it boils down to the codecs or encoding of the videos.
The first video I was using was this one (although that might not be the exact download I used), which doesn't seem to play very well in VLC or OpenCV but does play fine in Windows Media Player. I think this is because it is encoded in MPEG AAC audio.
I then downloaded an Elysium 4K trailer, which is H.264 encoded and seems to work fine in both VLC and OpenCV. So hooray, 4K isn't a problem overall in OpenCV!
So I thought it might be file sizes. I paid for and downloaded a 7 GB, 6-minute 4K video. This plays fine in both OpenCV and VLC with no lag, even when drawing it to the screen three times. This is a .mov file and I don't currently have access to the codec (I'll update this bit when I do).
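(For when I do get to it: a small sketch of how the codec tag can be read straight from OpenCV via CAP_PROP_FOURCC; the file path is a placeholder.)

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::VideoCapture cap("clip.mov"); // placeholder path
    if (!cap.isOpened())
        return -1;

    // CAP_PROP_FOURCC comes back as a double packing the four character codes.
    int fourcc = static_cast<int>(cap.get(cv::CAP_PROP_FOURCC));
    char code[5] = {
        static_cast<char>(fourcc & 0xFF),
        static_cast<char>((fourcc >> 8) & 0xFF),
        static_cast<char>((fourcc >> 16) & 0xFF),
        static_cast<char>((fourcc >> 24) & 0xFF),
        '\0'
    };
    std::cout << "FOURCC: " << code << "\n";
    return 0;
}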
So TL;DR: it's not file sizes or container types that cause the issues, but there does seem to be an issue with certain codecs. This is only a small exploration and there might be other issues.
Addendum: thanks to cornstalks in the comments, who pointed out that WMP might have built-in GPU support and suggested doing any testing in VLC, which was very helpful.