Mesa - glCopyPixels is much slower on iris compared to i965 - opengl

I'm working on a 3D application based on Blender 2.79 (which includes the old game engine), running on linux. Some more details about the system:
$ lsb_release -a | grep Description
Description: Ubuntu 22.04.1 LTS
$ uname -r
5.15.0-46-lowlatency
$ lscpu | grep Intel
Model name: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
The application doesn't have anything special about it except that it needs to copy every frame into memory. For this, bge.texture.ImageTexture::refresh is used with its buffer parameter (see the Blender docs).
The problem I've been seeing is that the time spent in the refresh call varies greatly depending on whether I use i965 or iris as the driver. With the newer one, iris, I cannot maintain 60 fps. I select the driver with the MESA_LOADER_DRIVER_OVERRIDE={i965,iris} environment variable.
Sticking with i965 is not an issue on my current hardware, but I know that i965 does not support newer hardware, in particular 12th-gen Intel CPUs.
I've produced some flame graphs when running each of those drivers. Here are the important (in my opinion) bits:
i965 (fast)
iris (slow)
It looks like with i965, after _mesa_ReadPixelsARB, the code jumps into the Intel-specific part of Mesa, while with iris it goes through something called the state tracker. I suppose this is a module that bridges Mesa and Gallium3D, but I'm not confident that this information is helpful.
I also attached a debugger to the Blender process and found that _st_ReadPixels seems to enter the "slow" path, according to a comment by the author (see source).
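For reference, the per-frame copy that refresh() triggers boils down to a plain glReadPixels into client memory, roughly like the sketch below (my own illustration, not the actual BGE code; the format, type, and packing are assumptions):

#include <GL/gl.h>
#include <cstdint>
#include <vector>

// Minimal sketch of a per-frame readback (not the actual BGE code).
// This is the call that ends up in _mesa_ReadPixels / the state tracker.
void copy_frame_to_memory(int width, int height, std::vector<std::uint8_t>& out)
{
    out.resize(static_cast<std::size_t>(width) * height * 4);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);      // tightly packed destination buffer
    glReadPixels(0, 0, width, height,
                 GL_RGBA, GL_UNSIGNED_BYTE,   // format/type assumed for illustration
                 out.data());
}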
At this point I would greatly appreciate it if someone more proficient with Mesa and OpenGL could have a look at the information I have and help make sense of it.

Related

Intel Integrated Graphics misidentification (DXGI)

I'm filling a window with a blank swap chain that's being handled by DirectX 12. While fooling around a bit with Explicit Multi-Adapter, I came across this weird behaviour...
As shown in pretty much all DX12 demo code in existence so far, I loop through all DXGI adapters that I get using IDXGIFactory4::EnumAdapters1() to find the most suitable (or in my case every) adapter at D3D_FEATURE_LEVEL_11_0 or higher. And as shown in the demos, I discard all adapters that have the DXGI_ADAPTER_FLAG_SOFTWARE flag, like this:
if ((adapterDesc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) != FALSE)
    continue; // Check the next adapter.
In my implementation I then dump all compatible adapters into a std::vector to be used later.
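For context, the whole loop looks roughly like this (a simplified sketch: the container name is arbitrary, and it uses GetDesc1/DXGI_ADAPTER_DESC1 for brevity rather than the DXGI_ADAPTER_DESC2 I inspect later):

#include <dxgi1_4.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

// Collect every hardware adapter that can create a D3D12 device
// at D3D_FEATURE_LEVEL_11_0.
std::vector<ComPtr<IDXGIAdapter1>> EnumerateHardwareAdapters(IDXGIFactory4* factory)
{
    std::vector<ComPtr<IDXGIAdapter1>> adapters;
    ComPtr<IDXGIAdapter1> adapter;

    for (UINT i = 0;
         factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND;
         ++i)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);

        // Skip adapters that report themselves as software (WARP / MBR).
        if ((desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) != 0)
            continue;

        // Probe for D3D12 support without actually creating the device.
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        __uuidof(ID3D12Device), nullptr)))
        {
            adapters.push_back(adapter);
        }
    }
    return adapters;
}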
If I use a breakpoint to check how everything looks at runtime, I notice my adapter list only contains one adapter after the loop has exited, which is not what I would expect, as I have both an NVIDIA GeForce GT 650M and an Intel HD Graphics 4000.
By breaking during the loop and checking the DXGI_ADAPTER_DESC2 structure for each adapter, I find that the one I get is indeed the GT 650M, so that means my integrated graphics is identifying itself as a software adapter.
This is plausible on its own, but if you look at a picture of an Ivy Bridge die (which is what I have) you see a big area cordoned off as "Processor Graphics", which Intel themselves define like this: "Processor graphics refer to graphics that are physically in the processor package or integrated into the processor silicon." That just screams "hardware adapter" at me.
If I remove the above code block I do indeed get two adapters in my list, but the second one identifies itself as "Microsoft Basic Render Driver" and gives a vendor ID of 0x1414, while Google says that Intel usually returns 0x8086 as its ID. This list doesn't even mention the owner of 0x1414.
And, to make things even more confusing, if I check the Information Center in my Intel HD Graphics Control Panel it says it has a vendor ID of 0x8086!
Before anyone asks: yes, my drivers should be up to date; I updated them as soon as I noticed this. Strangely though, DxDiag gives me an incorrect driver date for the integrated graphics, and does the same (though slightly closer to the truth) for the GT 650M. The discrete GPU driver is WDDM 2.0, while the integrated graphics driver is WDDM 1.3, which might be relevant, because I think it should be 2.0 too. (Might the update have failed?)
The primary reason for the if (adapterDesc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) filter is to avoid selecting the Microsoft Basic Render Driver, which uses the WARP11 software device and does not support DirectX 12.
WARP11 is supported in all versions of Windows with DirectX 11. WARP12 is currently a developer-only device (i.e. it is only available when the "Graphics Tools" optional feature-on-demand is installed).
It's probably a bug if your discrete part is returning true for this flag. It might be a bug in your code, a driver bug, or some strange side-effect of Optimus-style selection. WARP / MBR is really the only thing that is expected to return DXGI_ADAPTER_FLAG_SOFTWARE.
You can also exclude MBR via an if ( ( adapterDesc.VendorId == 0x1414 ) && ( adapterDesc.DeviceId == 0x8c ) ) test against its well-known VendorID/DeviceID, but I suggest digging into your code to understand why you are incorrectly getting DXGI_ADAPTER_FLAG_SOFTWARE returned for hardware devices.
See Anatomy of Direct3D 11 Create Device

Getting Started with OpenCL with NVIDIA graphics cards and Ubuntu Linux

I am looking to start programming using OpenCL. I currently have a laptop running Ubuntu Linux. (More specifically it's Linux Mint, but the two are similar in many respects and I will be changing back to Xubuntu shortly, so I am hoping any info will work for both.)
This laptop is a "difficult" laptop because it has both an on-chip Intel graphics processor (side by side with the CPU) and a dedicated NVIDIA graphics card (I believe it is a GTX 670?). I say difficult because it was pretty complicated to install the drivers that let me develop with OpenGL. Even now I get confused sometimes when I run my program and it explodes because I didn't run it using 'optirun'.
Anyway, back to the question at hand: I researched the required software and keep being pointed at NVIDIA's site to download their OpenCL drivers / toolkits. However, I would prefer to use Khronos OpenCL rather than NVIDIA's CUDA. I don't fully understand what the difference is, however*, and online info is either limited or cryptic.
The actual programming / problem vectorization I have already done; I'm just a bit lost at the moment as to what software I should / must install and how to go about doing so.
*Edit: I find the OpenCL syntax more intuitive.
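Once things are installed, the kind of sanity check I have in mind is simply listing the available platforms; a minimal sketch (my own, assuming the standard Khronos CL/cl.h header, with whichever vendor driver/ICD is installed supplying the implementation behind it):

#include <CL/cl.h>
#include <cstdio>

// List the OpenCL platforms exposed by the installed vendor drivers/ICDs.
int main()
{
    cl_uint numPlatforms = 0;
    if (clGetPlatformIDs(0, NULL, &numPlatforms) != CL_SUCCESS || numPlatforms == 0)
    {
        std::printf("No OpenCL platforms found - is an OpenCL driver installed?\n");
        return 1;
    }

    cl_platform_id platforms[8];
    if (numPlatforms > 8)
        numPlatforms = 8;
    clGetPlatformIDs(numPlatforms, platforms, NULL);

    for (cl_uint i = 0; i < numPlatforms; ++i)
    {
        char name[256] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        std::printf("Platform %u: %s\n", (unsigned)i, name);
    }
    return 0;
}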

Is there any good way to get an indication of whether a computer can run a specific program/software?

Is there any good way to get an indication of whether a computer is capable of running a program/software without performance problems, using pure JavaScript (Google V8) or C++ (Windows, Mac OS & Linux), while requiring as little information as possible from the software creator (like a CPU score or GPU score)?
That way I can give my users a good indication of whether their computer is good enough to run the software, so they don't need to download and install it in the first place if they won't be able to run it anyway.
I'm thinking of something like score-based indications:
CPU: 230 000 (generic processor score)
GPU: 40 000 (generic GPU score)
+ Network/File I/O read/write requirements
That way I only have to calculate those scores on the user's computer and then compare them, as long as I'm using the same algorithm. But I have no clue about any such algorithm that would be sufficient for real-world desktop software.
I would suggest testing for the existence of specific libraries and environment features (OS version, video card presence, working sound drivers, DirectX, OpenGL, Gnome, KDE). Assign priorities to these and compare using those priorities, e.g. video card presence is more important than KDE presence.
The problem is that even outdated hardware can run most software without issues (just slower), while the newest hardware cannot run some software unless its requirements are installed.
For example, I can run Firefox 11 on my Pentium III Coppermine (using FreeBSD and an X server), but if you install Windows XP on the newest hardware with a six-core i7 and an nVidia GTX 640, it still cannot run DirectX 11 games.
This method requires no assistance from the software creator, but is not 100% accurate.
If you want 90+% accurate information, make the software creator check 5-6 checkboxes before uploading. Example:
My application requires DirectX/OpenGL/3D acceleration
My application requires sound
My application requires Windows Vista or later
My application requires a [high bandwidth] network connection
then you can test specific applications using information from these checkboxes.
Edit:
I think additional checks could be:
video/audio codecs
pixel/vertex/geometry shader version, GPU physics acceleration (may be crucial for games)
not so much related anymore: processor extensions (SSE2 MMX etc)
third party software such as pdf, flash, etc
system libraries (libpng, libjpeg, svg)
system version (Service Pack number, OS edition (Premium, Professional, etc.))
window manager (some apps on OSX require X11 for functioning, some apps on Linux work only on KDE, etc)
These are actual requirements I (and many others) have seen when installing different software.
As for old hardware, if the computer satisfies hardware requirements (pixel shader version, processor extensions, etc), then there's a strong reason to believe the software will run on the system (possibly slower, but that's what benchmarks are for if you need them).
For GPUs I do not think getting a score is usable/possible without running some code on the machine to test if the machine is up to spec.
With GPUs this typically means checking which shader models the card can use, and either defaulting to a lower shader model (so the application runs at reduced quality) or telling the user they have no hope of running the code and quitting.
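As a rough illustration (my own sketch, not a standard benchmark), something as small as creating a throwaway GL context and reading back the renderer and GLSL version strings is often enough to decide whether to fall back to a lower-quality path:

#include <GL/glut.h>
#include <cstdio>

#ifndef GL_SHADING_LANGUAGE_VERSION
#define GL_SHADING_LANGUAGE_VERSION 0x8B8C  // missing from very old gl.h headers
#endif

// Create a throwaway context and report what the driver offers.
// A launcher could parse this output to pick a fallback rendering path.
int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutCreateWindow("capability probe");  // a context must exist before glGetString

    std::printf("Renderer:     %s\n", (const char*)glGetString(GL_RENDERER));
    std::printf("GL version:   %s\n", (const char*)glGetString(GL_VERSION));

    const GLubyte* glsl = glGetString(GL_SHADING_LANGUAGE_VERSION);
    std::printf("GLSL version: %s\n", glsl ? (const char*)glsl : "not supported");
    return 0;
}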

OpenGL GLUT on VirtualBox Ubuntu 11.10 segmentation fault

DISCLAIMER:
I see that some suggestions for the exact same question come up; however, that (similar) post was migrated to Super User and seems to have been removed. I would still like to post my question here because I consider it software/programming related enough not to post on Super User (the line between what is a software issue and what is a hardware issue is sometimes vague).
I am running a very simple OpenGL program in Code::Blocks in VirtualBox with Ubuntu 11.10 installed on an SSD. Whenever I build & run a program I get these errors:
OpenGL Warning: XGetVisualInfo returned 0 visuals for 0x232dbe0
OpenGL Warning: Retry with 0x802 returned 0 visuals
Segmentation fault
From what I have gathered myself so far this is VirtualBox related. I need to set
LIBGL_ALWAYS_INDIRECT=1
In other words, enabling indirect rendering via X.org rather than communicating directly with the hardware. This issue is probably not related to the fact that I have an ATI card, as I have a laptop with an ATI card that runs the same program flawlessly.
Still, I don't dare to say that the fact that my GPU is an ATI doesn't play any role at all. Nor am I sure if the drivers are correctly installed (it says under System info -> Graphics -> graphics driver: Chromium.)
Any help on HOW to set LIBGL_ALWAYS_INDIRECT=1 would be greatly appreciated. I simply lack the knowledge of where to put this command or where/how to execute it in the terminal.
Sources:
https://forums.virtualbox.org/viewtopic.php?f=3&t=30964
https://www.virtualbox.org/ticket/6848
EDIT: in the terminal, type:
export LIBGL_ALWAYS_INDIRECT=1
To verify that direct rendering is off:
glxinfo | grep direct
However, the problem persists. I still get mentioned OpenGL warnings and the segmentation fault.
I ran into this same problem running the Bullet Physics OpenGL demos on Ubuntu 12.04 inside VirtualBox. Rather than using indirect rendering, I was able to solve the problem by modifying the glut window creation code in my source as described here: https://groups.google.com/forum/?fromgroups=#!topic/comp.graphics.api.opengl/Oecgo2Fc9Zc.
This entailed replacing the original
...
glutCreateWindow(title);
...
with
...
if (!glutGet(GLUT_DISPLAY_MODE_POSSIBLE))
{
    exit(1);
}
glutCreateWindow(title);
...
as described in the link. It's not clear to me why this should correct the segfault issue; apparently glutGet has some side effects beyond retrieving state values. It could be a quirk of freeglut's implementation of glut.
If you look at the /etc/environment file, you can see a couple of variables exposed there - this will give you an idea of how to expose that environment variable across the entire system. You could also try putting it in either ~/.profile or ~/.bash_profile, depending on your needs.
The real question in my mind is: did you install the Guest Additions for Ubuntu? You shouldn't need to install any ATI drivers in your guest, as VirtualBox won't expose the actual physical graphics hardware to your VM. You can enable 3D acceleration in the virtual machine settings (make sure you power off the VM first) under the Display section. You will probably want to boost the allocated video memory - 64MB or 128MB should be plenty, depending on your needs.

Running a CUDA app on only part of the cards

I've got an NVIDIA Tesla S2050 and a host with an NVIDIA Quadro card, running CentOS 5.5 with CUDA 3.1.
When I run my CUDA app, I want to use the four Tesla C2050 devices but not the Quadro on the host, so that splitting the job five ways equally doesn't drag down overall performance. Is there any way to implement this?
I'm assuming you have four processes and four devices, although your question suggests you have five processes and four devices, which means that manual scheduling may be preferable (with the Tesla devices in "shared" mode).
The easiest is to use nvidia-smi to specify that the Quadro device is "compute prohibited". You would also specify that the Teslas are "compute exclusive" meaning only one context can attach to each of these at any given time.
Run man nvidia-smi for more information.
Yes. Check CUDA Support/Choosing a GPU
Problem: Running your code on a machine with multiple GPUs may result in your code executing on an older and slower GPU.
Solution: If you know the device number of the GPU you want to use, call cudaSetDevice(N). For a more robust solution, include the code shown on that page at the beginning of your program to automatically select the best GPU on any machine.
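As a rough sketch of that kind of automatic selection for this particular setup (my own, using the CUDA runtime API and keyed on the device name so that the Quadro is skipped), it could look like:

#include <cuda_runtime.h>
#include <cstring>

// Pick the Tesla with the most multiprocessors and make it the current device.
// Returns the chosen device index, or -1 if no Tesla was found.
int pickTeslaDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;

    int best = -1;
    int bestSMs = -1;
    for (int dev = 0; dev < count; ++dev)
    {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        if (std::strstr(prop.name, "Tesla") == NULL)
            continue;  // skip the Quadro (and anything else that isn't a Tesla)
        if (prop.multiProcessorCount > bestSMs)
        {
            bestSMs = prop.multiProcessorCount;
            best = dev;
        }
    }
    if (best >= 0)
        cudaSetDevice(best);
    return best;
}

This is the in-code alternative; the nvidia-smi compute-mode approach in the first answer achieves a similar effect without modifying the application.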
Check their website for further explanation.
You may also find this post very interesting.