How can I run code directly on a processor with a file system? - c++

I have a simple anisotropic-filter C/C++ program that processes a .pgm image, which is a text file with greyscale information for each pixel, and after processing it generates an output image with the filter applied.
The program takes a few seconds to do about 10 iterations on an x86 CPU running Windows.
A colleague finishing his master's degree in applied computing and I need to run the code on an FPGA (Altera DE2-115) to see whether there is a considerable performance gain when running it directly on the processor (Nios II).
We have successfully booted the uClinux OS on the FPGA, but there are some hardware device errors, and because of them we can't access the SD card or Ethernet, so we can't get the code and the image onto the FPGA to test its performance.
So I'm asking for an alternative way to test our code's performance directly on a CPU with a file system, so the code can read the image and generate another one.
The alternative can be either a low-cost, easy-to-use product (I was thinking of a Raspberry Pi), or somewhere I could upload the code so that it runs automatically and gives me the reports.
Thanks in advance.

What you're trying to do is benchmark some software on a multi-GHz x86 processor vs. a soft-core processor running at 50 MHz (as far as I can tell from the Altera docs)?
I can guarantee that it will be even slower on the FPGA! Since it is also running an OS (even embedded Linux), it also has threading overhead and whatnot. This cannot be considered running it "directly" on the CPU (whatever you mean by that).
If you really want to leverage the performance of an FPGA, you should "convert" your C code into an HDL and run it directly in hardware. Accessing the data should be possible; I don't know how it's done with an Altera board, but Xilinx has some libraries for accessing data on an SD card with FAT.

You can use the on-board SRAM or DDR2 RAM to run the OS and your application.
The hardware design in your FPGA must include a memory controller. In SOPC Builder or Qsys, select the external memory as the reset vector and compile the design.
Then open the Nios II Software Build Tools for Eclipse.
In Eclipse, create a new project by selecting "Nios II Application and BSP from Template".
Once the project is created, go to the BSP properties, type the offset of the external memory in the Linker tab, and generate the BSP.
Compile the project and run it as Nios II hardware.
This will run your application from external memory.
You won't be able to see the image, but the 2-D array representing the image in memory can be printed on the console.
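For instance, here is a minimal sketch of that idea (the array name and dimensions are placeholders for whatever your filter uses): it dumps the result in ASCII PGM (P2) format, so the console output captured on the host can be saved directly as a .pgm file and viewed there.

    #include <cstdio>

    // Placeholder dimensions and buffer; substitute your filter's output.
    const int WIDTH = 64, HEIGHT = 64;
    unsigned char image[HEIGHT][WIDTH];

    // Print the processed image to the console in ASCII PGM (P2) format:
    // magic number, dimensions, max grey value, then one pixel per token.
    // Capture the console output on the host and save it as output.pgm.
    void dump_pgm(void) {
        std::printf("P2\n%d %d\n255\n", WIDTH, HEIGHT);
        for (int y = 0; y < HEIGHT; ++y) {
            for (int x = 0; x < WIDTH; ++x)
                std::printf("%d ", image[y][x]);
            std::printf("\n");
        }
    }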

Related

Inner workings of Raspberry Pi userland graphics driver (not firmware or kernel part)

I'm trying to understand the userland part of the Raspberry Pi graphics driver code from https://github.com/raspberrypi/userland
My understanding so far is:
- a firmware blob runs in the GPU and offers an OpenGL-like interface which, on lower levels, is based on message (byte-array) passing on top of one of multiple 28-bit-word FIFOs called VCHIQ (the other VCHIQ queues are irrelevant for graphics)
- on the CPU part, OpenGL calls are turned into messages to the GPU. Access to the low-level facility (either the message queue or VCHIQ -- I haven't found that part yet in the code) requires a Linux kernel module, but no high-level logic happens in there.
- the GPU part is closed, but that's okay for my purposes. The (ARM) CPU part is, AFAIK, open
My ultimate goal is to get communication with the GPU working on bare metal (without Linux), but with the closed firmware blob intact. As a first goal, I want to understand how an OpenGL call is actually passed to the GPU. Anything beyond that is not part of this question.
However, I'm stuck at finding the actual code for this. The OpenGL calls use RPC_CALL* and in turn RPC_DO, which calls khronos_server_lock_func_table(). However, that function seems to be missing from the code, and to my surprise, I couldn't find anything useful about it on Google.
My questions:
- am I still on the ARM CPU side, or did I move to GPU land without noticing? If the latter is the case, where did I cross that line?
- Assuming I'm still on the CPU side -- where is the code for that function? Is it open at all, or do we actually have closed parts left around on the CPU side here? All sources on the web seem to indicate that the code for the CPU is 100% open.
- at which point does the implementation of the C OpenGL functions actually send a message to the GPU? I'm somewhat expecting a call to the kernel functionality that represents VCHIQ to be happening at some point, probably implemented as a device file.
I don't fully understand how you intend to access the GPU without using Linux, and I am not that familiar with the technicalities, but some time ago I was digging into the GPU for a private project, so I'll tell you what I know.
The GPU is VideoCore IV and its documentation is available on Broadcom's website.
Also, on the Raspberry Pi Wiki you can see in the picture on the left that VCHIQ is in the kernel driver, so you might look for the implementation details in the kernel's source code.
Maybe this might be of some help too: VideoCore IV Programmer's Manual. About the document:
This is an independent documentation project based on a combination of static analysis and trial and error on real hardware. This work is 100% independent from, and not sanctioned by or connected with, Broadcom or its agents. No Broadcom documents or materials were used beyond those publicly available.
As for the software itself, The Khronos Group provides OpenGL ES and OpenVG implementations, but they're not open source. You can get the documentation from their website, but I doubt you'll find anything at such a low level.
Hope it helps.

Limiting processor count for multi-threaded applications

I am developing a multi-threaded application which ran fine on my development system, which has 8 cores. When I ran it on a PC with 2 cores, I encountered some synchronization issues.
Apart from turning off hyper-threading, is there any way of limiting the number of cores an application can use, so that I can emulate single- and dual-core environments for testing and debugging?
My application is written in C++ using Visual Studio 2010.
We always test in virtual machines nowadays since it's so easy to set up specific environments with given limitations.
For example, VMWare easily allows you to limit the number of processors in use, how much memory there is, hard disk sizes, the presence of USB or floppies or printers and all sorts of other wondrous things.
In fact, we have scripts which do all the work at the push of a button, from restoring the VM to a known initial state, then booting it up, installing the code over the network, running a test cycle then moving the results to an analysis machine on the network as well.
It greatly speeds up and simplifies the testing regime.
You want the SetProcessAffinityMask function or the SetThreadAffinityMask function.
The former works on the whole process and the latter on a specific thread.
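For example, a minimal sketch of the process-wide variant; the two-CPU mask is an assumption about the test machine (it must have at least two logical processors):

    #include <windows.h>
    #include <iostream>

    int main() {
        // Bits 0 and 1 set: the process may run only on logical CPUs 0 and 1,
        // which emulates a dual-core machine for testing.
        DWORD_PTR mask = 0x3;
        if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
            std::cerr << "SetProcessAffinityMask failed: " << GetLastError() << '\n';
            return 1;
        }
        // ... start the multi-threaded code under test here ...
        return 0;
    }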
You can also limit the active cores via the Windows Task Manager. Right-click on the process name and select "Set Affinity".

Is there any good way to get an indication of whether a computer can run a specific program/software?

Is there any good way to get an indication of whether a computer is capable of running a program/software without any performance problems, using pure JavaScript (Google V8) or C++ (Windows, Mac OS & Linux), while requiring as little information as possible from the software creator (like a CPU score or GPU score)?
That way I can give my users a good indication of whether their computer is good enough to run the software, so the user doesn't need to download and install it in the first place if he/she won't be able to run it anyway.
I'm thinking of something like score-based indications:
CPU: 230 000 (generic processor score)
GPU: 40 000 (generic GPU score)
+ network/file I/O read/write requirements
That way I only need to calculate those scores on the user's computer and then compare them, as long as I'm using the same algorithm, but I have no clue about any such algorithm that would be sufficient for real-world desktop software.
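For illustration, the kind of naive measurement I have in mind is sketched below; the workload and the iterations-per-millisecond scale are arbitrary placeholders, not a validated benchmark:

    #include <chrono>
    #include <cstdint>
    #include <iostream>

    // Naive "CPU score" sketch: time a fixed integer workload and report
    // iterations per millisecond. Real benchmark suites weight several
    // workloads (integer, floating point, memory) to get a meaningful number.
    int main() {
        const std::uint64_t iterations = 100000000;
        volatile std::uint64_t sink = 0;  // volatile keeps the loop from being optimized away

        auto start = std::chrono::steady_clock::now();
        for (std::uint64_t i = 0; i < iterations; ++i)
            sink = sink + (i ^ (i >> 3));
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start).count();

        std::cout << "CPU score: "
                  << (ms ? iterations / static_cast<std::uint64_t>(ms) : 0)
                  << " iterations/ms\n";
        return 0;
    }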
I would suggest testing for the existence of specific libraries and environment features (OS version, video card presence, working sound drivers, DirectX, OpenGL, Gnome, KDE). Assign priorities to these and compare using the priorities; e.g., video card presence is more important than KDE presence.
The problem is, even outdated hardware can run most software without issues (just more slowly), whereas the newest hardware cannot run some software without installing its requirements first.
For example, I can run Firefox 11 on my Pentium III Coppermine (using FreeBSD and an X server), but if you install Windows XP on the newest hardware with a six-core i7 and an nVidia GTX 640, it still cannot run DirectX 11 games.
This method requires no assistance from the software creator, but is not 100% accurate.
If you want 90+% accurate information, make the software creator check 5-6 checkboxes before uploading. Example:
My application requires DirectX/OpenGL/3D acceleration
My application requires sound
My application requires Windows Vista or later
My application requires a [high-bandwidth] network connection
then you can test specific applications using information from these checkboxes.
Edit:
I think additional checks could be:
video/audio codecs
pixel/vertex/geometry shader version, GPU physics acceleration (may be crucial for games)
not so much related anymore: processor extensions (SSE2, MMX, etc.; sketched at the end of this answer)
third-party software such as PDF readers, Flash, etc.
system libraries (libpng, libjpeg, svg)
system version (Service Pack number, OS edition (Premium, Professional, etc.))
window manager (some apps on OS X require X11 to function, some apps on Linux work only on KDE, etc.)
These are actual requirements I (and many others) have seen when installing different software.
As for old hardware, if the computer satisfies hardware requirements (pixel shader version, processor extensions, etc), then there's a strong reason to believe the software will run on the system (possibly slower, but that's what benchmarks are for if you need them).
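As an illustration of the processor-extension check, here is a small MSVC-specific sketch using the __cpuid intrinsic (GCC and Clang provide __get_cpuid in cpuid.h instead); the bit positions come from Intel's CPUID documentation:

    #include <intrin.h>
    #include <iostream>

    int main() {
        int info[4];      // EAX, EBX, ECX, EDX for CPUID leaf 1
        __cpuid(info, 1);
        // Feature bits per Intel's documentation:
        bool mmx  = (info[3] & (1 << 23)) != 0;  // EDX bit 23
        bool sse2 = (info[3] & (1 << 26)) != 0;  // EDX bit 26
        bool sse3 = (info[2] & (1 << 0))  != 0;  // ECX bit 0
        std::cout << "MMX: "  << mmx  << ", SSE2: " << sse2
                  << ", SSE3: " << sse3 << '\n';
        return 0;
    }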
For GPUs, I do not think getting a score is usable/possible without running some code on the machine to test whether it is up to spec.
With GPUs this typically means checking which shader models the card can use, and then either defaulting to a lower shader model (so the application runs at lower visual quality) or telling the user they have no hope of running the code, and quitting.

C++ code for CPU load and CPU temperature

I want to see the CPU temperature and CPU load in Windows. I have to write it myself rather than use software like Core Temp. How can I access this information?
I read a question similar to mine, but there was no useful answer. :(
Recently I started a similar project. I needed to read the CPU temperature and control the fan in Linux and Windows. I don't know much about C++, Visual Studio, or the DDK, but I figured out how to write a simple kernel driver and a simple program with WinRing0. In my laptop (and most others), the temperature and the fan are controlled by the embedded controller (EC). You have two choices: you can write a kernel driver, or you can use a library to access the embedded controller. That's because Windows protects the EC from being accessed with normal user rights. A good (and working) library is WinRing0 (WinRing0_1_3_1b). A useful program for inspecting the EC and everything else in Windows is the RW tool.
Take a look at "Getting CPU temp" on the MSDN forums; there are a few approaches.
As for the sane way, you can use the Win32_TemperatureProbe class, which gets its data from the SMBIOS.
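A minimal sketch of that query through WMI's COM API follows; error handling is trimmed to the essentials, and note that on many machines the BIOS does not populate CurrentReading, so the value may come back empty:

    #define _WIN32_DCOM
    #include <comdef.h>
    #include <Wbemidl.h>
    #include <iostream>
    #pragma comment(lib, "wbemuuid.lib")

    int main() {
        // Boilerplate: initialize COM and connect to the CIMV2 namespace.
        HRESULT hr = CoInitializeEx(nullptr, COINIT_MULTITHREADED);
        if (FAILED(hr)) return 1;
        CoInitializeSecurity(nullptr, -1, nullptr, nullptr,
                             RPC_C_AUTHN_LEVEL_DEFAULT, RPC_C_IMP_LEVEL_IMPERSONATE,
                             nullptr, EOAC_NONE, nullptr);
        IWbemLocator* locator = nullptr;
        hr = CoCreateInstance(CLSID_WbemLocator, nullptr, CLSCTX_INPROC_SERVER,
                              IID_IWbemLocator, reinterpret_cast<void**>(&locator));
        if (FAILED(hr)) { CoUninitialize(); return 1; }
        IWbemServices* services = nullptr;
        hr = locator->ConnectServer(_bstr_t(L"ROOT\\CIMV2"), nullptr, nullptr,
                                    nullptr, 0, nullptr, nullptr, &services);
        if (FAILED(hr) || !services) { locator->Release(); CoUninitialize(); return 1; }
        CoSetProxyBlanket(services, RPC_C_AUTHN_WINNT, RPC_C_AUTHZ_NONE, nullptr,
                          RPC_C_AUTHN_LEVEL_CALL, RPC_C_IMP_LEVEL_IMPERSONATE,
                          nullptr, EOAC_NONE);

        // Query the temperature probes reported by the SMBIOS.
        IEnumWbemClassObject* rows = nullptr;
        services->ExecQuery(_bstr_t(L"WQL"),
                            _bstr_t(L"SELECT * FROM Win32_TemperatureProbe"),
                            WBEM_FLAG_FORWARD_ONLY | WBEM_FLAG_RETURN_IMMEDIATELY,
                            nullptr, &rows);
        IWbemClassObject* row = nullptr;
        ULONG fetched = 0;
        while (rows && rows->Next(WBEM_INFINITE, 1, &row, &fetched) == S_OK) {
            VARIANT v;
            VariantInit(&v);
            // CurrentReading is specified in tenths of degrees Kelvin.
            row->Get(L"CurrentReading", 0, &v, nullptr, nullptr);
            if (v.vt == VT_I4)
                std::cout << "CurrentReading: " << v.lVal << "\n";
            else
                std::cout << "CurrentReading not reported by this BIOS\n";
            VariantClear(&v);
            row->Release();
        }
        if (rows) rows->Release();
        services->Release();
        locator->Release();
        CoUninitialize();
        return 0;
    }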

How do you structure unit tests for cross-compiled code?

My new project is targeting an embedded ARM processor. I have a build system that uses a cross-compiler running on an Ubuntu Linux box.
I can't see how to run unit tests on the ARM device itself (somebody correct me if I'm wrong). I think that my best option is to compile the code on the build machine using its own native compiler for the unit tests. Is this approach fundamentally flawed? Is unit testing on a different platform a waste of time?
I'm planning to use CppUnit on the build machine using the native compiler for the unit tests. Then I'll cross compile the code for the ARM processor and do integration and system testing on the target device itself. How would you structure the source code and the test code to keep this from turning into a tangled mess?
With an embedded device, it depends on what (hardware) interfaces you have.
For example, the motion control cards I deal with use a command-line interface. The IDE they ship uses it as its primary method of interacting with the cards. It works the same way regardless of whether I am using PCI, IDE, serial, or Ethernet.
The DLL they ship for programming gives access to that command-line interface, so I can send a string and read back the response. What I do for my unit tests is have a physical card hooked up to (or installed in) my development machine. I send it commands after uploading the software, read the responses, and if they are correct, the test passes.
I also have extra hardware, a black box if you will, that simulates a machine the motion control card is normally hooked up to. It helps with the automated tests, but there is a manual phase, as I have to set switches to simulate different setups on the machine.
I have achieved a greater degree of automation by taking a digital I/O card and feeding its outputs into the inputs of the motion control card, and vice versa.
I have found that for most hardware you have to have some type of simulator hardware,
the exception being the rare package that comes with a software simulator.
I know this isn't ideal, as not every developer can have one of these on their desk. My hardware simulator is portable, so I can give it to whoever is working on the motion control software at the time. If it can't be portable, then a dedicated testing or hardware-development computer would be in order.
Finally, it boils down to the specifics of your hardware and what support the manufacturer gives in terms of software and simulators. To help you more, you will need to post more specifics.
In ten-plus years in the embedded industry, I've seen it done quite a few ways. At my current company:
one of our products has enough horsepower (and space) to run tests on the target board. It's somewhat slow, and we can't put all the Python on the box we'd like, but it works well.
one of our products doesn't have the space, so we compile all the libs we can for x86 (anything that isn't hardware-dependent) and run unit tests on desktops (see the sketch after this list). It's not perfect, but far better than nothing.
one of our components is a super-lightweight power miser on exotic hardware, so virtually no unit tests are possible. Core algorithms (DES, etc.) are tested on x86 as above, but much of the code simply has to be tested as a whole, in situ. This entails a lot of code reviews.
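A minimal sketch of that x86-vs-target split, with all names illustrative: keep the core logic behind a small hardware-abstraction interface, so the native test build links a fake while the cross-compiled target build links the real driver.

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Thin hardware-abstraction interface (illustrative): the core logic
    // only ever talks to this, never to registers or device files directly.
    struct IAdc {
        virtual ~IAdc() {}
        virtual int readSample() = 0;
    };

    // Hardware-independent logic; this is what gets unit tested natively.
    int averageSamples(IAdc& adc, int count) {
        int sum = 0;
        for (int i = 0; i < count; ++i)
            sum += adc.readSample();
        return count ? sum / count : 0;
    }

    // Fake used only in the x86 test build; the target build would link a
    // RealAdc that talks to the actual peripheral instead.
    struct FakeAdc : IAdc {
        std::vector<int> samples;
        std::size_t next = 0;
        int readSample() override { return samples[next++ % samples.size()]; }
    };

    int main() {
        FakeAdc adc;
        adc.samples = {10, 20, 30};
        assert(averageSamples(adc, 3) == 20);  // runs on the build machine, no hardware needed
        return 0;
    }

In a CppUnit setup the assert would live in a test fixture rather than main(), but the structural point is the same: the directory holding averageSamples() must not include any target-only headers.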