I'm currently programming a GameBoy Classis Emulator. Here is the GitHub-repo (https://github.com/FelixWeichselgartner/GameBoy-Classic-Emulator).
The CPU instructions seem to be working fine. I compared the instructions with the ones of this gameboy debugger (http://bgb.bircd.org/). For Tetris im executing the same instructions.
My question is about the graphics. I have implemented a function that fetches the tiles from the correct address (depending on which tiles set is used). However I don't know how to initialise the Video Ram (# adress 0x8000). I copied the 32kB Tetris rom in my memory from adress 0x0000 to 0x7FFF. Therefore everything from 0x8000 is not initialised here. Neither in the Debugging tool, nor in my code is something written to the vram (from the cpu opcode instructions).
Therefore I expected I will have to initialise the VRAM. However I could not find any resources online, when something is written to VRAM.
So my question:
Which instance of a gameboy emulator is responsible for copying tiles in the VRAM.
What I have already tried:
Debugging with another emulator -> this showed me that no cpu instructions copy to VRAM.
Looking at various gameboy emulators on Github -> could'nt find anyone initialising the VRAM
I someone was able to help me I would be very thankful.
Greetings
schnauzbartS
On Gameboy Classic there's only one way to initialize VRAM - manually copy data with CPU instructions. It's game's responsibility. You can see this, for example, in bootstrap ROM:
XOR A ; $0003 Zero the memory from $8000-$9FFF (VRAM)
LD HL,$9fff ; $0004
Addr_0007:
LD (HL-),A ; $0007
BIT 7,H ; $0008
JR NZ, Addr_0007 ; $000a
On Gameboy Color there's another way to write to VRAM - DMA. But, again, game has to explicitly trigger it. Gameboy doesn't do anything on its own.
Related
I use C++ (Visual Studio 2015) and OpenCV (ver 3.2.0) to process data sent from Kinect v1. My C++ program has no problem when it starts debugging for the first time. After it stops debugging and re-start debugging, however, it gets very slow.
I am suspecting that the program closes without releasing some memory (i.e., memory leak). I am aware of that I would need to use the delete function to release the memory if I use the new function. But I didn't use the new function in the C++ program (I neither used the malloc() function, which is equivalent to the new function in C programs).
For OpenCV, I use the destroyAllWindows function at the end of the program. For Kinect v1, I also use the NuiShutdown(), Release(), and CloseHandle() functions at the end of the program.
Is there anything else I need do to release the memory (e.g., releasing memory associated with Mat in OpenCV)? Or is something else causing the decrease in processing speed?
I'd appreciate your help. Thanks.
After first run disconnect Kinect then reconnect and try second run.
If all goes well now then the problem is most likely stuck thread. The device access is usually handled by separate threads and especially with USB they can get stuck (in case of error or sync problem between accessing form host and expecting on device side) until you disconnect device (not sure which Kinect driver are you using but JUNGO version which NuiShutdown() infers have this problem). You can also check task manager before disconnection if there are not some stuck processes left after first run.
To remedy this you need to find out what are you doing wrong during access. It could be:
wrong USB port
use the back side not front slots.
invalid USB transfer request
device is always waiting for specific set of commands or stream and waits until it does not receive it so it blocks all other things. So using unsupported commands or reading in wrong times or sizes of packets can cause this.
USB communication is out of sync
PC host can timeout in case you do not have enough CPU power while critical operation is processed (or have opened too many apps on background).
This can be caused also by wrong gfx driver as I suspect you are using rendering ... Intel HD graphics can generate such problems with ease especially on notebooks. Try to disable any rendering in your app or at least limit rendering to OpenGL 1.0 to see if speed is the same in between runs. If this is the case the whole desktop usually flickers or is not repainting parts of apps ... and animations are sometimes sluggish.
Another problem might be a debugger. If without it all is well then debugger is the problem and you can not solve it. Debugging while accessing IO can cause sync and timeout problems especially with USB.
To check for memory leaks you can simply see how much free memory you got before 1st run and compare it to values after 1st,2nd,3th .. runs if the value lowers you got something stuck somewhere. After app close all the memory belonging to app is freed by OS so even if you forget some delete that does not matter unless some thread is still running ...
Some USB drivers based on libUSB I encountered got also problem with Handle leaks. But that behaves differently ... all runs fine until there are no free handles. After that OS is non functional you can not open any window,app, anything ... until any app is closed.
[Edit1] Front USB slots
Front slots are usually connected to motherboard with relatively long cable (usually flat and not very well shielded) so it is more susceptible to noise. Also as it is located usually around HDD and above high frequency parts of the motherboard it also induce it into the USB feed. All this degrades the quality of USB signal causing much much bigger rejection rate hence lowering sync capability and also the overall usable bandwidth.
If you compare that with backside USB ports they have no cables but are connected directly in PCB with short and well shielded paths so the connection quality is much much better.
So if you use device demanding high bandwith or synchronism then front ports are a bad choice.
I'm trying to understand the userland part of the Raspberry Pi graphics driver code from https://github.com/raspberrypi/userland
My understanding so far is:
- a firmware blob runs in the GPU and offers an OpenGL-like interface which, on lower levels, is based on message (byte-array) passing on top of one of multiple 28-bit-word FIFOs called VCHIQ (the other VCHIQ queues are irrelevant for graphics)
- on the CPU part, OpenGL calls are turned into messages to the GPU. Access to the low-level facility (either the message queue or VCHIQ -- I haven't found that part yet in the code) requires a Linux kernel module, but no high-level logic happens in there.
- the GPU part is closed, but that's okay for my purposes. The (ARM) CPU part is, AFAIK, open
My ultimate goal is to get communication with the GPU working on bare metal (without Linux), but with the closed firmware blob intact. As a first goal, I want to understand how an OpenGL call is actually passed to the GPU. Anything beyond that is not part of this question.
However, I'm stuck at finding the actual code for this. The OpenGL calls use RPC_CALL* and in turn RPC_DO, which calls khronos_server_lock_func_table(). However, that function seems to be missing from the code, and to my surprise, I couldn't find anything useful about it on Google.
My questions:
- am I still on the ARM CPU side, or did I move to GPU land without noticing? If the latter is the case, where did I cross that line?
- Assuming I'm still on the CPU side -- where is the code for that function? Is it open at all, or do we actually have closed parts left around on the CPU side here? All sources on the web seem to indicate that the code for the CPU is 100% open.
- at which point does the implementation of the C OpenGL functions actually send a message to the GPU? I'm somewhat expecting a call to the kernel functionality that represents VCHIQ to be happening at some point, probably implemented as a device file.
I don't fully understand how do you intend to access the GPU without using Linux, and I am not that familiar with the technicalities, but some time ago I've been digging into the GPU for my private project so I'll tell you what I know.
The GPU is VideoCore IV and its documentation is available on Broadcom's website.
Also, on the Raspberry Pi Wiki you can see on the picture on the left that VCHIQ is in the kernel driver, so you might look for the implementation details in the kernel's source code.
Maybe this might be of some help too: VideoCore IV Programmer's Manual. About the document:
This is a independent documentation project based on a combination of static analysis and trial and error on real hardware. This work is 100% independent from and not sanctioned by or connected with Broadcom or its agents. No Broadcom documents or materials were used beyond those publicly available.
As for the software itself, The Khronos Group provides OpenGL ES and OpenVG implementation, but it's not open source. You can get the documentation from their website, but I doubt you'll find anything on such low level.
Hope it helps.
I have a simple anisotropic filter c/c++ code that will process an .pgm image which is an text file with greyscale information for each pixel, and after done processing, it will generate an output image with the filter applied.
This program takes up to some seconds in order for it to do about 10 iterations on a x86 CPU running windows.
Me and an academic finishing his master degree on applied computing, we need to run the code under FPGA (Altera DE2-115) to see if there is considerable results of performance gain when running the code directly on the processor (NIOS 2).
We have successfully booted up the S.O uClinux under the FPGA, but there are some errors with device hardware, and by that we can't access SD-Card not even Ethernet, so we can't get the code and image into the FPGA in order to test its performance.
So I am here asking to an alternative way to test our code performance directly into an CPU with a file system so the code can read the image and generate another one.
The alternative can be either with an product that has low cost and easy to use (I was thinking raspberry PI), or either if I could upload the code somewhere that runs automatically for me and give me the reports.
Thanks in advance.
what you're trying to do is benchmarking some software on a multi GHz x86 Processor vs. a soft-core processor running 50MHz? (as much as I can tell from Altera docs)
I can guarantee that it will be even slower on the FPGA! Since it is also running an OS (even embedded Linux) it also has threading overhead and what not. This can not be considered running it "directly" on CPU (whatever you mean by this)
If you really want to leverage the performance of an FPGA you should "convert" your C-Code into a HDL and run it directly in hardware. Accessing the data should be possible. I don't know how it's done with an Altera board but Xilinx has some libraries accessing data from a SD card with FAT.
You can use on board SRAM or DDR2 RAM to run OS and your application.
Hardware design in your FPGA must have memory controller in it. In SOPC or Qsys select external memory as reset vector and compile design.
Then open NioSII build tools for Eclipse.
In Eclipse create new project by selecting NiosII Application and BSP project.
Once the project is created, go to BSP properties and type offset of external memory in the linker tab and generate BSP.
Compile project and Run as Nios II hardware.
This will run you application on through external memory.
You wont be able to see the image but 2-D array representing image in memory can be
printed on console.
I am really puzzled here. I want to create an application that does different events upon different temperatures of my graphics card which is an AMD one.
The reason i want to make such an applications is because, for a GPU i haven't found one, and the second is to ensure i never fry my card by reaching enormous temps.
However i have no idea how people(not connected to amd/intel/nvidia) write applications to monitor temperatures of any kind.
So how does it happen? Some APIs i don't know or something?
After a little bit of googling, i found this:
I think this is really vendor specific, it will probably involve interfacing directly with the motherboard or video driver and knowing which IOCTL represents the code for requesting the temperature. I reverse engineered a motherboard driver once for this purpose. It's not as hard as it sounds, download a manufacturer motherboard/BIOS utility and try to hook the function that gets called when that app needs to display the temperature to the user. Then watch for a call to DeviceIoControl() in Windows, or ioctl() in linux and see what the inputs / outputs are.
This may be your best bet. I found this information here:
http://www.gamedev.net/topic/557599-get-gpucpu-temperature/
Edit:
Also found this:
http://msdn.microsoft.com/en-us/library/aa389762%28v=VS.85%29.aspx
http://msdn.microsoft.com/en-us/library/aa394493%28VS.85%29.aspx
Hope it helps.
You could use one of the existing GPU temperature monitoring programs, such as GPU-Z, configure it for continuous monitoring, and read the log entries.
RivaTuner is another GPU monitoring program, which has a shared memory interface allowing other programs to get the data in real-time, but is nVidia focused. As long as your action isn't "reduce the GPU clock speed" it'll probably work well enough with ATI cards.
I'd like to be able to detect how much graphics memory is available. I've written a C++ project that uses DirectShow.
Some ancient gfx cards can't do video properly and fall back to four colour mode. If I try to allocate more than one video window, the program just crashes on these machines without warning.
This is less than elegant, and I'd like to detect available graphics memory ahead of time, so I can determine if the program has enough gfx mem to run.
A really sneaky way that should work on XP and lower is to read the registry:
For example, I access \HKLM\Hardware\Devicemap\Video and get a GUID:
{3468769C-3D6B-4BB1-85B6-7B5AE7F4E8F8}
Then I access \HKLM\CCS\Control\Video, and read "HardwareInformation.MemorySize" for that device:
HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Video{3468769C-3D6B-4BB1-85B6-7B5AE7F4E8F8}
A much better approach (the recommended approach, in fact) is to use WMI:
GetVideoMemoryViaWMI
8mb. IIRC this is the maximum amount according to the AGP standard. All the extra memory on the gfx card was there to buffer the main memory so it didn't have to go across the bus.
I'd be surprised if the standard hadn't been revised.
If you have really old gfx cards to work with you can try looking at the Video Bios Extensions (VBE). That has a method for querying memory.