Direct SPU to SPU DMA requests on the Cell Processor? - cell

Normal DMA requests on the Cell happen between the SPUs and the PPU. However, I have read that it is possible to set up DMA directly between SPUs. Anyone have any idea how this is accomplished?

Have a look at spe_get_ls(). This will help you to setup a list of effective addresses that you can use to transfer data between local stores. You may need some management to map spe identifiers to physical SPUs.

The trick is essentially what Chris said. The local store of one SPE is memory-mapped into the PPE's memory space. And then you just perform a regular DMA transfer from the other SPE to that address on the PPE.
I'm sorry I don't have the exact code for this. It's been a year or so since I had to do any of this. :)

Related

Is there a way to code data directly to the hard drive (similar to how one can do with RAM)?

My question concerns C/C++. It is possible to manipulate the data on the RAM with pretty great flexibility. You can also give the GPU direct commands using OpenGL, allowing one to manipulate VRAM as well.
My curiosity is whether it is possible to do this to the hard drive (even though this would likely be a horrible idea with many, many possibilities of corrupting existing data). The logic of my question comes from an assumption that the hard drive is similar to RAM and VRAM (bytes of data), but just accesses data slower.
I'm not asking about how to perform file IO, but instead how to directly modify bytes of memory on the hard drive (maybe via some sort of "hard-drive pointer").
If my assumption is totally off, a detailed correction about how the hard drive's data storage is different from RAM or VRAM would be very helpful. Thank you!
Modern operating systems in combination with modern CPUs offer the ability to memory-map disk clusters to memory pages.
The memory pages are initially marked as invalid, and as soon as you try to access them an invalid page "trap" or "interrupt" occurs, which is handled by the operating system, which loads the corresponding cluster into that memory page.
If you write to that page there is either a hardware-supported "dirty" bit, or another interrupt mechanism: the memory page is initially marked as read-only, so the first time you try to write to it there is another interrupt, which simply marks the page as dirty and turns it read-write. Then, you know that the page needs to be flushed to disk at a convenient time.
Note that reading and writing is usually done via Direct Memory Access (DMA) so the CPU is free to do other things while the pages are being transferred.
So, yes, you can do it, either with the help of the operating system, or by writing all that very complex code yourself.
Not for you. Being able to write directly to the hard drive would give you infinite potential to mess up things beyond all recognition. (The technical term is FUBAR, and the F doesn't stand for Mess).
And if you write hard disk drivers, I sincerely hope you are not trying to ask for help here.

How to find most frequently used areas of memory?

I'm looking to profile a large C++ application and determine which pieces of data (or memory regions) are fetched the most. Basically, I want to be able to do something like the processor's MFU cache algorithm for determining what to store in L2/L3 caches. There is surprisingly little to no information online on anybody that has tried to accomplish this.
Edit: Changed MRU to MFU
Edit 2: To clarify, I need the addresses, or the data structures that are pointed to at the addresses.
You can use Pin tool to log all memory accesses and calculate cache hit/miss.
valgrind can do this - it will need a plugin , dont know if there is already one.
EDIT: it's called cachegrind

Is memory-mapped memory possible?

I know that is possible to use memory-mapped files i.e. real files on disk that are transparently mapped to memory. As far as I understand (I haven't used these yet) the mapping takes place immediately, the file is partly read on the first memory access while the OS starts "caching" the whole file in the background.
Now: Is it possible to somewhat abuse this concept and memory-map another block of memory? Assuming the OS provides such indirection one could create a kind of compressed_malloc() that returns a mapping from memory to memory. The memory returned to the caller is simple the memory-mapped range that is transparently compressed in memory and also eventually kept in memory. Thus, for large buffers it could be possible that only part of it get decompressed on-the-fly (on access) while the remaining blocks are kept compressed.
Is that concept technically possible at the moment or - if already realized (in software) - what are the things to look at?
Update 1: I am more or less looking for something that is technically achievable without modifying the OS kernel itself or which requires a virtualization platform.
Update 2: I am hoping for something which allows me to implement the compression and related logic in my own user-space code. I would just use the facilities of the operating system to create the memory-mapping.
Very much so. The VM (Virtual Memory) system is designed to handle different kinds of objects that can be mapped. There is in fact a filesystem call cramfs that does something similar in the sense that it keeps compressed data in storage, but enables transparent, uncompressed access.
You would not be modifying the kernel per se, but you will have to work in the kernel space, implementing VM handlers for this new kind of a memory mapped object.
This is possible, eg.
http://pubs.vmware.com/vsphere-4-esx-vcenter/index.jsp?topic=/com.vmware.vsphere.resourcemanagement.doc_41/managing_memory_resources/c_memory_compression.html
It is not correctly implemented in kernel space in Linux, but something like this could be implemented in user space.

how to keep c++ variables in RAM securely?

I'm working on a C++ application which is keeping some user secret keys in the RAM. This secret keys are highly sensitive & I must minimize risk of any kind of attack against them.
I'm using a character array to store these keys, I've read some contents about storing variables in CPU registers or even CPU cache (i.e using C++ register keyword), but seems there is not a guaranteed way to force application to store some of it's variables outside of RAM (I mean in CPU registers or cache).
Can anybody suggest a good way to do this or suggest any other solution to keep these keys securely in the RAM (I'm seeking for an OS-independent solution)?
Your intentions may be noble, but they are also misguided. The short answer is that there's really no way to do what you want on a general purpose system (i.e. commodity processors/motherboard and general-purpose O/S). Even if you could, somehow, force things to be stored on the CPU only, it still would not really help. It would just be a small nuisance.
More generally to the issue of protecting memory, there are O/S specific solutions to indicate that blocks memory should not be written out to the pagefile such as the VirtualLock function on Windows. Those are worth using if you are doing crypto and holding sensitive data in that memory.
One last thing: I will point out that it worries me is that you have a fundamental misunderstanding of the register keyword and its security implications; remember it's a hint and it won't - indeed, it cannot - force anything to actually be stored in a register or anywhere else.
Now, that, by itself, isn't a big deal, but it is a concern here because it indicates that you do not really have a good grasp on security engineering or risk analysis, which is a big problem if you are designing or implementing a real-world cryptographic solution. Frankly, your posts suggests (to me, at least) that you aren't quite ready to architect or implement such a system.
You can't eliminate the risk, but you can mitigate it.
Create a single area of static memory that will be the only place that you ever store cleartext keys. And create a single buffer of random data that you will use to xor any keys that are not stored in this one static buffer.
Whenever you read a key into memory, from a keyfile or something, you only read it directly into this one static buffer, xor with your random data and copy it out wherever you need it, and immediately clear the buffer with zeroes.
You can compare any two key by just comparing their masked versions. You can even compare hashes of masked keys.
If you need to operate on the cleartext key - e.g. to generate a hash or validate they key somehow load the masked xor'ed key into this one static buffer, xor it back to cleartext and use it. Then write zeroes back into that buffer.
The operation of unmasking, operating and remasking should be quick. Don't leave the buffer sitting around unmasked for a long time.
If someone were to try a cold-boot attack, pulling the plug on the hardware, and inspecting the memory chips there would be only one buffer that could possibly hold a cleartext key, and odds are during that particular instant of the coldboot attack the buffer would be empty.
When operating on the key, you could even unmask just one word of the key at a time just before you need it to validate the key such that a complete key is never stored in that buffer.
#update: I just wanted to address some criticisms in the comments below:
The phrase "security through obscurity" is commonly misunderstood. In the formal analysis of security algorithms "obscurity" or methods of hiding data that are not crytpographically secure do not increase the formal security of a cryptographic algorithm. And it is true in this case. Given that keys are stored on the users machine, and must be used by that program on that machine there is nothing that can be done to make the keys on this machine cryptographically secure. No matter what process you use to hide or lock the data at some point the program must use it, and a determined hacker can put breakpoints in the code and watch when the program uses the data. But no suggestion in this thread can eliminate that risk.
Some people have suggested that the OP find a way to use special hardware with locked memory chips or some operating system method of locking a chip. This is cryptographically no more secure. Ultimately if you have physical access to the machine a determined enough hacker could use a logic analyzer on the memory bus and recover any data. Besides the OP has stated that the target systems don't have such specialized hardware.
But this doesn't mean that there aren't things you can do to mitigate risk. Take the simplest of access keys- the password. If you have physical access to a machine you can put in a key logger, or get memory dumps of running programs etc. So formally the password is no more secure than if it was written in plaintext on a sticky note glued to the keyboard. Yet everyone knows keeping a password on a sticky note is a bad idea, and that is is bad practice for programs to echo back passwords to the user in plaintext. Because of course practically speaking this dramatically lowers the bar for an attacker. Yet formally a sticky note with a password is no less secure.
The suggestion I make above has real security advantages. None of the details matter except the 'xor' masking of the security keys. And there are ways of making this process a little better. Xor'ing the keys will limit the number of places that the programmer must consider as attack vectors. Once the keys are xord, you can have different keys all over your program, you can copy them, write them to a file, send them over the network etc. None of these things will compromise your program unless the attacker has the xor buffer. So there is a SINGLE BUFFER that you have to worry about. You can then relax about every other buffer in the system. ( and you can mlock or VirtualLock that one buffer )
Once you clear out that xor buffer, you permanently and securely eliminate any possibility that an attacker can recover any keys from a memory dump of your program. You are limiting your exposure both in terms of the number of places and the times that keys can be recovered. And you are putting in place a system that allows you to work with keys easily without worrying during every operation on an object that contains keys about possible easy ways the keys can be recovered.
So you can imagine for example a system where keys refcount the xor buffer, and when all key are no longer needed, you zero and delete the xor buffer and all keys become invalidated and inaccessible without you having to track them down and worry about if a memory page got swapped out and still holds plaintext keys.
You also don't have to literally keep around a buffer of random data. You could for example use a cryptographically secure random number generator, and use a single random seed to generate the xor buffer as needed. The only way an attacker can recover the keys is with access to the single generator seed.
You could also allocate the plaintext buffer on the stack as needed, and zero it out when done such that it is extremely unlikely that the stack ever leaves on chip cache. If the complete key is never decoded, but decoded one word at a time as needed even access to the stack buffer won't reveal the key.
There is no platform-independent solution. All the threats you're addressing are platform specific and thus so are the solutions. There is no law that requires every CPU to have registers. There is no law that requires CPUs to have caches. The ability for another program to access your program's RAM, in fact the existence of other programs at all, are platform details.
You can create some functions like "allocate secure memory" (that by default calls malloc) and "free secure memory" (that by default calls memset and then free) and then use those. You may need to do other things (like lock the memory to prevent your keys from winding up in swap) on platforms where other things are needed.
Aside from the very good comments above, you have to consider that even IF you succeed in getting the key to be stored in registers, that register content will most likely get stored in memory when an interrupt comes in, and/or when another task gets to run on the machine. And of course, someone with physical access to the machine can run a debugger and inspect the registers. Debugger may be an "in circuit emulator" if the the key is important enough that someone will spent a few thousand dollars on such a device - which means no software on the target system at all.
The other question is of course how much this matters. Where are the keys originating from? Is someone typing them in? If not, and are stored somewhere else (in the code, on a server, etc), then they will get stored in the memory at some point, even if you succeed in keeping them out of the memory when you actually use the keys. If someone is typing them in, isn't the security risk that someone in one way or another, forces the person(s) knowing the keys to reveal the keys?
As others have said, there is no secure way to do this on a general purpose computer. The alternative is to use a Hardware Security Module (HSM).
These provide:
greater physical protection for the keys than normal PCs/servers (protecting against direct access to RAM);
greater logical protection as they are not general purpose - no other software is running on the machine so no other processes/users have access to the RAM.
You can use the HSM's API to perform the cryptographic operations you need (assuming they are somewhat standard) without ever exposing the unencrypted key outside of the HSM.
If your platform supports POSIX, you would want to use mlock to prevent your data from being paged to the swap area. If you're writing code for Windows, you can use VirtualLock instead.
Keep in mind that there's no absolute way to protect the sensitive data from getting leaked, if you require the data to be in its unencrypted form at any point in time in the RAM (we're talking about plain ol' RAM here, nothing fancy like TrustZone). All you can do (and hope for) is to minimize the amount of time that the data remains unencrypted so that the adversary will have lesser time to act upon it.
If yours is an user mode application and the memory you are trying to protect is from other user mode processes try CryptProtectMemory api (not for persistant data).
As the other answers mentioned, you may implement a software solution but if your program runs on a general purpose machine and OS and the attacker has access to your machine it will not protect your sensitive data. If you data is really very sensitive and an attacker can physically access the machine a general software solution won't be enough.
I once saw some platforms dealing with very sensible data which had some sensors to detect when someone was accessing the machine physically, and which would actively delete the data when that was the case.
You already mentioned cold boot attack, the problem is that the data in RAM can be accessed until minutes after shut down on general RAM.

Store huge amount of data in memory

I am looking for a way to store several gb's of data in memory. The data is loaded into a tree structure. I want to be able to access this data through my main function, but I'm not interested in reloading the data into the tree every time I run the program. What is the best way to do this? Should I create a separate program for loading the data and then call it from the main function, or are there better alternatives?
thanks
Mads
I'd say the best alternative would be using a database - which would be then your "separate program for loading the data".
If you are using a POSIX compliant system, then take a look into mmap.
I think Windows has another function to memory map a file.
You could probably solve this using shared memory, to have one process that it long-lived build the tree and expose the address for it, and then other processes that start up can get hold of that same memory for querying. Note that you will need to make sure the tree is up to being read by multiple simultaneous processes, in that case. If the reads are really just pure reads, then that should be easy enough.
You should look into a technique called a Memory mapped file.
I think the best solution is to configure a cache server and put data there.
Look into Ehcache:
Ehcache is an open source, standards-based cache used to boost
performance, offload the database and simplify scalability. Ehcache is
robust, proven and full-featured and this has made it the most
widely-used Java-based cache.
It's written in Java, but should support any language you choose:
The Cache Server has two apis: RESTful resource oriented, and SOAP.
Both support clients in any programming language.
You must be running a 64 bit system to use more than 4 GB's of memory. If you build the tree and set it as a global variable, you can access the tree and data from any function in the program. I suggest you perhaps try an alternative method that requires less memory consumption. If you post what type of program, and what type of tree you're doing, I can perhaps give you some help in finding an alternative method.
Since you don't want to keep reloading the data...file storage and databases are out of question, but several gigs of memory seem like such a hefty price.
Also note that on Windows systems, you can access the memory of another program using ReadProcessMemory(), all you need is a pointer to use for the location of the memory.
You may alternatively implement the data loader as an executable program and the main program as a dll loaded and unloaded on demand. That way you can keep the data in the memory and be able to modify the processing code w/o reloading all the data or doing cross-process memory sharing.
Also, if you can operate on the raw data from the disk w/o making any preprocessing of it (e.g. putting it in a tree, manipulating pointers to its internals), you may want to memory-map the data and avoid loading unused portions of it.