Do pictures ever get stored in RAM? - c++

I am a beginner C++ programmer.
I wrote a simple program that creates a char array (the size is the user's choice) and reads whatever was previously in that memory. Often you can find something that makes sense, but most of it is just strange characters. I made it output to a binary file.
Why do I often find multiple copies of the alphabet?
Is it possible to find a picture inside of the RAM chunk I retrieved?
I heard about file signatures (headers), which go before any of the data in a file, but are there also "trailers" that go at the end, after all the data?
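For reference, a minimal sketch of the kind of program described above (the buffer size prompt and the dump.bin file name are illustrative; reading uninitialized memory like this is formally undefined behavior in C++ and is shown only to illustrate the experiment):

    #include <cstddef>
    #include <fstream>
    #include <iostream>

    int main() {
        std::size_t n = 0;
        std::cout << "Buffer size in bytes: ";
        std::cin >> n;

        char* buf = new char[n];          // deliberately left uninitialized

        // Dump the raw (indeterminate) contents to a binary file for inspection.
        std::ofstream out("dump.bin", std::ios::binary);
        out.write(buf, static_cast<std::streamsize>(n));

        delete[] buf;
    }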

When you read uninitialized data from memory that you allocated, you'll never see any data from another process. You only ever see data that your own process has written. That is: your code plus all the libraries that you called.
This is a security feature of your kernel: It never leaks information from a process unless it's specifically asked to transfer that information.
If you didn't load a picture in memory, you'll never see one using this method.

Assuming your computer runs Linux, Windows, macOS or something like that, there will NEVER be any pictures in the memory your process uses - unless you loaded them into your process yourself. For security reasons, memory that was used by other processes is cleared before it gets given to YOUR process. This is the case for all modern OS's, and has been the case for multi-user OS's (Unix, VAX/VMS, etc.) more or less since such systems appeared in the 1960's - because someone figured out that it's kind of unfun when "your" data can be found by someone else who is just out there fishing for it.
Even a process that has ended will have its memory cleared - how would you like it if your password were still sitting in memory for someone to find after the program that read the password had exited? [Programs that hold highly sensitive data, such as encryption keys or passwords, often clear that memory themselves in code, rather than waiting for the OS to do it when the process ends, because the debug functionality described below allows the memory contents to be inspected at any time - and the shorter the data lives in memory, the less likely a leak of sensitive information.]
Once memory has been allocated to your process and freed again, it is not cleared: clearing it takes extra time, and most of the time you'd immediately fill it with something else anyway. So it contains whatever it happens to contain, and if you poke around in it, you will potentially "find stuff". But it's all your own process's work.
Most OS's have a way to read what another process is doing as part of their debug functionality (if you run the debugger on your system, it runs as a separate process, yet it needs to be able to access your program's memory while you debug it, so there has to be a way to read another process's memory). But that requires a little more effort than just calling new or malloc, and you either need extra permissions (superuser, administrator, etc.) or need to be the owner of the other process too.
Of course, if your computer is running DOS or CP/M, it has no such security features, and you get whatever happens to be in the memory (and you could also just make up a pointer to an arbitrary address and read it, as long as you stay within the memory range of the system).

Related

Linux non-persistent backing store for mmap()

First, a little motivating background info: I've got a C++-based server process that runs on an embedded ARM/Linux-based computer. It works pretty well, but as part of its operation it creates a fairly large fixed-size array (e.g. dozens to hundreds of megabytes) of temporary/non-persistent state information, which it currently keeps on the heap, and it accesses and/or updates that data from time to time.
I'm investigating how far I can scale things up, and one problem I'm running into is that eventually (as I stress-test the server by making its configuration larger and larger), this data structure gets big enough to cause out-of-memory problems, and then the OOM killer shows up, and general unhappiness ensues. Note that this embedded configuration of Linux doesn't have swap enabled, and I can't (easily) enable a swap partition.
One idea I have on how to ameliorate the issue is to allocate this large array on the computer's local flash partition, instead of directly in RAM, and then use mmap() to make it appear to the server process like it's still in RAM. That would reduce RAM usage considerably, and my hope is that Linux's filesystem-cache would mask most of the resulting performance cost.
My only real concern is file management -- in particular, I'd like to avoid any chance of filling up the flash drive with "orphan" backing-store files (i.e. old files whose processes don't exist any longer, but the file is still present because its creating process crashed or by some other mistake forgot to delete it on exit). I'd also like to be able to run multiple instances of the server simultaneously on the same computer, without the instances interfering with each other.
My question is, does Linux have any built-in facility for handling this sort of use-case? I'm particularly imagining some way to flag a file (or an mmap() handle or similar) so that when the process that created the file exits-or-crashes, the OS automagically deletes the file (similar to the way Linux already automagically recovers all of the RAM that was allocated by a process when the process exits-or-crashes).
Or, if Linux doesn't have any built-in auto-temp-file-cleanup feature, is there a "best practice" that people use to ensure that large temporary files don't end up filling up a drive due to unintentionally becoming persistent?
Note that AFAICT simply placing the file in /tmp won't help me, since /tmp is using a RAM-disk and therefore doesn't give me any RAM-usage advantage over simply allocating in-process heap storage.
Yes, and I do this all the time...
open the file, unlink it, use ftruncate or (better) posix_fallocate to make it the right size, then use mmap with MAP_SHARED to map it into your address space. You can then close the descriptor immediately if you want; the memory mapping itself will keep the file around.
For speed, you might find you want to help Linux manage its page cache. You can use posix_madvise with POSIX_MADV_WILLNEED to advise the kernel to page data in and POSIX_MADV_DONTNEED to advise the kernel to release the pages.
You might find that the latter does not work the way you want, especially for dirty pages. You can use sync_file_range to explicitly control flushing to disk. (Although in that case you will want to keep the file descriptor open.)
All of this is perfectly standard POSIX except for the Linux-specific sync_file_range.
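A minimal sketch of that recipe, assuming a Linux/POSIX system. The /var/tmp path and 64 MiB size are placeholders, mkstemp is used here (rather than a plain open) so that multiple server instances get distinct backing files, and error handling is kept to a minimum:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main() {
        const size_t size = 64 * 1024 * 1024;           // e.g. 64 MiB of temporary state

        // Create a uniquely named backing file so server instances don't collide.
        char path[] = "/var/tmp/server-state-XXXXXX";   // placeholder location
        int fd = mkstemp(path);
        if (fd < 0) { perror("mkstemp"); return 1; }

        // Unlink immediately: the name disappears now, the storage disappears
        // when the last reference (descriptor or mapping) goes away.
        unlink(path);

        // Reserve the space up front.
        int err = posix_fallocate(fd, 0, (off_t)size);
        if (err != 0) { fprintf(stderr, "posix_fallocate: %d\n", err); return 1; }

        // Map it so the process can treat it like ordinary memory.
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        close(fd);   // the mapping keeps the file's storage alive

        // ... use p as the big array of temporary/non-persistent state ...

        munmap(p, size);
        return 0;
    }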
Yes. You create/open the file, then you remove() the file by its filename.
The file will still be open by your process and you can read/write it just like any opened file, and it will disappear when the process having the file opened exits.
I believe this behavior is mandated by POSIX, so it will work on any Unix-like system. Even after a hard reboot, the space will be reclaimed.
I believe this is filesystem-specific, but most Linux filesystems allow deletion of open files. The file will still exist until the last handle to it is closed. I would recommend that you open the file then delete it immediately and it will be automatically cleaned up when your process exits for any reason.
For further details, see this post: What happens to an open file handle on Linux if the pointed file gets moved or deleted

Allocating memory that can be freed by the OS if needed

I'm writing a program that generates thumbnails for every page in a large document. For performance reasons I would like to keep the thumbnails in memory for as long as possible, but I would like the OS to be able to reclaim that memory if it decides there is another more important use for it (e.g. the user has started running a different application.)
I can always regenerate the thumbnail later if the memory has gone away.
Is there any cross-platform method for flagging memory as can-be-removed-if-needed? The program is written in C++.
EDIT: Just to clarify, rather than being notified when memory is low or regularly monitoring the system's amount of memory, I'm thinking more along the lines of allocating memory and then "unlocking" it when it's not in use. The OS can then steal unlocked memory if needed (even for disk buffers if it thinks that would be a better use of the memory) and all I have to do as a programmer is just "lock" the memory again before I intend to use it. If the lock fails I know the memory has been reused for something else so I need to regenerate the thumbnail again, and if the lock succeeds I can just keep using the data from before.
The reason is I might be displaying maybe 20 pages of a document on the screen, but I may as well keep thumbnails of the other 200 or so pages in case the user scrolls around a bit. But if they go do something else for a while, that memory might be better used as a disk cache or for storing web pages or something, so I'd like to be able to tell the OS that it can reuse some of my memory if it wants to.
Having to monitor the amount of free system-wide memory may not achieve the goal (my memory will never be reclaimed to improve disk caching), and getting low-memory notifications will only help in emergencies. I was hoping that by having a lock/unlock method, this could be achieved in more of a lightweight way and benefit the performance of the system in a non-emergency situation.
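To make the desired semantics concrete, here is a purely hypothetical C++ interface for the lock/unlock pattern described above; DiscardableBuffer and its member functions are illustrative names, not a real OS API:

    struct DiscardableBuffer {
        // Pin the pages before use. Returns true if the old contents survived;
        // false means the OS (hypothetically) reclaimed them and the thumbnail
        // must be regenerated before the buffer is used again.
        virtual bool lock() = 0;

        // Mark the pages as reclaimable again once the thumbnail is off-screen.
        virtual void unlock() = 0;

        virtual unsigned char* data() = 0;
        virtual ~DiscardableBuffer() = default;
    };

A cross-platform library would then have to back this interface with whatever mechanism each OS provides, if any - which is exactly what the answers below discuss.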
Is there any cross-platform method for flagging memory as can-be-removed-if-needed? The program is written in C++
For Windows, at least, you can register for a memory resource notification.
HANDLE WINAPI CreateMemoryResourceNotification(
_In_ MEMORY_RESOURCE_NOTIFICATION_TYPE NotificationType
);
NotificationType can be one of:
LowMemoryResourceNotification - Available physical memory is running low.
HighMemoryResourceNotification - Available physical memory is high.
Just be careful responding to both events. You might create a feedback loop (memory is low, release the thumbnails! and then memory is high, make all the thumbnails!).
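A hedged sketch of using that notification from C++, polling with QueryMemoryResourceNotification (the handle can also be waited on with WaitForSingleObject); error handling is minimal:

    #include <windows.h>
    #include <cstdio>

    int main() {
        HANDLE low = CreateMemoryResourceNotification(LowMemoryResourceNotification);
        if (low == nullptr) {
            std::printf("CreateMemoryResourceNotification failed: %lu\n", GetLastError());
            return 1;
        }

        BOOL isLow = FALSE;
        if (QueryMemoryResourceNotification(low, &isLow) && isLow) {
            // Physical memory is running low: a good moment to drop cached thumbnails.
            std::printf("low memory condition signaled\n");
        }

        CloseHandle(low);
        return 0;
    }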
In AIX, there is a signal SIGDANGER that is sent to applications when available memory is low. You may handle this signal and free some memory.
There has been discussion among Linux developers about implementing this feature in Linux, but AFAIK it is not implemented yet. Maybe they think that applications should not have to care about low-level memory management, and that it can be handled transparently by the OS via swapping.
In the POSIX standard there is a function, posix_madvise, that can be used to mark an area of memory as less important. The advice POSIX_MADV_DONTNEED specifies that the application expects it will not access the specified range in the near future.
Unfortunately, the current Linux implementation will immediately free the memory range when posix_madvise is called with this advice.
So there's no portable solution to your question.
However, on almost every OS you can read the currently available memory via some OS interface. So you can poll that value routinely and manually free memory when the OS is running low.
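A small sketch of the posix_madvise call mentioned above, keeping in mind the caveat that Linux may drop the pages immediately for POSIX_MADV_DONTNEED; the 1 MiB anonymous mapping is purely illustrative:

    #include <sys/mman.h>
    #include <cstddef>

    int main() {
        const std::size_t len = 1 << 20;   // 1 MiB, for illustration
        void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        // ... fill p with cached data (e.g. thumbnails) ...

        // Tell the kernel this range is unlikely to be needed soon.
        posix_madvise(p, len, POSIX_MADV_DONTNEED);

        munmap(p, len);
        return 0;
    }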
There's nothing special you need to do. The OS will automatically remove things from memory if they haven't been used recently. Some OSes have platform-specific ways to improve this, but generally nothing special is needed.
This question is very similar and has answers that cover things not covered here.
Allocating "temporary" memory (in Linux)
This shouldn't be too hard to do because this is exactly what the page cache does, using unused memory to cache the hard disk. In theory, someone could write a filesystem such that when you read from a certain file, it calculated something, and the page cache would cache it automatically.
All the basics of automatically freed cache space are already there in any OS with a disk cache, and it's hard to imagine there not being an API for something that would make a huge difference, especially in things like mobile web browsers.

hibernation feature for arbitrary code

So this just came to my mind when I was thinking about data serialization and its resemblance to Windows hibernation. When you hibernate the system, the OS does not care about individual programs or whether they can serialize/deserialize their data. It just dumps the whole thing to disk, and later on you can resume whatever you were doing.
Here's the question: How does Windows do this without caring about each individual program? Is it possible to somehow emulate this for your code so that you can "dump" it to disk and later on resume it without bothering to write serialization/deserialization methods?
Windows does this by suspending execution of every process and writing out the active (allocated) memory pages to disk. When this memory is later restored and the kernel kick-started, it is able to resume everything where it left off, because from its perspective, the memory hasn't actually changed. It's as though it just froze for a very long period of time.
The only way you could do this with one of your own processes would be to have some other supervisory code running in the kernel -- you'd need a way to get at your process' memory map and preserve it along with the actual memory pages so that all existing pointers in the application's memory remain valid when the pages are restored later. You would also need a way to persist other data (such as any open file descriptors) so that they could be restored as well.
This is not practical for most applications.

Risk of damaging your computer by altering memory in C++

I know some Java and am now trying out C++, and apparently in C++ you can do things like declare an int array of size 6 and then change the 10th element of that array - which I understand to be simply the 4th integer past the end of the memory that was allocated for the 6-integer array.
So my question is: if I'm careless, is it possible to accidentally alter memory in my C++ program that is being used by other programs on my system? Is there an actual risk of seriously messing something up this way? I know you can just restart your computer and clear the memory if you have to, but if I don't do that, could there be some lasting damage?
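For illustration, the situation the question describes boils down to the following; the out-of-bounds index is undefined behavior and nothing about its outcome should be relied on:

    int main() {
        int a[6] = {0};
        a[10] = 42;      // past the end of the array: undefined behavior
        return a[10];    // may appear to work, crash, or corrupt nearby data
    }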
It depends on your system. Formally, an out of bounds access is undefined behavior. On a modern general purpose system, each user process has its own address space, and one process can't modify, or even read, that of another process (barring shared memory), so unless you're writing kernel code, you shouldn't be able to break anything outside of your own process (and non-kernel code can't normally do physical IO, so I don't see how anything in the hardware could break).
If you're writing kernel code, however, or working on an embedded processor with no memory mapping or protection, you can literally destroy hardware with an out of bounds write; if the program is controlling something like a nuclear power plant, you can even destroy a lot more than just the machine your code is running on.
Each process is given its own virtual address space, so naturally processes don't see each other's memory. Don't forget that even a buffer overrun that is local to your program can have dire consequences - the overrun may cause the program to misbehave and do something that has a lasting effect (like deleting all your files, for example).
This depends on what operating system and environment you are in:
Normal OS (Windows, Linux etc.), userspace program: You can only mess up your own process's memory. However, with really bad luck that can be enough. Imagine, for example, that you call some function that deletes files. If your memory is corrupted at the time of the call, the parameters to that function might be messed up into meaning the deletion of something other than what you intended. As long as you avoid calling file-deletion routines etc. in the programs where you experiment with memory handling, this risk is essentially non-existent.
Normal OS, kernel-space device driver: You can access system memory and the memory of the currently running process, possibly destroying everything.
Simple embedded OS without memory protection: You can access everything and destroy anything.
Legacy OS without memory protection (Win 3.x, MS-DOS): You can access everything and destroy anything.
Every program runs in its own address space, and one program cannot access (read / modify) any other program's address space - this isolation is provided by virtual memory, typically implemented with paging.
If you try to access an address in memory that your program is not allowed to touch, you will get a segmentation fault (access violation) and your program will crash.
In answer to your question, no permanent damage will be caused.
I'm not sure a modern OS (Win7, for example) even allows you to do that. The OS will block the kind of buffer overrun you've described if it strays outside the memory mapped into your process.
Back in the day (DOS times) some viruses would try to damage hardware by directly programming the video or hard drive controller, but even then it was not easy or a sure thing. Modern hardware and OSs make it practically impossible for user-level applications to damage hardware. So program away :) you won't break anything.
There is a different possibility, though. A buffer overrun might allow ill-intentioned people to exploit that bug and execute arbitrary code on your clients' computers. I guess that's bad enough. And the most dangerous part about an overrun is that you may not find it even after seriously extensive testing.

What will the effect of trimming my "working set" be on a system with no page file?

A customer is complaining that my program is using too much memory. However, after working with them for a while, I've realised that:
They've turned off their page file (on their terminal services box).
They're worried about the size of the "private working set" figure in task manager for my program.
So, my question is: if I just trim the working set with EmptyWorkingSet() after my program has started up (it uses lots of memory during XML parsing but then frees it, yet the working set doesn't seem to go down), I can make the working-set figure go right down. However, will this actually help the customer? I have a feeling this just means the working set will be paged out, and I believe that with the page file turned off, the working set is backed by real memory anyway...
Is it true to say that what Task Manager reports as "private working set" is really how much my program has allocated with new/malloc?
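For reference, a hedged sketch of the trimming call the question refers to (EmptyWorkingSet is declared in psapi.h; older toolchains need to link psapi.lib):

    #include <windows.h>
    #include <psapi.h>

    int main() {
        // ... heavy XML parsing happens here, then the buffers are freed ...

        // Ask the OS to trim pages out of this process's working set. With the
        // page file disabled, this may mainly change what Task Manager reports
        // rather than releasing physical memory for other uses.
        EmptyWorkingSet(GetCurrentProcess());
        return 0;
    }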
At least in a sense of standard terminology, "private working set" is the amount of memory your program has mapped that is not backed by files (the program executable, dlls, or manually memory-mapped files) on disk or other shared resources. If swap (paging) were enabled, it's the amount of swap space your program would occupy if it were entirely swapped out of memory.
I would agree with your management that you need to fix your bloated program. Turning off swap is a very sane decision for a customer with low-latency requirements. If your program is using 2GB of memory, perhaps you need to rethink whatever libraries you're using to represent XML data in memory.
You've noted that your working set goes up after new/malloc. This is because they ask the OS for memory. You've also noted that it does not go down after delete/free. This is because they don't return memory to the OS. On a normal, sane system, this is not a problem. The unused memory space for your process will end up in swap, untouched and out of RAM.
On this special box, you'd be better off overriding operator new with a direct call to HeapAlloc and operator delete with a call to HeapFree. Do enable the low-fragmentation heap on Server 2003; it's already the default on 2008.
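A hedged sketch of that suggestion. The value 2 passed to HeapSetInformation is the documented way to enable the low-fragmentation heap for a heap on Server 2003-era Windows; EnableLowFragmentationHeap is just an illustrative helper name, and a real replacement would also cover the array forms of new/delete:

    #include <windows.h>
    #include <new>
    #include <cstddef>

    void* operator new(std::size_t size) {
        void* p = HeapAlloc(GetProcessHeap(), 0, size);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    void operator delete(void* p) noexcept {
        if (p != nullptr) HeapFree(GetProcessHeap(), 0, p);
    }

    // Illustrative helper: opt the process heap into the low-fragmentation heap.
    void EnableLowFragmentationHeap() {
        ULONG lfh = 2;   // 2 == enable LFH for this heap
        HeapSetInformation(GetProcessHeap(), HeapCompatibilityInformation,
                           &lfh, sizeof(lfh));
    }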
The private working set seems to be the virtual memory your program alone uses and needs, so I'm not sure that resetting it is going to help you. I'd find out why your program is using so much memory, rather than trying to play around with the private working set.
Memory leaks?