How can I overwrite all free disk space with zeros, like the cipher command on Windows? For example:
cipher /w:c:\
This will overwrite the free disk space in three passes. How can I do this in C or C++? (I want to do this in one pass and as fast as possible.)
You can create a set of files and write random bytes to them until the available disk space is filled. These files should be removed before the program exits.
The files must be created on the device you wish to clean.
Multiple files may be required on some file systems, due to file size limitations.
It is important to use different, non-repeating random sequences in these files to defeat file system compression and deduplication strategies that could reduce the amount of disk space actually written.
Note also that the OS may have quota systems that will prevent you from filling available disk space and may also show erratic behavior when disk space runs out for other processes.
Removing the files may cause the OS to skip the cache flushing mechanism, causing some blocks to not be written to disk. A sync() system call or equivalent might be required. Further synching at the hardware level might be delayed, so waiting for some time before removing the files may be necessary.
Repeating this process with a different random seed reduces the odds of hardware recovery through surface analysis with advanced forensic tools. Such tools are not perfect, especially when recovery would be a lifesaver for a lost Bitcoin wallet owner, but they may prove effective in other, more problematic circumstances.
Using random bytes has a double purpose:
it prevents some file systems from optimizing the blocks (compressing or deduplicating them instead of writing to the media), which would leave the existing data in place instead of overwriting it.
it increases the difficulty of recovering previously written data with advanced hardware recovery tools, just like those security envelopes with random patterns printed on the inside that prevent exposing the contents of a letter by simply holding the envelope up to a strong light.
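A minimal single-pass sketch of this approach, assuming a POSIX system - the target path is only an example, and real code would also cope with quotas and per-file size limits, possibly by creating several files:

    #include <cstdio>
    #include <random>
    #include <vector>
    #include <unistd.h>   // fsync is POSIX; on Windows use FlushFileBuffers instead

    int main() {
        std::vector<unsigned char> buf(1 << 20);          // write in 1 MiB chunks
        std::mt19937_64 rng(std::random_device{}());      // non-repeating pseudo-random data

        std::FILE* f = std::fopen("/mnt/target/wipe.bin", "wb");  // must live on the target volume
        if (!f) return 1;

        for (;;) {
            for (auto& b : buf) b = static_cast<unsigned char>(rng());
            if (std::fwrite(buf.data(), 1, buf.size(), f) < buf.size())
                break;                                    // short write: the disk is full
        }
        std::fflush(f);                                   // flush the stdio buffer
        fsync(fileno(f));                                 // ask the OS to push blocks to the device
        std::fclose(f);
        std::remove("/mnt/target/wipe.bin");              // give the space back
    }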
I have an idea that I am working on. I am trying to create a Windows mini-filter driver that will virtualize changes made to files by certain processes. I am doing this by capturing the writes and sending them to a file in a virtualized location. Here is the issue:
If the process tries to read, it needs to get unaltered reads for the parts of the file it has not written to, but altered reads for the parts that have been written to. How do I track the altered segments of the file in an efficient way? I seem to remember a way you can use a bitmask to map file segments, but I may be misremembering. Anyway, any help would be greatly appreciated.
Two solutions:
Simply copy the original file to virtualized storage, and use only this file. For small files, it will probably be the best and fastest solution.
To give an example, let's say that any file smaller than 65536 bytes would be fully copied - use a power of two in any case.
If a file grows above the limit, see solution 2.
For big files, keep the overwritten segments in virtualized storage and use them according to the current file position when needed. The easiest way is to split the file into 65536-byte chunks: you get the chunk number by shifting the file position right by 16 bits, and the position within the chunk by masking the lower 16 bits.
Example:
file_position = 165232360
chunk_number = file_position >> 16 (== 2521)
chunk_pos = file_position & 0xFFFF (== 16104)
So your virtualized storage becomes a directory, storing chunks named trivially (chunk #2521 = 2521.chunk, for example).
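In code, the chunk arithmetic might look like this (a sketch; the names and the .chunk extension are just examples):

    #include <cstdint>
    #include <string>

    constexpr uint64_t kChunkShift = 16;                       // 2^16 = 65536-byte chunks
    constexpr uint64_t kChunkMask  = (uint64_t{1} << kChunkShift) - 1;

    struct ChunkRef {
        uint64_t number;   // which chunk the offset falls into
        uint64_t offset;   // position within that chunk
    };

    ChunkRef locate(uint64_t file_position) {
        return { file_position >> kChunkShift, file_position & kChunkMask };
    }

    std::string chunk_name(uint64_t chunk_number) {
        return std::to_string(chunk_number) + ".chunk";        // chunk #2521 -> "2521.chunk"
    }

    // locate(165232360) yields { 2521, 16104 }, matching the example above.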
When a write occurs, you start by copying the original data into a new chunk in virtualized storage, then you allow the application to write into it.
Obviously, if the file grows, simply add chunks that exist only in virtualized storage.
It's not perfect - you can use delta chunks instead of full ones, to save disk space - but it's a good start that can be optimized later.
Also, it's quite easy to add versions and keep track of:
Various applications that use the file (keep multiple virtualized storages),
Successive launches (run #1 modifies start of file, run #2 modifies end of file, you keep both virtualizations and you can easily "revert" the last launch).
In some video games, I find that every time a new character is created, a factory method is used to create it, like this:
class CharacterEngine
{
public:
    // Factory method: every character is allocated on the heap
    static Character* CreateCharacter(const string& Name, const Weapons& InitialWeapons)
    {
        return new Character(Name, InitialWeapons);
    }
};
//...
Now, if I have 100,000,000 characters (very many, e.g. like simulated particles), heap allocation like this may fail on computers with little RAM. What is your solution to this problem?
Edit
What other methods or designs do you know can change or replace the factory method/class?
Do you actually have 100K characters? And are you actually in an environment in which you are memory constrained and allocation fails? Even if Character is a whopping 1 KB in size, you'd be looking at 100 MB consumed, which isn't that much, even for feature phones.
But perhaps you're worried that you might actually have memory to spare, yet fragmentation is so high that you can't use it. That's a fairer concern, and one usually relevant to games. Perhaps take a look at the object pool pattern (see the sketch after this answer). Also, considering the large number of characters you're speaking of, flyweight might also help!
Finally, running out of memory isn't like other program errors such as losing a TCP connection or hitting a disk error. If you need to allocate the 100,001st character and there's no more memory for it, you can't just skip the allocation, show an error to the user, or try again later; you can't go on without it, as it were. So don't: just bail out of the program, and perhaps do whatever cleanup is required to not lose too much game state. Have a read of "malloc never fails" as well.
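For illustration, a minimal object-pool sketch: all slots are reserved up front, so character creation never touches the heap and never fragments it. This is a sketch under simplifying assumptions (fixed capacity, no thread safety), not a production allocator:

    #include <cstddef>
    #include <new>
    #include <utility>
    #include <vector>

    template <typename T, std::size_t Capacity>
    class ObjectPool {
        alignas(T) unsigned char storage_[Capacity * sizeof(T)];  // one contiguous arena
        std::vector<T*> free_;                                    // slots available for reuse
    public:
        ObjectPool() {
            free_.reserve(Capacity);
            for (std::size_t i = 0; i < Capacity; ++i)
                free_.push_back(reinterpret_cast<T*>(storage_ + i * sizeof(T)));
        }
        template <typename... Args>
        T* acquire(Args&&... args) {
            if (free_.empty()) return nullptr;                    // pool exhausted: fail fast
            T* slot = free_.back();
            free_.pop_back();
            return new (slot) T(std::forward<Args>(args)...);     // construct in place
        }
        void release(T* obj) {
            obj->~T();                                            // destroy, but keep the memory
            free_.push_back(obj);
        }
    };

A factory like CreateCharacter above could then call pool.acquire(Name, InitialWeapons) instead of new.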
The heap memory is obviously limited, but the limit is in practice not that small (at least gigabytes on current PCs).
And memory consumption is not the biggest problem in a game. If you have many characters, you might need to deal with interactions between them, and that could be more difficult (e.g. determining the set of characters close to a given one could be more challenging).
You should read more about memory management, virtual address space, smart pointers, reference counting, RAII, circular references, weak references, hash consing.
Notice that the heap is global to your program & process (it is not the property of some particular class or code chunk, but of your entire program).
The heap allocation routines (behind new & delete) are generally implemented on top of operating system primitives (often system calls) that grow the virtual address space. On Linux, see mmap(2). The operating system may also provide a means of querying your virtual address space (on Linux, see proc(5); for a process of pid 1234, the /proc/1234/maps pseudo-file).
I recommend reading a good book on garbage collection, such as the GC handbook. It teaches you concepts and terminology which are relevant for C++ programming (notably in games). In some sense, you may want to implement your own GC for your game.
C++ has an allocator concept, and the standard containers know about it; a small illustration follows.
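A tiny illustration of that allocator concept, using C++17's polymorphic allocators to make a standard container draw from a pre-reserved buffer instead of the global heap:

    #include <memory_resource>
    #include <vector>

    int main() {
        char buffer[4096];                                    // pre-reserved arena
        std::pmr::monotonic_buffer_resource arena(buffer, sizeof buffer);
        std::pmr::vector<int> v(&arena);                      // allocates from the arena
        for (int i = 0; i < 100; ++i)
            v.push_back(i);                                   // no global heap traffic
    }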
Read also some Introduction to Algorithms.
heap allocation like this may fail to work on computers with small RAM.
Then either improve your program to use less memory, or get a bigger computer. Perhaps consider some distributed computing approach (e.g. cloud computing), as in MMORPGs.
What other methods or designs do you know can change or replace the factory method/class?
They won't change the consumed memory much, because in your design every character is represented by its own unique C++ object. So that does not matter much.
Assuming you have enough local disk space to store the information of all characters, you can mmap one or more files that store all the character data, and create a character object from data in the file(s) only when needed.
If you have neither enough memory nor enough disk space to store the data of all characters locally, then it becomes a much more difficult problem: you might need to assign every character a URI and load it from the network...
EDIT: Of course, after updating the character data, you'd need to write it back to the corresponding file. And for performance's sake, you might want to implement some caching mechanism so that frequently used characters don't need to be read and written back every time they are used.
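A rough sketch of that mmap approach, assuming a 64-bit POSIX system and a fixed-size, trivially copyable record; CharacterData and the file name are illustrative, not a real game API:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>

    struct CharacterData {                  // must be trivially copyable for this to work
        char name[32];
        int  weapon_id;
        int  hit_points;
    };

    int main() {
        const std::size_t count = 100000000;                  // 10^8 characters
        const std::size_t bytes = count * sizeof(CharacterData);

        int fd = open("characters.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0) return 1;
        if (ftruncate(fd, bytes) != 0) return 1;              // reserve the file space

        // The OS pages records in and out on demand; they are never all in RAM at once.
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;
        auto* chars = static_cast<CharacterData*>(p);

        chars[42].hit_points = 100;                           // touch a single record

        munmap(chars, bytes);                                 // dirty pages are written back
        close(fd);
    }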
I want to implement a fast database alternative that only needs to handle binary data.
To specify, I want something close to a database that will be securely stored even in case of a forced termination (task manager) during execution, whilst also being accessed directly from memory in C++. Like a vector of structs that is mirrored to the hard disk. It should be able to handle hundreds of thousands of read accesses and at least 1000 write accesses per second. In case of a forced termination, at most the last command can be lost. It does not need to support multithreading and the database file will only be accessed by a single instance of the program. Only needs to run on Windows. These are the solutions I've thought of so far:
SQL Databases
Advantages
Easy to implement, since lots of libraries are available
Disadvantages
The server runs in a different process, therefore possibly slow inter-process communication
Necessity of parsing SQL queries
Built for multithreaded environments, so lots of unnecessary synchronization
Rows can't be directly accessed using pointers but need to be copied at least twice per change
Unnecessary delays on the UPDATE query, since the whole table needs to be searched and the WHERE clause checked
These were just a few from the top of my head, there might be a lot more
Memory Mapped Files
Advantages
Direct memory mapping, so direct pointer access possible
Very fast compared to databases
Disadvantages
Forceful termination could lead to a whole page not being written
Lots of code (I don't actually mind that)
No forced synchronization possible
Increasing file size might take a lot of time
C++ vector*
Advantages
Direct pointer access possible, however, needs to manually notify of changes
Very fast compared to databases
Total programming freedom
Disadvantages
Possibly slow because of many calls to WriteFile
Lots of code (I don't actually mind that)
C++ vector with complete write every few seconds
Advantages
Direct pointer access possible
Very fast compared to databases
Total programming freedom
Disadvantages
Lots of unchanged data being rewritten to the file; alternatively, lots of RAM wasted on preventing unnecessary writes
Inaccessibility during writes, or lots of RAM wasted on a copy
Could lose multiple seconds' worth of data
Multiple threads and therefore synchronization needed
*Basically, a wrapper class that only exposes per-row read/write functionality of a vector, OR allows direct writes to memory but relies on the caller to notify it of changes. All reads are done from a copy in memory; all writes are done to the copy in memory and to the file itself on a per-command basis.
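A rough sketch of that wrapper idea (Windows API; the fixed-size Row type and error handling are simplified, and a FlushFileBuffers call could follow each write for stronger durability):

    #include <windows.h>
    #include <cstddef>
    #include <vector>

    struct Row { char payload[64]; };                  // fixed-size binary record

    class MirroredVector {
        std::vector<Row> rows_;                        // the in-memory copy
        HANDLE file_;                                  // the on-disk mirror
    public:
        MirroredVector(HANDLE file, std::size_t n) : rows_(n), file_(file) {}

        const Row& read(std::size_t i) const { return rows_[i]; }   // reads never hit the disk

        void write(std::size_t i, const Row& r) {
            rows_[i] = r;                              // update memory first
            LONGLONG off = static_cast<LONGLONG>(i) * sizeof(Row);
            OVERLAPPED ov{};                           // position the write at the row's offset
            ov.Offset     = static_cast<DWORD>(off & 0xFFFFFFFF);
            ov.OffsetHigh = static_cast<DWORD>(off >> 32);
            DWORD written = 0;
            WriteFile(file_, &r, sizeof(Row), &written, &ov);  // mirror to disk per command
        }
    };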
Also, is it possible to write to different parts of a file without flushing, and then flush all the changes at once with a guarantee that the file will be written either completely or not at all, even in case of a forced termination during the write? All I can think of is the following workflow:
Duplicate target file on startup, then for every set of data:
Write all changes to the duplicate -> Flush by replacing the original with the duplicate
However, I feel like this would be a horrible waste of hard disk space for big files.
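For reference, that duplicate-and-swap workflow might look roughly like this on Windows - a sketch only, with ReplaceFileW doing the atomic swap and the file names being just examples:

    #include <windows.h>

    bool CommitChanges(const void* data, DWORD size) {
        HANDLE h = CreateFileW(L"data.tmp", GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) return false;

        DWORD written = 0;
        BOOL ok = WriteFile(h, data, size, &written, nullptr)
                  && FlushFileBuffers(h);              // force the bytes to the device
        CloseHandle(h);
        if (!ok) return false;

        // Atomically replace the live file with the fully written duplicate;
        // a reader sees either the old version or the new one, never a mix.
        return ReplaceFileW(L"data.bin", L"data.tmp", nullptr, 0, nullptr, nullptr) != 0;
    }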
Thanks in advance for any input!
The program I am working on at the moment processes a large amount of data (>32 GB). Due to "pipelining", however, a maximum of around 600 MB is present in main memory at any given time (I checked, and that works as planned).
When the program has finished, however, and I switch back to the workspace with Firefox open, for example (but other programs too), it takes a while until I can use it again (the HDD is also highly active for a while). This makes me wonder whether Linux (the operating system I use) swaps out other programs while my program is running, and why.
I have 4 GB of RAM installed on my machine, and while my program is active, utilization never goes above 2 GB.
My program allocates/deallocates dynamic memory in only two sizes: 32 MB and 64 MB chunks. It is written in C++, and I use new and delete. Should Linux not be smart enough to reuse these blocks once I have freed them, and leave my other memory untouched?
Why does Linux kick my stuff out of memory?
Is this some other effect I have not considered?
Can I work around this problem without writing a custom memory management system?
The most likely culprit is file caching. The good news is that you can disable file caching. Without caching, your software will run more quickly, but only if you don't need to reload the same data later.
You can do this directly with Linux APIs, but I suggest you use a library such as Boost.Asio. If your software is I/O bound, you should additionally make use of asynchronous I/O to improve performance.
All the recently used pages are causing older pages to get squeezed out of the disk cache. As a result, when some other program runs, its pages have to be paged back in.
What you want to do is use posix_fadvise (or posix_madvise if you're memory mapping the file) to eject pages you've forced the OS to cache so that your program doesn't have a huge cache footprint. This will let older pages from other programs remain in cache.
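A small sketch of that posix_fadvise suggestion, assuming the data is read sequentially from a file (the path and buffer size are arbitrary):

    #include <fcntl.h>
    #include <unistd.h>

    void process_file(const char* path) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return;

        char buf[1 << 20];                             // 1 MiB read buffer
        off_t done = 0;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            // ... process buf[0..n) here ...
            done += n;
            // Drop the already-processed pages from the page cache so they
            // don't evict other programs' pages.
            posix_fadvise(fd, 0, done, POSIX_FADV_DONTNEED);
        }
        close(fd);
    }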
I am working on a project for a device that must constantly write information to a storage device. The device will need to be able to lose power but accurately retain the information that it collects up until the time that power is lost.
I've been looking for answers for what would happen if power was lost on a system like this. Are there any issues with losing power and not closing the file? Is data corruption a possibility?
Thank you
The whole subject of "safely storing data when power may be cut" is quite hard to solve in a generic way - the exact solution will depend on the exact type of data, the rate at which data is stored, etc.
To retain information "while power is off", the data needs to be stored in non-volatile memory (flash, EEPROM or battery-backed RAM). Again, this is a hardware solution.
Can you "lose data written to a file"? Yes, it's entirely possible that the file is not correctly written if power to the file-storage device is lost while the system is in the middle of writing.
The answer to this really depends on how much freedom you have to build/customise the hardware to cope with this situation. Systems designed for high reliability have a way to detect power cuts and can still run for several seconds (sometimes a lot more) after one; when the power cut happens, the system goes into a "save all data and shut down nicely" mode. Typically, this is done with an uninterruptible power supply (UPS), which has an alarm mechanism that signals that external power is gone; when the system receives this signal, it starts an emergency shutdown.
If you don't have any way to connect a UPS and shut down in an orderly fashion, then there are other options, such as journaling filesystems, that can give you a good set of data, but they are not guaranteed to give you complete data. You also need to design your file format so that "cut off" data doesn't completely ruin the file - the classic example is a zip file, which stores the "directory" (list of contents) at the very end of the file, so you can have 99.9% of the file complete, yet the missing 0.1% is what you need to decode all the content.
Yes, data corruption is definitely a possibility.
However there are a few guidelines to minimize it in a purely software way:
Use a journaling filesystem and put it in its maximum journal mode (e.g. for ext3/ext4, use data=journal, no less).
Avoid software buffers. If you don't have a choice, flush them ASAP.
Synchronize the filesystem ASAP (either through the sync/syncfs/fsync system calls, or using the sync mount option).
Never overwrite existing data, just append new data to existing files.
Be prepared to deal with incomplete data records.
This way, even if you lose data it will only be the last few bytes written, and the filesystem in general won't be corrupt.
You'll notice that I assumed a Unix-y OS. As far as I know, Windows doesn't give you enough control to enforce those kinds of constraints on the filesystem.
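To make the guidelines concrete, a minimal append-and-sync sketch (POSIX; the file name and record format are only examples):

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstddef>

    bool append_record(int fd, const void* rec, std::size_t len) {
        if (write(fd, rec, len) != static_cast<ssize_t>(len)) return false;
        return fsync(fd) == 0;                  // block until the data is on the media
    }

    int main() {
        // O_APPEND: never overwrite existing data, only add to the end.
        int fd = open("log.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) return 1;

        const char record[] = "sample measurement\n";
        append_record(fd, record, sizeof record - 1);   // at most this record can be lost
        close(fd);
    }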