I realise that the question I'm asking isn't a simple "Oh, that's easy! Do a simple this and that and voilà!" one. The fact is, one night, without thinking, I deleted the wrong partition. I tried a few Windows and Linux tools (Partition Disk Doctor, EaseUS, TestDisk, etc.), but none of them worked, and I think that's because of the way I deleted the partition.
I have written my own boot-sector creation and backup tools in C++ before, as well as one or two kernels in C and assembler (albeit fairly useless kernels), so I think I have sufficient knowledge to at the very least try to recover it manually.
My drive was set up as follows:
Size: 1.82 TB
part0: 100 MB (redundant Windows recovery partition)
part1: ~1.72 TB (my data partition)
How I broke it:
In Windows 7, I deleted the first partition. I then extended the second to take up the first's free space, which meant I still had two partitions, now acting as one dynamic partition. I rebooted into my Ubuntu OS and realised I could no longer read it. I rebooted back into Windows, deleted the first partition, and then thought: wait, I shouldn't have done that. Needless to say, it's dead now.
What I would like is some advice / good links on where to start, what not to do, and what not to expect. I'm hoping that if the journals are still intact I'll be able to recover the drive.
Edit:
This is an NTFS drive. After posting this question, I was wondering: given that I know the approximate location of where my partition was located, is there a way to easily identify the journals? Maybe I can reconstruct some of the other drive / partition info myself and write it to the disk.
The first step, I think, is to figure out exactly how those "dynamic partitions", as you call them, work in Windows 7. From your description, it sounds as if you created a kind of logical volume from two physical partitions. My guess is that the second partition now contains some kind of header for that volume, which is why recovery tools unfamiliar with that format fail.
If you figure out exactly what Windows 7 did when you merged the two partitions, you should be able to write an application which extracts an image of the logical volume.
Or you could check out NTFS-3G, the FUSE implementation of NTFS, at http://www.tuxera.com/community/ntfs-3g-download/. By studying that code, I bet you can find a way to locate the NTFS filesystem on your borked disk. Once you have that, try extracting everything from the beginning of the filesystem to the end of the disk into an image, and run an NTFS filesystem checker on it. With a little luck, you'll get a mountable filesystem back.
If you're wondering how to access the disk, just open the corresponding device in Linux as if it were a regular file. You might need to align your reads to 512 bytes, though (or whatever the sector size of your disk is; 512 and, to a lesser extent, 4096 are common values), otherwise read() might return an error.
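If you want something concrete to start from, here is a minimal sketch (not a recovery tool) that opens the raw device and scans for NTFS boot-sector signatures. The device path and sector size are assumptions you will need to adjust:

#define _FILE_OFFSET_BITS 64   // make off_t 64-bit so a 1.82 TB disk is addressable
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    const char *device = "/dev/sdb";   // assumption: adjust to your disk
    const size_t sector = 512;         // assumption: 512-byte sectors
    int fd = open(device, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    unsigned char buf[512];
    for (uint64_t lba = 0; ; ++lba) {
        ssize_t n = pread(fd, buf, sector, (off_t)(lba * sector));
        if (n != (ssize_t)sector) break;                 // end of disk or read error
        // An NTFS boot sector carries the OEM ID "NTFS    " at byte offset 3.
        if (memcmp(buf + 3, "NTFS    ", 8) == 0)
            printf("possible NTFS boot sector at LBA %llu\n", (unsigned long long)lba);
    }
    close(fd);
    return 0;
}

Reading one sector at a time over 1.82 TB will be very slow; read in larger multiples of the sector size once you trust the logic. NTFS also keeps a backup boot sector at the end of the volume, which this kind of scan can turn up.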
I have an idea that I am working on. I am trying to create a Windows mini-filter driver that will virtualize changes made to files by certain processes. I am doing this by capturing the writes and sending them to a file in a virtualized location. Here is the issue:
If the process tries to read, it needs to get unaltered reads for the parts of the file it has not written to, but altered reads for the parts that have been written to. How do I track the segments of the file that have been altered in an efficient way? I seem to remember a way you can use a bitmask to map file segments, but I may be misremembering. Anyway, any help would be greatly appreciated.
Two solutions:
Simply copy the original file to virtualized storage and use only that copy. For small files, this is probably the best and fastest solution.
To give an example, let's say that any file smaller than 65536 bytes is fully copied; use a power of two in any case.
If the file grows above the limit, see solution 2.
For big files, keep the overwritten segments in virtualized storage and use them according to the current file position when needed. The easiest way is to split the file into 65536-byte chunks: you get the chunk number by shifting the file position right by 16, and the position within the chunk by masking the lower 16 bits.
Example:
file_position = 165232360
chunk_number  = file_position >> 16    (= 2521)
chunk_pos     = file_position & 0xFFFF (= 16104)
So your virtualized storage becomes a directory storing chunks named trivially (chunk #2521 = 2521.chunk, for example).
When a write occurs, you start by copying the original data to a new chunk in virtualized storage, then you let the application write into it.
Obviously, if the file grows, simply add chunks that will exist only in virtualized storage.
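To make the bookkeeping of solution 2 concrete, here is a hedged user-mode sketch of the chunk tracking only; a real mini-filter would do this in kernel mode with the filter manager APIs, and all names below are illustrative:

#include <cstdint>
#include <unordered_set>

constexpr uint64_t kChunkShift = 16;                 // 2^16 = 65536-byte chunks
constexpr uint64_t kChunkSize  = 1ull << kChunkShift;
constexpr uint64_t kChunkMask  = kChunkSize - 1;

struct VirtualizedFile {
    std::unordered_set<uint64_t> dirty_chunks;       // chunks redirected to virtualized storage

    // True if the byte at file_position must be read from the virtualized copy.
    bool is_virtualized(uint64_t file_position) const {
        return dirty_chunks.count(file_position >> kChunkShift) != 0;
    }

    // Called on a write: mark every chunk the write touches as virtualized
    // (the copy-on-write of the original chunk data would happen here too).
    void mark_written(uint64_t file_position, uint64_t length) {
        uint64_t first = file_position >> kChunkShift;
        uint64_t last  = (file_position + length - 1) >> kChunkShift;
        for (uint64_t c = first; c <= last; ++c)
            dirty_chunks.insert(c);
    }

    // Offset of file_position inside its chunk (e.g. 165232360 -> 16104).
    static uint64_t chunk_offset(uint64_t file_position) {
        return file_position & kChunkMask;
    }
};

A fixed bitmask per file, as you remembered, works just as well if you can bound the file size; a hash set simply avoids having to size it up front.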
It's not perfect - you could use delta chunks instead of full ones to save disk space - but it's a good start that can be optimized later.
Also, it's quite easy to add versions and to keep track of:
various applications that use the file (keep multiple virtualized storages),
successive launches (run #1 modifies the start of the file, run #2 modifies the end; you keep both virtualizations and can easily "revert" the last launch).
How can I overwrite all free disk space with zeros, like the cipher command in Windows? For example:
cipher /w:C:\
This will overwrite the free disk space in three passes. How can I do this in C or C++? (I want to do it in one pass and as fast as possible.)
You can create a set of files and write random bytes to them until the available disk space is filled. These files should be removed before exiting the program.
The files must be created on the device you wish to clean.
Multiple files may be required on some file systems, due to file size limitations.
It is important to use different, non-repeating random sequences in these files to defeat file system compression and deduplication strategies that could reduce the amount of disk space actually written.
Note also that the OS may have quota systems that will prevent you from filling the available disk space, and other processes may show erratic behavior when disk space runs out.
Removing the files may cause the OS to skip the cache-flushing mechanism, leaving some blocks unwritten to disk. A sync() system call or equivalent might be required. Syncing at the hardware level might be further delayed, so waiting some time before removing the files may be necessary.
Repeating this process with a different random seed reduces the odds of hardware recovery through surface analysis with advanced forensic tools. These tools are not perfect - especially when recovery would be a life saver for a lost Bitcoin wallet owner - but they may prove effective in other, more problematic circumstances.
Using random bytes has a double purpose:
it prevents some file systems from optimizing the blocks and compressing or deduplicating them instead of writing them to the media, which would leave the existing data in place instead of overwriting it;
it increases the difficulty of recovering previously written data with advanced hardware recovery tools, just like those security envelopes with random patterns printed on the inside, which prevent the contents of the letter from being read by simply holding the envelope up to a strong light.
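As a rough, hedged illustration of the fill-and-remove approach on a POSIX system (the target path is a placeholder and error handling is minimal):

#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <random>
#include <vector>

int main() {
    // Assumption: this path lives on the volume whose free space you want to wipe.
    const char *path = "/mnt/target/wipe.bin";
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    std::vector<unsigned char> buf(1 << 20);          // 1 MiB write buffer
    std::mt19937_64 rng(std::random_device{}());      // non-repeating enough to
                                                      // defeat compression/dedup
    for (;;) {
        for (size_t i = 0; i + 8 <= buf.size(); i += 8) {
            uint64_t r = rng();
            memcpy(&buf[i], &r, sizeof r);
        }
        if (write(fd, buf.data(), buf.size()) < 0) {
            if (errno != ENOSPC) perror("write");     // ENOSPC means the disk is full
            break;
        }
    }
    fsync(fd);         // force the data out of the OS cache onto the media
    close(fd);
    unlink(path);      // give the (now overwritten) free space back
    return 0;
}

On file systems with a maximum file size (FAT32's 4 GB, for instance), create additional files in a loop until the first ENOSPC, as noted above.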
The program I am working on at the moment processes a large amount of data (>32 GB). Due to "pipelining", however, a maximum of around 600 MB is present in main memory at any given time (I checked; that works as planned).
When the program has finished, however, and I switch back to the workspace with Firefox open, for example (but also other programs), it takes a while until I can use it again (and the HDD is highly active for a while). This makes me wonder whether Linux (the operating system I use) swaps out other programs while my program is running, and if so, why.
I have 4 GB of RAM installed on my machine, and while my program is active it never goes above 2 GB of utilization.
My program only allocates/deallocates dynamic memory of two different sizes, 32 and 64 MB chunks. It is written in C++, and I use new and delete. Shouldn't Linux be smart enough to reuse these blocks once I have freed them and leave my other memory untouched?
Why does Linux kick my stuff out of memory?
Is this some other effect I have not considered?
Can I work around this problem without writing a custom memory management system?
The most likely culprit is file caching. The good news is that you can disable file caching; without caching, your software can run more quickly, but only if you don't need to re-read the same data later.
You can do this directly with the Linux APIs, but I suggest you use a library such as Boost.Asio. If your software is I/O-bound, you should additionally make use of asynchronous I/O to improve performance.
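For the "directly with the Linux APIs" route, one common way to bypass the page cache is O_DIRECT, which requires block-aligned buffers. This is only a sketch of that idea, not of the Boost.Asio approach, and the block size and file name are assumptions:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // O_DIRECT is a Linux extension
#endif
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main() {
    const char *path = "big_input.dat";      // assumption: one of your data files
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    const size_t block = 4096;               // assumption: 4 KiB logical block size
    void *buf = nullptr;
    if (posix_memalign(&buf, block, 64 * block) != 0) { close(fd); return 1; }

    ssize_t n;
    while ((n = read(fd, buf, 64 * block)) > 0) {
        // process n bytes; with O_DIRECT they bypass the page cache,
        // so they won't push other programs' pages out of memory
    }
    // The final partial block of a file may need a non-O_DIRECT fallback.
    free(buf);
    close(fd);
    return 0;
}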
All the recently used pages from your data are squeezing older pages out of the page cache. As a result, when some other program runs, its pages have to be paged back in.
What you want to do is use posix_fadvise (or posix_madvise if you're memory-mapping the file) to eject the pages you've forced the OS to cache, so that your program doesn't have a huge cache footprint. This lets older pages from other programs remain in the cache.
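A minimal sketch of that idea, assuming plain read() calls on a large input file (the chunk size and file name are placeholders):

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

int main() {
    int fd = open("big_input.dat", O_RDONLY);   // assumption: one of your data files
    if (fd < 0) { perror("open"); return 1; }

    std::vector<char> buf(32 * 1024 * 1024);    // 32 MB chunks, as in the question
    off_t consumed = 0;
    ssize_t n;
    while ((n = read(fd, buf.data(), buf.size())) > 0) {
        // ... process buf[0..n) ...
        consumed += n;
        // Tell the kernel the bytes we've already consumed won't be needed again,
        // so it evicts them instead of other programs' pages.
        posix_fadvise(fd, 0, consumed, POSIX_FADV_DONTNEED);
    }
    close(fd);
    return 0;
}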
In a few months I will start to write my bachelor's thesis. Although we have only discussed the topic very roughly, the main problem will be something like this:
A program written in C++ (more or less an HTTP server, but I guess that doesn't matter here) has to be executed to fulfill its task. There are several instances of this program running at the same time, and a load balancer takes care of distributing HTTP requests equally between all instances. Every time the program's code is changed to enhance it or to get rid of bugs, all instances have to be restarted. This can take up to 40 minutes for one instance. As there are more than ten instances running, the restart process can take up to one working day. This is way too slow.
The presumed bottleneck is the access to the database during startup to load all the necessary data (I guess it will be a MySQL database). The team leader's idea for reducing the startup time is to serialize the content of the database to a file and read from that file instead of from the database. That would be my task. Of course, the problem is checking whether there is new data in the database that is not in the file; I guess writes still go to the database, not to the serialized file. My first idea is to use Apache Thrift for the serialization and deserialization, as I have already worked with it and it is fast, as far as I know (maybe I'll write a small Python program to take care of this). However, I have some basic questions regarding this problem:
Is it a good solution to read from a file instead of from the database? Is there any chance this will save time?
Would Thrift work well in this scenario, or is there some faster way to do the serialization/deserialization?
As I am only reading, not writing, I don't have to take care of consistency, right?
Can you recommend some books or online literature worth reading on this topic?
If I'm missing information, just ask. Thanks in advance. I just want to be well informed and prepared before I start the thesis, which is why I ask.
Kind regards
Michael
Cache is king
As a general recommendation: Cache is king, but don't use files.
Cache? What cache?
The cache I'm talking about is, of course, an external cache. There are plenty of systems available, and a lot of them can form a cache cluster with cached items spread across multiple machines' RAM. If you do it cleverly, the cost of serializing/deserializing into memory will make your algorithms shine compared to the cost of grinding through the database. On top of that, you get nice features like a TTL for cached data, a cache that persists even if your business logic crashes, and much more.
What about consistency?
As I am only reading, not writing, I don't have to take care of consistency, right?
Wrong. The issue is not who writes to the database. It is whether or not someone writes to it, how often that happens, and how up to date your data needs to be.
Even if you cache your data into a file as planned in your question, you have to be aware that this produces a redundant data duplicate, disconnected from the original data source. So the real question you have to answer (I can't do this for you) is what the optimum update frequency should be. Do you need immediate updates in near-real-time? Is a certain time lag acceptable?
This is exactly the purpose of the TTL (time to live) value that you can put on your cached data. If you need more frequent updates, set a short TTL. If you are OK with updates at a slower frequency, set the TTL accordingly, or have a scheduled task/thread/process that does the update.
Ok, understood. Now what?
Check out Redis, or the "oldtimer" Memcached. You didn't say much about your platform, but both have Linux and Windows versions available (and especially on Windows you will have a lot more fun with Redis).
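As a rough idea of what the read-through-cache-with-TTL pattern looks like from C++, here is a hedged sketch using the hiredis client; the key name, TTL, and payload are placeholders, and any Redis client would do:

#include <hiredis/hiredis.h>
#include <cstdio>
#include <optional>
#include <string>

std::optional<std::string> cached_load(redisContext *c, const std::string &key) {
    redisReply *r = (redisReply *)redisCommand(c, "GET %s", key.c_str());
    if (r && r->type == REDIS_REPLY_STRING) {
        std::string value(r->str, r->len);
        freeReplyObject(r);
        return value;                       // cache hit: skip the database entirely
    }
    if (r) freeReplyObject(r);
    return std::nullopt;                    // cache miss: load from MySQL, then store
}

void cache_store(redisContext *c, const std::string &key,
                 const std::string &serialized, int ttl_seconds) {
    // EX sets the time-to-live; after it expires, Redis drops the entry and the
    // next startup reloads fresh data from the database.
    redisReply *r = (redisReply *)redisCommand(
        c, "SET %s %b EX %d", key.c_str(),
        serialized.data(), serialized.size(), ttl_seconds);
    if (r) freeReplyObject(r);
}

redisConnect("127.0.0.1", 6379) gives you the redisContext, and the serialized payload could be the Thrift blob mentioned in the question.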
PS: Oh yes, Thrift serialization can be used for the serialization part.
I am working on a project for a device that must constantly write information to a storage device. The device will need to be able to lose power but accurately retain the information that it collects up until the time that power is lost.
I've been looking for answers for what would happen if power was lost on a system like this. Are there any issues with losing power and not closing the file? Is data corruption a possibility?
Thank you
The whole subject of "safely storing data when power may be cut" is quite hard to solve in a generic way - the exact solution depends on the type of data, the rate at which it is stored, and so on.
To retain information while power is off, the data needs to be stored in non-volatile memory (flash, EEPROM, or battery-backed RAM). Again, this is a hardware matter.
Can you "lose data written to a file"? Yes, it's entirely possible that the file is not correctly written if power to the file-storage device is lost while the system is in the middle of writing.
The answer really depends on how much freedom you have to build or customise the hardware to cope with this situation. Systems designed for high reliability can detect a power cut and keep running for several seconds (sometimes a lot more) afterwards, during which they go into a "save all data and shut down nicely" mode. Typically, this is done with an uninterruptible power supply (UPS) that has an alarm mechanism signalling that external power is gone; when the system receives this signal, it starts an emergency shutdown.
If you don't have any way to connect a UPS and shut down in an orderly fashion, there are other options, such as a journaling filesystem, that can give you a good set of data, but not guaranteed complete data. You also need to design your file format so that cut-off data doesn't completely ruin the file - the classic example is a ZIP file, which stores the "directory" (list of contents) at the very end of the file, so you can have 99.9% of the file and still be missing the 0.1% you need to decode all the content.
Yes, data corruption is definitely a possibility.
However, there are a few guidelines to minimize it in a purely software way:
Use a journaling filesystem and put it in its maximum journal mode (e.g. for ext3/ext4, use data=journal, nothing less).
Avoid software buffers. If you don't have a choice, flush them ASAP.
Synchronize the filesystem ASAP (either through the sync/syncfs/fsync system calls, or using the sync mount option); see the sketch at the end of this answer.
Never overwrite existing data, just append new data to existing files.
Be prepared to deal with incomplete data records.
This way, even if you lose data, it will only be the last few bytes written, and the filesystem in general won't be corrupted.
You'll notice that I assumed a Unix-like OS. As far as I know, Windows doesn't give you enough control to enforce that kind of constraint on the filesystem.
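On a POSIX system, a minimal sketch of guidelines 2-5 might look like this; the record layout and file path are placeholders:

#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Append one fixed-size record and force it to stable storage before returning.
bool append_record(int fd, uint32_t sensor_id, uint32_t value) {
    unsigned char rec[8];
    memcpy(rec,     &sensor_id, sizeof sensor_id);
    memcpy(rec + 4, &value,     sizeof value);
    if (write(fd, rec, sizeof rec) != (ssize_t)sizeof rec)
        return false;              // partial record: the reader must tolerate this
    return fsync(fd) == 0;         // don't report success until it is on the media
}

int main() {
    // O_APPEND: never overwrite existing data, only add to the end of the file.
    int fd = open("/var/log/device.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }
    append_record(fd, 1, 42);
    close(fd);
    return 0;
}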