Text file recovery on corrupted file system of Flashdrive - c++

I am able to read raw data of the corrupted file system of USB drive.
Is there any simple way for me to recover only text and docx files by using these raw data? (Programming Language: C++)

It might be possible to do it, but it won't be simple.
First of all you will need to parse the file system (i assume it's fat32 from the tags). In fact you will need to parse File Allocation Table (if it's corrupted and mirror copy of FAT was enabled on your drive, then you can try with it). Depending on corruption you it might be possible to extract some files. Read this article for more info about FAT32 structure and you can use this Microsoft specification as more strict guide. Good approach to understand the filesystem is to make some small usb or logical drive with sample file and parse it manually using some hex editor (free wxHexEditor or proprietary WinHex for example).
You can try to search sequences of ASCII characters in your Hex image, but then you will need to sort them manually.
As for docx, this format internally is a collection of XML files and resources, compressed in zip. So it will be way to complicated task to restore it from raw hex image

Related

How do I input and output various file types in c++

I've seen a lot of examples of i/o with text files I'm just wondering if you can do the same with other file types like mp3's, jpg's, zip files, etc..?
Will iostream and fstream work for all of these or do I need another library? Do I need a new sdk?
It's all binary data so I'd think it would be that simple. But I've been unpleasently surprised before.
Could I convert all files to text or binary?
It depend on what you mean by "work"
You can think of those files as a book written in Greek.
If you want to just mess with binary representation (display text in Greek on screen) then yes, you can do that.
If you want to actually extract some info: edit video stream, remove voice from audio (actually understand what is written), then you would need to either parse file format yourself (learn Greek) or use some specialized library (hire a translator).
Either way, filestreams are suited to actually access those files data (and many libraries do use them under the hood)
You can work on binary streams by opening them with openmode binary :
ifstream ifs("mydata.mp3", ios_base::binary);
Then you read and write any binary content. However, if you need to generate or modify such content, play a video or display a piture, the you you need to know the inner details of the format you are using. This can be exremely complex, so a library would be recomended. And even with a library, advanced programming skills are required.
Examples of open source libraries: ffmpeg for usual audio/video format, portaudio for audio, CImg for image processing (in C++), libpng for png graphic format, lipjpeg for jpeg. Note that most libraries offer a C api.
Some OS also supports some native file types (example, windows bitmaps).
You can open these files using fstream, but the important thing to note is you must be intricately aware of what is contained within the file in order to process it.
If you just want to open it and spit out junk, then you can definitely just start at the first line of the file and exhaustively push all data into your console.
If you know what the file looks like on the inside, then you can process it just as you would any other file.
There may be specific libraries for processing specific files, but the fstream library will allow you to access any file you'd like.
All files are just bytes. There's nothing stopping you from reading/writing those bytes however you see fit.
The trick is doing something useful with those bytes. You could read the bytes from a .jpg file, for example, but you have to know what those bytes mean, and that's complicated. Usually it's best to use libraries written by people who know about the format in question, and let them deal with that complexity.

Opening an existing .doc file using ofstream in C++

Assuming I have a file with .doc extension in Windows platform, how can I open the the file for outputting its contents on the screen using the ofstream object in C++? I am aware that the object can be used to open files in text and binary modes. But I would like to know if a .doc (or even .pdf) file can be opened and its contents read.
I've never actually done this before, but after reading up on it, I think I might have a suggestion. The .docx format is actually just XML that is zipped up. After unzipping, the file is located at word/document.xml. Doing this in a program is where it gets fun.
Two options: If you're using C++ CLR (.NET) then Microsoft has an SDK for you. It should make it pretty easy to open Office documents.
Otherwise if you're just using regular C++, you might have to do some extra work.
Open the file and unzip it using a library like zlib
Find the document.xml file inside
Parse the XML document. You'll probably want to use some kind of XML parsing library for this. You'll have to look up the specs for the XML to figure out how to get the text you want.
C++ std library has ifstream class that can be used to read simple text files, and for read binary files too.
It is up to you to interpret these bytes in the file. To proper interpret the binary file you need to know the format of the file.
If you think of MS Word files then I would start from here: http://en.wikipedia.org/wiki/Office_Open_XML to understand MS Word 2007 format.
You might find the Boost Iostreams library ( http://www.boost.org/doc/libs/1_52_0/libs/iostreams/doc/home.html ) somehow useful if you want to make some filter by yourself.

Application that Recovers Disk Contents in RAW form and Dumps as One Big Binary File?

I have a bunch of work that was lost on a hard drive that I had accidentally formatted at one time. I realize there are many data recovery tools out there, but all of them seem to try and scan the drive for actual partition data to recover files. I don't need this. I simply need a program to access each sector in a RAW fashion and dump the contents, byte-by-byte, to some file. The reason being, is that most of my work are ASCII files, so I don't care if the contents are part of a recognizable and complete file. I'm OK with parsing through the RAW data and trying to recover whatever text I can.
So does something like this exist, or am I resigned to code my own program (something I'm more than OK with)? It'd be a rather simple program that literally would scan the entire disk (sector by sector) and dump the contents (byte by byte) to a file (or likely several smaller files to make it more manageable) on another disk. What's the suggested way to make RAW disk reads from C/C++?
Could you boot from a Linux Live CD, identify the device name of the hard disk (eg with fdisk which will also tell you the partition size) then use the dd command, eg dd if=/dev/sda1 of=/mnt/path_to_external_drive to extract the raw contents of the disk?

Storing UTF-8 XML using Word's CustomXMLPart or any other supported way

I am writing a Word add-in which is supposed to store some own XML data per document using Word object model and its CustomXMLPart. The problem I am now facing is the lack of IStream-like functionality for reading/writing XML to/from a CustomXMLPart. It only provides BSTR interface and I am puzzled how to handle UTF-8 XMLs with BSTRs. To my understanding an UTF-8 XML file should really never have to undergo this sort of Unicode conversion. I am not sure what to expect as a result here.
Is there another way of using Word automation interfaces to store arbitrary custom information inside a DOCX file?
The "package" is an OPC document (Open Packaging Convention), which is basically a structured zip folder with a different extension (e.g. .pptx, .docx, .xps, etc.). You can get that file in stream and manipulate it any which way you like - but not artibitrarily. It will not be recognized as valid docx if you put things in the wrong places (not just xml elements, but also files in the folders inside the zip file). But if you're just talking "artibitrary" meaning CustomXMLPart, then that's okay.
This is a good kicker page to learn more about the Open XML SDK and if you're up to it, which allows for somewhat easier access to the file formats than using (.NET) System.IO.Packaging or a third-party zip library. To go deeper, grab the eBook (free) Open XML Explained.
With the Open XML SDK (again, this can all be done without the SDK) in .NET, this is what you'll want to do: How to: Insert Custom XML to an Office Open XML Package by Using the Open XML API.

C++ Importing and Renaming/Resaving an Image

Greetings all,
I am currently a rising Sophomore (CS major), and this summer, I'm trying to teach myself C++ (my school codes mainly in Java).
I have read many guides on C++ and gotten to the part with ofstream, saving and editing .txt files.
Now, I am interested in simply importing an image (jpeg, bitmap, not really important) and renaming the aforementioned image.
I have googled, asked around but to no avail.
Is this process possible without the download of external libraries (I dled CImg)?
Any hints or tips on how to expedite my goal would be much appreciated
Renaming an image is typically about the same as renaming any other file.
If you want to do more than that, you can also change the data in the Title field of the IPTC metadata. This does not require JPEG decoding, or anything like that -- you need to know the file format well enough to be able to find the IPTC metadata, and study the IPTC format well enough to find the Title field, but that's about all. Exactly how you'll get to the IPTC metadata will vary -- navigating a TIFF (for one example) takes a fair amount of code all by itself.
When you say "renaming the aforementioned image," do you mean changing metadata in the image file, or just changing the file name? If you are referring to metadata, then you need to either understand the file format or use a library that understands the file format. It's going to be different for each type of image file. If you basically just want to copy a file, you can either stream the contents from one file stream to another, or use a file system API.
std::ifstream infs("input.txt", std::ios::binary);
std::ofstream outfs("output.txt", std::ios::binary);
outfs << insfs.rdbuf();
An example of a file system API is CopyFile on Win32.
It's possible without libraries - you just need the image specs and 'C', the question is why?
Targa or bmp are probably the easiest, it's just a header and the image data as a binary block of values.
Gif, jpeg and png are more complex - the data is compressed