How to unzip carved files from disk in raw format - compression

Sorry for making such an unclear title
I have a disk image, disk.raw, from which I carved deleted files using Sleuth Kit. The command blkls disk.raw 1-8000 > carved wrote the data from the unallocated blocks 1 to 8000 (where I know my deleted files are) into a file.
So my output is a file containing some data with many empty gaps in between. For example, if I open it in Notepad, I get text like this:
1 4 µ½;ÓóÆJv4éA°¿S*îÔy÷è„¡d:ÄÕԈȤÒX2ÛK]8øâ†+[ÛÖ7jiº;Îàdƒ”ÜRÒ€
¥¾‘…λ5y)‹F¹ž8rÀÉø±9ŸÎ:ÿf¤$cªW›
jȉ…j,ܬ3®°d¥²¥®:Þ FhãŽß[ÔÀZ
÷·Îâ#§B¢† Uƒ†=qÁ[е#Sy(JØš#œKÊÏ9êþáð0•›nÊÑ=q­¡ŽšOk'ë#ëÚšÚjN1V&l?Hù´m,0㼕(nôTúèªÎb4z
„áñP$¼YèÐ%É‚YSÄÔΔú%ió΃P¥ð"÷…ž8«¾oÀE‚f¤X§üS(‘Àº.8H§÷ëü1¥ãùBÁ
ÏÉε”˜Ê<wªf”œàºš¯+kô¨§
÷*ÎÛMøÈ”Âqú2>XME[9¿
[æÀ‹dJ¹—×™
#¦e³ž‹
&ýãY
™qA›¥ì„5šI‰h{?–hZ%"?mÓ{ƒÌ‡5mf
R‹sàì‰;½˜\E€Îñ$‡jYÀK%ØnDwí[=û Ú‘;1„LQP!ðè.¦(w
‘ªb,†ä‚ž8®©8¢BMMã×›Œx
£®‘ÚFëÖбgi·ÖŠ.O&ÂëR¹5–{íy˜÷æ¡žÜç¦^ñbj˜1Úî5G)©Äš¸#¡
? qâ1q[µ­£É>½¥f–#žÞPžR›#T3lÂ.DcSÚ˥ѹ‹e¬·!$ù­“àYž{¨Ü˜ÉbJ…8¬‘#"b 3Ø„¤Í
qµ~#©Á42û,èLE²‰Iv+Áƒ™MšÅÄ
$Bn×ÖXya1£²ŒJçj-Õ7 :AÚ0è#eP#sef}#NÈ­è?¸¯ãß8µ#Q?ÒY͆ۡ)†ë3F›Œ[ŽF®8©!PóÚª]p [íˆyÊn;ãÕ§rBvÏŸ`‚ȨŠöMë÷S¸50¦è€\¾i'7ÒÚTT•vˆ›™¸ë‹ƒÞS>ðºjû&]WÆ–˜ÚÔG•5ÚÑÎ¥Vñ`;´0æ6\wuo«íîÕµ¬t–‚Âþ‘)ü¨Òíi¼_¡•o_iùab›,âezkM#Þ­æ]–h6Š¨+S$"4”4^ÞóD*í£0Ìmk¼#G•¨pG‡Ï{∉ŒB™ƒ)1Y¸E1<1¼
S’5éà‚z[A¬TD‰·Ý¾é m2ËTÍÌšÛrvF€Â«j¤ô?ÿþ­¢Zh4œ6<劕n´öñ>ï9Ì}
I know that this data represents a compressed file. Is there a way for me to decode it and read the files inside? Is there a tool that can do that given this input?
I'm really new to this and have basic knowledge :)
Thank you kindly in advance,

Sometimes carved files lose their headers; repairing the header might make the file accessible again. As mentioned by Mark Setchell, using a hex editor is better than Notepad. Also make sure to look for the correct header and save the file again in the correct format. Hopefully this is helpful.
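To make the "look for the correct header" advice concrete, here is a minimal Python sketch that scans the carved file for a few well-known compression signatures and tries to inflate any gzip stream it finds. The function names find_signatures and try_gunzip_at are made up for illustration, the signature table is deliberately short, and short magic values such as "BZh" will produce false positives. It also assumes the compressed data sits contiguously in the carved file, which may not hold for fragmented files.

```python
import zlib

# Well-known magic numbers for a few compressed/archive formats.
# Short signatures (e.g. "BZh") can match random data by chance.
SIGNATURES = {
    b"\x1f\x8b\x08": "gzip",
    b"PK\x03\x04": "zip local file header",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def find_signatures(data: bytes):
    """Return sorted (offset, format_name) pairs for every signature hit."""
    hits = []
    for magic, name in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


def try_gunzip_at(data: bytes, offset: int):
    """Try to inflate a gzip stream starting at offset; return None on failure."""
    decomp = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
    try:
        # Bytes after the end of the stream are ignored (kept in unused_data).
        return decomp.decompress(data[offset:])
    except zlib.error:
        return None
```

Reading the carved file with open("carved", "rb").read() and passing the bytes to find_signatures gives a list of offsets worth inspecting in a hex editor.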

Related

How to identify whether a file is DICOM or not which has no extension

I have few files in my GCP bucket folder like below:
image1.dicom
image2.dicom
image3
file1
file4.dicom
Now, I also want to check whether the files which have no extension, i.e. image3 and file1, are DICOM or not.
I use pydicom reader to read dicom files to get the data.
dicom_dataset = pydicom.dcmread("dicom_image_file_path")
Is there a way to check, in a single statement, whether the above two files are DICOM or not?
You can use the pydicom.misc.is_dicom function or do:
from pydicom import dcmread
from pydicom.errors import InvalidDicomError
try:
    ds = dcmread(filename)
except InvalidDicomError:
    pass  # not a valid DICOM file
Darcy has the best general answer if you're looking to check file types prior to processing them. Apart from checking whether the file is a DICOM file, it will also make sure the file doesn't have any DICOM problems itself.
However, another way to check quickly, which may or may not be better depending on your use case, is to simply check the file's signature (or magic number, as it is sometimes known).
See https://en.wikipedia.org/wiki/List_of_file_signatures
Basically, if the four bytes at offsets 128 to 131 in the file (right after the 128-byte preamble) are "DICM", then it should be a DICOM file.
If you only want to check 'is it DICOM?' for a set of files, and do it very quickly, this might be another approach.
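The signature check described above can be sketched in a few lines of Python. The function name has_dicom_magic is hypothetical, and the sketch assumes the file is available at a local path (objects in a GCP bucket would need to be downloaded or read through the storage client first). DICOM files written without the 128-byte preamble will not pass this check.

```python
def has_dicom_magic(path):
    """Return True if the file has the bytes 'DICM' at offset 128."""
    with open(path, "rb") as f:
        f.seek(128)  # skip the 128-byte preamble
        return f.read(4) == b"DICM"
```

This only reads four bytes per file, so it is cheap, but unlike dcmread it says nothing about whether the rest of the file is well-formed.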

Parsing deleted pdfs

I'm trying to do some file carving on a disk with C++. I can't find any resources on the web related to the on-disk structure of a PDF file. The thing is that I can find the %PDF-1.x token at the start of a cluster, but I can't find the size of the PDF file anywhere.
Let's say hypothetically that the file system entry for this particular document is lost. I find the start of the document and I keep reading until I run into the "startxref number %%EOF". The thing is that I don't know when to stop, since there are multiple "%%EOF" markers in the content of a document.
I've tried stopping after reading, let's say, 10 clusters without finding any PDF-specific keyword like "obj", "stream", "trailer" or "xref" anywhere. But that's quite arbitrary, and it's not a deterministic method of finding the end of the document so I can determine its size.
I've also seen some "Length number" markers at the start of some "obj"s but the number doesn't really fit most of the time.
Any ideas on what I can try next? Is there a way to determine the exact size of the entire document? I'm interested in recovering documents programmatically.
Since PDFs are "free format" (pretty much like text files, but less obvious to humans when it comes to "reading" the content), it's probably hard to piece them together if they aren't in order.
A stream does have a length, which is the key to where the endstream goes (there is an end-of-line marker after the stream keyword and before endstream). Streams are used to introduce bitmaps and similar things (fonts, line-art data in compressed form, etc.) into the document. But if you have several 4 KB segments that could fit as the same block in the middle of a stream, then there's no way to tell which one belongs there, other than pasting them together and seeing which ones look sane and which don't. Similarly, if there are several segments of streams and objects, you can't really tell which goes where.
Of course, this applies to almost all types of files with "variable content": you can find the first few kilobytes of a JPG, but knowing what the rest of the file is won't be easy. Only by visually inspecting the content can you determine which blocks of bytes belong where, and if you get it wrong, you'll probably just get some random garbage.
The open source tool bulk_extractor has a module called scan_pdf that does pretty much what you are describing here. It can recognize the individual parts of a PDF file on a drive, automatically decompress the compressed regions, and extract text using two strategies. It will recover data from fragments of PDFs even if the xref table cannot be found.
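For the simpler case where the deleted PDF is stored contiguously, one heuristic for the "multiple %%EOF" problem from the question is to take the last %%EOF that appears before the next %PDF- header (or before the end of the data). The Python sketch below illustrates that idea only; candidate_pdf_spans is a made-up name, and the result is a guess that breaks down for fragmented files or for PDFs embedded inside other PDFs.

```python
import re

EOF_MARKER = b"%%EOF"


def candidate_pdf_spans(data: bytes):
    """Return (start, end) byte ranges that may hold one contiguous PDF each.

    Heuristic: a PDF with incremental updates has several %%EOF markers,
    so take the last one found before the next %PDF- header.
    """
    headers = [m.start() for m in re.finditer(rb"%PDF-\d\.\d", data)]
    spans = []
    for i, start in enumerate(headers):
        limit = headers[i + 1] if i + 1 < len(headers) else len(data)
        last_eof = data.rfind(EOF_MARKER, start, limit)
        if last_eof != -1:
            spans.append((start, last_eof + len(EOF_MARKER)))
    return spans
```

Each span can then be written out to its own file and opened in a PDF reader to see whether the guess was right.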

Beginner - data storage through XML or text files

I am a beginner in Visual Studio and have only coded C and C++ in command-line settings.
Currently, I am taking a module (software development) which requires me to come up with an expense tracker: a program which helps a user track his/her daily expenses. Therefore, at the end of each day, or after a user finishes using the program, we have to store all the info in one place, which we then load during the next usage.
My constraints include not using any relational database (although I have no idea what it is :( ). Data storage must be done using XML or text files. Following this, I have several questions regarding data storage:
1) If data is stored successfully, do we load it every time we start the program? And every time the user closes the program, do we overwrite the existing data file with the updated data?
2) I have heard from some people that using a text file may be easier. Searching the internet and the library only gives me information about XML and not text. Would anyone be able to help me with it, for example with tutorial links?
Thank you very much!
File writing/handling works similarly to every other stream in C++.
You can enable file handling by including the fstream header. You can create a file, write to it and overwrite it every time the program is run, or you can create the file the first time the program is run and then append to it every subsequent time the program runs.
I've only ever done text files, never tried XML, but I'm guessing they're similar.
http://www.cplusplus.com/doc/tutorial/files/ should give you everything you need to know.
Your choice of XML vs plain text depends on the kind of data that you'll be storing.
The reason you'll only find XML libraries on the internet is that XML is a lot more complicated than plain text. If you don't know what XML is, or if the data that you're storing isn't very complex, then I would suggest going with plain text.
For example, to track expenses, you might store a file like this:
sandwich 5.00
coffee 2.30
soft drink 1.50
...
It's very easy to read/write lines like this to/from a file in C++.
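As a rough illustration of the load-on-start, overwrite-on-exit cycle with this line format, here is a small sketch. It is written in Python for brevity rather than the C++ the question asks about, and the function names save_expenses and load_expenses are invented for the example. Note that an item name like "soft drink" contains a space, so each line is split at the last space only.

```python
def save_expenses(path, expenses):
    """Overwrite the file with one 'name amount' line per expense."""
    with open(path, "w", encoding="utf-8") as f:
        for name, amount in expenses:
            f.write(f"{name} {amount:.2f}\n")


def load_expenses(path):
    """Read the expenses back; return an empty list if no file exists yet."""
    expenses = []
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                # Split on the LAST space so names may contain spaces.
                name, amount = line.rsplit(" ", 1)
                expenses.append((name, float(amount)))
    except FileNotFoundError:
        pass  # first run: nothing has been saved yet
    return expenses
```

The same structure carries over to C++ with std::ifstream, std::ofstream and std::getline.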

C++ Importing and Renaming/Resaving an Image

Greetings all,
I am currently a rising Sophomore (CS major), and this summer, I'm trying to teach myself C++ (my school codes mainly in Java).
I have read many guides on C++ and gotten to the part with ofstream, saving and editing .txt files.
Now, I am interested in simply importing an image (jpeg, bitmap, not really important) and renaming the aforementioned image.
I have googled, asked around but to no avail.
Is this process possible without downloading external libraries (I downloaded CImg)?
Any hints or tips on how to expedite my goal would be much appreciated.
Renaming an image is typically about the same as renaming any other file.
If you want to do more than that, you can also change the data in the Title field of the IPTC metadata. This does not require JPEG decoding, or anything like that -- you need to know the file format well enough to be able to find the IPTC metadata, and study the IPTC format well enough to find the Title field, but that's about all. Exactly how you'll get to the IPTC metadata will vary -- navigating a TIFF (for one example) takes a fair amount of code all by itself.
When you say "renaming the aforementioned image," do you mean changing metadata in the image file, or just changing the file name? If you are referring to metadata, then you need to either understand the file format or use a library that understands the file format. It's going to be different for each type of image file. If you basically just want to copy a file, you can either stream the contents from one file stream to another, or use a file system API.
#include <fstream>
std::ifstream infs("input.txt", std::ios::binary);
std::ofstream outfs("output.txt", std::ios::binary);
outfs << infs.rdbuf();  // stream the whole input file into the output file
An example of a file system API is CopyFile on Win32.
It's possible without libraries; you just need the image format specs and plain C. The question is why?
Targa or BMP are probably the easiest: it's just a header followed by the image data as a binary block of values.
GIF, JPEG and PNG are more complex because the data is compressed.
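To show how little is involved in the "just a header" case, here is a short Python sketch that reads the main fields of a BMP header with the struct module. The function name read_bmp_header is made up, and the sketch assumes the common 40-byte BITMAPINFOHEADER (or a newer, larger variant); the older 12-byte core header uses a different layout and is rejected.

```python
import struct


def read_bmp_header(data: bytes) -> dict:
    """Parse the BMP file header plus the start of the DIB header."""
    if len(data) < 30 or data[:2] != b"BM":
        raise ValueError("not a BMP file")
    # File header: size (u32), two reserved u16 fields, pixel data offset (u32).
    file_size, _res1, _res2, pixel_offset = struct.unpack_from("<IHHI", data, 2)
    # DIB header: its own size, width, height, colour planes, bits per pixel.
    dib_size, width, height, _planes, bpp = struct.unpack_from("<IiiHH", data, 14)
    if dib_size < 40:
        raise ValueError("older core header layout is not handled here")
    return {
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "width": width,
        "height": height,  # negative means rows are stored top-down
        "bits_per_pixel": bpp,
    }
```

The pixel data then starts at pixel_offset, with each row padded to a multiple of four bytes.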

Game Programming: .DAT file?

I've seen a lot of games use something similar to a .DAT file or a specific file type that the game has for itself. I'm just beginning with C++ and DirectX and I was interested in keeping my information in something similar to a .DAT.
My initial conception was that it would hold information on the files you wanted to store within the .DAT file, something similar to a .RAR file. Unfortunately, my googling skills did not help me find the answers.
Right now I'm simply loading textures and sound files from a folder called Data.
EDIT: While I understand that .DAT is short for data, and I've found that a .DAT file generally contains any assortment of information, I'm still unsure how to go about something like packing images and sound files into any type of file and being able to read them back.
I'm not sure about using fstreams to achieve my task, however I will look into streams related to storing data and how to properly read from that data. Meanwhile if anyone has another answer to offer based on this new information, it would be appreciated.
EDIT: Thanks to the answers, I stumbled across a similar question on stackoverflow and felt I'd share it here. Combining resources into a single binary file
I don't think there is really such thing as .dat file format. It's short for "data," and different applications just put in some proprietary stuff in it and call it ".dat." You can read up on fstream classes to do file IO in C++. See Input/Output with files.
What you then do is make up your own file format. For example, the first 4 bytes are an int that indicates the number of blocks in the .dat, and for each block you have 4 bytes indicating the length of the block, 4 bytes indicating the type of the block, and then the variable-length data itself, or something like that.
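The layout just described can be sketched as follows. This is Python rather than C++, purely to keep the example short; pack_blocks and unpack_blocks are hypothetical names, the integers are assumed to be unsigned 32-bit little-endian, and the numeric type IDs are whatever the game chooses to assign.

```python
import struct


def pack_blocks(blocks):
    """Serialize a list of (type_id, payload_bytes) into one bytes object.

    Layout: u32 block count, then per block: u32 length, u32 type, payload.
    """
    out = [struct.pack("<I", len(blocks))]
    for type_id, payload in blocks:
        out.append(struct.pack("<II", len(payload), type_id))
        out.append(payload)
    return b"".join(out)


def unpack_blocks(data):
    """Inverse of pack_blocks: return the list of (type_id, payload_bytes)."""
    (count,) = struct.unpack_from("<I", data, 0)
    pos = 4
    blocks = []
    for _ in range(count):
        length, type_id = struct.unpack_from("<II", data, pos)
        pos += 8
        blocks.append((type_id, data[pos:pos + length]))
        pos += length
    return blocks
```

In C++ the same thing would be done with binary-mode fstream reads and writes of fixed-width integers followed by the raw payload bytes.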
DAT obviously stands for data, and there is no real or de facto standard on what that extension actually refers to. Your decisions on the best file formats should be based on technical considerations, not pointless attempts at security through obscurity.
Professional games use a technique where they put all the needed resources (models, textures, sounds, AI, config, etc.) zipped/packed into a single file, making it faster to manage and harder to change (some even build a virtual file system from what's inside the data file). What's inside the file differs depending on the needs of the game and the data structures that you use.
If you're just getting started in gamedev, I recommend you stick with keeping all your assets separate and don't bother too much about packing them into a single file.
Now if you really want to start using a packed format here's a good pointer:
Creating a PAK File Format
Here's a link which claims that .dat is a movie format, 'DAT' being short for Digital Audio Tape.
I'm not sure I believe the link, but I do remember something about a Microsoft supported format called DAT, from long ago, when I used an earlier version of Windows.
It makes more sense as a logical extension for a DATA file of some kind.
.dat, as others have said, is literally just a data file. In reality, the file extension means nothing other than association with a program. For example, I could make a word processor that saves all the documents with the .mp3 file extension. These files wouldn't be playable in any media software, but the software might try. File extensions are used to help programs know what types of files they can and cannot open--however those rules don't have to be followed.
Anyway, you can dump any sort of information to a file. Programmers/software writers will often choose .dat as the extension of that file because it has become the standard to signify 'this file just holds a ton of data' and that the data doesn't necessarily hold any standardized headers, footers, or formatting.
A dat file could really contain anything. It might be as simple as a zip archive with the extension changed, or it could be a completely custom file type. If you're just starting out, you probably don't want to write your own file format, although doing so can be fun and educational. If you want to encapsulate your data files into some kind of container, you should probably go with a zip, paq, or maybe tar.gz.
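As a sketch of the "zip archive with the extension changed" option, the snippet below uses Python's standard zipfile module to write a few assets into a single archive and read one back. The names pack_assets and load_asset and the asset paths in the usage note are invented for illustration; the archive can be given any extension, including .dat, because zipfile does not look at the file name.

```python
import zipfile


def pack_assets(archive_path, files):
    """Write a dict of {name_inside_archive: bytes} into one zip archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)


def load_asset(archive_path, name):
    """Read a single asset back out of the archive as bytes."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(name)
```

For example, pack_assets("game.dat", {"textures/grass.png": png_bytes}) followed by load_asset("game.dat", "textures/grass.png") returns the original bytes. A C++ game would use a zip library for the same purpose.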