Seeing as how MP3 frames are (mostly) independent units, I thought it would work well to simply concatenate several tagless MP3 files together in order to merge them losslessly.
However, many player programs (including mplayer and mpd) seem to detect the file length in some way that I cannot find documentation for, and only see and play the part corresponding to the first of the files I concatenated.
Whatever this information is, I'm sure it can't be too hard to write a program to remove and then rebuild it, but I have no clue what it is. Does anyone know?
Just to make sure: Yes, I removed the ID3 tags of all the files.
Ah. I found the mp3val program and ran it on the concatenated file. It told me that the file contains a "Xing header" which, while masquerading as an empty data chunk, apparently contains redundant information about the number of frames and length of the file.
It seems to be made for VBR files, but these CBR files contain it nonetheless. Either way, Google does point to a lot of information about the Xing header, which appears to be what I need.
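For anyone who wants to inspect that header by hand, here is a rough sketch. It assumes the commonly documented layout: a 4-byte "Xing" tag (some encoders reportedly write "Info" for CBR files), 4 bytes of big-endian flags, then an optional 4-byte frame count (flag 0x1) and an optional 4-byte byte count (flag 0x2). The names XingInfo and find_xing are made up for illustration, and the scan is a heuristic over the first few kilobytes rather than an exact offset computed from the frame header.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Result of the scan. Hypothetical type, not part of any library.
struct XingInfo {
    bool found;            // true if a "Xing" or "Info" tag was seen
    std::size_t offset;    // byte offset of the tag
    bool has_frames;       // flag 0x1: frame count present
    std::uint32_t frames;
    bool has_bytes;        // flag 0x2: byte count present
    std::uint32_t bytes;
};

// Read a 32-bit big-endian integer.
static std::uint32_t read_be32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Search the first `window` bytes of an (ID3-less) MP3 for the tag.
XingInfo find_xing(const std::vector<std::uint8_t>& data,
                   std::size_t window = 4096) {
    XingInfo info = {false, 0, false, 0, false, 0};
    const std::size_t limit = data.size() < window ? data.size() : window;
    for (std::size_t i = 0; i + 8 <= limit; ++i) {
        const bool is_tag = std::memcmp(&data[i], "Xing", 4) == 0 ||
                            std::memcmp(&data[i], "Info", 4) == 0;
        if (!is_tag) continue;
        info.found = true;
        info.offset = i;
        const std::uint32_t flags = read_be32(&data[i + 4]);
        std::size_t pos = i + 8;
        if (flags & 0x1u) {
            if (pos + 4 > data.size()) return info;  // truncated input
            info.has_frames = true;
            info.frames = read_be32(&data[pos]);
            pos += 4;
        }
        if (flags & 0x2u) {
            if (pos + 4 > data.size()) return info;  // truncated input
            info.has_bytes = true;
            info.bytes = read_be32(&data[pos]);
        }
        return info;
    }
    return info;
}
```

If the frame count it reports matches only the first of the concatenated files, that would explain why players stop there. Rewriting those counts, or dropping the frame that carries the header, is what I'd try next.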
There's such a thing in Inno Setup as SolidCompression, and there's a flag used in the [Files] section called solidbreak. Could anybody explain how this flag works, when we really need to use it, and when the decompression process takes place?
Solid compression means that the files are compressed as though all the files were just one big file. This usually results in better compression because compression knowledge built during one file will be carried over to the next, instead of a restart. The downside is that in order to decompress a specific file during installation, one has to decompress all the files before it.
The solidbreak flag, when applied, tells the compression engine to split up the solid compression and start a new stream when it comes to the source the flag is applied to, so that if that file specifically needs to be decompressed, the decompression code can simply seek to the position in the file where it starts. Basically, the downside from above disappears, but then some of the bonus of that compression knowledge gets lost as well.
If you want to use solid compression and all of your files always get installed, don't use solidbreak. But if you have a list of checkboxes to select modules, you might want to consider applying solidbreak to some or all of the optional modules. If you don't, all the files will be decompressed even though only some are needed for the selected options. The exact result will vary with file size and so on, so I can't say more than that you might have to experiment to see the results.
I am writing a program that produces a formatted file for the user, but it's not only producing the formatted file, it does more.
I want to distribute a single binary to the end user and when the user runs the program, it will generate the xml file for the user with appropriate data.
To achieve this, I want to put the file contents into a char array that is compiled into the code. When the user runs the program, I will write out the char array to generate an XML file for the user.
char* buffers = "a xml format file contents, \
this represent many block text \
from a file,...";
I have two questions.
Q1. Do you have any other ideas for how to compile my file contents into the binary, i.e., distribute everything as one binary file?
Q2. Is this even a good idea as I described above?
What you describe is by far the norm for C/C++. For large amounts of text data, or for arbitrary binary data (or indeed any data you can store in a file, e.g. a zip file), you can write the data to a file and link it into your program directly.
An example may be found on sites like this one
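As a minimal sketch of the char-array approach from the question, assuming a C++11 compiler: a raw string literal avoids the backslash continuations entirely. The XML content, the array name kEmbeddedXml and the function write_embedded_xml are placeholders for illustration only.

```cpp
#include <fstream>
#include <string>

// Placeholder XML. The real file contents would be pasted between the
// delimiters; a raw string literal keeps line breaks and quotes as written.
static const char kEmbeddedXml[] = R"xml(<?xml version="1.0"?>
<config>
  <item name="example">value</item>
</config>
)xml";

// Write the embedded text to `path`. Returns true on success.
bool write_embedded_xml(const std::string& path) {
    std::ofstream out(path.c_str(), std::ios::binary);
    if (!out) return false;
    // sizeof includes the terminating NUL, which should not go into the file.
    out.write(kEmbeddedXml, sizeof(kEmbeddedXml) - 1);
    return static_cast<bool>(out);
}
```

Calling write_embedded_xml("output.xml") at startup would then produce the file next to wherever the program is run from.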
I'd recommend keeping the data in a separate file rather than putting it into the binary, unless you have your own reasons. I don't know of other portable ways to put strings into a binary file, but your solution seems OK.
However, note that when you use \ at the end of a line to form a multi-line string, you have to take care of the indentation, because the lines are joined from the very beginning of the next line, so any leading whitespace there becomes part of the string:
char* buffers = "a xml format file contents, \
this represent many block text \
from a file,...";
Or you can use another form:
/* Adjacent string literals are concatenated with nothing inserted between them. */
const char *buffers =
"a xml format file contents, "
"this represent many block text "
"from a file,...";
My answer probably provides a lot of redundant information for the topic starter, but here is what I'm aware of:
Embedding in source code: the plain C/C++ solution. It is a bad idea because each time you want to change your content, you will need to:
recompile
relink
It is acceptable only if your content changes very rarely or never, or if build time is not an issue (if your app is small).
Embedding in binary: a few slightly more flexible solutions for embedding content in executables exist, but none of them is cross-platform (you haven't stated your target platform):
Windows: resource files. With most IDEs it is very simple
Linux: objcopy.
MacOS: Application Bundles. Even simpler than on Windows.
You will not need to recompile your C++ file(s), only re-link.
Application virtualization: there are special utilities that wrap your application and all its resources into a single executable, which then runs it in something similar to a virtual machine.
I'm only aware of such utilities for Windows (ThinApp, BoxedApp), but there are probably such things for other OSes too, or even cross-platform ones.
Consider distributing your application in some form of installer: when the installer starts, it creates all the resources and unpacks the executable. It is similar to having the main executable generate everything. This can be a large and complex package or just a simple self-extracting archive.
Of course, the choice depends on what kind of application you are creating, who your target audience is, how you will ship the package to end users, etc. If it is a game and you are targeting children, it's not the same as a Unix console utility for C++ coders =)
It depends. If you are writing some small Unix-style utility with no prospect of internationalization, then it's probably fine. You don't want to bloat the distribution with a file no one would ever touch anyway.
But in general it is bad practice, because eventually someone might want to modify this data, and he or she would have to rebuild the whole thing just to fix a typo.
The decision is really up to you.
If you just want to keep your distribution in one piece, you might also find this thread interesting: Store data in executable
Why don't you distribute your application with an additional configuration file? e.g. package your application executable and config file together.
If you do want to make it a single file, try embedding your config file into the executable as a resource.
I see it as more of an OS issue than a C/C++ one. You can add the text to the resource section of your binary. In Windows programs, HTML, graphics and even movie files are often compiled into resources that become part of the final binary.
That is handy for possible future translation into another language, plus you can modify resource part of the binary without recompiling the code.
I'm trying to do some file carving on a disk with C++. I can't find any resources on the web related to the on-disk structure of a PDF file. The thing is that I can find the %PDF-1.x token at the start of a cluster, but I can't find the size of the PDF file anywhere.
Let's say hypothetically that the file system entry for this particular document is lost. I find the start of the document and I keep reading until I run into the "startxref number %%EOF". The thing is that I don't know when to stop since there are multiple "%%EOF" markers in the content of a document.
I've tried stopping after reading, let's say, 10 clusters without finding any PDF-specific keyword like "obj", "stream", "trailer" or "xref". But that's quite arbitrary, and it's not a deterministic method of finding the end of the document so that I can determine its size.
I've also seen some "Length number" markers at the start of some "obj"s but the number doesn't really fit most of the time.
Any ideas on what I can try next? Is there a way to determine the exact size of the entire document? I'm interested in recovering documents programmatically.
Since PDFs are "free format" (pretty much like text files, but less obvious to humans when it comes to "reading" the content), it's probably hard to piece them together if the pieces aren't in order.
A stream does have a length, which is a key to where the endstream goes (there is a line break before and after the stream data itself). Streams are used to introduce bitmaps and similar things (fonts, line-art data in compressed form, etc.) into the document. But if you have several 4KB segments that could each fit as the same block in the middle of a stream, then there's no way to tell which one belongs there, other than pasting them together and seeing which results look sane and which don't. Similarly, if there are several segments of streams and objects, you can't really tell which goes where.
Of course, this applies to almost all types of files with "variable content": you can find the first few kilobytes of a JPG, but knowing what the rest of the file is won't be easy. Only by visually inspecting the content can you determine which blocks of bytes belong where, and if you get it wrong, you'll probably just get some random garbage.
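For the simpler case where the file is not fragmented, one way to make "try it and see which looks sane" a bit more systematic is to collect every possible end position and test each one with a PDF parser. The sketch below assumes the disk image (or a chunk of it) is already in memory as a std::string, and it treats the next %PDF- header as the start of a different file, which is only a heuristic. The function name candidate_pdf_ends is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the offset just past every "%%EOF" marker that follows the header
// at `start`, stopping at the next "%PDF-" header if there is one.
// A PDF saved with incremental updates has one "%%EOF" per update, so each
// returned offset is only a candidate for the real end of the file.
std::vector<std::size_t> candidate_pdf_ends(const std::string& image,
                                            std::size_t start) {
    const std::string eof = "%%EOF";
    const std::string header = "%PDF-";
    // Assumption: another header means another file begins there.
    const std::size_t stop = image.find(header, start + header.size());
    std::vector<std::size_t> ends;
    std::size_t pos = image.find(eof, start);
    while (pos != std::string::npos &&
           (stop == std::string::npos || pos < stop)) {
        ends.push_back(pos + eof.size());
        pos = image.find(eof, pos + eof.size());
    }
    return ends;
}
```

Trying the candidates from last to first and keeping the first one that a PDF library accepts would probably be the practical way to use this.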
The open source tool bulk_extractor has a module called scan_pdf that does pretty much what you are describing here. It can recognize the individual parts of a PDF file on a drive, automatically decompresses the compressed regions, and extracts text using two strategies. It will recover data from fragments of PDFs even if the xref table cannot be found.
I'm making a simple game with SFML 1.6 in C++. Of course, I have a lot of picture, level, and data files. Problem is, I don't want these files visible. Right now they're just plain picture files in a res/ subdirectory, and I want to either conceal them or encrypt them. Is it possible to put the raw data from the files into a resource file or something? Any solution is okay to me, I just don't want the files exposed to the user.
EDIT
Cross-platform solutions are best, but if they don't exist, that's okay, I'm working on Windows. But I don't really want to use a library if it's not needed.
Most environments come with a resource compiler that converts images/icons/etc into string data and includes them in the source.
Another common technique is to copy them onto the end of the final .exe as the last part of the build process. Then at run time, open the .exe as a file and read the data from some predetermined offset; see Embedding a filesystem in an executable?
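A rough sketch of that append-and-read-back idea follows. The footer layout (an 8-byte little-endian payload length stored as the very last bytes of the file) is my own assumption, not a standard, and append_payload and read_payload are hypothetical names. Appending data may also interfere with things like code signing, so treat this as an illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Build step: append `payload` to the file at `path`, followed by an
// 8-byte little-endian length so the payload can be located later.
bool append_payload(const std::string& path, const std::string& payload) {
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::app);
    if (!out) return false;
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    const std::uint64_t n = payload.size();
    char footer[8];
    for (int i = 0; i < 8; ++i)
        footer[i] = static_cast<char>((n >> (8 * i)) & 0xFF);
    out.write(footer, 8);
    return static_cast<bool>(out);
}

// Run time: read the footer from the end of the file, then seek back
// by the stored length and read the payload.
bool read_payload(const std::string& path, std::string& payload) {
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return false;
    const std::streamoff size = in.tellg();
    if (size < 8) return false;
    in.seekg(size - 8);
    unsigned char footer[8];
    in.read(reinterpret_cast<char*>(footer), 8);
    if (!in) return false;
    std::uint64_t n = 0;
    for (int i = 0; i < 8; ++i)
        n |= static_cast<std::uint64_t>(footer[i]) << (8 * i);
    if (n > static_cast<std::uint64_t>(size - 8)) return false;  // corrupt footer
    payload.resize(static_cast<std::size_t>(n));
    in.seekg(size - 8 - static_cast<std::streamoff>(n));
    if (n > 0) in.read(&payload[0], static_cast<std::streamsize>(n));
    return static_cast<bool>(in);
}
```

At run time the program would pass its own executable path to read_payload; how to obtain that path is platform-specific and not shown here.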
The ideal way for this is to make your own archive format, which would contain all of your files' data along with some extra info needed to split files distinctly within it.
I've seen a lot of games use something similar to a .DAT file or a specific file type that the game has for itself. I'm just beginning with C++ and DirectX and I was interested in keeping my information in something similar to a .DAT.
My initial conception was that it would hold information on the files you wanted to store within the .DAT file, something similar to a .RAR file. Unfortunately, my googling skills did not help me find the answers.
Right now I'm simply loading textures and sound files from a folder called Data.
EDIT: While I understand that .DAT is short for data, and I've found that a .DAT file generally contains any assortment of information, I'm still unsure about how to go about doing something like packing images and sound files into any type of file and being able to read them.
I'm not sure about using fstreams to achieve my task, however I will look into streams related to storing data and how to properly read from that data. Meanwhile if anyone has another answer to offer based on this new information, it would be appreciated.
EDIT: Thanks to the answers, I stumbled across a similar question on stackoverflow and felt I'd share it here. Combining resources into a single binary file
I don't think there is really such a thing as a .dat file format. It's short for "data," and different applications just put some proprietary stuff in it and call it ".dat." You can read up on the fstream classes to do file IO in C++. See Input/Output with files.
What you then do is make up your own file format. For example, the first 4 bytes are an int that indicates the number of blocks in the .dat, and for each block you have 4 bytes indicating the length of the block, 4 bytes indicating the type of the block, then the variable-length data itself... something like that.
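Here is a minimal sketch of that layout, done in memory so it stays short. The little-endian byte order, the Block struct and the pack/unpack names are my own choices for illustration; the resulting string could be written to and read from disk with an fstream opened in binary mode.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One entry in the container: a type id plus raw bytes.
struct Block {
    std::uint32_t type;
    std::string data;
};

// Append a 32-bit value in little-endian order.
static void put_u32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read a 32-bit little-endian value at `pos`, advancing `pos`.
static bool get_u32(const std::string& in, std::size_t& pos, std::uint32_t& v) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(
                 static_cast<unsigned char>(in[pos + i])) << (8 * i);
    pos += 4;
    return true;
}

// Layout: [block count][length][type][data][length][type][data]...
std::string pack(const std::vector<Block>& blocks) {
    std::string out;
    put_u32(out, static_cast<std::uint32_t>(blocks.size()));
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        put_u32(out, static_cast<std::uint32_t>(blocks[i].data.size()));
        put_u32(out, blocks[i].type);
        out += blocks[i].data;
    }
    return out;
}

// Parse the layout above. Returns false on truncated or malformed input.
bool unpack(const std::string& in, std::vector<Block>& blocks) {
    std::size_t pos = 0;
    std::uint32_t count = 0;
    if (!get_u32(in, pos, count)) return false;
    blocks.clear();
    for (std::uint32_t i = 0; i < count; ++i) {
        std::uint32_t len = 0, type = 0;
        if (!get_u32(in, pos, len) || !get_u32(in, pos, type)) return false;
        if (in.size() - pos < len) return false;
        Block b;
        b.type = type;
        b.data = in.substr(pos, len);
        blocks.push_back(b);
        pos += len;
    }
    return true;
}
```

A real format would probably also want a magic number and a version field at the front, and perhaps a name per block, but the reading and writing pattern stays the same.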
DAT obviously stands for data, and there is no real or de facto standard on what that extension actually refers to. Your decisions on the best file formats should be based on technical considerations, not pointless attempts at security through obscurity.
Professional games use a technique where they put all the needed resources (models, textures, sounds, AI, config, etc.) zipped/packed into a single file, making them faster to manage and harder to change (some even build a virtual file system from what's inside the data file). What's inside the file differs depending on the needs of the game and the data structures that you use.
If you're just starting out in gamedev, I recommend you stick with keeping all your assets separate and don't bother too much about packing them into a single file.
Now if you really want to start using a packed format here's a good pointer:
Creating a PAK File Format
Here's a link which claims that .dat is a movie format, 'DAT' being short for Digital Audio Tape.
I'm not sure I believe the link, but I do remember something about a Microsoft supported format called DAT, from long ago, when I used an earlier version of Windows.
It makes more sense as a logical extension for a DATA file of some kind.
.dat, as others have said, is literally just a data file. In reality, the file extension means nothing other than association with a program. For example, I could make a word processor that saves all the documents with the .mp3 file extension. These files wouldn't be playable in any media software, but the software might try. File extensions are used to help programs know what types of files they can and cannot open--however those rules don't have to be followed.
Anyway, you can dump any sort of information to a file. Programmers/software writers will often choose .dat as the extension of that file because it has become the standard to signify 'this file just holds a ton of data' and that the data doesn't necessarily hold any standardized headers, footers, or formatting.
A dat file could really contain anything. It might be as simple as a zip archive with the extension changed, or it could be a completely custom file type. If you're just starting out, you probably don't want to write your own file format, although doing so can be fun and educational. If you want to encapsulate your data files into some kind of container, you should probably go with a zip, paq, or maybe tar.gz.