Determine if file is binary or text

Is there a way to determine if a file is a binary or text file using the File Management functions or MFC?
In the File Management functions, GetFileType doesn't seem to distinguish between binary and text files. Same with the dwFileAttributes attribute here.
In MFC, I tried looking at CFile::GetStatus(), but the m_attribute doesn't say anything about whether files are binary or text.
Does anyone know a way to do this using one of these two libraries? Thank you.
(I'd like to know because I am trying to write a function that recursively goes through a directory. I rewrite the text files (using CStdioFile) and replace some words here and there, but this seems to corrupt any images in the directory. I'd like to be able to copy the images too, but I need a way to distinguish between binary and text files so I can treat them differently.)

As far as I know, there's no simple API to do this, MFC or otherwise. However, there's a bunch of useful ideas in these similar questions:
How do I distinguish between 'binary' and 'text' files?
How to identify the file content as ASCII or binary
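One common idea is to sample the start of the file and look for NUL bytes, which rarely occur in ordinary 8-bit text. Here is a minimal sketch of that heuristic in standard C++ (not an MFC or Win32 API). The function names and the 8192-byte sample size are assumptions made for illustration, and UTF-16 text would be misclassified as binary by this check.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Heuristic only: a buffer is treated as binary if it contains a NUL byte.
bool bufferLooksBinary(const std::vector<char>& sample) {
    return std::find(sample.begin(), sample.end(), '\0') != sample.end();
}

// Reads up to sampleSize bytes from the start of the file and applies the
// heuristic. Files that cannot be opened are reported as "not binary" here;
// that policy is an arbitrary choice for this sketch.
bool fileLooksBinary(const std::string& path, std::size_t sampleSize = 8192) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::vector<char> sample(sampleSize);
    in.read(sample.data(), static_cast<std::streamsize>(sample.size()));
    sample.resize(static_cast<std::size_t>(in.gcount()));
    return bufferLooksBinary(sample);
}
```

In the directory walk, files for which fileLooksBinary returns true could be copied byte for byte, and only the rest passed through the word-replacement step.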

Related

How to convert .trc file type to text file using C++?

I have a trace file that is binary in nature. I want to convert it to a text file and convert the data inside it to decimal form, but I am not sure how to do this. This .trc file contains data in the form of telegrams, and I want to extract a particular kind of telegram and save it in a human-readable text file. I have to do all of this using C++.
Would you suggest another language for this, or does anyone have an idea how to do it in C++?

Binary trace files are usually encoded in proprietary formats. And there are applications or profilers specifically built to parse them.
Unless you know the file format, the only way to decode it is through reverse engineering. And in most cases it's not worth the effort.
Try to find documentation about it. Or maybe an application or utility that loads the file and exports data that is easier to read.

In case you are talking about .trc binary files from Teledyne LeCroy oscilloscopes, I would suggest any of the following libraries:
https://pypi.org/project/lecroyparser/
https://github.com/jneer/lecroy-reader
https://github.com/yetifrisstlama/readTrc
https://igit.ific.uv.es/ferhue/lecroyparser

How do I input and output various file types in c++

I've seen a lot of examples of I/O with text files. I'm just wondering if you can do the same with other file types like MP3s, JPGs, zip files, etc.?
Will iostream and fstream work for all of these or do I need another library? Do I need a new SDK?
It's all binary data so I'd think it would be that simple. But I've been unpleasantly surprised before.
Could I convert all files to text or binary?

It depends on what you mean by "work".
You can think of those files as a book written in Greek.
If you want to just mess with binary representation (display text in Greek on screen) then yes, you can do that.
If you want to actually extract some info: edit video stream, remove voice from audio (actually understand what is written), then you would need to either parse file format yourself (learn Greek) or use some specialized library (hire a translator).
Either way, file streams are well suited to accessing the data in those files (and many libraries use them under the hood).

You can work on binary streams by opening them with the binary openmode:
ifstream ifs("mydata.mp3", ios_base::binary);
Then you can read and write any binary content. However, if you need to generate or modify such content, play a video, or display a picture, then you need to know the inner details of the format you are using. This can be extremely complex, so a library is recommended. Even with a library, advanced programming skills are required.
Examples of open source libraries: ffmpeg for the usual audio/video formats, portaudio for audio, CImg for image processing (in C++), libpng for the PNG graphics format, libjpeg for JPEG. Note that most libraries offer a C API.
Some operating systems also support some file types natively (for example, Windows bitmaps).
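To make the binary openmode concrete, here is a small sketch that reads a whole file into memory and writes a byte buffer back out unchanged. The helper names readFileBytes and writeFileBytes are made up for this example. It only moves raw bytes around and knows nothing about the file format.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads the whole file as raw bytes. Returns an empty vector if the file
// cannot be opened (or really is empty).
std::vector<char> readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Writes the bytes unchanged, replacing any existing file. Returns true on success.
bool writeFileBytes(const std::string& path, const std::vector<char>& bytes) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    out.flush();
    return static_cast<bool>(out);
}
```

Calling writeFileBytes("copy.mp3", readFileBytes("mydata.mp3")) would produce a byte-for-byte copy. Without the binary flag, some platforms translate line endings and the copy may be corrupted.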

You can open these files using fstream, but the important thing to note is you must be intricately aware of what is contained within the file in order to process it.
If you just want to open it and spit out junk, then you can definitely just start at the first line of the file and exhaustively push all data into your console.
If you know what the file looks like on the inside, then you can process it just as you would any other file.
There may be specific libraries for processing specific files, but the fstream library will allow you to access any file you'd like.

All files are just bytes. There's nothing stopping you from reading/writing those bytes however you see fit.
The trick is doing something useful with those bytes. You could read the bytes from a .jpg file, for example, but you have to know what those bytes mean, and that's complicated. Usually it's best to use libraries written by people who know about the format in question, and let them deal with that complexity.

How to distinguish between movie and image

Is there any "good" way to distinguish between a movie file and an image file?
I would like to know what exactly my "std::wstring filePath" is - a movie, or an image.
Therefore, I could go further with strong assurance I am working with known file type.
In other words, I have two classes MyImage and MyMovie both need path to file in their constructors. I would like to verify path to file somehow before creating one of those classes.
bool isMovie(const std::wstring & filePath);
bool isImage(const std::wstring & filePath);
Of course I thought about file extensions, but I'm not sure that is a good solution, or one that is not prone to errors. So is it good to use the file extension, or is there another feasible solution?
Thanks in advance

You can use libmagic to detect what kind of file it is. You pass the file path in and it'll give you a textual description or MIME type for the file.

Files usually have special so-called magic bytes. If you have control over the specification, I would use this. If you try opening zip, gif, or other binary files, you can usually find some distinctive strings there.
There is a Unix utility called file that provides such functionality, so some sort of standard probably exists.
SQLite 3 provides a nice example. Look at 1.2.1 and 1.2.5. So not only the info that it is a SQLite 3 DB is given, but also an additional application id, so other tools can recognize which application's DB it is.
I personally used the first few bytes of a file to encode type and version info for my files when I was playing with binary stuff.
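As a rough illustration of the magic-bytes idea, here is a sketch that classifies the first bytes of a file by a handful of well-known signatures (PNG, JPEG, GIF, AVI, Matroska/WebM, and the "ftyp" box used by MP4/MOV). MediaKind, hasBytesAt and classifyHeader are names invented for this example, and the signature list is far from complete. Something like libmagic covers many more formats.

```cpp
#include <cstddef>
#include <string>

enum class MediaKind { Image, Movie, Unknown };

// True if `data` contains the byte sequence `sig` starting at `offset`.
static bool hasBytesAt(const std::string& data, std::size_t offset,
                       const std::string& sig) {
    return data.size() >= offset + sig.size() &&
           data.compare(offset, sig.size(), sig) == 0;
}

// Classifies a file from its first bytes (16 bytes are enough for these checks).
MediaKind classifyHeader(const std::string& header) {
    // Image signatures
    if (hasBytesAt(header, 0, std::string("\x89PNG\r\n\x1A\n", 8))) return MediaKind::Image;
    if (hasBytesAt(header, 0, std::string("\xFF\xD8\xFF", 3))) return MediaKind::Image;  // JPEG
    if (hasBytesAt(header, 0, "GIF8")) return MediaKind::Image;

    // Movie container signatures
    if (hasBytesAt(header, 0, "RIFF") && hasBytesAt(header, 8, "AVI ")) return MediaKind::Movie;
    if (hasBytesAt(header, 0, std::string("\x1A\x45\xDF\xA3", 4))) return MediaKind::Movie;  // Matroska/WebM
    // Coarse check: MP4/MOV carry "ftyp" at offset 4, but HEIF/AVIF still
    // images use the same container, so a real check would inspect the brand.
    if (hasBytesAt(header, 4, "ftyp")) return MediaKind::Movie;

    return MediaKind::Unknown;
}
```

isImage and isMovie from the question could then read the first 16 bytes of filePath in binary mode and compare the result of classifyHeader against MediaKind::Image or MediaKind::Movie.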

Opening an existing .doc file using ofstream in C++

Assuming I have a file with a .doc extension on the Windows platform, how can I open the file and output its contents on the screen using the ofstream object in C++? I am aware that the object can be used to open files in text and binary modes. But I would like to know if a .doc (or even .pdf) file can be opened and its contents read.

I've never actually done this before, but after reading up on it, I think I might have a suggestion. The .docx format is actually just XML that is zipped up. After unzipping, the file is located at word/document.xml. Doing this in a program is where it gets fun.
Two options: If you're using C++ CLR (.NET) then Microsoft has an SDK for you. It should make it pretty easy to open Office documents.
Otherwise if you're just using regular C++, you might have to do some extra work.
Open the file and unzip it using a ZIP library (for example minizip, which builds on zlib)
Find the document.xml file inside
Parse the XML document. You'll probably want to use some kind of XML parsing library for this. You'll have to look up the specs for the XML to figure out how to get the text you want.
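The last step might look roughly like this, assuming the contents of word/document.xml have already been unzipped into a string. In WordprocessingML the visible text sits inside `<w:t>` elements, so this sketch simply concatenates their contents. extractRunText is a made-up name, and this naive scan is a stand-in for a real XML parser: it does not decode entities such as `&amp;` and does not insert paragraph breaks.

```cpp
#include <cstddef>
#include <string>

// Concatenates the contents of all <w:t ...>...</w:t> elements in `xml`.
std::string extractRunText(const std::string& xml) {
    std::string text;
    std::size_t pos = 0;
    while ((pos = xml.find("<w:t", pos)) != std::string::npos) {
        const std::size_t afterName = pos + 4;
        if (afterName >= xml.size()) break;
        const char next = xml[afterName];
        // Skip other elements whose names merely start with "w:t"
        // (for example <w:tab/>, <w:tbl>, <w:tc>).
        if (next != '>' && next != ' ') { pos = afterName; continue; }
        const std::size_t open = xml.find('>', afterName);
        if (open == std::string::npos) break;
        // Skip an empty self-closing <w:t/> element.
        if (xml[open - 1] == '/') { pos = open + 1; continue; }
        const std::size_t close = xml.find("</w:t>", open + 1);
        if (close == std::string::npos) break;
        text.append(xml, open + 1, close - open - 1);
        pos = close + 6;  // continue after "</w:t>"
    }
    return text;
}
```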

The C++ standard library has the ifstream class, which can be used to read simple text files and binary files too.
It is up to you to interpret the bytes in the file. To interpret a binary file properly, you need to know the format of the file.
If you are thinking of MS Word files, then I would start here: http://en.wikipedia.org/wiki/Office_Open_XML to understand the MS Word 2007 format.
You might find the Boost Iostreams library ( http://www.boost.org/doc/libs/1_52_0/libs/iostreams/doc/home.html ) somewhat useful if you want to write a filter yourself.

Include static data/text file

I have a text file (>50k lines) of ascii numbers, with string identifiers, that can be thought of as a collection of data vectors. Based on user input, the application only needs one of these data vectors at runtime.
As far as I can see, I have 3 options for getting the information from this text file:
1. Keep it as a text file, extract the required vector at run-time. I believe the downside is that you can't have a relative path in the code, so the user would have to point to the file's correct location (?). Or alternatively, get the configure script to inject the absolute path as a macro.
2. Convert it to a static unsigned char using xxd (as explained here) and then include the resulting file. Downside is that a 5MB file turns into a 25MB include file. Am I correct in thinking that this 25MB is loaded into memory for the duration of the runtime?
3. Convert it to an object and link using objcopy as explained here. This seems to keep the file size about the same -- are there other trade-offs?
Is there a standard/recommended method for doing this? I can use C or C++ if that makes a difference.
Thanks.
(Running on linux with gcc)

I would go with number 1 and pass the filepath into the program as an argument. There's nothing wrong with doing that and it is simple and straight-forward.
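A minimal sketch of that approach follows. The question does not show the file layout, so this assumes one vector per line: a string identifier followed by whitespace-separated numbers. The function name findVector is also just for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Scans `in` line by line and stores the numbers from the first line whose
// leading token equals `id` into `out`. Returns false if no such line exists.
// Assumed line format: <identifier> <number> <number> ...
bool findVector(std::istream& in, const std::string& id, std::vector<double>& out) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name;
        if (!(fields >> name) || name != id) continue;
        out.clear();
        double value;
        while (fields >> value) out.push_back(value);
        return true;
    }
    return false;
}
```

In main, the stream would be a std::ifstream opened on argv[1]. Only the requested vector is kept in memory, and the scan stops at the first matching line.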

You should have a look at the answers here:
Directory of running program
The top-voted answer gives you a clue about how to handle your data file. But instead of the home folder, I would suggest saving it under /usr/share as explained in the link.

I'd prefer to use zlib (both ways are possible: a side file, or an include with compressed data).
