File Merger C++ - c++

I am developing an algorithm for a file splitter and merger, and I ran into a problem: how do I merge the split files back together with their original extension (file format)? My idea is to write the file format at the start of the very first chunk (i.e. if I split a file into three files, store the file format in 1.bij). Will this idea work? If you know a better approach, please share it.
Thanks

Why not let the user choose the filename with a command line argument? You could use a -o command line option. Bonus points for letting the user redirect to standard output by passing - as the filename, so the result can be piped into another tool. For instance: merger -o - part*.bin | tar zxvf -
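A minimal sketch of that idea, assuming a tool invoked as merger -o <output> part1 part2 .... The names MergeOptions, parse_merge_args and merge_parts are made up for illustration, and defaulting to standard output when -o is absent is my own assumption:

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical result of parsing "merger -o <output> part1 part2 ...".
struct MergeOptions {
    std::string output = "-";        // "-" means standard output (assumed default)
    std::vector<std::string> parts;  // chunk files, in merge order
};

// Parse the arguments (without the program name). Returns false on misuse.
bool parse_merge_args(const std::vector<std::string>& args, MergeOptions& opts) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-o") {
            if (i + 1 >= args.size()) return false;  // -o needs a value
            opts.output = args[++i];
        } else {
            opts.parts.push_back(args[i]);
        }
    }
    return !opts.parts.empty();
}

// Append every part to `out`, byte for byte. Returns false if a part
// cannot be opened or the output stream fails.
bool merge_parts(const std::vector<std::string>& parts, std::ostream& out) {
    char buffer[4096];
    for (const std::string& name : parts) {
        std::ifstream in(name, std::ios::binary);
        if (!in) return false;
        while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
            out.write(buffer, in.gcount());
        }
    }
    return static_cast<bool>(out);
}
```

The caller would pass std::cout to merge_parts when opts.output is "-", and a std::ofstream opened in binary mode otherwise. Since the user names the output file, the original extension never has to be stored in the chunks.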

You may include a header in each split file with the full filename and also, for example, the original size, a checksum and so on.
Edit: How to write text to binary stream
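std::fstream f(/* initialize */);
std::string s = "asdf";
// Store the size of the text
auto size = s.size();
// write() expects a const char*, so the address of size must be cast
f.write(reinterpret_cast<const char*>(&size), sizeof(size));
// Store the string itself
f.write(s.c_str(), s.size());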
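For the merger side, the name has to be read back in the same layout. A sketch of a matching write/read pair follows; the function names, the 32-bit length prefix and the 4096-byte sanity limit are all my own assumptions, not part of the answer above:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical header layout: a 32-bit length followed by the original file
// name. A fixed-width type keeps the layout independent of the platform's
// size_t; the byte order is still that of the machine that wrote the file.
void write_name_header(std::ostream& out, const std::string& name) {
    assert(name.size() <= UINT32_MAX);
    const std::uint32_t len = static_cast<std::uint32_t>(name.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(name.data(), len);
}

// Read a header written by write_name_header. Returns false on a short
// read or an implausible length (the 4096 limit is an arbitrary choice).
bool read_name_header(std::istream& in, std::string& name) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    if (len > 4096) return false;
    name.resize(len);
    return len == 0 || static_cast<bool>(in.read(&name[0], len));
}
```

Whatever follows the header in the stream is the chunk's payload, so the merger can read the name first and then copy the remaining bytes unchanged.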

Related

C++ - Missing end of line characters in file read

I am using C++ streams to read a bunch of files in a directory and then write them to another directory. Since these files may be of different types, I am using the generic ios::binary flag when reading and writing them.
Example code below:
std::fstream inf( "ex.txt", std::ios::in | std::ios::binary);
char c;
while( inf >> c ) {
// writing to another file in binary format
}
The issue I have is that in the case of files containing text, the end of line characters in these text files are not being written to the output file.
Edit: Or at least they do not appear to be as when the newly written file is opened, there is only a single continuous line of characters.
Edit again: The problem (of the continuous string) appears to persist even when the read / write is made in text mode.
Thus, I was wondering if there was a way to check if a file has text or binary and then read/write it appropriately. Else, is there any way to preserve the end of line characters even when opening the file in binary format?
Edit: I am using the g++ 4.8.2 compiler
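When you want to manipulate bytes, you need to use the read and write methods, not the >> and << operators: formatted extraction with >> skips whitespace, including newlines, by default, even on a stream opened in binary mode.
You can get the intended behavior with inf.flags(inf.flags() & ~std::ios_base::skipws);, though.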

Read file in zip without unzip(c++)

I can read file in c++. This is my code:
std::string ReadFile(std::string file)
{
char buff[20480];
std::ifstream fread(file, std::ios::binary | std::ios::app);
fread.read(buff,sizeof(buff));
std::string str = buff;
fread.close();
return str;
}
The variable "file" is the file path. I have a .zip archive and I want to read a file inside it. What should I do? I tried to use libzip, but it didn't solve my problem; maybe I used it the wrong way.
No. To unzip a file, you must unzip a file.
You don't need to invoke the unzip utility to do it: there are libraries that can expose decompression through a streams API, resulting in code that looks rather similar to what you've written above. But you need to install and learn how to use those libraries.
Unless you have access to an API that can unpack your file, I don't see how to do this directly in the code.
If you are lazy, you could write a small script in whatever language you prefer that does the unpacking and then calls your program on the unpacked file.
Assuming you have unzip available, did you try something like:
FILE * file = popen("unzip -p filename", "r");
Similarly, popen("gzip -dc filename", "r") should work for gzip (-d decompresses and -c writes the result to standard output).
In order to parse the output, I'd refer to this post (with Windows hints). I don't know about a more C++-style way of doing this.
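One way to collect what the command prints is sketched below. It assumes a POSIX system where popen() and pclose() are available; the helper name read_command_output is invented for this example:

```cpp
#include <stdio.h>   // popen/pclose are POSIX, declared in <stdio.h>
#include <stdexcept>
#include <string>

// Run a shell command and return everything it writes to standard output.
std::string read_command_output(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + command);
    std::string output;
    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n);  // append() keeps embedded '\0' bytes
    }
    pclose(pipe);
    return output;
}
```

A call such as read_command_output("unzip -p archive.zip data.txt") would then return the contents of one member, where archive.zip and data.txt are placeholder names. Because the command goes through the shell, file names coming from user input would need quoting.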

Including data file into C++ project

I have a data file data.txt which includes character and numeric data.
Usually I read the data.txt in my program by using file streams like
ifstream infile("C:\\data.txt",ios::in); then use infile.getline to read the values.
Is it anyway possible to have the data.txt file included to the project and compile
it with the project such that when I read the file I do not have to worry about the path
of the file ( I mean I just use something like ifstream infile("data.txt",ios::in) ).
Moreover if I can compile the file with my project I will not have to worry about
providing a separate data.txt file with my release build to anyone else who wants to use
my program.
I do not want to change the data.txt file to some kind of header file. I want to keep the
.txt file as is and somehow package it within my executable that I am building. I still
want to keep using ifstream infile("data.txt",ios::in) and read the lines from the file
but want the data.txt file to be part of the project just like any other .h or .cpp file.
I am using C++ visual studio 2010.
It would be kind of someone to provide some insight into the above thing I am trying to
do.
Update
I managed to use the code below to read in the data file as resource
HRSRC hRes = FindResource(GetModuleHandle(NULL), MAKEINTRESOURCE(IDR_TEXT1), _T("TEXT"));
DWORD dwSize = SizeofResource(GetModuleHandle(NULL), hRes);
HGLOBAL hGlob = LoadResource(GetModuleHandle(NULL), hRes);
const BYTE* pData = reinterpret_cast<const BYTE*>(::LockResource(hGlob));
But how do I read the individual lines? I can't seem to tell where one line ends and the next begins.
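One possible approach is to copy the resource bytes into a std::string and let std::getline do the splitting. This is only a sketch: split_lines is a made-up helper, it assumes the resource holds single-byte text (ANSI or UTF-8), and it needs C++11 for back() and pop_back():

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a raw memory block into lines. The block does not need to be
// NUL-terminated because its size is passed explicitly.
std::vector<std::string> split_lines(const char* data, std::size_t size) {
    const std::string text(data, size);
    std::istringstream stream(text);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(stream, line)) {
        // Files saved on Windows end lines with "\r\n"; drop the '\r'.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}
```

With the variables from the question, the call would look like split_lines(reinterpret_cast<const char*>(pData), dwSize).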
I can only offer a workaround. If you don't want to worry about the path of the file, you can:
- add your file to your project
- add a post-build event that copies data.txt into your build folder.
There was a similar question, that also required inclusion of external file into C++ code. Please check my answer here.
Another way is to include a custom resource in your project, and then use FindResource, LoadResource, LockResource to access it.
You can put the contents of the file in a std::string variable:
std::string data_txt = "";
Then use sscanf or a stringstream from the standard library to parse the contents.
One more thing: you will need to escape special characters like '"' by placing a \ character before each one.
For any kind of file, based on RBerteig's answer, you could do something as simple as this with Python.
This program generates a text.txt.c file that can be compiled and linked with your code, so that any text or binary file is embedded directly in your exe and can be read directly from a variable:
import struct  # Needed to convert a byte string to an integer
f = open("text.txt", "rb")  # Open the file in binary read mode
s = "unsigned char text_txt_data[] = {"
b = f.read(1)  # Read one byte from the stream
db = struct.unpack("B", b)[0]  # Convert it to an unsigned integer (0..255)
h = hex(db)  # Generate the hexadecimal string
s = s + h  # Add it to the final code
b = f.read(1)  # Read the next byte from the stream
while b:  # An empty read means end of file
    s = s + ","  # Add a comma to separate the array elements
    db = struct.unpack("B", b)[0]  # Convert it to an unsigned integer
    h = hex(db)  # Generate the hexadecimal string
    s = s + h  # Add it to the final code
    b = f.read(1)  # Read the next byte from the stream
s = s + "};"  # Close the braces
f.close()  # Close the file
# Write the resulting code to a file that can be compiled
fw = open("text.txt.c", "w")
fw.write(s)
fw.close()
This will generate something like:
unsigned char text_txt_data[] = {0x52,0x61,0x6e,0x64,0x6f,0x6d,0x20,0x6e,0x75...
You can later use your data in another C file by declaring the variable like this:
extern unsigned char text_txt_data[];
Right now I can think of two ways of converting it to readable text: using memory streams or converting it to a C string.
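A sketch of the second route on the C++ side. One caveat: through the extern declaration the array has an incomplete type, so sizeof cannot be used on it and the data is not NUL-terminated. The example therefore assumes the generated file also provides a length variable, called text_txt_size here, which the script does not emit. The array below is a stand-in that only holds the nine bytes shown in the sample output:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Stand-ins for the contents of the generated text.txt.c.
// text_txt_size is an assumed addition, not produced by the script.
unsigned char text_txt_data[] = {0x52, 0x61, 0x6e, 0x64, 0x6f,
                                 0x6d, 0x20, 0x6e, 0x75};
const std::size_t text_txt_size = sizeof text_txt_data;

// Copy the embedded bytes into a std::string; the explicit size is
// required because the array carries no terminating '\0'.
std::string embedded_as_string(const unsigned char* data, std::size_t size) {
    return std::string(reinterpret_cast<const char*>(data), size);
}

// Example of parsing the text with a memory stream: extract the first
// whitespace-delimited word.
std::string first_word(const std::string& text) {
    std::istringstream in(text);
    std::string word;
    in >> word;
    return word;
}
```

Once the data is in a std::string, the usual tools (std::istringstream, std::getline, operator>>) work the same way they would on a file stream.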

what is Link indicator (file type) in tar parser

I want to know the file type of an HTML file inside a tar file. I have loaded this tar file into a buffer, and I know that the size of a file is stored at buffer[124]. What I want to know is:
(1.) How can I tell whether a file in the tar archive is an HTML file? I think I can find the file type from the link indicator, but I am not sure. Could anyone please explain how to do that?
(2.) Once I am sure that there is an HTML file inside the tar file, I want to store the contents of that HTML file.
There are many other files in the tar file, not only the HTML file, so I don't know the location of the HTML file.
Any idea how to achieve this?
Ah, you mean the typeflag field. No, it's not for that kind of filetype, it's to tell if the file is a regular file, directory, hard link, soft link, device special file, etc.
Your system (if it's POSIX compliant) should have a <tar.h> system header file (usually in /usr/include) that contains these flags. Or you can see the official POSIX specification.
@Joachim, thanks for your suggestion. I finally got it working; the code is below if you want to see it:
char* StartPosition = NULL;
size_t skip = 0;
char HtmlFileContents[200000];
char contents[8000];
for (;;)
{
    // The file name occupies the first 100 bytes of each 512-byte header
    memcpy(contents, &buffer[skip], 100);
    contents[100] = '\0'; // the name field is not always NUL-terminated
    if (contents[0] == '\0')
        break; // an empty name marks the end of the archive
    // The size is stored as 11 octal digits at offset 124 of the header
    int SizeOfFile = CreateOctalToInteger(&buffer[skip+124], 11);
    if ((StartPosition = strstr(contents, ".html")) != NULL)
    {
        MessageBox(m_hwndPreview, L"finally string is copied", L"BTN WND6", MB_ICONINFORMATION);
        // The file data starts right after the 512-byte header
        if (SizeOfFile <= (int)sizeof(HtmlFileContents))
            memcpy(HtmlFileContents, &buffer[skip+512], SizeOfFile);
        break;
    }
    // File data is padded to a multiple of 512 bytes; skip header and data
    size_t distance = ((SizeOfFile%512) ? SizeOfFile + 512 - (SizeOfFile%512) : SizeOfFile);
    skip += distance + 512;
}
This works for any file stored in the tar file: just substitute the file extension (.html in my case) and the code gives you its contents.
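The same header walk can also be written as a standalone function. The sketch below is my own variant, not the code above: TarEntry and find_tar_entry are made-up names, it matches the extension only at the end of the name, it uses the typeflag at offset 156 to skip anything that is not a regular file, and it assumes plain octal size fields and ignores the ustar name prefix at offset 345:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

struct TarEntry {
    std::string name;
    std::size_t offset;  // where the file data starts inside the buffer
    std::size_t size;    // length of the file data in bytes
};

// Search a tar archive held in memory for the first regular file whose
// name ends with `suffix`. Returns false if none is found.
bool find_tar_entry(const char* buffer, std::size_t length,
                    const std::string& suffix, TarEntry& found) {
    const std::size_t kBlock = 512;
    std::size_t pos = 0;
    // An all-zero block (empty name) marks the end of the archive.
    while (pos + kBlock <= length && buffer[pos] != '\0') {
        const char* header = buffer + pos;
        // Name: bytes 0..99, not NUL-terminated when all 100 are used.
        const void* nul = std::memchr(header, '\0', 100);
        const std::size_t name_len =
            nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - header)
                : 100;
        const std::string name(header, name_len);

        // Size: 12-byte octal text field at offset 124.
        char size_field[13] = {0};
        std::memcpy(size_field, header + 124, 12);
        const std::size_t size = std::strtoul(size_field, nullptr, 8);

        // Typeflag at offset 156: '0' or '\0' means a regular file.
        const char type = header[156];
        const bool regular = (type == '0' || type == '\0');
        const bool matches = name.size() >= suffix.size() &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
        if (regular && matches) {
            if (size > length - pos - kBlock) return false;  // truncated archive
            found.name = name;
            found.offset = pos + kBlock;
            found.size = size;
            return true;
        }
        // Skip the header plus the data rounded up to whole 512-byte blocks.
        pos += kBlock + (size + kBlock - 1) / kBlock * kBlock;
    }
    return false;
}
```

After a successful call, the file's bytes are buffer + found.offset through buffer + found.offset + found.size, which can be copied into a std::string instead of a fixed-size array.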

embedding a text file in an exe which can be accessed using fopen

I would like to embed a text file with some data into my program.
let's call it "data.txt".
This text file is usually loaded by a function that takes the file name as input and eventually opens it with an fopen() call, something along the lines of
FILE* name = fopen("data.txt", "r");
I can't really change this function and I would like the routine to open this same file every time it runs. I've seen people ask about embedding the file as a header but it seems that I wouldn't be able to call fopen() on a file that I embed into the header.
So my question is: is there a way to embed a text file as a callable file/variable to fopen()?
I am using VS2008.
Yes and No. The easiest way is to transform the content of the text file into an initialized array.
char data_txt[] = {
'd','a','t','a',' ','g','o','e','s',' ','h','e','r','e', //....
};
This transformation is easily done with a small perl script or even a small C program. You then compile and link the resulting module into your program.
An old trick to make this easier to manage with a Makefile is to make the script transform its data into the body of the initializer and write it to a file without the surrounding variable declaration or even the curly braces. If data.txt is transformed to data.inc, then it is used like so:
char data_txt[] = {
#include "data.inc"
};
Update
On many platforms, it is possible to append arbitrary data to the executable file itself. The trick then is to find it at run time. On platforms where this is possible, there will be file header information for the executable that indicates the length of the executable image. That can be used to compute an offset to use with fseek() after you have opened the executable file for reading. That is harder to do in a portable way, since it may not even be possible to learn the actual file name of your executable image at run time in a portable way. (Hint, argv[0] is not required to point to the actual program.)
If you cannot avoid the call to fopen(), then you can still use this trick to keep a copy of the content of data.txt, and put it back in a file at run time. You could even be clever and only write the file if it is missing....
If you can drop the call to fopen() but still need a FILE * pointing at the data, then this is likely possible if you are willing to play fast and loose with your C runtime library's implementation of stdio. In the GNU version of libc, functions like sprintf() and sscanf() are actually implemented by creating a "real enough" FILE * that can be passed to a common implementation (vfprintf() and vfscanf(), IIRC). That faked FILE is marked as buffered, and points its buffer to the user's buffer. Some magic is used to make sure the rest of stdio doesn't do anything stupid.