I have been trying to read, decode and then compress data from a json Tiled file such as the one below:
{ "height":40,
"layers":[
{
"compression":"zlib",
"data":"eJztmNkKwjAQRaN9cAPrAq5Yq3Xf6v9\/nSM2VIbQJjEZR+nDwQZScrwztoORECLySBcIgZ7nc2y4KfyWDLx+Jb9nViNgDEwY+KioAXUgQN4+zpoCMwPmQAtoAx2CLFbA2oDEo9+hwG8DnIDtF\/2K8ks086Tw2zH0uyMv7HcRr\/6\/EvvhnsPrsrxwX7rwU\/0ODig\/eV3mh3N1ld8eraWPaX6+64s9McesfrqcHfg1MpoifxcVEWjukyw+9AtFPl\/I71pER3Of6j4bv7HI54s+MChhqLlPdZ\/P3qMmFuo5h5NnTOhjM5tReN2yT51n5\/v7J3F0vi46fk+ne7aX0i9l6If7mpufTX3f5wsqv9TAD2fJLT9VrTn7UeZnM5tR+v0LMQOHXwFnxe2\/warGFRWf8QDjOLfP",
"encoding":"base64",
"height":40,
"name":"Ground",
"opacity":1,
"type":"tilelayer",
"visible":true,
"width":40,
"x":0,
"y":0
}],
"nextobjectid":1,
"orientation":"orthogonal",
"properties":
{
},
"renderorder":"right-down",
"tileheight":32,
"tilesets":[
{
"firstgid":1,
"source":"..\/..\/..\/Volumes\/Tiled 0.14.2\/examples\/desert.tsx"
}],
"tilewidth":32,
"version":1,
"width":40
}
I'm using the libraries
1. "json" (https://github.com/nlohmann/json),
2. "base64" (http://www.adp-gmbh.ch/cpp/common/base64.html) and
3. "zlib" (http://zlib.net).
This is my code:
#include <iostream>
#include <fstream>
#include <string>
#include "json.hpp"
#include "base64.hpp"
#include "zlib.h"
using json = nlohmann::json;
using namespace std;
int main(int argc, const char * argv[]) {
// Get string from json file
ifstream t("/Users/Klas/Desktop/testmap_zlib_compressed.json");
stringstream ss;
ss << t.rdbuf();
string sd = ss.str();
// Parse json string
auto j = json::parse(sd);
// Get encoded data
string encoded = j["layers"][0]["data"];
printf("Encoded: \n\n%s\n\n", encoded.c_str());
// Decode encoded data
string decoded = base64_decode(encoded);
// Convert string to char array
char b[decoded.size() + 1];
strcpy(b, decoded.c_str());
// Set size of uncompressed and compressed data
uLong h = j["layers"][0]["height"];
uLong w = j["layers"][0]["width"];
uLong ucompSize = w * h * 4; // Estimate
uLong compSize = strlen(b);
char c[ucompSize];
printf("Decoded (Compressed): \n\n%s\n\n\n", b);
// Uncompress data
uncompress((Bytef *)c, &ucompSize, (Bytef *)b, compSize);
printf("Decoded (Uncompressed): \n\n%s\n\n\n", c);
return 0;
}
When I run the program with the json file I get the output:
Encoded:
eJztmNkKwjAQRaN9cAPrAq5Yq3Xf6v9/nSM2VIbQJjEZR+nDwQZScrwztoORECLySBcIgZ7nc2y4KfyWDLx+Jb9nViNgDEwY+KioAXUgQN4+zpoCMwPmQAtoAx2CLFbA2oDEo9+hwG8DnIDtF/2K8ks086Tw2zH0uyMv7HcRr/6/EvvhnsPrsrxwX7rwU/0ODig/eV3mh3N1ld8eraWPaX6+64s9McesfrqcHfg1MpoifxcVEWjukyw+9AtFPl/I71pER3Of6j4bv7HI54s+MChhqLlPdZ/P3qMmFuo5h5NnTOhjM5tReN2yT51n5/v7J3F0vi46fk+ne7aX0i9l6If7mpufTX3f5wsqv9TAD2fJLT9VrTn7UeZnM5tR+v0LMQOHXwFnxe2/warGFRWf8QDjOLfP
Decoded (Compressed):
x\234\355\230\331
\3020E\243}p\353\256X\253u\337\352\377\235#6T\206\320&1G\351\303\301Rr\2743\266\203\221"\362H\201\236\347sl\270)\374\226\274~%\277gV#`L\370\250\250u #\336>Κ3\346#h\202,V\300ڀģߡ\300o\234\200\355\375\212\362K4\363\244\360\3331\364\273#/\354w\257\376\277\373\341\236\303벼p_\272\360S\375(?y]\346\207su\225\337\255\245\217i~\276\353\213=1Ǭ~\272\234\37052\232"h\356\223,>\364E>_\310\357ZDGs\237\352>\277\261\310\347\213>0(a\250\271Ou\237\317ޣ&\3529\207\223gL\350c3\233QxݲO\235g\347\373\373'qt\276.:~O\247{\266\227\322/e\350\207\373\232\233\237M}\337\347*\277\324\300g\311-?U\2559\373Q\346g3\233Q\372\3751\207_g\305\355\277\301\252\306\237\361
Decoded (Uncompressed):
Program ended with exit code: 0
Everything seems to be working fine before it comes to the uncompressing. I'm not sure what goes wrong. Any help to figure this out is appreciated.
You can't use strlen() on binary data. If there is a zero in there, it has nothing to do with the length of the binary data. If there isn't a zero in there, you will run off the end of the data looking for a zero. Use decoded.size().
You can't use strcpy() for the same reason. Use memcpy(). Or, in this case, I don't see why you would copy it at all: just pass decoded.data() and decoded.size() to uncompress().
You can't necessarily print the compressed or uncompressed data as a string (%s), again for the same reason. In fact, the uncompressed data in this case consists mostly of zeros.
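To make the strlen() point concrete, here is a minimal sketch that uses only the standard library (no zlib, no base64). The helper names make_blob, c_string_length, copy_bytes and demo_binary_length are made up for this illustration, and the six bytes are a hypothetical stand-in for decoded data that contains zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for base64-decoded data: 6 bytes, zero bytes at index 2 and 5.
std::string make_blob() {
    const char raw[] = {'x', '\x9c', '\0', 'a', 'b', '\0'};
    return std::string(raw, sizeof raw);  // explicit length, so the zeros are kept
}

// What strlen() reports: it stops at the first zero byte.
std::size_t c_string_length(const std::string& blob) {
    return std::strlen(blob.c_str());
}

// Copies every byte, zeros included, by using the real size with memcpy().
std::vector<unsigned char> copy_bytes(const std::string& blob) {
    std::vector<unsigned char> out(blob.size());
    if (!blob.empty()) std::memcpy(out.data(), blob.data(), blob.size());
    return out;
}

// Self-check that spells out the difference.
void demo_binary_length() {
    const std::string blob = make_blob();
    assert(blob.size() == 6);            // the true length
    assert(c_string_length(blob) == 2);  // strlen() stopped at index 2
    assert(copy_bytes(blob)[3] == 'a');  // data after the zero survived the copy
}
```

The same reasoning applies to the compressed buffer in the question: its length is whatever the decoder reports through size(), not whatever strlen() happens to find.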
I have been trying to encode the binary data of an application as base64 (specifically with Boost's base64), but I have run into an issue where the carriage return after the DOS header is not being encoded correctly.
It should look like this:
This program cannot be run in DOS mode.[CR]
[CR][LF]
but instead it's being output like this:
This program cannot be run in DOS mode.[CR][LF]
It seems the first carriage return is being skipped, which then makes the DOS header invalid when attempting to run the program.
The code for the base64 algorithm I am using can be found at: https://www.boost.org/doc/libs/1_66_0/boost/beast/core/detail/base64.hpp
Thanks so much!
void load_file(const char* filename, char** file_out, size_t& size_out)
{
FILE* file;
fopen_s(&file, filename, "r");
if (!file)
return false;
fseek(file, 0, SEEK_END);
size = ftell(file);
rewind(file);
*out = new char[size];
fread(*out, size, 1, file);
fclose(file);
}
void some_func()
{
char* file_in;
size_t file_in_size;
load_file("filename.bin", &file_in, file_in_size);
auto encoded_size = base64::encoded_size(file_in_size);
auto file_encoded = new char[encoded_size];
memset(0, file_encoded, encoded_size);
base64::encode(file_encoded, file_in, file_in_size);
std::ofstream orig("orig.bin", std::ios_base::binary);
for (int i = 0; i < file_in_size; i++)
{
auto c = file_in[i];
orig << c; // DOS header contains a NULL as the 3rd char, don't allow it to be null terminated early, may cause ending nulls but does not affect binary files.
}
orig.close();
std::ofstream encoded("encoded.txt"); //pass this output through a base64 to file website.
encoded << file_encoded; // for loop not required, does not contain nulls (besides ending null) will contain trailing encoded nulls.
encoded.close();
auto decoded_size = base64::decoded_size(encoded_size);
auto file_decoded = new char[decoded_size];
memset(0, file_decoded, decoded_size); // again trailing nulls but it doesn't matter for binary file operation. just wasted disk space.
base64::decode(file_decoded, file_encoded, encoded_size);
std::ofstream decoded("decoded.bin", std::ios_base::binary);
for (int i = 0; i < decoded_size; i++)
{
auto c = file_decoded[i];
decoded << c;
}
decoded.close();
free(file_in);
free(file_encoded);
free(file_decoded);
}
The above code will show that the file reading does not remove the carriage return, while the encoding of the file into base64 does.
Okay thanks for adding the code!
I tried it, and indeed there was "strangeness", even after I simplified the code (mostly to make it C++, instead of C).
So what do you do? You look at the documentation for the functions. That seems complicated since, after all, detail::base64 is, by definition, not part of public API, and "undocumented".
However, you can still read the comments at the functions involved, and they are pretty clear:
/** Encode a series of octets as a padded, base64 string.
The resulting string will not be null terminated.
@par Requires
The memory pointed to by `out` points to valid memory
of at least `encoded_size(len)` bytes.
@return The number of characters written to `out`. This
will exclude any null termination.
*/
std::size_t
encode(void* dest, void const* src, std::size_t len)
And
/** Decode a padded base64 string into a series of octets.
@par Requires
The memory pointed to by `out` points to valid memory
of at least `decoded_size(len)` bytes.
@return The number of octets written to `out`, and
the number of characters read from the input string,
expressed as a pair.
*/
std::pair<std::size_t, std::size_t>
decode(void* dest, char const* src, std::size_t len)
Conclusion: What Is Wrong?
Nothing about "DOS headers" or "carriage returns". Perhaps something about "rb" in fopen (text mode "r" may translate line endings on Windows; see "what's the differences between r and rb in fopen"), but why even use that:
template <typename Out> Out load_file(std::string const& filename, Out out) {
std::ifstream ifs(filename, std::ios::binary); // or "rb" on your fopen
ifs.exceptions(std::ios::failbit |
std::ios::badbit); // we prefer exceptions
return std::copy(std::istreambuf_iterator<char>(ifs), {}, out);
}
The real issue is: your code ignored all return values from encode/decode.
The encoded_size and decoded_size values are estimations that will give you enough space to store the result, but you have to correct it to the actual size after performing the encoding/decoding.
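To see why the estimate and the actual size can differ, here is a small sketch of the usual padded base64 size arithmetic (RFC 4648). These are not Boost's functions: b64_encoded_size, b64_decoded_bound, b64_decoded_exact and demo_b64_sizes are names invented for this illustration, and the exact-size helper assumes well-formed, padded input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Characters needed to base64-encode n bytes with '=' padding.
constexpr std::size_t b64_encoded_size(std::size_t n) { return 4 * ((n + 2) / 3); }

// Upper bound on the bytes produced by decoding `len` base64 characters.
constexpr std::size_t b64_decoded_bound(std::size_t len) { return len / 4 * 3; }

// Exact decoded size of a well-formed, padded base64 string:
// each trailing '=' means one byte fewer than the upper bound.
std::size_t b64_decoded_exact(const std::string& encoded) {
    assert(encoded.size() % 4 == 0);
    std::size_t n = b64_decoded_bound(encoded.size());
    if (!encoded.empty() && encoded.back() == '=') --n;
    if (encoded.size() >= 2 && encoded[encoded.size() - 2] == '=') --n;
    return n;
}

void demo_b64_sizes() {
    assert(b64_encoded_size(4) == 8);            // "Man!" encodes to "TWFuIQ=="
    assert(b64_decoded_bound(8) == 6);           // enough room, but 2 bytes too many
    assert(b64_decoded_exact("TWFuIQ==") == 4);  // the real payload size
}
```

Writing out the upper bound instead of the actual count is what leaves stray trailing bytes in a decoded file.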
Here's my fixed and simplified example. Notice how the md5sums check out:
Live On Coliru
#include <boost/beast/core/detail/base64.hpp>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
namespace base64 = boost::beast::detail::base64;
template <typename Out> Out load_file(std::string const& filename, Out out) {
std::ifstream ifs(filename, std::ios::binary); // or "rb" on your fopen
ifs.exceptions(std::ios::failbit |
std::ios::badbit); // we prefer exceptions
return std::copy(std::istreambuf_iterator<char>(ifs), {}, out);
}
int main() {
std::vector<char> input;
load_file("filename.bin", back_inserter(input));
// allocate "enough" space, using an upperbound prediction:
std::string encoded(base64::encoded_size(input.size()), '\0');
// encode returns the **actual** encoded_size:
auto encoded_size = base64::encode(encoded.data(), input.data(), input.size());
encoded.resize(encoded_size); // so adjust the size
std::ofstream("orig.bin", std::ios::binary)
.write(input.data(), input.size());
std::ofstream("encoded.txt") << encoded;
// allocate "enough" space, using an upperbound prediction:
std::vector<char> decoded(base64::decoded_size(encoded_size), 0);
auto [decoded_size, // decode returns the **actual** decoded_size
processed] // (as well as number of encoded bytes processed)
= base64::decode(decoded.data(), encoded.data(), encoded.size());
decoded.resize(decoded_size); // so adjust the size
std::ofstream("decoded.bin", std::ios::binary)
.write(decoded.data(), decoded.size());
}
When run on "itself" using
g++ -std=c++20 -O2 -Wall -pedantic -pthread main.cpp -o filename.bin && ./filename.bin
md5sum filename.bin orig.bin decoded.bin
base64 -d < encoded.txt | md5sum
It prints
d4c96726eb621374fa1b7f0fa92025bf filename.bin
d4c96726eb621374fa1b7f0fa92025bf orig.bin
d4c96726eb621374fa1b7f0fa92025bf decoded.bin
d4c96726eb621374fa1b7f0fa92025bf -
I've already opened the BMP file (one-channel grayscale) and stored each pixel value on a new line as hex.
After doing some processing on the data (not the point of this question), I need to export a BMP image from my data.
How can I load the text file (data) and use stb_image_write?
pixel to image :
#include <cstdio>
#include <cstdlib>
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
using namespace std;
int main() {
FILE* datafile ;
datafile = fopen("pixeldata.x" , "w");
unsigned char* pixeldata ;//???
char Image2[14] = "image_out.bmp";
stbi_write_bmp(Image2, 512, 512, 1, pixeldata);
image to pixel:
#include <cstdio>
#include <cstdlib>
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
using namespace std;
const size_t total_pixel = 512*512;
int main() {
FILE* datafile ;
datafile = fopen("pixeldata.x" , "w");
char Image[10] = "image.bmp";
int witdth;
int height;
int channels;
unsigned char *pixeldata = stbi_load( (Image) , &witdth, &height, &channels, 1);
if(pixeldata != NULL){
for(int i=0; i<total_pixel; i++)
{
fprintf(datafile,"%x%s", pixeldata[i],"\n");
}
}
}
There are a lot of weaknesses in the question, too many to sort out in comments...
This question is tagged C++. Why the error-prone fprintf()? Why not std::fstream? It has similar capabilities (if not more) but adds type safety (which the printf() family cannot provide).
The counterpart of fprintf() is fscanf(). The format specifiers are similar, but the storage type has to be matched in the format string even more carefully than in fprintf().
If the first code sample is the attempt to read pixels back from pixeldata.x... Why datafile = fopen("pixeldata.x" , "w");? To open a file with fopen() for reading, the mode should be "r".
char Image2[14] = "image_out.bmp"; is correct (if I counted correctly) but maintenance-unfriendly. Let the compiler do the work for you:
char Image2[] = "image_out.bmp";
To provide storage for pixel data with a fixed size of 512 × 512 bytes (as in the OP's case), the simplest would be:
unsigned char pixeldata[512 * 512];
Storing an array of that size (512 × 512 = 262144 bytes = 256 KiB) in a local variable might be seen as a potential issue by some people. The alternative would be to use a std::vector<unsigned char> pixeldata; instead. (std::vector allocates its storage dynamically in heap memory, whereas local variables usually live in stack memory, which is usually of limited size.)
Concerning the std::vector<unsigned char> pixeldata;, I see two options:
definition with pre-allocation:
std::vector<unsigned char> pixeldata(512 * 512);
so that it can be used just like the array above.
definition without pre-allocation:
std::vector<unsigned char> pixeldata;
That would allow appending every pixel read to the end with std::vector::push_back().
It may be worth reserving the final size beforehand, as it's known from the beginning:
std::vector<unsigned char> pixeldata;
pixeldata.reserve(512 * 512); // size reserved but not yet used
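The difference between the two options is easy to mix up, so here is a minimal sketch of it. The function name demo_vector_sizes is made up for this illustration; everything else is plain std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

void demo_vector_sizes() {
    const std::size_t n = 512 * 512;

    // Option 1: pre-allocated. All n elements exist (zero-initialized) right away.
    std::vector<unsigned char> a(n);
    assert(a.size() == n);
    a[0] = 0xff;  // indexing is valid immediately

    // Option 2: reserved only. Memory is set aside, but the vector is still empty.
    std::vector<unsigned char> b;
    b.reserve(n);
    assert(b.empty());
    assert(b.capacity() >= n);
    b.push_back(0xff);  // elements have to be appended; b[0] was not valid before this
    assert(b.size() == 1);
}
```

With the reserved variant, writing through pixeldata[i] before anything has been pushed back would be undefined behavior, which is why it pairs with push_back() and the pre-allocated variant pairs with indexing.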
So, this is how it could finally look:
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <vector>
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
int main()
{
const int w = 512, h = 512;
// read data
FILE *datafile = fopen("pixeldata.x" , "r");
if (!datafile) { // success of file open should be tested ALWAYS
std::cerr << "Cannot open 'pixeldata.x'!\n";
return -1; // ERROR! (bail out)
}
typedef unsigned char uchar; // for convenience
std::vector<uchar> pixeldata(w * h);
char Image2[] = "image_out.bmp";
for (int i = 0, n = w * h; i < n; ++i) {
if (fscanf(datafile, "%hhx", &pixeldata[i]) < 1) {
std::cerr << "Failed to read value " << i << " of 'pixeldata.x'!\n";
return -1; // ERROR! (bail out)
}
}
fclose(datafile);
// write BMP image
stbi_write_bmp(Image2, w, h, 1, pixeldata.data());
// Actually, success of this should be tested as well.
// done
return 0;
}
Some additional notes:
Please take this code with a grain of salt. I haven't compiled or tested it. (I leave this as a task for the OP but will react to "bug reports".)
I silently removed using namespace std;: SO: Why is “using namespace std” considered bad practice?
I added checks for the success of file operations. File operations can fail for a lot of reasons, so they should always be checked. For file writing, even the fclose() should be tested. Written data might be cached until the file is closed, and writing the cached data to the file might itself fail (for example, because it overflows the available space on the volume).
The OP used magic numbers (image width and height), which is considered bad practice. It makes code maintenance-unfriendly and harder for other readers to understand: SO: What is a magic number, and why is it bad?
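For comparison with the fscanf() version, here is a sketch of the stream-based reading suggested at the start of this answer. It is only an illustration: read_hex_pixels and demo_read_hex_pixels are invented names, and the input is assumed to hold one hexadecimal value per line, as written by the OP's "%x" loop.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads `count` hexadecimal values (whitespace or newline separated) into `pixels`.
// Returns false if a value is missing, malformed, or larger than 0xff.
bool read_hex_pixels(std::istream& in, std::size_t count,
                     std::vector<unsigned char>& pixels) {
    pixels.clear();
    pixels.reserve(count);
    in >> std::hex;  // parse the integers that follow as hexadecimal
    for (std::size_t i = 0; i < count; ++i) {
        unsigned int value = 0;  // not unsigned char: that would read a single character
        if (!(in >> value) || value > 0xffu) return false;
        pixels.push_back(static_cast<unsigned char>(value));
    }
    return true;
}

void demo_read_hex_pixels() {
    std::istringstream data("0\nff\n7f\n");
    std::vector<unsigned char> pixels;
    const bool ok = read_hex_pixels(data, 3, pixels);
    assert(ok);
    (void)ok;
    assert(pixels[1] == 0xff && pixels[2] == 0x7f);
}
```

For the real file, a std::ifstream opened on pixeldata.x would take the place of the string stream, and pixels.data() could then be handed to stbi_write_bmp() as in the code above.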
A colleague of mine used LabVIEW to write an ASCII string as an attribute in an HDF5 file. I can see that the attribute exists, and read it, but I can't print it.
The attribute is, as shown in HDF Viewer:
Date = 2015\07\09
So "Date" is its name.
I'm trying to read the attribute with this code
hsize_t sz = H5Aget_storage_size(dateAttribHandler);
std::cout<<sz<<std::endl; //prints 16
hid_t atype = H5Aget_type(dateAttribHandler);
std::cout<<atype<<std::endl; //prints 50331867
std::cout<<H5Aread(dateAttribHandler,atype,(void*)date)<<std::endl; //prints 0
std::cout<<date<<std::endl; //prints messy characters!
//even with an std::string
std::string s(date);
std::cout<<s<<std::endl; //also prints a mess
Why is this happening? How can I get this string as a const char* or std::string?
I also tried using the type atype = H5Tcopy (H5T_C_S1);, and that didn't work either...
EDIT:
Here I provide a full, self-contained program as it was requested:
#include <string>
#include <iostream>
#include <fstream>
#include <hdf5/serial/hdf5.h>
#include <hdf5/serial/hdf5_hl.h>
std::size_t GetFileSize(const std::string &filename)
{
std::ifstream file(filename.c_str(), std::ios::binary | std::ios::ate);
return file.tellg();
}
int ReadBinFileToString(const std::string &filename, std::string &data)
{
std::fstream fileObject(filename.c_str(),std::ios::in | std::ios::binary);
if(!fileObject.good())
{
return 1;
}
size_t filesize = GetFileSize(filename);
data.resize(filesize);
fileObject.read(&data.front(),filesize);
fileObject.close();
return 0;
}
int main(int argc, char *argv[])
{
std::string filename("../Example.hdf5");
std::string fileData;
std::cout<<"Success read file into memory: "<<
ReadBinFileToString(filename.c_str(),fileData)<<std::endl;
hid_t handle;
hid_t magFieldsDSHandle;
hid_t dateAttribHandler;
htri_t dateAtribExists;
handle = H5LTopen_file_image((void*)fileData.c_str(),fileData.size(),H5LT_FILE_IMAGE_DONT_COPY | H5LT_FILE_IMAGE_DONT_RELEASE);
magFieldsDSHandle = H5Dopen(handle,"MagneticFields",H5P_DEFAULT);
dateAtribExists = H5Aexists(magFieldsDSHandle,"Date");
if(dateAtribExists)
{
dateAttribHandler = H5Aopen(magFieldsDSHandle,"Date",H5P_DEFAULT);
}
std::cout<<"Reading file done."<<std::endl;
std::cout<<"Open handler: "<<handle<<std::endl;
std::cout<<"DS handler: "<<magFieldsDSHandle<<std::endl;
std::cout<<"Attributes exists: "<<dateAtribExists<<std::endl;
hsize_t sz = H5Aget_storage_size(dateAttribHandler);
std::cout<<sz<<std::endl;
char* date = new char[sz+1];
std::cout<<"mem bef: "<<date<<std::endl;
hid_t atype = H5Aget_type(dateAttribHandler);
std::cout<<atype<<std::endl;
std::cout<<H5Aread(dateAttribHandler,atype,(void*)date)<<std::endl;
fprintf(stderr, "Attribute string read was '%s'\n", date);
date[sz] = '\0';
std::string s(date);
std::cout<<"mem aft: "<<date<<std::endl;
std::cout<<s<<std::endl;
H5Dclose(magFieldsDSHandle);
H5Fclose(handle);
return 0;
}
Printed output of this program:
Success read file into memory: 0
Reading file done.
Open handler: 16777216
DS handler: 83886080
Attributes exists: 1
16
mem bef:
50331867
0
Attribute string read was '�P7'
mem aft: �P7
�P7
Press <RETURN> to close this window...
Thanks.
It turned out that H5Aread has to be called with the address of the char pointer, i.e. a pointer to a pointer:
H5Aread(dateAttribHandler,atype,&date);
Keep in mind that one doesn't have to allocate memory for that. The library allocates the memory, and you can then free it with H5free_memory(date).
This worked fine.
EDIT:
I learned that this is the case only when the string to be read has variable length. If the string has a fixed length, then one has to manually allocate memory of size length+1 and manually set the last char to null (to get a null-terminated string). The HDF5 library has a function, H5Tis_variable_str, that checks whether a string type is variable-length.
I discovered that if you do not allocate date and pass the &date to H5Aread, then it works. (I use the C++ and python APIs, so I do not know the C api very well.) Specifically change:
char* date = 0;
// std::cout<<"mem bef: "<<date<<std::endl;
std::cout << H5Aread(dateAttribHandler, atype, &date) << std::endl;
And you should see 2015\07\09 printed.
You may want to consider using the C++ API. Using the C++ API, your example becomes:
std::string filename("c:/temp/Example.hdf5");
H5::H5File file(filename, H5F_ACC_RDONLY);
H5::DataSet ds_mag = file.openDataSet("MagneticFields");
if (ds_mag.attrExists("Date"))
{
H5::Attribute attr_date = ds_mag.openAttribute("Date");
H5::StrType stype = attr_date.getStrType();
std::string date_str;
attr_date.read(stype, date_str);
std::cout << "date_str= <" << date_str << ">" << std::endl;
}
As a simpler alternative to existing APIs, your use-case could be solved as follows in C using HDFql:
// declare variable 'value'
char *value;
// register variable 'value' for subsequent use (by HDFql)
hdfql_variable_register(&value);
// read 'Date' (from 'MagneticFields') and populate variable 'value' with it
hdfql_execute("SELECT FROM Example.hdf5 MagneticFields/Date INTO MEMORY 0");
// display value stored in variable 'value'
printf("Date=%s\n", value);
FYI, besides C, the code above can be used in C++, Python, Java, C#, Fortran or R with minimal changes.
I have never worked with binary files before. I opened an .mp3 file using the mode ios::binary, read data from it, assigned 0 to each byte read and then rewrote them to another file opened in ios::binary mode. I opened the output file on a media player, it sounds corrupted but I can still hear the song. I want to know what happened physically.
How can I access/modify the raw data ( bytes ) of an audio ( video, images, ... ) using C++ ( to practice file encryption/decryption later )?
Here is my code:
#include <iostream>
#include <fstream>
#include <cstring>
using namespace std;
int main(){
char buffer[256];
ifstream inFile;
inFile.open("Backstreet Boys - Incomplete.mp3",ios::binary);
ofstream outFile;
outFile.open("Output.mp3",ios::binary);
while(!inFile.eof()){
inFile.read(buffer,256);
for(int i = 0; i<strlen(buffer); i++){
buffer[i] = 0;
}
outFile.write(buffer,256);
}
inFile.close();
outFile.close();
}
What you did has nothing to do with binary files or audio. You simply copied the file while zeroing some of the bytes. (The reason you didn't zero all of the bytes is because you use i<strlen(buffer), which simply counts up to the first zero byte rather than reporting the size of the buffer. Also you modify the buffer which means strlen(buffer) will report the length as zero after you zero the first byte.)
So the exact change in audio you get is entirely dependent on the mp3 file format and the audio compression it uses. MP3 is not an audio format that can be directly manipulated in useful ways.
If you want to manipulate digital audio, you need to learn about how raw audio is represented by computers.
It's actually not too difficult. For example, here's a program that writes out a raw audio file containing just a 400Hz tone.
#include <cmath>
#include <fstream>
#include <limits>
int main() {
const double pi = 3.1415926535;
double tone_frequency = 400.0;
int samples_per_second = 44100;
double output_duration_seconds = 5.0;
int output_sample_count =
static_cast<int>(output_duration_seconds * samples_per_second);
std::ofstream out("signed-16-bit_mono-channel_44.1kHz-sample-rate.raw",
std::ios::binary);
for (int sample_i = 0; sample_i < output_sample_count; ++sample_i) {
double t = sample_i / static_cast<double>(samples_per_second);
double sound_amplitude = std::sin(t * 2 * pi * tone_frequency);
// encode amplitude as a 16-bit, signed integral value
short sample_value =
static_cast<short>(sound_amplitude * std::numeric_limits<short>::max());
out.write(reinterpret_cast<char const *>(&sample_value),
sizeof sample_value);
}
}
To play the sound you need a program that can handle raw audio, such as Audacity. After running the program to generate the audio file, you can File > Import > Raw data..., to import the data for playing.
How can I access/modify the raw data ( bytes ) of an audio ( video, images, ... ) using C++ ( to practice file encryption/decryption later )?
As pointed out earlier, the reason your existing code is not completely zeroing out the data is because you are using an incorrect buffer size: strlen(buffer). The correct size is the number of bytes read() put into the buffer, which you can get with the function gcount():
inFile.read(buffer,256);
int buffer_size = inFile.gcount();
for(int i = 0; i < buffer_size; i++){
buffer[i] = 0;
}
outFile.write(buffer, buffer_size);
Note: if you were to step through your program using a debugger, you would probably have seen the problem yourself pretty quickly when you noticed the inner loop executing fewer times than you expected. Debuggers are a really handy tool to learn how to use.
I notice you're using open() and close() methods here. This is sort of pointless in this program. Just open the file in the constructor, and allow the file to be automatically closed when inFile and outFile go out of scope:
{
ifstream inFile("Backstreet Boys - Incomplete.mp3",ios::binary);
ofstream outFile("Output.mp3",ios::binary);
// don't bother calling .close(), it happens automatically.
}
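Since the stated goal is to practice encryption later, here is a sketch that combines the gcount() fix with a trivial byte transformation. XOR with a single key byte is only a toy and offers no real security. The names xor_copy, xor_string and demo_xor_copy are made up for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies `in` to `out`, XOR-ing each byte with `key`. Works on any binary data.
void xor_copy(std::istream& in, std::ostream& out, unsigned char key) {
    std::array<char, 256> buffer;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() is the number of bytes the last read() actually delivered.
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        for (std::size_t i = 0; i < got; ++i) {
            buffer[i] = static_cast<char>(static_cast<unsigned char>(buffer[i]) ^ key);
        }
        out.write(buffer.data(), static_cast<std::streamsize>(got));
    }
}

// Convenience wrapper for in-memory data (hypothetical helper).
std::string xor_string(const std::string& data, unsigned char key) {
    std::istringstream in(data);
    std::ostringstream out;
    xor_copy(in, out, key);
    return out.str();
}

void demo_xor_copy() {
    const std::string original("AB\0CD", 5);  // contains a zero byte
    const std::string scrambled = xor_string(original, 0x5a);
    assert(scrambled.size() == original.size());
    assert(scrambled != original);
    assert(xor_string(scrambled, 0x5a) == original);  // XOR twice restores the data
}
```

For files, passing an ifstream and an ofstream that were both opened with ios::binary to xor_copy should behave the same way. Applied to an MP3, the result will not play until it is XOR-ed again with the same key.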
I'm trying to ask a similar question to this post:
C: read binary file to memory, alter buffer, write buffer to file
but the answers didn't help me (I'm new to c++ so I couldn't understand all of it)
How do I have a loop access the data in memory, and go through line by line so that I can write it to a file in a different format?
This is what I have:
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
using namespace std;
int main()
{
char* buffer;
char linearray[250];
int lineposition;
double filesize;
string linedata;
string a;
//obtain the file
FILE *inputfile;
inputfile = fopen("S050508-v3.txt", "r");
//find the filesize
fseek(inputfile, 0, SEEK_END);
filesize = ftell(inputfile);
rewind(inputfile);
//load the file into memory
buffer = (char*) malloc (sizeof(char)*filesize); //allocate mem
fread (buffer,filesize,1,inputfile); //read the file to the memory
fclose(inputfile);
//Check to see if file is correct in Memory
cout.write(buffer,filesize);
free(buffer);
}
I appreciate any help!
Edit (More info on the data):
My data is different files that vary between 5 and 10gb. There are about 300 million lines of data. Each line looks like
M359
T359 3520 359
M400
A3592 zng 392
Where the first element is a character, and the remaining items could be numbers or characters. I'm trying to read this into memory since it will be a lot faster to loop through line by line, than reading a line, processing, and then writing. I am compiling in 64bit linux. Let me know if I need to clarify further. Again thank you.
Edit 2
I am using a switch statement to process each line, where the first character of each line determines how to format the rest of the line. For example 'M' means millisecond, and I put the next three numbers into a structure. Each line has a different first character that I need to do something different for.
So pardon the potentially blatantly obvious, but if you want to process this line by line, then...
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main(int argc, char *argv[])
{
// read lines one at a time
ifstream inf("S050508-v3.txt");
string line;
while (getline(inf, line))
{
// ... process line ...
}
inf.close();
return 0;
}
And just fill in the body of the while loop? Maybe I'm not seeing the real problem (a forest for the trees kinda thing).
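As a sketch of what the loop body could look like with the switch described in the question: the Summary struct and the meaning given to each record type are assumptions for illustration only, since the real record layout is not shown. summarize and demo_summarize are likewise invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type; the OP's actual structure is not given.
struct Summary {
    long last_millisecond = -1;     // value of the most recent 'M' line
    std::size_t t_records = 0;      // number of 'T' lines seen
    std::size_t other_records = 0;  // lines starting with any other character
};

Summary summarize(std::istream& in) {
    Summary s;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        switch (line[0]) {  // the first character selects the handling
        case 'M':
            // Assumed: the rest of an 'M' line is a decimal millisecond value.
            s.last_millisecond = std::strtol(line.c_str() + 1, nullptr, 10);
            break;
        case 'T':
            ++s.t_records;
            break;
        default:
            ++s.other_records;
            break;
        }
    }
    return s;
}

void demo_summarize() {
    std::istringstream in("M359\nT359 3520 359\nM400\nA3592 zng 392\n");
    const Summary s = summarize(in);
    assert(s.last_millisecond == 400);
    assert(s.t_records == 1);
    assert(s.other_records == 1);
}
```

Because summarize() takes a std::istream&, the same function works on an ifstream or on an in-memory stream.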
EDIT
The OP is fine with using a custom streambuf, which may not necessarily be the most portable thing in the world, but he's more interested in avoiding flipping back and forth between input and output files. With enough RAM, this should do the trick.
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <iterator>
#include <memory>
#include <streambuf>
#include <string>
using namespace std;
struct membuf : public std::streambuf
{
membuf(size_t len)
: streambuf()
, len(len)
, src(new char[ len ] )
{
setg(src.get(), src.get(), src.get() + len);
}
// direct buffer access for file load.
char * get() { return src.get(); };
size_t size() const { return len; };
private:
size_t len;
std::unique_ptr<char[]> src; // array form, so the buffer is released with delete[]
};
int main(int argc, char *argv[])
{
// open file in binary, retrieve length-by-end-seek
ifstream inf(argv[1], ios::in|ios::binary);
inf.seekg(0,inf.end);
size_t len = inf.tellg();
inf.seekg(0, inf.beg);
// allocate a stream buffer with an internal block
// large enough to hold the entire file.
membuf mb(len+1);
// use our membuf buffer for our file read-op.
inf.read(mb.get(), len);
mb.get()[len] = 0;
// use iss for your nefarious purposes
std::istream iss(&mb);
std::string s;
while (iss >> s)
cout << s << endl;
return EXIT_SUCCESS;
}
You should look into fgets and sscanf, with which you can pull out matched pieces of data so that they are easier to manipulate, assuming that is what you want to do. It could look something like this:
FILE *input = fopen("file.txt", "r");
FILE *output = fopen("out.txt","w");
int bufferSize = 64;
char buffer[bufferSize];
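while(fgets(buffer,bufferSize,input) != NULL){ // fgets returns NULL (not EOF) at end of file
char data[16];
sscanf(buffer,"%15s",data); // placeholder format (sscanf takes a format string, not a regex)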
//manipulate data
fprintf(output,"%s",data);
}
fclose(output);
fclose(input);
That would be more of the C way to do it; C++ handles things a little more elegantly by using an istream:
http://www.cplusplus.com/reference/istream/istream/
If I had to do this, I'd probably use code something like this:
std::ifstream in("S050508-v3.txt");
std::stringstream buffer; // needs <sstream>; an istringstream has no operator<<
buffer << in.rdbuf();
std::string data = buffer.str();
if (check_for_good_data(data))
std::cout << data;
This assumes you really need the entire contents of the input file in memory at once to determine whether it should be copied to output or not. If (for example) you can look at the data one byte at a time, and determine whether that byte should be copied without looking at the others, you could do something more like:
std::ifstream in(...);
std::copy_if(std::istreambuf_iterator<char>(in),
std::istreambuf_iterator<char>(),
std::ostream_iterator<char>(std::cout, ""),
is_good_char);
...where is_good_char is a function that returns a bool saying whether that char should be included in the output or not.
Edit: the size of files you're dealing with mostly rules out the first possibility I've given above. You're also correct that reading and writing large chunks of data will almost certainly improve speed over working on one line at a time.
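One way to read in large chunks and still process line by line is sketched below. It is an illustration under assumptions rather than a tuned implementation: for_each_line_chunked and collect_lines are invented names, the chunk size is up to the caller, and whether this beats a plain getline() loop on a given system would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads `in` in blocks of `chunk_size` bytes and calls `on_line` once per line
// (without the trailing '\n'). A line that straddles two blocks is carried over.
void for_each_line_chunked(std::istream& in, std::size_t chunk_size,
                           const std::function<void(const std::string&)>& on_line) {
    assert(chunk_size > 0);
    std::vector<char> block(chunk_size);
    std::string carry;  // unfinished line left over from the previous block
    while (in) {
        in.read(block.data(), static_cast<std::streamsize>(block.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        std::size_t start = 0;
        for (std::size_t i = 0; i < got; ++i) {
            if (block[i] == '\n') {
                carry.append(block.data() + start, i - start);
                on_line(carry);
                carry.clear();
                start = i + 1;
            }
        }
        carry.append(block.data() + start, got - start);
    }
    if (!carry.empty()) on_line(carry);  // last line without a trailing newline
}

// Usage sketch: collect the lines of an in-memory text.
std::vector<std::string> collect_lines(const std::string& text, std::size_t chunk_size) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    for_each_line_chunked(in, chunk_size,
                          [&lines](const std::string& line) { lines.push_back(line); });
    return lines;
}
```

For the multi-gigabyte files in the question, the stream would be an ifstream opened in binary mode and the chunk size something on the order of megabytes. Only one block and one partial line are held in memory at a time.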