I am using the Boost gzip_decompressor() from the following link:
How can I read line-by-line using Boost IOStreams' interface for Gzip files?
Reading the gzip file works fine, but how do I read the gzip_params? I want to know the original file name that is stored in gzip_params.file_name.
Excellent question.
The solution is to use component<N, T> to get a pointer to the actual decompressor instance:
Live On Coliru
#include <iostream>
#include <fstream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>

int main()
{
    std::ifstream file("file.gz", std::ios_base::in | std::ios_base::binary);

    try {
        boost::iostreams::filtering_istream in;
        using gz_t = boost::iostreams::gzip_decompressor;
        in.push(gz_t());
        in.push(file);

        for (std::string str; std::getline(in, str); )
        {
            std::cout << "Processed line " << str << '\n';
        }

        if (gz_t* gz = in.component<0, gz_t>()) {
            std::cout << "Original filename: " << gz->file_name() << "\n";
            std::cout << "Original mtime: " << gz->mtime() << "\n";
            std::cout << "Zip comment: " << gz->comment() << "\n";
        }
    }
    catch (const boost::iostreams::gzip_error& e) {
        std::cout << e.what() << '\n';
    }
}
Preparing a sample file using
gzip testj.txt
mv testj.txt.gz file.gz
Prints
Processed line Hello world
Original filename: testj.txt
Original mtime: 1518987084
Zip comment:
Related
This question is a continuation of my previous question.
error: implicit instantiation of undefined template 'llvm::yaml::MissingTrait
I am working on a project which uses the LLVM YAML I/O library. This is the documentation/tutorial that I am following:
https://www.llvm.org/docs/YamlIO.html
I have created a small program that reads a YAML file into objects in memory, prints those objects, and finally writes them out to a different output file.
First it parses the command line arguments. Then it creates the input reader and the YAML Input object, which is used to read and parse the *.yaml file. It is in the parsing step that I get an error. If parsing the input from the *.yaml file succeeds, the data is stored in the DocType myDoc;. The program then prints all the Person objects stored in that std::vector, using an overloaded operator<<() for each element. Finally it creates the output writer, creates the YAML Output, and writes myDoc into the file output_file.yaml.
The goal of this program is to demonstrate reading and writing a *.yaml file with LLVM YAML I/O. It successfully writes the output file, but it cannot read the input file.
Now suppose that instead of filling myDoc with elements from yin, I add the elements manually: I enable the code that calls push_back for each element and disable the code that reads the input from yin into memory.
DocType myDoc;
///*
myDoc.push_back(Person("Tom", 8));
myDoc.push_back(Person("Dan", 7));
myDoc.push_back(Person("Ken"));
//*/
/* Reading input into the memory */
/*
yin >> myDoc;
if (error_code errc = yin.error()) {
    errs() << "error parsing YAML input from file " << InputFile << '\n';
    errs() << errc.message() << '\n';
    return EXIT_FAILURE;
} else {
    outs() << "parsing YAML input from file " << InputFile << '\n';
}
*/
In that case, the program works fine. myDoc is initialized with those elements, and each element is printed to stdout. The program then creates the output writer, creates the YAML Output, and writes myDoc into output_file.yaml.
Here is what the output file looks like when it is written:
---
- name: Tom
  hat-size: 8
- name: Dan
  hat-size: 7
- name: Ken
...
I copy the output file into the input file, for testing the input functionality of the program.
cp output_file.yaml input_file.yaml
Then I disable the code that manually fills myDoc, and I enable the code that fills myDoc from yin.
DocType myDoc;
/*
myDoc.push_back(Person("Tom", 8));
myDoc.push_back(Person("Dan", 7));
myDoc.push_back(Person("Ken"));
*/
/* Reading input into the memory */
///*
yin >> myDoc;
if (error_code errc = yin.error()) {
    errs() << "error parsing YAML input from file " << InputFile << '\n';
    errs() << errc.message() << '\n';
    return EXIT_FAILURE;
} else {
    outs() << "parsing YAML input from file " << InputFile << '\n';
}
//*/
After that the code no longer works. If I provide that same input_file.yaml to the application, LLVM YAML I/O fails to parse the *.yaml file and prints an error. This is strange, because it is exactly the format that the same LLVM YAML I/O wrote to the file.
./yaml_project --input-file=input_file2.yaml --output-file=output_file.yaml
opening input file input_file2.yaml
reading input file input_file2.yaml
input_file2.yaml:1:1: error: not a sequence
-
^
error parsing YAML input from file input_file2.yaml
Invalid argument
I cannot figure out why it refuses to accept well-formatted YAML from the input file. If anyone knows how to fix this bug, please help me.
Here is the full listing of my code:
#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/YAMLTraits.h"
#include "llvm/Support/YAMLParser.h"
#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/CommandLine.h"
#include <cstdlib> /* for EXIT_FAILURE */
#include <string> /* for std::string */
#include <vector> /* for std::vector */
#include <system_error> /* for std::error_code */
using std::string;
using std::vector;
using std::error_code;
using llvm::outs;
using llvm::errs;
using llvm::raw_ostream;
using llvm::raw_fd_ostream;
using llvm::ErrorOr;
using llvm::MemoryBuffer;
using llvm::yaml::ScalarEnumerationTraits;
using llvm::yaml::MappingTraits;
using llvm::yaml::IO;
using llvm::yaml::Input;
using llvm::yaml::Output;
using llvm::cl::opt;
using llvm::cl::desc;
using llvm::cl::ValueRequired;
using llvm::cl::OptionCategory;
using llvm::cl::ParseCommandLineOptions;
/* Command line options description: */
// Apply a custom category to all command-line options so that they are the
// only ones displayed.
// The category tells the CommonOptionsParser how to parse the argc and argv.
OptionCategory yamlCategory("yaml_project options");
opt<string> InputFile("input-file", desc("The input YAML file"), ValueRequired);
opt<string> OutputFile("output-file", desc("The output YAML file"), ValueRequired);
struct Person {
string name;
int hatSize;
Person(string name = "", int hatSize = 0)
: name(name), hatSize(hatSize) {}
};
raw_ostream& operator<<(raw_ostream& os, const Person& person) {
os << "{ " << person.name;
if (person.hatSize)
os << " , " << person.hatSize;
os << " }";
return os;
}
template <>
struct MappingTraits<Person> {
static void mapping(IO& io, Person& info) {
io.mapRequired("name", info.name);
io.mapOptional("hat-size", info.hatSize, 0);
}
};
typedef vector<Person> DocType;
LLVM_YAML_IS_SEQUENCE_VECTOR(Person)
int main(int argc, const char **argv) {
/* Command line parsing: */
ParseCommandLineOptions(argc, argv);
if (InputFile.empty()) {
errs() << "No input file specified\n";
return EXIT_FAILURE;
}
if (OutputFile.empty()) {
errs() << "No output file specified\n";
return EXIT_FAILURE;
}
/* Create the input reader */
auto reader = MemoryBuffer::getFile(InputFile, true);
if (error_code errc = reader.getError()) {
errs() << "error opening input file " << InputFile << '\n';
errs() << errc.message() << '\n';
// MemoryBuffer does not need to be closed
return EXIT_FAILURE;
} else {
outs() << "opening input file " << InputFile << '\n';
}
/* Create the YAML Input */
// dereference once to strip away the llvm::ErrorOr
// dereference twice to strip away the std::unique_ptr
Input yin(**reader);
if (error_code errc = yin.error()) {
errs() << "error reading input file " << InputFile << '\n';
outs() << errc.message() << '\n';
// MemoryBuffer does not need to be closed
return EXIT_FAILURE;
} else {
outs() << "reading input file " << InputFile << '\n';
}
DocType myDoc;
/*
myDoc.push_back(Person("Tom", 8));
myDoc.push_back(Person("Dan", 7));
myDoc.push_back(Person("Ken"));
*/
/* Reading input into the memory */
///*
yin >> myDoc;
if (error_code errc = yin.error()) {
errs() << "error parsing YAML input from file " << InputFile << '\n';
errs() << errc.message() << '\n';
return EXIT_FAILURE;
} else {
outs() << "parsing YAML input from file " << InputFile << '\n';
}
//*/
for (const Person& element : myDoc)
outs() << element << '\n';
/* Create the output writer */
error_code errc;
raw_fd_ostream writer(OutputFile, errc);
if (errc) {
errs() << "error opening output file " << OutputFile << '\n';
errs() << errc.message() << '\n';
writer.close();
return EXIT_FAILURE;
} else {
outs() << "opening output file " << OutputFile << '\n';
}
/* Create the YAML Output */
Output yout(writer);
/* Writing output into file */
yout << myDoc;
outs() << "writing YAML output into file " << OutputFile << '\n';
writer.close();
return EXIT_SUCCESS;
}
The problem is in the line
auto reader = MemoryBuffer::getFile(InputFile, true);
With LLVM 12, change the line to
auto reader = MemoryBuffer::getFile(InputFile);
and it should work. The meaning of the second positional parameter of MemoryBuffer::getFile has changed across LLVM versions, so a bare true is not interpreted the way the call intends; dropping it and relying on the defaults avoids the problem.
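For context, here is a minimal sketch of the corrected input path. It assumes LLVM 12 headers; the helper name openYamlInput is made up for illustration and is not part of the original program:

#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/YAMLTraits.h"
#include "llvm/Support/raw_ostream.h"

#include <system_error>

// Hypothetical helper: open `path` and report whether a yaml::Input could be created from it.
static bool openYamlInput(llvm::StringRef path) {
    // No second argument: rely on the defaults of MemoryBuffer::getFile.
    auto buffer = llvm::MemoryBuffer::getFile(path);
    if (std::error_code errc = buffer.getError()) {
        llvm::errs() << "error opening " << path << ": " << errc.message() << '\n';
        return false;
    }
    // Dereference once for the ErrorOr, once more for the unique_ptr.
    llvm::yaml::Input yin(**buffer);
    return !yin.error();
}

int main(int argc, const char **argv) {
    return (argc > 1 && openYamlInput(argv[1])) ? 0 : 1;
}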
I'm attempting to write a simple program to extract some data from a bunch of AVRO files. The schema for each file may be different so I would like to read the files generically (i.e. without having to pregenerate and then compile in the schema for each) using the C++ interface.
I have been attempting to follow the generic.cc example, but it assumes a separate schema file, whereas I would like to read the schema from each AVRO file itself.
Here is my code:
#include <fstream>
#include <iostream>

#include "Compiler.hh"
#include "DataFile.hh"
#include "Decoder.hh"
#include "Generic.hh"
#include "Stream.hh"

const std::string BOLD("\033[1m");
const std::string ENDC("\033[0m");
const std::string RED("\033[31m");
const std::string YELLOW("\033[33m");

int main(int argc, char** argv)
{
    std::cout << "AVRO Test\n" << std::endl;

    if (argc < 2)
    {
        std::cerr << BOLD << RED << "ERROR: " << ENDC << "please provide an "
                  << "input file\n" << std::endl;
        return -1;
    }

    avro::DataFileReaderBase dataFile(argv[1]);
    auto dataSchema = dataFile.dataSchema();

    // Write out data schema in JSON for grins
    std::ofstream output("data_schema.json");
    dataSchema.toJson(output);
    output.close();

    avro::DecoderPtr decoder = avro::binaryDecoder();
    auto inStream = avro::fileInputStream(argv[1]);
    decoder->init(*inStream);

    avro::GenericDatum datum(dataSchema);
    avro::decode(*decoder, datum);
    std::cout << "Type: " << datum.type() << std::endl;

    return 0;
}
Every time I run the code, no matter what file I use, I get this:
$ ./avrotest twitter.avro
AVRO Test
terminate called after throwing an instance of 'avro::Exception'
  what():  Cannot have negative length: -40
Aborted
In addition to my own data files, I have tried using the data files located here: https://github.com/miguno/avro-cli-examples, with the same result.
I tried using the avrocat utility on all of the same files and it works fine. What am I doing wrong?
(NOTE: outputting the data schema for each file in JSON works correctly as expected)
After a bunch more fooling around, I figured it out. You're supposed to use DataFileReader templated with GenericDatum. The end result is something like this:
#include <fstream>
#include <iostream>

#include "Compiler.hh"
#include "DataFile.hh"
#include "Decoder.hh"
#include "Generic.hh"
#include "Stream.hh"

const std::string BOLD("\033[1m");
const std::string ENDC("\033[0m");
const std::string RED("\033[31m");
const std::string YELLOW("\033[33m");

int main(int argc, char** argv)
{
    std::cout << "AVRO Test\n" << std::endl;

    if (argc < 2)
    {
        std::cerr << BOLD << RED << "ERROR: " << ENDC << "please provide an "
                  << "input file\n" << std::endl;
        return -1;
    }

    avro::DataFileReader<avro::GenericDatum> reader(argv[1]);
    auto dataSchema = reader.dataSchema();

    // Write out data schema in JSON for grins
    std::ofstream output("data_schema.json");
    dataSchema.toJson(output);
    output.close();

    avro::GenericDatum datum(dataSchema);
    while (reader.read(datum))
    {
        std::cout << "Type: " << datum.type() << std::endl;
        if (datum.type() == avro::AVRO_RECORD)
        {
            const avro::GenericRecord& r = datum.value<avro::GenericRecord>();
            std::cout << "Field-count: " << r.fieldCount() << std::endl;
            // TODO: pull out each field
        }
    }

    return 0;
}
Perhaps an example like this should be included with libavro...
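In the meantime, a rough sketch for the "pull out each field" TODO might look like the helper below. It is based on my reading of the generic C++ API (fieldAt(), schema()->nameAt(), value<T>()), it only handles a few scalar types, and printRecord is a made-up name; you would call it from inside the while loop in place of the TODO comment.

#include <cstdint>
#include <iostream>
#include <string>

#include "Generic.hh"

// Print each field of a generic record: name taken from the schema node, value by type.
static void printRecord(const avro::GenericRecord& r)
{
    for (size_t i = 0; i < r.fieldCount(); ++i)
    {
        const avro::GenericDatum& field = r.fieldAt(i);
        std::cout << r.schema()->nameAt(i) << ": ";
        switch (field.type())
        {
        case avro::AVRO_STRING: std::cout << field.value<std::string>(); break;
        case avro::AVRO_INT:    std::cout << field.value<int32_t>();     break;
        case avro::AVRO_LONG:   std::cout << field.value<int64_t>();     break;
        case avro::AVRO_DOUBLE: std::cout << field.value<double>();      break;
        case avro::AVRO_BOOL:   std::cout << (field.value<bool>() ? "true" : "false"); break;
        default:                std::cout << "<unhandled type>";         break;
        }
        std::cout << '\n';
    }
}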
ifstream fin;
fin.open("C:\\Users\\Zach\\Desktop\\input.txt");
if (!fin)
{
    cout << "e";
}
"e" is printed whether I use the full path or just input.txt from a resource file.
If the file exists, make sure you have specified the path correctly. Since you're running on Windows, you can verify the full path to your executable with the following code.
#include <iostream>
#include <fstream>
#include <string>
#include <windows.h>

#define BUFSIZE 4096

std::string getExePath()
{
    char result[BUFSIZE];
    return std::string(result, GetModuleFileName(NULL, result, BUFSIZE));
}

int main()
{
    std::ifstream infile("input.txt");

    if (infile.is_open())
    {
        std::cout << "Success!" << std::endl;
        infile.close();
    }
    else
    {
        std::cout << "Failed to open input.txt!" << std::endl;
        std::cout << "Executable path is ->" << getExePath() << "<-" << std::endl;
    }

    return 0;
}
This will allow you to verify that your path to the input file is correct, assuming that it's collocated with your executable.
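One more thing worth checking, since it is not covered above: a relative path such as "input.txt" is resolved against the process's current working directory, which is not necessarily the folder containing the executable (IDEs in particular often run programs from a different directory). As an addition to the answer, here is a minimal sketch that also prints the working directory, assuming a C++17 compiler:

#include <filesystem>
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream infile("input.txt");
    if (!infile)
    {
        std::cout << "Failed to open input.txt!" << std::endl;
        // Relative paths are resolved against this directory.
        std::cout << "Working directory is ->"
                  << std::filesystem::current_path().string() << "<-" << std::endl;
    }
    return 0;
}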
You need to direct the output into the ifstream object using fin << "string", rather than sending it to standard out via cout.
I would like to update an existing JSON file.
This is the example JSON file:
{
    "Foo": 51.32,
    "Number": 100,
    "Test": "Test1"
}
Logs from program:
Operation successfully performed
100
"Test1"
51.32
46.32
Done
It looks like everything works as expected...
If I change the fstream to an ifstream for reading and later an ofstream for writing, it works...
I tried the debugger, and I can see wrong data in the basic_ostream object, but I don't know why; I am writing the data from the string with the corrected (updated) values.
Any idea what is wrong? :-)
You have a few problems here.
First, the statement json json_data(fs); reads to the end of the file, setting the EOF flag. The stream will stop working until that flag is cleared.
Second, the file pointer is left at the end of the file. If you want to overwrite the file you need to move back to the beginning again:
if (fs.is_open())
{
    json json_data(fs); // reads to end of file
    fs.clear();         // clear flag
    fs.seekg(0);        // move to beginning
Unfortunately that still doesn't fix everything, because if the file you write back is smaller than the one you read in, some of the old data will be left tacked onto the end of the new data:
std::cout << "Operation successfully performed\n";
std::cout << json_data.at("Number") << std::endl;
std::cout << json_data.at("Test") << std::endl;
std::cout << json_data.at("Foo") << std::endl;
json_data.at("Foo") = 4.32; // what if new data is smaller?
Json file:
{
    "Foo": 4.32, // this number is smaller than before
    "Number": 100,
    "Test": "Test1"
}} // whoops trailing character from previous data!!
In this situation I would simply open one file for reading and then another for writing; it's much less error-prone and expresses the intention to overwrite everything.
Something like:
#include "json.hpp"
#include <iostream>
#include <fstream>
#include <string>
using json = nlohmann::json;
void readAndWriteDataToFile(std::string fileName) {
json json_data;
// restrict scope of file object (auto-closing raii)
if(auto fs = std::ifstream(fileName))
{
json_data = json::parse(fs);
std::cout << "Operation successfully performed\n";
std::cout << json_data.at("Number") << std::endl;
std::cout << json_data.at("Test") << std::endl;
std::cout << json_data.at("Foo") << std::endl;
}
else
{
throw std::runtime_error(std::strerror(errno));
}
json_data.at("Foo") = 4.32;
std::cout << json_data.at("Foo") << std::endl;
std::string json_content = json_data.dump(3);
if(auto fs = std::ofstream(fileName))
{
fs.write(json_content.data(), json_content.size());
std::cout << "Done" << std::endl;
}
else
{
throw std::runtime_error(std::strerror(errno));
}
}
int main()
{
try
{
std::string fileName = "C:/new/json1.json";
readAndWriteDataToFile(fileName);
}
catch(std::exception const& e)
{
std::cerr << e.what() << '\n';
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
I am trying a reasonably simple program to test binary input/output. I am basically writing a file with a header (string) and some data (doubles). The code is as follows:
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>

int main() {
    typedef std::ostream_iterator<double> oi_t;
    typedef std::istream_iterator<double> ii_t;

    std::ofstream ofs("data.bin", std::ios::in);
    //-If file doesn't exist, create a new one now
    if (!ofs) {
        ofs.open("data.bin", std::ios::out|std::ios::binary|std::ios::app);
    }
    else {
        ofs.close();
        ofs.open("data.bin", std::ios::out|std::ios::binary|std::ios::app);
    }

    //-Write a header consisting of length of grid subdomain and its name
    ///*
    const std::string grid = "Header";
    unsigned int olen = grid.size();
    ofs.write(reinterpret_cast<const char*>(&olen), sizeof(olen));
    ofs.write(grid.c_str(), olen);
    //*/

    //-Now write the data
    ///*
    std::vector<double> data_out;
    //std::vector<std::pair<int, int> > cell_ids;
    for (int i=0; i<100; ++i) {
        data_out.push_back(5.0*double(i) + 100.0);
    }
    ofs << std::setprecision(4);
    std::copy(data_out.begin(), data_out.end(), oi_t(ofs, " "));
    //*/
    ofs.close();

    //-Now read the binary file; first header then data
    std::ifstream ifs("data.bin", std::ios::binary);
    ///*
    unsigned int ilen;
    ifs.read(reinterpret_cast<char*>(&ilen), sizeof(ilen));
    std::string header;
    if (ilen > 0) {
        char* buf = new char[ilen];
        ifs.read(buf, ilen);
        header.append(buf, ilen);
        delete[] buf;
    }
    std::cout << "Read header: " << header << "\n";
    //*/

    ///*
    std::vector<double> data_in;
    ii_t ii(ifs);
    std::copy(ii, ii_t(), std::back_inserter(data_in));
    std::cout << "Read data size: " << data_in.size() << "\n";
    //*/
    ifs.close();

    //-Check the result
    ///*
    for (int i=0; i < data_out.size(); ++i) {
        std::cout << "Testing input/output element #" << i << " : "
                  << data_out[i] << " " << data_in[i] << "\n";
    }
    std::cout << "Element sizes: " << data_out.size() << " " << data_in.size() <<
        "\n";
    //*/
    return 0;
}
The problem is that when I write and read both the header and the data, it fails: I confirmed that the data is then not read, although the header displays correctly. But when I comment out one of the write sections (header and/or data), the remaining part displays correctly, indicating that its read worked. I am sure I am not doing the read properly. Perhaps I am missing the usage of seekg somewhere.
The code runs fine for me. However, you never check whether the file was successfully opened for writing, so it could be silently failing on your system. After you open ofs you should add
if (!ofs) {
    std::cout << "Could not open file for writing" << std::endl;
    return 1;
}
And the same thing after you open ifs
if (!ifs) {
    std::cout << "Could not open file for reading" << std::endl;
    return 1;
}
Or something along those lines. Also, I do not understand why you check whether the file exists first, since you do the same thing whether it exists or not.
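On that last point, here is a minimal sketch of opening the write stream just once, with the check in place (this simplification is my suggestion rather than part of the original code):

#include <fstream>
#include <iostream>

int main() {
    // std::ios::app creates data.bin if it does not exist, so the separate
    // existence probe with a preliminary std::ofstream is unnecessary.
    std::ofstream ofs("data.bin", std::ios::out | std::ios::binary | std::ios::app);
    if (!ofs) {
        std::cout << "Could not open file for writing" << std::endl;
        return 1;
    }
    // ... write the header and data as before ...
    return 0;
}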
This should work:
#include <iostream>
using std::cout;
using std::cerr;
using std::cin;
using std::endl;

#include <fstream>
using std::ifstream;

#include <cstdint>
#include <cstdlib> // for exit

int main() {
    ifstream fin;
    fin.open("input.dat", std::ios::binary | std::ios::in);
    if (!fin) {
        cerr << "Cannot open file " << "input.dat" << endl;
        exit(1);
    }

    // operator>> skips whitespace by default, which would silently drop
    // bytes such as 0x20 or 0x0A; disable that for binary input.
    fin >> std::noskipws;

    uint8_t input_byte;
    while (fin >> input_byte) {
        // print the numeric value rather than the raw character
        cout << "got byte " << static_cast<int>(input_byte) << endl;
    }
    return 0;
}