boost log format single attribute with logging::init_from_stream - c++

When I set up the format in code, I can control the date-time output with something like this:
logging::formatter simpleFormat(expr::format("%1% %2%") %
    expr::format_date_time<boost::posix_time::ptime>("TimeStamp", "%H:%M:%S") %
    expr::smessage
);
But when I initialize the logger from a config file, I can only specify where each attribute goes, not how it is formatted.
So this line in a Boost.Log config file
Format="[%TimeStamp%]: %Message%"
produces output:
[2015-Feb-06 09:32:27.401496]: blah blah blah
I want to reduce the timestamp to something like this:
[06.02.2015 09:32:27]
How can this be described in a Boost.Log config file, or can't it be done at all?

Preamble
My answer is valid for Boost 1.55 (I haven't tested it with later versions), and it was only tested with the MSVC 2013 compiler.
Answer
It looks like you need a custom formatter_factory for the TimeStamp attribute to be able to specify its format. This works for me:
#include <fstream>
#include "boost/shared_ptr.hpp"
#include "boost/make_shared.hpp"
#include "boost/log/trivial.hpp"
#include "boost/log/expressions.hpp"
#include "boost/log/utility/setup.hpp"
#include "boost/log/support/date_time.hpp"
// Formatter factory that lets the settings file pass a "format" argument
// to the TimeStamp attribute.
class timestamp_formatter_factory :
    public boost::log::basic_formatter_factory<char, boost::posix_time::ptime>
{
public:
    formatter_type create_formatter(boost::log::attribute_name const& name, args_map const& args)
    {
        args_map::const_iterator it = args.find("format");
        if (it != args.end())
            // A format argument was given: use it for the date-time output.
            return boost::log::expressions::stream << boost::log::expressions::format_date_time<boost::posix_time::ptime>(boost::log::expressions::attr<boost::posix_time::ptime>(name), it->second);
        else
            // No format argument: fall back to the default ptime output.
            return boost::log::expressions::stream << boost::log::expressions::attr<boost::posix_time::ptime>(name);
    }
};
int main()
{
    // Initializing logging
    boost::log::register_formatter_factory("TimeStamp", boost::make_shared<timestamp_formatter_factory>());
    boost::log::add_common_attributes();
    std::ifstream file("settings.ini");
    boost::log::init_from_stream(file);
    // Testing
    BOOST_LOG_TRIVIAL(info) << "Test";
    return 0;
}
And now in your settings file you can specify the format argument for the TimeStamp attribute, like this:
[Sinks.ConsoleOut]
Destination=Console
AutoFlush=true
Format="[%TimeStamp(format=\"%Y.%m.%d %H:%M:%S\")%]: %Message%"
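The question asked for the day first ([06.02.2015 09:32:27]), which corresponds to the format string %d.%m.%Y %H:%M:%S rather than the %Y.%m.%d order shown in the settings file. As a quick way to try a format string outside of Boost.Log, here is a minimal sketch that uses only the standard library. It assumes that these six placeholders mean the same thing in Boost's date-time formatter as in std::strftime (Boost-specific flags such as %f have no strftime equivalent), and the helper names format_timestamp and sample_timestamp are made up for this illustration.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper (not part of Boost.Log): formats a broken-down time
// with a strftime-style pattern so a format string can be tried in isolation.
std::string format_timestamp(const std::tm& when, const char* pattern)
{
    char buffer[64];
    const std::size_t written = std::strftime(buffer, sizeof(buffer), pattern, &when);
    assert(written > 0 && "pattern produced no output or did not fit the buffer");
    return std::string(buffer, written);
}

// Builds the timestamp from the question: 2015-Feb-06 09:32:27.
std::tm sample_timestamp()
{
    std::tm when{};
    when.tm_year = 2015 - 1900; // years since 1900
    when.tm_mon = 1;            // months since January, so 1 is February
    when.tm_mday = 6;
    when.tm_hour = 9;
    when.tm_min = 32;
    when.tm_sec = 27;
    return when;
}
```

Calling format_timestamp(sample_timestamp(), "%d.%m.%Y %H:%M:%S") yields 06.02.2015 09:32:27, so the same pattern passed as the format argument of %TimeStamp(...)% should give the layout the question asked for.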

You should be able to use set_formatter as documented here
sink->set_formatter
(
    expr::stream << expr::format_date_time< boost::posix_time::ptime >("TimeStamp", "%Y-%m-%d %H:%M:%S")
);


C++ Apache Orc is not filtering data correctly

I am posting a simple C++ Apache ORC file-reading program which:
1. Reads data from an ORC file.
2. Filters data based on the given string.
Sample Code:
#include <iostream>
#include <list>
#include <memory>
#include <chrono>
// Orc specific headers.
#include <orc/Reader.hh>
#include <orc/ColumnPrinter.hh>
#include <orc/Exceptions.hh>
#include <orc/OrcFile.hh>
int main(int argc, char const *argv[])
{
    auto begin = std::chrono::steady_clock::now();
    orc::RowReaderOptions m_RowReaderOpts;
    orc::ReaderOptions m_ReaderOpts;
    std::unique_ptr<orc::Reader> m_Reader;
    std::unique_ptr<orc::RowReader> m_RowReader;
    auto builder = orc::SearchArgumentFactory::newBuilder();
    std::string required_symbol("FILTERME");
    /// THIS LINE SHOULD FILTER DATA BASED ON COLUMNS.
    /// INSTEAD OF FILTERING IT TRAVERSE EACH ROW OF ORC FILE.
    builder->equals("column_name", orc::PredicateDataType::STRING, orc::Literal(required_symbol.c_str(), required_symbol.size()));
    std::string file_path("/orc/file/path.orc");
    m_Reader = orc::createReader(orc::readFile(file_path.c_str()), m_ReaderOpts);
    m_RowReader = m_Reader->createRowReader(m_RowReaderOpts);
    m_RowReaderOpts.searchArgument(builder->build());
    auto batch = m_RowReader->createRowBatch(5000);
    try
    {
        std::cout << builder->build()->toString() << std::endl;
        while(m_RowReader->next(*batch))
        {
            const auto &struct_batch = dynamic_cast<const orc::StructVectorBatch&>(*batch.get());
            /** DO CALCULATIONS */
        }
    }
    catch(const std::exception& e)
    {
        std::cerr << e.what() << '\n';
    }
    auto end = std::chrono::steady_clock::now();
    std::cout << "Total Time taken to read ORC file: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << " ms.\n";
    return 0;
}
I searched on Google for almost a week and tried converting every Java example I could find into C++ to make my code work.
I also tried the example in the STACKOVERFLOW LINK, which describes a similar issue, but it didn't work for me.
**Question:**
1. Am I wiring up the filtering code correctly? If so, why is it not filtering data based on the given string?
2. Where can I find C++ or relevant Java code for a row-level or stripe-level filter?
After trying multiple scenarios, I finally resolved the ORC filtering issue above.
It was caused by an incorrect column number. I am not sure why the column id used to fetch a column differs from the column id used to filter on it.
In the example above I tried to filter by column name, and filtering by column name still does not work for me. Filtering by column number, however, works fine.
New Code:
#include <cstdint>
#include <iostream>
#include <list>
#include <memory>
#include <chrono>
#include <string>
// Orc specific headers.
#include <orc/Reader.hh>
#include <orc/ColumnPrinter.hh>
#include <orc/Exceptions.hh>
#include <orc/OrcFile.hh>
int main(int argc, char const *argv[])
{
    auto begin = std::chrono::steady_clock::now();
    orc::RowReaderOptions m_RowReaderOpts;
    orc::ReaderOptions m_ReaderOpts;
    std::unique_ptr<orc::Reader> m_Reader;
    std::unique_ptr<orc::RowReader> m_RowReader;
    auto builder = orc::SearchArgumentFactory::newBuilder();
    std::string required_symbol("FILTERME");
    // Column ids passed to include() start from 0.
    std::list<uint64_t> cols = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    m_RowReaderOpts.include(cols);
    // Column 6 in cols above is column id 7 for filtering: the id used to
    // filter on a column is the id used to fetch it plus 1.
    int column_id = 7;
    builder->equals(column_id, orc::PredicateDataType::STRING, orc::Literal(required_symbol.c_str(), required_symbol.size()));
    // Attach the search argument before creating the row reader, because the
    // row reader takes its settings from the options when it is created.
    m_RowReaderOpts.searchArgument(builder->build());
    std::string file_path("/orc/file/path.orc");
    m_Reader = orc::createReader(orc::readFile(file_path.c_str()), m_ReaderOpts);
    m_RowReader = m_Reader->createRowReader(m_RowReaderOpts);
    auto batch = m_RowReader->createRowBatch(5000);
    try
    {
        std::cout << builder->build()->toString() << std::endl;
        while(m_RowReader->next(*batch))
        {
            const auto &struct_batch = dynamic_cast<const orc::StructVectorBatch&>(*batch.get());
            /** DO CALCULATIONS */
        }
    }
    catch(const std::exception& e)
    {
        std::cerr << e.what() << '\n';
    }
    auto end = std::chrono::steady_clock::now();
    std::cout << "Total Time taken to read ORC file: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << " ms.\n";
    return 0;
}
My understanding after resolving this issue is that column ids start from 0 for fetching data and from 1 for filtering. This is why you need to pass 1 to filter on the column you fetch as column 0.
To explain the confusion in the above answer:
In ORC, column field id is a different thing than column type id:
For files that have structs as the top-level object, field id 0 corresponds to the first struct field, field id 1 for the second struct field, and so on. See comments here: https://github.com/apache/orc/blob/v1.7.3/c++/include/orc/Reader.hh#L122-L123
Column type id is the pre-order traversal index of the type tree. As mentioned in the spec: The type tree is flattened in to a list via a pre-order traversal where each type is assigned the next id. Clearly the root of the type tree is always type id 0.
So if there are no nested types (struct/array/map) in the ORC file, we can see columnTypeId == columnFieldId + 1 on all columns except the root struct type.
The ids used in building sargs are column type ids. However, the ids used in RowReaderOptions::include(const std::list<uint64_t>& include) are column field ids. To have a consistent id mapping, I'd recommend using the include method for type ids:
RowReaderOptions::includeTypes(const std::list<uint64_t>& types);
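To make the two numbering schemes concrete, here is a small standalone sketch that does not use the ORC library at all. TypeNode, subtree_size and type_id_of_field are hypothetical names invented for this illustration; the code only mimics the pre-order numbering described in the spec, under the assumption that the root of the file is a struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a node of an ORC type tree (not the orc::Type API).
struct TypeNode {
    std::string name;
    std::vector<TypeNode> children;
};

// Number of nodes in the subtree rooted at node, including node itself.
std::uint64_t subtree_size(const TypeNode& node)
{
    std::uint64_t size = 1;
    for (const TypeNode& child : node.children) {
        size += subtree_size(child);
    }
    return size;
}

// Type id of the top-level field with the given field id. In a pre-order
// traversal the root struct gets type id 0, and every earlier field uses up
// one id for each node in its subtree.
std::uint64_t type_id_of_field(const TypeNode& root, std::size_t field_id)
{
    assert(field_id < root.children.size());
    std::uint64_t type_id = 1; // first id after the root
    for (std::size_t i = 0; i < field_id; ++i) {
        type_id += subtree_size(root.children[i]);
    }
    return type_id;
}
```

For a flat schema with three primitive columns, field ids 0, 1 and 2 map to type ids 1, 2 and 3, which is the "+1" observed in the answer above. If the second column is itself a struct with two members, the third column still has field id 2 but its type id becomes 5, so the "+1" rule no longer holds once nested types are involved.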

How I can paint a text background with spdlog?

I ran a test with {fmt} and it was super easy to change the background color, but I can't do the same as easily with spdlog.
I can get the same result with spdlog, but the code is awkward:
fmt::print( bg( fmt::terminal_color::yellow ) | fg( fmt::terminal_color::black ), " {1:<{0}}\n", 120, "message" );
spdlog::set_pattern( "%v" );
spdlog::info( "\033[43m\033[30m {1:<{0}} \033[m", 120, "message" );
If I understand your question correctly, you want to format your spdlog output using fmt, rather than having to specify the escape sequences yourself. For this, you can use the fmt::format function and use its output as a variable for spdlog:
#include <spdlog/spdlog.h>
#include <spdlog/fmt/bundled/color.h>
int main(){
    spdlog::info( "Printing a string formatted with fmt: {}",
        fmt::format(
            fmt::bg( fmt::terminal_color::yellow ) |
            fmt::fg( fmt::terminal_color::black ) |
            fmt::emphasis::bold,
            "message"
        )
    );
    return 0;
}
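For comparison, the manual approach from the question can be wrapped in a small helper so that the escape sequences are written only once. This is a standard-library-only sketch, not an spdlog or {fmt} feature: the function name paint and its parameters are made up here, and it assumes a terminal that understands ANSI SGR codes (43 is a yellow background and 30 a black foreground, the same codes used in the question).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wraps text in ANSI SGR escape sequences.
// background and foreground are raw SGR codes, e.g. 43 (yellow background)
// and 30 (black foreground); "\033[0m" resets all attributes afterwards.
std::string paint(const std::string& text, int background, int foreground)
{
    assert(background >= 0 && foreground >= 0); // SGR codes are non-negative
    return "\033[" + std::to_string(background) + "m" +
           "\033[" + std::to_string(foreground) + "m" +
           text + "\033[0m";
}
```

The returned string can be passed to the logger like any other argument, for example spdlog::info("{}", paint("message", 43, 30)), which gives the same colors as the raw escape sequences in the question.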

mongodb: query across a date range

Using the mongocxx driver, I need to query mongodb for documents (of stock data) that fall within a certain date range.
Consider the following document format:
{
    date : ISODate("2010-01-01T00:00:00Z"),
    open : 12.00,
    high : 13.00,
    low : 11.00,
    close : 12.50,
    volume : 100000
}
Say I have one collection per stock, and hundreds of these documents per collection, each with a different date.
If a user supplies two dates formatted as strings (yyyy-mm-dd):
std::string start_date = "2010-01-01";
std::string end_date = "2010-02-05";
How can I query mongo to get all the documents with dates between "start_date" and "end_date" (inclusive)?
Note: I am using mongodb 3.2.12, mongocxx driver version 3.0.2
Thanks,
Unfortunately, there doesn't seem to be a way to parse dates from strings with arbitrary time zones; all parsed dates are assumed to be in the user's local time zone, which means you'll need to provide an offset to correctly query the UTC dates stored in the database. Ideally this offset could be generated when the user provides a string, but that will obviously depend on the nature of your application.
Once you have the offset and the date string, std::get_time will get you most of the way there. After that, you just need to convert the std::tm to a type that you can construct a bsoncxx::types::b_date from and then query as usual. Here's some sample code that does the job:
#include <chrono>
#include <cstdint>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/builder/basic/sub_document.hpp>
#include <bsoncxx/json.hpp>
#include <bsoncxx/types.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/uri.hpp>
bsoncxx::types::b_date read_date(const std::string& date,
                                 std::int32_t offset_from_utc) {
    std::tm utc_tm{};
    std::istringstream ss{date};
    // Read time into std::tm.
    ss >> std::get_time(&utc_tm, "%Y-%m-%d");
    // Convert std::tm to std::time_t. std::mktime interprets the fields as
    // local time, which is why the offset is applied below.
    std::time_t utc_time = std::mktime(&utc_tm);
    // Convert std::time_t to std::chrono::system_clock::time_point.
    std::chrono::system_clock::time_point time_point =
        std::chrono::system_clock::from_time_t(utc_time);
    return bsoncxx::types::b_date{time_point +
                                  std::chrono::hours{offset_from_utc}};
}
int main() {
    // User inputs
    std::string start_date = "2010-01-01";
    std::string end_date = "2010-02-05";
    std::int32_t offset_from_utc = -5;
    // Code to execute query
    mongocxx::client client{mongocxx::uri{}};
    mongocxx::collection coll = client["db_name"]["coll_name"];
    bsoncxx::builder::basic::document filter;
    // Build { date : { $gte : <start>, $lte : <end> } }.
    filter.append(bsoncxx::builder::basic::kvp(
        "date", [start_date, end_date,
                 offset_from_utc](bsoncxx::builder::basic::sub_document sd) {
            sd.append(bsoncxx::builder::basic::kvp(
                "$gte", read_date(start_date, offset_from_utc)));
            sd.append(bsoncxx::builder::basic::kvp(
                "$lte", read_date(end_date, offset_from_utc)));
        }));
    for (auto&& result : coll.find(filter.view())) {
        std::cout << bsoncxx::to_json(result) << std::endl;
    }
}
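If the user's strings are meant as UTC calendar dates, an alternative is to skip std::mktime and compute the UTC timestamp directly, which removes the need for an offset. The sketch below is not part of the mongocxx or bsoncxx API: days_from_civil and utc_midnight are names chosen for this illustration, the day count uses the well-known civil-calendar formula for the proleptic Gregorian calendar, and the code assumes that std::time_t and std::chrono::system_clock count seconds from the Unix epoch (guaranteed for system_clock since C++20, true in practice before that).

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <sstream>
#include <stdexcept>
#include <string>

// Days between 1970-01-01 and the given proleptic Gregorian date.
long long days_from_civil(long long year, unsigned month, unsigned day)
{
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    year -= month <= 2; // count January and February as part of the previous year
    const long long era = (year >= 0 ? year : year - 399) / 400;
    const unsigned year_of_era = static_cast<unsigned>(year - era * 400);
    const unsigned day_of_year = (153 * (month > 2 ? month - 3 : month + 9) + 2) / 5 + day - 1;
    const unsigned day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    return era * 146097 + static_cast<long long>(day_of_era) - 719468;
}

// Parses "yyyy-mm-dd" and returns midnight UTC of that date,
// independent of the time zone configured on the machine.
std::chrono::system_clock::time_point utc_midnight(const std::string& date)
{
    std::istringstream in{date};
    int year = 0;
    unsigned month = 0;
    unsigned day = 0;
    char dash1 = '\0';
    char dash2 = '\0';
    if (!(in >> year >> dash1 >> month >> dash2 >> day) || dash1 != '-' || dash2 != '-' ||
        month < 1 || month > 12 || day < 1 || day > 31) {
        throw std::invalid_argument("expected a date formatted as yyyy-mm-dd: " + date);
    }
    const long long seconds = days_from_civil(year, month, day) * 86400LL;
    return std::chrono::system_clock::from_time_t(static_cast<std::time_t>(seconds));
}
```

The returned time_point can be wrapped in bsoncxx::types::b_date in the same way read_date does. Whether the local-time or the UTC interpretation is the right one depends on what the user means by a date, so treat this as one option rather than a replacement.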

JSON parsing with REST API

I am trying to read second-level fields in a JSON file with the C++ REST SDK's JSON functionality (#include <cpprest/json.h>).
I need to get from the following JSON the name field:
{"desc":"","id":"57681f5dc4864c821cc73bfa","lists":[{"id":"576973346263056c88cfe845","name":"Board info"},{"id":"57681f5dc4864c821cc73bfb","name":"Misc"},{"id":"576978294972d812e4a91580","name":"thing"},{"id":"57681fdc228443c3306cc762","name":"thing2"},{"id":"5768200b1fbf41dd2c974052","name":"thing3"},{"id":"57681feb72ca90abb3afe170","name":"thingy"},{"id":"57681f5dc4864c821cc73bfc","name":"meep"},{"id":"57681f5dc4864c821cc73bfd","name":"BannedWordsPhrases"},{"id":"57681fba60fdfbf576abaece","name":"Errors"}],"name":"READER"}
(let's call this file JSON1)
I can get JSON1["lists"] but not JSON1["lists"]["name"].
Here is my code:
#include "cpprest/json.h" //how I am importing stuff
...
typedef web::json::value JsonValue; //all of these are being used
typedef web::json::value::value_type JsonValueType;
typedef std::wstring String;
typedef std::wstringstream StringStream;
using namespace utility;
using namespace web;
using namespace web::http;
using namespace web::http::client;
using namespace concurrency::streams;
...
int main()
{
...
web::json::value J1 = web::json::value::parse(S);
web::json::value &J2 = web::json::value::parse(S1);
output(J2);
wfstream _file("jsonFile.json");
_file >> obj;
wcout << obj[L"lists"][L"name"]; // the broken line
cout << endl;
}
All the functions and variables work and are correct; the code works with [L"lists"] alone but not once [L"name"] is added.
What am I doing wrong?
Note: my answer may be wrong; it would help if you could post the output of wcout << obj[L"lists"];. Also, I am assuming you are trying to get the "name" values such as "Board info" and "Misc", not "READER". I will assume the output of wcout << obj[L"lists"]; is:
[{"id":"576973346263056c88cfe845","name":"Board info"},{"id":"57681f5dc4864c821cc73bfb","name":"Misc"},{"id":"576978294972d812e4a91580","name":"thing"},{"id":"57681fdc228443c3306cc762","name":"thing2"},{"id":"5768200b1fbf41dd2c974052","name":"thing3"},{"id":"57681feb72ca90abb3afe170","name":"thingy"},{"id":"57681f5dc4864c821cc73bfc","name":"meep"},{"id":"57681f5dc4864c821cc73bfd","name":"BannedWordsPhrases"},{"id":"57681fba60fdfbf576abaece","name":"Errors"}]
Proposed Answer: obj[L"lists"] returns a JSON array, in this case holding the 9 JSON objects listed above. An array has no "name" key of its own, so you first have to pick an element by index (0-8). For example, according to cpprestsdk, obj[L"lists"][0] or obj[L"lists"].at(0) should return {"id":"576973346263056c88cfe845","name":"Board info"}.
From there you can get the name, for example:
obj[L"lists"][0][L"name"] should return Board info.

Boost Log Time Zone

I've been learning the Boost Log library
http://www.boost.org/doc/libs/develop/libs/log/doc/html/index.html
but I've been unable to figure out how to display the user's time zone. There is a %q and %Q format option that looks promising but doesn't seem to work (I'm using MSVC++ 2013). Using this format "%Y-%m-%d %H:%M:%S.%f%Q", I get the following output:
1 [2015-08-18 21:27:16.860724] main.cpp#11, Test App Started.
but I would have expected
1 [2015-08-18 21:27:16.860724-08.00] main.cpp#11, Test App Started.
as explained in:
http://www.boost.org/doc/libs/develop/libs/log/doc/html/log/detailed/expressions.html#log.detailed.expressions.formatters
Here's the code I've been trying and a few commented out lines that I have also tried with no luck:
void Log::init() const
{
    boost::log::core::get()->add_global_attribute("TimeStamp", boost::log::attributes::utc_clock());
    // boost::log::core::get()->add_global_attribute("TimeStamp", boost::log::attributes::local_clock());
    boost::log::register_simple_formatter_factory<Severity, char>("Severity");
    // boost::log::register_formatter_factory("TimeStamp", boost::make_shared<timestamp_formatter_factory>());
    boost::log::add_common_attributes();
    boost::log::add_file_log
    (
        boost::log::keywords::file_name = "appname_%N.log",
        boost::log::keywords::rotation_size = 10 * 1024 * 1024,
        boost::log::keywords::time_based_rotation = boost::log::sinks::file::rotation_at_time_point(0, 0, 0),
        boost::log::keywords::format =
            boost::log::expressions::stream
                << boost::log::expressions::attr<unsigned>("LineID") << " "
                << "[" << boost::log::expressions::format_date_time<boost::posix_time::ptime>("TimeStamp", "%Y-%m-%d %H:%M:%S.%f%Q") << "]" << " "
                << "<" << boost::log::expressions::attr<Severity>("Severity") << _NSTR(">") << _NSTR(" ")
                << boost::log::expressions::smessage
        // "%LineID% [%TimeStamp(format=\"%Y-%m-%d %H:%M:%S.%f%Q\")%] <%Severity%>: %%Message%"
    );
    const auto severity = boost::log::expressions::attr<Severity>("Severity");
    boost::log::core::get()->set_filter
    (
        severity >= severityThreshold_
    );
}
Any suggestions on what I might be doing wrong?
Both utc_clock and local_clock produce values of type boost::posix_time::ptime, which carries no time zone information. The only difference between the clocks is which time the ptime represents: UTC, or local time according to the time zone set on the machine. With no zone to print, the formatter replaces %Q and %q with an empty string.
The time zone is present in the boost::local_time::local_date_time type, and the %Q and %q placeholders work for it. The library does not have a clock attribute that produces local_date_time, so you will have to write one yourself. See here for an example.