mongodb: query across a date range - c++

Using the mongocxx driver, I need to query mongodb for documents (of stock data) that fall within a certain date range.
Consider the following document format:
{
    date : ISODate("2010-01-01T00:00:00Z"),
    open : 12.00,
    high : 13.00,
    low : 11.00,
    close : 12.50,
    volume : 100000
}
Say I have one collection per stock, and hundreds of these documents per collection, each with a different date.
If a user supplies two dates formatted as strings (yyyy-mm-dd):
std::string start_date = "2010-01-01";
std::string end_date = "2010-02-05";
How can I query MongoDB to get all the documents with dates between start_date and end_date (inclusive)?
Note: I am using MongoDB 3.2.12 and mongocxx driver version 3.0.2.
Thanks,

Unfortunately, there doesn't seem to be a portable standard-library way to parse a date string in an arbitrary time zone: std::mktime interprets the parsed fields as local time on the machine running the program, so you'll need to supply that time zone's offset from UTC to correctly query the UTC dates stored in the database. Ideally the offset would be determined when the user provides the string, but this will depend on the nature of your application.
Once you have the offset and the date string, std::get_time will get you most of the way there. After that, you just need to convert the std::tm to a type that you can construct a bsoncxx::types::b_date from (here a std::chrono::system_clock::time_point) and then query as usual. Note that the driver also requires a mongocxx::instance to exist before any other driver object is used. Here's some sample code that does the job:
#include <chrono>
#include <cstdint>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/builder/basic/sub_document.hpp>
#include <bsoncxx/json.hpp>
#include <bsoncxx/types.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>
// Parses "yyyy-mm-dd" as local midnight, then shifts it by offset_from_utc hours
// (the local time zone's offset from UTC, e.g. -5 for US Eastern Standard Time)
// so that the result is midnight UTC on that date.
bsoncxx::types::b_date read_date(const std::string& date,
                                 std::int32_t offset_from_utc) {
    std::tm local_tm{};
    std::istringstream ss{date};
    // Read the date fields into std::tm.
    ss >> std::get_time(&local_tm, "%Y-%m-%d");
    if (ss.fail()) {
        throw std::invalid_argument("expected a yyyy-mm-dd date: " + date);
    }
    // Convert std::tm to std::time_t; std::mktime treats the fields as local time.
    std::time_t local_time = std::mktime(&local_tm);
    // Convert std::time_t to std::chrono::system_clock::time_point.
    std::chrono::system_clock::time_point time_point =
        std::chrono::system_clock::from_time_t(local_time);
    return bsoncxx::types::b_date{time_point +
                                  std::chrono::hours{offset_from_utc}};
}
int main() {
    // The driver requires one mongocxx::instance for the lifetime of the program.
    mongocxx::instance instance{};
    // User inputs
    std::string start_date = "2010-01-01";
    std::string end_date = "2010-02-05";
    std::int32_t offset_from_utc = -5;
    // Code to execute the query: {date: {$gte: start, $lte: end}}
    mongocxx::client client{mongocxx::uri{}};
    mongocxx::collection coll = client["db_name"]["coll_name"];
    bsoncxx::builder::basic::document filter;
    filter.append(bsoncxx::builder::basic::kvp(
        "date", [start_date, end_date,
                 offset_from_utc](bsoncxx::builder::basic::sub_document sd) {
            sd.append(bsoncxx::builder::basic::kvp(
                "$gte", read_date(start_date, offset_from_utc)));
            // $lte midnight of end_date matches daily documents stamped at 00:00:00Z.
            sd.append(bsoncxx::builder::basic::kvp(
                "$lte", read_date(end_date, offset_from_utc)));
        }));
    for (auto&& result : coll.find(filter.view())) {
        std::cout << bsoncxx::to_json(result) << std::endl;
    }
}
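If you would rather not depend on the machine's time zone at all, another option (a different technique from the mktime-plus-offset approach above) is to compute the UTC timestamp arithmetically. The sketch below uses Howard Hinnant's well-known days_from_civil algorithm to turn a yyyy-mm-dd string into milliseconds since the Unix epoch at midnight UTC. It uses only the standard library; utc_midnight_ms is a hypothetical helper name, not part of bsoncxx.

```cpp
#include <chrono>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;  // treat Jan/Feb as months 13/14 of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);         // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;       // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Hypothetical helper: parse "yyyy-mm-dd" as midnight UTC and return
// milliseconds since the Unix epoch, independent of the local time zone.
std::chrono::milliseconds utc_midnight_ms(const std::string& date) {
    std::istringstream ss{date};
    int y = 0;
    unsigned m = 0, d = 0;
    char dash1 = 0, dash2 = 0;
    if (!(ss >> y >> dash1 >> m >> dash2 >> d) || dash1 != '-' || dash2 != '-' ||
        m < 1 || m > 12 || d < 1 || d > 31) {
        throw std::invalid_argument("expected yyyy-mm-dd: " + date);
    }
    const std::int64_t days = days_from_civil(y, m, d);
    return std::chrono::milliseconds{days * 86400LL * 1000LL};
}
```

The result can be wrapped as bsoncxx::types::b_date{utc_midnight_ms(start_date)} for the $gte bound. If your documents ever carry times other than midnight, using $lt with bsoncxx::types::b_date{utc_midnight_ms(end_date) + std::chrono::hours{24}} keeps the whole end date inclusive.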

Related

Converting json file to json object jumbles up the order of objects

I am trying to parse a json file into a json object using nlohmann json library.
This is the json file:
{
    "n":1,
    "data":
    {
        "name":"Chrome",
        "description":"Browse the internet.",
        "isEnabled":true
    }
}
This is my code:
#include <nlohmann/json.hpp>
#include <iostream>
#include <fstream>
using namespace std;
using json = nlohmann::json;
int main()
{
    ifstream f("/Users/callum/sfucourses/cmpt373/test/example2.json");
    json data = json::parse(f);
    cout << data << endl;
}
If I skip the parse and just do cout << f.rdbuf(), I get the correct output:
./interpret
/Users/callum/sfucourses/cmpt373/build
{
    "n":1,
    "data":
    {
        "name":"Chrome",
        "description":"Browse the internet.",
        "isEnabled":true
    }
}
But if I parse it and print out the json object data, "n":1 and "name":"Chrome" are placed at the end instead of the beginning:
./interpret
{"data":{"description":"Browse the internet.","isEnabled":true,"name":"Chrome"},"n":1}
How do I get it to print in the correct order?
JSON objects are unordered, and nlohmann::json stores them in a std::map by default, so the keys come out in sorted order. The library provides nlohmann::ordered_json, which keeps the insertion order instead:
auto data = nlohmann::ordered_json::parse(f);
Parsing the file as above and printing it the way you do produces this output:
{"n":1,"data":{"name":"Chrome","description":"Browse the internet.","isEnabled":true}}

Equivalent of mysql_real_escape_string()

In MySQL there is the mysql_real_escape_string function.
Is there an equivalent for MS SQL that correctly handles strings like this one?
SELECT * FROM MyTable WHERE Phrase='Mr Charlie's dog's dog and Mrs Molly's cat's cat plus Chris' bicycle' AND Item='wood';
I use Microsoft SQL Server, with these headers:
#include <sqlext.h>
#include <sqltypes.h>
#include <sql.h>
Rather than composing a query as a string, write a parameterised query and supply whatever string as the parameter.
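You don't say which SQL client library you use; here it is with libpqxx (a PostgreSQL library, whose placeholders are $1, $2). With ODBC, which the sqlext.h header suggests, the same pattern uses ? parameter markers bound with SQLBindParameter. It'd look something like this:
#include <string>
#include <pqxx/pqxx>
// Prepare the statement once per connection; $1 and $2 are parameter placeholders.
void prepare_find(pqxx::connection_base &c)
{
    c.prepare(
        "find",
        "SELECT * FROM MyTable WHERE Phrase = $1 AND Item = $2");
}
// The values travel separately from the SQL text, so quotes in them need no escaping.
pqxx::result execute_find(
    pqxx::transaction_base &t, const std::string &phrase, const std::string &item)
{
    return t.exec_prepared("find", phrase, item);
}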
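For completeness: if a value really has to be embedded in the SQL text (for example, in a tool that cannot bind parameters), the T-SQL rule for a string literal is to double every single quote inside it. The sketch below shows only that rule; tsql_quote is a hypothetical helper, not a library function, and bound parameters remain the safer choice.

```cpp
#include <string>

// Hypothetical helper: wrap a value as a T-SQL string literal by doubling
// every single quote. Prefer bound parameters; use this only as a last resort.
std::string tsql_quote(const std::string& value) {
    std::string out;
    out.reserve(value.size() + 2);
    out += '\'';
    for (char c : value) {
        if (c == '\'') {
            out += '\'';  // '' inside a literal stands for one quote character
        }
        out += c;
    }
    out += '\'';
    return out;
}
```

With this rule, the phrase from the question becomes 'Mr Charlie''s dog''s dog ... plus Chris'' bicycle'.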

C++ Apache Orc is not filtering data correctly

I am posting a simple C++ Apache ORC file-reading program that:
Reads data from an ORC file.
Filters data based on the given string.
Sample Code:
#include <iostream>
#include <list>
#include <memory>
#include <chrono>
// Orc specific headers.
#include <orc/Reader.hh>
#include <orc/ColumnPrinter.hh>
#include <orc/Exceptions.hh>
#include <orc/OrcFile.hh>
int main(int argc, char const *argv[])
{
    auto begin = std::chrono::steady_clock::now();
    orc::RowReaderOptions m_RowReaderOpts;
    orc::ReaderOptions m_ReaderOpts;
    std::unique_ptr<orc::Reader> m_Reader;
    std::unique_ptr<orc::RowReader> m_RowReader;
    auto builder = orc::SearchArgumentFactory::newBuilder();
    std::string required_symbol("FILTERME");
    /// THIS LINE SHOULD FILTER DATA BASED ON COLUMNS.
    /// INSTEAD OF FILTERING IT TRAVERSE EACH ROW OF ORC FILE.
    builder->equals("column_name", orc::PredicateDataType::STRING, orc::Literal(required_symbol.c_str(), required_symbol.size()));
    std::string file_path("/orc/file/path.orc");
    m_Reader = orc::createReader(orc::readFile(file_path.c_str()), m_ReaderOpts);
    m_RowReader = m_Reader->createRowReader(m_RowReaderOpts);
    m_RowReaderOpts.searchArgument(builder->build());
    auto batch = m_RowReader->createRowBatch(5000);
    try
    {
        std::cout << builder->build()->toString() << std::endl;
        while(m_RowReader->next(*batch))
        {
            const auto &struct_batch = dynamic_cast<const orc::StructVectorBatch&>(*batch.get());
            /** DO CALCULATIONS */
        }
    }
    catch(const std::exception& e)
    {
        std::cerr << e.what() << '\n';
    }
    auto end = std::chrono::steady_clock::now();
    std::cout << "Total Time taken to read ORC file: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << " ms.\n";
    return 0;
}
I searched Google for almost a week and tried converting every Java example I could find into C++ to make my code work.
I tried the example from a related Stack Overflow question with a similar issue, but it didn't work for me.
**Question:**
1. Am I writing the filtering code correctly? If so, why is it not filtering data based on the given string?
2. Where can I find C++ (or relevant Java) code for row-level or stripe-level filtering?
Finally, after trying multiple scenarios, I have resolved the ORC data-filtering issue above.
It was caused by using the wrong column number. I am not sure why the column id of a column to fetch differs from the column id of the same column to filter.
In the example above I tried to filter by column name, and filtering by column name still doesn't work for me. However, filtering by column number works fine.
New Code:
#include <iostream>
#include <list>
#include <memory>
#include <chrono>
#include <string>
#include <utility>
// Orc specific headers.
#include <orc/Reader.hh>
#include <orc/ColumnPrinter.hh>
#include <orc/Exceptions.hh>
#include <orc/OrcFile.hh>
int main(int argc, char const *argv[])
{
    auto begin = std::chrono::steady_clock::now();
    orc::RowReaderOptions m_RowReaderOpts;
    orc::ReaderOptions m_ReaderOpts;
    std::unique_ptr<orc::Reader> m_Reader;
    std::unique_ptr<orc::RowReader> m_RowReader;
    auto builder = orc::SearchArgumentFactory::newBuilder();
    std::string required_symbol("FILTERME");
    // Columns to fetch: include() takes ids starting from 0.
    std::list<uint64_t> cols = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    m_RowReaderOpts.include(cols);
    // The search argument uses different ids: the column that is 6 in cols above
    // is 7 here, i.e. the filter id is the fetch id + 1.
    uint64_t column_id = 7;
    builder->equals(column_id, orc::PredicateDataType::STRING, orc::Literal(required_symbol.c_str(), required_symbol.size()));
    auto sarg = builder->build();
    std::cout << sarg->toString() << std::endl;
    // Attach the search argument BEFORE createRowReader(): the row reader reads its
    // options when it is created, so later changes to m_RowReaderOpts are ignored.
    m_RowReaderOpts.searchArgument(std::move(sarg));
    std::string file_path("/orc/file/path.orc");
    m_Reader = orc::createReader(orc::readFile(file_path.c_str()), m_ReaderOpts);
    m_RowReader = m_Reader->createRowReader(m_RowReaderOpts);
    auto batch = m_RowReader->createRowBatch(5000);
    try
    {
        while(m_RowReader->next(*batch))
        {
            const auto &struct_batch = dynamic_cast<const orc::StructVectorBatch&>(*batch);
            // As I understand it, the search argument lets ORC skip stripes/row groups whose
            // statistics rule out a match; rows in the remaining groups still need a check here.
            /** DO CALCULATIONS */
        }
    }
    catch(const std::exception& e)
    {
        std::cerr << e.what() << '\n';
    }
    auto end = std::chrono::steady_clock::now();
    std::cout << "Total Time taken to read ORC file: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << " ms.\n";
    return 0;
}
As I understand it after resolving this issue, column ids for fetching data start at 0, while column ids for filtering start at 1. This is why you should pass 1 when you need to filter on column 0.
To explain the confusion in the above answer:
In ORC, the column field id is a different thing from the column type id:
For files that have structs as the top-level object, field id 0 corresponds to the first struct field, field id 1 for the second struct field, and so on. See comments here: https://github.com/apache/orc/blob/v1.7.3/c++/include/orc/Reader.hh#L122-L123
Column type id is the pre-order traversal index of the type tree. As the spec says: "The type tree is flattened in to a list via a pre-order traversal where each type is assigned the next id." So the root of the type tree is always type id 0.
So if there are no nested types (struct/array/map) in the ORC file, columnTypeId == columnFieldId + 1 for every column except the root struct type.
The ids used when building sargs are column type ids. However, the ids used in RowReaderOptions::include(const std::list<uint64_t>& include) are column field ids. To have a consistent id mapping, I'd recommend using the include method that takes type ids:
RowReaderOptions::includeTypes(const std::list<uint64_t>& types);
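To make the two id spaces concrete, here is a small standard-library sketch of the pre-order numbering described above. TypeNode, subtree_size and type_id_of_field are hypothetical names for illustration, not part of the ORC API; with the real library, calling getColumnId() on reader->getType().getSubtype(i) should give the type id of top-level field i.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an ORC schema node (not orc::Type).
struct TypeNode {
    std::string name;                // field name within the parent struct
    std::vector<TypeNode> children;  // empty for primitive columns
};

// Number of type ids a subtree occupies: the node itself plus all descendants.
std::uint64_t subtree_size(const TypeNode& node) {
    std::uint64_t n = 1;
    for (const TypeNode& child : node.children) {
        n += subtree_size(child);
    }
    return n;
}

// Type id of the top-level field at position field_id (0-based field id).
// Pre-order numbering: the root struct is 0, its first field is 1, and each
// later field starts after every id used by the fields before it.
std::uint64_t type_id_of_field(const TypeNode& root, std::size_t field_id) {
    std::uint64_t id = 1;
    for (std::size_t i = 0; i < field_id; ++i) {
        id += subtree_size(root.children.at(i));
    }
    return id;
}
```

For a flat struct<a,b,c> this returns field_id + 1, matching the observation above. For struct<a, b:struct<c,d>, e>, field e has field id 2 but type id 5, because b's two children are numbered before it, which is why the "+1" rule breaks down once nested types appear.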

boost log format single attribute with logging::init_from_stream

When I set up the format in code, I can format the date-time output with something like this:
logging::formatter simpleFormat(expr::format("%1% %2%") %
    expr::format_date_time<boost::posix_time::ptime>("TimeStamp", "%H:%M:%S") %
    expr::smessage
);
But when I initialize the logger from a config file, I can only specify where attributes appear in the format string, not how each attribute is formatted.
So this line in a Boost.Log config file
Format="[%TimeStamp%]: %Message%"
produces output:
[2015-Feb-06 09:32:27.401496]: blah blah blah
I want to shorten the timestamp to something like this:
[06.02.2015 09:32:27]
How can this be described in a Boost.Log config file, or can it not be done at all?
Preamble
This answer is based on Boost 1.55 (I haven't tested it with the latest version), and it was only tested with the MSVC 2013 compiler.
Answer
It looks like you need a custom formatter factory for the TimeStamp attribute to be able to specify its format. This works for me:
#include <fstream>
#include "boost/shared_ptr.hpp"
#include "boost/make_shared.hpp"
#include "boost/log/trivial.hpp"
#include "boost/log/expressions.hpp"
#include "boost/log/utility/setup.hpp"
#include "boost/log/support/date_time.hpp"
// Formatter factory for ptime attributes that honours an optional "format" argument,
// e.g. %TimeStamp(format="...")% in the settings file.
class timestamp_formatter_factory :
    public boost::log::basic_formatter_factory<char, boost::posix_time::ptime>
{
public:
    formatter_type create_formatter(boost::log::attribute_name const& name, args_map const& args)
    {
        args_map::const_iterator it = args.find("format");
        if (it != args.end())
            // Format the timestamp with the user-supplied date/time format string.
            return boost::log::expressions::stream
                << boost::log::expressions::format_date_time<boost::posix_time::ptime>(
                       boost::log::expressions::attr<boost::posix_time::ptime>(name), it->second);
        else
            // No format given: fall back to the default ptime output.
            return boost::log::expressions::stream
                << boost::log::expressions::attr<boost::posix_time::ptime>(name);
    }
};
int main()
{
    // Register the factory before reading the settings, so the TimeStamp placeholder uses it.
    boost::log::register_formatter_factory("TimeStamp", boost::make_shared<timestamp_formatter_factory>());
    boost::log::add_common_attributes();
    std::ifstream file("settings.ini");
    boost::log::init_from_stream(file);
    // Testing
    BOOST_LOG_TRIVIAL(info) << "Test";
    return 0;
}
Now, in your settings file, you can specify a format argument for the TimeStamp attribute, like this:
[Sinks.ConsoleOut]
Destination=Console
AutoFlush=true
Format="[%TimeStamp(format=\"%d.%m.%Y %H:%M:%S\")%]: %Message%"
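The day, month, year, hour, minute and second specifiers used in that format string (%d, %m, %Y, %H, %M, %S) mean the same as in C's strftime, so a format string can be previewed with the standard library before it goes into the settings file. This is only a sanity check under that assumption; format_tm is an illustrative helper, not part of Boost.Log.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Illustrative helper: render a broken-down time with a strftime-style format.
std::string format_tm(const std::tm& t, const char* fmt) {
    std::ostringstream out;
    out << std::put_time(&t, fmt);
    return out.str();
}
```

Filling a std::tm with 2015-02-06 09:32:27 and formatting it with "[%d.%m.%Y %H:%M:%S]" gives the [06.02.2015 09:32:27] layout asked for in the question.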
If you configure the sink in code rather than from a settings file, you should be able to use set_formatter, as described in the Boost.Log documentation:
sink->set_formatter
(
    expr::stream << expr::format_date_time< boost::posix_time::ptime >("TimeStamp", "%Y-%m-%d %H:%M:%S")
);

how to get all timezone names in ICU

I am using boost::locale with the ICU backend to do time conversions between different time zones. When creating a boost::locale::calendar, I can pass in a string like "America/New_York" to specify the time zone.
But how do I get a list of all valid time zone names?
The ICU documentation mentions that users can call TimeZone.getAvailableIDs() to iterate through all time zone names, but I can't find a method called getAvailableIDs in timezone.h.
You can use TimeZone::createEnumeration() to get a list of all time zone names. The documentation does mention getAvailableIDs, but that method no longer seems to exist.
I managed to implement it like this, using ICU 4.4.2:
#include <iostream>
#include <memory>
#include <string>
#include <unicode/timezone.h>
#include <unicode/unistr.h>
using namespace icu;
int main()
{
    // createEnumeration() returns an owning pointer; unique_ptr deletes it for us.
    std::unique_ptr<StringEnumeration> timeZoneIds(TimeZone::createEnumeration());
    if (!timeZoneIds)
        return 1;
    UErrorCode status = U_ZERO_ERROR;
    const UnicodeString *zoneId = timeZoneIds->snext(status);
    while (zoneId != nullptr && U_SUCCESS(status))
    {
        // Convert the ICU string to UTF-8 for printing.
        std::string zoneIdString;
        zoneId->toUTF8String(zoneIdString);
        std::cout << zoneIdString << std::endl;
        zoneId = timeZoneIds->snext(status);
    }
    return 0;
}