BaseX XML database in C++ encoding issue - c++

I work with BaseX and am trying to integrate an XML database with C++ on Windows 7. I use the BaseXclient API from https://github.com/JohnLeM/BasexCPPAPI/
It contains the pugixml parser and uses the Boost libraries. I got it to work, but I have issues with the encoding. The XML documents in my database are UTF-8 and contain some letters and symbols that are not displayed correctly in the console output (like ä and °).
I set the console code page with chcp 65001.
I changed the locale with std::setlocale(LC_ALL, ""); in C++, and when I write these letters and symbols to cout directly from my program rather than from the database, they are displayed correctly. The database output also changed but is still wrong.
I also set the pugi parser with pugi::xml_encoding::encoding_utf8; but the database output is not affected. Here is a code example from the string list interface:
virtual ~my_string_list_interface() {};
my_string_list_interface(const std::string& DBHOST, const std::string& DBPORT, const std::string& DBUSER,const std::string& DBPASSWD) : base_type(DBHOST,DBPORT,DBUSER,DBPASSWD) {};
virtual my_string_list get(string query,int stage){
    my_string_list my_string_list_;
    pugi::xml_encoding::encoding_auto;
    pugi::xml_parse_result parse_;
    pugi::xml_document doc;
    string results = session().execute("XQUERY "+query);
    parse_ = doc.load(results.c_str());
    pugi::xpath_node_set is = doc.select_nodes("/record/mid");
The API uses a Boost.Asio streambuf to receive the data. The code that reads from the streambuf looks like this:
std::string read_streambuffer() { return read_streambuffer(response_); }
std::string read_streambuffer(boost::asio::streambuf& response)
{
    std::string results;
    boost::system::error_code error;
    boost::asio::streambuf::const_buffers_type bufs = response.data();
    std::size_t size(0);
    std::string line;
    auto ptr_b = boost::asio::buffers_begin(bufs);
    // Copy bytes up to the first NUL; a NUL at index 0 or 1 is skipped rather than ending the read.
    for (; ptr_b != boost::asio::buffers_end(bufs); ++ptr_b, ++size)
    {
        if (*ptr_b != 0) { line.push_back(*ptr_b); }
        else if (size > 1) break;
    }
    response.consume(size);
    return line;
}
Is there a way to specify the encoding for the buffer stream or the string? I use a string list for the database output. Should I use wstring, or is there something I missed?
Thanks!
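A std::string only stores bytes and has no encoding of its own, so one way to narrow this down is to check whether the bytes coming back from the database are still well-formed UTF-8 before they reach the console. The following is a minimal diagnostic sketch, not part of the BaseX client API or pugixml; the helper name is_valid_utf8 is hypothetical. If the check passes for the query result, the data is presumably intact and the remaining problem is more likely on the console side (code page or font) than in the streambuf or the parser.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of the given length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                      // invalid lead byte
        if (i + len > n) return false;          // sequence cut off at end of string
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;                        // overlong
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;    // out of range
        i += len;
    }
    return true;
}
```

For example, is_valid_utf8(results) could be called on the string returned by session().execute(...). The two-byte sequence C3 A4 (ä in UTF-8) passes, while the single byte E4 (ä in Latin-1 or Windows-1252) does not.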

Related

Saving the output from Lua in C++ with SOL3 to an std::string

I'm trying to embed a Lua interpreter in my C++ code. I have implemented a small editor for my project using ImGui, and I'm saving the editor's contents to a std::vector.
My attempted implementation of the Lua interpreter looks like this:
// header
std::string ExecuteLua();
std::vector<char> m_luaEditorData;
// cpp
std::string Proxy::ExecuteLua()
{
    // Load the Lua code from the string
    std::string luaCode(m_luaEditorData.data());
    // Create a Lua state
    sol::state lua;
    // Load standard Lua libraries
    lua.open_libraries(sol::lib::base, sol::lib::package, sol::lib::string, sol::lib::table);
    // Execute the Lua code and store the result
    sol::protected_function_result result = lua.script(luaCode);
    // Check for errors
    if (!result.valid())
    {
        sol::error error = result;
        std::string errorMsg = error.what();
        return "Lua error: " + errorMsg;
    }
    // Get the result as a string
    std::string output = lua["tostring"](result.get<sol::object>());
    // Return the output
    return output;
}
...
if (m_luaEditorData.empty())
m_luaEditorData.push_back('\0');
auto luaEditorFlags = ImGuiInputTextFlags_AllowTabInput | ImGuiInputTextFlags_CallbackResize;
ImGui::InputTextMultiline("##LuaEditor", m_luaEditorData.data(), m_luaEditorData.size(), ImVec2(ImGui::GetWindowContentRegionWidth(), ImGui::GetWindowHeight() - (ImGui::GetTextLineHeight() * 16)), luaEditorFlags, ResizeInputTextCallback, &m_luaEditorData);
...
When I run this code, I only get nil in my returned string; the correct output goes to stdout (I don't really want it there, but in my std::string), and when I enter bad code, an exception is thrown in sol.hpp. I couldn't find any examples of how to do this, so I'm trying to figure it out on my own.

Using C++ protobuf formatted structure in leveldb. set/get operations

I'd like to build a proof of concept that uses LevelDB to store a key-value table of different data types in protobuf format.
So far I have been able to open the database file, and I also found the Get function with the following signature:
virtual Status Get(const ReadOptions& options, const Slice& key, std::string* value)=0
I understand that the value actually refers to a binary string, like a byte vector, rather than a regular alphanumeric string, so I guess it can hold primitives of various types (string, uint, enum). But how can it support a struct/class that represents a protobuf layout in C++?
This is the proto message that I'd like to store in LevelDB:
message agentStatus {
    string ip = 1;
    uint32 port = 2;
    string url = 3;
    google.protobuf.Timestamp last_seen = 4;
    google.protobuf.Timestamp last_keepalive = 5;
    bool status = 6;
}
And this is my current POC code. How can I use the Get method to access any of the fields of the message above?
#include <leveldb/db.h>
#include <stdexcept>
#include <string>
int main() {
    std::string db_file_path = "/tmp/data.db";
    leveldb::DB* db;
    leveldb::Status status;
    leveldb::Options options;
    options.create_if_missing = false;
    // Open the existing database; Open() reports failure through the returned Status.
    status = leveldb::DB::Open(options, db_file_path, &db);
    if (!status.ok()) {
        throw std::logic_error("unable to open db");
    }
}
Thanks!
You need to serialize the protobuf message into a binary string with SerializeToString, and then use the Put method to write that binary string to LevelDB under a key.
Then you can use the Get method to retrieve the binary value for the given key, and parse it back into a protobuf message with ParseFromString.
Finally, you can read the fields of the message through its accessors.
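One detail behind the answer above: serialized protobuf data is arbitrary bytes and may contain zero bytes, so the value has to be carried as pointer plus length, not as a C string. A std::string does this, which is why it works as the value type for Get. As far as I know, leveldb::Slice takes the length from size() when built from a std::string but uses strlen when built from a const char*, so passing value.c_str() could silently truncate a message. The sketch below uses only the standard library; the bytes are illustrative and not a real encoding of agentStatus, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a serialized message: 4 bytes, the second of which is 0x00.
std::string make_blob() {
    const char raw[] = {0x22, 0x00, 0x10, 0x01};
    return std::string(raw, sizeof raw);  // pointer + length keeps all 4 bytes
}

// Length seen when the same bytes are treated as a C string: counting stops at the first 0x00.
std::size_t c_string_length(const std::string& blob) {
    return std::strlen(blob.c_str());
}
```

Here make_blob().size() reports all four bytes, while c_string_length(make_blob()) stops at the embedded zero byte. The same applies on the way back: the std::string filled in by Get should be passed to ParseFromString as a whole, not through c_str().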

How to parse std::list to json::value in casablanca?

I'm trying to convert a std::list to a json::value with Casablanca 1.2.0 and Visual Studio 2012, and to send a JSON request to a REST service (POST) from a C++ application to a Java application.
The REST service requires a request DTO like this:
// java
public class MyProfile {
private String name;
private List<String> favoriteFood;
...
}
So I tried to write the C++ code, but I couldn't find out how to convert a std::list to a json::value.
// C++
std::wstring name = L"AAA";
std::list<std::wstring> favoriteFood = ...;
json::value requestData;
requestData[L"name"] = json::value::string(name);
std::vector<json::value::string> vvv;
for (auto itr = favoriteFood.begin(); itr != favoriteFood.end(); ++itr) {
vvv.push_back(json::value::string(*itr));
}
requestData[L"favoriteFood"] = json::value::array(vvv); // compile error occurs
I'm totally new to Casablanca and JSON, so I haven't been able to find a solution.
Any help would be much appreciated!

How to use Xerces to parse XML in a string [duplicate]

I know how to create a complete DOM from an XML file just using XercesDOMParser:
xercesc::XercesDOMParser* parser = new xercesc::XercesDOMParser();
parser->parse(path_to_my_file);
parser->getDocument(); // From here on I can access all nodes and do whatever I want
Well, that works... but what if I want to parse a string? Something like
std::string myxml = "<root>...</root>";
xercesc::XercesDOMParser* parser = new xercesc::XercesDOMParser();
parser->parse(myxml);
parser->getDocument(); // From here on I can access all nodes and do whatever I want
I'm using version 3. Looking inside AbstractDOMParser, I see that the parse method and its overloads only parse files.
How can I parse from a string?
Create a MemBufInputSource and parse that:
xercesc::MemBufInputSource myxml_buf(reinterpret_cast<const XMLByte*>(myxml.c_str()), myxml.size(),
                                     "myxml (in memory)");
parser->parse(myxml_buf);
Use the following overload of XercesDOMParser::parse():
void XercesDOMParser::parse(const InputSource& source);
passing it a MemBufInputSource:
MemBufInputSource src((const XMLByte*)myxml.c_str(), myxml.length(), "dummy", false);
parser->parse(src);
I'm doing it another way. If this is incorrect, please tell me why. It seems to work.
This is what parse expects:
DOMDocument* DOMLSParser::parse(const DOMLSInput * source )
So you need to pass a DOMLSInput instead of an InputSource:
xercesc::DOMImplementation * impl = xercesc::DOMImplementation::getImplementation();
xercesc::DOMLSParser *parser = ((xercesc::DOMImplementationLS*)impl)->createLSParser(xercesc::DOMImplementation::MODE_SYNCHRONOUS, 0);
xercesc::DOMDocument *doc;
xercesc::Wrapper4InputSource source(new xercesc::MemBufInputSource((const XMLByte *) (myxml.c_str()), myxml.size(), "A name"));
doc = parser->parse(&source);
You may use MemBufInputSource, which is implemented in xercesc/framework/MemBufInputSource.cpp; its header file, MemBufInputSource.hpp, contains extensive documentation. Similar to the answers above:
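#include <xercesc/framework/MemBufInputSource.hpp>
#include <cstring>
const char* myXMLBufString = "<root>hello xml</root>";
// Pass the length without the terminating NUL so the parser sees only the XML text.
MemBufInputSource xmlBuf((const XMLByte*)myXMLBufString, std::strlen(myXMLBufString), "myXMLBufName", false);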
But take note, this doesn't seem to work unless you first initialize the system, as below (taken from xerces-c-3.2.3/samples/src/SAX2Count/SAX2Count.cpp):
bool recognizeNEL = false;
char localeStr[64];
memset(localeStr, 0, sizeof localeStr);
// Initialize the XML4C2 system
try {
    if (strlen(localeStr)) {
        XMLPlatformUtils::Initialize(localeStr);
    } else {
        XMLPlatformUtils::Initialize();
    }
    if (recognizeNEL) {
        XMLPlatformUtils::recognizeNEL(recognizeNEL);
    }
} catch (const XMLException& toCatch) {
    XERCES_STD_QUALIFIER cerr << "Error during initialization! Message:\n"
                              << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
    return 1;
}
This initialization is easy to overlook, because XMLPlatformUtils::Initialize() has to be called before any other Xerces-C++ API is used, and the sample programs that parse a file from a path do it as well. So for those experiencing segfaults, this could be the answer.

Boost Log Formatter Using the same string as in keyword

Here is how I set up my loggers:
namespace logger = boost::log;
namespace src = boost::log::sources;
namespace sinks = boost::log::sinks;
namespace expr = boost::log::expressions;
using LEVEL = boost::log::trivial::severity_level;
static void log_Severity(LEVEL level, sender_t sender, std::string message);
static void throw_Severity(LEVEL level, sender_t sender, std::string message);
static std::string getUnescaped(std::string input);
static std::string format(sender_t sender, std::string message);
static std::string HRESULTSTRING(HRESULT result);
typedef sinks::synchronous_sink< sinks::text_ostream_backend > text_sink;
static std::string CreateFormat()
{
    logger::add_common_attributes();
    logger::register_simple_formatter_factory< LEVEL, char >("Severity");
    return "[%TimeStamp%] [%ThreadID%] [%Severity%]: %Message%";
}
static void AddTerminalLogger(std::string format)
{
    auto sink = boost::make_shared<text_sink>();
    sink->locked_backend()->add_stream(boost::shared_ptr<std::ostream>(&std::cout, boost::null_deleter()));
    sink->locked_backend()->auto_flush(true);
    //sink->set_formatter( format );
    logger::core::get()->add_sink(sink);
}
static void AddFileLogger(std::string path, std::string format)
{
    logger::add_file_log
    (
        logger::keywords::file_name = path + "ManualTest_%Y-%m-%d_%H-%M-%S.%N.log",
        logger::keywords::rotation_size = 10 * 1024 * 1024,
        logger::keywords::time_based_rotation = sinks::file::rotation_at_time_point(0, 0, 0),
        logger::keywords::format = format
    );
}
static void SetLogLevel(LEVEL level)
{
    logger::core::get()->set_filter(logger::trivial::severity >= level);
}
void LogHelper::SetupLoggers(std::string path)
{
    std::string format = CreateFormat();
    AddFileLogger(path, format);
    AddTerminalLogger(format);
    SetLogLevel(LEVEL::trace);
}
I want to use my existing format string to set up my console logging as well.
How can I reuse "[%TimeStamp%] [%ThreadID%] [%Severity%]: %Message%" so I do not repeat myself when I create my format expression?
Edit:
To clarify: as far as I know, this is not valid: sink->set_formatter(expr::format("[%TimeStamp%] [%ThreadID%] [%Severity%]: %Message%")); If I wanted to use set_formatter, I would have to write an expression that does the same thing as logger::keywords::format = "[%TimeStamp%] [%ThreadID%] [%Severity%]: %Message%". In that case I would use a different approach for each logger (terminal, file) and hope to get the same formatting in both. Both loggers are sinks that are added to the core, so I assume that logger::add_file_log uses something like set_formatter under the hood. I would like to use whatever built-in capability allows me to apply the string "[%TimeStamp%] [%ThreadID%] [%Severity%]: %Message%" to a sink, but I cannot find the documentation for it. When I look this topic up, I do find how to use set_formatter, but it always ends up as a separately written expression that produces the same result. I feel this introduces potential for error, since I would be repeating myself: rewriting the formatting I want in my logging with slight variations.
Edit:
Changed the source code to better reflect the question.
First, you can use the add_console_log function in the same way you use add_file_log in your code.
Second, both of these functions use the formatter parser (the parse_formatter function) to convert the format string into the actual formatter that you can pass to set_formatter. You can call this function directly to avoid parsing the string multiple times.