C++ connect to MySQL/MariaDB is very slow

I'm writing a C++ HTTPS server that accepts a connection, makes a query to the database, and sends back the answer. If I send a GET request for the index.html endpoint I get really good results:
but if I send a POST that connects to MySQL, the requests/sec figure is very small:
I've tried different connectors: mysql, mariadb++, mariadbcpp (used in the code example below), etc.; the result is the same.
Code example:
nlohmann::json PostResult = {};
const char *uri = "tcp://192.168.1.130:3306/test";
const char *user = "root";
const char *passwd = "123";
MariaCpp::scoped_library_init maria_lib_init;
try {
    MariaCpp::Connection conn;
    conn.connect(MariaCpp::Uri(uri), user, passwd);
    // note: std::auto_ptr is deprecated; std::unique_ptr in modern code
    std::auto_ptr<MariaCpp::PreparedStatement> stmt(conn.prepare("SELECT a.id, a.msg_id, a.NUMBER, a.sign FROM chiffa a WHERE a.id = 1"));
    stmt->execute();
    while (stmt->fetch()) {
        PostResult["id"] = stmt->getInt(0);
        PostResult["msg_id"] = stmt->getString(1);
        PostResult["NUMBER"] = stmt->getString(2);
        PostResult["sign"] = stmt->getString(3);  // was a duplicate "msg_id" key
    }
    conn.close();
}
catch (MariaCpp::Exception &e) {
    std::cerr << e << std::endl;
}
Can anyone help me figure out how to increase the requests/sec? Thanks!

200 connections and queries per second isn't that bad for a naive implementation, particularly when the database is being hosted across another network boundary.
I'm certainly not surprised that it's 50 times slower than something that just returns some HTML.
Your server/application should maintain one connection (or a pool of connections), that it keeps alive all the time, and use that when a request is made, rather than constructing a fresh connection for each request.
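As a rough sketch of that idea using the MariaCpp calls from the question (the class name and locking are my own; a real pool would hold several connections and hand one out per request):

#include <mutex>
#include <memory>
#include <nlohmann/json.hpp>
// MariaCpp headers as used in the question; adjust include paths for your install.

// One long-lived connection shared by all requests, created once at startup.
class Db {
public:
    Db(const char *uri, const char *user, const char *passwd) {
        conn_.connect(MariaCpp::Uri(uri), user, passwd);
    }

    nlohmann::json queryById() {
        std::lock_guard<std::mutex> lock(mtx_);  // serialize access to the single handle
        nlohmann::json result;
        std::unique_ptr<MariaCpp::PreparedStatement> stmt(
            conn_.prepare("SELECT id, msg_id, NUMBER, sign FROM chiffa WHERE id = 1"));
        stmt->execute();
        while (stmt->fetch()) {
            result["id"] = stmt->getInt(0);
            result["msg_id"] = stmt->getString(1);
            result["NUMBER"] = stmt->getString(2);
            result["sign"] = stmt->getString(3);
        }
        return result;
    }

private:
    MariaCpp::Connection conn_;
    std::mutex mtx_;
};

Construct one Db instance when the server starts and call queryById() from the request handler; the per-request TCP handshake and MySQL authentication disappear, which is where most of the time was going.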

Related

What's the efficient way to make an HTTP request and read the InputStream in a Spark map task

Please see the code sample below:
JavaRDD<String> mapRDD = filteredRecords
    .map(new Function<String, String>() {
        @Override
        public String call(String url) throws Exception {
            BufferedReader in = null;
            URL formatURL = new URL((url.replaceAll("\"", "")).trim());
            try {
                HttpURLConnection con = (HttpURLConnection) formatURL.openConnection();
                in = new BufferedReader(new InputStreamReader(con.getInputStream()));
                return in.readLine();
            } finally {
                if (in != null) {
                    in.close();
                }
            }
        }
    });
Here url is an HTTP GET request. Example:
http://ip:port/cyb/test?event=movie&id=604568837&name=SID&timestamp_secs=1460494800&timestamp_millis=1461729600000&back_up_id=676700166
This piece of code is very slow. The IP and port are random and the load is distributed, so the IP can take 20 different values with the port, so I don't see a bottleneck.
When I comment out
in = new BufferedReader(new InputStreamReader(con.getInputStream()));
return in.readLine();
the code is very fast.
NOTE: The input data to process is 10 GB, read from S3 using Spark.
Is there anything wrong with how I am using BufferedReader or InputStreamReader, or is there an alternative?
I can't use foreach in Spark, as I have to get the response back from the server and save the JavaRDD as a text file on HDFS.
If we use mapPartitions, the code looks something like the below:
JavaRDD<String> mapRDD = filteredRecords.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
    @Override
    public Iterable<String> call(Iterator<String> tuple) throws Exception {
        final List<String> rddList = new ArrayList<String>();
        Iterable<String> iterable = new Iterable<String>() {
            @Override
            public Iterator<String> iterator() {
                return rddList.iterator();
            }
        };
        while (tuple.hasNext()) {
            URL formatURL = new URL((tuple.next().replaceAll("\"", "")).trim());
            HttpURLConnection con = (HttpURLConnection) formatURL.openConnection();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
                rddList.add(br.readLine());
            } catch (IOException ex) {
                return rddList;
            }
        }
        return iterable;
    }
});
Here, too, we are doing the same thing for each record, aren't we?
Currently you are using the map function, which creates a URL request for each row in the partition. You can use mapPartitions instead, which will make the code run faster, as it lets you create the connection to the server only once, that is, one connection per partition.
A big cost here is setting up TCP/HTTPS connections. This is exacerbated by the fact that, even if you only read the first (short) line of a large file, modern HTTP clients try to read() to the end of the file in an attempt to re-use HTTP/1.1 connections, thus avoiding aborting the connection. That is a good strategy for small files, but not for those in the MB range.
There is a solution: set the content length on the read, so that only a smaller block is read in, reducing the cost of the close(); the connection recycling then reduces the HTTPS setup costs. This is what the latest Hadoop/Spark S3A client does if you set fadvise=random on the connection: it requests blocks rather than the entire multi-GB file. Be aware though: that design is actually really bad if you are going byte-by-byte through a file.
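Outside of Spark, the same block-read idea is just an HTTP Range request. A minimal C++ sketch with libcurl (the function name and byte count are mine; error handling omitted):

#include <curl/curl.h>
#include <string>

// libcurl write callback: append the received bytes to a std::string.
static size_t appendTo(void *buf, size_t sz, size_t nm, void *out) {
    static_cast<std::string *>(out)->append(static_cast<char *>(buf), sz * nm);
    return sz * nm;
}

// Fetch only the first n bytes of url instead of the whole object, so the
// connection can be drained and reused cheaply.
std::string fetchPrefix(const std::string &url, long n) {
    std::string body;
    CURL *curl = curl_easy_init();
    if (!curl) return body;
    std::string range = "0-" + std::to_string(n - 1);
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_RANGE, range.c_str());  // ignored if the server lacks range support
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, appendTo);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return body;
}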

C++ Legacy Driver mongoDB Replicaset in Class of a DLL

I have built a DLL that includes a class implementing MongoDB replica-set operations. Here is a summary of the class.
#include "mongo/client/dbclient.h"
mongoimp::mongoimp() {
mongo::client::initialize();
}
mongoimp::~mongoimp() {
mongo::client::shutdown();
}
int mongoimp::queryTRecords() {
string errmsg;
vector<mongo::HostAndPort> hosts = { mongo::HostAndPort("xx-a0.yyyy.com:xxxxx"), mongo::HostAndPort("xx-a1.yyyy.com:xxxxx") };
static mongo::DBClientReplicaSet con("xx", hosts, 0);
con.connect();
con.auth("dbname", "username", "password", errmsg);
auto_ptr<DBClientCursor> cursor = con.query("dbname.t", BSONObj());
BSONObj response;
con.logout("xx", response);
if (cursor->more()) {
BSONObj recordnm = cursor->nextSafe();
return(recordnm.getIntField("lastid"));
} else return(-1);
}
The above code is working, but here are my questions:
1) With the above setting I can do normal MongoDB operations with the DLL, but since my application needs to update MongoDB data constantly (close to real time, up to hundreds of updates a second), I am getting an error ("No valid replicaset instance servers found") when updating data.
2) Only the server needs to talk to the MongoDB database, so basically I just need one connection to the database. I therefore want to declare the mongo::DBClientReplicaSet con as a static global variable and connect to it in the class constructor, but it seems I cannot do that: my application cannot run at all, and I constantly get the following error.
Assertion failed: px != 0, file C:\Boost\include\boost-1_62\boost/smart_ptr/scoped_ptr.hpp, line 105
Does anybody know how to solve the problem?
Below is the code I tried:
static mongo::DBClientReplicaSet con("xx", { mongo::HostAndPort("xx-a0.yyyy.com:xxxxx"), mongo::HostAndPort("xx-a1.yyyy.com:xxxxx") }, 0);

mongoimp::mongoimp() {
    mongo::client::initialize();
    string errmsg;
    con.connect();
    con.auth("dbname", "username", "password", errmsg);
}

mongoimp::~mongoimp() {
    BSONObj response;
    con.logout("xx", response);
    mongo::client::shutdown();
}

int mongoimp::queryTRecords() {
    auto_ptr<DBClientCursor> cursor = con.query("dbname.t", BSONObj());
    if (cursor->more()) {
        BSONObj recordnm = cursor->nextSafe();
        return recordnm.getIntField("lastid");
    } else
        return -1;
}
3) Last question: I noticed there is a "mongo/client/dbclient_rs.h" file for replica sets, but it seems I cannot use it; with it, I get errors for initialize() and the auto_ptr cursor. How can I use this file and take full advantage of the replica-set features? How do I initialize the replica set if I can use "dbclient_rs.h", and how do I query and fetch data in that case?
Thanks a lot in advance!
For question No. 2, I remembered the reason for the error:
You need to call mongo::client::initialize before you construct any driver objects, or BSON for that matter.
But I still need a solution for how to make that global definition possible.
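One way to get the effect of a global connection while respecting that initialization order is to construct it lazily on first use. A minimal sketch against the legacy-driver calls shown above (hosts and credentials are the question's placeholders; error handling omitted):

#include <string>
#include <vector>
#include "mongo/client/dbclient.h"

namespace {
    mongo::DBClientReplicaSet *makeConnection() {
        mongo::client::initialize();  // guaranteed to run before the driver object below
        std::vector<mongo::HostAndPort> hosts = {
            mongo::HostAndPort("xx-a0.yyyy.com:xxxxx"),
            mongo::HostAndPort("xx-a1.yyyy.com:xxxxx")
        };
        mongo::DBClientReplicaSet *c = new mongo::DBClientReplicaSet("xx", hosts, 0);
        std::string errmsg;
        c->connect();
        c->auth("dbname", "username", "password", errmsg);
        return c;
    }
}

// Every caller shares one connection; the first call performs driver init.
mongo::DBClientReplicaSet &sharedConnection() {
    static mongo::DBClientReplicaSet *con = makeConnection();
    return *con;
}

queryTRecords() and the update path can then use sharedConnection() instead of constructing a connection per call.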

Some Problems of Indy 10 IdHTTP Implementation

Regarding Indy 10's TIdHTTP, many things have been running perfectly, but a few things don't work so well here. That is why, once again, I need your help.
The Download button has been working perfectly. I'm using the following code:
void __fastcall TForm1::DownloadClick(TObject *Sender)
{
    MyFile = SaveDialog->FileName;
    TFileStream* Fist = new TFileStream(MyFile, fmCreate | fmShareDenyNone);
    Download->Enabled = false;
    Urlz = Edit1->Text;
    Url->Caption = Urlz;
    try
    {
        IdHTTP->Get(Edit1->Text, Fist);
        IdHTTP->Connected();
        IdHTTP->Response->ResponseCode = 200;
        IdHTTP->ReadTimeout = 70000;
        IdHTTP->ConnectTimeout = 70000;
        IdHTTP->ReuseSocket;
        Fist->Position = 0;
    }
    __finally
    {
        delete Fist;
        Form1->Updated();
    }
}
However, the "Cancel Resume" button still can't resume interrupted downloads. That is, the server sends back the entire file every time I call Get(), even though I've used the IdHTTP->Request->Ranges property.
I use the following code:
void __fastcall TForm1::CancelResumeClick(TObject *Sender)
{
    MyFile = SaveDialog->FileName;
    TFileStream* TFist = new TFileStream(MyFile, fmCreate | fmShareDenyNone);
    if (IdHTTP->Connected() == true)
    {
        IdHTTP->Disconnect();
        CancelResume->Caption = "RESUME";
        IdHTTP->Response->AcceptRanges = "Bytes";
    }
    else
    {
        try {
            CancelResume->Caption = "CANCEL";
            // IdHTTP->Request->Ranges == "0-100";
            // IdHTTP->Request->Range = Format("bytes=%d-", ARRAYOFCONST((TFist->Position)));
            IdHTTP->Request->Ranges->Add()->StartPos = TFist->Position;
            IdHTTP->Get(Edit1->Text, TFist);
            IdHTTP->Request->Referer = Edit1->Text;
            IdHTTP->ConnectTimeout = 70000;
            IdHTTP->ReadTimeout = 70000;
        }
        __finally {
            delete TFist;
        }
    }
}
Meanwhile, the FormatBytes function, found here, has been able to show only the size of the downloaded file; I'm still unable to determine the download/transfer speed. I'm using the following code:
void __fastcall TForm1::IdHTTPWork(TObject *ASender, TWorkMode AWorkMode, __int64 AWorkCount)
{
    __int64 Romeo = 0;
    Romeo = IdHTTP->Response->ContentStream->Position;
    // Romeo = AWorkCount;
    Download->Caption = FormatBytes(Romeo) + " (" + IntToStr(Romeo) + " Bytes)";
    ForSpeed->Caption = FormatBytes(Romeo);
    ProgressBar->Position = AWorkCount;
    ProgressBar->Update();
    Form1->Updated();
}
Please advise and give an example. Any help would be appreciated!
In your DownloadClick() method:
Calling Connected() is useless, since you don't do anything with the result. Nor is there any guarantee that the connection will remain connected, as the server could send a Connection: close response header. I don't see anything in your code that is asking for HTTP keep-alives. Let TIdHTTP manage the connection for you.
You are forcing the Response->ResponseCode to 200. Don't do that. Respect the response code that the server actually sent. The fact that no exception was raised means the response was successful whether it is 200 or 206.
You are reading the ReuseSocket property value and ignoring it.
There is no need to reset the Fist->Position property to 0 before closing the file.
Now, with that said, your CancelResumeClick() method has many issues.
You are using the fmCreate flag when opening the file. If the file already exists, you will overwrite it from scratch, thus TFist->Position will ALWAYS be 0. Use fmOpenReadWrite instead so an existing file will open as-is. And then you have to seek to the end of the file to provide the correct Position to the Ranges header.
You are relying on the socket's Connected() state to make decisions. DO NOT do that. The connection may be gone after the previous response, or may have timed out and been closed before the new request is made. The file can still be resumed either way. HTTP is stateless. It does not matter if the socket remains open between requests, or is closed in between. Every request is self-contained. Use information provided in the previous response to govern the next request. Not the socket state.
You are modifying the value of the Response->AcceptRanges property, instead of using the value provided by the previous response. The server tells you whether the file supports resuming, so you have to remember that value, or query it again before attempting to resume the download.
When you actually call Get(), the server may or may not respect the requested Range, depending on whether the requested file supports byte ranges or not. If the server responds with a response code of 206, the requested range is accepted, and the server sends ONLY the requested bytes, so you need to APPEND them to your existing file. However, if the server responds with a response code of 200, the server is sending the entire file from scratch, so you need to REPLACE your existing file with the new bytes. You are not taking that into account.
In your IdHTTPWork() method, in order to calculate the download/transfer speed, you have to keep track of how many bytes are actually being transferred in between each event firing. When the event is fired, save the current AWorkCount and tick count, and then the next time the event is fired, you can compare the new AWorkCount and current ticks to know how much time has elapsed and how many bytes were transferred. From those values, you can calculate the speed, and even the estimated time remaining.
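As a rough sketch of that bookkeeping (variable names are mine, and the statics would need to be reset in OnWorkBegin before each new download):

void __fastcall TForm1::IdHTTPWork(TObject *ASender, TWorkMode AWorkMode, __int64 AWorkCount)
{
    static __int64 LastCount = 0;
    static DWORD LastTicks = 0;

    DWORD Ticks = GetTickCount();
    if (LastTicks != 0)
    {
        DWORD ElapsedMs = Ticks - LastTicks;
        __int64 Bytes = AWorkCount - LastCount;  // bytes moved since the last event
        if (ElapsedMs > 0)
        {
            double BytesPerSec = (Bytes * 1000.0) / ElapsedMs;
            ForSpeed->Caption = FormatBytes((__int64)BytesPerSec) + "/s";
        }
    }
    LastTicks = Ticks;
    LastCount = AWorkCount;
}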
As for your progress bar, you can't use AWorkCount alone to calculate a new position. That only works if you set the progress bar's Max to AWorkCountMax in the OnWorkBegin event, and that value is not always known before a download begins. You need to take into account the size of the file being downloaded, whether it is being downloaded fresh or being resumed, how many bytes are being requested during a resume, etc. So there is a lot more work involved in displaying a progress bar for an HTTP download.
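For the simple case where the server reports the total size, a hedged sketch of the OnWorkBegin side might look like this (a resumed download would additionally need to fold in the starting offset):

void __fastcall TForm1::IdHTTPWorkBegin(TObject *ASender, TWorkMode AWorkMode, __int64 AWorkCountMax)
{
    if (AWorkMode != wmRead) return;
    ProgressBar->Position = 0;
    // AWorkCountMax is 0 when the server does not report a size.
    ProgressBar->Max = (AWorkCountMax > 0) ? (int)AWorkCountMax : 0;
}

with ProgressBar->Position = (int)AWorkCount; in OnWork.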
Now, to answer your two questions:
How to retrieve and save the download file to a disk by using its original name?
It is provided by the server in the filename parameter of the Content-Disposition header, and/or in the name parameter of the Content-Type header. If neither value is provided by the server, you can use the filename that is in the URL you are requesting. TIdHTTP has a URL property that provides the parsed version of the last requested URL.
However, since you are creating the file locally before sending your download request, you will have to create the local file with a temp filename, and then rename it after the download is complete. Alternatively, use TIdHTTP.Head() to determine the real filename (you can also use it to determine whether resuming is supported) before creating the local file with that filename, then use TIdHTTP.Get() to download to that local file. Or download the file to memory using TMemoryStream instead of TFileStream, and then save it with the desired filename when complete.
When I click http://get.videolan.org/vlc/2.2.1/win32/vlc-2.2.1-win32.exe, the server will send the request on to its actual URL, http://mirror.vodien.com/videolan/vlc/2.2.1/win32/vlc-2.2.1-win32.exe. The problem is that IdHTTP will not automatically follow it.
That is because VideoLan is not using an HTTP redirect to send clients to the real URL (TIdHTTP supports HTTP redirects). VideoLan is using an HTML redirect instead (TIdHTTP does not support HTML redirects). When a webbrowser downloads the first URL, a 5 second countdown timer is displayed before the real download then begins. As such, you will have to manually detect that the server is sending you an HTML page instead of the real file (look at the TIdHTTP.Response.ContentType property for that), parse the HTML to determine the real URL, and then download it. This also means that you cannot download the first URL directly into your target local file, otherwise you will corrupt it, especially during a resume. You have to cache the server's response first, either to a temp file or to memory, so you can analyze it before deciding how to act on it. It also means you have to remember the real URL for resuming, you cannot resume the download using the original countdown URL.
Try something more like the following instead. It does not take into account for everything mentioned above (particularly speed/progress tracking, HTML redirects, etc), but should get you a little closer:
void __fastcall TForm1::DownloadClick(TObject *Sender)
{
    Urlz = Edit1->Text;
    Url->Caption = Urlz;

    IdHTTP->Head(Urlz);
    String FileName = IdHTTP->Response->RawHeaders->Params["Content-Disposition"]["filename"];
    if (FileName.IsEmpty())
    {
        FileName = IdHTTP->Response->RawHeaders->Params["Content-Type"]["name"];
        if (FileName.IsEmpty())
            FileName = IdHTTP->URL->Document;
    }

    SaveDialog->FileName = FileName;
    if (!SaveDialog->Execute()) return;

    MyFile = SaveDialog->FileName;
    TFileStream* Fist = new TFileStream(MyFile, fmCreate | fmShareDenyWrite);
    try
    {
        try
        {
            Download->Enabled = false;
            Resume->Enabled = false;
            IdHTTP->Request->Clear();
            //...
            IdHTTP->ReadTimeout = 70000;
            IdHTTP->ConnectTimeout = 70000;
            IdHTTP->Get(Urlz, Fist);
        }
        __finally
        {
            delete Fist;
            Download->Enabled = true;
            Updated();
        }
    }
    catch (const EIdHTTPProtocolException &)
    {
        DeleteFile(MyFile);
        throw;
    }
}

void __fastcall TForm1::ResumeClick(TObject *Sender)
{
    TFileStream* Fist = new TFileStream(MyFile, fmOpenReadWrite | fmShareDenyWrite);
    try
    {
        Download->Enabled = false;
        Resume->Enabled = false;
        IdHTTP->Request->Clear();
        //...
        Fist->Seek(0, soEnd);
        IdHTTP->Request->Ranges->Add()->StartPos = Fist->Position;
        IdHTTP->Request->Referer = Edit1->Text;
        IdHTTP->ConnectTimeout = 70000;
        IdHTTP->ReadTimeout = 70000;
        IdHTTP->Get(Urlz, Fist);
    }
    __finally
    {
        delete Fist;
        Download->Enabled = true;
        Updated();
    }
}

void __fastcall TForm1::IdHTTPHeadersAvailable(TObject *Sender, TIdHeaderList *AHeaders, bool &VContinue)
{
    Resume->Enabled = ( ((IdHTTP->Response->ResponseCode == 200) || (IdHTTP->Response->ResponseCode == 206)) && TextIsSame(AHeaders->Values["Accept-Ranges"], "bytes") );
    if ((IdHTTP->Response->ContentStream) && (IdHTTP->Request->Ranges->Count > 0) && (IdHTTP->Response->ResponseCode == 200))
        IdHTTP->Response->ContentStream->Size = 0;
}
@Romeo:
Also, you can try the following function to determine the real download filename.
I've translated this to C++ based on RRUZ's function. So far so good; I'm using it in my simple IdHTTP download program, too.
But this translation could of course still use improvement from Remy Lebeau, RRUZ, or any other master here.
String __fastcall GetRemoteFileName(const String URI)
{
    String result;
    try
    {
        TIdHTTP* HTTP = new TIdHTTP(NULL);
        try
        {
            HTTP->Head(URI);
            result = HTTP->Response->RawHeaders->Params["Content-Disposition"]["filename"];
            if (result.IsEmpty())
            {
                result = HTTP->Response->RawHeaders->Params["Content-Type"]["name"];
                if (result.IsEmpty())
                    result = HTTP->URL->Document;
            }
        }
        __finally
        {
            delete HTTP;
        }
    }
    catch (const Exception &ex)
    {
        ShowMessage(const_cast<Exception&>(ex).ToString());
    }
    return result;
}

Design puzzle about database connectivity architecture

I am designing a database browsing application which until now had only MySQL support; recently I started implementing SQLite support too, and I face some ugliness in the design of the connectivity architecture. This is only about the "connection" part (i.e., where you get the user/db/host, or for SQLite the filename), not the database functionality; that is sorted out already.
I have a base class "Connection" which exposes "normal" methods like name(), as well as pure virtual methods like virtual string fullLocation() = 0, which returns a string that can be used to identify the database (such as database#host for MySQL, or /etc/mydb.sqlite for SQLite).
Now, the user of course needs to specify a database he wants to connect to, so in the GUI of the application he simply chooses the type and then fills in the credentials. And here my troubles start. I have created MySqlConnection and SqliteConnection classes, both derived from Connection, but in most cases I end up with something like:
Connection* c = 0;
if (gui->engine_name() == "MYSQL")
{
    string host = gui->getHost();
    string user = gui->getUser();
    string password = gui->getPassword();
    int port = gui->getPort();
    string db = gui->getDatabase();
    c = new MySqlConnection(host, user, password, db, port);
}
else
{
    string dbFile = gui->getSqliteDbFile();
    c = new SqliteConnection(dbFile);
}
string meta = application->use_connection(&c);
and I fear that this will continue throughout the entire application, due to the very different natures of these two database engines.
Do you have some guidance on how to solve this in an elegant way?
You need the Factory pattern, which will create the Connection for you in an abstract way. That is the superficial answer.
It would be nice to parametrize this Factory with the Builder pattern. Something like this:
ParamBuilder *b = new ParamBuilder;
if (gui->engine_name() == "MYSQL")
{
    b->setHost(gui->getHost())
     ->setUser(gui->getUser())
     ->setPassword(gui->getPassword());
    ...
}
else
{
    b->setFile(gui->getSqliteDbFile());
}
Connection *c = globalConnectionFactory->createConnection(b);
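For concreteness, here is a minimal sketch of what such a builder might look like, folding the factory decision into a build() method (everything here is hypothetical scaffolding around the question's Connection classes):

#include <memory>
#include <string>

// Hypothetical builder: collects whichever parameters the GUI provides,
// then decides which concrete Connection to construct.
class ParamBuilder {
public:
    ParamBuilder *setHost(const std::string &h) { host_ = h; return this; }
    ParamBuilder *setUser(const std::string &u) { user_ = u; return this; }
    ParamBuilder *setPassword(const std::string &p) { password_ = p; return this; }
    ParamBuilder *setDatabase(const std::string &d) { db_ = d; return this; }
    ParamBuilder *setPort(int p) { port_ = p; return this; }
    ParamBuilder *setFile(const std::string &f) { file_ = f; return this; }

    // A set filename implies SQLite; otherwise build a MySQL connection.
    std::unique_ptr<Connection> build() const {
        if (!file_.empty())
            return std::unique_ptr<Connection>(new SqliteConnection(file_));
        return std::unique_ptr<Connection>(
            new MySqlConnection(host_, user_, password_, db_, port_));
    }

private:
    std::string host_, user_, password_, db_, file_;
    int port_ = 3306;  // assumed MySQL default
};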
A much more elegant way would be designing a factory class and handling GUI inputs in the GenerateConnection() method of that factory:
void ConnectionFactory::GenerateConnection(Connection*& c)  // pointer by reference, so the caller sees the new object
{
    if (gui->engine_name() == "MYSQL")
    {
        string host = gui->getHost();
        string user = gui->getUser();
        string password = gui->getPassword();
        int port = gui->getPort();
        string db = gui->getDatabase();
        c = new MySqlConnection(host, user, password, db, port);
    }
    else
    {
        string dbFile = gui->getSqliteDbFile();
        c = new SqliteConnection(dbFile);
    }
}
If you prefer not to have a dependency on the GUI, you can define a struct named Parameters, fill an instance of it from the GUI inputs, and pass that object to the factory's connection-generating method.
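A minimal sketch of that variant, returning the new connection instead of filling an out-parameter (the field names are hypothetical):

#include <string>

// Plain data carrier filled in by the GUI layer; the factory never touches the GUI.
struct Parameters {
    std::string engine;                         // "MYSQL" or "SQLITE"
    std::string host, user, password, database;
    std::string dbFile;                         // used only for SQLite
    int port;
};

Connection *ConnectionFactory::GenerateConnection(const Parameters &p)
{
    if (p.engine == "MYSQL")
        return new MySqlConnection(p.host, p.user, p.password, p.database, p.port);
    return new SqliteConnection(p.dbFile);
}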

How Can I Call PL/pgSQL Function From C++ Code

I am trying to call a function that is declared in PostgreSQL with PL/pgSQL. For that I wrote the code below. My function works, but afterwards I get a "PGRES_FATAL_ERROR". Also, when I replace "select removestopwords()" with an SQL query like "DELETE * FROM TABLE1", it works successfully.
I am concerned that this error could cause big problems in the future, even if things work now. How can I call a PL/pgSQL function without getting an error?
void removeStopWordsDB(PGconn* conn) {
    PGresult *res = PQexec(conn, "select removestopwords()");
    if (PQresultStatus(res) != PGRES_COMMAND_OK) {
        printf("removestopwords failed");
        cout << PQresultStatus(res);
        PQclear(res);
        exit_nicely(conn);
    }
    printf("removestopwords - OK\n");
    PQclear(res);
}
If you get PGRES_FATAL_ERROR from PQresultStatus, you should use PQresultErrorField to get all the error data from the result set and provide a useful error message. This will allow you to determine what the actual error is here (quite likely an error being sent over from the server).
Consider creating a class to hold the PostgreSQL error details that can be constructed from a PGresult pointer, e.g.:
#include <string>
#include <libpq-fe.h>

struct PgError
{
    std::string severity;
    std::string sqlstate;
    std::string primary;

    PgError(const PGresult *rs)
    {
        severity = GetErrorField(rs, PG_DIAG_SEVERITY);
        sqlstate = GetErrorField(rs, PG_DIAG_SQLSTATE);
        primary = GetErrorField(rs, PG_DIAG_MESSAGE_PRIMARY);
        // ...
    }

    static std::string GetErrorField(const PGresult *rs, int fieldCode)
    {
        const char *message = PQresultErrorField(rs, fieldCode);
        if (message == NULL) return "";
        return std::string(message);
    }
};
Then you can, for example, encapsulate dumping the error out to a stream in this object, to provide details just like psql and friends do (although strictly speaking, you'd need the input SQL as well for all of that).
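A hedged sketch of that stream output, using the members from the PgError snippet above (the format is mine, not a libpq convention):

#include <ostream>

std::ostream &operator<<(std::ostream &os, const PgError &e)
{
    // e.g. "ERROR (42883): function removestopwords() does not exist"
    return os << e.severity << " (" << e.sqlstate << "): " << e.primary;
}

// Usage inside removeStopWordsDB():
//   if (PQresultStatus(res) == PGRES_FATAL_ERROR)
//       std::cerr << PgError(res) << std::endl;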
The PostgreSQL API doesn't support a flag like "ignore all errors". If you want to ignore the result, just don't check it in the host environment, but that is a bad strategy.