X DevAPI batch insert extremely slow - C++

I'm using the MySQL Connector/C++ with the X DevAPI.
On my test server (my machine), doing single inserts in a loop is fairly slow (about 1,000 per second) on a basic table with a few columns. It has a unique index on a char(40) field, which is possibly the cause of the slowness. But since the DB is configured in developer mode, I guess this should be expected.
I wanted to improve this by doing batch inserts. The problem is that batching is even slower (about 20 per second). The execute() itself is quite fast, but the .values() calls are extremely slow. The code looks something like this:
try
{
    mysqlx::TableInsert MyInsert = m_DBRegisterConnection->GetSchema()->getTable("MyTable").insert("UniqueID", "This", "AndThat");
    for (int i = 0; i < ToBeInserted; i++)
    {
        MyInsert = MyInsert.values(m_MyQueue.getAt(i)->InsertValues[0],
                                   m_MyQueue.getAt(i)->InsertValues[1],
                                   m_MyQueue.getAt(i)->InsertValues[2]);
    }
    MyInsert.execute();
}
catch (std::exception& e)
{
    // error handling omitted
}
Here is the table create:
CREATE TABLE `players` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `UniqueID` char(32) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  `PlayerID` varchar(500) DEFAULT NULL,
  `Email` varchar(255) DEFAULT NULL,
  `Password` varchar(63) DEFAULT NULL,
  `CodeEmailValidation` int DEFAULT NULL,
  `CodeDateGenerated` datetime DEFAULT NULL,
  `LastLogin` datetime NOT NULL,
  `Validated` tinyint DEFAULT '0',
  `DateCreated` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UniqueID_UNIQUE` (`UniqueID`)
) ENGINE=InnoDB AUTO_INCREMENT=21124342 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Any clue why this is much slower? Is there a better way to do a batch insert?

The issue is in your code.
MyInsert = MyInsert.values(m_MyQueue.getAt(i)->InsertValues[0],
                           m_MyQueue.getAt(i)->InsertValues[1],
                           m_MyQueue.getAt(i)->InsertValues[2]);
You are copying the MyInsert object into a temporary and destroying it, over and over again.
It should be just:
MyInsert.values(m_MyQueue.getAt(i)->InsertValues[0],
                m_MyQueue.getAt(i)->InsertValues[1],
                m_MyQueue.getAt(i)->InsertValues[2]);
However, since this could be prevented in the connector code, I'll report a bug to fix the copy behavior.

INSERT up to 1000 rows in a single INSERT statement. That will run about 10 times as fast.
Is that CHAR(40) some form of UUID or hash? If so, would it be possible to sort the data before inserting? That may help it run faster. However, please provide SHOW CREATE TABLE so I can discuss this aspect further; I really need to see all the indexes and datatypes.
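For reference, a multi-row INSERT against the table above would look like this (a sketch; the values are placeholders and only the NOT NULL columns are listed):
INSERT INTO players (UniqueID, LastLogin, DateCreated) VALUES
  ('uid-000001', NOW(), NOW()),
  ('uid-000002', NOW(), NOW()),
  ('uid-000003', NOW(), NOW());
Building one such statement per 1000-row chunk amortizes the per-statement overhead and the network round trip.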

Is it a good idea to have an index on a boolean?

I have a table with a boolean field, IsNew, that indicates whether or not the corresponding entity is new. I want to periodically query for all entities in a particular state. What are the implications of having an index on a boolean (or an enum)? Will it create a hotspot? Are there any limitations on QPS?
A secondary index is implemented internally as a table whose primary key is based on the declared secondary index key, plus whatever keys of the indexed table weren't explicitly mentioned in the secondary index. So, say you have a table like this:
CREATE TABLE UserThings (
  UserId INT64 NOT NULL,
  ThingId INT64 NOT NULL,
  ...
  IsNew BOOL NOT NULL,
  ...
) PRIMARY KEY(UserId, ThingId), ...
And you create an index like this:
CREATE INDEX UserThingsByIsNew ON UserThings(IsNew, ThingId)
That'll create an internal table that looks something like this:
CREATE TABLE UserThingsByIsNew_Index (
  IsNew BOOL,
  ThingId INT64 NOT NULL,
  UserId INT64 NOT NULL,
) PRIMARY KEY(IsNew, ThingId, UserId), ...
So, when you update rows of UserThings to change the value of the IsNew column, it will delete the old row in UserThingsByIsNew_Index and insert a new one. This will tend to create a lot of churn in the index if the IsNew value of rows changes at a high frequency. This might not be a problem at all, but you will only really know by testing your scenario under a real-world workload for a sustained time.
If you don't update the IsNew field of entities too frequently, then you probably won't have any hot-spotting problems. That's why I mentioned earlier that Cloud Spanner also appends the original table's keys to the keys of the index: assuming that your original table rows are well-distributed by the table's keys, the portions of the index for IsNew=true and IsNew=false will each have a similar distribution and shouldn't cause a hotspot.
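To make the read side concrete, fetching all new things through the index might look like this (a sketch in Cloud Spanner SQL; the index hint forces use of the secondary index):
SELECT UserId, ThingId
FROM UserThings@{FORCE_INDEX=UserThingsByIsNew}
WHERE IsNew = true;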

DynamoDB - scan items where map contains a key

I have a table that contains a field (not a key field) called appsMap, and it looks like this:
appsMap = { "qa-app": "abc", "another-app": "xyz" }
I want to scan all rows whose appsMap contains the key "qa-app" (the value is not important, just the key). I tried something like this, but it doesn't work the way I need:
FilterExpression = '#appsMap.#app <> :v',
ExpressionAttributeNames = {
    "#app": "qa-app",
    "#appsMap": "appsMap"
},
ExpressionAttributeValues = {
    ":v": { "NULL": True }
},
ProjectionExpression = "deviceID"
What's the correct syntax?
Thanks.
There is a discussion on the subject here:
https://forums.aws.amazon.com/thread.jspa?threadID=164470
You might be missing this part from the example:
ExpressionAttributeValues: {":name":{"S":"Jeff"}}
However, I just wanted to echo what was already said: scan is an expensive operation that goes through every item, and it thus makes your database hard to scale.
Unlike with other databases, you have to do plenty of setup with DynamoDB in order to get it to perform at its best. Here is a suggestion:
1) Promote this into a root-level attribute, for example add qaExist to the root, with possible values of 0|1 or true|false.
2) Create a secondary index on the newly created attribute.
3) Query the new index, specifying the value you are looking for as the search parameter.
This will make your system very fast and very scalable regardless of how many records you get in there later on.
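A sketch of steps 2) and 3) with boto3 (the attribute qaExist, the index qaExist-index, and the table name are all hypothetical):
from boto3.dynamodb.conditions import Key
import boto3

# connect to the table (name is hypothetical)
table = boto3.resource("dynamodb").Table("devices")

# query the GSI instead of scanning the whole table
response = table.query(
    IndexName="qaExist-index",                    # hypothetical GSI name
    KeyConditionExpression=Key("qaExist").eq(1),  # 1 = "qa-app" is present
    ProjectionExpression="deviceID",
)
items = response["Items"]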
If I understand the question correctly, you can do the following:
FilterExpression = 'attribute_exists(#0.#1)',
ExpressionAttributeNames = {
    "#0": "appsMap",
    "#1": "qa-app"
},
ProjectionExpression = "deviceID"
Since you're being a bit vague about your expectations and what's happening ("I tried something like this but it doesn't work in the way I need"), I'd like to mention that a scan with a filter is very different from a query.
Filters are applied on the server, but only after the scan request is executed: it still iterates over all the data in your table, and instead of returning every item, it applies the filter to each response. That saves you some network bandwidth, but it can return empty pages as you page through your entire table.
You could look into creating a GSI (global secondary index) on the table if this is a query you expect to run often.

Trying to add a foreign key in MySQL with HeidiSQL

I've been trying to add a foreign key to my table using HeidiSQL and I keep getting error 1452.
After reading around, I made sure all my tables were running on InnoDB and checked that the columns had the same datatype. The only way I can add my key is if I drop all my data, which I don't intend to do, since I have spent quite a few hours on this.
Here is my table creation code:
CREATE TABLE `data` (
  `ID` INT(10) NOT NULL AUTO_INCREMENT,
  #bunch of random other columns stripped out
  `Ability_1` SMALLINT(5) UNSIGNED NOT NULL DEFAULT '0',
  #more stripped columns
  `Extra_Info` SET('1','2','3','Final','Legendary') NOT NULL DEFAULT '1' COLLATE 'utf8_unicode_ci',
  PRIMARY KEY (`ID`),
  UNIQUE INDEX `ID` (`ID`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB
AUTO_INCREMENT=650;
Here is table 2:
CREATE TABLE `ability` (
  `ability_ID` SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
  #stripped columns
  `Name_English` VARCHAR(12) NOT NULL COLLATE 'utf8_unicode_ci',
  PRIMARY KEY (`ability_ID`),
  UNIQUE INDEX `ability_ID` (`ability_ID`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB
AUTO_INCREMENT=165;
Finally, here is the ALTER TABLE statement along with the error:
ALTER TABLE `data`
ADD CONSTRAINT `Ability_1` FOREIGN KEY (`Ability_1`) REFERENCES `ability` (`ability_ID`) ON UPDATE CASCADE ON DELETE CASCADE;
/* SQL Error (1452): Cannot add or update a child row: a foreign key constraint fails (`check`.`#sql-ec0_2`, CONSTRAINT `Ability_1` FOREIGN KEY (`Ability_1`) REFERENCES `ability` (`ability_ID`) ON DELETE CASCADE ON UPDATE CASCADE) */
If there is anything else I can provide, please let me know; this is really bothering me. I'm using MySQL Community Server 5.5.27 (GPL), which came with the XAMPP installer.
If you are using HeidiSQL, it is pretty easy.
Just see the image: click on +Add to add foreign keys.
I prefer the GUI way of creating tables and their attributes because it saves time and reduces errors.
I found it. Sorry everyone. The problem was that I had 0 as the default value for my fields, while the referenced table had no row with the ID 0, so the existing rows violated the constraint.
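Before the constraint can be added, every Ability_1 value in data must have a matching row in ability. A sketch of two possible fixes (the replacement value 1 is just an example and assumes such an ability row exists):
-- point the orphaned rows at an existing parent row
UPDATE `data` SET `Ability_1` = 1 WHERE `Ability_1` = 0;

-- or insert a "none" parent with ID 0; MySQL needs this SQL mode
-- to allow storing an explicit 0 in an AUTO_INCREMENT column
SET SESSION sql_mode = CONCAT(@@sql_mode, ',NO_AUTO_VALUE_ON_ZERO');
INSERT INTO `ability` (`ability_ID`, `Name_English`) VALUES (0, 'None');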
Here is how you can do it:
Create your primary keys. For me this was straightforward, so I won't post how to do that here.
To create your FOREIGN KEYs you need to change the table/engine type for each table from MyISAM to InnoDB. To do this, select the table on the right-hand side, then select the Options tab on the right-hand side, and change the engine from MyISAM to InnoDB for every table.
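If you prefer to script that engine change, the SQL equivalent is simply:
ALTER TABLE `data` ENGINE=InnoDB;
ALTER TABLE `ability` ENGINE=InnoDB;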

SQLite UPDATE 100ms

I'm using the Qt database abstraction layer to interface with SQLite 3.
int x = GetTickCount();
database.exec("UPDATE controls SET dtype=32 WHERE id=2");
qDebug() << GetTickCount()-x;
The table is:
CREATE TABLE controls (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  internal_id TEXT,
  name TEXT COLLATE NOCASE,
  config TEXT,
  dtype INTEGER,
  dconfig TEXT,
  val TEXT,
  device_id INTEGER REFERENCES devices(id) ON DELETE CASCADE
);
This results in an update time of ~100 ms, even though nothing else is accessing the db and there is a grand total of 3 records in that table.
That seems ridiculously long to me; 10 records would already take a second to complete. Is this the performance I should expect from SQLite, or is something messing me up somewhere? SELECT queries are fast enough, ~1 ms.
Edit 1
So it's not Qt.
sqlite3 *db;
if (sqlite3_open("example.db", &db) != SQLITE_OK)
{
    qDebug() << "Could not open";
    return;
}
int x = GetTickCount();
sqlite3_exec(db, "UPDATE controls SET dtype=3 WHERE id=2", 0, 0, 0);
qDebug() << "Took" << GetTickCount() - x;
sqlite3_close(db);
This takes just the same amount of time.
Accessing the hard disk is what takes so long: by default, every statement runs in its own transaction, and SQLite waits until the data is safely flushed to disk before returning.
Try one of these:
PRAGMA journal_mode = memory;
PRAGMA synchronous = off;
With those set, SQLite will not touch the disk immediately (note that both trade crash-safety for speed).
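Applied through the same C API as in the question, that is (a sketch, error checking omitted):
sqlite3_exec(db, "PRAGMA journal_mode = memory", 0, 0, 0);
sqlite3_exec(db, "PRAGMA synchronous = off", 0, 0, 0);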
SQLite can be very fast when it is used well. The SELECT statements for your small database are answered from the cache.
There are other ways to tweak your database. See other questions like this:
Improve INSERT-per-second performance of SQLite?
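If you need durability, the usual alternative is to batch many statements into one transaction, so the disk is flushed once per batch instead of once per statement. A sketch using the C API from the question:
// one disk flush for the whole batch instead of one per UPDATE
sqlite3_exec(db, "BEGIN", 0, 0, 0);
for (int i = 0; i < 100; i++)
    sqlite3_exec(db, "UPDATE controls SET dtype=3 WHERE id=2", 0, 0, 0);
sqlite3_exec(db, "COMMIT", 0, 0, 0);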

Multiple accesses to static data in a Django app

I'm building an application and I'm having trouble choosing the best way to access static data many times in a Django app. My experience in the field is close to zero, so I could use some help.
The app basically consists of drag & drop of foods. When you drag a food to a particular place (breakfast, for example), different values get updated: total breakfast calories, total daily nutrients (micro/macro), total daily calories, and so on. That's why I think the way I store and access the data is pretty important, performance-wise.
This is an excerpt of the json file I'm currently using:
foods.json
{
    "112": {
        "type": "Vegetables",
        "description": "Mushrooms",
        "nutrients": {
            "Niacin": {
                "unit": "mg",
                "group": "Vitamins",
                "value": 3.79
            },
            "Lysine": {
                "units": "g",
                "group": "Amino Acids",
                "value": 0.123
            }
            ... (+40 nutrients)
        },
        "amount": 1,
        "unit": "cup whole",
        "grams": 87.0
    }
}
I've thought about different options:
1) JSON (the one I'm currently using):
Every time I drag a food to a "droppable" place, I call a getJSON function to access the food data and then update the corresponding values. This file is 2 MB, but it will surely grow as I add more foods to it. I'm using this option because it was the quickest way to begin building the app, but I don't think it's a good choice for the live app.
2) RDBMS with normalized fields:
I could create two models, Food and Nutrient, where each food has 40+ nutrients related by a FK. The problem I see with this is that every time a food data request is made, the app will hit the db a lot of times to retrieve it.
3) RDBMS with picklefield:
This is the option I'm actually considering. I could create a Food model and put the nutrients in a picklefield.
4) Something with Redis/the Django cache system:
I'll dive more deeply into this option. I've read some things about them, but I don't clearly know whether there's some way to use them to solve the problem I have.
Thanks in advance,
Mariano.
This is a typical use case for a relational database. A more or less normalized form is the proper way most of the time.
I wrote this data model off the top of my head, according to your example:
CREATE TABLE unit(
   unit_id integer PRIMARY KEY
  ,unit text NOT NULL
  ,metric_unit text NOT NULL
  ,atomic_amount numeric NOT NULL
);
CREATE TABLE food_type(
   food_type_id integer PRIMARY KEY
  ,food_type text NOT NULL
);
CREATE TABLE nutrient_type(
   nutrient_type_id integer PRIMARY KEY
  ,nutrient_type text NOT NULL
);
CREATE TABLE food(
   food_id serial PRIMARY KEY
  ,food text NOT NULL
  ,food_type_id integer REFERENCES food_type(food_type_id) ON UPDATE CASCADE
  ,unit_id integer REFERENCES unit(unit_id) ON UPDATE CASCADE
  ,base_amount numeric NOT NULL DEFAULT 1
);
CREATE TABLE nutrient(
   nutrient_id serial PRIMARY KEY
  ,nutrient text NOT NULL
  ,metric_unit text NOT NULL
  ,base_amount numeric NOT NULL
  ,calories integer NOT NULL DEFAULT 0
);
CREATE TABLE food_nutrient(
   food_id integer REFERENCES food(food_id) ON UPDATE CASCADE ON DELETE CASCADE
  ,nutrient_id integer REFERENCES nutrient(nutrient_id) ON UPDATE CASCADE
  ,amount numeric NOT NULL DEFAULT 1
  ,CONSTRAINT food_nutrient_pkey PRIMARY KEY (food_id, nutrient_id)
);
CREATE TABLE meal(
   meal_id serial PRIMARY KEY
  ,meal text NOT NULL
);
CREATE TABLE meal_food(
   meal_id integer REFERENCES meal(meal_id) ON UPDATE CASCADE ON DELETE CASCADE
  ,food_id integer REFERENCES food(food_id) ON UPDATE CASCADE
  ,amount numeric NOT NULL DEFAULT 1
  ,CONSTRAINT meal_food_pkey PRIMARY KEY (meal_id, food_id)
);
This is definitely not how it should work:
"every time a food data request is made, the app will hit the db a lot of times to retrieve it."
You should calculate / aggregate all values you need in a view or function and hit the database only once per request, not many times.
Simple example to calculate the calories of a meal according to the above model:
SELECT sum(n.calories * fn.amount * f.base_amount * u.atomic_amount * mf.amount)
AS meal_calories
FROM meal_food mf
JOIN food f USING (food_id)
JOIN unit u USING (unit_id)
JOIN food_nutrient fn USING (food_id)
JOIN nutrient n USING (nutrient_id)
WHERE mf.meal_id = 7;
You can also use materialized views. For instance, store computed values per food in a table and update it automatically if the underlying data changes. Most likely those values rarely change (but are still easily updated this way).
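A minimal sketch of that idea in PostgreSQL (the view name food_calories is made up; it precomputes the calories per food from the model above):
CREATE MATERIALIZED VIEW food_calories AS
SELECT f.food_id
     , sum(n.calories * fn.amount * f.base_amount) AS calories
FROM   food f
JOIN   food_nutrient fn USING (food_id)
JOIN   nutrient n USING (nutrient_id)
GROUP  BY f.food_id;

-- re-run whenever the underlying data changes
REFRESH MATERIALIZED VIEW food_calories;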
I think the flat-file version you are using comes in last place: every time it is requested, it is read from top to bottom, and at that size it will only get slower. The cache system would provide the best performance, but the RDBMS would be the easiest to manage and extend, plus your queries will automatically be cached.