Want to improve InnoDB insertion performance - C++

I have a Linux server with 24 GB RAM, a 1 TB SSD, CentOS 7, and MariaDB 10.3. I am running a C++ program that uses threads and the MariaCpp connector to insert data. There are around 50,000 tables, and I only need to insert into about 5,000 of them, but when I run my program I get only about 50 insertions/sec, which is very low. I want to improve InnoDB insertion performance, so please suggest configuration changes or any other approaches.
My current InnoDB configuration:
innodb_read_io_threads=64
innodb_write_io_threads=64
innodb_buffer_pool_size=16G
innodb_buffer_pool_load_at_startup=ON
innodb_log_file_size=1G
innodb_log_files_in_group=10
innodb_file_per_table=1
innodb_log_buffer_size=1G
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=2
skip-innodb_doublewrite
innodb_io_capacity=2000
innodb_io_capacity_max=3000
innodb_flush_sync=1
I have tried tuning many related variables with no improvement, and I converted all of my tables from MyISAM to InnoDB with ALTER TABLE.

innodb_log_buffer_size=1G
That may be dangerously high. Lower it to about 1% of RAM (roughly 256M on a 24 GB server).
Having 50K tables is usually a bad schema design pattern; please explain what you are doing.
There are multiple ways to do fast INSERTs:
LOAD DATA INFILE -- but only if you already have the data in a CSV file.
Batched INSERT -- INSERT INTO t (...) VALUES (1,2), (3,4), ... -- above roughly 1000 rows per statement you get into diminishing returns and other inefficiencies.
Putting several INSERTs into a single transaction. This also has issues with diminishing returns and inefficiencies. (A sketch combining these two approaches follows this list.)
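A minimal sketch of batching plus transactions, assuming a hypothetical table t with columns a and b (the names are placeholders, not from the question):
START TRANSACTION;
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);   -- up to ~1000 rows per statement
INSERT INTO t (a, b) VALUES (7, 8), (9, 10);           -- a few such statements per transaction
COMMIT;
With innodb_flush_log_at_trx_commit=2 the commit itself is already cheap; the gain here comes from cutting per-statement and per-row round trips.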
Which of those have you employed? (Then we can critique things further.)
Please provide SHOW CREATE TABLE for one of the tables. There may be clues on what would help. (One example: lots of unnecessary indexes.)
Describe your disk subsystem -- HDD vs SSD; RAID; etc.

Related

SAS PROC SQL: How to clear the cache between tests

I am reading the paper "Need for Speed - Boost Performance in Data Processing with SAS/Access® Interface to Oracle", and I would like to know how to clear the cache/buffer in SAS so that my repeated test queries accurately reflect my changes.
I noticed that the same query takes 10 seconds the first time it runs, while running it again immediately afterwards (without changes) takes much less time (say 1-2 seconds). Is there a command or instruction to clear the cache/buffer, so I can get a clean test of my new changes?
I am using SAS Enterprise Guide with data hosted on an Oracle server. Thanks!
In order to flush caches on the Oracle side, you need both DBA privileges (to run alter system flush buffer_cache; in Oracle) and OS-level access (to flush the OS' buffer cache - echo 3 > /proc/sys/vm/drop_caches on common filesystems under Linux).
If you're running against a production database, you probably don't have those permissions -- you wouldn't want to run those commands on a production database anyways, since it would degrade the performance for all users of the database, and other queries would affect the time it takes to run yours.
Instead of trying to accurately measure the time it takes to run your query, I would suggest paying attention to how the query is executed:
what part of it is 'pushed down' to the DB and how much data flows between SAS and Oracle
what is Oracle's explain plan for the query -- does it have obvious inefficiencies
When a query is executed in a clearly suboptimal way, you will find (more often than not) that the fixed version will run faster both with cold and hot caches.
To apply this to the case you mention (10 seconds vs. 2 seconds), before thinking about how to measure it accurately, start by checking:
whether your query gets correctly pushed down to Oracle (it probably does),
and whether it requires a full table (or partition) scan of a sufficiently large table (depending on how slow the I/O in your DB is, on the order of 1-10 GB).
If you find that the query needs to read 1 GB of data and your typical (in-database) read speed is 100MB/s, then 10s with cold cache is the expected time to run it.
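To see the explain plan mentioned above, a minimal sketch in Oracle SQL, using a hypothetical orders table and columns (in SAS you would send this through explicit pass-through rather than relying on implicit pushdown):
EXPLAIN PLAN FOR
  SELECT order_id, amount
  FROM orders
  WHERE order_date BETWEEN DATE '2017-01-01' AND DATE '2017-01-31';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
The plan output shows whether a full scan or an index access was chosen, along with the estimated rows and bytes read.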
I'm no Oracle expert, but I doubt there's any way you can 'clear' the Oracle cache (and if there were, you would probably need to be a DBA to do so).
Typically what I do is I change the parameters of the query slightly so that the exact query no longer matches anything in the cache. For example, you could change the date range you are querying against.
It won't give you an exact performance comparison (because you're pulling different results) but it will give you a pretty good idea if one query performs significantly better than the other.
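For example, a sketch of the approach with a hypothetical orders table and date column (the names are placeholders):
-- baseline run
SELECT COUNT(*), SUM(amount)
FROM orders
WHERE order_date BETWEEN DATE '2017-01-01' AND DATE '2017-01-31';
-- comparison run: shift the window by one day so cached blocks and results are less likely to be reused
SELECT COUNT(*), SUM(amount)
FROM orders
WHERE order_date BETWEEN DATE '2017-01-02' AND DATE '2017-02-01';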

Why is Amazon Redshift UNLOAD performance much better for fresh data?

I wonder why unloading from a big table (>100 bln rows), when filtering on a column that is NOT the sort key or part of the sort key, is immensely faster for newly added data. How does Redshift know when to stop the sequential scan in the second scenario?
Time the query spent executing: 39m 37.02s:
UNLOAD ('SELECT * FROM production.some_table WHERE daytime BETWEEN
\\'2017-01-15\\' AND \\'2017-01-16\\'') TO ...
vs.
Time the query spent executing: 23.01s:
UNLOAD ('SELECT * FROM production.some_table WHERE daytime BETWEEN
\\'2017-06-24\\' AND \\'2017-06-25\\'') TO ...
Thanks!
Amazon Redshift uses zone maps to identify the minimum and maximum value stored in each 1 MB block on disk. Each block only stores data related to a single column (e.g., daytime).
If the SORTKEY is not set to daytime, then the data is unsorted and any particular date could appear in many different blocks. If SORTKEY is used, then a particular date will only appear in a minimum number of blocks.
Your second query possibly executes faster, even without a SORTKEY, because you are querying data that was probably added recently and is therefore all stored together in just a few blocks. The historical data might be spread in many blocks because a VACUUM probably reordered the data based upon the correct SORTKEY. In fact, if you did a VACUUM now, you might find that your second query becomes slower.
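To illustrate how sorting on daytime would tighten the zone maps, here is a sketch of a deep copy into a table sorted on that column. The new table name is hypothetical, and on a >100 bln row table such a copy is itself a very heavy operation, so treat it only as an illustration:
CREATE TABLE production.some_table_by_daytime
  SORTKEY (daytime)
AS
SELECT * FROM production.some_table;
-- after validating, swap names:
-- ALTER TABLE production.some_table RENAME TO some_table_old;
-- ALTER TABLE production.some_table_by_daytime RENAME TO some_table;
Once the data is sorted on daytime, each date lands in a minimal number of blocks and the zone maps let Redshift skip the rest.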

VoltDB is exhausting the RAM while loading the data

I am trying to load database tables into VoltDB using its csvloader utility. When I load one table of 5 GB, VoltDB consumes RAM so quickly that free RAM drops from 55 GB to 200 MB, and then the VoltDB process is killed by the system.
What could be the reason for this, and what are the recommended VoltDB settings to avoid it?
Is the table you are loading partitioned? That's the first thing to check, because if you have the default sitesperhost=8 on a single server, and the table is not partitioned, there will be a complete copy of the table in each of the 8 partitions. If the table is partitioned, the data is distributed among the partitions based on the hashing assignment of the values of the partitioning key column.
If it's partitioned and you still can't load all of the data, the next thing to look at would be the schema. There are formulas in the Planning Guide that describe the memory usage for given datatypes and for indexes. The VMC interface also has a sizing worksheet that gives you the mins and maxes based on the schema. You could also post the definition of the table you are trying to load, along with any indexes you have defined on it, and we can explain more about the bytes it would use per row.
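For reference, a minimal sketch of the partitioning DDL, with hypothetical table and column names (memory per row still depends on your actual schema):
-- Partitioning spreads rows across the sites/partitions instead of
-- keeping a full replicated copy of the table in every partition.
CREATE TABLE readings (
  device_id      BIGINT    NOT NULL,
  reading_ts     TIMESTAMP NOT NULL,
  reading_value  FLOAT
);
PARTITION TABLE readings ON COLUMN device_id;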

Disk usage when Redshift is doing a vacuum merge?

I know Redshift splits the vacuum process into two stages: sort and merge.
During sorting, disk usage does not change, but the merge stage seems to occupy a lot of free space.
My cluster is 3 nodes of dw2.xlarge, total 480 GB SSD. Before vacuuming, the total disk usage is around 50%.
I'm doing a vacuum on a table of 81 GB, but it failed during the merge stage due to a disk-full error.
I want to know how much space should I reserve for vacuuming a large unsorted table?
I asked the Redshift support team this question but haven't received a reply yet. Does anyone have experience with this?
Yes, for a long vacuum, use a deep copy instead.
This should avoid the disk usage problem.
With INSERT INTO ... (SELECT * FROM ...), select the data in sorted order so that it is inserted already sorted.
Do the deep copy incrementally, following your sort key.
E.g., if you are storing data for 30 days, do the deep copy day by day (see the sketch below).
This should avoid the space issue.
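A minimal sketch of that incremental deep copy, with hypothetical table and column names:
-- The copy should inherit the column definitions, sort key, and dist key.
CREATE TABLE my_table_copy (LIKE my_table);
-- Repeat for each day held in the table, inserting in sort-key order.
INSERT INTO my_table_copy
SELECT * FROM my_table
WHERE event_date = '2017-06-01'
ORDER BY event_date;
-- When every day has been copied, swap the tables and drop the old one to reclaim space:
-- ALTER TABLE my_table RENAME TO my_table_old;
-- ALTER TABLE my_table_copy RENAME TO my_table;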

What is the fastest way to retrieve all items in SQLite?

I am programming on Windows and store my information in SQLite.
However, I find that retrieving all items is a bit slow.
I am using the following way:
select * from XXX;
Retrieving all items in a 1.7 MB SQLite DB takes about 200-400 ms.
It is too slow. Can anyone help?
Many Thanks!
Thanks for your answers!
I have to do a complex operation on the data, so every time I open the app I need to read all of the information from the DB.
I would try the following:
Vacuum your database by running the "vacuum" command
SQLite starts with a default cache size of 2000 pages (run the command "PRAGMA cache_size" to be sure). Each page is 512 bytes, so it looks like you have about 1 MB of cache, which is not quite enough to contain your database. Increase your cache size by running "PRAGMA default_cache_size=4000". That should give you a 2 MB cache, which is enough to hold your entire database in memory. You can run these PRAGMA commands from the sqlite3 command line, or through your program as if they were another query.
Add an index to your table on the field you are ordering by (see the sketch after this list).
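A minimal sketch of those steps, assuming a hypothetical table items that is read back ordered by a name column:
VACUUM;
PRAGMA cache_size;                  -- check the current cache size (in pages)
PRAGMA default_cache_size = 4000;   -- about 2 MB with 512-byte pages, enough to hold the whole DB
CREATE INDEX IF NOT EXISTS idx_items_name ON items(name);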
You could possibly speed it up slightly by selecting only those columns you want, but otherwise nothing will beat an unordered select with no where clause for getting all the data.
Other than that a faster disk/cpu is your only option.
What type of hardware is this on?