Postgres does not apply parallel query - postgresql-11

I am using "PostgreSQL 11.4, compiled by Visual C++ build 1914, 64-bit"
I want to run parallel query for testing purpose, below is my pg_settings parameter values:
"checkpoint_completion_target" >> "0.5"
"default_statistics_target" >> "100"
"effective_cache_size" >> "524288" "8kB"
"maintenance_work_mem" >> "65536" "kB"
"max_connections" >> "100"
"max_parallel_workers" >> "8"
"max_parallel_workers_per_gather" >> "2"
"max_wal_size" >> "1024" "MB"
"max_worker_processes" >> "8"
"min_wal_size" >> "80" "MB"
"random_page_cost" >> "4"
"shared_buffers" >> "16384" "8kB"
"wal_buffers" >> "512" "8kB"
"work_mem" >> "4096" "kB"
When I EXPLAIN the query, it doesn't show any 'Workers Planned'. How can I confirm whether my database supports parallel query?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
EXPLAIN select /*+ PARALLEL(test_table 2) */ * from test_table where col_6 = 'Submitted'
--------------------------------------------------------------------
"Seq Scan on test_table (cost=0.00..3858.50 rows=2633 width=928)"
" Filter: (col_6 = 'Submitted'::text)"
====================================================================
If I enable force_parallel_mode, it does show 'Workers Planned', but the value is always 1. What is wrong with my settings or database that prevents it from running a parallel query?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
set force_parallel_mode = on;
--------------------------------------------------------------------
EXPLAIN select /*+ PARALLEL(test_table 2) */ * from test_table where col_6 = 'Submitted'
--------------------------------------------------------------------
"Gather (cost=1000.00..5121.80 rows=2633 width=928)"
" Workers Planned: 1"
" Single Copy: true"
" -> Seq Scan on test_table (cost=0.00..3858.50 rows=2633 width=928)"
" Filter: (col_6 = 'Submitted'::text)"
====================================================================
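One way to check whether the planner can produce a parallel plan at all is to make parallel plans artificially cheap for the current session. The sketch below reuses test_table and col_6 from the question; note that the /*+ PARALLEL(test_table 2) */ hint comment is ignored by stock PostgreSQL unless the pg_hint_plan extension is installed.

-- For a test session only: make parallel plans as cheap as possible.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
-- The default of 8MB means small tables never get a parallel seq scan.
SET min_parallel_table_scan_size = 0;

EXPLAIN
SELECT * FROM test_table WHERE col_6 = 'Submitted';
-- If parallel query is available, the plan should now show a Gather node
-- with Workers Planned greater than zero.

Also, 'Workers Planned: 1' together with 'Single Copy: true' is the expected output of force_parallel_mode = on; that setting exists for exercising the parallel infrastructure rather than for producing genuinely parallel scans.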

How to increment count id for each insert and every loop iteration?

I have a for loop that should increment count_id for every query and each loop iteration. Here is an example of my code:
qryCode = queryExecute("SELECT max(display_order) AS maxdisplay FROM type_ref",{},{datasource: application.datasource}); // Result set of the query is: 58
qryItems = queryExecute("SELECT DISTINCT type_id, label, type_shortcode FROM types gtr WHERE item_id = :item_id",{item_id: {cfsqltype: "cf_sql_numeric",value: arguments.item_id}},{datasource: application.datasource});
// Result set of qryItems:
TYPE_ID LABEL TYPE_SHORTCODE
1 2012-1 HOA
2 2012-1 HOC
5 2012-1 HOR
local.display_count = qryCode.maxdisplay;
for ( row in qryItems ) {
    local.sqlInsert &= " INSERT INTO type_ref (display_order) VALUES (?) ";
    qryParams.append({cfsqltype="CF_SQL_NUMERIC", value: display_count+1});
    local.sqlInsert &= " INSERT INTO type_ref (display_order) VALUES (?) ";
    qryParams.append({cfsqltype="CF_SQL_NUMERIC", value: display_count+2});
    display_count++;
}
The code above increments the first two values correctly (59 & 60), but the second iteration starts from 60 instead of 61. The code should produce count_ids in this order: 59, 60, 61, 62, 63, 64. There are three records in qryItems, and qryCode has a max value of 58. The first query in the first iteration should start from 58 + 1 = 59, and the next one should be 58 + 2 = 60. In the second iteration the first count_id should be 61, and so on. I'm not sure why the code above starts the second iteration from 60 instead of 61, even though I do have this line that should increase the count_id at the end of each iteration: display_count++;.
It's because you're doing 2 inserts per iteration, so you should increment display_count by 2 instead of 1. Your for loop should look like this instead:
for ( row in qryItems ) {
    local.sqlInsert &= " INSERT INTO type_ref (display_order) VALUES (?) ";
    qryParams.append({cfsqltype="CF_SQL_NUMERIC", value: display_count+1});
    local.sqlInsert &= " INSERT INTO type_ref (display_order) VALUES (?) ";
    qryParams.append({cfsqltype="CF_SQL_NUMERIC", value: display_count+2});
    display_count += 2;
}
How about
for ( row in qryItems ) {
    local.sqlInsert &= " INSERT INTO type_ref (display_order) VALUES (?),(?) ";
    qryParams.append({cfsqltype="CF_SQL_NUMERIC", value: ++display_count});
    qryParams.append({cfsqltype="CF_SQL_NUMERIC", value: ++display_count});
}
Also see: Inserting multiple rows in a single SQL query?
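For reference, a multi-row insert in plain SQL is a single statement with several value tuples (a sketch using the type_ref table from above; the literal values are only examples):

INSERT INTO type_ref (display_order)
VALUES (59), (60);   -- two rows from one statement; parameter markers such as (?),(?) work the same way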

Slow distance query in GeoDjango with PostGIS

I am using GeoDjango with Postgres 10 and PostGIS. I have two models as follows:
class Postcode(models.Model):
    name = models.CharField(max_length=8, unique=True)
    location = models.PointField(geography=True)

class Transaction(models.Model):
    transaction_id = models.CharField(max_length=60)
    price = models.IntegerField()
    date_of_transfer = models.DateField()
    postcode = models.ForeignKey(Postcode, on_delete=models.CASCADE)
    property_type = models.CharField(max_length=1, blank=True)
    street = models.CharField(blank=True, max_length=200)

    class Meta:
        indexes = [models.Index(fields=['-date_of_transfer',]),
                   models.Index(fields=['price',]),
                   ]
Given a particular postcode, I would like to find the nearest transactions within a specified distance. To do this, I am using the following code:
transactions = Transaction.objects.filter(price__gte=min_price) \
.filter(postcode__location__distance_lte=(pc.location,D(mi=distance))) \
.annotate(distance=Distance('postcode__location',pc.location)).order_by('distance')[0:25]
The query runs slowly, taking about 20-60 seconds (depending on the filter criteria) on a Windows PC with an i5 2500K and 16GB RAM. If I order by date_of_transfer instead, it runs in under 1 second for larger distances (over 1 mile) but is still slow for small distances (e.g. 45 seconds for a distance of 0.1 miles).
So far I have tried:
* changing the location field from Geometry to Geography
* using dwithin instead of distance_lte
Neither of these had more than a marginal impact on the speed of the query.
The SQL generated by GeoDjango for the current version is:
SELECT "postcodes_transaction"."id",
"postcodes_transaction"."transaction_id",
"postcodes_transaction"."price",
"postcodes_transaction"."date_of_transfer",
"postcodes_transaction"."postcode_id",
"postcodes_transaction"."street",
ST_Distance("postcodes_postcode"."location",
ST_GeogFromWKB('\x0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::bytea)) AS "distance"
FROM "postcodes_transaction" INNER JOIN "postcodes_postcode"
ON ("postcodes_transaction"."postcode_id" = "postcodes_postcode"."id")
WHERE ("postcodes_transaction"."price" >= 50000
AND ST_Distance("postcodes_postcode"."location", ST_GeomFromEWKB('\x0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::bytea)) <= 1609.344
AND "postcodes_transaction"."date_of_transfer" >= '2000-01-01'::date
AND "postcodes_transaction"."date_of_transfer" <= '2017-10-01'::date)
ORDER BY "distance" ASC LIMIT 25
On the postcodes table, there is an index on the location field as follows:
CREATE INDEX postcodes_postcode_location_id
ON public.postcodes_postcode
USING gist
(location);
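For reference, the form of the radius filter that can use that GiST index is ST_DWithin rather than a comparison on ST_Distance; the dwithin lookup is meant to produce this shape. A sketch only, reusing the table names and the point literal from the generated SQL above:

SELECT t.id, t.transaction_id, t.price, t.date_of_transfer
FROM postcodes_transaction t
JOIN postcodes_postcode p ON t.postcode_id = p.id
WHERE t.price >= 50000
  AND ST_DWithin(p.location,
                 ST_GeogFromWKB('\x0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::bytea),
                 1609.344)   -- same 1-mile radius, but in an index-friendly form
ORDER BY ST_Distance(p.location,
                     ST_GeogFromWKB('\x0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::bytea))
LIMIT 25;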
The transaction table has 22 million rows and the postcode table has 2.5 million rows. Any suggestions on what approaches I can take to improve the performance of this query?
Here is the query plan for reference:
"Limit (cost=2394838.01..2394840.93 rows=25 width=76) (actual time=19028.400..19028.409 rows=25 loops=1)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.location, '0101 (...)"
" -> Gather Merge (cost=2394838.01..2893397.65 rows=4273070 width=76) (actual time=19028.399..19028.407 rows=25 loops=1)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.location, (...)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Sort (cost=2393837.99..2399179.33 rows=2136535 width=76) (actual time=18849.396..18849.449 rows=387 loops=3)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, (_st_distance(postcodes_postcode.loc (...)"
" Sort Key: (_st_distance(postcodes_postcode.location, '0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::geography, '0'::double precision, true))"
" Sort Method: quicksort Memory: 1013kB"
" Worker 0: actual time=18615.809..18615.948 rows=577 loops=1"
" Worker 1: actual time=18904.700..18904.721 rows=576 loops=1"
" -> Hash Join (cost=699247.34..2074281.07 rows=2136535 width=76) (actual time=10705.617..18841.448 rows=5573 loops=3)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street, _st_distance(postcodes_postcod (...)"
" Inner Unique: true"
" Hash Cond: (postcodes_transaction.postcode_id = postcodes_postcode.id)"
" Worker 0: actual time=10742.668..18608.763 rows=5365 loops=1"
" Worker 1: actual time=10749.748..18897.838 rows=5522 loops=1"
" -> Parallel Seq Scan on public.postcodes_transaction (cost=0.00..603215.80 rows=6409601 width=68) (actual time=0.052..4214.812 rows=5491618 loops=3)"
" Output: postcodes_transaction.id, postcodes_transaction.transaction_id, postcodes_transaction.price, postcodes_transaction.date_of_transfer, postcodes_transaction.postcode_id, postcodes_transaction.street"
" Filter: ((postcodes_transaction.price >= 50000) AND (postcodes_transaction.date_of_transfer >= '2000-01-01'::date) AND (postcodes_transaction.date_of_transfer <= '2017-10-01'::date))"
" Rows Removed by Filter: 2025049"
" Worker 0: actual time=0.016..4226.643 rows=5375779 loops=1"
" Worker 1: actual time=0.016..4188.138 rows=5439515 loops=1"
" -> Hash (cost=682252.00..682252.00 rows=836667 width=36) (actual time=10654.921..10654.921 rows=1856 loops=3)"
" Output: postcodes_postcode.location, postcodes_postcode.id"
" Buckets: 131072 Batches: 16 Memory Usage: 1032kB"
" Worker 0: actual time=10692.068..10692.068 rows=1856 loops=1"
" Worker 1: actual time=10674.101..10674.101 rows=1856 loops=1"
" -> Seq Scan on public.postcodes_postcode (cost=0.00..682252.00 rows=836667 width=36) (actual time=5058.685..10651.176 rows=1856 loops=3)"
" Output: postcodes_postcode.location, postcodes_postcode.id"
" Filter: (_st_distance(postcodes_postcode.location, '0101000020e6100000005471e316f3bfbf4ad05fe811c14940'::geography, '0'::double precision, true) <= '1609.344'::double precision)"
" Rows Removed by Filter: 2508144"
" Worker 0: actual time=5041.442..10688.265 rows=1856 loops=1"
" Worker 1: actual time=5072.242..10670.215 rows=1856 loops=1"
"Planning time: 0.538 ms"
"Execution time: 19065.962 ms"

Siddhi - Fetching from event tables which are not updated within a certain time

In a Siddhi query, I am importing two streams, S1 and S2. When an event arrives on S1, I insert it into event table T1; when an event arrives on S2, I update T1 based on the id and also send the updated values from the table to output stream O1.
As part of the requirement, I also need to get the content of table T1 that was inserted more than 5 minutes ago (i.e. any record that has resided in the table for more than 5 minutes) and send it to another output stream, O2.
#name('S1')
from S1
select id, srcId, 'null' as msgId, 'INP' as status
insert into StatusTable;
#name('S2')
from S2#window.time(1min) as g join StatusTable[t.status == 'INP'] as t
on ( g.srcId == t.id)
select t.id as id, g.msgId as msgId, 'CMP' as status
update StatusTable on TradeStatusTable.id == id;
#name('Publish')
from S2 as g join StatusTable[t.status == 'CMP'] as t on ( g.srcId == t.id and t.status == 'CMP')
select t.id as id, t.msgId as msgId, t.status as status
insert into O1;
How can I add a query to this existing application to fetch the records from the TradeStatus table that have resided there for more than 5 minutes? Since the table cannot be used alone, I need to join it with a stream. How can I handle this scenario?
String WebAttackSuccess = "" +
"#info(name = 'found_host_charged1') "+
"from ATDEventStream[ rid == 10190001 ]#window.timeBatch(10 sec) as a1 "+
"join ATDEventStream[ rid == 10180004 ]#window.time(10 sec) as a2 on a2.src_ip == a1.src_ip and a2.dst_ip == a1.dst_ip " +
" select UUID() as uuid,1007 as cid,a1.sensor_id as sensor_id,a1.interface_id as interface_id,a1.other_id as other_id,count(a1.uuid) as event_num,min(a1.timestamp) as first_seen,max(a2.timestamp) as last_seen,'' as IOC,a1.dst_ip as victim,a1.src_ip as attacker,a1.uuid as NDE4,sample:sample(a2.uuid) as Sample_NDE4 " +
" insert into found_host_charged1;"+
""+
"#info(name = 'found_host_charged2') "+
"from every a1 = found_host_charged1 " +
"-> a2 = ATDEventStream[dns_answers != ''] "+
"within 5 min "+
"select UUID() as uuid,1008 as cid,a2.sensor_id as sensor_id,a2.interface_id as interface_id,a2.other_id as other_id,count(a2.uuid) as event_num,a1.first_seen as first_seen,max(a2.timestamp) as last_seen,a2.dns_answers as IOC,a2.dst_ip as victim,a2.src_ip as attacker,a1.uuid as NDE5,sample:sample(a2.uuid) as Sample_NDE5 " +
"insert into found_host_charged2; ";
This is part of my own work; I use two streams. Maybe you can get the data from StatusTable in your second stream. If that does not resolve it, you can change StatusTable to S1.

How can I use C++ to update an SQLite row relative to its original value?

I am trying to update a row in a table in an SQLite database using C++, but I want to update it relative to its current value.
This is what I have tried so far:
int val=argv[2];
string bal = "UPDATE accounts SET balance = balance + " + argv[1] + "WHERE account_id = " + bal + argv[2];
if (sqlite3_open("bank.db", &db) == SQLITE_OK)
{
sqlite3_prepare( db, balance.c_str(), -1, &stmt, NULL );//preparing the statement
sqlite3_step( stmt );//executing the statement
}
So that the first parameter is the account_id, and the second parameter is the current balance.
However, this does not work. What can I do to have the database successfully update?
Thank you!
EDIT: Sorry for the confusion. The primary situation is having a table with many entries, each with a unique account id. For example, one has an id of 1 with a balance of 5.
If I run this program with the parameters "1 5", the balance should now be 10. If I run it again with "1 7", it should be 17.
You cannot use the + operator to concatenate C-style strings and string literals. A quick and dirty fix:
string bal = string("UPDATE accounts SET balance = balance + ") + argv[1] + string( " WHERE account_id = " ) + argv[2];
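As a side note, the SQL itself can be written with placeholders instead of concatenating argv values into the string (a sketch; the ? markers would be filled in with the sqlite3_bind_* functions before calling sqlite3_step, which is not shown here):

UPDATE accounts
SET balance = balance + ?   -- amount to add (second program argument)
WHERE account_id = ?;       -- account id (first program argument)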

Select nth to nth row while the table still has unselected values, with Python and pyodbc

I have a table with 10,000 rows, and I want to select the first 1000 rows, then select again, this time getting the next set of rows, which is 1001-2001.
I am using the BETWEEN clause in order to select the range of values. I can also increment the values. Here is my code:
count = cursor.execute("select count(*) from casa4").fetchone()[0]
ctr = 1
ctr1 = 1000
str1 = ''
while ctr1 <= count:
    sql = "SELECT AccountNo FROM ( \
          SELECT AccountNo, ROW_NUMBER() OVER (ORDER BY Accountno) rownum \
          FROM casa4 ) seq \
          WHERE seq.rownum BETWEEN " + str(ctr) + " AND " + str(ctr1) + ""
    ctr = ctr1 + 1
    ctr1 = ctr1 + 1000
    cursor.execute(sql)
    sleep(2)  # interval between printing batches of rows
    for row in cursor:
        str1 = str1 + '|'.join(map(str, row)) + '\n'
    print "Records:" + str1  # str1 stores the fetched rows from the database
    print sql  # prints the SQL statement; I can see that ctr and ctr1 have incremented correctly, the way I want them
What I want to achieve is to send these rows to another database through a messaging queue (RabbitMQ), and I want to speed up the process; selecting everything at once and sending it to the queue returns an error.
The output of the code is that it returns rows 1-1000 correctly on the first loop, but on the second loop, instead of rows 1001-2001, it returns rows 1-2001, then 1-3001, and so on. It always starts at 1.
I was able to recreate your issue with both pyodbc and pypyodbc. I also tried using
WITH seq (AccountNo, rownum) AS
(
SELECT AccountNo, ROW_NUMBER() OVER (ORDER BY Accountno) rownum
FROM casa4
)
SELECT AccountNo FROM seq
WHERE rownum BETWEEN 11 AND 20
When I run that in SSMS I just get rows 11 through 20, but when I run it from Python I get all the rows (starting from 1).
The following code does work using pyodbc. It uses a temporary table named #numbered, and might be helpful in your situation since your process looks like it would do all of its work using the same database connection:
import pyodbc
cnxn = pyodbc.connect("DSN=myDb_SQLEXPRESS")
crsr = cnxn.cursor()
sql = """\
CREATE TABLE #numbered (rownum INT PRIMARY KEY, AccountNo VARCHAR(10))
"""
crsr.execute(sql)
cnxn.commit()
sql = """\
INSERT INTO #numbered (rownum, AccountNo)
SELECT
ROW_NUMBER() OVER (ORDER BY Accountno) AS rownum,
AccountNo
FROM casa4
"""
crsr.execute(sql)
cnxn.commit()
sql = "SELECT AccountNo FROM #numbered WHERE rownum BETWEEN ? AND ? ORDER BY rownum"
batchsize = 1000
ctr = 1
while True:
    crsr.execute(sql, [ctr, ctr + batchsize - 1])
    rows = crsr.fetchall()
    if len(rows) == 0:
        break
    print("-----")
    for row in rows:
        print(row)
    ctr += batchsize
cnxn.close()
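As a side note, and only assuming the server is SQL Server 2012 or newer, the same batching can also be expressed directly with OFFSET ... FETCH instead of numbering the rows first (a sketch against the casa4 table from the question; the OFFSET value would be increased by the batch size on each pass):

SELECT AccountNo
FROM casa4
ORDER BY AccountNo
OFFSET 1000 ROWS             -- rows already processed
FETCH NEXT 1000 ROWS ONLY;   -- the next batch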