This is the code I'm running in Python. The table has already been created in the DB. I'm doing a commit, so I don't know why it isn't working.
The code executes just fine, but no data is inserted into the table. I ran the same insert statement directly via the sqlite command line and it worked just fine.
import os
import sqlite3
current_dir = os.path.dirname(__file__)
db_file = os.path.join(current_dir, '../data/trips.db')
trips_db = sqlite3.connect(db_file)
c = trips_db.cursor()
print('inserting data into aggregate tables')
c.execute(
'''
insert into route_agg_data
select
pickup_loc_id || ">" || dropoff_loc_id as ride_route,
count(*) as rides_count
from trip_data
group by
pickup_loc_id || ">" || dropoff_loc_id
'''
)
trips_db.commit
trips_db.close
I changed the last 2 lines of my code to this:
trips_db.commit()
trips_db.close()
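For reference, here is a minimal sketch of the whole corrected script, using the sqlite3 connection as a context manager so the INSERT is committed automatically (everything else is unchanged from the question):
import os
import sqlite3

current_dir = os.path.dirname(__file__)
db_file = os.path.join(current_dir, '../data/trips.db')

trips_db = sqlite3.connect(db_file)
print('inserting data into aggregate tables')

# The connection used as a context manager commits on success
# and rolls back if the block raises an exception.
with trips_db:
    trips_db.execute(
        '''
        insert into route_agg_data
        select
            pickup_loc_id || ">" || dropoff_loc_id as ride_route,
            count(*) as rides_count
        from trip_data
        group by
            pickup_loc_id || ">" || dropoff_loc_id
        '''
    )

trips_db.close()  # commit() and close() are methods, so the parentheses matter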
Thanks #thesilkworm
Could you write a stored procedure inside SQLite:
Then try in Python:
cur = connection.cursor()
cur.callproc('insert_into_route_agg_data', [request.data['value1'],
request.data['value2'], request.data['value3'] ] )
results = cur.fetchone()
cur.close()
I am trying to create a MonetDB database that will hold 100k columns and approximately 2M rows of smallint type.
To generate the 100k columns I am using C code, i.e., a loop that issues the following SQL request:
ALTER TABLE test ADD COLUMN s%d SMALLINT;
where %d is a number from 1 till 100000.
I observed that after 80,000 SQL requests each transaction takes about 15 s, meaning that completing the table creation takes a very long time.
Could you tell me if there is a simple way of creating 100k columns?
Also, do you know what exactly is going on in MonetDB?
You should use only one CREATE TABLE statement.
In a shell script (bash):
#!/bin/bash
fic="/tmp/100k.sql"
col=1
echo "CREATE TABLE bigcol (" > $fic
while [[ $col -lt 100000 ]]
do
echo "field$col SMALLINT," >> $fic
col=$(($col + 1))
done
echo "field$col SMALLINT);" >> $fic
And on the command line:
sh 100k.sh
mclient yourbdd < /tmp/100k.sql
wait about 2 minutes :D
mclient yourbdd
> \d bigcol
[ ... ... ...]
"field99997" SMALLINT,
"field99998" SMALLINT,
"field99999" SMALLINT,
"field100000" SMALLINT
);
DROP TABLE bigcol, on the other hand, is very, very slow. I do not know why.
I also think it is not a good idea, but it answers your question.
Pierre
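If you would rather generate the statement from the host language instead of bash, here is a roughly equivalent Python sketch of the same single-CREATE-TABLE idea (table, column and file names follow the answer above):
# Build one CREATE TABLE statement with 100k smallint columns
columns = ",\n".join("field%d SMALLINT" % i for i in range(1, 100001))
sql = "CREATE TABLE bigcol (\n%s\n);\n" % columns

with open("/tmp/100k.sql", "w") as f:
    f.write(sql)

# Load it exactly as before:  mclient yourbdd < /tmp/100k.sql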
I found that isql (via subprocess) takes less time than the Sybase module in Python.
Could someone please advise whether I should use subprocess or the Sybase module?
Below is the small test script I used to compare them.
import subprocess
from datetime import datetime

import Sybase

# mdbserver, muserid, mpassword, mdatabase, Delimiter and sybase_bin
# are defined elsewhere in the real script.
Query = 'select count(*) from my_table'

# Time the Sybase module
start_time1 = datetime.now()
db = Sybase.connect(mdbserver, muserid, mpassword, mdatabase)
c = db.cursor()
c.execute(Query)
list1 = c.fetchall()
end_time1 = datetime.now()
print (end_time1 - start_time1)
# Time isql invoked through the shell
start_time2 = datetime.now()
command = "./isql -S "+mdbserver+" -U "+muserid+" -P "+mpassword+" -D "+mdatabase+" -s '"+Delimiter+"' --retserverror -w 99999 <<EOF\nSET NOCOUNT ON\n "+Query+"\ngo\nEOF"
proc = subprocess.Popen(
command,
stdout=subprocess.PIPE,stderr=subprocess.PIPE,
shell=True,
cwd=sybase_bin
)
output, error = proc.communicate()
end_time2 = datetime.now()
print (end_time2 - start_time2)
isql is intended for interactive access to the database, and it returns data formatted for screen output. There is additional padding and formatting that can't be directly controlled. It also does not work well when you are looking at binary/image or other non varchar data.
The Python Module will pull the data as expected, without additional formatting.
So as long as you are only pulling columns that aren't too wide and don't contain binary data, you can probably get away with using subprocess. The better solution, though, is to use the Python module.
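One reason the module is usually the better fit: fetchall() already gives you typed Python tuples, while the isql route hands back one block of padded screen text that you still have to split and strip yourself. A purely illustrative sketch of that post-processing (the helper name and the assumed output shape are not part of the original script):
def rows_from_isql(output, delimiter):
    # output is the raw bytes captured from proc.communicate();
    # every non-empty line is split on the delimiter and trimmed.
    rows = []
    for line in output.decode('utf-8', 'replace').splitlines():
        if line.strip():
            rows.append([col.strip() for col in line.split(delimiter)])
    return rows

# e.g. rows_from_isql(output, Delimiter) after proc.communicate() above,
# whereas the Sybase module's c.fetchall() needs no such cleanup.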
I have a large database (6,000 rows are inserted per minute) with a partitioned table. Everything worked fine when the database was small, but now the database is large. I was using a previous solution (SQL joined by date), but it uses 250 MB of disk and keeps growing as my table grows, so I decided to change it to an iteration of simple queries. That works well for 10 rows, but it gets slow with more than 10 cars (15 seconds for a response) and uses more than 200 MB of disk.
My question is how to build a faster query to resolve this issue.
Extra info
The query is called from a Django app via AJAX
I was thinking of making iterative AJAX calls instead of one call that returns the full list of items
Tables are partitioned by day
My current query is:
CREATE OR REPLACE FUNCTION gps_get_last_positions (
_plates varchar(8)
)
RETURNS TABLE (
plate varchar,
device_id integer,
date_time_process timestamp with time zone,
latitude double precision,
longitude double precision,
course smallint,
speed smallint,
mileage integer,
gps_signal smallint,
gsm_signal smallint,
alarm_status boolean,
gsm_status boolean,
vehicle_status boolean,
alarm_over_speed boolean,
other text,
realtime double precision
) AS $func$
DECLARE
arr varchar[];
BEGIN
arr := regexp_split_to_array(_plates, E'\\s+');
FOR i IN 1..array_length(arr, 1) LOOP
RETURN QUERY SELECT
gpstracking_vehicles.registration,
gpstracking_device_tracks.device_id,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
EXTRACT(EPOCH FROM current_timestamp - gpstracking_device_tracks.date_time_process)/60 AS realtime
FROM (
gpstracking_devices INNER JOIN (
gpstracking_vehicles INNER JOIN gpstracking_vehicles_devices ON ( gpstracking_vehicles.id = gpstracking_vehicles_devices.vehicle_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) ON ( gpstracking_devices.id = gpstracking_vehicles_devices.device_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) INNER JOIN gpstracking_device_tracks ON ( gpstracking_devices.id = gpstracking_device_tracks.device_id )
WHERE gpstracking_vehicles.registration = arr[i]::VARCHAR
ORDER BY gpstracking_device_tracks.date_time_process DESC
LIMIT 1;
END LOOP;
RETURN;
END;
$func$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
Configuration params (name, value, source):
application_name phpPgAdmin_5.0.4 client
constraint_exclusion on configuration file
DateStyle ISO, MDY session
default_text_search_config pg_catalog.english configuration file
external_pid_file /var/run/postgresql/9.1-main.pid configuration file
lc_messages en_US.UTF-8 configuration file
lc_monetary en_US.UTF-8 configuration file
lc_numeric en_US.UTF-8 configuration file
lc_time en_US.UTF-8 configuration file
log_line_prefix %t configuration file
log_timezone localtime environment variable
max_connections 100 configuration file
max_stack_depth 2MB environment variable
port 5432 configuration file
shared_buffers 24MB configuration file
ssl on configuration file
TimeZone localtime environment variable
unix_socket_directory /var/run/postgresql configuration file
My first slow query was:
CREATE OR REPLACE VIEW view_vehicle_devices AS
SELECT
gpstracking_vehicles_devices.id AS id,
gpstracking_devices.id AS device_id,
gpstracking_vehicles.id AS vehicle_id,
gpstracking_drivers.id AS driver_id,
gpstracking_device_protocols.name AS protocol,
gpstracking_vehicles.registration AS plate,
gpstracking_drivers.firstname as first_name,
gpstracking_drivers.lastname as last_name,
gpstracking_devices.imei,
gpstracking_devices.simcard,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
gpstracking_device_tracks.point,
EXTRACT(EPOCH FROM current_timestamp - gpstracking_device_tracks.date_time_process)/60 realtime,
gpstracking_devices.created,
gpstracking_devices.updated,
gpstracking_devices.is_connected
FROM (
gpstracking_vehicles LEFT JOIN (
gpstracking_drivers LEFT JOIN gpstracking_vehicles_drivers ON gpstracking_drivers.id = gpstracking_vehicles_drivers.driver_id AND gpstracking_vehicles_drivers.is_joined = TRUE
) ON gpstracking_vehicles.id = gpstracking_vehicles_drivers.vehicle_id AND gpstracking_vehicles_drivers.is_joined = TRUE
) LEFT JOIN (((
gpstracking_device_protocols RIGHT JOIN gpstracking_devices ON gpstracking_device_protocols.id = gpstracking_devices.device_protocol_id
) LEFT JOIN (
SELECT DISTINCT ON (gpstracking_device_tracks.device_id) gpstracking_device_tracks.device_id,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
gpstracking_device_tracks.point
FROM gpstracking_device_tracks
ORDER BY gpstracking_device_tracks.device_id, gpstracking_device_tracks.date_time_process DESC
) AS gpstracking_device_tracks ON gpstracking_devices.id = gpstracking_device_tracks.device_id
) LEFT JOIN gpstracking_vehicles_devices ON ( gpstracking_devices.id = gpstracking_vehicles_devices.device_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) ON ( gpstracking_vehicles.id = gpstracking_vehicles_devices.vehicle_id AND gpstracking_vehicles_devices.is_joined = TRUE )
I have changed it to the loop at the start of my post; the loop runs faster, but it is still not as fast as I need.
Your problem is that the planner cannot know which partition holds the answer to your query; it only has statistics. So you do not benefit from partitioning your data by day at all.
To benefit from it, you can modify your query so it looks for the latest coordinates in the current day's partition first, then, if nothing is found, in yesterday's, then the day before, and so on. I suppose 99% of the answers would be found in today's partition alone.
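A minimal sketch of that day-by-day fallback, assuming psycopg2 and the table and column names from the function above (the connection handling and the number of days to walk back are placeholders):
from datetime import date, timedelta

def latest_track(cur, device_id, max_days_back=7):
    # Try today's partition first, then walk back one day at a time,
    # so constraint exclusion can skip every other daily partition.
    for offset in range(max_days_back):
        day = date.today() - timedelta(days=offset)
        cur.execute(
            """
            SELECT *
            FROM gpstracking_device_tracks
            WHERE device_id = %s
              AND date_time_process >= %s
              AND date_time_process <  %s
            ORDER BY date_time_process DESC
            LIMIT 1
            """,
            (device_id, day, day + timedelta(days=1)),
        )
        row = cur.fetchone()
        if row is not None:
            return row
    return None  # no recent position for this device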
Or you could partition by, for example, device_id % 256 instead.
But even better would be to create an additional table holding only the few most recent coordinates per device. It would be maintained with a trigger on gpstracking_device_tracks that would just do (pseudocode):
if random()*64 < 1.0 then
    -- statistically, do a cleanup about once per 64 runs
    with todelete as (
        -- lock rows in a consistent order to avoid possible deadlocks
        -- if run concurrently
        select id from gpstracking_device_tracks_recent
        where device_id = NEW.device_id
        order by id for share
    )
    delete from gpstracking_device_tracks_recent
    where id in (select id from todelete);
end if;
insert into gpstracking_device_tracks_recent (...) values (...);
And then look for the latest coordinates in this much smaller table.
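For completeness, a sketch of how that side table and trigger could be wired up, again assuming psycopg2; the table name comes from the pseudocode above, but the reduced column list, the function name keep_recent_track and the connection string are placeholders, and the real trigger body would carry the cleanup logic shown above:
import psycopg2

DDL = """
CREATE TABLE gpstracking_device_tracks_recent (
    id                 serial PRIMARY KEY,
    device_id          integer NOT NULL,
    date_time_process  timestamp with time zone NOT NULL
    -- plus whichever other track columns the application needs
);

CREATE INDEX gpstracking_device_tracks_recent_device_idx
    ON gpstracking_device_tracks_recent (device_id, date_time_process DESC);

CREATE OR REPLACE FUNCTION keep_recent_track() RETURNS trigger AS $$
BEGIN
    -- the random()-gated cleanup from the pseudocode would go here
    INSERT INTO gpstracking_device_tracks_recent (device_id, date_time_process)
    VALUES (NEW.device_id, NEW.date_time_process);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER keep_recent_track_trg
    AFTER INSERT ON gpstracking_device_tracks
    FOR EACH ROW EXECUTE PROCEDURE keep_recent_track();
"""

conn = psycopg2.connect("dbname=gpstracking")  # placeholder connection string
with conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
conn.close()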
I have been a long-time reader of this forum and it has helped me a lot; however, I now have a question for which I can't find a solution specific to my requirements, so this is the first time I have had to ask anything.
I have a select statement which returns meter readings sorted by date (the newest readings at the top). In 99.9% of cases the meter readings go up as the date moves on; however, due to system errors, occasionally some go down. I need to identify instances where the reading in the row below (the previous reading) is GREATER than the latest reading (the current cell).
I have come across the LEAD function, however it is only available in Oracle or SQL Server 2012, and I'm using SQL Server 2008.
Here is a simplified version of my select statement:
SELECT Devices.SerialNumber,
MeterReadings.ScanDateTime,
MeterReadings.TotalMono,
MeterReadings.TotalColour
FROM dbo.MeterReadings AS MeterReadings
JOIN DBO.Devices AS Devices
ON MeterReadings.DeviceID = Devices.DeviceID
WHERE Devices.serialnumber = 'ANY GIVEN DEVICE SERIAL NUMBER'
AND Meterreadings.Scandatetime > 'ANY GIVEN SCAN DATE TIME'
ORDER BY MeterReadings.ScanDateTime DESC, Devices.SerialNumber ASC
This is the code I used in the end:
WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
WHERE m.ScanDateTime > '2012-01-01'
)
SELECT top 1 *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
and p.ScanDateTime < r.ScanDateTime
and p.TotalMono > r.TotalMono
order by r.serialnumber, p.TotalMono desc, r.TotalMono asc
Try something like this.
;WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
)
SELECT *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
AND p.ScanDateTime < r.ScanDateTime
WHERE p.TotalMono > r.TotalMono
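Another common way to emulate LEAD/LAG on SQL Server 2008 is a ROW_NUMBER() self-join. Below is a hedged sketch of that approach, run from Python via pyodbc purely to keep it self-contained; the connection string is a placeholder, and the table and column names come from the question:
import pyodbc

SQL = """
WITH ordered AS (
    SELECT  d.SerialNumber,
            m.ScanDateTime,
            m.TotalMono,
            m.TotalColour,
            ROW_NUMBER() OVER (PARTITION BY d.SerialNumber
                               ORDER BY m.ScanDateTime DESC) AS rn
    FROM dbo.MeterReadings m
    JOIN dbo.Devices d ON m.DeviceID = d.DeviceID
)
SELECT  latest.SerialNumber,
        latest.ScanDateTime,
        latest.TotalMono,
        previous.TotalMono AS PreviousTotalMono
FROM ordered latest
JOIN ordered previous
  ON previous.SerialNumber = latest.SerialNumber
 AND previous.rn = latest.rn + 1             -- the immediately preceding reading
WHERE previous.TotalMono > latest.TotalMono  -- previous reading is higher: flag it
"""

conn = pyodbc.connect("DSN=meters")  # placeholder connection string
for row in conn.cursor().execute(SQL):
    print(row)
conn.close()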