I have been a long-time reader of this forum and it has helped me a lot, but I now have a question I can't find a solution to that fits my requirements, so this is the first time I have had to ask anything.
I have a SELECT statement that returns meter readings sorted by date (the newest readings at the top). In 99.9% of cases the meter readings go up as the date moves on, but due to system errors some occasionally go down. I need to identify instances where the reading in the row below (the previous reading) is GREATER than the latest reading (the current row).
I have come across the LEAD function, but it is only available in Oracle and SQL Server 2012; I'm using SQL Server 2008.
Here is a simplified version of my SELECT statement:
SELECT Devices.SerialNumber,
       MeterReadings.ScanDateTime,
       MeterReadings.TotalMono,
       MeterReadings.TotalColour
FROM dbo.MeterReadings AS MeterReadings
JOIN dbo.Devices AS Devices
    ON MeterReadings.DeviceID = Devices.DeviceID
WHERE Devices.SerialNumber = 'ANY GIVEN DEVICE SERIAL NUMBER'
  AND MeterReadings.ScanDateTime > 'ANY GIVEN SCAN DATE TIME'
ORDER BY MeterReadings.ScanDateTime DESC, Devices.SerialNumber ASC
This is the code I used in the end:
WITH readings AS
(
    SELECT
        d.SerialNumber
        , m.TotalMono
        , m.TotalColour
        , m.ScanDateTime
    FROM dbo.MeterReadings m
    INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
    WHERE m.ScanDateTime > '2012-01-01'
)
SELECT TOP 1 *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
    AND p.ScanDateTime < r.ScanDateTime
    AND p.TotalMono > r.TotalMono
ORDER BY r.SerialNumber, p.TotalMono DESC, r.TotalMono ASC
Try something like this.
;WITH readings AS
(
    SELECT
        d.SerialNumber
        , m.TotalMono
        , m.TotalColour
        , m.ScanDateTime
    FROM dbo.MeterReadings m
    INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
)
SELECT *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
    AND p.ScanDateTime < r.ScanDateTime
WHERE p.TotalMono > r.TotalMono -- the CTE exposes no "reading" column; compare the actual counters
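If you need the previous reading paired with every row (a LAG equivalent), SQL Server 2008 can emulate it with ROW_NUMBER() and a self-join on adjacent row numbers. A minimal sketch, assuming the same tables and columns as in the query above:
WITH ordered AS
(
    SELECT
        d.SerialNumber
        , m.ScanDateTime
        , m.TotalMono
        , ROW_NUMBER() OVER (PARTITION BY d.SerialNumber
                             ORDER BY m.ScanDateTime) AS rn
    FROM dbo.MeterReadings m
    INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
)
SELECT
    cur.SerialNumber
    , cur.ScanDateTime
    , prev.TotalMono AS PreviousMono
    , cur.TotalMono AS CurrentMono
FROM ordered cur
INNER JOIN ordered prev
    ON prev.SerialNumber = cur.SerialNumber
    AND prev.rn = cur.rn - 1 -- the reading taken immediately before this one
WHERE prev.TotalMono > cur.TotalMono -- the counter went backwards
ORDER BY cur.SerialNumber, cur.ScanDateTime DESC;
Each row is paired with the reading taken just before it, so the WHERE clause lists every reading that is lower than its predecessor.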
Trying to reconstruct my query history from svl_statementtext using listagg.
Getting this error:
Result size exceeds LISTAGG limit (limit: 65535)
However, I cannot see how or where I have exceeded the limit.
My failing query:
SELECT pid,xid, min(starttime) AS starttime,
pg_catalog.listagg(
CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
, ''::text
) WITHIN GROUP(ORDER BY "sequence")
AS query_statement
FROM svl_statementtext
GROUP BY pid,xid
HAVING min(starttime) >= '2022-06-27 10:00:00';
After the failure, I checked to see if I could find where the excessive size was coming from:
SELECT pid,xid, min(starttime) AS starttime,
SUM(OCTET_LENGTH(
CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
)) as total_bytes
FROM svl_statementtext
GROUP BY pid,xid
HAVING min(starttime) >= '2022-06-27 10:00:00'
ORDER BY total_bytes desc;
However, the largest size this query reports is 2962.
So how/why is LISTAGG complaining about 65535?
I have seen some other posts mentioning LISTAGG(DISTINCT ...) and catering for when the value being aggregated is NULL, but none of that changes my problem.
Any guidance appreciated :)
The longest string that Redshift can hold is 64K bytes. Listagg() is likely generating a string longer than this. The "text" column in svl_statementtext is 200 characters, so if you have more than 319 segments you can overflow this string size.
The other issue I see is that your query will combine multiple statements into one string. You are only grouping by xid and pid, which will give you all the statements for a transaction. Add starttime to your GROUP BY list and this will break different statements into different results.
Also remember that xid and pid values repeat every few days, so having some date range limit can help prevent a lot of confusion.
You need to add
where sequence < 320
to your query and also group by starttime.
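Applied to your failing query, that looks something like this (a sketch: 320 segments of a 200-character column stays under the 64K limit, and grouping by starttime keeps separate statements separate):
SELECT pid, xid, starttime,
       listagg(
           CASE WHEN len(rtrim("text")) = 0 THEN "text" ELSE rtrim("text") END
           , ''
       ) WITHIN GROUP (ORDER BY "sequence") AS query_statement
FROM svl_statementtext
WHERE "sequence" < 320 -- cap the number of 200-character segments
  AND starttime >= '2022-06-27 10:00:00'
GROUP BY pid, xid, starttime; -- starttime separates statements within a transaction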
Here's a query I have used to put together statements in Redshift:
select xid, pid, starttime,
       max(datediff('sec', starttime, endtime)) as runtime,
       type,
       listagg(regexp_replace(text, '\\\\n*', ' '))
           WITHIN GROUP (ORDER BY sequence) || ';' as querytext
from svl_statementtext
where pid = (SELECT pg_backend_pid()) -- current session
  and sequence < 320
  and starttime > getdate() - interval '24 hours'
group by starttime, 1, 2, "type"
order by starttime, 1 asc, "type" desc;
I have two tables: a "Range" table that has the ranges and their corresponding values, and a "Data" table that has the values I want to check against the ranges.
My Data table is as follows, but it has more rows than the Range table. The "PM" values are the ones I need to check.
Location PM
B01 1,05
B02 1,04888
B15 1,05787
B16 1,05787
B03 2,03714
B04 2,03714
B09 2,03714
B10 2,03714
B17 2,03714
And the "Range" table is like this sample:
P.E P.S PV
0,48 1,03 10,00%
1,03 2,02 10,03%
2,02 3,63 8,87%
3,63 6,23 8,24%
6,23 10,17 7,62%
10,17 15,79 6,46%
15,79 22,37 5,75%
22,37 30,70 5,29%
30,70 41,27 4,99%
41,27 54,88 4,86%
54,88 71,57 4,65%
So, summing up, I need to create a DAX measure or column that checks whether the PM value is between the P.E and P.S values and returns the corresponding PV.
In Excel, I managed to do this using the LOOKUP function, as that function rounds the searched value down to the nearest smaller value in the lookup table to find a match. In Power BI I couldn't find a way to replicate this.
Does anyone know if it's possible?
Thanks for all the help!
You need a measure like the one below:
respected_pv =
CALCULATE(
MAX(range[PV]),
FILTER(
ALL(range),
range[P.E] <= MIN(data[PM])
&& range[P.S] >= MIN(data[PM])
)
)
Here is the code for a calculated column:
respected_pv_column =
CALCULATE(
MAX(range[PV]),
FILTER(
ALL(range),
range[P.E] <= data[PM]
&& range[P.S] >= data[PM]
)
)
Here is the sample output. Remember, only the first row gets a PV, as the other ranges are not available in your sample data.
I have the following code. I've read many threads on how to solve a query returning an empty value, but none of them worked in my case.
I've also shared a workaround Google spreadsheet.
What I'm trying to do is search the first tab with a query and, if it matches, show a green tick; otherwise show a red cross.
The code below works for the green tick, but in the red-cross case it shows an error.
How the code below is expected to operate:
It runs a query on tab1, and if the student number is in column C (with some defined conditions) OR in column D (without any condition), it shows a green tick; otherwise, it must insert a red cross.
if(OR(QUERY(LiveAttendanceForm!$A:$C,
"select C
where A >= timestamp '"&TEXT(D$1, "yyyy-MM-dd HH:mm:ss")&"'
and A <= timestamp '"&TEXT(D$2, "yyyy-MM-dd HH:mm:ss")&"'
and C = "&$C5, 0)=$C5,Query(LiveAttendanceForm!$A:$D,"select D where D = "&$C5,0)=$C5),"✅","❌")
I also tried using the IFERROR function before the query, but it shows all fields as true and gives every cell a green tick. I'll be very grateful if someone can help me fix this annoying issue! The shared Google Sheet's link is below:
https://docs.google.com/spreadsheets/d/1KfkA48OyOnZAPQAbtdEIs9AYaAPRFRFOpvQRM5QyPss/edit?usp=sharing
When QUERY finds no match it returns an error rather than an empty value, and comparing that error to $C4 propagates it, so you will need to wrap your queries in IFERROR:
=IF(OR(IFERROR(QUERY(LiveAttendanceForm!$A:$C,
"select C
where A >= timestamp '"&TEXT(D$1, "yyyy-MM-dd HH:mm:ss")&"'
and A <= timestamp '"&TEXT(D$2, "yyyy-MM-dd HH:mm:ss")&"'
and C = "&$C4, 0))=$C4,
IFERROR(QUERY(LiveAttendanceForm!$A:$D,
"select D
where D = "&$C4, 0))=$C4), "✅", "❌")
I have a large database (there are 6,000 rows inserted per minute) with a partitioned table. Everything worked fine when the database was small, but now the database is large. I was using the solution from a previous question, SQL joined by date, but it uses 250MB of hard disk and grows as my table grows, so I decided to change it to an iteration of simple queries. That works well for 10 rows, but gets slow with more than 10 cars (15 seconds for a response) and uses more than 200MB of hard disk.
My question is: how do I build a faster query to resolve this issue?
Extra info
The query is called from a Django app via AJAX
I was thinking of iterative AJAX calls instead of one call with the full item-list response
Tables are partitioned by day
My current query is:
CREATE OR REPLACE FUNCTION gps_get_last_positions (
_plates varchar(8)
)
RETURNS TABLE (
plate varchar,
device_id integer,
date_time_process timestamp with time zone,
latitude double precision,
longitude double precision,
course smallint,
speed smallint,
mileage integer,
gps_signal smallint,
gsm_signal smallint,
alarm_status boolean,
gsm_status boolean,
vehicle_status boolean,
alarm_over_speed boolean,
other text,
realtime double precision
) AS $func$
DECLARE
arr varchar[];
BEGIN
arr := regexp_split_to_array(_plates, E'\\s+');
FOR i IN 1..array_length(arr, 1) LOOP
RETURN QUERY SELECT
gpstracking_vehicles.registration,
gpstracking_device_tracks.device_id,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
EXTRACT(EPOCH FROM current_timestamp - gpstracking_device_tracks.date_time_process)/60 AS realtime
FROM (
gpstracking_devices INNER JOIN (
gpstracking_vehicles INNER JOIN gpstracking_vehicles_devices ON ( gpstracking_vehicles.id = gpstracking_vehicles_devices.vehicle_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) ON ( gpstracking_devices.id = gpstracking_vehicles_devices.device_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) INNER JOIN gpstracking_device_tracks ON ( gpstracking_devices.id = gpstracking_device_tracks.device_id )
WHERE gpstracking_vehicles.registration = arr[i]::VARCHAR
ORDER BY gpstracking_device_tracks.date_time_process DESC
LIMIT 1;
END LOOP;
RETURN;
END;
$func$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
Configuration params
application_name phpPgAdmin_5.0.4 client
constraint_exclusion on configuration file
DateStyle ISO, MDY session
default_text_search_config pg_catalog.english configuration file
external_pid_file /var/run/postgresql/9.1-main.pid configuration file
lc_messages en_US.UTF-8 configuration file
lc_monetary en_US.UTF-8 configuration file
lc_numeric en_US.UTF-8 configuration file
lc_time en_US.UTF-8 configuration file
log_line_prefix %t configuration file
log_timezone localtime environment variable
max_connections 100 configuration file
max_stack_depth 2MB environment variable
port 5432 configuration file
shared_buffers 24MB configuration file
ssl on configuration file
TimeZone localtime environment variable
unix_socket_directory /var/run/postgresql configuration file
My first slow query was:
CREATE OR REPLACE VIEW view_vehicle_devices AS
SELECT
gpstracking_vehicles_devices.id AS id,
gpstracking_devices.id AS device_id,
gpstracking_vehicles.id AS vehicle_id,
gpstracking_drivers.id AS driver_id,
gpstracking_device_protocols.name AS protocol,
gpstracking_vehicles.registration AS plate,
gpstracking_drivers.firstname as first_name,
gpstracking_drivers.lastname as last_name,
gpstracking_devices.imei,
gpstracking_devices.simcard,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
gpstracking_device_tracks.point,
EXTRACT(EPOCH FROM current_timestamp - gpstracking_device_tracks.date_time_process)/60 realtime,
gpstracking_devices.created,
gpstracking_devices.updated,
gpstracking_devices.is_connected
FROM (
gpstracking_vehicles LEFT JOIN (
gpstracking_drivers LEFT JOIN gpstracking_vehicles_drivers ON gpstracking_drivers.id = gpstracking_vehicles_drivers.driver_id AND gpstracking_vehicles_drivers.is_joined = TRUE
) ON gpstracking_vehicles.id = gpstracking_vehicles_drivers.vehicle_id AND gpstracking_vehicles_drivers.is_joined = TRUE
) LEFT JOIN (((
gpstracking_device_protocols RIGHT JOIN gpstracking_devices ON gpstracking_device_protocols.id = gpstracking_devices.device_protocol_id
) LEFT JOIN (
SELECT DISTINCT ON (gpstracking_device_tracks.device_id) gpstracking_device_tracks.device_id,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
gpstracking_device_tracks.point
FROM gpstracking_device_tracks
ORDER BY gpstracking_device_tracks.device_id, gpstracking_device_tracks.date_time_process DESC
) AS gpstracking_device_tracks ON gpstracking_devices.id = gpstracking_device_tracks.device_id
) LEFT JOIN gpstracking_vehicles_devices ON ( gpstracking_devices.id = gpstracking_vehicles_devices.device_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) ON ( gpstracking_vehicles.id = gpstracking_vehicles_devices.vehicle_id AND gpstracking_vehicles_devices.is_joined = TRUE )
I have changed it to the loop at the start of my post. My loop runs faster, but it is still not as fast as I need.
Your problem is that the planner cannot know which partition holds the answer to your query; it only has statistics. So you do not benefit from partitioning your data by day at all.
To benefit from it, you can modify your query so it looks for the latest coordinates from the current day, then, if not found, from yesterday, then from the day before, and so on. I suppose 99% of answers would be found in today's partition alone.
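For example, a per-device lookup that tries today's partition first might look like this sketch (the device id 42 is a placeholder), assuming the table layout from the question:
SELECT device_id, date_time_process, latitude, longitude
FROM gpstracking_device_tracks
WHERE device_id = 42 -- placeholder device id
  AND date_time_process >= current_date -- lets constraint_exclusion skip older partitions
ORDER BY date_time_process DESC
LIMIT 1;
-- If no row comes back, retry with current_date - 1, then current_date - 2, and so on.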
Or you can partition by, for example, device_id % 256 instead.
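With the inheritance-based partitioning available in PostgreSQL 9.1, that alternative would look something like this sketch (the child-table and index names are made up):
CREATE TABLE gpstracking_device_tracks_p0
(
    CHECK (device_id % 256 = 0) -- lets constraint_exclusion prune the other children
) INHERITS (gpstracking_device_tracks);

CREATE INDEX gpstracking_device_tracks_p0_idx
    ON gpstracking_device_tracks_p0 (device_id, date_time_process DESC);

-- Queries must repeat the partitioning expression, e.g.
--   WHERE device_id = 42 AND device_id % 256 = 42 % 256
-- for the planner to exclude the other partitions.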
But even better would be to create an additional table holding only the several most recent coordinates per device. It would be maintained by a trigger on gpstracking_device_tracks which would just do (pseudocode):
if random()*64 < 1.0 then
-- statistically once per 64 runs do a cleanup
with todelete as (
-- lock rows in particular order to avoid possible deadlocks
-- if run concurrently
select id from gpstracking_device_tracks_recent where device_id=?
order by id for share;
)
delete from gpstracking_device_tracks_recent
where id in (select id from todelete)
end if;
insert into gpstracking_device_tracks_recent (...) values (...);
And then look for the latest coordinates in this much smaller table.
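A concrete version of that pseudocode could be a trigger function like the sketch below; the gpstracking_device_tracks_recent table and its column list are assumptions, trimmed to a few columns for brevity:
CREATE TABLE gpstracking_device_tracks_recent (
    id bigserial PRIMARY KEY,
    device_id integer NOT NULL,
    date_time_process timestamp with time zone NOT NULL,
    latitude double precision,
    longitude double precision
);

CREATE OR REPLACE FUNCTION maintain_recent_tracks() RETURNS trigger AS $$
BEGIN
    IF random() * 64 < 1.0 THEN
        -- statistically once per 64 inserts, clear this device's old rows;
        -- lock them in a fixed order to avoid deadlocks under concurrency
        DELETE FROM gpstracking_device_tracks_recent
        WHERE id IN (
            SELECT id FROM gpstracking_device_tracks_recent
            WHERE device_id = NEW.device_id
            ORDER BY id
            FOR UPDATE
        );
    END IF;
    INSERT INTO gpstracking_device_tracks_recent
        (device_id, date_time_process, latitude, longitude)
    VALUES (NEW.device_id, NEW.date_time_process, NEW.latitude, NEW.longitude);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER gpstracking_device_tracks_recent_trg
    AFTER INSERT ON gpstracking_device_tracks
    FOR EACH ROW EXECUTE PROCEDURE maintain_recent_tracks();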
Informix 11.70.TC4:
I have an SQL dimension table which is used for looking up a date (pk_date) and returning another date (plus1, plus2 or plus3_months) to the client, depending on whether the user selects a "1","2" or a "3".
The table schema is as follows:
TABLE date_lookup
(
pk_date DATE,
plus1_months DATE,
plus2_months DATE,
plus3_months DATE
);
UNIQUE INDEX on date_lookup(pk_date);
I have a load file (pipe delimited) containing dates from 01-28-2012 to 03-31-2014.
The following is an example of the load file:
01-28-2012|02-28-2012|03-28-2012|04-28-2012|
01-29-2012|02-29-2012|03-29-2012|04-29-2012|
01-30-2012|02-29-2012|03-30-2012|04-30-2012|
01-31-2012|02-29-2012|03-31-2012|04-30-2012|
...
03-31-2014|04-30-2014|05-31-2014|06-30-2014|
........................................................................................
EDIT: Sir Jonathan's SQL statement using DATE(pk_date + n UNITS MONTH) on 11.70.TC5 worked!
I generated a load file with pk_dates from 01-28-2012 to 12-31-2020, with plus1, plus2 & plus3_months NULL. I loaded this into the date_lookup table, then executed the update statement below:
UPDATE date_lookup
SET plus1_months = DATE(pk_date + 1 UNITS MONTH),
plus2_months = DATE(pk_date + 2 UNITS MONTH),
plus3_months = DATE(pk_date + 3 UNITS MONTH);
Apparently, DATE() was able to convert pk_date to DATETIME, do the math with TC5's new algorithm, and return the result in DATE format!
.........................................................................................
The rules for this dimension table are:
If pk_date has 31 days in its month and plus1, plus2 or plus3_months only have 28, 29, or 30 days, then let plus1, plus2 or plus3 equal the last day of that month.
If pk_date has 30 days in its month and plus1, plus2 or plus3 has 28 or 29 days in its month, let them equal the last valid date of those months, and so on.
All other dates fall on the same day of the following month.
My question is: what is the best way to automatically generate pk_dates past 03-31-2014 following the above rules? Can I accomplish this with an SQL script, sed, or a C program?
EDIT: I mentioned sed because I already have more than two years' worth of data and could perhaps model the rest after this data. Or perhaps a tool like awk is better?
The best technique would be to upgrade to 11.70.TC5 (on 32-bit Windows; generally to 11.70.xC5 or later) and use an expression such as:
SELECT DATE(given_date + n UNITS MONTH)
FROM Wherever
...
The DATETIME code was modified between 11.70.xC4 and 11.70.xC5 to generate dates according to the rules you outline when the dates are as described and you use the + n UNITS MONTH or equivalent notation.
This obviates the need for a table at all. Clearly, though, all your clients would have to be on 11.70.xC5 too.
Maybe you can update your development machine to 11.70.xC5 and then use this property to generate the data for the table on your development machine, and distribute the data to your clients.
If upgrading at least someone to 11.70.xC5 is not an option, then consider the Perl script suggestion.
Can it be done with SQL? Probably, but it would be excruciating. Ditto for C, and I think 'no' is the answer for sed.
However, a couple of dozen lines of Perl seem to produce what you need:
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
my @dates;
# parse arguments
while (my $datep = shift){
my ($m,$d,$y) = split('-', $datep);
push(@dates, DateTime->new(year => $y, month => $m, day => $d))
|| die "Cannot parse date $!\n";
}
open(STDOUT, ">", "output.unl") || die "Unable to create output file.";
my ($date, $end) = @dates;
while( $date < $end ){
my @row = ($date->mdy('-')); # start with pk_date
for my $mth ( qw[ 1 2 3 ] ){
my $fut_d = $date->clone->add(months => $mth);
until (
($fut_d->month == $date->month + $mth
&& $fut_d->year == $date->year) ||
($fut_d->month == $date->month + $mth - 12
&& $fut_d->year > $date->year)
){
$fut_d->subtract(days => 1); # step back until criteria met
}
push(@row, $fut_d->mdy('-'));
}
print STDOUT join("|", #row, "\n");
$date->add(days => 1);
}
Save that as futuredates.pl, chmod +x it and execute like this:
$ futuredates.pl 04-01-2014 12-31-2020
That seems to do the trick for me.