Getting last rows by date on a large postgresl data base

Getting last rows by date on a large postgresl data base - django

I have a large database ( there are 6000 rows inserted per minute ) on a partitioned table, I was working cool when the database was small, now I have a large database. I was using this solution a previous solutions SQL joined by date but it is using 250MB of hard disk and grows up while my table grows, then I decide on change it to an iteration of simple queries, that works well for 10 rows, but get slow with more than 10 cars ( 15 secods for response ) and uses more than 200MB of hard disk.
My question is how to build a good query faster for resolve this issue
Extra info
Query is called on a django app by ajax
I was thinking on iterative ajax calls instead one call with full items list response
Tables are partitioned by day
My actual query is
CREATE OR REPLACE FUNCTION gps_get_last_positions (
_plates varchar(8)
)
RETURNS TABLE (
plate varchar,
device_id integer,
date_time_process timestamp with time zone,
latitude double precision,
longitude double precision,
course smallint,
speed smallint,
mileage integer,
gps_signal smallint,
gsm_signal smallint,
alarm_status boolean,
gsm_status boolean,
vehicle_status boolean,
alarm_over_speed boolean,
other text,
realtime double precision
) AS $func$
DECLARE
arr varchar[];
BEGIN
arr := regexp_split_to_array(_plates, E'\\s+');
FOR i IN 1..array_length(arr, 1) LOOP
RETURN QUERY SELECT
gpstracking_vehicles.registration,
gpstracking_device_tracks.device_id,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
EXTRACT(EPOCH FROM current_timestamp - gpstracking_device_tracks.date_time_process)/60 AS realtime
FROM (
gpstracking_devices INNER JOIN (
gpstracking_vehicles INNER JOIN gpstracking_vehicles_devices ON ( gpstracking_vehicles.id = gpstracking_vehicles_devices.vehicle_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) ON ( gpstracking_devices.id = gpstracking_vehicles_devices.device_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) INNER JOIN gpstracking_device_tracks ON ( gpstracking_devices.id = gpstracking_device_tracks.device_id )
WHERE gpstracking_vehicles.registration = arr[i]::VARCHAR
ORDER BY gpstracking_device_tracks.date_time_process DESC
LIMIT 1;
END LOOP;
RETURN;
END;
$func$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
Configuration params
application_name phpPgAdmin_5.0.4 client
constraint_exclusion on configuration file
DateStyle ISO, MDY session
default_text_search_config pg_catalog.english configuration file
external_pid_file /var/run/postgresql/9.1-main.pid configuration file
lc_messages en_US.UTF-8 configuration file
lc_monetary en_US.UTF-8 configuration file
lc_numeric en_US.UTF-8 configuration file
lc_time en_US.UTF-8 configuration file
log_line_prefix %t configuration file
log_timezone localtime environment variable
max_connections 100 configuration file
max_stack_depth 2MB environment variable
port 5432 configuration file
shared_buffers 24MB configuration file
ssl on configuration file
TimeZone localtime environment variable
unix_socket_directory /var/run/postgresql configuration file
My first slow query was:
CREATE OR REPLACE VIEW view_vehicle_devices AS
SELECT
gpstracking_vehicles_devices.id AS id,
gpstracking_devices.id AS device_id,
gpstracking_vehicles.id AS vehicle_id,
gpstracking_drivers.id AS driver_id,
gpstracking_device_protocols.name AS protocol,
gpstracking_vehicles.registration AS plate,
gpstracking_drivers.firstname as first_name,
gpstracking_drivers.lastname as last_name,
gpstracking_devices.imei,
gpstracking_devices.simcard,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
gpstracking_device_tracks.point,
EXTRACT(EPOCH FROM current_timestamp - gpstracking_device_tracks.date_time_process)/60 realtime,
gpstracking_devices.created,
gpstracking_devices.updated,
gpstracking_devices.is_connected
FROM (
gpstracking_vehicles LEFT JOIN (
gpstracking_drivers LEFT JOIN gpstracking_vehicles_drivers ON gpstracking_drivers.id = gpstracking_vehicles_drivers.driver_id AND gpstracking_vehicles_drivers.is_joined = TRUE
) ON gpstracking_vehicles.id = gpstracking_vehicles_drivers.vehicle_id AND gpstracking_vehicles_drivers.is_joined = TRUE
) LEFT JOIN (((
gpstracking_device_protocols RIGHT JOIN gpstracking_devices ON gpstracking_device_protocols.id = gpstracking_devices.device_protocol_id
) LEFT JOIN (
SELECT DISTINCT ON (gpstracking_device_tracks.device_id) gpstracking_device_tracks.device_id,
gpstracking_device_tracks.date_time_process,
gpstracking_device_tracks.latitude,
gpstracking_device_tracks.longitude,
gpstracking_device_tracks.course,
gpstracking_device_tracks.speed,
gpstracking_device_tracks.mileage,
gpstracking_device_tracks.gps_signal,
gpstracking_device_tracks.gsm_signal,
gpstracking_device_tracks.alarm_status,
gpstracking_device_tracks.gps_status,
gpstracking_device_tracks.vehicle_status,
gpstracking_device_tracks.alarm_over_speed,
gpstracking_device_tracks.other,
gpstracking_device_tracks.point
FROM gpstracking_device_tracks
ORDER BY gpstracking_device_tracks.device_id, gpstracking_device_tracks.date_time_process DESC
) AS gpstracking_device_tracks ON gpstracking_devices.id = gpstracking_device_tracks.device_id
) LEFT JOIN gpstracking_vehicles_devices ON ( gpstracking_devices.id = gpstracking_vehicles_devices.device_id AND gpstracking_vehicles_devices.is_joined = TRUE )
) ON ( gpstracking_vehicles.id = gpstracking_vehicles_devices.vehicle_id AND gpstracking_vehicles_devices.is_joined = TRUE )
I have changed it for the loop that is starting my post, my loop rujns faster, however is not enought faster as I need

Your problem is that a planner can not know in which partition is an answer to your query. It only has statistics. So you do not benefit from partitioning your data by a day at all.
To benefit from it you can modify your query so it'll look for latest coordinates from current day, if not found then from yesterday, if not found from a day before an so on. I suppose 99% answers would be found in todays partition only.
Or you can partition by for example device_id % 256 instead.
But even better would be to create an additional table with several recent device coordinates only. It would be maintained with a trigger on gpstracking_device_tracks which will just do (pseudocode):
if random()*64 < 1.0 then
-- statistically once per 64 runs do a cleanup
with todelete as (
-- lock rows in particular order to avoid possible deadlocks
-- if run concurrently
select id from gpstracking_device_tracks_recent where device_id=?
order by id for share;
)
delete from gpstracking_device_tracks_recent
where id in (select id from todelete)
end if;
insert into gpstracking_device_tracks_recent (...) values (...);
And then look for latest coordinates in this much smaller table.

Related

Redshift Result size exceeds LISTAGG limit on svl_statementtext

Trying to reconstruct my query history from svl_statementtext using listagg.
Getting error :
Result size exceeds LISTAGG limit (limit: 65535)
However, I cannot see how or where I have exceeded limit.
My failing query :
SELECT pid,xid, min(starttime) AS starttime,
pg_catalog.listagg(
CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
, ''::text
) WITHIN GROUP(ORDER BY "sequence")
AS query_statement
FROM svl_statementtext
GROUP BY pid,xid
HAVING min(starttime) >= '2022-06-27 10:00:00';
After the fail, I checked to see if I could find where the excessive size was coming from :
SELECT pid,xid, min(starttime) AS starttime,
SUM(OCTET_LENGTH(
CASE WHEN (len(rtrim(("text")::text)) = 0) THEN ("text")::text ELSE rtrim(("text")::text) END
)) as total_bytes
FROM svl_statementtext
GROUP BY pid,xid
HAVING min(starttime) >= '2022-06-27 10:00:00'
ORDER BY total_bytes desc;
However the largest size that this query reports is 2962
So how/why is listagg complaining about 65535 ??
Have seen some other posts mentioning using listaggdistinct, and catering for when the value being aggregated is null, but none seem to change my problem.
Any guidance appreciated :)

The longest string that Redshift can hold is 64K bytes. Listagg() is likely generating a string longer than this. The "text" column in svl_statementtext is 200 characters so if you have more than 319 segments you can overflow this string size.
The other issue I see is that your query will combine multiple statements into one string. You are only grouping by xid and pid which will give you all statements for a transaction. Add starttime to your group by list and this will break different statements into different results.
Also remember that xid and pid values repeat every few days so have some date range limit can help prevent a lot of confusion.
You need to add
where sequence < 320
to your query and also group by starttime.
Here's a query I have used to put together statements in Redshift:
select xid, pid, starttime, max(datediff('sec',starttime,endtime)) as runtime, type, listagg(regexp_replace(text,'\\\\n*',' ')) WITHIN GROUP (ORDER BY sequence) || ';' as querytext
from svl_statementtext
where pid = (SELECT pg_backend_pid()) --current session
and sequence < 320
and starttime > getdate() - interval '24 hours'
group by starttime, 1, 2, "type" order by starttime, 1 asc, "type" desc ;

collect data via API from the table "apex_activity_log" in APEX ORACLE

I have one question, I want to have one page view statistics and send them through REST.
here is my request:
select count (*)
from apex_activity_log l
where flow_id = 100
and time_stamp> = sysdate - (1/24/60/60 * 2419200)
and userid is not null
and step_id = 172
;
answer : 867
This query works great in "SQL Commands". But when I use this query in Packages I get 0. Although the answer should be completely different. How do I give Packages access to apex_activity_log. That the correct answer came back to me. Thanks)
My api in Packages :
PROCEDURE post_connect_service (
p_status OUT NUMBER,
p_blob IN BLOB
) IS
v_blob blob := p_blob;
v_clob CLOB;
tv apex_json.t_values;
v_id varchar2(1000);
v_number varchar2(1000);
v_date_last date;
v_count_v_pl int;
v_count_sum_v int;
BEGIN
v_clob := iot_general.blob_to_clob(v_blob);
apex_json.parse(tv,v_clob);
v_id := apex_json.get_varchar2(p_path => 'id', p_values => tv);
select count(*) into v_count_sum_v
from apex_activity_log;
p_status := 200;
apex_json.open_object;
apex_json.write('success', true);
apex_json.write('count_views', v_count_v_pl);
apex_json.write('count_sum_views', v_count_sum_v);
apex_json.close_object;
EXCEPTION
WHEN OTHERS THEN
p_status := 500;
apex_json.open_object;
apex_json.write('success', false);
apex_json.write('message', substr(sqlerrm,1,4000));
apex_json.close_object;
END post_connect_service;

The view apex_activity_log has data for the current apex context. If it is queried outside of an apex context, it will return no rows. Easiest way is to create an apex session before querying it.
koen>SELECT COUNT(*) FROM apex_activity_log;
COUNT(*)
___________
0
koen>DECLARE
2 BEGIN
3 apex_session.create_session (
4 p_app_id => 286,
5 p_page_id => 1,
6 p_username => 'JOHN.DOE');
7 END;
8* /
PL/SQL procedure successfully completed.
koen>SELECT COUNT(*) FROM apex_activity_log;
COUNT(*)
___________
9327
koen>

first, there is a pre-configured REST API for this within ORDS.
https://docs.oracle.com/en/database/oracle/oracle-database/21/dbrst/api-oracle-apex.html
To answer your actual question:
Does your procedure run within the same workspace as application 100 is? Or are you dependent on the APEX_ADMINISTRATOR_ROLE for this to return rows?
if the latter, make sure to create your package or procedure with AUTHID CURRENT_USER to make sure that it ...
a) runs with the privieges of the invoking user
b) have roles (such as the APEX_ADMINISTRATOR_ROLE) enabled during PL/SQL execution.

using pd.read_sql() to extract large data (>5 million records) from oracle database, making the sql execution very slow

Initially tried using pd.read_sql().
Then I tried using sqlalchemy, query objects but none of these methods are
useful as the sql getting executed for long time and it never ends.
I tried using Hints.
I guess the problem is the following: Pandas creates a cursor object in the
background. With cx_Oracle we cannot influence the "arraysize" parameter which
will be used thereby, i.e. always the default value of 100 will be used which
is far too small.
CODE:
import pandas as pd
import Configuration.Settings as CS
import DataAccess.Databases as SDB
import sqlalchemy
import cx_Oracle
dfs = []
DBM = SDB.Database(CS.DB_PRM,PrintDebugMessages=False,ClientInfo="Loader")
sql = '''
WITH
l AS
(
SELECT DISTINCT /*+ materialize */
hcz.hcz_lwzv_id AS lwzv_id
FROM
pm_mbt_materialbasictypes mbt
INNER JOIN pm_mpt_materialproducttypes mpt ON mpt.mpt_mbt_id = mbt.mbt_id
INNER JOIN pm_msl_materialsublots msl ON msl.msl_mpt_id = mpt.mpt_id
INNER JOIN pm_historycompattributes hca ON hca.hca_msl_id = msl.msl_id AND hca.hca_ignoreflag = 0
INNER JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_id = hca.hca_tpm_id
inner join pm_tin_testdefinsertions tin on tin.tin_id = tpm.tpm_tin_id
INNER JOIN pm_hcz_history_comp_zones hcz ON hcz.hcz_hcp_id = hca.hca_hcp_id
WHERE
mbt.mbt_name = :input1 and tin.tin_name = 'x1' and
hca.hca_testendday < '2018-5-31' and hca.hca_testendday > '2018-05-30'
),
TPL as
(
select /*+ materialize */
*
from
(
select
ut.ut_id,
ut.ut_basic_type,
ut.ut_insertion,
ut.ut_testprogram_name,
ut.ut_revision
from
pm_updated_testprogram ut
where
ut.ut_basic_type = :input1 and ut.ut_insertion = :input2
order by
ut.ut_revision desc
) where rownum = 1
)
SELECT /*+ FIRST_ROWS */
rcl.rcl_lotidentifier AS LOT,
lwzv.lwzv_wafer_id AS WAFER,
pzd.pzd_zone_name AS ZONE,
tte.tte_tpm_id||'~'||tte.tte_testnumber||'~'||tte.tte_testname AS Test_Identifier,
case when ppd.ppd_measurement_result > 1e15 then NULL else SFROUND(ppd.ppd_measurement_result,6) END AS Test_Results
FROM
TPL
left JOIN pm_pcm_details pcm on pcm.pcm_ut_id = TPL.ut_id
left JOIN pm_tin_testdefinsertions tin ON tin.tin_name = TPL.ut_insertion
left JOIN pm_tpr_testdefprograms tpr ON tpr.tpr_name = TPL.ut_testprogram_name and tpr.tpr_revision = TPL.ut_revision
left JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_tpr_id = tpr.tpr_id and tpm.tpm_tin_id = tin.tin_id
left JOIN pm_tte_testdeftests tte on tte.tte_tpm_id = tpm.tpm_id and tte.tte_testnumber = pcm.pcm_testnumber
cross join l
left JOIN pm_lwzv_info lwzv ON lwzv.lwzv_id = l.lwzv_id
left JOIN pm_rcl_resultschipidlots rcl ON rcl.rcl_id = lwzv.lwzv_rcl_id
left JOIN pm_pcm_zone_def pzd ON pzd.pzd_basic_type = TPL.ut_basic_type and pzd.pzd_pcm_x = lwzv.lwzv_pcm_x and pzd.pzd_pcm_y = lwzv.lwzv_pcm_y
left JOIN pm_pcm_par_data ppd ON ppd.ppd_lwzv_id = l.lwzv_id and ppd.ppd_tte_id = tte.tte_id
'''
#method1: using query objects.
Q = DBM.getQueryObject(sql)
Q.execute({"input1":'xxxx',"input2":'yyyy'})
while not Q.AtEndOfResultset:
print Q
#method2: using sqlalchemy
connectstring = "oracle+cx_oracle://username:Password#(description=
(address_list=(address=(protocol=tcp)(host=tnsconnect string)
(port=pertnumber)))(connect_data=(sid=xxxx)))"
engine = sqlalchemy.create_engine(connectstring, arraysize=10000)
df_p = pd.read_sql(sql, params=
{"input1":'xxxx',"input2":'yyyy'}, con=engine)
#method3: using pd.read_sql()
df_p = pd.read_sql_query(SQL_PCM, params=
{"input1":'xxxx',"input2":'yyyy'},
coerce_float=True, con= DBM.Connection)
It would be great if some one could help me out in this. Thanks in advance.

And yet another possibility to adjust the array size without needing to create oraaccess.xml as suggested by Chris. This may not work with the rest of your code as is, but it should give you an idea of how to proceed if you wish to try this approach!
class Connection(cx_Oracle.Connection):
def __init__(self):
super(Connection, self).__init__("user/pw#dsn")
def cursor(self):
c = super(Connection, self).cursor()
c.arraysize = 5000
return c
engine = sqlalchemy.create_engine(creator=Connection)
pandas.read_sql(sql, engine)

Here's another alternative to experiment with.
Set a prefetch size by using the external configuration available to Oracle Call Interface programs like cx_Oracle. This overrides internal settings used by OCI programs. Create an oraaccess.xml file:
<?xml version="1.0"?>
<oraaccess xmlns="http://xmlns.oracle.com/oci/oraaccess"
xmlns:oci="http://xmlns.oracle.com/oci/oraaccess"
schemaLocation="http://xmlns.oracle.com/oci/oraaccess
http://xmlns.oracle.com/oci/oraaccess.xsd">
<default_parameters>
<prefetch>
<rows>1000</rows>
</prefetch>
</default_parameters>
</oraaccess>
If you use tnsnames.ora or sqlnet.ora for cx_Oracle, then put the oraaccess.xml file in the same directory. Otherwise, create a new directory and set the environment variable TNS_ADMIN to that directory name.
cx_Oracle needs to be using Oracle Client 12c, or later, libraries.
Experiment with different sizes.
See OCI Client-Side Deployment Parameters Using oraaccess.xml.

SQL Comparison to a value in the next row

I have been a long time reader of this forum, it has helped me a lot, however I have a question which I cant find a solution specific to my requirements, so this is the first time I have had to ask anything.
I have a select statement which returns meter readings sorted by date (the newest readings at the top), in 99.9% of cases the meter readings always go up as the date moves on, however due to system errors occasionally some go down, I need to identify instances where the reading in the row below (previous reading) is GREATER than the latest reading (Current cell)
I have come across the LEAD function, however its only in Oracle or SS-MS-2012, I'm using SS-MS-2008.
Here is a simplified version of my select statment:
SELECT Devices.SerialNumber,
MeterReadings.ScanDateTime,
MeterReadings.TotalMono,
MeterReadings.TotalColour
FROM dbo.MeterReadings AS MeterReadings
JOIN DBO.Devices AS Devices
ON MeterReadings.DeviceID = Devices.DeviceID
WHERE Devices.serialnumber = 'ANY GIVEN DEVICE SERIAL NUMBER'
AND Meterreadings.Scandatetime > 'ANY GIVEN SCAN DATE TIME'
ORDER BY MeterReadings.ScanDateTime DESC, Devices.SerialNumber ASC

This is the code I used in the end
WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
WHERE m.ScanDateTime > '2012-01-01'
)
SELECT top 1 *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
and p.ScanDateTime < r.ScanDateTime
and p.TotalMono > r.TotalMono
order by r.serialnumber, p.TotalMono desc, r.TotalMono asc

Try something like this.
;WITH readings AS
(
SELECT
d.SerialNumber
, m.TotalMono
, m.TotalColour
, m.ScanDateTime
FROM dbo.MeterReadings m
INNER JOIN dbo.Devices d ON m.DeviceId = d.DeviceId
)
SELECT *
FROM readings r
LEFT JOIN readings p ON p.SerialNumber = r.SerialNumber
AND p.ScanDateTime < r.ScanDateTime
WHERE p.reading > r.reading

What is the best way to populate a load file for a date lookup dimension table?

Informix 11.70.TC4:
I have an SQL dimension table which is used for looking up a date (pk_date) and returning another date (plus1, plus2 or plus3_months) to the client, depending on whether the user selects a "1","2" or a "3".
The table schema is as follows:
TABLE date_lookup
(
pk_date DATE,
plus1_months DATE,
plus2_months DATE,
plus3_months DATE
);
UNIQUE INDEX on date_lookup(pk_date);
I have a load file (pipe delimited) containing dates from 01-28-2012 to 03-31-2014.
The following is an example of the load file:
01-28-2012|02-28-2012|03-28-2012|04-28-2012|
01-29-2012|02-29-2012|03-29-2012|04-29-2012|
01-30-2012|02-29-2012|03-30-2012|04-30-2012|
01-31-2012|02-29-2012|03-31-2012|04-30-2012|
...
03-31-2014|04-30-2014|05-31-2014|06-30-2014|
........................................................................................
EDIT : Sir Jonathan's SQL statement using DATE(pk_date + n UNITS MONTH on 11.70.TC5 worked!
I generated a load file with pk_date's from 01-28-2012 to 12-31-2020, and plus1, plus2 & plus3_months NULL. Loaded this into date_lookup table, then executed the update statement below:
UPDATE date_lookup
SET plus1_months = DATE(pk_date + 1 UNITS MONTH),
plus2_months = DATE(pk_date + 2 UNITS MONTH),
plus3_months = DATE(pk_date + 3 UNITS MONTH);
Apparently, DATE() was able to convert pk_date to DATETIME, do the math with TC5's new algorithm, and return the result in DATE format!
.........................................................................................
The rules for this dimension table are:
If pk_date has 31 days in its month and plus1, plus2 or plus3_months only have 28, 29, or 30 days, then let plus1, plus2 or plus3 equal the last day of that month.
If pk_date has 30 days in its month and plus1, plus2 or plus3 has 28 or 29 days in its month, let them equal the last valid date of those month, and so on.
All other dates fall on the same day of the following month.
My question is: What is the best way to automatically generate pk_dates past 03-31-2014 following the above rules? Can I accomplish this with an SQL script, "sed", C program?
EDIT: I mentioned sed because I already have more than two years worth of data and
could perhaps model the rest after this data, or perhaps a tool like awk is better?

The best technique would be to upgrade to 11.70.TC5 (on 32-bit Windows; generally to 11.70.xC5 or later) and use an expression such as:
SELECT DATE(given_date + n UNITS MONTH)
FROM Wherever
...
The DATETIME code was modified between 11.70.xC4 and 11.70.xC5 to generate dates according to the rules you outline when the dates are as described and you use the + n UNITS MONTH or equivalent notation.
This obviates the need for a table at all. Clearly, though, all your clients would also have to be on 11.70.xC5 too.
Maybe you can update your development machine to 11.70.xC5 and then use this property to generate the data for the table on your development machine, and distribute the data to your clients.
If upgrading at least someone to 11.70.xC5 is not an option, then consider the Perl script suggestion.

Can it be done with SQL? Probably, but it would be excruciating. Ditto for C, and I think 'no' is the answer for sed.
However, a couple of dozen lines of perl seems to produce what you need:
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
my #dates;
# parse arguments
while (my $datep = shift){
my ($m,$d,$y) = split('-', $datep);
push(#dates, DateTime->new(year => $y, month => $m, day => $d))
|| die "Cannot parse date $!\n";
}
open(STDOUT, ">", "output.unl") || die "Unable to create output file.";
my ($date, $end) = #dates;
while( $date < $end ){
my #row = ($date->mdy('-')); # start with pk_date
for my $mth ( qw[ 1 2 3 ] ){
my $fut_d = $date->clone->add(months => $mth);
until (
($fut_d->month == $date->month + $mth
&& $fut_d->year == $date->year) ||
($fut_d->month == $date->month + $mth - 12
&& $fut_d->year > $date->year)
){
$fut_d->subtract(days => 1); # step back until criteria met
}
push(#row, $fut_d->mdy('-'));
}
print STDOUT join("|", #row, "\n");
$date->add(days => 1);
}
Save that as futuredates.pl, chmod +x it and execute like this:
$ futuredates.pl 04-01-2014 12-31-2020
That seems to do the trick for me.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Getting last rows by date on a large postgresl data base - django

Related

Redshift Result size exceeds LISTAGG limit on svl_statementtext

collect data via API from the table "apex_activity_log" in APEX ORACLE

using pd.read_sql() to extract large data (>5 million records) from oracle database, making the sql execution very slow

SQL Comparison to a value in the next row

What is the best way to populate a load file for a date lookup dimension table?

Categories

Resources