MonetDB: create 100,000 columns - C++

I am trying to create a MonetDB database that will hold 100k columns and approximately 2M rows of type SMALLINT.
To generate the 100k columns I am using C code, i.e., a loop that performs the following SQL request:
ALTER TABLE test ADD COLUMN s%d SMALLINT;
where %d is a number from 1 to 100000.
I observed that after 80000 SQL requests each transaction takes about 15 s, meaning that I need a lot of time to complete the table creation.
Could you tell me if there is a simple way of creating 100k columns?
Also, do you know what exactly is going on with MonetDB?

You should use only one CREATE TABLE statement. In a shell script (bash):
#!/bin/bash
# Write a single CREATE TABLE statement with 100000 SMALLINT columns.
fic="/tmp/100k.sql"
col=1
echo "CREATE TABLE bigcol (" > $fic
while [[ $col -lt 100000 ]]
do
    echo "field$col SMALLINT," >> $fic
    col=$(($col + 1))
done
echo "field$col SMALLINT);" >> $fic
Then on the command line (saving the script as 100k.sh):
sh 100k.sh
mclient yourbdd < /tmp/100k.sql
wait about 2 minutes :D
mclient yourbdd
> \d bigcol
[ ... ... ...]
"field99997" SMALLINT,
"field99998" SMALLINT,
"field99999" SMALLINT,
"field100000" SMALLINT
);
DROP TABLE bigcol is, on the other hand, very, very slow; I do not know why.
I also think it is not a good idea, but it answers your question.
Pierre

Related

Convert a number column into a time format in Power BI

I'm looking for a way to convert a decimal number into a valid HH:mm:ss format.
I'm importing data from an SQL database.
One of the columns in my database is labelled Actual Start Time.
The values in my database are stored in the following decimal format:
73758 // which translates to 07:37:58
114436 // which translates to 11:44:36
I cannot simply convert this Actual Start Time column into a Time format in my Power BI import, as it returns errors for some values, saying it doesn't recognise 73758 as a valid 'time'. It needs a leading zero for cases such as 73758.
To combat this, I created a new Text column with the following code to append a leading zero:
Column = FORMAT([Actual Start Time], "000000")
This returns the following results:
073758
114436
-- which is perfect. Exactly what I needed.
I now want to convert these values into a Time.
Simply changing the data type field to Time doesn't do anything, returning:
Cannot convert value '073758' of type Text to type Date.
So I created another column with the following code:
Column 2 = FORMAT(TIME(LEFT([Column], 2), MID([Column], 3, 2), RIGHT([Column], 2)), "HH:mm:ss")
To pass the values 07, 37 and 58 into a TIME format.
This returns the following:
| Actual Start Time | Column | Column 2 |
|-------------------|--------|----------|
| 73758             | 073758 | 07:37:58 |
| 114436            | 114436 | 11:44:36 |
Which is what I wanted, but is there any other way of doing this? Ideally I want to do it in one step, without creating additional columns.
You could use a variable as suggested by Aldert, or you can replace Column by the FORMAT function:
Time Format =
FORMAT(
    TIME(
        LEFT(FORMAT([Actual Start Time], "000000"), 2),
        MID(FORMAT([Actual Start Time], "000000"), 3, 2),
        RIGHT([Actual Start Time], 2)
    ),
    "hh:mm:ss"
)
Edit:
If you want to do this in Power Query, you can create a custom column with the following calculation:
Time.FromText(
    if Text.Length([Actual Start Time]) = 5
    then Text.PadStart([Actual Start Time], 6, "0")
    else [Actual Start Time]
)
Once this column is created you can drop the old column, so that you only have one time column in the data. Hope this helps.
I show the concept of variables on purpose, so you can use it in the future with more complex queries. Note that a variable is referenced without brackets:
TimeC =
VAR timeStr = FORMAT([Actual Start Time], "000000")
RETURN
    FORMAT(
        TIME(LEFT(timeStr, 2), MID(timeStr, 3, 2), RIGHT(timeStr, 2)),
        "HH:mm:ss"
    )

Redshift regexp_substr - extract data from a JSON type format

Help much appreciated - I have a field in Redshift giving data of the form:
{\"frequencyCapList\":[{\"frequencyCapped\":true,\"frequencyCapPeriodCount\":1,\"frequencyCapPeriodType\":\"DAYS\",\"frequencyCapCount\":501}]}
What I would like to do is parse this cleanly as the output of a Redshift query into some columns like:
| Frequency Cap Period Count | Frequency Cap Period Type | Frequency Cap Count |
|----------------------------|---------------------------|---------------------|
| 1                          | DAYS                      | 501                 |
I believe I need to use the regexp_substr function to achieve this, but I cannot work out the syntax to get the required output :(
Thanks in advance for any assistance,
Carter
Here you go:
select json_extract_path_text(
         json_extract_array_element_text(
           json_extract_path_text(
             replace('{\"frequencyCapList\":[{\"frequencyCapped\":true,\"frequencyCapPeriodCount\":1,\"frequencyCapPeriodType\":\"DAYS\",\"frequencyCapCount\":501}]}', '\\', ''),
             'frequencyCapList'),
           0),
         'frequencyCapPeriodCount');
Just replace the last string with each key you want to extract!
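To get all three columns in one pass, the same pattern can be repeated per key. A minimal sketch, assuming the JSON sits in a hypothetical column cap_field of a hypothetical table your_table:
select json_extract_path_text(j, 'frequencyCapPeriodCount') as frequency_cap_period_count,
       json_extract_path_text(j, 'frequencyCapPeriodType')  as frequency_cap_period_type,
       json_extract_path_text(j, 'frequencyCapCount')       as frequency_cap_count
from (
    -- strip the escaping backslashes, then take element 0 of the array
    select json_extract_array_element_text(
               json_extract_path_text(replace(cap_field, '\\', ''), 'frequencyCapList'),
               0) as j
    from your_table
) t;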

RRD DB fake value generator

I want to generate fake values in an RRD database for a period of 1 month, with 5 seconds as the data-collection frequency. Is there any tool that would fill an RRD database with fake data for a given time duration?
I Googled a lot but did not find any such tool.
Please help.
I would recommend the following one-liner:
perl -e 'my $start = time - 30 * 24 * 3600; print join " ","update","my.rrd",(map { ($start+$_*5).":".rand} 0..(30*24*3600/5))' | rrdtool -
This assumes you have an rrd file called my.rrd and that it contains just one data source expecting GAUGE type data.

GNUPlot select rows in Histogram

I have a data file which looks like the one below. I wanted to make a histogram chart using column 9, with column 10 as error bars. That works out pretty well. But is there an option to plot only specific rows?
I tried the solution from another thread that uses a ternary operator:
plot 'Härte StS-123 bis 151.txt' using ( ( $0 == 4 || $0 == 6 ) ? $9 : 1/0 ):($9+$10):($9-$10):xticlabels(2)
This indeed plots rows 4 and 6, but leaves an empty space in between the datasets.
Is there any other way to achieve this?
Data File:
StS-123a "SBR / THF" 50.10 49.60 49.20 50.70 50.00 49.50 49.85 0.49 0.00974176
StS-123b "SBR / THF" 51.00 50.40 50.40 52.00 52.80 50.60 51.20 0.90 0.017614257
StS-124a "SBR+2phrGraphit" 49.60 49.40 49.30 48.90 49.40 49.10 49.28 0.23 0.004599753
What you may want is the index option to the plot command:
plot 'datafile' index 4 u 9:($9-$10):($9+$10):xticlabels(2), \
'' index 6 u 9:($9-$10):($9+$10):xticlabels(2)
This should plot just the data from the 4th and 6th datasets (rows), albeit with two different styles which you can adjust in the plot command.
Did you want to connect the values from the two datasets? That may be trickier.
If you want to plot only the data from the 4th and 6th rows that have data, you can use external commands in gnuplot, like:
plot "<sed '/^$/d' data.dat | sed -n '4p; 6p'" u 9:($9-$10):($9+$10):xticlabels(2)
(This may not be the most compact way to use sed in this case, but it deletes blank lines then returns the 4th and 6th rows.)

What is the best way to populate a load file for a date lookup dimension table?

Informix 11.70.TC4:
I have an SQL dimension table which is used for looking up a date (pk_date) and returning another date (plus1, plus2 or plus3_months) to the client, depending on whether the user selects a "1","2" or a "3".
The table schema is as follows:
TABLE date_lookup
(
pk_date DATE,
plus1_months DATE,
plus2_months DATE,
plus3_months DATE
);
UNIQUE INDEX on date_lookup(pk_date);
I have a load file (pipe delimited) containing dates from 01-28-2012 to 03-31-2014.
The following is an example of the load file:
01-28-2012|02-28-2012|03-28-2012|04-28-2012|
01-29-2012|02-29-2012|03-29-2012|04-29-2012|
01-30-2012|02-29-2012|03-30-2012|04-30-2012|
01-31-2012|02-29-2012|03-31-2012|04-30-2012|
...
03-31-2014|04-30-2014|05-31-2014|06-30-2014|
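For reference, a pipe-delimited file like this loads via the dbaccess LOAD statement. A sketch, assuming the file is saved at the hypothetical path /tmp/date_lookup.unl (pipe is the default DBDELIMITER):
LOAD FROM "/tmp/date_lookup.unl" DELIMITER "|"
    INSERT INTO date_lookup;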
........................................................................................
EDIT: Sir Jonathan's SQL statement using DATE(pk_date + n UNITS MONTH) on 11.70.TC5 worked!
I generated a load file with pk_dates from 01-28-2012 to 12-31-2020, and plus1, plus2 & plus3_months NULL. I loaded this into the date_lookup table, then executed the update statement below:
UPDATE date_lookup
SET plus1_months = DATE(pk_date + 1 UNITS MONTH),
plus2_months = DATE(pk_date + 2 UNITS MONTH),
plus3_months = DATE(pk_date + 3 UNITS MONTH);
Apparently, DATE() was able to convert pk_date to DATETIME, do the math with TC5's new algorithm, and return the result in DATE format!
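As a quick sanity check against the sample rows above (the literal format depends on your DBDATE setting; 01-31-2012 is the month-end case from the load file):
SELECT pk_date, plus1_months, plus2_months, plus3_months
  FROM date_lookup
 WHERE pk_date = '01-31-2012';
-- expected, per the load file: 01-31-2012|02-29-2012|03-31-2012|04-30-2012|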
.........................................................................................
The rules for this dimension table are:
If pk_date has 31 days in its month and the months of plus1, plus2 or plus3_months only have 28, 29, or 30 days, then let plus1, plus2 or plus3 equal the last day of that month.
If pk_date has 30 days in its month and the month of plus1, plus2 or plus3 has 28 or 29 days, let them equal the last valid date of that month, and so on.
All other dates fall on the same day of the following month.
My question is: what is the best way to automatically generate pk_dates past 03-31-2014 following the above rules? Can I accomplish this with an SQL script, sed, or a C program?
EDIT: I mentioned sed because I already have more than two years' worth of data and could perhaps model the rest after this data, or perhaps a tool like awk is better?
The best technique would be to upgrade to 11.70.TC5 (on 32-bit Windows; generally to 11.70.xC5 or later) and use an expression such as:
SELECT DATE(given_date + n UNITS MONTH)
FROM Wherever
...
The DATETIME code was modified between 11.70.xC4 and 11.70.xC5 to generate dates according to the rules you outline when the dates are as described and you use the + n UNITS MONTH or equivalent notation.
This obviates the need for a table at all. Clearly, though, all your clients would also have to be on 11.70.xC5 too.
Maybe you can update your development machine to 11.70.xC5 and then use this property to generate the data for the table on your development machine, and distribute the data to your clients.
If upgrading at least someone to 11.70.xC5 is not an option, then consider the Perl script suggestion.
Can it be done with SQL? Probably, but it would be excruciating. Ditto for C, and I think 'no' is the answer for sed.
However, a couple of dozen lines of Perl seem to produce what you need:
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;

my @dates;

# parse arguments
while (my $datep = shift){
    my ($m,$d,$y) = split('-', $datep);
    push(@dates, DateTime->new(year => $y, month => $m, day => $d))
        || die "Cannot parse date $!\n";
}

open(STDOUT, ">", "output.unl") || die "Unable to create output file.";

my ($date, $end) = @dates;
while( $date < $end ){
    my @row = ($date->mdy('-')); # start with pk_date
    for my $mth ( qw[ 1 2 3 ] ){
        my $fut_d = $date->clone->add(months => $mth);
        until (
            ($fut_d->month == $date->month + $mth
                && $fut_d->year == $date->year) ||
            ($fut_d->month == $date->month + $mth - 12
                && $fut_d->year > $date->year)
        ){
            $fut_d->subtract(days => 1); # step back until criteria met
        }
        push(@row, $fut_d->mdy('-'));
    }
    print STDOUT join("|", @row, "\n");
    $date->add(days => 1);
}
Save that as futuredates.pl, chmod +x it and execute like this:
$ futuredates.pl 04-01-2014 12-31-2020
That seems to do the trick for me.