DB2 IF and LENGTH usage

I have this DB2 table
A    | B      | C
aaaa | 123    |
bbbb | 1      |
cccc | 123456 |
All columns are varchars. I would like to have the column C filled up with the contents of B concatenated with the contents of A.
But the max length of C is 8, so if the concatenated string exceeds 8 characters, I would like to keep only the first 5 characters followed by "...".
Basically:
if (length(A) + length(B) > maximum(C)) {
    // display only the first (maximum(C) - 3) characters, then add "..."
} else {
    // display B + A
}
How can I do this in DB2?

One good option would be to define column C as a generated column, so you do not have to handle anything yourself.
create table t3 (
  A varchar(10),
  B varchar(10),
  C varchar(8) generated always as
    (case when length(concat(A, B)) > 8
          then substr(concat(A, B), 1, 5) || '...'
          else concat(A, B)
     end)
)
insert into t3 (A,B) values ('This', ' is a test');
insert into t3 (A,B) values ('ABCD', 'EFGH');
select * from t3
will return
A    | B          | C
-----|------------|---------
This |  is a test | This ...
ABCD | EFGH       | ABCDEFGH
Alternatives could be triggers, procedures, explicit code etc.
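For instance, a minimal sketch of the "explicit code" route, assuming C is instead kept as an ordinary varchar(8) column that you maintain yourself, would be an UPDATE that applies the same CASE expression:
-- sketch only: assumes C is a plain varchar(8) column, not a generated one
update t3
set C = case when length(concat(A, B)) > 8
             then substr(concat(A, B), 1, 5) || '...'
             else concat(A, B)
        end
The same expression can of course be computed on the fly in a SELECT instead, at the cost of repeating it in every query; the generated column keeps that logic in one place.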

Related

How to repeatedly insert arguments from a list into a function until the list is empty?

Using R, I am working with simulating the outcome from an experiment where participants choose between two options (A or B) defined by their outcomes (x) and probabilities of winning the outcome (p). I have a function "f" that collects its arguments in a matrix with the columns "x" (outcome) and "p" (probability):
f <- function(x, p) {
  t <- matrix(c(x, p), ncol = 2)
  colnames(t) <- c("x", "p")
  t
}
I want to use this function to compile a big list of all the trials in the experiment. One way to do this is:
t1 <- list(`1A` = f(x = c(10), p = c(0.8)),
           `1B` = f(x = c(5),  p = c(1)))
t2 <- list(`2A` = f(x = c(11), p = c(0.8)),
           `2B` = f(x = c(7),  p = c(1)))
.
.
.
tn <- list(`nA` = f(x = c(3), p = c(0.8)),
           `nB` = f(x = c(2), p = c(1)))
Big_list <- list(t1 = t1, t2 = t2, ... tn = tn)
rm(t1, t2, ... tn)
However, I have very many trials, and they may change in future simulations, which is why repeating myself in this way is intractable. I have my trials in an Excel document with the following structure:
| Option | x   | p   |
|--------|-----|-----|
| A      | 10  | 0.8 |
| B      | 7   | 1   |
| A      | 9   | 0.8 |
| B      | 5   | 1   |
| ...    | ... | ... |
I am trying to do some kind of loop which takes "x" and "p" from each "A" and "B" and inserts them into the function f, while skipping two rows ahead after each iteration (so that each option is only inserted once). This way, I want to get a set of lists t1 to tn while not having to hardcode everything. This is my best (but still not very good) attempt to explain it in pseudocode:
TRIALS <- read.excel(file_with_trials)
for n = 1 to n = (nrows(TRIALS) - 1) {
  t(*PRINT 'n' HERE*) <- list(
    (*PRINT 'n' HERE*)A =
      f(x = c(*INSERT COLUMN 1, ROW n FROM "TRIALS"*),
        p = c(*INSERT COLUMN 2, ROW n FROM "TRIALS"*)),
    (*PRINT 'n' HERE*)B =
      f(x = c(*INSERT COLUMN 1, ROW n+1 FROM "TRIALS"*),
        p = c(*INSERT COLUMN 2, ROW n+1 FROM "TRIALS"*)))
}
Big_list <- list(t1=t1, t2=t2, ... tn=tn)
That is, I want the code to create a numbered set of lists by drawing x and p from each pair of rows until my excel file is empty.
Any help (and feedback on how to improve this question) is greatly appreciated!

Google BigQuery - Execute dynamically generated queries from a select statement

I have a huge table in Google BigQuery with the following structure (> 100 million rows):
name | departments
abc  | 1,2,5,6
xyz  | 4,5
pqr  | 3,4,6
I want to convert the data into the following format:
name | 1 | 2 | 3 | 4 | 5 | 6
abc  | 1 | 1 |   |   | 1 | 1
xyz  |   |   |   | 1 | 1 |
pqr  |   |   | 1 | 1 |   | 1
As of now, I am able to generate the queries required to prepare the dataset in this format by using the CONCAT and REGEXP_REPLACE functions:
SELECT ' insert into dataset.output ( name, ' +
       CONCAT('_', replace(departments, ',', ',_')) +
       ' ) values( \'' + name + '\',' +
       REGEXP_REPLACE(departments, "([^,\n]+)", "1") + ')'
FROM (
  select name, departments from dataset.input )
This generates 100 M INSERT queries, which can be used to create the data in the required structure.
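For reference, the text generated for the 'abc' row would look roughly like this (reconstructed from the expression above, not actual output):
insert into dataset.output ( name, _1,_2,_5,_6 ) values( 'abc',1,1,1,1)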
However, I now have the following questions:
Can we execute the output of this query (100 M insert queries) directly using BigQuery SQL, or would we need to fire each insert one by one?
I believe there is no way to pivot or transpose data in a column with multiple comma-separated values. Is that right?
Is there a more optimal way of achieving this using BigQuery SQL and not writing custom Java code?
Thanks.
Below is an example for BigQuery Standard SQL.
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'abc' name, '1,2,5,6' departments UNION ALL
SELECT 'xyz', '4,5' UNION ALL
SELECT 'pqr', '3,4,6'
)
SELECT
name,
IF(departments LIKE '%1%', 1, 0) AS d1,
IF(departments LIKE '%2%', 1, 0) AS d2,
IF(departments LIKE '%3%', 1, 0) AS d3,
IF(departments LIKE '%4%', 1, 0) AS d4,
IF(departments LIKE '%5%', 1, 0) AS d5,
IF(departments LIKE '%6%', 1, 0) AS d6
FROM `project.dataset.table`
with the result:
Row  name  d1  d2  d3  d4  d5  d6
1    abc   1   1   0   0   1   1
2    xyz   0   0   0   1   1   0
3    pqr   0   0   1   1   0   1
So you just need to run the above with the destination set to whatever new table you have prepared.
Note: the above assumes you have just 6 departments and, most importantly, that there is no ambiguity in the numbers (for example, 1 does not conflict with 10).
If you do have such a case, you need to transform lines like
IF(departments LIKE '%2%', 1, 0) AS d2,
into
IF(CONCAT(',', departments, ',') LIKE '%,2,%', 1, 0) AS d2 ...
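Put together, a sketch of the fully disambiguated SELECT (simply applying the same comma-wrapping to every department) would be:
#standardSQL
SELECT
  name,
  IF(CONCAT(',', departments, ',') LIKE '%,1,%', 1, 0) AS d1,
  IF(CONCAT(',', departments, ',') LIKE '%,2,%', 1, 0) AS d2,
  IF(CONCAT(',', departments, ',') LIKE '%,3,%', 1, 0) AS d3,
  IF(CONCAT(',', departments, ',') LIKE '%,4,%', 1, 0) AS d4,
  IF(CONCAT(',', departments, ',') LIKE '%,5,%', 1, 0) AS d5,
  IF(CONCAT(',', departments, ',') LIKE '%,6,%', 1, 0) AS d6
FROM `project.dataset.table`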
And of course, you can use just one simple INSERT statement
INSERT `project.dataset.new_table` (name, d1, d2, d3, d4, d5, d6)
SELECT
name,
IF(departments LIKE '%1%', 1, 0) AS d1,
IF(departments LIKE '%2%', 1, 0) AS d2,
IF(departments LIKE '%3%', 1, 0) AS d3,
IF(departments LIKE '%4%', 1, 0) AS d4,
IF(departments LIKE '%5%', 1, 0) AS d5,
IF(departments LIKE '%6%', 1, 0) AS d6
FROM `project.dataset.table`
So, the final point of all this is:
instead of generating an INSERT statement for each and every row in the original table, you should generate a single SELECT statement that does the "pivoting".
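If you would rather not type the IF(...) lines by hand, the column list itself could be generated from the data; a sketch (assuming the department values are clean integers) is:
#standardSQL
-- sketch: builds the IF(...) lines for the pivoting SELECT from the
-- distinct department numbers found in the data
SELECT STRING_AGG(
  FORMAT("IF(CONCAT(',', departments, ',') LIKE '%%,%d,%%', 1, 0) AS d%d", dep, dep),
  ',\n' ORDER BY dep
) AS generated_columns
FROM (
  SELECT DISTINCT CAST(dep AS INT64) AS dep
  FROM `project.dataset.table`, UNNEST(SPLIT(departments, ',')) AS dep
)
The output is just text; paste it into the SELECT (or INSERT ... SELECT) shown above.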
Update, for "extreme" minimization of the generated code.
See an example:
#standardSQL
CREATE TEMP FUNCTION c(departments STRING, department INT64) AS (
IF(departments LIKE CONCAT('%',CAST(department AS STRING),'%'), 1, 0)
);
WITH `project.dataset.table` AS (
SELECT 'abc' name, '1,2,5,6' departments UNION ALL
SELECT 'xyz', '4,5' UNION ALL
SELECT 'pqr', '3,4,6'
), temp AS (
SELECT name, departments AS d
FROM `project.dataset.table`
)
SELECT
name,
c(d,1)d1,
c(d,2)d2,
c(d,3)d3,
c(d,4)d4,
c(d,5)d5,
c(d,6)d6
FROM temp
As you can see, each of your 10,000 generated lines will now look like c(d,N)dN, with the longest being c(d,10000)d10000, so you have a chance of fitting within the query size limit.

Django ORM calculations between records

Is it possible to perform calculations between records in a Django query?
I know how to perform calculations across fields within a record (e.g. data_a + data_b). Is there a way to compute, say, the percent change between data_a in row 0 and row 4 (i.e. 09-30-17 and 09-30-16)?
+-----------+--------+--------+
| date | data_a | data_b |
+-----------+--------+--------+
| 09-30-17 | 100 | 200 |
| 06-30-17 | 95 | 220 |
| 03-31-17 | 85 | 205 |
| 12-31-16 | 80 | 215 |
| 09-30-16 | 75 | 195 |
+-----------+--------+--------+
I am currently using Pandas to perform these types of calculations, but would like to eliminate this additional step if possible.
I would go with a database cursor and raw SQL
(see https://docs.djangoproject.com/en/2.0/topics/db/sql/)
combined with a LAG() window function, like so:
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        select date,
               data_a - lag(data_a) over (order by date) as data_change
        from foo;""")
    result = cursor.fetchall()
This is the general idea, you might need to change it according to your needs.
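If you want the percent change itself rather than the raw difference, the query can be extended along these lines (a sketch, assuming the table behind the model is named foo):
select date,
       100.0 * (data_a - lag(data_a) over (order by date))
             / lag(data_a) over (order by date) as data_a_pct_change
from foo;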
There is no row 0 in a Django database, so we'll assume rows 1 and 5.
The general formula for calculation of percentage as expressed in Python is:
((b - a) / a) * 100
where a is the starting number and b is the ending number. So in your example:
a = 100
b = 75
((b - a) / a) * 100
-25.0
If your model is called Foo, the queries you want are:
(a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
values_list says "get just these fields", and flat=True means you want a plain list of values rather than one-element tuples. By assigning the result to the (a, b) tuple and using the __in= clause, you get to do this as a single query rather than as two.
I would wrap it all up into a standalone function or model method:
def pct_change(id_1, id_2):
    # Get a single column from two rows and return percentage of change
    (a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
    return ((b - a) / a) * 100
And then if you know the row IDs in the db for the two rows you want to compare, it's just:
print(pct_change(233, 8343))
If you'd like to progressively calculate the change between row 1 and row 2, then between row 2 and row 3, and so on, you'd just run this function sequentially over a queryset. Because row IDs might have gaps, we can't just use n+1 to compute the next row. Instead, start by getting a list of all the row IDs in a queryset:
rows = [r.id for r in Foo.objects.all().order_by('date')]
Which evaluates to something like
rows = [1,2,3,5,6,9,13]
Now, for each element in the list and the next element, run our function:
for (index, row) in enumerate(rows):
    if index < len(rows) - 1:
        current, next_ = row, rows[index + 1]
        print(current, next_)
        print(pct_change(current, next_))

How to populate missing values for string variable in a column based on fixed criteria

To populate missing data with a fixed range of values
I would like to check how to populate the column aktype for those cells with missing values, using a fixed range of values (the range of values for the same pidlink is always fixed at the 11 values listed below). I have about 17,000+ observations that are missing.
The range of values are as follows:
A
B
C
D
E
G
H
I
J
K
L
I have tried the following command, but it does not work:
foreach x of varlist aktype=1/11 {
replace aktype = "A" in 1 if aktype==""
replace aktype = "B" in 2 if aktype==""
replace aktype = "C" in 3 if aktype==""
replace aktype = "D" in 4 if aktype==""
replace aktype = "E" in 5 if aktype==""
replace aktype = "G" in 6 if aktype==""
replace aktype = "H" in 7 if aktype==""
replace aktype = "I" in 8 if aktype==""
replace aktype = "J" in 9 if aktype==""
replace aktype = "K" in 10 if aktype==""
replace aktype = "L" in 11 if aktype==""
}
Would appreciate it if you could advise on the right command to use. Many thanks!
I would generate a variable AK that has letters A-K in positions 1-11 (and 12-22, and 23-33, and so on). Then replace missing values with the value of this variable AK.
* generate data
clear
set obs 20
generate aktype = ""
replace aktype = "foo" in 1/1
replace aktype = "bar" in 10/12
* generate variable with letters A-K
generate AK = char(65 + mod(_n - 1, 11))
* fill missing values
replace aktype = AK if missing(aktype)
list
This yields the following.
. list

     +---------------+
     | aktype     AK |
     |---------------|
  1. |    foo      A |
  2. |      B      B |
  3. |      C      C |
  4. |      D      D |
  5. |      E      E |
     |---------------|
This first addresses the comment "it does not work".
Generally, in this kind of forum you should always be specific and say exactly what happens, namely where the code breaks down and what the result is (e.g. what error message you get). If necessary, add why that is not what is wanted.
Specifically, in this case Stata would get no further than
foreach x of varlist aktype=1/11
which is illegal (as well as unclear to Stata programmers).
You can loop over a varlist. In this case looping over a single variable aktype is legal. (It is usually pointless, but that's style, not syntax.) So this is legal:
foreach x of varlist aktype
By the way, you define x as the loop argument, but never refer to it inside the loop. That isn't illegal, but it is unusual.
You can also loop over a numlist, e.g.
foreach x of numlist 1/11
although
forval x = 1/11
is a more direct way of doing that. All this follows from the syntax diagrams for the commands concerned, where whatever is not explicitly allowed is forbidden.
On occasions when you need to loop over a varlist and a numlist you will need to use different syntax, but what is best depends on the precise problem.
Now, second, to the question itself: I can't see any rule in the question for which values get assigned A through L, so I can't advise positively.

posix regexp to split a table

I'm currently working on data migration in PostgreSQL. Since I'm new to POSIX regular expressions, I'm having some trouble with a simple pattern and would appreciate your help.
I want a regular expression to split my table on each alphanumeric character in a column, e.g. when a column contains the string 'abc' I'd like to split it into 3 rows: ['a', 'b', 'c']. I need a regexp for that.
The second case is a little more complicated: I'd like to split an expression '105AB' into ['105A', '105B'], i.e. copy the numbers at the beginning of the string, split the string on uppercase letters, and in the end join the number with exactly one uppercase letter.
The function I'll be using is probably regexp_split_to_table(string, regexp).
I'm intentionally providing very little data so as not to confuse anyone, since what I posted is the essence of the problem. If you need more information, please comment.
The first case you have already solved yourself:
select regexp_split_to_table(s, ''), i
from (values
('abc', 1),
('def', 2)
) s(s, i);
 regexp_split_to_table | i
-----------------------+---
 a                     | 1
 b                     | 1
 c                     | 1
 d                     | 2
 e                     | 2
 f                     | 2
In the second case you don't say whether the numerics are always the first three characters:
select
left(s, 3) || regexp_split_to_table(substring(s from 4), ''), i
from (values
('105AB', 1),
('106CD', 2)
) s(s, i);
 ?column? | i
----------+---
 105A     | 1
 105B     | 1
 106C     | 2
 106D     | 2
For a variable number of leading digits (the pattern below allows one to three):
select n || a, i
from (
select
substring(s, '^\d{1,3}') n,
regexp_split_to_table(substring(s, '[A-Z]+'), '') a,
i
from (values
('105AB', 1),
('106CD', 2)
) s(s, i)
) s;
 ?column? | i
----------+---
 105A     | 1
 105B     | 1
 106C     | 2
 106D     | 2