SQL - Cast (String to NVARCHAR) - casting

Row 4 - I'm trying to make the 'on' syntax, to the right of the = sign, dynamic. This is based on a two digit year value from any particular date (e.g., 2021-08-31'). Row 5, commented-out, displays the original logic. Row 4, to the right of the = sign, contains my attempt at dynamic logic.
Note, the actual date will eventually be supplied by a Date/Time parameter in MS Visual Studio (BIDS/SSRS).
Column Types: In row 5, a.cui = PK, varchar(8), not null. b.cy21_cui = nvarchar(30), null
SELECT 'b.cy'+substring('2021-08-31',3,2)+'_cui' = b.cy21_cui. This looks correct but the type is not... it is not the correct type and will not join to a.cui.
My question is, how do I CAST 'b.cy'+substring('2021-08-31',3,2)+'_cui' to a type that will join to a.cui? I've read all of the suggested threads here, based on the title of my question, and I cannot figure it out. I've testing a thousand CAST variations too.
Thanks in advance for any assistance.
1 select distinct bn
2 from fc_2021 a
3 join M..rx_crsw b
----------------------------------------------------------------------
4 on a.cui='b.cy'+substring('2021-08-31',3,2)+'_cui'
5 --on a.cui=b.cy21_cui
----------------------------------------------------------------------
6 join nf_bsc..cvrg c
7 on b.abc=c.abc
8 where a.Eff_Date<='2021-08-31' and a.term_date>='2021-08-31' and PA_Type<>0

Related

Nvarchar '45,56' casted to decimal is 4,00. Nvarchar '45' to tinyint is 4. Why?

There are 2 text boxes called #IMPORTOORARIO and #ANTICIPO.
The user writes in the first one '45,56', and in the second one '45'.
I want to cast the first string to decimal and the second one to tinyint.
No matter what I try: if the cast succeds, I end up with '45,56' casted to 4,00 and '45' to 4.
Just 4. Not 4,00.
Here is part of the store procedure I am using:
INSERT INTO TableName
VALUES
(
try_convert(decimal(6, 2),#IMPORTOORARIO),
try_convert(tinyint,#ANTICIPO)
)
I attach a screenshot to show the problem.
You will see more text boxes and table fields but the problem is always the same, just focus on #IMPORTOORARIO and #ANTICIPO.
#IMPORTOORARIO = number 3. #ANTICIPO = number 4
Info: the maximum number I am going to store in the database is decimal(9, 2).
So something like: 123456,78
So 6 numbers before the comma and 2 after the comma.
To solve the problem, I tried to use CAST, CONVERT, TRY_CONVERT, and something I did not understand with REPLACE:
Select try_convert(numeric(6, 2),replace('25,12', ',', '.'))

BIRT: Align rows in list element

I'm using the Birt list element to display my data from left to right. (see this question as reference). Eg. List element with a Grid in details and the grid set to inline.
The issue I'm facing now is, that the different rows in the grid are not aligned left to right (probably due to some rows having empty values in some fields). How can I force BIRT to align properly?
EDIT:
This is especially also a problem with longer text that wraps to more than 1 line. The wrapping /multiple lines should be reflected by all list elements in that "row of the output".
Unfortunately, I don't see any chance to accomplish this easily in the generic case - that is, if the number of records is unknown in advance, so you'd need more than one line:
student1 student2 student3
student4 student5
Let's call those line "main lines". One main line can contain up to 3 records. The number 3 may be different in your case, but we can assume it is a constant, since (at least for PDF reports) the paper width is restricted.
A possible solution could work like this:
In your data set, add two columns for each row: MAIN_LINE_NUM and COLUMN_NUM, where the meaning is obvious. For example, this could be done with pure SQL using analytic functions (untested):
select ...,
trunc((row_number() over (order by ...whatever...) - 1) / 3) + 1 as MAIN_LINE_NUM,
mod(row_number() over (order by ...whatever...) - 1), 3) +1 as COLUMN_NUM
from ...
order by ...whatever... -- The order must be the same as above.
Now you know where each record should go.
The next task is to transform the result set into a form where each record looks like this (for the example, think that you have 3 properties STUDENT_ID, NAME, ADDRESS for each student):
MAIN_LINE
STUDENT_ID_1
NAME_1
ADDRESS_1
STUDENT_ID_2
NAME_2
ADDRESS_2
STUDENT_ID_3
NAME_3
ADDRESS_3
You get the picture...
The SQL trick to achieve this is one that one should know.
I'll show this for the STUDENT_ID_1, STUDENT_ID_2 and NAME_1 column as an example:
with my_data as
( ... the query shown above including MAIN_LINE_NUM and COLUMN_NUM ...
)
select MAIN_LINE_NUM,
max(case when COLUMN_NUM=1 then STUDENT_ID else null end) as STUDENT_ID_1,
max(case when COLUMN_NUM=2 then STUDENT_ID else null end) as STUDENT_ID_2,
...
max(case when COLUMN_NUM=1 then NAME else null end) as NAME_1,
...
from my_data
group by MAIN_LINE_NUM
order by MAIN_LINE_NUM
As you see, this is quite clumsy if you need a lot of different columns.
On the other hand, this makes the output a lot easier.
Create a table item for your dat set, with 3 columns (for 1, 2, 3). It's best to not drag the dataset into the layout. Instead, use the "Insert element" context menu.
You need a detail row for each of the columns (STUDENT_ID, NAME, ADDRESS). So, add two more details rows (the default is one detail row).
Add header labels manually, if you like, or remove the header row if you don't need it (which is what I assume).
Remove the footer row, as you probably don't need it.
Drag the columns to the corresponding position in your table.
The table item should look like this now (data items):
+--------------+--------------+-------------+
+ STUDENT_ID_1 | STUDENT_ID_2 | STUDENT_ID3 |
+--------------+--------------+-------------+
+ NAME_1 | NAME_2 | NAME_3 |
+--------------+--------------+-------------+
+ ADDRESS_1 | ADDRESS_2 | ADDRESS_3 |
+--------------+--------------+-------------+
That's it!
This is one of the few examples where BIRT sucks IMHO in comparison to other tools like e.g. Oracle Reports - excuse my Klatchian.

“Otherwise” argument of my IF-function applies to blank cells, but should ignore them. What can I add to my formula to stop it?

In my IF-function the “otherwise” argument should conduct the subtraction “6 - value”. It works fine for cells containing numbers, but unfortunately also works fine with blank cells. This results in a lot of cells with 6 (6 - 0 = 6) instead of empty cells.
In detail:
I want to import and select data collected in an online questionnaire.
I import my extract of the raw data in sheet “Sample” with the following formula:
=IF(LOOKUP(D$1,'Analysis'!$A$2:$A,'Analysis'!$G$2:$G)="No",FILTER(FILTER(Import!$A$2:$CV,Import!$A$1:$CV$1=D$1),Import!$A$2:$A=0),ARRAYFORMULA(6-FILTER(FILTER(Import!$A$2:$CV,Import!$A$1:$CV$1=D$1),Import!$A$2:$A=0)))
= If the question has not to be reversed (“No”), then import the values as they are, otherwise (if the question has to be reversed, “Yes”) subtract 6 - value.
Sheets in Google Spreadsheets:
“Import”: This sheet contains the raw data. For each person that participated in the study, there is a row with the corresponding answers (that is 1, 2, 3, 4 or 5 according to the rating scale in the questionnaire). Because not every person in the list started or completed the questionnaire, there are blank cells where no answers were registered and blank cells at the end of the sheet.
“Sample”: This sheet should contain an extract of the raw data for further analysis. It’s the sheet where the IF-formula is applied.
“Analysis”: This sheet contains informations concerning the questions, e.g. if the answers of some questions have to be reversed (reversed rating scale: 1 -> 5, 2 -> 4, 3 stays 3 and so on).
Coordinates:
Sheet “Sample”: Cell D$1, E$1, F$1 and so on contain the names of the questions (e.g. question_1).
Sheet “Analysis”: A2 to A contain the names of the questions.
Sheet “Analysis”: G2 to G contain the information if the answers of the questions have to be reversed. If the answers have to be reversed (“Yes”), the raw data needs to be adjusted with “6-” (6-5 = 1, 6-4 = 2, 6-3 = 3 and so on).
Sheet “Import”: A2 to A contains if there are any missing values. Zero means there are no missing values. Only data rows with no missing values should be imported.
Problem:
The formula works fine and displays the answers and reversed answers for the questions of interest. BUT at the end of the sheet “Sample” the columns continue with 6, 6, 6, 6, 6, 6, 6… (only for reversed questions); for not reversed questions the cells after the last valid import are blank.
Attempts to fix it:
I tried different variations of nested if-functions that unfortunately don’t have any effect, e.g.:
=IF(ISBLANK(Import!E2:I8)," ",IF(LOOKUP(D$1,Analysis!$A$2:$A,Analysis!$G$2:$G)="No",FILTER(FILTER(Import!$A$2:$CV,Import!$A$1:$CV$1=D$1),Import!$A$2:$A=0),ARRAYFORMULA(6-FILTER(FILTER(Import!$A$2:$CV,Import!$A$1:$CV$1=D$1),Import!$A$2:$A=0))))
or:
=IF(LOOKUP(D$1,Analysis!$A$2:$A,Analysis!$G$2:$G)="No",FILTER(FILTER(Import!$A$2:$CV,Import!$A$1:$CV$1=D$1),Import!$A$2:$A=0),IF(Import!E2:E=" "," ",ARRAYFORMULA(6-FILTER(FILTER(Import!$A$2:$CV,Import!$A$1:$CV$1=D$1),Import!$A$2:$A=0))))
Alternatively, I could delete the cells with 6, 6, 6,… but that would be very time-consuming for all questionnaires.
Thanks for your help!
The following is the simple pattern
=IF(ISBLANK(A1),,6-A1)
This if A1 is blank, the will return a blank, otherwise, will return the result of 6-A1.
To apply the above to an open-ended reference, nest the above pattern inside FILTER in the following way:
=FILTER(IF(NOT(ISBLANK(A:A)),6-A:A,),LEN(A:A))
Replace A:A by a single column of the imported data, or a formula that returns a column of values.

Generating rolling z-scores of panel data in Stata

I have an unbalanced panel data set (countries and years). For simplicity let's say I have one variable, x, that I am measuring. The panel data sorted first by country (a 3-digit numeric country-code) and then by year. I would like to write a .do file that generates a new variable, z_x, containing the standardized values of the variable x. The variables should be standardized by subtracting the mean from the preceding (exclusive) m time periods, and then dividing by the standard deviation from those same time periods. If this is not possible, return a missing value.
Currently, the code I am using to accomplish this is the following (edited now for clarity)
xtset weocountrycode year
sort weocountrycode year
local win_len = 5 // Defining rolling window length.
quietly: rolling sd_x=r(sd) mean_x=r(mean), window(`win_len') saving(stats_x, replace): sum x
use stats_x, clear
rename end year
save, replace
use all_data_PROCESSED_FINAL.dta, clear
quietly: merge 1:1 (weocountrycode year) using stats_x
replace sd_x = . if `x'[_n-`win_len'+1] == . | weocountrycode[_n-`win_len'+1] != weocountrycode[_n] // This and next line are for deleting values that rolling calculates when I actually want missing values.
replace mean_`x' = . if `x'[_n-`win_len'+1] == . | weocountrycode[_n-`win_len'+1] != weocountrycode[_n]
gen z_`x' = (`x' - mean_`x'[_n-1])/sd_`x'[_n-1] // calculate z-score
UPDATE:
My struggle with rolling is that when rolling is set up to use a window length 5 rolling mean, it automatically does window length 1,2,3,4 means for the first, second, third and fourth entries (when there are not 5 preceding entries available to average out). In fact, it does this in general - if the first non-missing value is on entry 5, it will do a length 1 rolling average on entry 5, length 2 rolling average on entry 6, ..... and then finally start doing length 5 moving averages on entry 9. My issue is that I do not want this, so I would like to avoid performing these calculations. Until now, I have only been able to figure out how to delete them after they are done, which is both inefficient and bothersome.
I tried adding an if clause to the -rolling- statement:
quietly: rolling sd_x=r(sd) mean_x=r(mean) if x[_n-`win_len'+1] != . & weocountrycode[_n-`win_len'+1] != weocountrycode[_n], window(`win_len') saving(stats_x, replace): sum x
But it did not fix the problem and the output is "weird" in the sense that
1) If `win_len' is equal to, say, 10, there are 15 missing values in the resulting z_x variable, instead of 9.
2) Even though there are "extra" missing values in z_x, the observations still start out as window length 1 means, then window length 2 means, etc. which makes no sense to me.
Which leads me to believe I fundamentally don't understand 1) what -rolling- is doing and 2) how an if clause works in the context of -rolling-.
Does this help?
Thanks!
I'm not sure I understand completely but I'll try to answer based on what I think your problem is, and based on a comment by #NickCox.
You say:
... when rolling is set up to use a window length 5 rolling mean...
if the first non-missing value is
on entry 5, it will do a length 1 rolling average on entry 5, length 2
rolling average on entry 6, ...
This is expected. help rolling states:
The window size refers to calendar periods, not the number of
observations. If there
are missing data (for example, because of weekends), the actual number of observations used by command may be less than
window(#).
It's not actually doing a "length 1 rolling average", but I get to that later.
Below some examples to see what rolling does:
clear all
set more off
*-------------------------- example data -----------------------------
set obs 92
gen dat = _n - 1
format dat %tq
egen seq = fill(1 1 1 1 2 2 2 2)
tsset dat
tempfile main
save "`main'"
list in 1/12, separator(4)
*------------------- Example 1. None missing ------------------------
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
*------- Example 2. All but one value, missing in first window ------
use "`main'", clear
replace seq = . in 1/3
list in 1/8
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
*------------- Example 3. All missing in first window --------------
use "`main'", clear
replace seq = . in 1/4
list in 1/8
rolling mean=r(mean), window(4) stepsize(4) clear: summarize seq, detail
list in 1/12, separator(0)
Note I use the stepsize option to make things much easier to follow. Because the date variable is in quarters, I set windowsize(4) and stepsize(4) so rolling is just computing averages by year. I hope that's easy to see.
Example 1 does as expected. No problem here.
Example 2 on the other hand, should be more interesting for you. We've said that what matters are calendar periods, so the mean is computed for the whole year (four quarters), even though it contains missings. There are three missings and one non-missing. summarize is computing the mean over the whole year, but summarize ignores missings, so it just outputs the mean of non-missings, which in this case is just one value.
Example 3 has missings for all four quarters of the year. Therefore, summarize outputs . (missing).
Your problem, as I understand it, is that when you face a situation like Example 2, you'd like the output to be missing. This is where I think Nick Cox's advice comes in. You could try something like:
rolling mean=r(mean) N=r(N), window(4) stepsize(4) clear: summarize seq, detail
replace mean = . if N != 4
list in 1/12, separator(0)
This says: if the number of non-missings for the window (r(N), also computed by summarize), is not the same as the window size, then replace it with missing.

"fill in the blanks"

I'm trying to make a simple "fill in the blanks" type of exam in django and would like to know what is the best way to design the database.
Example: "9 is the sum of 4 and 5, or 3 and 6."
During the exam, the above sentence would appear as "__ is the sum of __ and _, or _ and __."
Obviously there are unlimited number of answers to this question, but assume that the above numbers are the only answers. But the catch is that you can switch the places of 4 and 5, or the places of 3 and 6 and still get the right answer. Besides, the number of blanks is not known, so it can be 1 or more.
I would go with something like. First define a Question table:
Question
--------------------------
Id Text
1 9 is the sum of 4 and 5, or 3 and 6
...
Then save the position of the hidden substrings, let's call them fields, in another table:
QuestionField
--------------------------
Id QuestionId StartsAt EndsAt Set
1 1 0 1 1
2 1 16 17 2
3 1 22 23 2 # NOTE: Is in the same set as QuestionField #2
...
This table lets you retrieve the actual value of the field by querying the Question table (e.g. entry one refers to the value '9' in the first question).
The "Set" column contains an identifier of the "set" in which this field is, where fields in the same set can be replaced by each other. When you populate it, you would have to ensure that all questions that can be replaced by each other are in the same set. The actual number of the set doesn't matter, as long as it's unique. But it makes sense to have it equal to the ID of one of the elements of the set.