Labeling variables after recoding them - stata

I would like to restore the original variable labels after I recode variables in Stata. How can I accomplish this?
sysuse auto, clear
recode foreign (1=2 "Foreign") (0=1 "Domestic"), gen(foreign1)
drop foreign
rename foreign1 foreign
* label var foreign "Car type"
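foreach var of varlist foreign {
    local var_label : variable label `var'
    * recode, gen() labels the new variable like "RECODE of foreign (Car origin)"
    if regexm("`var_label'", "\((.+)\)") {
        local var_label1 = regexs(1)
        label var `var' "`var_label1'"
    }
}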

The solution with regexm() looks awkward to me, which is presumably part of the question.
In your example, there is a simple alternative that leaves the variable label intact:
sysuse auto, clear
replace foreign = 1 + foreign
label def origin 1 Domestic 2 Foreign, modify
. d foreign
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-----------------------------------------------------------------------------------------
foreign         byte    %8.0g      origin     Car origin
This works too:
sysuse auto, clear
recode foreign (1=2 "Foreign") (0=1 "Domestic"), gen(foreign1)
_crcslbl foreign1 foreign
drop foreign
rename foreign1 foreign
d foreign
You are presumably aware that you can also save the variable label in a local macro for safe-keeping before recoding and reapply it afterwards.
(In general, 0-1 indicator variables are immensely more useful and natural statistically than 1-2 indicators, but I presume that you are just making up a reproducible example. If in doubt see e.g. https://www.stata-journal.com/article.html?article=dm0099 )

Related

Extract table name and columns from SQL schema

I need help extracting the table name and columns from a table schema (DDL) using regex.
CREATE TABLE todos (
id INTEGER NOT NULL,
user_id INTEGER NOT NULL,
team_id INTEGER NOT NULL,
title TEXT NOT NULL DEFAULT "Hello World!",
description TEXT NOT NULL UNIQUE,
UNIQUE (title),
PRIMARY KEY (id),
FOREIGN KEY (user_id) REFERENCES users (id),
FOREIGN KEY (team_id) REFERENCES teams (t_id)
ON UPDATE RESTRICT
ON DELETE RESTRICT
)
1. Table name
todos
2. Columns
id // as group 1 (column name)
INTEGER // as group 2 (column type)
NOT NULL // as group 3 (column nullable) empty if nothing
DEFAULT // as group 4 (default value for example "Hello World")
UNIQUE // as group 5 (column uniqueable) empty if nothing
Note: UNIQUE can also be declared at table level, as it is for the title column.
3. Primary key
id // as group 1 (primary key)
Table level: PRIMARY\sKEY\s+\(([^\)]+)\)
Column level: check below answer.
4. Foreign keys:
// first
user_id // as group 1 (foreign key)
users // as group 2 (reference table name)
id // as group 3 (reference primary)
// second
team_id // as group 1 (foreign key)
teams // as group 2 (reference table name)
t_id // as group 3 (reference primary)
ON UPDATE RESTRICT // as group 4
ON DELETE RESTRICT // as group 5
I found a simple regex on GitHub (https://github.com/yiisoft/yii2/issues/6351#issuecomment-91064631), but it does not support RESTRICT:
/FOREIGN KEY\s+\(([^\)]+)\)\s+REFERENCES\s+([^\(^\s]+)\s*\(([^\)]+)\)/mi
Extract a table name:
CREATE\s+TABLE\s+([\w_]+)
Get column names:
\s+([\w_]+)[\s\w]+,
Get a primary key field:
\s*PRIMARY\s+KEY\s+\(([\w_]+)\)
Get foreign keys data:
\s*FOREIGN\s+KEY\s+\(([\w_]+)\)\s+REFERENCES\s+([\w_]+)\s+\(([\w_]+)\)
You can test it here (respectively):
https://regexr.com/59251
https://regexr.com/59254
https://regexr.com/5925a
https://regexr.com/594eb
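As a rough sketch of how patterns like these could be applied together, here is a Python version run against the schema from the question. The optional ON UPDATE / ON DELETE groups and the keyword-excluding lookahead for column lines are my own additions, not part of the answer above, and they assume one definition per line and single-column keys as in the example.

```python
import re

DDL = """CREATE TABLE todos (
  id INTEGER NOT NULL,
  user_id INTEGER NOT NULL,
  team_id INTEGER NOT NULL,
  title TEXT NOT NULL DEFAULT "Hello World!",
  description TEXT NOT NULL UNIQUE,
  UNIQUE (title),
  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES users (id),
  FOREIGN KEY (team_id) REFERENCES teams (t_id)
    ON UPDATE RESTRICT
    ON DELETE RESTRICT
)"""

# Table name: first identifier after CREATE TABLE.
table_name = re.search(r"CREATE\s+TABLE\s+(\w+)", DDL, re.I).group(1)

# Table-level primary key (single column only in this sketch).
primary_key = re.search(r"PRIMARY\s+KEY\s*\((\w+)\)", DDL, re.I).group(1)

# Column name and type: lines that do not start with a constraint keyword.
# Assumption: every column definition sits on its own line.
columns = re.findall(
    r"^\s*(?!UNIQUE\b|PRIMARY\b|FOREIGN\b|ON\b|CREATE\b)(\w+)\s+(\w+)",
    DDL,
    re.I | re.M,
)

# Foreign keys, with optional ON UPDATE / ON DELETE actions as groups 4 and 5.
# Groups that do not take part in a match come back as empty strings.
foreign_keys = re.findall(
    r"FOREIGN\s+KEY\s*\((\w+)\)\s+REFERENCES\s+(\w+)\s*\((\w+)\)"
    r"(?:\s+ON\s+UPDATE\s+(\w+))?(?:\s+ON\s+DELETE\s+(\w+))?",
    DDL,
    re.I,
)

print(table_name, primary_key)
print(columns)
print(foreign_keys)
```

A regex-based approach like this is fragile for real-world DDL (quoted identifiers, composite keys, comments), so treat it as a starting point rather than a parser.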
Each regex returns its result in a named capture group; you can find the name in the (?'GROUP-NAME'...myregex...) part. Named groups make the matches easier to reference after the search has finished and easier to split apart.
FULL SEARCH
((?'COLUMN_NAME'(?<=^\s\s)([[:lower:]]\w+))|(?'PRIMARY_KEY'(?<=PRIMARY\sKEY\s\()(\w+))|(?'TABLE_NAME'(?<=\bTABLE\s)(\w+)))
SPLIT SEARCH
Get table name:
(?'TABLE_NAME'(?<=\bTABLE\s)(\w+))
Get primary key:
(?'PRIMARY_KEY'(?<=PRIMARY\sKEY\s\()(\w+))
Get column name: This one is a little sloppy and will only capture column names that are lowercase. Since your text didn't contain any tab characters, this was the best I could do, but it's a bit risky.
(?'COLUMN_NAME'(?<=^\s\s)([[:lower:]]\w+))
You can run them on regex101 and try them out.
Be aware that the regex depends on whichever regex engine you are using. Engines differ in what they support, and some regexes might need to be translated for your engine. For example, lookbehind is not supported by all engines.

How can I create an IR in Oracle APEX based on different tables and columns?

IR based on PL/SQL Function Body returning SQL Query.
How can I create an Interactive Report based on multiple tables with different column names?
Example:
A select list item returns one of three values:
1 or 2 or 3
The function returns a query based on the select list value.
When the value equals 1, return
Select name, state, country_id from cities
When the value equals 2, return
Select country, id from country
When the value equals 3, return
Select ocean, oc_id from oceans
The three queries return different column names and values.
OK, firstly, your question is poorly written. But from what I gather, you want an SQL query that returns different things based on an input.
I don't think you even need a PL/SQL function body for this.
Simply do something like this:
SELECT * FROM
(SELECT name as name,
state as state,
country_id as id,
1 as value
FROM cities
UNION ALL
SELECT country as name,
NULL as state,
id as id,
2 as value
FROM country
UNION ALL
SELECT ocean as name,
NULL as state,
oc_id as id,
3 as value
FROM oceans)
WHERE value = :input_parameter_value;
If you are trying to display a variable number of columns and constantly change their names, you are going to have a bad time. It can be done, as can everything, but as far as I know it's not exactly simple.
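To see the UNION ALL approach behave as described, here is a small sketch using Python's built-in sqlite3 module rather than Oracle. The table contents are made up for illustration, and the bind variable name simply mirrors the one in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data; only the table and column names come from the question.
conn.executescript("""
CREATE TABLE cities (name TEXT, state TEXT, country_id INTEGER);
CREATE TABLE country (country TEXT, id INTEGER);
CREATE TABLE oceans (ocean TEXT, oc_id INTEGER);
INSERT INTO cities VALUES ('Austin', 'Texas', 1);
INSERT INTO country VALUES ('Canada', 2);
INSERT INTO oceans VALUES ('Pacific', 3);
""")

# Every branch exposes the same three columns plus a discriminator column.
QUERY = """
SELECT name, state, id FROM (
    SELECT name AS name, state AS state, country_id AS id, 1 AS value FROM cities
    UNION ALL
    SELECT country, NULL, id, 2 FROM country
    UNION ALL
    SELECT ocean, NULL, oc_id, 3 FROM oceans
) WHERE value = :input_parameter_value
"""


def rows_for(choice):
    """Return the rows of the branch selected by the select-list value."""
    return conn.execute(QUERY, {"input_parameter_value": choice}).fetchall()


for choice in (1, 2, 3):
    print(choice, rows_for(choice))
```

The column names stay fixed (name, state, id) whichever branch is selected, which is the trade-off of this approach: the report layout is stable, but the headings no longer match the original tables.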
No objections to what @TineO has said in their answer, I'd probably do it that way.
Though, yet another option: if your Apex version allows it, you can create three Interactive Report regions on the same page, each selecting values from its own table, keeping its own column labels.
Then create a server condition for each region; its type would be Function that returns a Boolean and look like
return :P1_LIST_ITEM = 1;
for the 1st region; = 2 for the 2nd and = 3 for the 3rd.
When you run the page, nothing would be displayed as P1_LIST_ITEM has no value. Once you set it, one of the conditions would be met and the appropriate region would be displayed.

Is it possible to sort a Cassandra Column Family by a specific column of a list of a user-defined datatype?

I'm having a hard time understanding Cassandra. I couldn't write this question without making it look confusing, but as I detail it below it may become clearer.
Suppose I have this datatype that I've created:
CREATE TYPE transaction (
transaction_id UUID,
value float,
transaction_date timestamp,
PRIMARY KEY (transaction_id, transaction_date)
);
PS: I'm using it as if it was a 'class', but that might be a logical mistake of mine, please correct me if it can't be used as such.
Anyway, also I have this Column Family, in which I've created a list of this 'transaction' datatype:
CREATE TABLE transactions_history_by_date (
wallet_address UUID,
user_id UUID,
transactions list <transaction>,
PRIMARY KEY (wallet_address, transaction_date))
WITH CLUSTERING ORDER BY (transaction_date DESC);
So what I'd like to know is whether the Column Family above is correct. I'd like to get all the transactions of a wallet, sorted by the transaction date (but the date is a column of the 'transaction' datatype, and to complicate it even more, this Column Family holds a list of transactions, not just a single one).
No, in Cassandra you can sort only on the value of a clustering column. In this case you need to move transaction_date into the table itself...
To expand on Alex's answer, in your situation I think the best approach would probably be to denormalise your table. Rather than using a UDT, you could create something like this:
CREATE TABLE transactions_history_by_date (
wallet_address UUID,
user_id UUID,
transaction_id UUID,
value float,
transaction_date timestamp,
PRIMARY KEY ((wallet_address), transaction_date, transaction_id))
WITH CLUSTERING ORDER BY (transaction_date DESC);
Now you can make the following query and the results will be sorted by date:
SELECT * FROM transactions_history_by_date WHERE wallet_address = ...;
Note that I added transaction_id as a second clustering key. If this was omitted the table would not have been able to hold two transactions that had the same wallet_address and the same transaction_date. This is because unique rows are identified by the primary key.
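The effect of the extra clustering key can be imitated with plain Python dictionaries, since a write with an existing primary key replaces the existing row. This is only an analogy, not Cassandra code; the wallet, timestamp and amounts are made-up values.

```python
import uuid
from datetime import datetime

wallet = uuid.uuid4()
same_instant = datetime(2020, 1, 1, 12, 0, 0)
tx_a, tx_b = uuid.uuid4(), uuid.uuid4()

# Key = (wallet_address, transaction_date):
# the second write has the same key, so it replaces the first.
without_id = {}
without_id[(wallet, same_instant)] = {"transaction_id": tx_a, "value": 10.0}
without_id[(wallet, same_instant)] = {"transaction_id": tx_b, "value": 25.0}

# Key = (wallet_address, transaction_date, transaction_id):
# the keys differ, so both transactions are kept.
with_id = {}
with_id[(wallet, same_instant, tx_a)] = {"value": 10.0}
with_id[(wallet, same_instant, tx_b)] = {"value": 25.0}

print(len(without_id), len(with_id))  # 1 2
```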

How to store possible values of a variable in local macro?

I want to store the distinct values of a variable in my dataset in a local macro. I thought there might be a way using a command such as table and one of its r() results, but I could not find any command with a useful r() that returns what I want.
As an example, I would like to find an expression to substitute in the code below that gives me a local containing Domestic Foreign
sysuse auto
table foreign
local foreign_unique_values = r(...)
As suggested by William Lisowski in comments, levelsof does this.
In my example, the code would be:
sysuse auto
levelsof foreign
local foreign_distinct_values = r(levels)
or with a string variable:
levelsof make
local make_distinct_values = r(levels)

How to avoid CTE or subquery in SQL?

Question
Say we have 1 as foo, and we want foo+1 as bar in SQL.
With CTE or subquery, like:-
select foo+1 as bar from (select 1 as foo) as abc;
We would get (in PostgreSQL, which is what I am using):-
bar
-----
2
However, when I tried the following:-
select 1 as foo, foo+1 as bar;
The following error occurs:-
ERROR: column "foo" does not exist
LINE 1: select 1 as foo, foo+1 as bar;
^
Is there any way around this without the use of CTE or subquery?
Why do I ask?
I am using Django for a web service. To order and paginate objects in the database, I have to get the counts of upvotes and downvotes and do some extra mathematical manipulation on those two values (i.e. calculating the Wilson score interval), in which each value is used multiple times.
As far as I know, all I can work with right now is the extra() function, without breaking the ORM(?) [for example the lazy queryset and the prefetch_related() function].
Therefore I need a way to refer to those two values from somewhere instead of running a SELECT multiple times when I calculate the score. (Or is that not what actually happens anyway?)
PS. Currently I store the vote counts as database fields and update them, but I already have a vote model, so it seems redundant and slow to both update the vote count and insert the vote into the database.
No, you need the sub-query or CTE to do that. There is one alternative though: create a stored procedure.
CREATE FUNCTION wilson(upvote integer, downvote integer) RETURNS float8 AS $$
DECLARE
score float8;
BEGIN
-- Calculate the score
RETURN score;
END; $$ LANGUAGE plpgsql STRICT;
In your ORM you now call the function as part of your SELECT statement:
SELECT id, upvotes, downvotes, wilson(upvotes, downvotes) FROM mytable;
Also makes for cleaner code.
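The body of wilson() is left as a comment above. As a sketch of what that calculation might look like, here is the lower bound of the Wilson score interval written in Python. The function name, the default z = 1.96 (roughly a 95% interval) and the choice to return 0 when there are no votes are my assumptions, not part of the answer; the same arithmetic would need to be ported to PL/pgSQL to be used in the function.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote proportion."""
    n = upvotes + downvotes
    if n == 0:
        # Assumption: items without votes get the lowest possible score.
        return 0.0
    p_hat = upvotes / n
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


# Same upvote ratio, but more votes give a higher (more confident) score.
print(wilson_lower_bound(10, 1))
print(wilson_lower_bound(100, 10))
```

Note that upvotes and downvotes are each read several times inside the function, which is exactly the reuse that a plain SELECT list cannot express without a subquery or CTE.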