Stata equivalent of SQL's OVER(PARTITION BY id)

I'm a SQL and R user who is just starting to use Stata. I'm trying to find the Stata equivalent of the following SQL:
SQL code:
MAX(VAR_A) OVER(PARTITION BY ID) AS MAX_VAR_A
I have tried the foreach command (perhaps the wrong tool for the task), but with no luck.

Related

Query for listing Datasets and Number of tables in Bigquery

I'd like to write a query that shows all the datasets in a project and the number of tables in each one. My problem is with the number of tables.
Here is what I'm stuck with:
SELECT
smt.catalog_name as `Project`,
smt.schema_name as `DataSet`,
( SELECT
COUNT(*)
FROM ***DataSet***.INFORMATION_SCHEMA.TABLES
) as `nbTable`,
smt.creation_time,
smt.location
FROM
INFORMATION_SCHEMA.SCHEMATA smt
ORDER BY DataSet
The view INFORMATION_SCHEMA.SCHEMATA lists all the datasets in the project where the query is executed, and the view INFORMATION_SCHEMA.TABLES lists all the tables in a given dataset.
The problem is that INFORMATION_SCHEMA.TABLES needs the dataset specified in order to return the table information, like this: dataset.INFORMATION_SCHEMA.TABLES
So I need to replace ***DataSet*** with the value returned by the query itself (smt.schema_name).
I am not sure whether this can be done with a subquery, and I don't know how to go about it.
I hope I'm clear enough. Thanks in advance if you can help.

You can do this with BigQuery's procedural language (scripting), as follows:
CREATE TEMP TABLE table_counts (dataset_id STRING, table_count INT64);
FOR record IN
(
SELECT
catalog_name as project_id,
schema_name as dataset_id
FROM `elzagales.INFORMATION_SCHEMA.SCHEMATA`
)
DO
EXECUTE IMMEDIATE
CONCAT("INSERT table_counts (dataset_id, table_count) SELECT table_schema as dataset_id, count(table_name) from ", record.dataset_id,".INFORMATION_SCHEMA.TABLES GROUP BY dataset_id");
END FOR;
SELECT * FROM table_counts;
This returns one row per dataset that contains tables, giving the dataset ID and its table count.

Referencing a macro variable created by a prompt SAS EG

I created a prompt in SAS EG that takes a text input and creates the macro variable called 'variableName'.
I am trying to reference this macro variable like so:
proc sql;
create table MyTable as
select * from Source_Table as a
where a.field = &variableName ;
This gives me an error that says: "Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant, a missing value, BTRIM, INPUT, PUT, SUBSTRING, USER."
I have also tried enclosing &variableName in single and double quotes but when I do that I just don't get any results.
I am able to reference the prompt when I use Query Builder and filter data based on the prompt, but I am trying to use the prompt's value in calculated expressions and in queries I write without Query Builder. How can I reference the variable I created in the prompt?
Edit: code with a value that the macro variable would have
proc sql;
create table MyTable as
select * from Source_Table as a
where a.field = 'NAME OF PERSON';
When I run that, I get the results I want.

It needs to resolve to valid SAS code. Assuming &variableName is a string, then it would be something like:
proc sql;
create table MyTable as
select * from Source_Table as a
where a.field = "&variableName." ;
If this isn't working, please show a query that does work with the same value as the macro variable would have. And then we can suggest how to change your code.
Edit: based on your comment, you do not have the prompt connected to your query. Right-click the query and link the prompt to it; the prompt will then run before the query and supply the value.

Teiid execute immediate gives a parsing error when executing long queries

I'm using virtual procedures to expose a REST API with Teiid. In my virtual procedure I am using execute immediate to run SQL queries that take the procedure's input parameters as filters in the where clause (a dynamic where clause). This works fine for small select queries, but when the query exceeds a certain length it gives a parsing error.
Is there any solution for this problem?
Is there any alternative way of implementing dynamic where clauses in my SQL query?
Let's assume that the following query has around 4000 characters. This works fine.
CREATE VIRTUAL PROCEDURE GetVals(IN filters string) RETURNS (json clob) OPTIONS (UPDATECOUNT 0, "REST:METHOD" 'GET', "REST:URI" 'get_vals')
AS
BEGIN execute immediate
'SELECT JSONOBJECT(JSONARRAY_AGG(JSONOBJECT(
col1,
col2,
col3,
col4,
col5,
col6,
....
....
)) as "data"
) as json FROM(
SELECT SUM((CASE
WHEN ((CASE
.....
....
.....
FROM ex_table AS ex
JOIN table1
ON ...
.....
WHERE a=b AND ' || filters || '
GROUP BY col)
) AB';
END
But as soon as I add more lines to the above SQL query, it gives a parsing error reporting an arbitrary line. There is nothing wrong with the syntax of my query. The only change I make is to the length of the query, by adding more lines to it (e.g. if I select one more column in my SELECT statement, it gives a parsing error). This happens only when I am using execute immediate to execute queries.

What version of Teiid are you using? And what is your parsing exception?
If it is due to truncation, then you'll need to use a 9.1 or later release, which allows for longer evaluated sql strings - https://issues.jboss.org/browse/TEIID-4376

Date ranges in where clause of a proc SQL statement

There is a large table containing, among other fields, the following:
ID, effective_date, expiration_date.
expiration_date has the datetime20. format and can be NULL.
I'm trying to extract rows that expire after Dec 31, 2014 or do not expire (NULL).
Adding the following where clause to the proc sql query gives me no results:
where coalesce(datepart(expiration_date),input('31/Dec/2020',date11.))
> input('31/Dec/2014',date11.);
However, when I only select NULL expiration dates and add the following fields:
put(coalesce(datepart(expiration_date),input('31/Dec/2020',date11.)),date11.) as value,
put(input('31/Dec/2014',date11.),date11.) as threshold,
case when coalesce(datepart(expiration_date),input('31/Dec/2020',date11.)) > input('31/Dec/2014',date11.)
then 'pass' else 'fail' end as tag
It shows 'pass' under TAG and all the other fields are correct.
This is an effort to duplicate what I used in SQL Server
where isnull(expiration_date,'9999-12-31') > '2014-12-31'
I'm using SAS Enterprise Guide 7.1, and while trying to figure it out I've been using
proc sql inobs=100;
What am I doing wrong? Thank you.
Some Expiration Dates:
30OCT2015:00:00:00
30OCT2015:00:00:00
29OCT2015:00:00:00
30OCT2015:00:00:00

I would recommend using a date constant ("31DEC2014"d) rather than date functions, or else either use explicit passthrough or disable implicit passthrough. Date functions are challenging when going between databases and so avoiding them when possible is best.

mysql.connector, multi=True, sql variable assignment not working

SQL code (all in one file that is eventually saved in the python variable "query"):
select @dtmax:=DATE_FORMAT(max(dt), '%Y%m') from table_A;
delete from table_B where DATE_FORMAT(dt, '%Y%m')=@dtmax;
Does mysql-connector allow the use of variable assignment as I've done in the query above, i.e. take the value of max(dt) from table_A and delete everything with that date from table_B?
python code:
c = conn.cursor(buffered=True)
c.execute(query, multi=True)
conn.commit()
conn.close()
All I know is that the second SQL statement doesn't execute.
I can copy and paste the SQL code into Toad and run it there without any problems, but not through the mysql.connector library. I would have used pandas, but this is a legacy script written by someone else and I don't have time to rewrite everything.
I would appreciate some help.

When you use multi=True, then execute() will return a generator. You need to iterate over that generator to actually advance the processing to the next sql statement in your multi-statement query:
c = conn.cursor(buffered=True)
results = c.execute(query, multi=True)
for cur in results:
    print('cursor:', cur)
    if cur.with_rows:
        print('result:', cur.fetchall())
conn.commit()
conn.close()
cur.with_rows will be True if there are results to fetch for the current statement.
This is all described in the documentation of MySQLCursor.execute()
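The lazy behaviour described above can be illustrated without a database. The sketch below is only a toy: run_statements is a hypothetical stand-in for the generator that execute(..., multi=True) returns, and it does not reproduce mysql.connector internals. It shows that work placed inside a generator happens only when the generator is iterated.

```python
def run_statements(statements, log):
    """Hypothetical stand-in for the generator returned by execute(..., multi=True)."""
    for sql in statements:
        # The "execution" of each statement happens only when the generator advances.
        log.append(sql)
        yield sql


executed = []
results = run_statements(["SELECT 1", "DELETE FROM table_B"], executed)

# Creating the generator has not run anything yet.
ran_before_iteration = list(executed)

# Iterating drives every statement, just like the for loop in the answer.
for _ in results:
    pass
ran_after_iteration = list(executed)
```

Under these assumptions, ran_before_iteration is empty and ran_after_iteration holds both statements, which mirrors why the loop over results is needed before conn.commit().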
