why double colon will not work in case statement - amazon-athena

I want host names containing '::' (this character sequence) to be tagged as 'Cloud' and everything else as 'Not cloud'.
I tried using the LIKE operator, but it isn't working; my result tags all the host names as 'Not cloud'.
select a.department, count(host_name),
(CASE
WHEN host_name like '%::%' THEN 'Cloud'
ELSE 'Not cloud'
END) as cloud_instance
from
table a
Expected output:
If a host name contains the expression '::', then it should appear as 'Cloud'.

Your current query does not make sense, because it invokes the COUNT() aggregate function alongside individual row-level columns without a GROUP BY. I suspect that this is what you are trying to do:
SELECT
a.department,
COUNT(a.host_name) AS dept_cnt,
COUNT(CASE WHEN a.host_name LIKE '%::%' THEN 1 END) AS cloud_cnt,
COUNT(CASE WHEN a.host_name NOT LIKE '%::%' THEN 1 END) AS no_cloud_cnt
FROM yourTable a
GROUP BY
a.department;
Here we aggregate by department and, for each department, report the total count along with the cloud and non-cloud counts.
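If you instead want the 'Cloud' / 'Not cloud' label to stay as its own output column (one row per department and tag), a sketch of that variant, assuming the same yourTable name and columns as above, would be:
SELECT
a.department,
(CASE WHEN a.host_name LIKE '%::%' THEN 'Cloud' ELSE 'Not cloud' END) AS cloud_instance,
COUNT(a.host_name) AS host_cnt
FROM yourTable a
GROUP BY
a.department,
(CASE WHEN a.host_name LIKE '%::%' THEN 'Cloud' ELSE 'Not cloud' END);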

Related

redshift - Not able to apply listagg function

I am getting error when trying to use listagg function.
Query
select
a.user_name,
listagg(a.group_name::text)
within group (order by a.group_name) as group_name
from (
SELECT
usename as user_name,
groname as group_name
FROM
pg_user
join
pg_group
on
pg_user.usesysid = ANY(pg_group.grolist) AND
pg_group.groname in (SELECT DISTINCT pg_group.groname from pg_group)
)a
group by user_name
Error
[Code: 500310, SQL State: XX000] Amazon Invalid operation: One or more of the used functions must be applied on at least one user created tables. Examples of user table only functions are LISTAGG, MEDIAN, PERCENTILE_CONT, etc;
None of the values are null.
Just like there are some functions that can only be run on the leader node, there are some that can only be run on compute nodes - listagg() is one of these. If you need to run listagg() on leader-node data there are a few approaches you can use (sorry, I'm not on a cluster right now so cannot test these directly - I saw your question was aging and thought I'd get you started; grain of salt, as I also cannot directly observe your issue, but I think I know what is going on):
1. You can use a cursor to save the data from the leader node and use this as the source for listagg(). A stored procedure can streamline this. There are examples of this on Stack Overflow.
2. You can make a temp table out of the leader node data and use this in listagg(), but I expect you will need to exit (UNLOAD) and re-enter (COPY) the cluster to do this.
There just isn't a direct path from leader-node-only results to the compute nodes without some kind of push-up like this. It's a consequence of the large networked cluster architecture of Redshift.
UPDATE
I got some cluster time and there are several unexpected issues with this one. The key ones are that grolist is an array type that isn't generally supported cluster-wide, and that pg_group has to be used as the source. So this is going to require #1 AND #2 from above.
The process goes like this:
1. Define a cursor to hold the result of the pg_user / pg_group join select statement.
2. Move the cursor results to a temp table.
3. Use the temp table as the source for the outer (listagg()) select.
A stored procedure can be written to do #1 and #2, which streamlines things. So you end up with the following SQL:
CREATE OR REPLACE procedure make_user_group()
AS
$$
DECLARE
row record;
BEGIN
-- temp table that will hold the leader-node results so compute-node listagg() can read them
create temp table user_group (user_name varchar(256),group_name varchar(256));
-- copy each row of the leader-node-only catalog join into the temp table
for row in SELECT
usename::text as user_name,
groname::text as group_name
FROM
pg_user
join
pg_group
on
pg_user.usesysid = ANY(pg_group.grolist) AND
pg_group.groname in (SELECT DISTINCT pg_group.groname from pg_group)
LOOP
INSERT INTO user_group(user_name,group_name) VALUES (row.user_name,row.group_name);
END LOOP;
END;
$$ LANGUAGE plpgsql;
call make_user_group();
select
user_name,
listagg(group_name::text, ', ')
within group (order by group_name) as group_name
from user_group
group by user_name;
The stored procedure only needs to be created once, but it must be called every time the temp table needs to be created.
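One practical note (my own assumption, not something the answer above covers): if the procedure is called a second time in the same session, the create temp table statement will fail because user_group already exists, so you may want the procedure to drop it first:
-- hypothetical guard, not part of the original procedure: recreate the temp table on each call
drop table if exists user_group;
create temp table user_group (user_name varchar(256), group_name varchar(256));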

How to use CASE and SUBSTRING_INDEX in doctrine query?

I have a SQL query which runs successfully on the MySQL server and gives the output I want.
But I am not able to convert this query into Doctrine format.
Query is as below
SELECT (CASE WHEN seqnum < 10 THEN domain ELSE 'Others' END) as domain,
SUM(c)
FROM (SELECT SUBSTRING_INDEX(SUBSTR(email, INSTR(email, '#') + 1), '.', 1) as domain,
COUNT(*) as C,
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM newsletter_recipient
WHERE LENGTH(email) > 0
GROUP BY domain
) d
GROUP BY (CASE WHEN seqnum < 10 THEN domain ELSE 'Others' END)
ORDER BY SUM(c) DESC;
When I use this in Doctrine it gives errors like
Expected known function, got 'SUBSTRING_INDEX'
I hope someone can help me convert this query into Doctrine format.
You either need to implement vendor-specific functions so that DQL can translate it into proper SQL calls, or in this simple case a combination of built-in function calls might be enough:
SUBSTRING(email, LOCATE('#', email) + 1, ...
For a full list of available cross-platform functions, see the docs.
Also, I cannot resist mentioning that the domain name without the TLD might contain dots; for example, you might be looking for mail.example in info#mail.example.org. Up to your specification, though.
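As a rough, untested sketch in DQL (NewsletterRecipient and its email field are assumed entity/field names based on the table above; the '#' separator is copied from the original query), the built-in SUBSTRING, LOCATE and LENGTH functions can at least pull out everything after the '#':
SELECT SUBSTRING(r.email, LOCATE('#', r.email) + 1) AS domain_with_tld
FROM NewsletterRecipient r
WHERE LENGTH(r.email) > 0
Cutting that down to the part before the first dot (what SUBSTRING_INDEX did) can be done with a nested LOCATE as SUBSTRING's third argument, or simply in PHP after hydration.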

REGEX help needed in Oracle

How do I get all the table names from the SQL below? My SQL returns only the last table name.
with t as
(select 'select col1,
(select max(col3) from dd3) max_timestamp
from dd1,
dd2
where dd1.col1 = dd2.col1
and dd1.col1 in(select col1 from dd4)' sql_text from dual)
select regexp_substr(regexp_substr(upper(sql_text), '\sFROM\s*(\w|\.|_)*'), '(\w|_|\.)+', 1,2)
from t
Thanks,
DD.
This is more of a regex question than an Oracle question.
If you can run the SQL through REPLACE(REPLACE(sql,CHR(13),' '),CHR(10),NULL) to replace all newlines with a space, so that the query fits on a single line, here is a regex that will return all the tables in group 1 (for the ones directly after FROM) and in group 3 for subsequent items in a comma list:
/FROM ([A-Z0-9$#_]+)(,[\s]*([A-Z0-9$#_]+))*/gi
Having multiple groups is not ideal, so I would look at the full match instead, see https://regex101.com/r/OZUalH/1/ for an example (see full match on the right, where every match has from followed by one or more tables).
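As a rough, untested sketch of iterating the matches in Oracle itself (this assumes 11g or later for the sub-expression argument of REGEXP_SUBSTR, and it only grabs the first identifier after each FROM, so comma lists still need the grouped pattern above):
select regexp_substr(upper(sql_text), 'FROM\s+([A-Z0-9$#_]+)', 1, level, null, 1) as table_name
from t
connect by level <= regexp_count(upper(sql_text), 'FROM\s+[A-Z0-9$#_]+');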
But let me warn you this is not going to be robust, as these valid FROM clause expressions are not handled:
"my_table"
MY_TABLE AS A
MY_TABLE AS "a"
etc...
If it were me, I would write a function to run the query through EXPLAIN PLAN (execute immediate 'explain plan for ...') and extract the tables from the plan tables (or possibly using SYS.DBMS_XPLAN).

PL/SQL regexp_like filters

I want to delete some tables and wrote this procedure:
set serveroutput on
declare
type namearray is table of varchar2(50);
total integer;
name namearray;
begin
--select statement here ..., please see below
total := name.count;
dbms_output.put_line(total);
for i in 1 .. total loop
dbms_output.put_line(name(i));
-- execute immediate 'drop table ' || name(i) || ' purge';
end loop;
end;
/
The idea is to drop all tables with table name having pattern like this:
ERROR_REPORT[2 digit][3 Capital characters][10 digits]
example: ERROR_REPORT16MAY2014122748
However, I am not able to come up with the correct regexp. Below are my select statements and results:
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9{2}A-Z{3}0-9{10}]');
The results included all the table names I needed plus ERROR_REPORT311AUG20111111111. This should not be showing up in the result.
The following select statement showed the same result, which meant the A-Z{3} had no effect on the regexp.
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9{2}0-9{10}]');
My question is what would be the correct regexp, and what's wrong with mine?
Thanks,
Alex
The correct regex is:
'^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}'
I think this regex should work:
^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}
However, please check the regex101 link. I've assumed that you need 2 digits after ERROR_REPORT but your example name shows 3.
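Applied back to the original statement, a sketch with the corrected pattern looks like this (the trailing $ anchor is my own addition, to require the name to end right after the 10 digits):
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}$');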

How to tweak LISTAGG to support more than 4000 character in select query?

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production.
I have a table in the below format.
Name Department
Johny Dep1
Jacky Dep2
Ramu Dep1
I need an output in the below format.
Dep1 - Johny,Ramu
Dep2 - Jacky
I have tried the 'LISTAGG' function, but there is a hard limit of 4000 characters. Since my db table is huge, this cannot be used in the app. The other option is to use the
SELECT CAST(COLLECT(Name)
But my framework allows me to execute only select queries and no PL/SQL scripts. Hence I don't see any way to create a type using the "CREATE TYPE" command, which is required for the COLLECT approach.
Is there any alternate way to achieve the above result using a select query?
You should add GetClobVal and also RTRIM, since the delimiter is returned at the end of the result:
SELECT RTRIM(XMLAGG(XMLELEMENT(E,colname,',').EXTRACT('//text()')
ORDER BY colname).GetClobVal(),',') from tablename;
If you can't create types (can't you just use SQL*Plus to create one as a one-off?), but you're OK with COLLECT, then use a built-in array type. There are several knocking around in the RDBMS. Run this query:
select owner, type_name, coll_type, elem_type_name, upper_bound, length
from all_coll_types
where elem_type_name = 'VARCHAR2';
e.g. on my db, I can use sys.DBMSOUTPUT_LINESARRAY which is a varray of considerable size.
select department,
cast(collect(name) as sys.DBMSOUTPUT_LINESARRAY)
from emp
group by department;
A derivative of #anuu_online's answer, but this handles unescaping the XML in the result:
dbms_xmlgen.convert(xmlagg(xmlelement(E, name||',')).extract('//text()').getclobval(),1)
For IBM DB2, casting the result to a VARCHAR(10000) will allow more than 4000 characters.
select column1, listagg(CAST(column2 AS VARCHAR(10000)), x'0A') AS "Concat column"...
I ended up with another approach, using the XMLAGG function, which doesn't have the hard limit of 4000:
select department,
XMLAGG(XMLELEMENT(E,name||',')).EXTRACT('//text()')
from emp
group by department;
You can use:
SELECT department
, REGEXP_REPLACE(XMLCAST(XMLAGG(XMLELEMENT(x, name, ',')) AS CLOB), ',$')
FROM emp
GROUP BY department
It will return a CLOB that has no size limit and correctly handles XML entity escapes and separators.
Instead of REGEXP_REPLACE(..., ',$') you can use RTRIM(..., ','), which should be faster, but it will remove all separators from the end of the result (including commas that appear at the end of a name, or earlier separators if the last names are empty).
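For reference, a sketch of that RTRIM variant, using the same emp table and columns as the query above:
SELECT department
, RTRIM(XMLCAST(XMLAGG(XMLELEMENT(x, name, ',')) AS CLOB), ',')
FROM emp
GROUP BY department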