How to use CASE and SUBSTRING_INDEX in doctrine query? - doctrine-orm

I have a SQL query that runs successfully on the MySQL server and returns the expected output.
However, I have not been able to convert it to Doctrine (DQL) format.
The query is below:
SELECT (CASE WHEN seqnum < 10 THEN domain ELSE 'Others' END) as domain,
SUM(c)
FROM (SELECT SUBSTRING_INDEX(SUBSTR(email, INSTR(email, '@') + 1), '.', 1) as domain,
COUNT(*) as C,
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM newsletter_recipient
WHERE LENGTH(email) > 0
GROUP BY domain
) d
GROUP BY (CASE WHEN seqnum < 10 THEN domain ELSE 'Others' END)
ORDER BY SUM(c) DESC;
When I use this in Doctrine, it gives errors like
Expected known function, got 'SUBSTRING_INDEX'
I hope someone can help me convert this query to Doctrine format.

You either need to implement vendor-specific functions so that DQL can translate them into the proper SQL calls, or, in this simple case, a combination of built-in function calls might be enough:
SUBSTRING(email, LOCATE('@', email) + 1, ...
For a full list of available cross-platform functions, see the docs.
One more thing worth mentioning: the domain name without the TLD may itself contain dots. For example, you might be looking for mail.example in info@mail.example.org. That depends on your requirements, though.
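To make the intent of the original MySQL query concrete, here is a minimal Python sketch of the same logic: take the first dot-separated label after the separator, count rows per label, keep the top entries and fold the rest into 'Others'. The function names `domain_label` and `top_domains` are made up for illustration, the separator is a parameter (I am assuming '@'), and this is not Doctrine code, only a reference for checking results.

```python
from collections import Counter


def domain_label(email, sep="@"):
    """First dot-separated label after `sep`.

    Mirrors SUBSTRING_INDEX(SUBSTR(email, INSTR(email, sep) + 1), '.', 1).
    If `sep` is missing, the whole string is used, as INSTR() returns 0.
    """
    rest = email.split(sep, 1)[1] if sep in email else email
    return rest.split(".", 1)[0]


def top_domains(emails, keep=9, sep="@"):
    """Count labels, keep the `keep` most frequent, sum the rest as 'Others'.

    keep=9 corresponds to `seqnum < 10` in the query.
    """
    counts = Counter(domain_label(e, sep) for e in emails if e)
    ranked = counts.most_common()          # highest count first
    result = ranked[:keep]
    others = sum(n for _, n in ranked[keep:])
    if others:
        result.append(("Others", others))
    # Same as ORDER BY SUM(c) DESC
    return sorted(result, key=lambda item: item[1], reverse=True)
```

Note that `domain_label("info@mail.example.org")` gives `mail`, not `mail.example`, which is exactly the caveat raised above.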

Related

MYSQL get substring

I'm trying to get a substring dynamically and group by it. So if my uri column contains records like /uri1/uri2 and /somelongword/someotherlongword, I would like to get everything up to the second delimiter, namely up to the second /, and count it. I'm using this query, but obviously it cuts the string statically (6 letters after the first one).
SELECT substr(uri, 1, 6) as URI,
COUNT(*) as COUNTER
FROM staging
GROUP BY substr(uri, 1, 6)
ORDER BY COUNTER DESC
How can I achieve that?
You can use a combination of SUBSTRING() and POSITION().
schema:
CREATE TABLE Table1
(`uri` varchar(10))
;
INSERT INTO Table1
(`uri`)
VALUES
('some/text'),
('some/text1'),
('some/text2'),
('aa/bb'),
('aa/cc'),
('bb/cc')
;
query
SELECT
SUBSTRING(uri,1,POSITION('/' IN uri)-1),
COUNT(*)
FROM Table1
GROUP BY SUBSTRING(uri,1,POSITION('/' IN uri)-1);
http://sqlfiddle.com/#!9/293dd3/3/0
Edit: here is the Amazon Athena documentation: https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html and here is the string function documentation: https://prestodb.io/docs/0.217/functions/string.html
My answer above still stands, but you might need to change SUBSTRING to SUBSTR.
Edit 2: it seems Amazon Athena has a dedicated function for this, called SPLIT_PART().
query:
SELECT SPLIT_PART(uri, '/', 1), COUNT(*) FROM tbl GROUP BY SPLIT_PART(uri, '/', 1)
from docs:
split_part(string, delimiter, index) → varchar
Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than the number of fields, then null is returned.
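For anyone who wants to check what SPLIT_PART() should return before running it in Athena, here is a small Python sketch of the semantics quoted from the docs. The helper name `split_part` simply mirrors the SQL function; raising an error for an index below 1 is my assumption and is not taken from the quoted text.

```python
from collections import Counter


def split_part(string, delimiter, index):
    """Return the 1-based field `index` of `string` split on `delimiter`.

    Returns None when the index is larger than the number of fields,
    matching the documented "null is returned" behaviour.
    """
    if index < 1:
        # Assumption: treat non-positive indexes as an error.
        raise ValueError("index must be greater than zero")
    fields = string.split(delimiter)
    return fields[index - 1] if index <= len(fields) else None


# Same sample data as the Table1 schema above.
uris = ["some/text", "some/text1", "some/text2", "aa/bb", "aa/cc", "bb/cc"]
counts = Counter(split_part(u, "/", 1) for u in uris)
print(counts)
```

One thing to watch: when the value starts with the delimiter, as in /uri1/uri2 from the question, field 1 is the empty string and the first real segment is field 2.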

Remove duplicate substring from a string in oracle

I have strings like below in my table
2001,2452,2452,2421,2421,2495
2001,2483,2421,2421,2482
2001,2420,2421,2421,2425
2001,2420,2421,2421,2422
2001,2452,2452,2421,2421,2464
I want to remove the repeated numbers like 2452 and 2421 and show them only once in the data like
2001,2452,2421,2495
2001,2483,2421,2482
2001,2420,2421,2425
2001,2420,2421,2422
2001,2452,2421,2464
Has anyone done something like this? Please let me know how to solve this.
Thanks!
In Oracle SQL, you can use a hierarchical query and listagg as follows:
select str, listagg(str_distinct, ',') within group (order by 1) as distinct_str from
(select distinct str, regexp_substr(str,'[^,]+',1,column_value) str_distinct from cte
cross join table(
cast(multiset(
select level lvl
from dual
connect by level <= regexp_count(str, '[^,]+'))
as sys.odcivarchar2list)
) lvls)
group by str;
db<>fiddle for one of the input strings.
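If you want to verify the expected output outside the database, the transformation itself is small. This is a Python sketch only (the name `dedupe_csv` is hypothetical) and not a replacement for the Oracle query; it keeps the first occurrence of each value in its original position, which is what the expected output in the question shows.

```python
def dedupe_csv(value, sep=","):
    """Drop repeated items from a delimited string, keeping first occurrences.

    dict.fromkeys() preserves insertion order, so the remaining items stay
    in the order in which they first appeared.
    """
    return sep.join(dict.fromkeys(value.split(sep)))


rows = [
    "2001,2452,2452,2421,2421,2495",
    "2001,2483,2421,2421,2482",
    "2001,2420,2421,2421,2425",
]
for row in rows:
    print(dedupe_csv(row))
```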

why double colon will not work in case statement

I want to tag host names that contain '::' as 'Cloud' and everything else as 'Not cloud'.
I tried using the LIKE operator, but it's not working: my result tags all the host names as 'Not cloud'.
select a.department, count(host_name),
(CASE
WHEN host_name like '%::%' THEN 'Cloud'
ELSE 'Not cloud'
END) as cloud_instance
from
table a
Expected output:
If I have this expression '::' in my host name then it should appear as cloud.
Your current query does not make sense, because it calls COUNT(), an aggregate function, alongside individual row-level columns without a GROUP BY clause. I suspect that this is what you are trying to do:
SELECT
a.department,
COUNT(a.host_name) AS dept_cnt,
COUNT(CASE WHEN a.host_name LIKE '%::%' THEN 1 END) AS cloud_cnt,
COUNT(CASE WHEN a.host_name NOT LIKE '%::%' THEN 1 END) AS no_cloud_cnt
FROM yourTable a
GROUP BY
a.department;
Here we aggregate by department, and for each department return the total count along with the cloud and non-cloud counts.
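To sanity-check the numbers, the same conditional counting can be sketched in Python. The function name `cloud_counts` and the sample rows are invented for illustration; the dictionary keys follow the column aliases in the query above.

```python
from collections import defaultdict


def cloud_counts(rows):
    """rows: iterable of (department, host_name) tuples.

    Mirrors the grouped query: one entry per department with the total,
    cloud and non-cloud counts. NULL host names (None) are not counted,
    just as COUNT(column) ignores NULLs.
    """
    totals = defaultdict(
        lambda: {"dept_cnt": 0, "cloud_cnt": 0, "no_cloud_cnt": 0}
    )
    for department, host_name in rows:
        entry = totals[department]  # the department appears even if all hosts are NULL
        if host_name is None:
            continue
        entry["dept_cnt"] += 1
        if "::" in host_name:       # same test as LIKE '%::%'
            entry["cloud_cnt"] += 1
        else:
            entry["no_cloud_cnt"] += 1
    return dict(totals)


sample = [("it", "web::01"), ("it", "db01"), ("hr", "hr::x"), ("hr", None)]
print(cloud_counts(sample))
```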

Special character to query from latest timestamp sharded table in BigQuery

From
https://cloud.google.com/bigquery/docs/partitioned-tables:
you can shard tables using a time-based naming approach such as [PREFIX]_YYYYMMDD
This enables me to do:
SELECT count(*) FROM `xxx.xxx.xxx_*`
and query across all the shards. Is there a special notation that queries only the latest shard? For example say I had:
xxx_20180726
xxx_20180801
could I do something along the lines of
SELECT count(*) FROM `xxx.xxx.xxx_{{ latest }}`
to query xxx_20180801?
SINGLE QUERY INSPIRED BY Mikhail Berlyant:
SELECT count(*) as c FROM `XXX.PREFIX_*` WHERE _TABLE_SUFFIX IN ( SELECT
SUBSTR(MAX(table_id), LENGTH('PREFIX_') + 1)
FROM
`XXX.__TABLES_SUMMARY__`
WHERE
table_id LIKE 'PREFIX_%')
If you do care about cost (meaning how many tables will be scanned by your query), the only way is to do it in two steps, as below.
First query
#standardSQL
SELECT SUBSTR(MAX(table_id), LENGTH('PREFIX_') + 1)
FROM `xxx.xxx.__TABLES_SUMMARY__`
WHERE table_id LIKE 'PREFIX_%'
Second Query
#standardSQL
SELECT COUNT(*)
FROM `xxx.xxx.PREFIX_*`
WHERE _TABLE_SUFFIX = '<result of first query>'
So, if the result of the first query is 20180801, the second query will look like this:
#standardSQL
SELECT COUNT(*)
FROM `xxx.xxx.PREFIX_*`
WHERE _TABLE_SUFFIX = '20180801'
If you don't care about cost and just need the result, you can easily combine the two queries above into one. But remember: even though the result comes from the last table only, the cost will be as if you queried all tables that match xxx.xxx.PREFIX_*
I forgot to mention (even though it should be obvious): when you have only COUNT(1) in your SELECT, the cost will be 0 (zero) for both options. In reality, though, you will most likely select something more valuable than just COUNT(1).
I know this is an old thread, but I was surprised that no one has offered an answer using variables.
"Héctor Neri" already mentioned this in the comments, but I thought it would be better to have an actual answer with sample code.
#standardSQL
DECLARE SHARD_DATE STRING;
SET SHARD_DATE=(
SELECT MAX(REPLACE(table_name,'{TABLE}_',''))
FROM `{PRJ}.{DATASET}.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE '{TABLE}_20%'
);
SELECT * FROM `{PRJ}.{DATASET}.{TABLE}_*`
WHERE _TABLE_SUFFIX = SHARD_DATE
Make sure to replace {PRJ}, {DATASET}, and {TABLE} values with your table location.
If you run this on BigQuery Web UI, you will see this message:
WARNING: Could not compute bytes processed estimate for script.
But after running the script you can see that the variable properly reduces the table scan to the latest shard and does not cause any extra cost.
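All of the approaches above rest on the same step: list the table names, strip the prefix, and take the largest remaining suffix. Here is that step as a plain Python sketch, with no BigQuery client involved. The name `latest_suffix` is hypothetical, and I am assuming the suffixes are YYYYMMDD strings, so the lexicographic maximum is also the latest date.

```python
def latest_suffix(table_ids, prefix):
    """Return the largest suffix among tables named `prefix` + suffix.

    Assumes suffixes are fixed-width YYYYMMDD strings, so comparing them
    as text orders them by date. Returns None if no table matches.
    """
    suffixes = [t[len(prefix):] for t in table_ids if t.startswith(prefix)]
    return max(suffixes) if suffixes else None


tables = ["xxx_20180726", "xxx_20180801", "other_20190101"]
print(latest_suffix(tables, "xxx_"))  # 20180801
```

The value returned here is what goes into the `_TABLE_SUFFIX = '...'` filter of the second query.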

REGEX help needed in Oracle

How to get all the table names from the below Sql? My sql returns only the last table name.
with t as
(select 'select col1,
(select max(col3) from dd3) max_timestamp
from dd1,
dd2
where dd1.col1 = dd2.col1
and dd1.col1 in(select col1 from dd4)' sql_text from dual)
select regexp_substr(regexp_substr(upper(sql_text), '\sFROM\s*(\w|\.|_)*'), '(\w|_|\.)+', 1,2)
from t
Thanks,
DD.
This is more of a regex question than an Oracle question.
If you can run the sql through REPLACE(REPLACE(sql,CHR(13),' '),CHR(10),NULL) to replace all newlines with a space, so that the query fits on a single line, here is regex that will return all the tables in group 1 (for the ones after FROM) and group 3 for subsequent items in a list:
/FROM ([A-Z0-9$#_]+)(,[\s]*([A-Z0-9$#_]+))*/gi
Having multiple groups is not ideal, so I would look at the full match instead, see https://regex101.com/r/OZUalH/1/ for an example (see full match on the right, where every match has from followed by one or more tables).
But let me warn you this is not going to be robust, as these valid FROM clause expressions are not handled:
"my_table"
MY_TABLE AS A
MY_TABLE AS "a"
etc...
If it were me, I would write a function that runs the query through explain plan (execute immediate 'explain plan for ...') and extracts the tables from the plan table (or possibly uses SYS.DBMS_XPLAN).
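To try the full-match idea outside Oracle, here is a rough Python sketch using a slightly adapted version of the pattern above (a single capture group for the whole comma-separated list). The names `TABLE_LIST` and `table_names` are made up for illustration, and the same limitations apply: quoted identifiers and aliases are not handled.

```python
import re

# FROM followed by one or more comma-separated identifiers.
TABLE_LIST = re.compile(
    r"\bFROM\s+([A-Z0-9$#_]+(?:\s*,\s*[A-Z0-9$#_]+)*)",
    re.IGNORECASE,
)


def table_names(sql_text):
    """Return table names found after each FROM, in order of appearance."""
    flat = re.sub(r"\s+", " ", sql_text)  # collapse line breaks into spaces
    names = []
    for match in TABLE_LIST.finditer(flat):
        names.extend(name.strip() for name in match.group(1).split(","))
    return names


sql = """select col1,
(select max(col3) from dd3) max_timestamp
from dd1,
dd2
where dd1.col1 = dd2.col1
and dd1.col1 in(select col1 from dd4)"""
print(table_names(sql))  # ['dd3', 'dd1', 'dd2', 'dd4']
```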