MYSQL get substring - amazon-athena

I'm trying to extract a substring dynamically and group by it. If my uri column contains values like /uri1/uri2 and /somelongword/someotherlongword, I would like to get everything up to the second delimiter (the second /) and count the rows per prefix. I'm using this query, but it cuts the string at a fixed length (the first 6 characters).
SELECT substr(uri, 1, 6) as URI,
COUNT(*) as COUNTER
FROM staging
GROUP BY substr(uri, 1, 6)
ORDER BY COUNTER DESC
How can I achieve that?

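You can use a combination of SUBSTRING() and POSITION(). Note that the query below returns the part before the first /, which fits sample data without a leading slash.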
schema:
CREATE TABLE Table1
(`uri` varchar(10))
;
INSERT INTO Table1
(`uri`)
VALUES
('some/text'),
('some/text1'),
('some/text2'),
('aa/bb'),
('aa/cc'),
('bb/cc')
;
query:
SELECT
SUBSTRING(uri,1,POSITION('/' IN uri)-1),
COUNT(*)
FROM Table1
GROUP BY SUBSTRING(uri,1,POSITION('/' IN uri)-1);
http://sqlfiddle.com/#!9/293dd3/3/0
Edit: I found the Amazon Athena documentation at https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html and the Presto string function documentation at https://prestodb.io/docs/0.217/functions/string.html
My answer above still stands, but you might need to change SUBSTRING to SUBSTR.
Edit 2: it seems Amazon Athena has a dedicated function for this, SPLIT_PART().
query:
SELECT SPLIT_PART(uri, '/', 1), COUNT(*) FROM tbl GROUP BY SPLIT_PART(uri, '/', 1)
from docs:
split_part(string, delimiter, index) → varchar
Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than the number of fields, then null is returned.
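Since the URIs in the question start with /, field 1 of split_part would presumably be the empty string for every row, and the first path segment would be field 2. A minimal sketch for Athena, assuming every uri has a leading / and reusing the staging table from the question (the uri_prefix alias is made up for illustration):

```sql
-- Sketch, not tested against Athena. Assumes every uri starts with '/'.
-- '/uri1/uri2' split on '/' gives '', 'uri1', 'uri2',
-- so field 2 is the first path segment.
SELECT '/' || split_part(uri, '/', 2) AS uri_prefix,
       COUNT(*) AS counter
FROM staging
GROUP BY '/' || split_part(uri, '/', 2)
ORDER BY counter DESC;
```

With the two example values from the question, this should produce the groups /uri1 and /somelongword.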

Related

Remove duplicate substring from a string in oracle

I have strings like below in my table
2001,2452,2452,2421,2421,2495
2001,2483,2421,2421,2482
2001,2420,2421,2421,2425
2001,2420,2421,2421,2422
2001,2452,2452,2421,2421,2464
I want to remove the repeated numbers like 2452 and 2421 and show them only once in the data like
2001,2452,2421,2495
2001,2483,2421,2482
2001,2420,2421,2425
2001,2420,2421,2422
2001,2452,2421,2464
Has anyone done something like this? Please let me know how to solve this.
Thanks!
In Oracle SQL, you can use a hierarchical query and LISTAGG as follows:
select str, listagg(str_distinct, ',') within group (order by 1) as distinct_str from
(select distinct str, regexp_substr(str,'[^,]+',1,column_value) str_distinct from cte
cross join table(
cast(multiset(
select level lvl
from dual
connect by level <= regexp_count(str, '[^,]+'))
as sys.odcivarchar2list)
) lvls)
group by str;
db<>fiddle for one of the input strings.
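One thing to watch in the query above: inside WITHIN GROUP, ORDER BY 1 sorts by the constant 1 rather than by a column position, so the order of the values in the result is not guaranteed. If the original order matters, as in the expected output, a variation that remembers where each value first appears might look like this. It is only a sketch: the inline cte rows stand in for the real table, and the names token, pos and first_pos are made up for illustration.

```sql
-- Sketch only: the two inline rows stand in for the real table.
with cte as (
  select '2001,2452,2452,2421,2421,2495' as str from dual
  union all
  select '2001,2483,2421,2421,2482' from dual
)
select str,
       -- order the distinct tokens by the position of their first occurrence
       listagg(token, ',') within group (order by first_pos) as distinct_str
from (
  select str, token, min(pos) as first_pos
  from (
    select c.str,
           l.column_value as pos,
           regexp_substr(c.str, '[^,]+', 1, l.column_value) as token
    from cte c
    cross join table(
      cast(multiset(
        select level
        from dual
        connect by level <= regexp_count(c.str, '[^,]+')
      ) as sys.odcinumberlist)
    ) l
  )
  group by str, token
)
group by str;
```

The inner GROUP BY collapses duplicates per input string, and MIN(pos) keeps the slot of the first occurrence so LISTAGG can restore the original order.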

REGEX help needed in Oracle

How can I get all the table names from the SQL below? My query returns only the last table name.
with t as
(select 'select col1,
(select max(col3) from dd3) max_timestamp
from dd1,
dd2
where dd1.col1 = dd2.col1
and dd1.col1 in(select col1 from dd4)' sql_text from dual)
select regexp_substr(regexp_substr(upper(sql_text), '\sFROM\s*(\w|\.|_)*'), '(\w|_|\.)+', 1,2)
from t
Thanks,
DD.
This is more of a regex question than an Oracle question.
If you can run the SQL through REPLACE(REPLACE(sql,CHR(13),' '),CHR(10),NULL) to turn each CR/LF line break into a space, so that the query fits on a single line, here is a regex that will return all the tables, in group 1 for the one right after FROM and in group 3 for subsequent items in a list:
/FROM ([A-Z0-9$#_]+)(,[\s]*([A-Z0-9$#_]+))*/gi
Having multiple groups is not ideal, so I would look at the full match instead, see https://regex101.com/r/OZUalH/1/ for an example (see full match on the right, where every match has from followed by one or more tables).
But let me warn you this is not going to be robust, as these valid FROM clause expressions are not handled:
"my_table"
MY_TABLE AS A
MY_TABLE AS "a"
etc...
If it were me, I would write a function to run the query through explain plan (execute immediate 'explain plan for ...') and extract the tables from the plan table (or possibly use SYS.DBMS_XPLAN).
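The explain plan idea could be sketched roughly as follows. It assumes the default PLAN_TABLE is available and that dd1 to dd4 actually exist. The statement_id value 'find_tables' is an arbitrary label chosen for this example, and the select list is qualified as dd1.col1 here to avoid an ambiguous column.

```sql
-- Sketch only. Populate the plan table for the statement to inspect.
explain plan set statement_id = 'find_tables' for
select dd1.col1,
       (select max(col3) from dd3) max_timestamp
from dd1,
     dd2
where dd1.col1 = dd2.col1
and dd1.col1 in (select col1 from dd4);

-- List the tables that appear in the plan.
select distinct object_owner, object_name
from plan_table
where statement_id = 'find_tables'
and object_type like 'TABLE%';
```

This is not complete either: if the optimizer answers part of the query from an index alone, the plan lists the index rather than the table, so index rows may need to be mapped back to their tables.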

Compare column value against list of regex values stored in another table and update accordingly

I am new to Oracle programming.
I want to check the "msg" value of "Table1" against the "regex" values from "Table2".
If the regular expression matches as such, I want to update the respective "regex_id" in "Table1".
Usual query: SELECT 'match found' FROM DUAL WHERE REGEXP_LIKE('s 27', '^(s27|s 27)')
Table1
MSG    REG_EXID
Ss27   ?
s27    ?
s28    ?
s29    ?
Table2
REGEX         REG_EXID   RELEVANCE
^(s27|s 27)   1          10
^(s29|s 29)   2          2
^(m28|m 28)   3          2
^(s27|s 27)   4          100
Taking the newly added "relevance" into account, with Oracle 11g you could try something along these lines:
UPDATE Table1 T1
SET T1.reg_exID =
(SELECT DISTINCT
MAX(reg_exID) KEEP (DENSE_RANK FIRST ORDER BY relevance DESC) OVER (PARTITION BY regex)
FROM Table2
WHERE REGEXP_LIKE(T1.msg, regex)
)
;
See SQL Fiddle.
You could work along these lines:
UPDATE Table1
SET reg_exID = (SELECT reg_exID FROM Table2 WHERE REGEXP_LIKE(Table1.msg, regex));
Please keep in mind:
None of your current sample records will be updated, as regular expressions are case-sensitive.
The above UPDATE will fail if more than one regex matches.
You could rewrite the current regex expressions along the lines of "^m ?28".
See it in action: SQL Fiddle (With some data added to actually show the effect.)
Please comment if and as clarification/adjustment is required.
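Both caveats (case sensitivity and several matching patterns) could be handled in one statement. The following is a sketch under assumptions, not a tested solution: it passes the 'i' match parameter to REGEXP_LIKE for case-insensitive matching and uses the aggregate form of KEEP (DENSE_RANK FIRST ...) so the subquery always returns exactly one row. The aliases t1 and t2 are made up for illustration.

```sql
-- Sketch: pick the reg_exID of the matching pattern with the highest relevance.
UPDATE Table1 t1
SET t1.reg_exID = (
  SELECT MAX(t2.reg_exID) KEEP (DENSE_RANK FIRST ORDER BY t2.relevance DESC)
  FROM Table2 t2
  -- 'i' makes the match case-insensitive
  WHERE REGEXP_LIKE(t1.msg, t2.regex, 'i')
);
```

Rows without any matching pattern get NULL, because the aggregate over an empty set returns NULL. If two patterns tie on relevance, MAX picks the larger reg_exID.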

Matching number sequences in SQLite with random character separators

I have a SQLite database which has number sequences with random separators. For example
_id data
0 123-45/678>90
1 11*11-22-333
2 4-4-5-67891
I want to be able to query the database "intelligently" with and without the separators. For example, both these queries returning _id=0
SELECT _id FROM myTable WHERE data LIKE '%123-45%'
SELECT _id FROM myTable WHERE data LIKE '%12345%'
The 1st query works as is, but the 2nd query is the problem. Because the separators appear randomly in the database there are too many combinations to loop through in the search term.
I could create two columns, one with separators and one without, running each query against each column, but the database is huge so I want to avoid this if possible.
Is there some way to structure the 2nd query to achieve this as is? Something like a regex applied to each row during the query? Pseudo code:
SELECT _id
FROM myTable
WHERE REPLACEALL(data,'(?<=\\d)[-/>*](?=\\d)','') LIKE '%12345%'
Ok this is far from being nice, but you could straightforwardly nest the REPLACE function. Example:
SELECT _id FROM myTable
WHERE REPLACE(..... REPLACE(REPLACE(data,'-',''),'_',''), .... '<all other separators>','') LIKE '%12345%'
When using this in practice (not that I would recommend it, but at least it's simple), you might want to wrap it inside a function.
EDIT: for a small doc on the REPLACE function, see here, for example.
If I get it right, is this what you want?
SELECT _id
FROM myTable
WHERE Replace(Replace(Replace(data, '?', ''), '/', ''), '-', '') LIKE '%12345%'
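For the sample rows in the question, the separators are -, /, > and *. A minimal sketch that strips those four from both the column and the search term, so that one query covers searches with and without separators (the table setup just mirrors the question's sample data; the search term '123-45' is an example):

```sql
-- Sample rows from the question.
CREATE TABLE myTable (_id INTEGER PRIMARY KEY, data TEXT);
INSERT INTO myTable (_id, data) VALUES
  (0, '123-45/678>90'),
  (1, '11*11-22-333'),
  (2, '4-4-5-67891');

-- Strip the separators seen in the sample data ('-', '/', '>', '*')
-- from both the column and the search term before comparing.
SELECT _id
FROM myTable
WHERE REPLACE(REPLACE(REPLACE(REPLACE(data, '-', ''), '/', ''), '>', ''), '*', '')
      LIKE '%' || REPLACE(REPLACE(REPLACE(REPLACE('123-45', '-', ''), '/', ''), '>', ''), '*', '') || '%';
```

With this data the query should return _id 0 for both '123-45' and '12345' as the search term. It still has to scan every row, so on a huge table a precomputed separator-free column may end up being the faster option despite the extra storage.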

split columns by a delimiter in postgres

I have a large table key(keyid,data). In this table, data consists of text separated by /, e.g. x/y/z. I wish to extract the 2nd field (y in the example) for all the values stored in the data column of the table.
I tried these:
dblp1=# select regexp_split_to_array((select key from keytable),'/') as key_split;
ERROR: more than one row returned by a subquery used as an expression
dblp1=# SELECT split_part((select key from keytable), '/', 2);
ERROR: more than one row returned by a subquery used as an expression
Both work on a single string.
Pretty close. You need the function to be wrapped right around the column name, like so:
select split_part(key, '/', 2) from keytable;
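The array function from the question can be applied per row in the same way. A small self-contained sketch, where the inline VALUES rows stand in for the real keytable and the output column names are made up for illustration:

```sql
-- Sketch: both functions applied directly to the column, row by row.
SELECT key,
       split_part(key, '/', 2)              AS second_field,
       (regexp_split_to_array(key, '/'))[2] AS second_field_via_array
FROM (VALUES ('x/y/z'), ('a')) AS keytable(key);
```

The two differ when a value has no second field: split_part returns an empty string, while the array subscript returns NULL.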