Matching number sequences in SQLite with random character separators - regex

I have an sqlite database which has number sequences with random separators. For example
_id data
0 123-45/678>90
1 11*11-22-333
2 4-4-5-67891
I want to be able to query the database "intelligently" with and without the separators. For example, both these queries returning _id=0
SELECT _id FROM myTable WHERE data LIKE '%123-45%'
SELECT _id FROM myTable WHERE data LIKE '%12345%'
The 1st query works as is, but the 2nd query is the problem. Because the separators appear randomly in the database there are too many combinations to loop through in the search term.
I could create two columns, one with separators and one without, running each query against each column, but the database is huge so I want to avoid this if possible.
Is there some way to structure the 2nd query to achieve this as is ? Something like a regex on each row during the query ? Pseudo code
SELECT _id
FROM myTable
WHERE REPLACEALL(data,'(?<=\\d)[-/>*](?=\\d)','') LIKE '%12345%'

Ok this is far from being nice, but you could straightforwardly nest the REPLACE function. Example:
SELECT _id FROM myTable
WHERE REPLACE(..... REPLACE(REPLACE(data,'-',''),'_',''), .... '<all other separators>','') = '12345'
When using this in practice (--not that I would recommend it, but at least its simple), you surely might wrap it inside a function.
EDIT: for a small doc on the REPLACE function, see here, for example.

If I get it right, is this what you want?
SELECT _id
FROM myTable
WHERE Replace(Replace(Replace(data, '?', ''), '/', ''), '-', '') LIKE '%12345%'

Related

MYSQL get substring

I'm trying to get substring dynamically and group by it. So if my uri column contains records like: /uri1/uri2 and /somelongword/someotherlongword I would like to get everything up to second delimiter, namely up to second / and count it. I'm using this query but obviously it is cutting string statically (6 letters after the first one).
SELECT substr(uri, 1, 6) as URI,
COUNT(*) as COUNTER
FROM staging
GROUP BY substr(uri, 1, 6)
ORDER BY COUNTER DESC
How can I achieve that?
You can use combination of SUBSTRING() and POSITION()
schema:
CREATE TABLE Table1
(`uri` varchar(10))
;
INSERT INTO Table1
(`uri`)
VALUES
('some/text'),
('some/text1'),
('some/text2'),
('aa/bb'),
('aa/cc'),
('bb/cc')
;
query
SELECT
SUBSTRING(uri,1,POSITION('/' IN uri)-1),
COUNT(*)
FROM Table1
GROUP BY SUBSTRING(uri,1,POSITION('/' IN uri)-1);
http://sqlfiddle.com/#!9/293dd3/3/0
edit: here I found amazon athena documentation: https://docs.aws.amazon.com/athena/latest/ug/presto-functions.html and here is the string function documentation: https://prestodb.io/docs/0.217/functions/string.html
my answer above still stands, but you might need to change SUBSTRING to SUBSTR
edit 2: it seems there's a special function to achieve this in amazon athena called SPLIT_PART()
query:
SELECT SPLIT_PART(uri, '/', 1), COUNT(*) FROM tbl GROUP BY SPLIT_PART(uri, '/', 1)
from docs:
split_part(string, delimiter, index) → varchar
Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than than the number of fields, then null is returned.

How to Extract Numeric Range from Text in SQL

I am fairly new to SQL and I'm trying to extract some data from an ORACLE DB. My goal is to return rows where the query value lies in the range speicified by the "AA_SYNTAX" column in a table.
For example:
If the "AA_SYNTAX" column is 'p.G12_V14insR' and my search value is 13, I want to return that row. This column is organized as p.%number_%number%. So basically I want to return the two numerical values from this column and see if my query value is between them.
I have all the tables I need joined together and everything, just not sure how to construct a query like this. I know in regex I would do something like "\d+" but im not sure how to translate this into SQL.
Thanks
Using Oracle, you can use Regular Expressions to extract a number from the string.
More specifically, I would look into REGEXP_SUBSTR.
Using the date given in your example above, you could use:
with cte as
(
select 'p.G12_V14insR' as AA_SYNTAX from dual
)
select
REGEXP_SUBSTR(AA_SYNTAX,'p\.[[:alpha:]]+([[:digit:]]+)', 1, 1, NULL, 1) as Bottom
,REGEXP_SUBSTR(AA_SYNTAX,'\_[[:alpha:]]+([[:digit:]]+)', 1, 1, NULL, 1) as Top
from cte
I'm sure you could clean up the Regular Expression quite a bit, but, given this, you get the value of 14 for Top and 12 for Bottom.
Hope this helps move you in the right direction.

RegEx in the select for DB2

I have a table with one column having a large json object in the format below. The column datatype is VARCHAR
column1
--------
{"key":"value",....}
I'm interested in the first value of the column data
in regex I can do it by .*?:(.*),.* with group(1) giving me the value
How can i use it in the select query
Don't do that, it's bad database design. Shred the keys and values to their own table as columns, or use the XML data type. XML would work fine because you can index the structure well, and you can use XPATH queries on the data. XPATH supports regexp natively.
You can use regular expression with xQuery, you just need to call the function matches from a SQL query or a FLORW query.
This is an example of how to use regular expressions from SQL:
db2 "with val as (
select t.text
from texts t
where xmlcast(xmlquery('fn:matches(\$TEXT,''^[A-Za-z 0-9]*$'')') as integer) = 0
)
select * from val"
For more information:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.xml.doc/doc/xqrfnmat.html
http://angocadb2.blogspot.fr/2014/04/regular-expressions-in-db2.html
DB2 doesn't have any built in regex functionality, unfortunately. I did find an article about how to add this with libraries:
http://www.ibm.com/developerworks/data/library/techarticle/0301stolze/0301stolze.html
Without regexes, this operation would be a mess. You could make a function that goes through the string character by character to find the first value. Or, if you will need to do more than this one operation, you could make a procedure that parses the json and throws it into a table of keys/values. Neither one sounds fun, though.
In DB2 for z/OS you will have to pass the variable to XMLQUERY with the PASSING option
db2 "with val as (
select t.text
from texts t
where xmlcast(xmlquery('fn:matches($TEXT,''^[A-Za-z 0-9]*$'')'
PASSING t.text as "TEXT") as integer) = 0
)
select * from val"

Splitting column values with Sybase and ColdFusion

I need to trim the date from a text string in the function call of my app.
The string comes out as text//date and I would like to trim or replace the date with blank space. The column name is overall_model and the value is ford//1911 or chevy//2011, but I need the date part removed so I can loop over the array or list to get an accurate count of all the models.
The problem is that if there is a chevy//2011 and a chevy//2010 I return two rows in my table because of the date. So if I can remove the date and loop over them I can get my results of chevy there are 23 chevy models.
I have not used Sybase in a while, but I remember its string functions are very similar to MS SQL Server.
If overall_model always contains "//", use charindex to return the position of the delimiter and substring to retrieve the "text" before it. Then combine it with a COUNT. (If the "//" is not always present, you will need to add a CASE statement as well).
SELECT SUBSTRING(overall_model, 1, CHARINDEX('/', overall_model)-1) AS Model
, COUNT(*) AS NumberOfRecords
FROM YourTable
GROUP BY SUBSTRING(overall_model, 1, CHARINDEX('/', overall_model)-1)
However, ideally the "text" and "date" should be stored separately. That would offer greater flexibility and generally better performance.

Extracting id number from a varchar in PostgreSQL

I have a field that contains mixed data with an id number that I want extract to another column. The column I wish to extract from has some records that match the format 'lastname, firstname-ID'. I only want to strip the 'ID' part, and from those columns who have a '-' and numbers following it.
So what I was trying to do was...
update data.xml_customerqueryrs
set new_id = regexp_replace(name, '[a-z]A-Z]', '')
where name like '%-%';
I know there is something minor that I need to fix, but I am not sure as the postgresql documentation for pattern matching doesn't really do a good job covering searching for only numerics.
If you actually only want to strip the 'ID' part:
new_id = regexp_replace(name, '-.*?$', '')
Or, if you, in fact, want to extract the ID part:
new_id = substring(name, '-(.*?)$')
I use the *? quantifier, so that only the last part is extracted, where a name has a - in it. Like:
Skeet-Gravell,John-1234
String functions in current manual
Or you can also do:
new_id = substr(name, strpos(name, '-'), length(name))
Ref: Strings