Extracting id number from a varchar in PostgreSQL - regex

I have a field that contains mixed data with an ID number that I want to extract into another column. The column I wish to extract from has some records that match the format 'lastname, firstname-ID'. I only want to strip out the 'ID' part, and only from those records that have a '-' followed by numbers.
So what I was trying to do was...
update data.xml_customerqueryrs
set new_id = regexp_replace(name, '[a-z]A-Z]', '')
where name like '%-%';
I know there is something minor that I need to fix, but I am not sure what, as the PostgreSQL documentation on pattern matching doesn't really do a good job of covering how to search for numerics only.

If you actually only want to strip the 'ID' part:
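new_id = regexp_replace(name, '-[0-9]+$', '')
Or, if you, in fact, want to extract the ID part:
new_id = substring(name, '-([0-9]+)$')
Anchoring the pattern to the end of the string with $ and allowing only digits after the hyphen means that only the trailing ID is matched, even where a name itself has a - in it. (A non-greedy -(.*?)$ would not achieve this: the match starts at the leftmost possible -, so it would take everything after the first hyphen.) Like: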
Skeet-Gravell,John-1234
String functions in current manual
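The difference between an anchored, digits-only pattern and a lazy one can be checked quickly with Python's re module, which for patterns this simple behaves like PostgreSQL's regex engine (both start matching at the leftmost possible position). This is a local sketch, not PostgreSQL itself; the helper names are hypothetical.

```python
import re

STRIP_ID = re.compile(r"-[0-9]+$")      # like regexp_replace(name, '-[0-9]+$', '')
EXTRACT_ID = re.compile(r"-([0-9]+)$")  # like substring(name, '-([0-9]+)$')
LAZY = re.compile(r"-(.*?)$")           # lazy pattern, kept for comparison

def strip_id(name):
    """Remove a trailing -<digits> suffix, if present."""
    return STRIP_ID.sub("", name)

def extract_id(name):
    """Return the trailing digits, or None (substring() returns NULL) if absent."""
    m = EXTRACT_ID.search(name)
    return m.group(1) if m else None

# The lazy pattern still starts at the FIRST hyphen, so it over-captures.
lazy_capture = LAZY.search("Skeet-Gravell,John-1234").group(1)
```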

Or, if the name part never contains a hyphen, you can also do:
new_id = substr(name, strpos(name, '-') + 1)
Ref: Strings
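Why the + 1 matters (strpos returns the position of the hyphen itself) can be seen by running the equivalent SQLite functions, instr and substr, which are 1-based like PostgreSQL's strpos and substr. This sketch runs in SQLite through Python's sqlite3 module, not in PostgreSQL; the helper name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def after_first_hyphen(name, offset):
    """Evaluate substr(name, instr(name, '-') + offset) in SQLite."""
    sql = "SELECT substr(?, instr(?, '-') + ?)"
    return conn.execute(sql, (name, name, offset)).fetchone()[0]

# offset 0 keeps the hyphen, offset 1 returns only what follows it.
# With no hyphen, instr returns 0 and the whole name comes back, so keep
# a WHERE filter such as name LIKE '%-%'.
```

Note that this approach always uses the first hyphen, so a name such as Skeet-Gravell,John-1234 returns Gravell,John-1234.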

Related

Regex for SQL slow query log analyzing

I am currently struggling with the following input:
# Time: 2022-06-01T20:00:00.000000Z
# User@Host: database[database] @ [10.10.10.10] Id: 8888888
# Query_time: 0.000450 Lock_time: 0.000160 Rows_sent: 1 Rows_examined: 2
SET timestamp=1654715324;
SELECT id
FROM table_name
WHERE field = 'some-data' AND another_field != 'random-stuff'
ORDER BY field_2;
All my input data will look similar to this. Basically I want to check how many times a certain query shows up. Right now I am a little stuck because my regex cannot filter out the parameters between the single quotes.
I would like to match the following:
SELECT id
FROM table_name
WHERE field = '' AND another_field != ''
ORDER BY field_2;
I've managed to extract the query from the input above with the following regex, but right now it only matches the exact SQL.
/(?<=\d;\n).+?(?=;)/gmi
I want to expand this regex so it will ignore anything between single quotes.
Help would be very much appreciated!
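One way to count "the same" query is to first cut out each statement and then empty every single-quoted literal before counting. A minimal Python sketch, assuming every statement follows a SET timestamp=...; line, ends at the first semicolon, and contains no escaped quotes or semicolons inside literals. Anchoring on SET timestamp (rather than on any digit followed by ;) avoids a false match after lines such as ORDER BY field_2;. The function name is hypothetical.

```python
import re
from collections import Counter

# Each statement starts on the line after "SET timestamp=<digits>;" and runs up
# to the next semicolon. Assumes no ';' appears inside a string literal.
STATEMENT = re.compile(r"^SET timestamp=\d+;\n([^;]*;)", re.MULTILINE)

# Empties every single-quoted literal but keeps the quotes: 'some-data' -> ''.
# Assumes literals contain no escaped quotes ('' or \').
LITERAL = re.compile(r"'[^']*'")

def count_queries(log_text):
    """Map each normalized statement to the number of times it occurs."""
    return Counter(LITERAL.sub("''", m.group(1)) for m in STATEMENT.finditer(log_text))
```

Two log entries that differ only in their literal values then collapse into a single key with a count of 2.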

How to add a string on a specific string by using regex_replace method in Oracle

I am trying to append the string '_$' to an index name and a table name, as follows. I need to use the REGEXP_REPLACE function in a SELECT statement.
select regexp_replace(input_string......)
# Input
CREATE UNIQUE INDEX "SCOTT"."PK_EMP" ON "SCOTT"."EMP" ("EMP_NO")
# Desired Output
CREATE UNIQUE INDEX "SCOTT"."PK_EMP_$" ON "SCOTT"."EMP_$" ("EMP_NO")
Can you help me to build a regular expression for that?
A fairly brute-force solution would be to use the following pattern:
(.*)(" ON ".*)(" \(.*)
with the following replacement string:
\1_$\2_$\3
The pattern works by splitting the input at the places where you need to insert the _$ token (marked with | below), and then joining it back together with the token placed at each split point:
CREATE UNIQUE INDEX "SCOTT"."PK_EMP|" ON "SCOTT"."EMP|" ("EMP_NO")
The full SELECT query would look like this:
SELECT REGEXP_REPLACE(
'CREATE UNIQUE INDEX "SCOTT"."PK_EMP" ON "SCOTT"."EMP" ("EMP_NO")',
'(.*)(" ON ".*)(" \(.*)',
'\1_$\2_$\3'
) RX
FROM dual;
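The split-and-rejoin pattern can be sanity-checked outside Oracle: for this input, Python's re.sub splits the string into the same three groups, and, as in Oracle, a $ in the replacement string is taken literally. The add_suffix name is hypothetical, and the pattern assumes the DDL contains a single " ON " and a single " (.

```python
import re

# Same three-group pattern as the Oracle answer: text before the closing quote
# of the index name, text up to the closing quote of the table name, and the rest.
SPLIT = re.compile(r'(.*)(" ON ".*)(" \(.*)')

def add_suffix(ddl):
    """Insert _$ after the index name and after the table name."""
    return SPLIT.sub(r"\1_$\2_$\3", ddl)
```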

Regex QueryString Parsing for a specific in BigQuery

Last week I began streaming my App Engine logs into BigQuery, and I am now trying to pull some data out of the log entries into a table.
The data in protoPayload.resource is the requested page, with the query-string parameters included.
The contents of protoPayload.resource looks like the following examples:
/service.html?device_ID=123456
/service.html?v=2&device_ID=78ec9b4a56
I am getting close, but when there is another parameter before device_ID, I don't get it. As you can see, I am not great with regex, but it is the only way I can think of to parse the data in the query. To get just the device ID from the first example, I was able to use the following query, which works great. My next challenge is to get the data when a second parameter exists. The device IDs can vary in length from about 10 to 26 characters.
SELECT
RIGHT(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'),
length(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'))-10) as Device_ID
FROM logs
What I would like is just the values from the querystring device_ID such as:
123456
78ec9b4a56
Assuming each record contains just one query string, with device_ID as its last parameter, you can do this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=(.*)$') as device_id FROM mytable
The part within the parentheses will be captured and returned in the result.
If device_ID isn't guaranteed to be the last parameter in the string, then use something like this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=([^\&]*)') as device_id FROM mytable
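BigQuery evaluates these patterns with RE2; for a pattern this simple Python's re module gives the same result, so the extraction can be tried locally on the sample paths. This sketch adds one assumption not in the answer: requiring ? or & before device_ID, so that a hypothetical parameter like old_device_ID= is not matched by accident.

```python
import re

# Capture everything after "device_ID=" up to the next "&" or the end of the string.
# The [?&] prefix is an extra assumption that the name starts a parameter.
DEVICE_ID = re.compile(r"[?&]device_ID=([^&]*)")

def device_id(resource):
    """Return the device_ID value, or None (REGEXP_EXTRACT yields NULL) if absent."""
    m = DEVICE_ID.search(resource)
    return m.group(1) if m else None
```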
Another approach is to split protoPayload.resource into multiple service entries and then apply the regex to each one; this way it supports an arbitrary number of device_ID values per record, e.g.:
select regexp_extract(service_entry, r'device_ID=(.*$)') from
(select split(protoPayload.resource, ' ') service_entry from
(select
'/service.html?device_ID=123456 /service.html?v=2&device_ID=78ec9b4a56'
as protoPayload.resource))
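The split-then-extract idea, sketched in Python on the same space-separated sample value. It stops each ID at the next &, which is a small assumption beyond the (.*$) used in the query above; the function name is hypothetical.

```python
import re

DEVICE_ID = re.compile(r"device_ID=([^&]*)")

def device_ids(resource):
    """Split a space-separated list of request paths and extract each device_ID."""
    ids = []
    for entry in resource.split(" "):
        m = DEVICE_ID.search(entry)
        if m:
            ids.append(m.group(1))
    return ids
```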

Matching number sequences in SQLite with random character separators

I have an SQLite database that contains number sequences with random separators. For example:
_id data
0 123-45/678>90
1 11*11-22-333
2 4-4-5-67891
I want to be able to query the database "intelligently", with and without the separators. For example, both of these queries should return _id=0:
SELECT _id FROM myTable WHERE data LIKE '%123-45%'
SELECT _id FROM myTable WHERE data LIKE '%12345%'
The 1st query works as is, but the 2nd query is the problem. Because the separators appear randomly in the database, there are too many combinations to loop through in the search term.
I could create two columns, one with separators and one without, running each query against each column, but the database is huge so I want to avoid this if possible.
Is there some way to structure the 2nd query to achieve this as is? Something like a regex applied to each row during the query? Pseudo code:
SELECT _id
FROM myTable
WHERE REPLACEALL(data,'(?<=\\d)[-/>*](?=\\d)','') LIKE '%12345%'
OK, this is far from nice, but you could simply nest the REPLACE function. Example:
SELECT _id FROM myTable
WHERE REPLACE(..... REPLACE(REPLACE(data,'-',''),'_',''), .... '<all other separators>','') = '12345'
When using this in practice (not that I would recommend it, but at least it's simple), you would probably want to wrap it inside a function.
EDIT: for a small doc on the REPLACE function, see here, for example.
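Wrapping the logic in a function can go one step further when the database is accessed from a host language. SQLite has no built-in regex replace, but Python's sqlite3 module can register a Python function with Connection.create_function, which lets the query use the regex from the question's pseudo-code. This is a swapped-in technique rather than part of the answer: strip_separators is a hypothetical name, the function exists only on that connection, and because it runs per row it cannot use an index.

```python
import re
import sqlite3

# Mirrors the question's pseudo-code: drop a -, /, > or * that sits between two digits.
SEPARATOR = re.compile(r"(?<=\d)[-/>*](?=\d)")

def strip_separators(value):
    return None if value is None else SEPARATOR.sub("", value)

conn = sqlite3.connect(":memory:")
# Registers strip_separators(text) as a one-argument SQL function on this connection.
conn.create_function("strip_separators", 1, strip_separators)
conn.execute("CREATE TABLE myTable (_id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)",
                 [(0, "123-45/678>90"), (1, "11*11-22-333"), (2, "4-4-5-67891")])

def find(term):
    """Match with or without separators by stripping both the column and the term."""
    sql = ("SELECT _id FROM myTable "
           "WHERE strip_separators(data) LIKE '%' || strip_separators(?) || '%'")
    return sorted(row[0] for row in conn.execute(sql, (term,)))
```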
If I understand correctly, is this what you want?
SELECT _id
FROM myTable
WHERE Replace(Replace(Replace(Replace(data, '-', ''), '/', ''), '>', ''), '*', '') LIKE '%12345%'
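The nested-REPLACE approach can be tried against the sample rows with Python's built-in sqlite3 module. This sketch assumes -, /, > and * are the only separators (the ones that occur in the question's data); any other separator needs one more REPLACE level. The search term is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (_id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)",
                 [(0, "123-45/678>90"), (1, "11*11-22-333"), (2, "4-4-5-67891")])

# One REPLACE per separator character found in the sample data.
QUERY = """
SELECT _id FROM myTable
WHERE REPLACE(REPLACE(REPLACE(REPLACE(data, '-', ''), '/', ''), '>', ''), '*', '')
      LIKE '%' || ? || '%'
"""

def find(digits):
    """Return the ids whose digits, with separators removed, contain `digits`."""
    return sorted(row[0] for row in conn.execute(QUERY, (digits,)))
```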

How to prepare a C++ string for sql query

I have to prepare strings to be suitable for queries, because these strings will be used in the queries as field values. If they contain a ' (or similar characters), the SQL query fails to execute.
I therefore want to replace ' with ''. I have seen code that finds and replaces a substring with another substring, but I guess the problem is a little trickier here: the replacement '' itself contains single quotes, so when searching for the next occurrence the code would encounter a ' that was intentionally inserted.
I am using the SQLite C API, and an example query might look like this:
select * from persons where name = 'John D'oe'
Since John D'oe contains a ', the query fails, so I want every occurrence of ' in the name to be replaced with ''.
How do you prepare field values for use in SQL queries? Maybe it's a basic thing, but I am not very experienced in C/C++.
Any help would be appreciated.
Use queries with arguments instead of replacing stuff, which could lead to several problems (like SQL injection vulnerabilities).
MySQL example:
sql::Connection *con = ...;
string query = "SELECT * FROM TABLE WHERE ID = ?";
sql::PreparedStatement *prep_stmt = con->prepareStatement(query);
prep_stmt->setInt(1, 1); // Bind the first ? placeholder to the value 1
prep_stmt->execute();
This will execute SELECT * FROM TABLE WHERE ID = 1.
EDIT: more info for SQLite prepared statements here and here.
It depends on the SQL library you are using. Some of them have the concept of a PreparedStatement, in which you use question marks in place of the variables; when you then set those variables on the statement, the library internally ensures that SQL commands cannot be injected.
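For the SQLite case in the question, a minimal sketch of parameter binding follows. It uses Python's sqlite3 module only because that module wraps the same SQLite engine and is easy to run; in the C API the corresponding calls are sqlite3_prepare_v2 followed by sqlite3_bind_text. The persons table is a hypothetical stand-in for the asker's schema. For comparison, it also shows the quote doubling the asker asked about: a single left-to-right replace pass never re-scans the quotes it has just inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT)")

# Preferred: the ? placeholder is bound by SQLite, so the quote needs no escaping.
conn.execute("INSERT INTO persons (name) VALUES (?)", ("John D'oe",))
rows = conn.execute("SELECT name FROM persons WHERE name = ?", ("John D'oe",)).fetchall()

# For comparison only: doubling quotes by hand. str.replace makes one pass,
# so the quotes it inserts are not found again.
escaped = "John D'oe".replace("'", "''")
literal_query = f"SELECT name FROM persons WHERE name = '{escaped}'"
```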