Regex to find complete words at Postgresql - regex

I want to get only the records whose value in one column is one of a set of words. I tried WHERE ... IN (...), but Postgres compares strings case-sensitively in that clause.
This is why I tried a regex with the ~* operator.
The following SQL snippet returns all the columns of all the tables in the DB. I want to restrict the rows to only the tables named in the regex expression.
SELECT ordinal_position as COLUMN_ID, TABLE_NAME, COLUMN_NAME
FROM information_schema.columns
WHERE table_schema = 'public' and table_name ~* 'PRODUCTS|BALANCES|BALANCESBARCODEFORMATS|BALANCESEXPORTCATEGORIES|BALANCESEXPORTCATEGORIESSUB'
order by TABLE_NAME, COLUMN_ID
This regex returns the columns of BALANCES and also the columns of every table whose name contains the 'BALANCES' keyword.
I want to restrict the result to complete names only.

Using regexes, the common solution is using word boundaries before and after the current expression.
See effect without: http://regexr.com?35ecl
See effect with word boundaries: http://regexr.com?35eci
In PostgreSQL, word boundaries are denoted by \y (other popular regex engines, such as PCRE, C# and Java, use \b instead, hence its use in the regex demo above; thanks @IgorRomanchenko).
Thus, for your case, the expression below could be used (the matches are the same as the example regexes in the links above):
'\y(PRODUCTS|BALANCES|BALANCESBARCODEFORMATS|BALANCESEXPORTCATEGORIES|BALANCESEXPORTCATEGORIESSUB)\y'
See demo of this expression in use here:
http://sqlfiddle.com/#!12/9f597/1
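To see the difference outside the database, here is a minimal Python sketch. It assumes Python's re module, where the word boundary is written \b rather than PostgreSQL's \y. The helper name and the sample table names are made up for illustration.

```python
import re

NAMES = "PRODUCTS|BALANCES|BALANCESBARCODEFORMATS"

# Unanchored alternation: matches anywhere inside the name.
loose = re.compile(NAMES, re.IGNORECASE)
# Word boundaries on both sides (\b in Python, \y in PostgreSQL).
bounded = re.compile(r"\b(?:" + NAMES + r")\b", re.IGNORECASE)


def matching(pattern, table_names):
    """Return the names in which the pattern is found."""
    return [name for name in table_names if pattern.search(name)]


# Hypothetical table names, not taken from the question's schema.
tables = ["products", "balances", "balanceshistory", "old_balances"]
print(matching(loose, tables))
print(matching(bounded, tables))
```

Note that a word boundary is not the same as the start or end of the string: in this sketch a hypothetical name such as balances-archive still matches the bounded pattern, because the hyphen is not a word character.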

If you want to match only the whole table_name, use something like:
'^(PRODUCTS|BALANCES|BALANCESBARCODEFORMATS|BALANCESEXPORTCATEGORIES|BALANCESEXPORTCATEGORIESSUB)$'
^ matches at the beginning of the string.
$ matches at the end of the string.
Details here.
Alternatively you can use something like:
upper(table_name) IN ('PRODUCTS','BALANCES','BALANCESBARCODEFORMATS','BALANCESEXPORTCATEGORIES', ...)
to make IN case insensitive.
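Both suggestions can be compared side by side in a small Python sketch. It only imitates the two SQL conditions with Python's re module and a plain list lookup; the function names and test names are made up for illustration.

```python
import re

WANTED = ["PRODUCTS", "BALANCES", "BALANCESBARCODEFORMATS"]

# Anchored alternation, the same shape as '^(A|B|C)$' used with ~* in PostgreSQL.
anchored = re.compile("^(" + "|".join(WANTED) + ")$", re.IGNORECASE)


def by_regex(name):
    """True when the whole name is one of the wanted names, ignoring case."""
    return anchored.search(name) is not None


def by_upper_in(name):
    """Same idea as upper(table_name) IN (...)."""
    return name.upper() in WANTED


for name in ["balances", "Products", "balanceshistory", "balances-archive"]:
    print(name, by_regex(name), by_upper_in(name))
```

In this sketch the two checks agree on every name, including balances-archive, which a word-boundary pattern would still accept.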

Related

How to highlight SQL keywords using a regular expression?

I would like to highlight SQL keywords that occur within a string in a syntax highlighter. Here are the rules I would like to have:
Match the keywords SELECT and FROM (others will be added, but we'll start here). Must be all-caps
Must be contained in a string -- either starting with ' or "
The first word in that string (ignoring whitespace preceding it) should be one of the keywords.
This of course is not comprehensive (can ignore escapes within a string), but I'd like to start here.
Here are a few examples:
SELECT * FROM main -- will not match (not in a string)
"SELECT name FROM main" -- will match
"
SELECT name FROM main" -- will match
"""Here is a SQL statement:
SELECT * FROM main""" -- no, string does not start with a keyword (SELECT...).
The only way I could think of to do it in a single regex would be with a lookbehind... but then it would not be fixed width, as we don't know where the string starts. Something like:
(?<=["']\s*(SELECT)\s*)(SELECT|FROM)
But this of course won't work.
Would something like this be possible to do in a single regex?
A suitable regular expression is likely to get pretty complex, especially as the rules evolve further. As others have noted, it may be worth considering using a parser instead. That said, here is one possible regex attempting to cover the rules mentioned so far:
(["'])\s*(SELECT)(?:\s+.*)?\s+(FROM)(?:\s+.*)?\1(?:[^\w]|$)
Online Demos
Debuggex Demo
Regex101 Demo
Explanation
As can be seen in the above visualisation, the regex looks for either a double or single quote at the start (saved in capturing group #1) and then matches this reference at the end via \1. The SELECT and FROM keywords are captured in capturing groups #2 and #3. (The (?:x|y) syntax ensures there aren't more groups for other choices as ?: at the start of a choice excludes it as a capturing group.) There are some further optional details such as limiting what is allowed between the SELECT and FROM and not counting the final quotation mark if it is immediately succeeded by a word character.
Results
SELECT * FROM tbl -- no match - not in a string
"SELECT * FROM tbl" -- matches - in a double-quoted string
'SELECT * FROM tbl;' -- matches - in a single-quoted string
'SELECT * FROM it's -- no match - letter after end quote
"SELECT * FROM tbl' -- no match - quotation marks don't match
'SELECT * FROM tbl" -- no match - quotation marks don't match
"select * from tbl" -- no match - keywords not upper case
'Select * From tbl' -- no match - still not all upper case
"SELECT col1 FROM" -- matches - even though no table name
' SELECT col1 FROM ' -- matches - as above with more whitespace
'SELECT col1, col2 FROM' -- matches - with multiple columns
Possible Improvement?
It might also be necessary to exclude quotation marks from the "any character" parts. This can be done at the expense of increased complexity using the technique described here by replacing both instances of .* with (?:(?!\1).)*:
(["'])\s*(SELECT)(?:\s+(?:(?!\1).)*)?\s+(FROM)(?:\s+(?:(?!\1).)*)?\1(?:[^\w]|$)
See this Regex101 Demo.
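For anyone who wants to check the results list locally, here is a minimal Python sketch. It assumes Python's re module handles both expressions the same way as the engine in the demos; the keywords helper and the last sample line with two strings are my own additions.

```python
import re

# The two expressions from this answer, copied unchanged.
BASIC = re.compile(
    r"""(["'])\s*(SELECT)(?:\s+.*)?\s+(FROM)(?:\s+.*)?\1(?:[^\w]|$)"""
)
IMPROVED = re.compile(
    r"""(["'])\s*(SELECT)(?:\s+(?:(?!\1).)*)?\s+(FROM)(?:\s+(?:(?!\1).)*)?\1(?:[^\w]|$)"""
)


def keywords(pattern, line):
    """Return the captured (SELECT, FROM) pair, or None when nothing matches."""
    m = pattern.search(line)
    return (m.group(2), m.group(3)) if m else None


samples = [
    "SELECT * FROM tbl",
    '"SELECT * FROM tbl"',
    "'SELECT * FROM it's",
    '"select * from tbl"',
    '"SELECT col1 FROM"',
]
for line in samples:
    print(repr(line), keywords(BASIC, line))

# Two strings on one line: .* runs on to the last quote, (?!\1). does not.
two = '"SELECT a FROM b" + "x"'
print(repr(BASIC.search(two).group(0)))
print(repr(IMPROVED.search(two).group(0)))
```

The last two prints show why the improvement matters: the basic expression swallows everything up to the final quote on the line, while the improved one stops at the first closing quote.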
You could use capturing groups:
(.*["']\s*\K)(?(1)(SELECT|FROM).*(SELECT|FROM)|)
In this case $2 would refer to the first keyword and $3 would refer to the second keyword. This also only works if there are only two keywords and only one string on a line, which seems to be true in all of your examples, but if those restrictions don't work for you, let me know.
Just tested the regexp below:
If you need to add other commands, things may get a little tricky, because some keywords don't apply. E.g.: ALTER TABLE mytable or UPDATE SET col = val;. For these scenarios you will need to create subgroups, and the regexp may become slow.
Best regards!
If I understand your requirements correctly, I suggest this:
/^'\s*(SELECT)[^']*(FROM)[^']*'|^"\s*(SELECT)[^"]*(FROM)[^"]*"/m
[Regex Fiddle Demo]
Explanation:
When you need to match at the start of a string, use ^.
When you need to accept 0-n whitespace characters, use \s*.
When you need to handle multi-line input, use the m flag on your regex, so that ^ also matches at the start of each line.
When you need case-sensitive matching, don't use the i flag on your regex.
When you need to match a block delimited by a specific character such as ", use [^"]* instead of .* so that the match stops at the first closing delimiter.
When you need a block whose start and end characters must be the same, such as ' or ", use ' '|" " instead of ['"] ['"].
Update:
If you need to capture every keyword that appears after a SELECT at the start of your string, I can update my solution to this:
/^'\s*(SELECT)([^']*(SELECT|FROM))+|^"\s*(SELECT)([^"]*(SELECT|FROM))+/m
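Here is a minimal Python sketch of the first expression in this answer (the one before the update), assuming Python's re.MULTILINE behaves like the m flag used above. The pattern is split into its two alternatives only to keep the Python quoting readable; the keywords helper is my own.

```python
import re

# The single-quoted and double-quoted alternatives, joined with | as in the answer.
SINGLE = r"^'\s*(SELECT)[^']*(FROM)[^']*'"
DOUBLE = r'^"\s*(SELECT)[^"]*(FROM)[^"]*"'
PATTERN = re.compile(SINGLE + "|" + DOUBLE, re.MULTILINE)


def keywords(text):
    """Return the captured keywords, or None when the text does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Only the groups of the alternative that matched are filled in.
    return tuple(g for g in m.groups() if g is not None)


print(keywords('"SELECT name FROM main"'))
print(keywords('"\nSELECT name FROM main"'))
print(keywords('"""Here is a SQL statement:\nSELECT * FROM main"""'))
print(keywords('x = "SELECT a FROM b"'))
```

The last call returns None in this sketch: because of the ^ anchor, a string literal is only found when its opening quote is the first character on a line.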
Without parsing the quoted strings, this could be done using the \G and \K constructs:
(?:"\s*(?=(?:SELECT|FROM))|(?<!^)\G)[^"]*?\K(SELECT|FROM)
demo

RegEx works everywhere except in Pentaho RegEx Evaluation Step

I have a couple of RegEx that work on the online regex websites but not in Pentaho. Could you please help?
Here's the string:
:6585d0f0ba88767ac3b590f719596d864d73e9c1:
harmonicbalance/src/harmonicbalance/HarmonicBalanceFlowModel.cpp
harmonicbalance/src/harmonicbalance/HbFlutterModel.cpp
:8302994b565553c83a048b8905ae597349d99627:
emp/src/emp/PhasePairSingleParticleReynoldsNumber.h
emp/src/emp/TomiyamaDragCoefficientMethod.cpp
:9da194f17ec08bb20ad1be8df68b78ca137ab18a:
combustion/src/combustion/ReactingSpeciesTransportBasedModel.cpp
combustion/src/complexchemistry/TurbulentFlameClosure.cpp
:6a59f0be1e347a65e525e58742bb304639ea9bc4:
meshing/src/meshing/SurfaceMeshManipulation.cpp
physics/src/discretization/FvIndirectRegionInterfaceManager.cpp
physics/src/discretization/FvIndirectRegionInterfaceManager.h
physics/src/discretization/FvRepresentation.cpp
physics/src/discretization/FvRepresentation.h
:64b7f6d36b11b6cd94c20cad53463b7deef8c85a:
resourceclient/src/resourceclient/ResourcePool.cpp
resourceclient/src/resourceclient/ResourcePool.h
resourceclient/src/resourceclient/RestClient.cpp
resourceclient/src/resourceclient/RestClient.h
resourceclient/src/resourceclient/test/ResourcePoolTest.cpp
I would like to capture two groups. First group will extract all commit SHA1 and the other group would extract file names.
Below are the expressions I tried:
(?:^:([A-Za-z0-9]+):|(?!^)\G)\n+([A-Za-z/.-]+)
https://regex101.com/r/3IBkPz/1
^:(\w+):\s+((?:\s*(?!:)[^\s]+)+)
https://regex101.com/r/oIoDvM/1
Thoughts?
AFAIK (as of PDI-8.0), the Regex Evaluation step does NOT support the regex 'g' modifier: your regex pattern must cover all the text to be able to make a match.
For example: the following pattern will not match anything in Regex Evaluation step:
:([0-9a-f]+):\s+([^:]+)
but if I prepend .* to this pattern and pick "Enable dotall mode":
.*:([0-9a-f]+):\s+([^:]+)
it will match the last commit (sha1 + filenames). You can try moving .* to the end of the original pattern, which will get you the first commit. So if you want to retrieve the full list of commits (sha1 + filenames), as the g modifier would, this step is probably not a solution for you.
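The behaviour described above can be imitated in Python. This is only an analogy: re.fullmatch stands in for the step's whole-input matching, re.DOTALL for "Enable dotall mode", and re.findall for the g modifier. It is not PDI code, and the shortened sample text is made up.

```python
import re

# Shortened, made-up sample in the same layout as the question's string.
text = (
    ":aaa111:\n"
    "foo/a.cpp\n"
    "foo/b.cpp\n"
    ":bbb222:\n"
    "bar/c.h\n"
)
CORE = r":([0-9a-f]+):\s+([^:]+)"


def whole_input(pattern):
    """Match only if the pattern covers the entire text."""
    m = re.fullmatch(pattern, text, re.DOTALL)
    return (m.group(1), m.group(2).split()) if m else None


print(whole_input(CORE))         # pattern alone does not cover the text
print(whole_input(".*" + CORE))  # leading .* leaves the last commit
print(whole_input(CORE + ".*"))  # trailing .* leaves the first commit

# What a global search would give: every commit with its files.
print([(sha, files.split()) for sha, files in re.findall(CORE, text)])
```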
As the fields are basically split by colons ':' and new lines, you can probably try the following approach:
Use the Split field to rows step with Delimiter=':' and include the rownum in the output. The rownum can then be used to filter rows: even numbers are sha1 values and odd numbers are filenames.
Use the Analytic Query step to create a new field with LEAD = 1, so that the sha1 and its filenames end up in the same row.
Use the Calculator and Filter steps to calculate the remainder of rownum/2 and keep only the rows with an odd rownum.
Use Split field to rows again to split filenames into individual filename rows using "\n" (with "Delimiter is a Regular Expression" checked). You might want to filter out the EMPTY filenames, since the delimiter only supports one char.
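The idea behind these steps (split on colons, pair each sha1 with the block that follows it, then split the block on newlines) can be sketched in plain Python. This is a loose analogue, not PDI: the function name and sample text are made up, it assumes file paths never contain a colon, and whether the sha1 values fall on even or odd positions depends on where the counting starts.

```python
def commit_rows(text):
    """Return one (sha1, filename) row per file listed in the text."""
    parts = text.split(":")
    rows = []
    # parts[0] is the empty string before the first colon, so the sha1 values
    # sit at indexes 1, 3, 5, ... and each file block directly follows its sha1.
    for sha, block in zip(parts[1::2], parts[2::2]):
        for filename in block.split("\n"):
            if filename:  # drop the empty entries around the newlines
                rows.append((sha, filename))
    return rows


# Shortened, made-up sample in the same layout as the question's string.
sample = ":aaa111:\nfoo/a.cpp\nfoo/b.cpp\n:bbb222:\nbar/c.h\n"
for row in commit_rows(sample):
    print(row)
```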

Postgresql regular expression with semi colon

I want to use a regular expression to split string values from a field.
Here is an example to illustrate my question:
mydatabase=> SELECT regexp_replace('a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}',
'([^0-9]|^)([=.*])(?=;|$)', '\1 \2', 'g');
regexp_replace
------------------------------------------------------
a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}
(1 row)
But I want the result like below
mydatabase=>YOUR_ANSWER_QUERY
regexp_replace
------------------
a1=1
B2b=2,3,4
C3c={3,4,5;4,5,6}
D4d={4,5,6;7,8,9}
(4 rows)
You have semicolons within your brackets. To skip them, I have added (?![0-9]), a negative look-ahead that rejects a semicolon followed by a digit, and then split the values into table rows.
This should do it:
SELECT regexp_split_to_table( 'a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}', ',?[1-2]?(;(?![0-9]))');
I used online regex replace verifier at http://regexr.com
The regex to search for is ([^=]+)=(\{[^\}]+\}|[^;]+)(?:;|$)
and the regex to replace is $1=$2\r.
For your input string they give the required result.
Note that this verifier requires $ sign (+ number) to refer to a capturing group.
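The search regex from this answer can also be tried in Python. This is a minimal sketch assuming Python's re module; instead of replacing with $1=$2\r it collects the two capture groups with finditer. The function name is made up.

```python
import re

# The search regex from this answer, copied unchanged.
PAIR = re.compile(r"([^=]+)=(\{[^\}]+\}|[^;]+)(?:;|$)")


def split_pairs(value):
    """Return each name=value item as its own string."""
    return [m.group(1) + "=" + m.group(2) for m in PAIR.finditer(value)]


value = "a1=1,2;B2b=2,3,4;C3c={3,4,5;4,5,6};D4d={4,5,6;7,8,9}"
for item in split_pairs(value):
    print(item)
```

With this expression the first item keeps its full value, a1=1,2. The semicolons inside the braces are not treated as separators, because the braced alternative is tried first.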

REGEXP_LIKE in Oracle

I have a query which I was using in an Access database to match a field. The rows I wish to retrieve have a field which contains a sequence of characters in two possible forms (case-insensitive):
*PO12345, 5 digits preceded by *PO, or
PO12345, 5 digits preceded by PO.
In Access I achieved this with:
WHERE MyField LIKE '*PO#####*'
I have tried to replicate the query for use in an Oracle database:
WHERE REGEXP_LIKE(MyField, '/\*+PO[\d]{5}/i')
However, it doesn't return anything. I have tinkered with the Regex slightly, such as placing brackets around PO, but to no avail. To my knowledge what I have written is correct.
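Your regex \*+PO[\d]{5} is wrong. The + after \* requires at least one asterisk, but the asterisk is optional in your data, so it should be ? instead.
Written in the usual /pattern/flags notation, /\*?PO\d{5}/i solves the problem. In Oracle the slashes and the trailing i are not part of the pattern, so leave them out of the pattern string.
Pass i (case insensitive) as the match parameter instead, like this: REGEXP_LIKE (MyField, '^\*?PO\d{5}$', 'i'); The ^ and $ anchors require the whole field to consist of the code; drop them to find the code anywhere in the field, as the Access LIKE pattern did.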
Regex101 Demo
Read REGEXP_LIKE documentation.
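The difference between + and ? is easy to check in Python. This is a minimal sketch assuming Python's re module, with re.IGNORECASE standing in for Oracle's 'i' parameter and search standing in for an unanchored REGEXP_LIKE; the sample values are made up.

```python
import re

# \*+ needs at least one asterisk, \*? makes the asterisk optional.
original = re.compile(r"\*+PO[\d]{5}", re.IGNORECASE)
fixed = re.compile(r"\*?PO\d{5}", re.IGNORECASE)

# Made-up field values.
for value in ["*PO12345", "PO12345", "ref *po54321 rest", "PO1234"]:
    print(value, bool(original.search(value)), bool(fixed.search(value)))
```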

PHP - Regex for prepending table names within SQL

I am looking for an unobtrusive way to find and replace table names based on their position in an SQL query.
Example:
$query = 'SELECT t1.id, t1.name, t2.country FROM users AS t1, country AS t2 INNER JOIN another_table AS t3 ON t3.user_id = t1.id';
I essentially need to prepend client name abbreviations to table names and then have my CMS handle that change. So, going from 'users' to 'so_users' (If Stack Overflow was a client) but not have to add curly braces around all query table names like Drupal. An example is how WordPress will allow you on setup to prepend table names, but the way WordPress handles this issue is not ideal for my means.
For my example I want the output of some method to be:
$query = 'SELECT t1.id, t1.name, t2.country FROM so_users AS t1, so_country AS t2 INNER JOIN so_another_table AS t3 ON t3.user_id = t1.id';
('so_' is prepended to table names)
Thank you.
Kris
Using a query builder class would be the best solution, as you don't want to make any assumption about the pattern you want to replace with regex. If you don't find any existing library suitable for your particular need, roll out your own. It's not hard to make a simple query builder.
Regex does not have the power to parse SQL. Think of constructions like:
SELECT 'SELECT * FROM users';
SELECT * FROM users; -- users
SELECT '* -- users' FROM users;
SELECT '\' FROM users; -- '; -- differs in My/Pg vs others
SELECT users.name FROM country AS users; -- or without AS
SELECT users(name) FROM country; -- users() is procedure
SELECT "users"."name" FROM users; -- or ` on MySQL, [] in TSQL
and so on. To parse SQL you need a proper SQL parser library; trying to hack it after the fact in regex will only make weird mistakes.
This should work for your given example.
A word of caution though: as others have mentioned already, regexes are not the best tool for what you need. The given regex works for your example, nothing more, nothing less. There are lots of SQL constructions imaginable where this regex will not make the replacements you need.
$result = preg_replace('/(FROM|JOIN|,) ([_\w]*) (AS)/m', '$1 so_$2 $3', $query);
# (FROM|JOIN|,) ([_\w]*) (AS)
#
# Match the regular expression below and capture its match into backreference number 1 «(FROM|JOIN|,)»
#    Match either the regular expression below (attempting the next alternative only if this one fails) «FROM»
#       Match the characters “FROM” literally «FROM»
#    Or match regular expression number 2 below (attempting the next alternative only if this one fails) «JOIN»
#       Match the characters “JOIN” literally «JOIN»
#    Or match regular expression number 3 below (the entire group fails if this one fails to match) «,»
#       Match the character “,” literally «,»
# Match the character “ ” literally « »
# Match the regular expression below and capture its match into backreference number 2 «([_\w]*)»
#    Match a single character present in the list below «[_\w]*»
#       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
#       The character “_” «_»
#       A word character (letters, digits, etc.) «\w»
# Match the character “ ” literally « »
# Match the regular expression below and capture its match into backreference number 3 «(AS)»
#    Match the characters “AS” literally «AS»
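The same replacement can be tried in Python to see both where it works and where it goes wrong. This is a minimal sketch assuming re.sub behaves like preg_replace for this pattern; the function name and the second query are made up to show the kind of failure the other answers warn about.

```python
import re

# Same pattern as in the preg_replace call; \1, \2, \3 play the role of $1, $2, $3.
TABLE = re.compile(r"(FROM|JOIN|,) ([_\w]*) (AS)")


def add_prefix(query, prefix="so_"):
    """Prepend the prefix to every name found between FROM/JOIN/, and AS."""
    return TABLE.sub(r"\1 " + prefix + r"\2 \3", query)


query = (
    "SELECT t1.id, t1.name, t2.country FROM users AS t1, country AS t2 "
    "INNER JOIN another_table AS t3 ON t3.user_id = t1.id"
)
print(add_prefix(query))

# The regex cannot tell a string literal from real SQL, so this is rewritten too.
print(add_prefix("SELECT 'FROM users AS x'"))
```

A table reference without AS (for example FROM users u) is left untouched by this pattern, which is another reason to treat it as a fit for the example query only.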