Postgres regular expression to find procedures with wrong Update syntax

My Postgres database was migrated from MySQL using a tool, and the code base has a lot of syntax issues.
One of the issues is with UPDATE statements inside procedures where the target column is qualified with the table alias, as below:
UPDATE table1 t1 SET t1.col1 = 'some_value';
Qualifying the column with the alias after the SET keyword, as in t1.col1, is invalid syntax in Postgres.
As the number of procedures is huge, I'm trying to write a regular expression to find which procedures have this pattern.
select proname, prosrc from pg_proc
where regexp_replace(prosrc, E'[\\n\\r]+', ' ', 'g' ) ~* '[:UPDATE:]([[:space:]]+)[:set:]([[:space:]]+)^[a-z]([a-z]|[0-9])+\.^[a-z]([a-z]|[0-9])+([[:space:]]*)[:=:]';
The regexp_replace() call on the left side of the condition removes line breaks, and that part works fine. The main pattern on the right side, however, does not return the desired result.
I am trying to find the procedures that have the UPDATE keyword followed by one or more spaces, followed by the SET keyword, followed by one or more spaces, followed by one or more alphanumeric characters (starting with a letter), followed by a dot (.), followed by one or more alphanumeric characters (starting with a letter), followed by zero or more spaces, followed by an equals sign (=).
But the statement I formed seems to be wrong. Any help on this is much appreciated.
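A quick way to see what goes wrong is to try the pieces outside the database. The sketch below uses Python's re module, whose syntax should agree with PostgreSQL's regular expressions for everything used here (bracket expressions, \s, \w, quantifiers, groups). It shows that [:UPDATE:] is a bracket expression matching a single character.

It also assumes something the description above leaves out: the table name and an optional alias (possibly preceded by AS) sit between UPDATE and SET. BAD_UPDATE is only an illustrative name. The same pattern string should also work with ~* in PostgreSQL, where \s matches newlines, so flattening the source first is not needed.

```python
import re

# "[:UPDATE:]" is a bracket expression: it matches exactly ONE character out of
# {':', 'U', 'P', 'D', 'A', 'T', 'E'}, not the word UPDATE. The same holds for
# "[:set:]" and "[:=:]". A bare "^" in the middle of a pattern means "start of
# string", so "...[[:space:]]+^[a-z]" can never match either.
assert re.fullmatch(r"[:UPDATE:]", "U")
assert not re.fullmatch(r"[:UPDATE:]", "UPDATE")

# Sketch of the intended check: UPDATE <table> [[AS] <alias>] SET <alias>.<col> =
# \s also matches newlines, so multi-line statements are covered.
BAD_UPDATE = re.compile(
    r"update\s+[\w.]+(\s+(as\s+)?\w+)?\s+set\s+[a-z]\w*\.[a-z]\w*\s*=",
    re.IGNORECASE,
)

samples = [
    "UPDATE table1 t1 SET t1.col1 = 'some_value';",      # bad
    "UPDATE table1 AS t1\n  SET t1.col1='some_value';",  # bad, spans two lines
    "UPDATE table1 t1 SET col1 = 'some_value';",         # fine
    "UPDATE table1 SET col1 = t2.x FROM t2;",            # fine (alias only in FROM)
]
for s in samples:
    print(bool(BAD_UPDATE.search(s)), repr(s))
```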

I think this may be more complex than it looks... A procedure or function may contain more than one UPDATE statement, and a simple regex will likely come up with many false positives.
I think you want a function to do a better job of eliminating false positives that result from:
Aliases that occur after the UPDATE, in a separate statement (after the semicolon): fix by splitting the source into statements on semicolons
Aliases within the UPDATE that occur after a FROM or WHERE clause, which are valid and not syntax errors: fix by cutting off those clauses
Less frequently, aliases used in a CTE before the UPDATE: fix by ignoring everything before the UPDATE keyword
Here is a boilerplate function that I think will get you close while minimizing false positives:
create or replace function find_bad_syntax()
returns setof text
language plpgsql as
$BODY$
DECLARE
r pg_proc%rowtype;
dml varchar[];
eval varchar;
alias varchar;
BEGIN
-- only look at routines whose source mentions UPDATE at all
FOR r IN
SELECT * FROM pg_proc WHERE prosrc ilike '%update%'
LOOP
-- split the source into individual statements
dml := string_to_array (r.prosrc, ';');
foreach eval in array dml
loop
-- the word after "update <table>" is the alias; skip statements without one
alias := substring (lower (eval), 'update [\w.]+\s+(\w+)');
continue when alias is null or lower (alias) = 'set';
-- cut off FROM/WHERE (aliases are legal there) and everything before UPDATE
eval := regexp_replace (eval, 'from\s+.*', '', 'i');
eval := regexp_replace (eval, 'where\s.*', '', 'i');
eval := regexp_replace (eval, '^.*update', '', 'i');
-- what remains is the SET list: report "alias.column =" targets
if eval ~* (alias || '\.\w+\s*=') then
-- if eval ~* (alias || '\.\w+\s+=') then
return next format ('PROC: %s ALIAS: %s => ERROR: %s', r.proname, alias, eval);
end if;
end loop;
END LOOP;
END;
$BODY$;
So to get the results, simply run:
select * from find_bad_syntax();
I did a test run, and your function did show up in the results.
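The per-statement logic of find_bad_syntax() can be prototyped outside the database. The Python sketch below mirrors the function step by step:

- split on ';';
- take the word after "update <table>" as the alias;
- strip FROM/WHERE and everything up to UPDATE;
- look for "alias.column =".

re.DOTALL is used so that . crosses newlines the way it does in PostgreSQL's default mode. find_bad_updates is an illustrative name. Like the original, it is fooled by semicolons inside string literals or comments. It also misses "UPDATE t AS a SET a.col = ...", because the alias is then read as "as".

```python
import re
from typing import List, Tuple

def find_bad_updates(prosrc: str) -> List[Tuple[str, str]]:
    """Python mirror of find_bad_syntax(): returns (alias, offending text) pairs."""
    hits = []
    for stmt in prosrc.split(";"):                        # string_to_array(prosrc, ';')
        m = re.search(r"update [\w.]+\s+(\w+)", stmt.lower())
        if m is None or m.group(1) == "set":              # no alias on this UPDATE
            continue
        alias = m.group(1)
        flags = re.IGNORECASE | re.DOTALL
        frag = re.sub(r"from\s+.*", "", stmt, count=1, flags=flags)
        frag = re.sub(r"where\s.*", "", frag, count=1, flags=flags)
        frag = re.sub(r"^.*update", "", frag, count=1, flags=flags)
        if re.search(re.escape(alias) + r"\.\w+\s*=", frag, re.IGNORECASE):
            hits.append((alias, frag.strip()))
    return hits

body = """BEGIN
  UPDATE table1 t1 SET t1.col1 = 'x';
  UPDATE table2 SET col = 1;
  UPDATE table3 t3 SET col = 2 FROM t4 WHERE t3.id = t4.id;
END;"""
print(find_bad_updates(body))
```

Only the first UPDATE is reported. The second has no alias, and in the third the alias appears only after FROM/WHERE, which is valid.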

The query below gives me the expected results. It looks for SET followed by one or more spaces, then one or more alphanumeric characters or underscores, a dot (.), one or more alphanumeric characters or underscores, one or more spaces, and finally an equals sign (=).
This fetches all the procedures that have the issue described in the question.
select proname, prosrc from pg_proc
where regexp_replace(prosrc, E'[\\n\\r]+', ' ', 'g' )
~* '( SET)[[:space:]]+([a-z]|[0-9]|(_))+\.([a-z]|[0-9]|(_))+[[:space:]]+(=)';
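One thing to watch with this pattern: [[:space:]]+ before the = requires at least one space, so an assignment written as t1.col1='x' is not reported. The Python transcription below shows the difference. It uses \s in place of [[:space:]] and \w for the letter/digit/underscore alternation. RELAXED is only an illustrative name for a variant with \s* before the =, which should also work as a PostgreSQL ~* pattern.

```python
import re

# Transcription of the query's pattern; [[:space:]] is written as \s.
ORIGINAL = re.compile(r"( SET)\s+([a-z]|[0-9]|(_))+\.([a-z]|[0-9]|(_))+\s+(=)", re.I)
# Variant: whitespace before "=" is optional; the alternation is collapsed to \w.
RELAXED = re.compile(r"\sSET\s+\w+\.\w+\s*=", re.I)

spaced = "UPDATE table1 t1 SET t1.col1 = 'x'"
compact = "UPDATE table1 t1 SET t1.col1='x'"

for name, rx in (("ORIGINAL", ORIGINAL), ("RELAXED", RELAXED)):
    print(name, bool(rx.search(spaced)), bool(rx.search(compact)))
```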

Yes, in PostgreSQL this does not work:
UPDATE table1 t1 SET t1.col1 = 'some_value';
But this works correctly:
UPDATE table1 t1 SET col1 = 'some_value';
So we only need to remove the alias from the column being updated.
For example (note that \d\. only matches an alias that ends in a digit, such as t1):
with t1(txt) as (
select 'UPDATE table1 t1 SET t1.col1 = some_value'
)
select regexp_replace(t1.txt, 'SET (.*)\d\.', 'SET ', 'g') from t1

To find (select) the matching rows instead:
with t1(txt) as (
select 'UPDATE table1 t1 SET t1.col1 = some_value'
)
select * from t1 where t1.txt ~ 'SET (.*)\d\.'

With a small change (\w instead of \d, so the alias no longer has to end in a digit):
with t1(txt) as (
select 'UPDATE table1 t1 SET t1.col1 = some_value'
union all
select 'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
union all
select 'select pp3.* from table1 pp3'
union all
select 'UPDATE table1 SET col1 = some_value'
union all
select 'UPDATE table1 t SET t.col1 = some_value'
)
select * from t1 where t1.txt ~ 'SET (.*)\w\.'
--Result:
'UPDATE table1 t1 SET t1.col1 = some_value'
'UPDATE table1 tbp3232 SET tbp3232.col1 = some_value'
'UPDATE table1 t SET t.col1 = some_value'
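Both versions use a greedy (.*), so the match runs to the last "word character followed by a dot" anywhere after SET, not just to the alias. That has two consequences:

- Valid statements can be flagged, e.g. when an alias appears on the right-hand side or in FROM.
- With regexp_replace, more than the alias can be deleted.

The Python sketch below reproduces this with re.sub, which should behave like regexp_replace for these patterns; in regexp_replace the \1 replacement is written the same way. The tighter alternative, which removes only the word directly after SET, is a suggestion. It still handles only the first column in a SET list.

```python
import re

GREEDY = r"SET (.*)\w\."   # pattern from the answer above
TIGHT = r"(SET\s+)\w+\."   # only the word immediately after SET

stmt = "UPDATE table1 t1 SET t1.col1 = 1.5"
valid = "UPDATE table1 SET col1 = t2.x FROM t2"

print(re.sub(GREEDY, "SET ", stmt))  # greedy match runs up to the "1." in 1.5
print(re.sub(TIGHT, r"\1", stmt))
print(bool(re.search(GREEDY, valid)), bool(re.search(TIGHT, valid)))
```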

Related

How to execute a dynamic SQL statement in a single Select statement?

I just wonder how to evaluate the content of dynamic SQL using a single SELECT; this is the example. It is only an example, but I would like to build functions dynamically and manage them using single SELECTs. (I know SQL statements here are only for SELECTing rather than modifying... but in this deep quarantine I'm becoming a crazy developer.)
SELECT 'SELECT SETVAL(' || chr(39) || c.relname || chr(39)|| ' ,
(SELECT MAX(Id)+1 FROM ' || regexp_replace(c.relname, '_[a-zA-Z]+_[a-zA-Z]+(_[a-zA-Z0-9]+)?', '', 'g') ||' ), true );'
FROM pg_class c WHERE c.relkind = 'S';
The original output is:
SELECT SETVAL('viewitem_id_seq' , (SELECT MAX(Id)+1 FROM viewitem ), true );
SELECT SETVAL('userform_id_seq' , (SELECT MAX(Id)+1 FROM userform ), true );
This is the dynamic part of the statement:
(SELECT MAX(Id)+1 FROM ' || regexp_replace(c.relname, '_[a-zA-Z]+_[a-zA-Z]+(_[a-zA-Z0-9]+)?', '', 'g')
It is a string that produces a SQL statement as output. How can I evaluate that statement in the same query?
The desired output is:
SELECT SETVAL('viewitem_id_seq' , 25, true );
SELECT SETVAL('userform_id_seq' , 85, true );
thanks!
If those are serial or identity columns it would be better to use pg_get_serial_sequence() to get the link between a table's column and its sequence.
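Relying on the sequence name is fragile. The regexp_replace() in the question strips everything from the first "_letters_letters" run onward. That works for viewitem_id_seq but not for a table whose own name contains an underscore. The sketch replays the same pattern with Python's re.sub, which should be equivalent to regexp_replace(..., 'g') for this pattern. order_items is a hypothetical table name.

```python
import re

# Pattern from the question, meant to turn "<table>_id_seq" into "<table>".
STRIP = r"_[a-zA-Z]+_[a-zA-Z]+(_[a-zA-Z0-9]+)?"

def guessed_table(seq_name: str) -> str:
    return re.sub(STRIP, "", seq_name)

for seq in ("viewitem_id_seq", "userform_id_seq", "order_items_id_seq"):
    print(seq, "->", guessed_table(seq))
```

The last one comes out as "order" rather than "order_items".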
You can actually run dynamic SQL inside a SQL statement by using query_to_xml().
I use the following script if I need to synchronize the sequences for serial (or identity) columns with their actual values:
with sequences as (
-- this query is only to identify all sequences that belong to a column
-- it's essentially similar to your select * from pg_class where relkind = 'S'
-- but returns the sequence name, table and column name to which the
-- sequence belongs
select *
from (
select table_schema,
table_name,
column_name,
pg_get_serial_sequence(format('%I.%I', table_schema, table_name), column_name) as col_sequence
from information_schema.columns
where table_schema not in ('pg_catalog', 'information_schema')
) t
where col_sequence is not null
), maxvals as (
select table_schema, table_name, column_name, col_sequence,
--
-- this is the "magic" that runs the SELECT MAX() query
--
(xpath('/row/max/text()',
query_to_xml(format('select max(%I) from %I.%I', column_name, table_schema, table_name), true, true, ''))
)[1]::text::bigint as max_val
from sequences
)
select table_schema,
table_name,
column_name,
col_sequence,
coalesce(max_val, 0) as max_val,
setval(col_sequence, coalesce(max_val, 1)) --<< this uses the value from the dynamic query
from maxvals;
The dynamic part here is the call to query_to_xml().
First I use format() to properly deal with identifiers. It also makes writing the SQL easier as no concatenation is required. So for every table returned by the first CTE, something like this is executed:
query_to_xml('select max(id) from public.some_table', true, true, '')
This returns something like:
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<max>42</max>
</row>
The value is then extracted from the XML using xpath() and converted to a number, which is then used in the final SELECT to actually call setval().
The nesting with multiple CTEs is only used to make each part more readable.
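The xpath() extraction step described above can be mimicked outside the database, including the NULL case. With nulls = true, query_to_xml() writes a NULL max as an empty element marked xsi:nil="true". The text() lookup then finds nothing, [1] is NULL, and coalesce() takes over.

This Python sketch uses xml.etree.ElementTree in place of xpath(). extract_max and next_value_after_sync are illustrative names. The second function only encodes one fact: setval() with its default is_called = true makes the next nextval() return the set value plus one.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSI = 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
ROW_42 = f"<row {XSI}>\n  <max>42</max>\n</row>"           # sample from the answer
ROW_NULL = f'<row {XSI}>\n  <max xsi:nil="true"/>\n</row>'  # max() of an empty table

def extract_max(xml_text: str) -> Optional[int]:
    """Rough stand-in for (xpath('/row/max/text()', ...))[1]::text::bigint."""
    node = ET.fromstring(xml_text).find("max")
    if node is None or node.text is None:
        return None  # xpath() returns an empty array, so [1] is NULL
    return int(node.text)

def next_value_after_sync(max_val: Optional[int]) -> int:
    """Value nextval() returns after setval(seq, coalesce(max_val, 1))."""
    return (max_val if max_val is not None else 1) + 1

for doc in (ROW_42, ROW_NULL):
    m = extract_max(doc)
    print(m, next_value_after_sync(m))
```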
The same approach can be used, e.g., to find the row count for all tables.
Some background on how query_to_xml() works

PL/SQL regexp_like filters

I want to delete some tables and wrote this procedure:
set serveroutput on
declare
type namearray is table of varchar2(50);
total integer;
name namearray;
begin
--select statement here ..., please see below
total :=name.count;
dbms_output.put_line(total);
for i in 1 .. total loop
dbms_output.put_line(name(i));
-- execute immediate 'drop table ' || name(i) || ' purge';
End loop;
end;
/
The idea is to drop all tables with table name having pattern like this:
ERROR_REPORT[2 digit][3 Capital characters][10 digits]
example: ERROR_REPORT16MAY2014122748
However, I am not able to come up with the correct regexp. Below are my select statements and results:
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9{2}A-Z{3}0-9{10}]');
The results included all the table names I needed plus ERROR_REPORT311AUG20111111111. This should not be showing up in the result.
The following select statement showed the same result, which means the A-Z{3} had no effect on the regexp.
select table_name bulk collect into name from user_tables where regexp_like(table_name, '^ERROR_REPORT[0-9{2}0-9{10}]');
My question is what would be the correct regexp, and what's wrong with mine?
Thanks,
Alex
The correct regex is:
'^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}'
I think this regex should work:
^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}
However, please check the regex101 link. I've assumed that you need 2 digits after ERROR_REPORT but your example name shows 3.
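The original attempt behaves this way because everything between [ and ] forms one bracket expression. It matches a single character drawn from 0-9, A-Z, '{' or '}'; the digits inside the braces are already covered by 0-9. So only the first character after ERROR_REPORT is tested, and both names start with a digit.

The Python sketch below compares that pattern with the corrected one. re.match anchors at the start like ^. The trailing $ is an optional addition, not part of the answers above, which also rejects names with extra characters at the end.

```python
import re

BROKEN = re.compile(r"^ERROR_REPORT[0-9{2}A-Z{3}0-9{10}]")  # tests ONE character
FIXED = re.compile(r"^ERROR_REPORT[0-9]{2}[A-Z]{3}[0-9]{10}$")

names = ["ERROR_REPORT16MAY2014122748", "ERROR_REPORT311AUG20111111111"]
for n in names:
    print(n, bool(BROKEN.match(n)), bool(FIXED.match(n)))
```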

How to write the pattern in regular expression matching in Pl/SQL?

I have written a substring regular expression in Oracle, but I am having trouble getting the pattern right. The query first fetches the DDL of the trigger into a string and then tries to extract the table's columns from it.
Trigger DDL
CREATE OR REPLACE TRIGGER "SHIVAMG"."DVJ_CI_CURRENCY_CD_L_IU"
BEFORE INSERT OR UPDATE ON CI_CURRENCY_CD_L
FOR EACH ROW
BEGIN
IF INSERTING THEN
IF (UPPER(:NEW.CURRENCY_CD) NOT LIKE 'ZZ%') THEN
INSERT INTO JUNITUSR.CI_CURRENCY_CD_L
(CURRENCY_CD,
LANGUAGE_CD,
DESCR,
VERSION)
SELECT :NEW.CURRENCY_CD,
:NEW.LANGUAGE_CD,
:NEW.DESCR,
:NEW.VERSION
FROM DUAL
WHERE NOT EXISTS
(SELECT 1
FROM JUNITUSR.CI_CURRENCY_CD_L B
WHERE B.CURRENCY_CD =:NEW.CURRENCY_CD AND
B.LANGUAGE_CD = :NEW.LANGUAGE_CD);
END IF;
END IF;
IF UPDATING THEN
IF (UPPER(:NEW.CURRENCY_CD) NOT LIKE 'ZZ%') THEN
UPDATE JUNITUSR.CI_CURRENCY_CD_L A
SET CURRENCY_CD =:NEW.CURRENCY_CD,
LANGUAGE_CD =:NEW.LANGUAGE_CD,
DESCR =:NEW.DESCR ,
VERSION =:NEW.VERSION
WHERE A.CURRENCY_CD = :OLD.CURRENCY_CD AND
A.LANGUAGE_CD =:OLD.LANGUAGE_CD;
END IF;
END IF;
EXCEPTION
WHEN OTHERS THEN
RAISE_APPLICATION_ERROR(-20001,'ERROR: <DVJ_CI_CURRENCY_CD_L_IU> ' || SQLERRM);
END;
ALTER TRIGGER "SHIVAMG"."DVJ_CI_CURRENCY_CD_L_IU" ENABLE
Substring Query
SELECT REGEXP_SUBSTR((SELECT REGEXP_SUBSTR
(( select dbms_metadata.get_ddl('TRIGGER', 'DVJ_CI_CURRENCY_CD_L_IU' ) from dual), 'INSERT INTO(.*)+\)')FROM dual),'\((.*)\)') FROM DUAL;
I found the correct Substring query to gather the individual column names from the trigger code. It is as follows:
SELECT REGEXP_SUBSTR((SELECT REGEXP_SUBSTR((SELECT REGEXP_SUBSTR (( SELECT dbms_metadata.get_ddl( 'TRIGGER',trig_name,'CISADM') FROM dual),
'INSERT(\s|\n)+INTO[^\)]+\)',1,1,'n') FROM dual),'[\(](\s|\n|.)+[\)]')
FROM DUAL),'(\w)+',1,counter)INTO temp_col_name FROM dual;
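The three nested REGEXP_SUBSTR calls can be read as three steps. The Python sketch below replays them on the INSERT part of the trigger, shortened from the DDL above:

1. take the text from INSERT INTO up to the first ')';
2. keep the parenthesized column list;
3. pull out each word.

Oracle's occurrence argument (counter) corresponds to indexing into the list that re.findall() returns. re.DOTALL stands in for the (\s|\n|.) alternation, since neither Oracle's nor Python's . matches a newline by default.

```python
import re

# The INSERT portion of the trigger body from the question.
ddl = """IF INSERTING THEN
IF (UPPER(:NEW.CURRENCY_CD) NOT LIKE 'ZZ%') THEN
INSERT INTO JUNITUSR.CI_CURRENCY_CD_L
(CURRENCY_CD,
LANGUAGE_CD,
DESCR,
VERSION)
SELECT :NEW.CURRENCY_CD,"""

# Step 1, like 'INSERT(\s|\n)+INTO[^\)]+\)': up to and including the first ')'.
step1 = re.search(r"INSERT\s+INTO[^)]+\)", ddl).group(0)
# Step 2, like '[\(](\s|\n|.)+[\)]': the parenthesized column list.
step2 = re.search(r"\(.+\)", step1, re.DOTALL).group(0)
# Step 3, like '(\w)+' with occurrence = counter: one word per column.
columns = re.findall(r"\w+", step2)
print(columns)
```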

Compare column value against list of regex values stored in another table and update accordingly

I am new to Oracle programming.
I want to check the "msg" value of "Table1" against the "regex" values from "Table2".
If the regular expression matches as such, I want to update the respective "regex_id" in "Table1".
A single check looks like this: SELECT 'match found' FROM DUAL WHERE REGEXP_LIKE('s 27', '^(s27|s 27)')
Table1
MSG   REG_EXID
Ss27  ?
s27   ?
s28   ?
s29   ?
Table2
REGEX        REG_EXID  RELEVANCE
^(s27|s 27)  1         10
^(s29|s 29)  2         2
^(m28|m 28)  3         2
^(s27|s 27)  4         100
Taking the newly added "relevance" column into account, with Oracle 11g you could try something along these lines:
UPDATE Table1 T1
SET T1.reg_exID =
(SELECT DISTINCT
MAX(reg_exID) KEEP (DENSE_RANK FIRST ORDER BY relevance DESC) OVER (PARTITION BY regex)
FROM Table2
WHERE REGEXP_LIKE(T1.msg, regex)
)
;
See SQL Fiddle.
You could start with something along these lines:
UPDATE Table1
SET reg_exID = (SELECT reg_exID FROM Table2 WHERE REGEXP_LIKE(Table1.msg, regex));
Please keep in mind:
None of your current sample records will be updated, as regular expressions are case-sensitive.
The above UPDATE will fail if more than a single REGEX matches.
You could simplify the current expressions to something like "^m ?28".
See it in action: SQL Fiddle (With some data added to actually show the effect.)
Please comment if and as clarification/adjustment is required.
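The selection rule behind the relevance-based UPDATE can be checked against the sample tables with a small Python sketch. The rule: among the patterns that match a message, use the reg_exid of the one with the highest relevance, taking the larger id on ties, as MAX ... KEEP (DENSE_RANK FIRST ORDER BY relevance DESC) does.

This models the intended logic; it is not a line-by-line port. The SQL above ranks within each regex partition, whereas this sketch picks the single best match overall. best_reg_exid is an illustrative name. Like REGEXP_LIKE, re.search is unanchored unless the pattern uses ^, and it is case-sensitive here, as the answer assumes.

```python
import re
from typing import List, Optional, Tuple

# (regex, reg_exid, relevance) rows from Table2 in the question.
TABLE2: List[Tuple[str, int, int]] = [
    (r"^(s27|s 27)", 1, 10),
    (r"^(s29|s 29)", 2, 2),
    (r"^(m28|m 28)", 3, 2),
    (r"^(s27|s 27)", 4, 100),
]

def best_reg_exid(msg: str, rules: List[Tuple[str, int, int]] = TABLE2) -> Optional[int]:
    """reg_exid of the most relevant matching pattern, or None (NULL in SQL)."""
    matches = [(relevance, reg_exid)
               for regex, reg_exid, relevance in rules
               if re.search(regex, msg)]
    if not matches:
        return None  # the scalar subquery returns no row, so reg_exID becomes NULL
    return max(matches)[1]  # highest relevance; ties go to the larger id

for msg in ("Ss27", "s27", "s28", "s29"):
    print(msg, best_reg_exid(msg))

# The shorter form suggested above accepts both spellings.
print(bool(re.search(r"^s ?27", "s27")), bool(re.search(r"^s ?27", "s 27")))
```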

How to find all stored procedures that delete rows from a particular table

If there's a way of doing this without regular expressions, that's great. If there isn't, here's what I've got so far:
I've written a simple CLR user-defined function (which as you can see I've called CLR_RegExMatch) that performs regex matches on a supplied string. I use it to search for patterns inside stored procedures, triggers, functions etc.
Here's an example of its use - searching for inserts into a table called ExampleTable:
SELECT O.name, O.type_desc
FROM
SYS.OBJECTS O
INNER JOIN SYS.SQL_MODULES M ON
M.object_id = O.object_id
AND dbo.CLR_RegExMatch('INSERT\s+(INTO\s+)?ExampleTable\b', M.definition) = 1
The issue I've got is that I can't come up with a regex pattern to find all routines that delete rows from a given table. Obviously I could substitute the following for the last line in the previous example:
AND dbo.CLR_RegExMatch('DELETE\s+(FROM\s+)?ExampleTable\b', M.definition) = 1
and that gets me part of the way there. However, it wouldn't pick up the following:
DELETE T1
FROM
ExampleTable T1
INNER JOIN AnotherTable T2 ON T2.ParentId = T1.Id
So what I'm looking for is either a regex pattern that will match deletes as above, or alternatively a different way of going about this.
N.B. The reason I'm querying the definition column of SYS.SQL_MODULES instead of the ROUTINE_DEFINITION column of INFORMATION_SCHEMA.ROUTINES is that the latter only contains the first 4000 characters of a routine definition, whereas the former contains the full text.
Have a look at the FREE Red-Gate tool called SQL Search, which does this: it searches your entire database for any kind of string(s).
It's a great must-have tool for any DBA or database developer, and did I already mention it's absolutely FREE to use for any kind of use?
It will not tell you which procedures actually delete something from a table - but it will very easily and nicely find all procedures which reference that table in any way. Look at those and find those you need!
If it helps, I use this stored procedure to find a string in all modules of one database or of all databases.
Usage:
exec find_text 'text to search', 'db_name'
-- if no db_name is specified, all databases are searched
Code below:
CREATE PROCEDURE [dbo].[find_text]
@text varchar(250),
@dbname varchar(64) = null
AS BEGIN
SET NOCOUNT ON;
if @dbname is null
begin
-- enumerate all databases and call this procedure once for each of them
DECLARE db_cursor CURSOR LOCAL FOR Select Name from master..sysdatabases
declare @c_dbname varchar(64)
OPEN db_cursor FETCH db_cursor INTO @c_dbname
while @@FETCH_STATUS <> -1
begin
execute find_text @text, @c_dbname
FETCH db_cursor INTO @c_dbname
end
CLOSE db_cursor DEALLOCATE db_cursor
end
else
begin
-- varchar(max): a long database name or search text must not truncate the command
declare @sql varchar(max)
--create the find like command (QUOTENAME protects database names with spaces etc.,
-- REPLACE doubles any single quotes in the search text)
select @sql = 'select ''' + @dbname + ''' as db, o.name,m.definition '
select @sql = @sql + ' from ' + QUOTENAME(@dbname) + '.sys.sql_modules m '
select @sql = @sql + ' inner join ' + QUOTENAME(@dbname) + '..sysobjects o on m.object_id=o.id'
select @sql = @sql + ' where [definition] like ''%' + REPLACE(@text, '''', '''''') + '%'''
select @sql = @sql + ' order by o.name'
execute (@sql)
end
END
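Coming back to the regex route from the question, here is one possible pattern for CLR_RegExMatch. It is a sketch, not taken from the answers above, and it handles two forms:

- DELETE [FROM] ExampleTable;
- DELETE <alias> FROM ... ExampleTable <alias>, where a backreference (\1) requires the alias being deleted from to be the one attached to ExampleTable.

It is tried out here with Python's re, assuming CLR_RegExMatch wraps .NET's Regex, which supports the same (?i), \b, lazy *? and backreference syntax.

Limitations: bracketed names such as [dbo].[ExampleTable] are not handled. Because T-SQL does not require semicolons, [^;]*? can also run into the next statement and produce false positives.

```python
import re

TABLE = re.escape("ExampleTable")  # table being searched for

# DELETE [FROM] [schema.]ExampleTable
PLAIN = rf"\bDELETE\s+(?:FROM\s+)?(?:\w+\.)?{TABLE}\b"
# DELETE [FROM] <alias> FROM ... ExampleTable [AS] <alias>; \1 repeats the alias
ALIASED = rf"\bDELETE\s+(?:FROM\s+)?(\w+)\s+FROM\b[^;]*?\b{TABLE}\s+(?:AS\s+)?\1\b"
DELETE_PATTERN = re.compile(rf"(?i){PLAIN}|{ALIASED}")

samples = {
    "plain": "DELETE FROM dbo.ExampleTable WHERE Id = 1",
    "aliased": ("DELETE T1\nFROM\nExampleTable T1\n"
                "INNER JOIN AnotherTable T2 ON T2.ParentId = T1.Id"),
    "other table": ("DELETE T2 FROM ExampleTable T1 "
                    "INNER JOIN AnotherTable T2 ON T2.ParentId = T1.Id"),
}
for label, sql in samples.items():
    print(label, bool(DELETE_PATTERN.search(sql)))
```

The third sample deletes from AnotherTable, so it is correctly not reported even though it mentions ExampleTable.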