I need a function that uses a regular expression to remove non-Latin characters (Chinese, Japanese, ...) from a column in a Postgres database table.
I have tried all solutions I could find online, but nothing seems to work.
CREATE OR REPLACE FUNCTION public.function_104(param text)
RETURNS void
LANGUAGE plpgsql
AS $function$
BEGIN
EXECUTE 'UPDATE public.' || quote_ident(param) || ' SET "name" = REGEXP_REPLACE("name", [^x00-x7F]+, " ")';
END
$function$
I keep running into the following error message:
psycopg2.errors.SyntaxError: syntax error at or near "["
LINE 1: ..._roads_free_1 SET "name" = REGEXP_REPLACE("name", [^x00-x7F]...
^
QUERY: UPDATE public.gis_osm_roads_free_1 SET "name" = REGEXP_REPLACE("name", [^x00-x7F]+, " ")
CONTEXT: PL/pgSQL function afri_terra_104(text) line 6 at EXECUTE
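You must put both the regex and the replacement text between single quotes. Since this is a dynamic query, you must escape those single quotes by doubling them. The hex escapes also need their backslashes (without them, [^x00-x7F] is read as the literal characters x, 0, 7, F plus the range 0-x), and the 'g' flag is needed to replace every match rather than only the first. This assumes the default setting standard_conforming_strings = on:
CREATE OR REPLACE FUNCTION public.function_104(param text)
RETURNS void
LANGUAGE plpgsql
AS $function$
BEGIN
EXECUTE 'UPDATE public.' || quote_ident(param) ||
' SET "name" = REGEXP_REPLACE("name", ''[^\x00-\x7F]+'', '' '', ''g'')';
END
$function$;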
insert into t104(name) values('abcé');
INSERT 0 1
select function_104('t104');
function_104
--------------
(1 row)
select * from t104;
name
------
abc
(1 row)
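To check a character class like this outside the database, here is a minimal Python sketch. It is an illustration only: Python's re module is not PostgreSQL's regex engine, and strip_non_ascii is a hypothetical helper name. It shows why the hex escapes need backslashes: without them, the class is read as the literal characters x, 0, 7, F plus the range 0-x.

```python
import re

# With backslashes, \x00-\x7F is the ASCII range, so the negated class
# matches every run of non-ASCII characters.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

# Without backslashes, the class holds the literals x, 0, 7, F and the
# range '0'..'x', so spaces, 'y' and 'z' are treated as unwanted too.
UNESCAPED = re.compile(r"[^x00-x7F]+")


def strip_non_ascii(text):
    """Replace each run of non-ASCII characters with a single space."""
    return NON_ASCII.sub(" ", text)


if __name__ == "__main__":
    print(repr(strip_non_ascii("abcé")))         # 'abc '
    print(repr(UNESCAPED.sub(" ", "lazy yak")))  # 'la ak'
```

The second call shows the unescaped class damaging plain ASCII text, which is easy to miss when the test data contains only letters such as a, b and c.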
Related
My goal is to create a stored procedure that loops through a select statement that will identify tables requiring a vacuum. I will call it from Lambda if I can get it to work. These are my ideas and code so far.
CREATE OR REPLACE PROCEDURE vac_an (rs_out INOUT refcursor)
AS $$
BEGIN
OPEN rs_out FOR SELECT 'VACUUM FULL ' + "schema" + '.' + "table" + ';' AS command
FROM svv_table_info
WHERE (unsorted > 5 OR empty > 5)
AND size < 716800;
END;
$$ LANGUAGE plpgsql;
This is a start. It compiles, but it would not execute the actual command that the cursor builds, which is:
VACUUM FULL SCHEMA.TABLE;
I guess I could call it with this:
CALL sample_cursor_test ();
My second line of thinking was something like this:
CREATE PROCEDURE vac_an()
AS $$
DECLARE
tlist RECORD;
BEGIN
FOR tlist IN EXECUTE 'SELECT 'VACUUM FULL ' + "schema" + '.' + "table" + ';' AS command FROM svv_table_info WHERE (unsorted > 5 OR empty > 5) AND size < 716800;'
LOOP
EXECUTE tlist;
END LOOP;
END;
$$ LANGUAGE plpgsql;
However, that gives me:
ERROR: missing "LOOP" at end of SQL expression
Where: compile of PL/pgSQL function "vac_an" near line 4
I feel like the code is almost there; I just need to loop through this cursor:
SELECT 'VACUUM FULL ' + "schema" + '.' + "table" + ';' AS command
FROM svv_table_info
WHERE (unsorted > 5 OR empty > 5)
AND size < 716800;
And execute the output line by line.
Can you please help?
You cannot call VACUUM from within a transaction, which means you cannot call VACUUM from within a procedure, since a procedure is inherently a transaction.
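Given that restriction, one option is to run the loop on the client instead, for example in the Lambda function. The following is a minimal Python sketch, assuming a psycopg2-style DB-API connection whose autocommit attribute can be set. The names run_vacuums, vacuum_commands and quote_ident are hypothetical helpers, and the SELECT reuses the filter from the question.

```python
# Same filter as in the question, but returning the raw names so the
# client can build each statement itself.
SELECT_CANDIDATES = """
SELECT "schema", "table"
FROM svv_table_info
WHERE (unsorted > 5 OR empty > 5)
  AND size < 716800
"""


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_commands(rows):
    """Build one VACUUM FULL statement per (schema, table) pair."""
    return [
        f"VACUUM FULL {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in rows
    ]


def run_vacuums(conn):
    """Run VACUUM FULL on every candidate table and return the statements."""
    # VACUUM cannot run inside a transaction block, so each statement is
    # sent in autocommit mode (assumption: a psycopg2-style connection).
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute(SELECT_CANDIDATES)
    commands = vacuum_commands(cur.fetchall())
    for command in commands:
        cur.execute(command)
    return commands
```

Fetching all candidate rows before issuing the first VACUUM keeps the result set from being disturbed while the statements run.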
I need help writing U-SQL that outputs records to two different files based on the result of a regular expression.
Let me explain my scenario in detail.
Let us assume my input file has two columns, "Name" and person identification number ("PIN"):
Name , PIN
John ,12345
Harry ,01234
Tom, 24659
My condition for PIN is it should start with either 1 or 2. In the above case records 1 & 3 are valid and record 2 is invalid.
I need to output records 1 & 3 to my processed output file and record 2 to my error file.
How can I do this, and can I use Regex.Match to validate against the regular expression?
//posting my code
@person =
EXTRACT UserId int,
PNR string,
UID String,
FROM "/Samples/Data/person.csv"
USING Extractors.csv();
@rs1=select UserId,PNR,UID,Regex.match(PNR,'^(19|20)[0-9]{2}((0[1-9])$') as pnrval,Regex.match(UID,'^(19|20)[0-9]{2}$') as uidval
from @person
@rs2 = select UserId,PNR,UID from @rs1 where pnrval=true or uidval=true
@rs3 = select UserId,PNR,UID from @rs1 where uidval=false or uidval= false
OUTPUT @rs2
TO "/output/sl.csv"
USING Outputters.Csv();
OUTPUT @rs3
TO "/output/error.csv"
USING Outputters.Csv();
But I'm receiving this error:
Severity Code Description Project File Line Suppression State Error
E_CSC_USER_INVALIDCOLUMNTYPE: 'System.Text.RegularExpressions.Match'
cannot be used as column type.
@someData =
SELECT * FROM
( VALUES
("John", "12345"),
("Harry", "01234"),
("Tom", "24659")
) AS T(Name, pin);
@result1 =
SELECT Name,
pin
FROM @someData
WHERE pin.StartsWith("1") OR pin.StartsWith("2");
@result2 =
SELECT Name,
pin
FROM @someData
WHERE !pin.StartsWith("1") AND !pin.StartsWith("2");
@person =
EXTRACT UserId int,
PNR string,
UID string
FROM "/Samples/Data/person.csv"
USING Extractors.Csv();
// Regex.IsMatch returns a bool, which is a valid column type;
// Regex.Match returns a Match object, which is not.
@rs1 =
SELECT UserId, PNR, UID,
Regex.IsMatch(PNR, "^(19|20)[0-9]{2}(0[1-9])$") AS pnrval,
Regex.IsMatch(UID, "^(19|20)[0-9]{2}$") AS uidval
FROM @person;
@rs2 = SELECT UserId, PNR, UID FROM @rs1 WHERE pnrval == true OR uidval == true;
@rs3 = SELECT UserId, PNR, UID FROM @rs1 WHERE pnrval == false OR uidval == false;
OUTPUT @rs2
TO "/output/sl.csv"
USING Outputters.Csv();
OUTPUT @rs3
TO "/output/error.csv"
USING Outputters.Csv();
This worked for my requirement. Thanks for the support and suggestions
Considering your input, I would use
.*\s*,\s*[12]\d+
.* matches any amount of characters and is needed to match everything before the comma
\s*,\s* matches a comma optionally preceded and/or followed by any amount of whitespace (\s matches a single whitespace character)
[12] matches a single digit, equal to 1 or 2; this satisfies your requirement about PINs
\d+ matches one or more digits
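To see how the pattern sorts the sample rows, here is a minimal Python sketch. It is only an illustration of the regex, not U-SQL, and split_records is a hypothetical helper name. The match call returns a match object or None, so the code tests that result for truth rather than storing it, which is the same distinction as Match versus IsMatch in .NET.

```python
import re

# Same pattern as above: anything, a comma with optional whitespace
# around it, then a PIN that starts with 1 or 2.
PIN_LINE = re.compile(r".*\s*,\s*[12]\d+")


def split_records(lines):
    """Return (valid, invalid) lists according to the PIN rule."""
    valid, invalid = [], []
    for line in lines:
        # match() yields a match object or None; only its truth value
        # is needed to pick the target list.
        if PIN_LINE.match(line):
            valid.append(line)
        else:
            invalid.append(line)
    return valid, invalid


if __name__ == "__main__":
    rows = ["John ,12345", "Harry ,01234", "Tom, 24659"]
    valid, invalid = split_records(rows)
    print(valid)    # ['John ,12345', 'Tom, 24659']
    print(invalid)  # ['Harry ,01234']
```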
As far as using Regex.Match, I'll quote this answer on StackOverflow:
System.Text.RegularExpressions.Match is not part of the built-in U-SQL types.
So what I would do here is pre-parsing your CSV in C#; something like:
Regex CurrentRegex = new Regex(@".*\s*,\s*[12]\d+", RegexOptions.IgnoreCase);
foreach (var LineOfText in File.ReadAllLines(InputFilePath))
{
Match CurrentMatch = CurrentRegex.Match(LineOfText);
if (CurrentMatch.Success)
{
// Append line to success file
}
else
{
// Append line to error file
}
CurrentMatch = CurrentMatch.NextMatch();
}
I'm trying to use regex pattern matching with PostgreSQL 9.4.
I have looked through previous answers, but nothing I can find matches this particular problem:
select 'apple' ~ '^.*pp.*$' returns 't' as expected
update <table> set column = 'value' where name ~* '^.*pp.*$' also works.
But:
update <table> set column = 'value' from <other_table> where name ~* '^.*pp.*$' produces an error:
The specific example:
update
members set
pattern = a.pattern
from
services a
where
organisation ~* '^.*' || replace(a.pattern, ' ', '.*') || '.*$';
ERROR: argument of WHERE must be type boolean, not type text
LINE 1: ...attern = a.pattern from services a where organisati...
It seems the where clause after the FROM table in the update is not recognising or processing the regex operator correctly.
Or, equally probably, I'm misunderstanding the UPDATE...FROM syntax
Many thanks if you can help
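You are missing parentheses around the string expression. The operators ~ and || have the same precedence and are evaluated from left to right, so a ~ 'ab' || 'xxxx' is parsed as (a ~ 'ab') || 'xxxx', which yields text rather than boolean: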
postgres=# update foo set b = a where a ~ 'ab';
UPDATE 1
postgres=# update foo set b = a where a ~ 'ab' || 'xxxx';
ERROR: argument of WHERE must be type boolean, not type text
LINE 1: update foo set b = a where a ~ 'ab' || 'xxxx';
^
postgres=# update foo set b = a where a ~ ('ab' || 'xxxx');
UPDATE 0
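To make the intended evaluation order concrete, here is a minimal Python sketch of the parenthesised WHERE clause from the question: the pattern string is assembled first, and only then matched case-insensitively. It is an illustration only, since Python's re module is not PostgreSQL's regex engine; organisation_pattern and matches are hypothetical helper names, and the sample values are made up.

```python
import re


def organisation_pattern(service_pattern):
    """Mirror '^.*' || replace(a.pattern, ' ', '.*') || '.*$'."""
    return "^.*" + service_pattern.replace(" ", ".*") + ".*$"


def matches(organisation, service_pattern):
    """Mirror organisation ~* ( ...the concatenated pattern... )."""
    pattern = organisation_pattern(service_pattern)
    return re.search(pattern, organisation, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(organisation_pattern("acme corp"))                  # ^.*acme.*corp.*$
    print(matches("The ACME Widget Corp Ltd", "acme corp"))   # True
    print(matches("Corp of Acme", "acme corp"))               # False
```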
I am trying to see if there is any way to remove carriage returns and newlines from all the varchar columns in a table using one statement.
I know that we can do this for a single column using something like below
select regexp_replace(field, E'[\\n\\r]+', ' ', 'g' )
That way I would need one expression for every column, which I don't want to do unless there is no easier way.
Appreciate your help!
You can do this either by creating a PL/pgSQL function that executes dynamic SQL, or by running it directly via DO, as in the following example (replace <mytable> with the name of your table):
do $$declare _q text; _table text = '<mytable>';
begin
select 'update '||attrelid::regclass::text||E' set\n'||
string_agg(' '||quote_ident(attname)||$q$ = regexp_replace($q$||quote_ident(attname)||$q$, '[\n\r]+', ' ', 'g')$q$, E',\n' order by attnum)
into _q
from pg_attribute
-- compare type OIDs: regtype::text renders varchar as 'character varying'
where attnum > 0 and atttypid in ('text'::regtype, 'varchar'::regtype)
group by attrelid
having attrelid = _table::regclass;
raise notice E'Executing:\n\n%', _q;
-- uncomment this line when happy with the query:
-- execute _q;
end;$$;
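The statement that the DO block assembles has one assignment per text column. A minimal Python sketch of the same string building follows, assuming the column names are already known. Both build_update and quote_ident are hypothetical helpers, and unlike the regclass cast above, this sketch always double-quotes identifiers.

```python
def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_update(table, columns):
    """Build one UPDATE that strips CR/LF runs from every given column."""
    assignments = ",\n".join(
        # The raw string keeps \n and \r as backslash sequences so that
        # the regex engine, not Python, interprets them.
        f"  {quote_ident(col)} = regexp_replace({quote_ident(col)}, "
        r"'[\n\r]+', ' ', 'g')"
        for col in columns
    )
    return f"update {quote_ident(table)} set\n{assignments}"


if __name__ == "__main__":
    print(build_update("my_table", ["title", "body"]))
```

As with the commented-out execute in the DO block, printing the statement first makes it easy to review before running it.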
How can I get more than one matched keyword from a given string?
Please see the query below.
SELECT regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING');
output: "{BAKERY}"
In this scenario the string matches two keywords, but the query returns only one of them.
How can I get the other matched keywords?
The g flag makes the search global, so regexp_matches returns every match instead of only the first one:
select regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g')
regexp_matches
text[]
--------------
{BAKERY}
{BAKING}
To get the result as a single row:
SELECT ARRAY(select array_to_string(regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g'),''));
array
text[]
---------------
{BAKERY,BAKING}
Or use unnest to turn the returned arrays into a set of rows:
select unnest(regexp_matches(UPPER('bakerybaking'),'BAKERY|BAKING','g'))
unnest
text
------
BAKERY
BAKING
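The difference between a first-match search and a global one can also be sketched in Python. This is an illustration only, since Python's re module is not PostgreSQL's regex engine, and first_match and all_matches are hypothetical helper names.

```python
import re

KEYWORDS = re.compile("BAKERY|BAKING")


def first_match(text):
    """Like regexp_matches without 'g': stop at the first keyword."""
    found = KEYWORDS.search(text.upper())
    return found.group(0) if found else None


def all_matches(text):
    """Like regexp_matches with 'g': collect every keyword in order."""
    return KEYWORDS.findall(text.upper())


if __name__ == "__main__":
    print(first_match("bakerybaking"))  # BAKERY
    print(all_matches("bakerybaking"))  # ['BAKERY', 'BAKING']
```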
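According to http://www.postgresql.org/docs/9.5/static/functions-string.html, regexp_matches returns one array element per capture group, so this pattern returns both keywords in one row when they appear back to back:
SELECT regexp_matches(UPPER('bakerybaking'),'(BAKERY)(BAKING)');
Output:
regexp_matches
-----------------
{BAKERY,BAKING}
(1 row)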
Oh the humanity. Please thank me.
--https://stackoverflow.com/questions/52178844/get-second-match-from-regexp-matches-results
--https://stackoverflow.com/questions/24274394/postgresql-8-2-how-to-get-a-string-representation-of-any-array
CREATE OR REPLACE FUNCTION aaa(anyarray,Integer, text)
RETURNS SETOF text
LANGUAGE plpgsql
AS $function$
DECLARE s $1%type;
BEGIN
FOREACH s SLICE 1 IN ARRAY $1[$2:$2] LOOP
RETURN NEXT array_to_string(s,$3);
END LOOP;
RETURN;
END;
$function$;
--SELECT aaa((ARRAY(SELECT unnest(regexp_matches('=If(If(E_Reports_# >=1, DMT(E_Date_R1_#, DateShift),0)', '(\w+_#)|([0-9]+)','g'))::TEXT)),1,',')
--select (array[1,2,3,4,5,6])[2:5];
SELECT aaa(array_remove(Array(SELECT unnest(regexp_matches('=If(If(E_Reports_# >=1, DMT(E_Date_R1_#, DateShift),0)', '(\w+_#)|([0-9]+)','g'))::TEXT), Null),3,',')