According to the SWIFT documentation, the SWIFT X character set consists of the following:
X Character Set – SWIFT Character Set
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
/ - ? : ( ) . , ' + CrLf Space
I have come up with the pattern below to validate the SWIFT character set. It seems to work, but I want to know if there is a better way of doing it. Also, what should I use for CRLF to be OS-neutral? Since I use Unix, I have put chr(10).
^[a-zA-Z0-9 -?:().,''+chr(10)/]*$
Unfortunately, a range like a-z may include accented letters and collation elements, depending on the value of nls_sort at the time of running a query. And, alas, Oracle does not support the character class [[:ascii:]], which would be another way to specify what you need.
You have two choices. Either you specify the nls_sort parameter explicitly every time, before running the query (or rely on it being something like 'English' already), which to me doesn't sound like a good practice; or you specify all letters explicitly.
There are a few more things to fix. The dash - has a special meaning in a bracketed expression (it defines a range). If you want it to mean a literal dash, it should appear as either the first or the last character in the list, where it can't have its special meaning. Apart from ^ at the start of the list and the closing ], the other regexp special characters are not special in a bracketed expression, so you don't need to worry about dot, question mark, asterisk, parentheses, etc.
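For illustration only, here is the dash problem reproduced with Python's `re` module. This is not Oracle's engine, and the range behaviour shown corresponds to binary (ASCII) ordering, not to a linguistic nls_sort. In the original pattern, ` -?` is read as a range from space (0x20) to `?` (0x3F), so characters such as `$`, `*` and `#` slip through. Moving the dash to the end makes it literal. `RANGE_CLASS`, `LITERAL_CLASS` and `accepted` are names made up for this sketch.

```python
import re

# " -?" inside brackets is a RANGE from space (0x20) to "?" (0x3F),
# so it also admits "$" (0x24), "*" (0x2A), "#" (0x23) and more.
RANGE_CLASS = re.compile(r"[ -?]")

# With the dash placed last, it is a literal "-" and there is no range.
LITERAL_CLASS = re.compile(r"[ ?-]")


def accepted(pattern: re.Pattern, ch: str) -> bool:
    """Return True if the single character ch matches the bracketed class."""
    return pattern.fullmatch(ch) is not None


for ch in ["$", "*", "#", "-", "?", " "]:
    print(repr(ch), accepted(RANGE_CLASS, ch), accepted(LITERAL_CLASS, ch))
```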
However, note that while the single-quote character has no special meaning in a regular expression (in a bracketed expression or otherwise), it does have a special meaning in an Oracle string literal. To include a single quote in a hard-coded string, you must escape it by typing two single-quote characters.
Next, if you write chr(10) inside a bracketed expression, that is just the characters c, h, r, (, 1, 0 and ). If you mean LF, you need to either include an actual newline character in your string (probably a bad idea) or concatenate chr(10) into the string by hand.
And if you want to validate against the official character set of "swift x" (whatever that is), you should include all characters, regardless of your OS. So you should accept CR (chr(13)) too, unless you have a better reason to omit it. If it is present but you don't want it in your db, you should accept it and then remove it after the fact and save the resulting string (after you remove CR), not reject the entire string altogether.
To keep the work organized, I would create a very small table (or view) to store the needed validation string, then use it in all queries that need it.
Something like this:
create table swift_validation (validation_pattern varchar2(100));
insert into swift_validation (validation_pattern)
with helper(ascii_letters) as (
select 'abcdefghijklmnopqrstuvwxyz' from dual
)
select '^[' || ascii_letters -- a-z (ASCII)
|| upper(ascii_letters) -- A-Z (ASCII)
|| '0-9'
|| chr(10) || chr(13) -- LF and CR
|| '/?:().,''+ -'
|| ']*$'
from helper
;
commit;
Check what was saved in the table:
select * from swift_validation;
VALIDATION_PATTERN
------------------------------------------------------------
^[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0-9

/?:().,'+ -]*$
Note that the result is displayed on three lines: chr(10) is shown as a newline, and then chr(13) on its own is also rendered as a newline.
In any case, if you really want to see the exact characters saved in this string, you can use the dump function. With option 17 for the second argument, the output is readable (you will have to scroll, though):
select dump(validation_pattern, 17) from swift_validation;
DUMP(VALIDATION_PATTERN,17)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Typ=1 Len=73: ^,[,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,0,-,9,^J,^M,/,?,:,(,),.,,,',+, ,-,],*,$
Notice in particular the control characters, ^J and ^M; they mean chr(10) and chr(13) respectively (easy to remember: J and M are the tenth and thirteenth letters of the Latin alphabet).
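Here is a tiny Python sketch of that caret notation. The helper `caret_notation` is made up here; it only mimics what the dump output above shows and is not Oracle code. Control code n is printed as `^` followed by the character with code n + 64, which for n from 1 to 26 is the n-th capital letter.

```python
def caret_notation(ch: str) -> str:
    """Render ASCII control characters (codes 0-31) in caret form, e.g. chr(10) -> '^J'."""
    code = ord(ch)
    if code < 32:
        # Control code n is shown as '^' plus the character with code n + 64,
        # i.e. the n-th letter of the alphabet for n = 1..26.
        return "^" + chr(code + 64)
    return ch


print(",".join(caret_notation(c) for c in "0-9\n\r/"))  # prints 0,-,9,^J,^M,/
```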
Then you use this as follows:
with
test_strings (str) as (
select 'abc + (12)' from dual union all
select '$122.38' from dual union all
select null from dual union all
select 'café au lait' from dual union all
select 'A / B - C * D' from dual
)
select t.str,
case when regexp_like(t.str, sv.validation_pattern)
then 'valid' else 'invalid' end as swift_valid
from test_strings t, swift_validation sv
;
STR SWIFT_VALID
------------- -----------
abc + (12) valid
$122.38 invalid
invalid
café au lait invalid
A / B - C * D invalid
Notice one last oddity here. In my test, I included a row where the input string is empty (null). Regular expressions are odd in this respect: null is not regexp_like a pattern such as 'a*', even though * is supposed to mean "zero or more". Oracle's reasoning, perhaps, is that null may be anything; it's just one of the hundreds of ways in which Oracle's identification of null with the empty string is just plain idiotic. It is what it is, though. I assume "swift x" allows empty strings, so make sure you don't reject a row with an empty string; you will need to handle that case separately, like this:
with
test_strings (str) as (
select 'abc + (12)' from dual union all
select '$122.38' from dual union all
select null from dual union all
select 'café au lait' from dual union all
select 'A / B - C * D' from dual
)
select t.str,
case when t.str is null
or regexp_like(t.str, sv.validation_pattern)
then 'valid' else 'invalid' end as swift_valid
from test_strings t, swift_validation sv
;
STR SWIFT_VALID
------------- -----------
abc + (12) valid
$122.38 invalid
valid
café au lait invalid
A / B - C * D invalid
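Here is the same validation logic as a Python sketch, with Python's `re` standing in for regexp_like. The names `SWIFT_CLASS`, `SWIFT_PATTERN` and `swift_valid` are made up. Note one design difference: Python distinguishes `None` from `''`, and `fullmatch` accepts the empty string. So only `None` needs the explicit branch that mirrors `t.str is null`.

```python
import re
import string
from typing import Optional

# Explicit ASCII letters (no locale-dependent a-z range), digits, LF, CR,
# then the punctuation with the dash last so it is a literal "-".
SWIFT_CLASS = (
    string.ascii_lowercase
    + string.ascii_uppercase
    + "0-9"
    + "\n\r"
    + "/?:().,'+ -"
)
# fullmatch plays the role of the ^ and $ anchors in the Oracle pattern.
SWIFT_PATTERN = re.compile("[" + SWIFT_CLASS + "]*")


def swift_valid(s: Optional[str]) -> str:
    # None stands in for Oracle's null, which needs its own "is null" branch.
    if s is None or SWIFT_PATTERN.fullmatch(s):
        return "valid"
    return "invalid"


for s in ["abc + (12)", "$122.38", None, "café au lait", "A / B - C * D"]:
    print(repr(s), swift_valid(s))
```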
Left as an exercise:
You may need to find the invalid characters in an invalid string. For such generalized applications (more than a straight validation of a whole string), you might be better off saving just the bracketed expression in the swift_validation table (without the leading anchor ^, the trailing quantifier * and the anchor $). Then you need to rewrite the validation query slightly, concatenating the anchors and the quantifier around the saved fragment in the regexp_like condition; but then you can also include, for example, an additional column to show the first invalid character in an invalid string.
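One way the exercise could look, sketched in Python rather than SQL. The names are hypothetical, and Python's `re` is assumed as a stand-in for Oracle's engine. The idea is to keep only the body of the bracketed expression, then build both the validating pattern and a negated class from it. Searching for the negated class returns the first character that is not allowed.

```python
import re
import string
from typing import Optional

# Hypothetical stored fragment: only the body of the bracketed expression.
SWIFT_BODY = string.ascii_lowercase + string.ascii_uppercase + "0-9\n\r/?:().,'+ -"

# Whole-string validation: wrap the body in brackets and repeat it.
VALID = re.compile("[" + SWIFT_BODY + "]*")

# Negating the same body finds characters that are NOT allowed.
INVALID_CHAR = re.compile("[^" + SWIFT_BODY + "]")


def first_invalid_char(s: str) -> Optional[str]:
    """Return the first character of s outside the SWIFT X set, or None if all are valid."""
    m = INVALID_CHAR.search(s)
    return m.group(0) if m else None


for s in ["abc + (12)", "$122.38", "café au lait", "A / B - C * D"]:
    print(repr(s), VALID.fullmatch(s) is not None, repr(first_invalid_char(s)))
```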
EDIT
In follow-up discussion (see comments below this answer), the OP clarified that only the combination chr(13) || chr(10) (in that order) is permitted. chr(10) and chr(13) are invalid if they appear by themselves, or in the wrong order.
This makes the problem more interesting (more complicated). To allow only the letters a, b, c or the sequence xy (that is: x alone, or y alone, are not allowed; every x must appear followed immediately by y, and every y must appear immediately preceded by x), the proper matching pattern looks like
'^([abc]|xy)*$'
Here expr1|expr2 is alternation, and it needs to be enclosed in parentheses to apply the * quantifier.
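Here is a quick Python check of that alternation pattern. Python's `re` is used only to illustrate, and `allowed` is a made-up helper.

```python
import re

# Each repeated unit is either one of a, b, c, or the two-character sequence "xy".
PATTERN = re.compile(r"^([abc]|xy)*$")


def allowed(s: str) -> bool:
    return PATTERN.match(s) is not None


for s in ["abxyc", "", "abx", "yx", "ay"]:
    print(repr(s), allowed(s))
```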
An additional complication is that $ doesn't actually match "the end of the input string". It anchors either at the end of the input string or, if the input string ends in a newline (chr(10)), just before that character. Happily, there is the alternative anchor \z that doesn't suffer from that defect; it anchors truly at the end of the input string. This is needed if we want to reject input strings that end in chr(10) not immediately preceded by chr(13). (If we do want to allow those, even though technically they violate the "swift x" rules, replace \z with $ as we had it before.)
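The `$` behaviour is easy to see in Python's `re`, which treats `$` the same way when the string ends in a newline. Python writes the true end-of-string anchor as `\Z` (the query below uses `\z` in Oracle). This sketch shows only the anchor semantics, not Oracle itself.

```python
import re

s = "abc\n"  # ends in a bare LF

# "$" also matches just before a final newline, so this succeeds.
DOLLAR_MATCHES = re.match(r"^[a-z]*$", s) is not None

# \Z anchors only at the true end of the string, so the trailing LF makes this fail.
END_MATCHES = re.match(r"^[a-z]*\Z", s) is not None

print(DOLLAR_MATCHES, END_MATCHES)  # prints True False
```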
Here I demonstrate a slightly modified approach. The small table that stores the validation rule now contains only the alternation (either one character from an enumeration, or the two-character sequence chr(13) || chr(10)), letting the "caller" wrap it in whatever is needed for a complete matching pattern.
The small table (note that I changed the column name):
drop table swift_validation purge;
create table swift_validation (valid_patterns varchar2(100));
insert into swift_validation (valid_patterns)
with helper(ascii_letters) as (
select 'abcdefghijklmnopqrstuvwxyz' from dual
)
select '[' -- open bracketed expression
|| ascii_letters -- a-z (ASCII)
|| upper(ascii_letters) -- A-Z (ASCII)
|| '0-9'
|| '/?:().,''+ -' -- '' escape for ', - last
|| ']' -- close bracketed expression
|| '|' -- alternation
|| chr(13) || chr(10) -- CR LF
from helper
;
commit;
Testing (notice the modified match pattern: now the ^ and \z anchors, the parentheses and the * quantifier are hard-coded in the query, not in the saved string):
with
test_strings (id, str) as (
select 1, 'abc + (12)' from dual union all
select 2, '$122.38' from dual union all
select 3, null from dual union all
select 4, 'no_underline' from dual union all
select 5, 'A / B - C * D' from dual union all
select 6, 'abc' || chr(10) || chr(13) from dual union all
select 7, 'abc' || chr(10) from dual union all
select 8, 'abc' || chr(13) || chr(10) from dual union all
select 9, 'café au lait' from dual
)
select t.id, t.str,
case when t.str is null
or regexp_like(t.str, '^(' || sv.valid_patterns || ')*\z')
then 'valid' else 'invalid' end as swift_valid
from test_strings t, swift_validation sv
;
ID STR SWIFT_VALID
-- ------------- -----------
1 abc + (12) valid
2 $122.38 invalid
3 valid
4 no_underline invalid
5 A / B - C * D invalid
6 abc invalid
7 abc invalid
8 abc valid
9 café au lait invalid
The newline characters (CR and LF) aren't clearly visible in the output; I added an id column so you can reference the output according to the input in the with clause.
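For comparison, here is the same stored-alternation approach sketched in Python. The names are hypothetical; Python's `\Z` stands in for `\z`, and `None` stands in for Oracle's null.

```python
import re
import string
from typing import Optional

# The "stored" alternation: one allowed character, or the CR LF pair.
VALID_PATTERNS = (
    "["
    + string.ascii_lowercase
    + string.ascii_uppercase
    + "0-9"
    + "/?:().,'+ -"  # dash last, so it is literal
    + "]"
    + "|"
    + "\r\n"  # CR followed by LF, in that order only
)

# The caller adds the anchors, the group and the quantifier.
FULL_PATTERN = re.compile("^(" + VALID_PATTERNS + ")*\\Z")


def swift_valid(s: Optional[str]) -> str:
    if s is None or FULL_PATTERN.match(s):
        return "valid"
    return "invalid"


tests = ["abc + (12)", "$122.38", None, "no_underline", "A / B - C * D",
         "abc\n\r", "abc\n", "abc\r\n", "café au lait"]
for i, s in enumerate(tests, start=1):
    print(i, repr(s), swift_valid(s))
```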
I'm looking to use REGEXP_REPLACE (or equivalent functions) to format my strings to be fixed length outputs.
For example, my input is always three random strings, delimited by commas.
I want the output to have a fixed length of 10 characters (filled by '_') for each capturing group.
input:
abc,def,ghi
085,10,1234567
long words,tom,jerry
Desired output:
_______abc_______def_______ghi
_______085________10___1234567
long words_______tom_____jerry
So the code would be something like:
select regexp_replace( input, '([^,]+),([^,]+),([^,]+)',
LPAD('\1', '_', 10) || LPAD('\2', '_', 10) || LPAD('\3', '_', 10) )
from <table>
That apparently didn't work out as expected.
Any ideas?
With multiple replacements and concatenation you can do this:
select
lpad(regexp_replace('085,10,1234567', ',.*$', ''), 10, '_') || -- remove the first comma and everything after it
lpad(regexp_replace('085,10,1234567', '^[^,]+,|,[^,]+$', ''), 10, '_') || -- remove the first comma and everything before it, and the last comma and everything after it
lpad(regexp_replace('085,10,1234567', '^.*,', ''), 10, '_') as whatever -- remove the last comma and everything before it
from
dual
Just replace '085,10,1234567' with your input.
This example produces:
_______085________10___1234567
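If the formatting can happen outside the database, the same idea (split on commas, then left-pad each field) is a one-liner in Python. This is a sketch, not a replacement for the SQL. `fixed_width` is a made-up name, and unlike Oracle's LPAD, `str.rjust` does not truncate fields longer than the width.

```python
def fixed_width(line: str, width: int = 10, fill: str = "_") -> str:
    """Left-pad each comma-separated field to `width` characters with `fill`."""
    # str.rjust pads on the left; it never truncates a field longer than width.
    return "".join(field.rjust(width, fill) for field in line.split(","))


for line in ["abc,def,ghi", "085,10,1234567", "long words,tom,jerry"]:
    print(fixed_width(line))
```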
In PostgreSQL I would like to substitute only whole words, not substrings. I noticed that replace and translate also replace substrings, so I used regexp_replace as follows:
SELECT REGEXP_REPLACE (UPPER('BIG CATDOG'), '(^|[^a-z0-9])' || UPPER('CAT') || '($|[^a-z0-9])', '\1' || UPPER('GATO') || '\2','g')
In the sample above, CAT should not be replaced, because it is not a whole word but a substring of a word. How can I avoid the replacement? The output should be BIG CATDOG, because no substitution should happen.
Thanks
The replacement happens because the character class [^a-z0-9] is case-sensitive. After UPPER() the text is all uppercase, so the D that follows CAT is not in a-z and therefore matches [^a-z0-9]. You can resolve this by either adding A-Z to your character class:
SELECT REGEXP_REPLACE (UPPER('BIG CATDOG'), '(^|[^a-zA-Z0-9])' || UPPER('CAT') || '($|[^a-zA-Z0-9])', '\1' || UPPER('GATO') || '\2','g')
Or by adding the i flag to the replace call:
SELECT REGEXP_REPLACE (UPPER('BIG CATDOG'), '(^|[^a-z0-9])' || UPPER('CAT') || '($|[^a-z0-9])', '\1' || UPPER('GATO') || '\2','gi')
In either case you will get the desired BIG CATDOG output.
However a better solution is to use the word boundary constraints \m (beginning of word) and \M (end of word):
SELECT REGEXP_REPLACE (UPPER('BIG CATDOG'), '\m' || UPPER('CAT') || '\M', UPPER('GATO'),'g')
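Here is the same whole-word idea in Python, as a sketch. Python has no `\m`/`\M`, so it uses `\b` (word boundary) on both sides. `replace_word` is a made-up helper. `re.escape` protects a search word that contains regex metacharacters, and passing a function as the replacement keeps any backslashes in the replacement literal.

```python
import re


def replace_word(text: str, word: str, replacement: str) -> str:
    """Replace `word` only where it stands as a whole word."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda _m: replacement, text)


print(replace_word("BIG CATDOG", "CAT", "GATO"))
print(replace_word("BIG CAT DOG", "CAT", "GATO"))
```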
I have tried to adapt the answers to the question "Splitting string into multiple rows in Oracle" to my needs. However, I'm not very confident with regex and have not been able to solve it by searching.
The answers there use regexp_substr and similar functions with the pattern [^,]+, so the string is split on a single comma. I need to split on a multi-character delimiter (e.g. #;), but a bracketed pattern like that matches any single listed character, so wherever a # or ; appears elsewhere in the text, it also causes a split.
I've worked out that the pattern (#;+) matches every #; group, but I cannot work out how to invert it, as is done above, to split the row into multiple rows.
I'm sure I'm just missing something simple so any help would be greatly appreciated!
I think you should use:
[^#;+]+
instead of
(#;+)
It matches a run of characters that are not #, ; or +, so each of those characters acts as a delimiter and you can split accordingly.
You can change it according to your requirements, but in the regex I shared, I am treating #, ; and + as delimiters.
So, in the end, the query would look something like this:
with tbl(str) as (
select ' My, Delimiter# Hello My; Delimiter World My Delimiter My Delimiter test My Delimiter ' from dual
)
SELECT LEVEL AS element,
REGEXP_SUBSTR( str ,'([^#;+]+)', 1, LEVEL, NULL, 1 ) AS element_value
FROM tbl
CONNECT BY LEVEL <= regexp_count(str, '[#;+]') + 1;
Output:
ELEMENT ELEMENT_VALUE
1 My, Delimiter
2 Hello My
3 Delimiter World My Delimiter My Delimiter test My Deli
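Note that a bracketed expression like `[#;+]` splits on each listed character individually, which is what the question wanted to avoid. A small Python sketch makes the difference visible by comparing it with splitting on the literal two-character sequence `#;`. It uses Python's `re.split` and `str.split`, not Oracle.

```python
import re

text = "a#b;c#;d"

# A bracketed class treats every listed character as its own delimiter.
by_class = re.split(r"[#;+]", text)

# Splitting on the literal two-character sequence only breaks at "#;".
by_sequence = text.split("#;")

print(by_class)     # prints ['a', 'b', 'c', '', 'd']
print(by_sequence)  # prints ['a#b;c', 'd']
```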
-- EDIT --
If you want to split on runs of # and ; of any length, but not on a single occurrence, I found the regex below, but it relies on lookaheads, which Oracle does not support.
(?:(?:(?![;#]+).#(?![;#]+).|(?![;#]+).;(?![;#]+).|(?![;#]+).)*)+
So I found no easy way apart from the query below, which will not split on a single # or ; as long as there is only one such character between two delimiters:
with tbl(str) as (
select ' My, Delimiter;# Hello My Delimiter ;;# World My Delimiter ; My Delimiter test#; My Delimiter ' from dual
)
SELECT LEVEL AS element,
REGEXP_SUBSTR( str ,'([^#;]+#?[^#;]+;?[^#;]+)', 1, LEVEL, NULL, 1 ) AS element_value
FROM tbl
CONNECT BY LEVEL <= regexp_count(str, '[#;]{2,}') + 1;
Output:
ELEMENT ELEMENT_VALUE
1 My, Delimiter
2 Hello My Delimiter
3 World My Delimiter ; My Delimiter test
4 My Delimiter
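The splitting rule that the query above approximates (split only on runs of two or more `#`/`;` characters) can be written directly with Python's `re.split`. It is shown here as a sketch for checking the expected output, not as Oracle code.

```python
import re

s = " My, Delimiter;# Hello My Delimiter ;;# World My Delimiter ; My Delimiter test#; My Delimiter "

# [#;]{2,} only matches runs of two or more # / ; characters,
# so the lone ";" in the third element does not cause a split.
elements = [part.strip() for part in re.split(r"[#;]{2,}", s)]

for i, element in enumerate(elements, start=1):
    print(i, element)
```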
I have input strings like these:
1.2.3.4_abc_4.2.1.44_1.3.4.23
100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4
100.11.11.22_xyz_123_10.2.1.2_1.2.3.4
I need to replace the first string between two IP addresses, which are separated by _. However, in some strings the _ is also part of the string to be replaced (xyz_123).
I need to find abc, xyz-abd and xyz_123 in the strings above, so that I can replace them with the value of another column in that table.
_.*?_(?=\d+\.)
matches _abc_, _xyz-abd_ and _xyz_123_ in your examples. Is this working for you?
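DO $$
DECLARE
subject varchar(255) := '100.11.11.22_xyz_123_10.2.1.2_1.2.3.4';
result varchar(255);
BEGIN
-- replace the first "_..._" segment that is followed by the next IP address
result := regexp_replace(subject, '_.*?_(?=\d+\.)', '_foo_');
RAISE NOTICE '%', result;
END
$$;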
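Here is the same lazy match plus lookahead, sketched with Python's `re`, which supports both constructs. `replace_between_ips` is a made-up name, and `count=1` limits the substitution to the first match.

```python
import re

# Lazy .*? plus the lookahead: stop at the first "_" that is followed by
# digits and a dot, i.e. the start of the next IP address.
PATTERN = re.compile(r"_.*?_(?=\d+\.)")


def replace_between_ips(s: str, replacement: str) -> str:
    return PATTERN.sub(lambda _m: "_" + replacement + "_", s, count=1)


for s in ["1.2.3.4_abc_4.2.1.44_1.3.4.23",
          "100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4",
          "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4"]:
    print(replace_between_ips(s, "foo"))
```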
Probably this is enough:
_[^.]+_
and replace with
_Replacement_
[^.]+ uses a negated character class to match a sequence of at least one (the + quantifier) character other than ".".
I am also matching a leading and a trailing "_", so you have to put it in again in the replacement string.
If your PostgreSQL version supports lookbehind and lookahead assertions (lookbehind constraints were added in PostgreSQL 9.6), then it is possible to avoid the "_" in the replacement string:
(?<=_)[^.]+(?=_)
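Here are both variants side by side in Python, as a sketch with made-up names. Python's lookbehind must be fixed-width, which `(?<=_)` is.

```python
import re

# Variant 1: the underscores are part of the match, so they must be re-added.
WITH_UNDERSCORES = re.compile(r"_[^.]+_")

# Variant 2: lookbehind/lookahead keep the underscores out of the match.
WITHOUT_UNDERSCORES = re.compile(r"(?<=_)[^.]+(?=_)")


def replace_v1(s: str, replacement: str) -> str:
    return WITH_UNDERSCORES.sub(lambda _m: "_" + replacement + "_", s, count=1)


def replace_v2(s: str, replacement: str) -> str:
    return WITHOUT_UNDERSCORES.sub(lambda _m: replacement, s, count=1)


s = "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4"
print(replace_v1(s, "Replacement"))
print(replace_v2(s, "Replacement"))
```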
To match the part between the first two "_", the regex that @stema and @Tim Pietzcker suggested works. Then, to put the "_" back around the new value (which is what I was struggling with), use the || operator, as in the example below:
update table1 set column1 = regexp_replace(column1, '_.*?_(?=\d+\.)', '_' || column2 || '_')
Then, to take the new value from another table in the update, the example below may be helpful:
update table1 as t set column1 = regexp_replace(t.column1, '_.*?_(?=\d+\.)', '_' || t2.column2 || '_') from table2 as t2 where t.id = t2.id [other criteria]
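Here is a Python sketch of the per-row version. The `rows` list is hypothetical data standing in for table1 joined to table2. The replacement is passed as a function so that any backslash in the column2 value is inserted literally instead of being read as a group reference.

```python
import re

PATTERN = re.compile(r"_.*?_(?=\d+\.)")

# Hypothetical (column1, column2) pairs, as if table1 were joined to table2.
rows = [
    ("1.2.3.4_abc_4.2.1.44_1.3.4.23", "new1"),
    ("100.11.11.22_xyz_123_10.2.1.2_1.2.3.4", "new2"),
    ("1.2.3.4_abc_4.2.1.44_1.3.4.23", "a\\1b"),  # value containing a backslash
]


def apply_update(column1: str, column2: str) -> str:
    # Wrap the other column's value in "_" and replace only the first match.
    return PATTERN.sub(lambda _m: "_" + column2 + "_", column1, count=1)


updated = [apply_update(c1, c2) for c1, c2 in rows]
for value in updated:
    print(value)
```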