Delphi - How can I extract the digits from a character string? - regex

I am developing a program that validates a CPF, a type of document in my country. I have already done all the math. But in the input Edit1, the user will type something like:
123.456.789-00
I need to get only the digits, without the hyphen and the dots, for my calculations to work.
I'm a newbie with Delphi, but I think this is simple. How can I do that? Thanks to all.

You can use TRegEx from the System.RegularExpressions unit:
text := '123.456.789-00';
text := TRegEx.Replace(text, '\D', '');
Here, \D matches any non-digit character, and each match is replaced with an empty string.
The result is 12345678900 (see regex demo).
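The same replace-everything-that-is-not-a-digit idea can be tried quickly outside Delphi. Below is a minimal sketch in Python; the function name digits_only is hypothetical. It uses [^0-9] instead of \D because Python's \D is Unicode-aware for str patterns and would keep non-ASCII digits.

```python
import re

def digits_only(text: str) -> str:
    """Remove every character that is not an ASCII digit 0-9."""
    return re.sub(r"[^0-9]", "", text)

print(digits_only("123.456.789-00"))  # 12345678900
```

The result is still a string, so leading zeros in the document number are preserved.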

Using David's suggestion, iterate your input string and remove characters that aren't numbers.
{$APPTYPE CONSOLE}
function GetNumbers(const Value: string): string;
var
  ch: char;
  Index, Count: integer;
begin
  SetLength(Result, Length(Value));
  Count := 0;
  for Index := 1 to length(Value) do
  begin
    ch := Value[Index];
    if (ch >= '0') and (ch <= '9') then
    begin
      inc(Count);
      Result[Count] := ch;
    end;
  end;
  SetLength(Result, Count);
end;
begin
  Writeln(GetNumbers('123.456.789-00'));
  Readln;
end.

Related

PL/SQL. Parse clob UTF8 chars with regexp_like regular expressions

I want to check whether any line of my CLOB contains strange characters like (ñ§). The lines are read from a CSV file with an unexpected encoding (UTF-8), which garbles some characters.
I tried to filter each line with a regular expression, but it does not work as intended. Is there a way to detect the encoding of a CSV file when it is read?
How can I fix the regular expression so that it accepts only lines made of these characters: a-zA-Z 0-9 .,;:"'()-_& space tab?
Example CLOB read from the CSV:
l_clob clob :='
"exp","objc","objc","OBR","031110-5","S","EXAMPLE","NAME","08/03/2018",,"122","3","12,45"
"xp","objc","obj","OBR","031300-5","S","EXAMPLE","NAME","08/03/2018",,"0","0","0"
';
Another clob:
DECLARE
l_clob CLOB
:= '"exp","objc","objc","OBR","031110-5","S","EXAMPLE","NAME","08/03/2018",,"122","3","12,45"
"xp","objc","obj","OBR","031300-5","S","EXAMPLE","NAME","08/03/2018",,"0","0","0"';
l_offset PLS_INTEGER := 1;
l_line VARCHAR2 (32767);
csvregexp CONSTANT VARCHAR2 (1000)
:= '^([''"]+[-&\s(a-z0-9)]*[''"]+[,:;\t\s]?)?[''"]+[-&\s(a-z0-9)]*[''"]+' ;
l_total_length PLS_INTEGER := LENGTH (l_clob);
l_line_length PLS_INTEGER;
BEGIN
WHILE l_offset <= l_total_length
LOOP
l_line_length := INSTR (l_clob, CHR (10), l_offset) - l_offset;
IF l_line_length < 0
THEN
l_line_length := l_total_length + 1 - l_offset;
END IF;
l_line := SUBSTR (l_clob, l_offset, l_line_length);
IF REGEXP_LIKE (l_line, csvregexp, 'i')
THEN -- i (case insensitive matches)
DBMS_OUTPUT.put_line ('Ok');
DBMS_OUTPUT.put_line (l_line);
ELSE
DBMS_OUTPUT.put_line ('Error');
DBMS_OUTPUT.put_line (l_line);
END IF;
l_offset := l_offset + l_line_length + 1;
END LOOP;
END;
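If you want to allow only those specific characters, you can use this regex:
Your Regex
csvregexp CONSTANT VARCHAR2 (1000) := '^[a-zA-Z0-9 .,;:"''()_&' || CHR (9) || '-]+$' ;
The hyphen is placed last in the bracket expression on purpose: written as )-_ it would define a range from ) to _, which also admits characters such as * / < @ and [. CHR (9) adds the tab. The sample rows contain / in their dates, so add / to the set if those rows should pass.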
Regex-Details
^ Start of your string - no chars before this - prevents partial match
[] a set of allowed chars
[]+ a set of allowed chars. Has to be one char minimum up to inf. (* instead of + would mean 0-inf.)
[a-zA-Z]+ 1 to inf. letters
[a-zA-Z0-9]+ 1 to inf. letters and numbers
$ end of your string - no chars behind this - prevents partial match
I think you can work it out with this ;-)
If you know the input might be in another encoding, you could convert it and check it against the regex again.
Example-convert
select convert('täst','us7ascii', 'utf8') from dual;
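A hyphen listed between two other characters inside a bracket expression is read as a range, which is easy to overlook in a whitelist like the one above. The following Python sketch (the names RANGE_BUG, ALLOWED and bad_lines are hypothetical, and Python's re module stands in for Oracle's regex engine) shows the difference and flags lines containing characters outside the whitelist.

```python
import re

# Hyphen between ')' and '_' forms a range that also admits '*', '/', '<', '@', '[' ...
RANGE_BUG = re.compile(r'''^[a-zA-Z 0-9 .,;:"'()-_&]+$''')

# Hyphen moved to the end of the class is a literal hyphen; \t adds the tab.
ALLOWED = re.compile(r'''^[a-zA-Z0-9 .,;:"'()_&\t-]+$''')

def bad_lines(text: str) -> list:
    """Return the non-empty lines that contain a character outside the whitelist."""
    return [line for line in text.splitlines()
            if line and not ALLOWED.match(line)]

sample = '"ok","12,45"\n"ni\u00f1o"\n'
print(bad_lines(sample))
```

Only the second sample line is reported, because ñ is not in the whitelist. A date such as 08/03/2018 would also be reported unless / is added to the class.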

Checking for a pure string using regexp_like

I need to check a "substring of the first 6 characters" of an input string for a pure string.
declare
p_str varchar2(30) := 'ABCD1240';
l_result varchar2(20);
begin
if REGEXP_LIKE(substr(p_str,1,6), '[[:alpha:]]') then
dbms_output.put_line('It is a pure string');
else
dbms_output.put_line('It is an alphanumeric');
end if;
end;
/
I can see that the first 6 characters of the string ABCD1240 are alphanumeric, since they contain 12.
But the output that is printed says otherwise.
Am I doing something wrong with "alpha" in regexp_like?
I thought alpha was supposed to match only letters, not numbers.
Here, ABCD1240 should give me "alphanumeric" as output.
ABCDXY90 should give "pure string".
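Your pattern '[[:alpha:]]' is not anchored, so it matches as soon as any single letter occurs in the substring. Anchor the pattern and require six non-digit characters (use '^[[:alpha:]]{6}' instead if punctuation should not count as alpha). Try this: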
declare
l_res varchar2(100);
begin
for i in (select 'abcdef123' val from dual union
select '123abc123' from dual union
select '123456abc' from dual)
loop
if REGEXP_LIKE(i.val, '^\D{6}')
then
l_res := 'alpha';
else
l_res := 'numeric';
end if;
dbms_output.put_line(i.val || ' is ' || l_res);
end loop;
end;
123456abc is numeric
123abc123 is numeric
abcdef123 is alpha
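The difference between "some letter occurs" and "all six characters are letters" can be sketched in Python as follows. Both function names are hypothetical, and [A-Za-z] is used as an ASCII-only stand-in for [[:alpha:]].

```python
import re

def unanchored_check(s: str) -> bool:
    """Mimics the original test: true if ANY letter occurs in the first six characters."""
    return re.search(r"[A-Za-z]", s[:6]) is not None

def first_six_all_letters(s: str) -> bool:
    """True only if s starts with six consecutive ASCII letters."""
    return re.match(r"[A-Za-z]{6}", s) is not None

for value in ("ABCD1240", "ABCDXY90"):
    print(value, unanchored_check(value), first_six_all_letters(value))
```

The unanchored check reports both inputs as matching, while the anchored, counted pattern accepts only ABCDXY90.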

Largest "separation" of patterns for Delphi regex?

Update
As Graymatter observed, the regex fails to match when there are at least 2 extra line breaks before the second target. That is, changing the concatenation loop to "for I := 0 to 1" makes the match fail.
As the code below shows, without the concatenation the program extracts both values with the regex; with the concatenation it cannot.
Could you explain the reason and suggest a workaround?
program Project1;
{$APPTYPE CONSOLE}
uses
// www.regular-expressions.info/delphi.html
// http://www.regular-expressions.info/download/TPerlRegEx.zip
PerlRegEx,
SysUtils;
procedure Test;
var
Content: UTF8String;
Regex: TPerlRegEx;
GroupIndex: Integer;
I: Integer;
begin
Regex := TPerlRegEx.Create;
Regex.Regex := 'Value1 =\s*(?P<Value1>\d+)\s*.*\s*Value2 =\s*(?P<Value2>\d*\.\d*)';
Content := '';
for I := 0 to 10000000 do
begin
// Uncomment here to see effect
// Content := Content + 'junkjunkjunkjunkjunk' + sLineBreak;
end;
Regex.Subject := 'junkjunkjunkjunkjunk' +
sLineBreak + ' Value1 = 1' +
sLineBreak + 'junkjunkjunkjunkjunk' + Content +
sLineBreak + ' Value2 = 1.23456789' +
sLineBreak + 'junkjunkjunkjunkjunk';
if Regex.Match then
begin
GroupIndex := Regex.NamedGroup('Value1');
Writeln(Regex.Groups[GroupIndex]);
GroupIndex := Regex.NamedGroup('Value2');
Writeln(Regex.Groups[GroupIndex]);
end
else
begin
Writeln('No match');
end;
Regex.Free;
end;
begin
Test;
Readln;
end.
Adding this line before the call to Match makes it work:
Regex.Options := [preSingleLine];
From the documentation:
preSingleLine
Normally, dot (.) matches anything but a newline (\n). With preSingleLine, dot (.) will match anything, including newlines. This allows a multiline string to be regarded as a single entity. Equivalent to Perl's /s modifier. Note that preMultiLine and preSingleLine can be used together.
When there is only one line break before the second target, the regex matches even without preSingleLine, because \s can match a line break.
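The behaviour can be reproduced with the same pattern in Python, where re.DOTALL plays the role of preSingleLine. This is a sketch only: build_subject and find_values are hypothetical helpers, short junk lines are used, and \n stands in for sLineBreak.

```python
import re

PATTERN = r"Value1 =\s*(?P<Value1>\d+)\s*.*\s*Value2 =\s*(?P<Value2>\d*\.\d*)"

def build_subject(extra_lines: int) -> str:
    """Insert extra_lines junk lines between the two targets."""
    content = "junkjunk\n" * extra_lines
    return ("junkjunk\n Value1 = 1\njunkjunk" + content +
            "\n Value2 = 1.23456789\njunkjunk")

def find_values(subject: str, dotall: bool):
    """Return (Value1, Value2) or None; dotall lets '.' match newlines."""
    flags = re.DOTALL if dotall else 0
    m = re.search(PATTERN, subject, flags)
    return (m.group("Value1"), m.group("Value2")) if m else None

for n in (0, 1, 2):
    print(n, find_values(build_subject(n), dotall=False),
          find_values(build_subject(n), dotall=True))
```

Without the flag, the match succeeds for 0 and 1 extra lines and fails for 2: the single .* can span at most one line, and the \s* on either side can only absorb the line breaks next to it.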

How to skip quoted text in regex (or How to use HyperStr ParseWord with Unicode text ?)

I need regex help to create a Delphi function that replaces the HyperString ParseWord function in RAD Studio XE2. HyperString was a very useful string library that never made the jump to Unicode. I've got it mostly working, but it doesn't honor quote delimiters at all. I need it to match the function described below exactly:
function ParseWord(const Source,Table:String;var Index:Integer):String;
Sequential, left to right token parsing using a table of single
character delimiters. Delimiters within quoted strings are ignored.
Quote delimiters are not allowed in Table.
Index is a pointer (initialize to '1' for first word) updated by the
function to point to next word. To retrieve the next word, simply
call the function again using the prior returned Index value.
Note: If Length(Resultant) = 0, no additional words are available.
Delimiters within quoted strings are ignored. (my emphasis)
This is what I have so far:
function ParseWord( const Source, Table: String; var Index: Integer):string;
var
RE : TRegEx;
match : TMatch;
Table2,
chars : string;
begin
if index = length(Source) then
begin
result:= '';
exit;
end;
// escape the special characters and wrap them in a character class
Table2 :='['+TRegEx.Escape(Table, false)+']';
RE := TRegEx.create(Table2);
match := RE.Match(Source,Index);
if match.success then
begin
result := copy( Source, Index, match.Index - Index);
Index := match.Index+match.Length;
end
else
begin
result := copy(Source, Index, length(Source)-Index+1);
Index := length(Source);
end;
while ( Length(result)= 0) and (Index<length(Source)) do
begin
Inc(Index);
result := ParseWord(Source,Table, Index);
end;
end;
cheers and thanks.
I would try this regex for Table2:
Table2 := '''[^'']+''|"[^"]+"|[^' + TRegEx.Escape(Table, false) + ']+';
Demo:
This demo is more a POC since I was unable to find an online delphi regex tester.
The delimiters are the space (ASCII code 32) and pipe (ASCII code 124) characters.
The test sentence is:
toto titi "alloa toutou" 'dfg erre' 1245|coucou "nestor|delphi" "" ''
http://regexr.com?32i81
Discussion:
I assume that a quoted string is a string enclosed by either two single quotes (') or two double quotes ("). Correct me if I am wrong.
The regex will match either:
a single quoted string
a double quoted string
a run of characters that contains none of the passed delimiters
Known bug:
Since I don't know how ParseWord handles quote escaping inside a string, the regex doesn't support it.
For instance:
How should 'foo''bar' be interpreted? As two tokens, 'foo' and 'bar', or as the single token 'foo''bar'?
And what about "foo""bar"? Two tokens, "foo" and "bar", or the single token "foo""bar"?
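Since no online Delphi regex tester was available, here is a small Python sketch of the same alternation, run against the test sentence above. The function name tokenize is hypothetical, and re.escape plays the role of TRegEx.Escape. It shares the limitation described above: doubled quotes inside a quoted string are not treated as escapes.

```python
import re

def tokenize(source: str, delimiters: str) -> list:
    """Split on any delimiter character, keeping quoted strings as single tokens."""
    delim_class = re.escape(delimiters)
    pattern = (r"'[^']+'"                   # a single-quoted string
               + "|" + r'"[^"]+"'           # a double-quoted string
               + "|[^" + delim_class + "]+")  # a run of non-delimiter characters
    return re.findall(pattern, source)

sentence = "toto titi \"alloa toutou\" 'dfg erre' 1245|coucou \"nestor|delphi\" \"\" ''"
for token in tokenize(sentence, " |"):
    print(token)
```

The delimiters inside "alloa toutou", 'dfg erre' and "nestor|delphi" are ignored, and the quote characters stay part of each token. The empty pairs "" and '' fall through to the third alternative because the quoted alternatives require at least one character between the quotes.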
In my original code I was looking for the delimiter and taking everything up to it as my next match, but that concept didn't carry over when looking for something within quotes. @Stephan's suggestion of negating the search eventually led me to something that works. An additional complication that I never mentioned earlier is that HyperStr can use anything as a quoting character. The default is double quote but you can change it with a function call.
In my solution I've explicitly hardcoded the QuoteChar as double quote, which suits my own purposes, but it would be trivial to make QuoteChar a global and set it within another function. I've also successfully tested it with single quote (ascii 39), which would be the tricky one in Delphi.
function ParseWord( const Source, Table: String; var Index: Integer):string;
var
RE : TRegEx;
match : TMatch;
Table2: string;
Source2 : string;
QuoteChar : string;
begin
if index = length(Source) then
begin
result:= '';
exit;
end;
// #39 is the single quote tested here; assign '"' to use the double quote instead
QuoteChar := #39;
// escape the special characters and build a negated character class
Table2 :='[^'+TRegEx.Escape(Table, false)+QuoteChar+']*|'+QuoteChar+'.*?'+QuoteChar ;
Source2 := copy(Source, Index, length(Source)-index+1);
match := TRegEx.Match(Source2,Table2);
if match.success then
begin
result := copy( Source2, match.index, match.length);
Index := Index + match.Index + match.Length-1;
end
else
begin
result := copy(Source, Index, length(Source)-Index+1);
Index := length(Source);
end;
while ( Length(result)= 0) and (Index<length(Source)) do
begin
Inc(Index);
result := ParseWord(Source,Table, Index);
end;
end;
This solution doesn't strip the quote chars from around quoted strings, but I can't tell from my own existing code if it should or not, and I can't test using Hyperstr. Maybe someone else knows?

PL/SQL key-value String using Regex

I have a String stored in a table in the following key-value format: "Key1☺Value1☺Key2☺Value2☺KeyN☺ValueN☺".
Given a Key how can I extract the Value? Is regex the easiest way to handle this? I am new to PL/SQL as well as Regex.
In this case, I would use just a regular split and iterate through the resulting array.
public string GetValue(string keyValuePairedInput, string key, char separator)
{
var split = keyValuePairedInput.Split(separator);
if(split.Length % 2 == 1)
throw new KeyWithoutValueException();
for(int i = 0; i < split.Length; i += 2)
{
if(split[i] == key)
return split[i + 1];
}
throw new KeyNotFoundException();
}
(this was not compiled and is not pl/sql anyway, treat it as pseudocode ☺)
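For reference, the split-and-iterate idea can be sketched in runnable form in Python. The function name get_value is hypothetical. One detail is handled here on the assumption that the stored string always ends with the separator, as in the question: the split then yields one empty trailing element, which is dropped before the pairs are checked.

```python
def get_value(pairs: str, key: str, separator: str = "☺") -> str:
    """Look up key in a flat 'Key1<sep>Value1<sep>Key2<sep>Value2<sep>' string."""
    parts = pairs.split(separator)
    if parts and parts[-1] == "":
        parts.pop()  # a trailing separator yields one empty final element
    if len(parts) % 2 == 1:
        raise ValueError("key without value")
    # Keys sit at even positions, values at the following odd positions.
    for i in range(0, len(parts), 2):
        if parts[i] == key:
            return parts[i + 1]
    raise KeyError(key)

print(get_value("Key1☺Value1☺Key2☺Value2☺KeyN☺ValueN☺", "Key2"))  # Value2
```

Stepping through the list two items at a time means a value that happens to equal a key name is never mistaken for a key.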
OK I hear your comment...
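Making use of Oracle's built-in string functions, you might be able to use something like this. Column aliases cannot be referenced within the same select list, so the index is computed in an inline view; 'Key2', '☺' and your_table are placeholders for the key, the actual separator, and the table:
select substr(kv, value_index, instr(kv, sep, value_index) - value_index) as val
from (select keyValueStringField as kv,
'☺' as sep,
instr(keyValueStringField, 'Key2' || '☺') + length('Key2') + 1 as value_index
from your_table)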
For this kind of string slicing and dicing in PL/SQL you will probably have to use regular expressions. Oracle has a number of regular expression functions you can use. The most commonly used one is REGEXP_LIKE which is very similar to the LIKE operator but does RegEx matching.
However you probably need to use REGEXP_INSTR to find the positions where the separators are then use the SUBSTR function to slice up the string at the matched positions. You could also consider using REGEXP_SUBSTR which does the RegEx matching and slicing in one step.
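To illustrate what "matching and slicing in one step" could look like, here is a hedged sketch in Python. It uses Python's re module, not Oracle's REGEXP_SUBSTR, so the pattern syntax may need adapting, and the function name regex_value is hypothetical. The pattern lazily skips whole key/value pairs from the start of the string, so the key is only ever compared at key positions.

```python
import re

def regex_value(pairs: str, key: str, sep: str = "☺"):
    """Return the value stored for key, or None if key is not present."""
    s = re.escape(sep)
    field = "[^" + s + "]*"                  # any run of non-separator characters
    pattern = ("(?:" + field + s + field + s + ")*?"  # skip complete pairs, lazily
               + re.escape(key) + s                   # the key we are looking for
               + "(" + field + ")" + s)               # capture its value
    m = re.match(pattern, pairs)
    return m.group(1) if m else None

data = "Key1☺Value1☺Key2☺Value2☺KeyN☺ValueN☺"
print(regex_value(data, "Key2"))    # Value2
print(regex_value(data, "Value1"))  # None
```

Looking up "Value1" returns None because that text only occurs in a value position, which the pair-skipping prefix never offers to the key part of the pattern.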
As an alternative to regular expressions...
Assuming you have an input such as this:
Key1,Value1|Key2,Value2|Key3,Value3
You could use some PL/SQL as shown below:
FUNCTION get_value_by_key
(
p_str VARCHAR2
, p_key VARCHAR2
, p_kvp_separator VARCHAR2
, p_kv_separator VARCHAR2
) RETURN VARCHAR2
AS
v_key VARCHAR2(32767);
v_value VARCHAR2(32767);
v_which NUMBER;
v_cur VARCHAR2(1 CHAR);
BEGIN
v_which := 0;
FOR i IN 1..length(p_str)
LOOP
v_cur := substr(p_str,i,1);
IF v_cur = p_kvp_separator
THEN
IF v_key = p_key
THEN
EXIT;
END IF;
v_key := '';
v_value := '';
v_which := 0;
ELSIF v_cur = p_kv_separator
THEN
v_which := 1;
ELSE
IF v_which = 0
THEN
v_key := v_key || v_cur;
ELSE
v_value := v_value || v_cur;
END IF;
END IF;
END LOOP;
IF v_key = p_key
THEN
RETURN v_value;
END IF;
raise_application_error(-20001, 'key not found!');
END;
To get the value for 'Key2' you could do this (assuming your function was in a package called test_pkg):
SELECT test_pkg.get_value_by_key('Key1,Value1|Key2,Value2|Key3,Value3','Key2','|',',') FROM dual;