Regex Replace on IBMi - c++

I am looking for a way to use regex replace functions on IBM iSeries.
As far as I know, I can use the C/C++ library (regex.h).
With this, I can only match a regex, not replace
(using regcomp() to compile and regexec() to match the regex).
Does anyone know a way to do it?

It's true that the C/C++ POSIX regular expression library doesn't have a built-in regexp replace function, but you can accomplish the same thing using positional information from regexec() and the RPGLE %replace() built-in function. (I'm assuming you're going to use RPGLE, but you could use another language.)
For example, if you wanted to mask all but the last four digits of a phone number you could do this:
/include qcpysrc,regex_h
d regex_phone_number...
d ds inz likeds(regex_t)
d dsrm ds inz likeds(regmatch_t) dim(20)
d data s 52a inz varying
d pattern s 256a inz varying
d rc s 10i 0 inz(0)
/FREE
*inlr = *on ;
data = 'My phone #''s are: (444) 555 - 6666 and 777.888.9999' ;
dsply data ;
pattern = '\(?([0-9]{3})[ .)]*([0-9]{3})[ .-]*([0-9]{4})' ;
rc = regcomp(regex_phone_number :pattern :REG_EXTENDED) ;
if rc = 0 ;
dow '1' ;
rc = regexec(regex_phone_number :data
:regex_phone_number.re_nsub :%addr(dsrm) :0) ;
if rc <> 0 ;
leave ;
endif ;
data = %replace('***': data :dsrm(2).rm_so+1
:dsrm(2).rm_eo - dsrm(2).rm_so) ;
data = %replace('***': data :dsrm(3).rm_so+1
:dsrm(3).rm_eo - dsrm(3).rm_so) ;
enddo ;
endif ;
dsply data ;
regfree(regex_phone_number) ;
/END-FREE
Here's what the copy book regex_h looks like:
** Header file for calling the "Regular Expression" functions
** provided by the ILE C Runtime Library from an RPG IV
** program. Scott Klement, 2001-05-04
** Converted to qualified DS 2003-11-29
** Modified by Jarrett Gilliam 2014-11-05
**
** This copy book is for using the C regular expression library, regex.h, in RPG.
** You can go to http://www.regular-expressions.info/ to learn more about
** regular expressions. This regex flavor is POSIX ERE. You can go to
** http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rtref/regexec.htm
** to learn more about how the C functions work.
d/if defined(REGEX_H)
d/eof
d/endif
d/define REGEX_H
**------------------------------------------------------------
* cflags for regcomp()
**------------------------------------------------------------
d REG_BASIC c CONST(0)
d REG_EXTENDED c CONST(1)
d REG_ICASE c CONST(2)
d REG_NEWLINE c CONST(4)
d REG_NOSUB c CONST(8)
**------------------------------------------------------------
* eflags for regexec()
**------------------------------------------------------------
d REG_NOTBOL c CONST(256)
d REG_NOTEOL c CONST(512)
**------------------------------------------------------------
* errors returned
**------------------------------------------------------------
* RE pattern not found
d REG_NOMATCH c CONST(1)
* Invalid Regular Expression
d REG_BADPAT c CONST(2)
* Invalid collating element
d REG_ECOLLATE c CONST(3)
* Invalid character class
d REG_ECTYPE c CONST(4)
* Last character is \
d REG_EESCAPE c CONST(5)
* Invalid number in \digit
d REG_ESUBREG c CONST(6)
* [ ] imbalance
d REG_EBRACK c CONST(7)
* \( \) or () imbalance
d REG_EPAREN c CONST(8)
* \{ \} or { } imbalance
d REG_EBRACE c CONST(9)
* Invalid \{ \} range exp
d REG_BADBR c CONST(10)
* Invalid range exp endpoint
d REG_ERANGE c CONST(11)
* Out of memory
d REG_ESPACE c CONST(12)
* ?*+ not preceded by valid RE
d REG_BADRPT c CONST(13)
* invalid multibyte character
d REG_ECHAR c CONST(14)
* (shift 6 caret or not) anchor and not BOL
d REG_EBOL c CONST(15)
* $ anchor and not EOL
d REG_EEOL c CONST(16)
* Unknown error in regcomp() call
d REG_ECOMP c CONST(17)
* Unknown error in regexec() call
d REG_EEXEC c CONST(18)
**------------------------------------------------------------
* Structure of a compiled regular expression:
**------------------------------------------------------------
d REG_SUBEXP_MAX c 20
d regex_t ds qualified align based(template)
d re_nsub 10i 0
d re_comp *
d re_cflags 10i 0
d re_erroff 10i 0
d re_len 10i 0
d re_ucoll 10i 0 dim(2)
d re_lsub * DIM(REG_SUBEXP_MAX)
d re_esub * DIM(REG_SUBEXP_MAX)
d re_map 256a
d re_shift 5i 0
d re_dbcs 5i 0
**------------------------------------------------------------
* structure used to report matches found by regexec()
**------------------------------------------------------------
d regmatch_t ds qualified align based(template)
d rm_so 10i 0
d rm_ss 5i 0
d rm_eo 10i 0
d rm_es 5i 0
**------------------------------------------------------------
* regcomp() -- Compile a Regular Expression ("RE")
*
* int regcomp(regex_t *preg, const char *pattern,
* int cflags);
*
* where:
* preg (output) = the compiled regular expression.
* pattern (input) = the RE to be compiled.
* cflags (input) = the sum of the cflag constants
* (listed above) for this RE.
*
* Returns 0 = success, otherwise an error number.
**------------------------------------------------------------
d regcomp pr 10i 0 extproc('regcomp')
d preg like(regex_t)
d pattern * value options(*string)
d cflags 10i 0 value
**------------------------------------------------------------
* regexec() -- Execute a compiled Regular Expression ("RE")
*
* int regexec(const regex_t *preg, const char *string,
* size_t nmatch, regmatch_t *pmatch, int eflags);
*
* where:
* preg (input) = the compiled regular expression
* (the output of regcomp())
* string (input) = string to run the RE upon
* nmatch (input) = the number of matches to return.
* pmatch (output) = array of regmatch_t DS's
* showing what matches were found.
* eflags (input) = the sum of the flags (constants
* provided above) modifying the RE
*
* Returns 0 = success, otherwise an error number.
**------------------------------------------------------------
d regexec pr 10i 0 extproc('regexec')
d preg like(regex_t) const
d string * value options(*string)
d nmatch 10u 0 value
d pmatch * value
d eflags 10i 0 value
**------------------------------------------------------------
* regerror() -- return error information from regcomp/regexec
*
* size_t regerror(int errcode, const regex_t *preg,
* char *errbuf, size_t errbuf_size);
*
* where:
* errcode (input) = the error code to return info on
* (obtained as the return value from
* either regcomp() or regexec())
* preg (input) = the (compiled) RE to return the
* error for.
* errbuf (output) = buffer containing human-readable
* error message.
* errbuf_size (input) = size of errbuf (max length of msg
* that will be returned)
*
* returns: length of buffer needed to get entire error msg
**------------------------------------------------------------
d regerror pr 10u 0 extproc('regerror')
d errcode 10i 0 value
d preg like(regex_t) const
d errbuf * value
d errbuf_size 10i 0 value
**------------------------------------------------------------
* regfree() -- free memory locked by Regular Expression
*
* void regfree(regex_t *preg);
*
* where:
* preg (input) = regular expression to free mem for.
*
* NOTE: regcomp() will always allocate extra memory
* to be pointed to by the various pointers in
* the regex_t structure. if you don't call this,
* that memory will never be returned to the system!
**------------------------------------------------------------
d regfree pr extproc('regfree')
d preg like(regex_t)
Here's the output:
DSPLY My phone #'s are: (444) 555 - 6666 and 777.888.9999
DSPLY My phone #'s are: (***) *** - 6666 and ***.***.9999
The code could be improved by extracting the replace logic into a procedure of its own, creating a custom regexp replace function based on the POSIX library, but that's not absolutely necessary.
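As a language-neutral illustration of the extraction suggested above, here is a minimal Python sketch of a reusable "replace by match offsets" helper. It is not the RPGLE procedure itself: the function name `mask_groups` and its parameters are made up for this example, and it assumes the listed groups appear in left-to-right order within each match.

```python
import re

# Same phone pattern as the RPGLE example (POSIX ERE syntax is also valid here).
PHONE = re.compile(r"\(?([0-9]{3})[ .)]*([0-9]{3})[ .-]*([0-9]{4})")

def mask_groups(text, pattern, groups, mask="***"):
    """Replace the given capture groups of every match with `mask`,
    using only the start/end offsets reported by the regex engine."""
    pieces = []
    pos = 0  # end of the last span copied or replaced
    for match in pattern.finditer(text):
        for group in groups:  # assumed to be in ascending position order
            start, end = match.span(group)
            pieces.append(text[pos:start])  # untouched text before the group
            pieces.append(mask)             # replacement for the group
            pos = end
    pieces.append(text[pos:])  # tail after the last replaced group
    return "".join(pieces)

masked = mask_groups(
    "My phone #'s are: (444) 555 - 6666 and 777.888.9999", PHONE, (1, 2)
)
print(masked)
```

The offsets from `match.span()` play the same role here as `rm_so` and `rm_eo` do in the RPGLE code.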

The ILE C/C++ runtime library does not have a regex replace function available.
Java, however, has excellent support for regular expressions and integrates easily with RPGLE.
Introduction to Java and RPG
Using Regular Expressions in Java

I succeeded in using regex with Java.
I was inspired by this code from Scott Klement and that code from IBM.
The mix works well. I just added the replace function.
H
/include QSYSINC/QRPGLESRC,JNI
D newString pr O CLASS(*JAVA:'java.lang.String')
D EXTPROC(*JAVA:'java.lang.String':
D *CONSTRUCTOR)
D bytearray 32767A VARYING CONST
D getBytes PR 65535A VARYING
D EXTPROC(*JAVA:
D 'java.lang.String':
D 'getBytes')
D PatternCompile pr O CLASS(*JAVA:
D 'java.util.regex.Pattern')
D EXTPROC(*JAVA:
D 'java.util.regex.Pattern':
D 'compile') STATIC
D pattern O CLASS(*JAVA:'java.lang.String')
D PatternMatcher pr O CLASS(*JAVA:
D 'java.util.regex.Matcher')
D EXTPROC(*JAVA:
D 'java.util.regex.Pattern':
D 'matcher')
D comparestr O CLASS(*JAVA
D :'java.lang.CharSequence')
D CheckMatches pr 1N EXTPROC(*JAVA
D :'java.util.regex.Matcher'
D :'matches')
D DoReplace pr O CLASS(*JAVA:'java.lang.String')
D EXTPROC(*JAVA
D :'java.util.regex.Matcher'
D :'replaceAll')
D replacement O CLASS(*JAVA
D :'java.lang.String')
D RegExPattern s O CLASS(*JAVA:
D 'java.util.regex.Pattern')
D RegExMatcher s O CLASS(*JAVA:
D 'java.util.regex.Matcher')
D jstrStmt s like(jstring)
D jPatStr s like(jstring)
D jRepStr s like(jstring)
D jRepStr2 s like(jstring)
D result S 30A
/free
jPatStr = newString('^(\+33|0)([1-9][0-9]{8})$');
jstrStmt = newString('+33123456789');
jRepStr = newString('0$2');
RegExPattern = PatternCompile(jPatStr);
RegExMatcher = PatternMatcher(RegExPattern : jstrStmt);
if (CheckMatches(RegExMatcher) = *ON);
dsply ('it matches');
else;
dsply ('it doesn''t match');
endif;
jRepStr2 = DoReplace(RegExMatcher : jRepStr);
result = getBytes(jRepStr2);
dsply (%subst(result : 1 : 30));
*inlr = *on;
/end-free
It works, but with Java. I am still working on the PASE solution WarrenT suggested, but using PASE in an ILE program is such a pain...
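For readers who want to check the pattern and replacement outside RPG, here is a rough Python analog of the same match-then-replace step. It is only a sketch: the function name `to_national` is hypothetical, and Python writes the group reference as `\g<2>` where Java uses `$2`.

```python
import re

# Same pattern as the RPG/Java example: +33 or 0, followed by nine digits.
FRENCH_NUMBER = re.compile(r"^(\+33|0)([1-9][0-9]{8})$")

def to_national(number):
    """Rewrite the prefix as 0 and keep group 2; non-matching input is returned unchanged."""
    return FRENCH_NUMBER.sub(r"0\g<2>", number)

result = to_national("+33123456789")
print(result)
```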

The Young i Professionals Wiki has a page of Open Source Binaries. In the list is the PCRE Library (Perl Compatible Regular Expressions).
Let us know how this works out. I may try it myself ;-)

For an excellent SQLRPGLE example and explanation, refer to:
https://www.rpgpgm.com/2017/10/replacing-parts-of-strings-using-regexp.html
REGEXP_REPLACE(source-string, pattern-expression, replacement-string, start, occurrence, flags)

Related

REGEX - how to extract a specific number of rows from a text

I need to find out how to extract a specific number of rows from a text (the number of rows that I want to extract would be variable).
In this case, I want to extract everything from 07/06/2021 up to SOLD FINAL ZI 1.
TEXT
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
cccccccccccccccccccccccccccccccccccccccccccccccc
07/06/2021 P2P 00.00
T d r 0000 R A cc R A
r : aadr
REF. ------------------
P l p 00.00
P XX/XX/XXXX 0000000000 :00000000000 P R R
A B OO 0000000000 v e: 00.00 n 0000000000
c t 0.00 n
REF. ------------------
P2P 00.00
T d r 0000 R A c R A
rr : Saracie
REF. ------------------
P2P 00.00
T d r 0000 A. B c R A rr : Sanity
REF. ------------------
P l p 00.00
P XX/XX/XXXX 0000000000 00000000000 P R R
D OO 0000000000 V T: 00.00 n 0000000000 c
T 0.00 n
REF. ------------------
XX/XX/XXXX RULAJ ZI 1 3
SOLD FINAL ZI 1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
cccccccccccccccccccccccccccccccccccccccccccccccc
In regex, I start with \n(\d{2}/\d{2}/\d{4}) in order to get the date 07/06/2021, but I don't know how to extract the rest.
Thank you in advance!
Hello and welcome to Stack Overflow,
Your question might not solve your actual problem. Do you REALLY want to "extract a specific number of rows"? This might be an XY problem.
I like the solution from MDR to extract everything up to SOLD FINAL:
^(\d{2}\/\d{2}\/\d{4})[\s\S]+SOLD FINAL.
I like this because I guess you know the word at the end and not the number of lines. But we can't tell.
Anyway to give you the answer to your question (as your actual problem might look different than we expect) you can use this regex:
^(\d{2}\/\d{2}\/\d{4}).*$(\n^.*$){n}
^ --> look at the beginning of a row
(\d{2}\/\d{2}\/\d{4}) --> your regex for the date
.*$ --> also take the rest of the line
(\n^.*$){n} --> take the next n lines
\n --> the line break
^ --> again: beginning of a new line
.* --> as many characters as possible up to the end of the line (greedy, but . does not match a line break)
$ --> the end of a line
{n}--> the number of lines you want to extract (replace n ;) )
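The pattern above can be tried out with a short Python sketch. The helper name `extract_date_block` and the sample text are invented for illustration. The sketch assumes multiline mode, so that ^ and $ match at every line break, and assumes lines are separated by a plain \n.

```python
import re

def extract_date_block(text, n):
    """Return the first line starting with dd/mm/yyyy plus the next n lines,
    or None if no such line exists."""
    pattern = re.compile(
        r"^(\d{2}/\d{2}/\d{4}).*$(\n^.*$){" + str(n) + "}",
        re.MULTILINE,  # ^ and $ match at each line boundary
    )
    match = pattern.search(text)
    return match.group(0) if match else None

sample = (
    "header\n"
    "07/06/2021 P2P 00.00\n"
    "line 1\n"
    "line 2\n"
    "line 3\n"
    "SOLD FINAL ZI 1\n"
    "footer"
)
block = extract_date_block(sample, 2)
print(block)
```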

Excel nesting - IF / AND Query part two?

Hi, I had a query earlier and thought I had cracked it with the help of Richard, but it appears I haven't.
I have attached an image and what I am trying to achieve to make my query clearer.
* If E is correct then cell F will be set to match D manually
* If E is yes and F is set to 111 then G will populate with the contents of C
* If E is no and F is set to anything but 111 then it will return 0
* If E is correct then cell F will be set to match D manually
* If E is yes and F is set to 112 then H will populate with the contents of C
* If E is no and F is set to anything but 112 then it will return 0
* If E is correct then cell F will be set to match D manually
* If E is yes and F is set to 118 then I will populate with the contents of C
* If E is no and F is set to anything but 118 then it will return 0
* If E is correct then cell F will be set to match D manually
* If E is yes and F is set to 119 then J will populate with the contents of C
* If E is no and F is set to anything but 119 then it will return 0
It's not 100% clear, but sounds like this is what you're after:
F2 = =IF(E2="Yes",IF(OR(D2=111,D2=112,D2=118,D2=119)=TRUE,D2,""),"")
G2 = =IF(AND(E2="Yes",F2=111)=TRUE,C2,"")
H2 = =IF(AND(E2="Yes",F2=112)=TRUE,C2,"")
I2 = =IF(AND(E2="Yes",F2=118)=TRUE,C2,"")
J2 = =IF(AND(E2="Yes",F2=119)=TRUE,C2,"")
Then just fill down. I've put "" instead of 0, because it's a lot easier to see what's going on without zeros everywhere. You can change them back once you're happy with the outcome.
Incidentally, sometimes it's easier to read a formula when it is split up. Excel works fine if you have the formula on different lines, like the following for F2:
=
IF(
E2="Yes",
IF(
OR(
D2=111,D2=112,D2=118,D2=119
)=TRUE,
D2,
""
),
""
)

Perl: RegEX: Capture group multiple times

I'm developing a piece of code to filter a text as follows:
<DATA>
.SUBCKT SVI A B C D E F
+ G H I
+ J K L
.....
+ X Y Z
*.PININFO AA BB CC
*.PININFO DD EE FF
<DATA>
I need the output to be
A B C D E F
G H I
J K L
.....
X Y Z
I already made a regular expression to do so:
m/\.SUBCKT\s+SVI\s(.*)|\+(.*)/gm
The problem is that I have many similar sections like this input but I only need to detect + lines which are following .SUBCKT SVI header not any other header.
How could I match a group many times, like (\+\s+(.*))? I want to match this capture group each time it repeats.
Any advice on how to get this expression?
Perhaps this is closer to what you need.
m/\.SUBCKT\s+SVI\s(.*)\n(\+\s+(.*)\n)*/gm
Does this do what you want? Note that it stops at the ..... because it doesn't begin with a + or .SUBCKT
It won't handle the case where a range of + lines is immediately followed by another .SUBCKT line; is that a problem?
use strict;
use warnings;
while ( <DATA> ) {
next unless my $in_range = s/^\.SUBCKT\s+// ... /^[^+]/;
next if $in_range =~ /E/;
s/^\S+\s+//;
print;
}
__DATA__
<DATA>
.SUBCKT SVI A B C D E F
+ G H I
+ J K L
.....
+ X Y Z
*.PININFO AA BB CC
*.PININFO DD EE FF
<DATA>
output
A B C D E F
G H I
J K L
Update
Here's a state machine version that deals with the special case described above
use strict;
use warnings;
my $state;
while ( <DATA> ) {
if ( /^\.SUBCKT\s+\S+\s+(.+)/ ) {
$state = 1;
print $1, "\n";
}
elsif ( /^\+\s+(.+)/ ) {
print $1, "\n" if $state;
}
else {
$state = 0;
}
}
__DATA__
<DATA>
.SUBCKT SVI A B C D E F
+ G H I
+ J K L
.SUBCKT SVI A B C D E F
+ M N O
+ P Q R
*.PININFO AA BB CC
*.PININFO DD EE FF
<DATA>
output
A B C D E F
G H I
J K L
A B C D E F
M N O
P Q R
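For comparison, the same state machine can be sketched in Python. This is an illustrative port, not part of the original answer. The function name `subckt_pins` is made up, and like the Perl version it accepts any subcircuit name after .SUBCKT.

```python
import re

def subckt_pins(lines):
    """Collect the pin list from each .SUBCKT header and from the '+'
    continuation lines that directly follow it."""
    in_block = False  # True while we are inside a .SUBCKT block
    collected = []
    for line in lines:
        header = re.match(r"\.SUBCKT\s+\S+\s+(.+)", line)
        continuation = re.match(r"\+\s+(.+)", line)
        if header:
            in_block = True
            collected.append(header.group(1))
        elif continuation:
            if in_block:
                collected.append(continuation.group(1))
        else:
            in_block = False  # any other line ends the block
    return collected

netlist = """.SUBCKT SVI A B C D E F
+ G H I
+ J K L
.....
+ X Y Z
*.PININFO AA BB CC""".splitlines()
pins = subckt_pins(netlist)
print("\n".join(pins))
```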
I made use of @shawnt00's answer and modified the regular expression, and it did the job.
\.SUBCKT\s+SVI_TRX201TH\s(.*\n(\+\s+.*\n)*)

G++ Warning: extra tokens at end of #include directive [enabled by default]

I can't find the problem. Does anyone know how to solve it?
Code
#include <algorithm>‎
int main(int argc, char* argv[]) {
return 0;
}
Warning
extra tokens at end of #include directive [enabled by default]
Looking at the code quoted above using od -c gives this output:
0000000 # i n c l u d e < a l g o r i
0000020 t h m > 342 200 216 \n i n t m a i n
0000040 ( i n t a r g c , c h a r *
0000060 a r g v [ ] ) { \n r
0000100 e t u r n 0 ; \n } \n
Note the bytes between the > and the \n: 342 200 216 (octal) is the UTF-8 encoding of U+200E, an invisible LEFT-TO-RIGHT MARK. Delete it and the warning should go away.
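If od is not at hand, a small script can locate such characters. The following Python sketch is only an illustration: the function name `find_non_ascii` is made up, and the sample string assumes the stray character is U+200E, which is what the three bytes shown above decode to in UTF-8.

```python
import unicodedata

def find_non_ascii(source):
    """Return (line, column, code point, name) for every non-ASCII character."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((lineno, col, "U+%04X" % ord(ch), name))
    return hits

# Sample source with an invisible mark after the closing '>'.
code = "#include <algorithm>\u200e\nint main() { return 0; }\n"
hits = find_non_ascii(code)
for hit in hits:
    print(hit)

# Dropping every non-ASCII character gives a clean copy of this sample.
cleaned = "".join(ch for ch in code if ord(ch) < 128)
```

Stripping all non-ASCII characters is only safe when the source is meant to be pure ASCII. Otherwise, remove just the reported character.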

VBA transpose column vector by delimiter and include nullstrings

Below is an example of my data (column vector):
NAME
A
B
C
A
B
C
[blank cell]
B
C
NAME
A
B
C
Note that the [blank cell] in my document is actually just a blank cell, no formula data etc.
Ultimately, I just need to transpose the data delimited by NAME. NAME cells are the only cells with:
Left(rngCell.Value, 2) = Left(StrConv(rngCell.Value, vbUpperCase), 2)
My code is breaking my transpose and making a new row for all NAME and all [blank cell]s.
I'm trying to get:
NAME A B C A B C B C
NAME A B C A B C A B C
NAME A B C B C B C
But my code is returning:
NAME A B C A B C
B C
NAME A B C A B C A B C
NAME A B C
B C
B C
Here is the code I'm using:
Sub Dataclean()
Dim lngRowLast As Long, _
lngRowPaste As Long, _
lngColOffset As Long
Dim rngCell As Range, _
rngDataSet As Range
Dim strSourceTab As String, _
strOutputTab As String
'Tab name containing source data. Change to suit.
strSourceTab = "sheet2pull"
'Tab name for data output. Change to suit.
strOutputTab = "transposed"
lngRowLast = Sheets(strSourceTab).Cells(Rows.Count, "A").End(xlUp).Row
'Assumes the original dataset is in Column A and starts at Row 1. Change to suit.
Set rngDataSet = Sheets(strSourceTab).Range("A1:A" & lngRowLast)
Application.ScreenUpdating = False
For Each rngCell In rngDataSet
If Left(rngCell.Value, 2) = Left(StrConv(rngCell.Value, vbUpperCase), 2) Then
If lngRowPaste = 0 And lngColOffset = 0 Then
lngRowPaste = 1
lngColOffset = 1
Else
lngRowPaste = lngRowPaste + 1
lngColOffset = 1
End If
ElseIf lngRowPaste = 0 And lngColOffset = 0 Then
lngRowPaste = 1
lngColOffset = 1
End If
Sheets(strOutputTab).Cells(lngRowPaste, lngColOffset).Value = rngCell.Value
lngColOffset = lngColOffset + 1
Next rngCell
Application.ScreenUpdating = True
End Sub
Please let me know if I've been unclear or confusing. I tried to be as explicit as possible, but it's often tough to explain! Thank you so much.
I'm a bit new to VBA, but learning.
Your error occurs when the first If compares a blank cell. That comparison is always true, because a blank and an uppercased blank are equal. Replace your first If with the one below. The blank test uses IsEmpty, since VBA itself has no IsBlank function.
If Left(rngCell.Value, 2) = Left(StrConv(rngCell.Value, vbUpperCase), 2) And _
Not IsEmpty(rngCell.Value) Then
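The intended grouping rule can be checked outside Excel with a minimal Python sketch. It is not the VBA macro: the names `is_header` and `transpose_by_header` are hypothetical, and the sample assumes header cells start with two uppercase characters while data cells do not.

```python
def is_header(value):
    """A cell starts a new row if it is non-blank and its first two
    characters are already uppercase (the test used in the question)."""
    return value != "" and value[:2] == value[:2].upper()

def transpose_by_header(cells):
    """Split a column of cells into rows, starting a new row at each header.
    Blank cells stay in the current row instead of starting a new one."""
    rows = []
    for value in cells:
        if is_header(value) or not rows:
            rows.append([])
        rows[-1].append(value)
    return rows

column = ["NAME", "a", "b", "c", "", "b", "c", "NAME", "a", "b"]
rows = transpose_by_header(column)
for row in rows:
    print(row)
```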