Regular expression to debatch MT940 message - regex

I have a file with the structure below, where each message starts at tag :20: and ends at :86:. I want to write a regular expression to extract all the messages.
I will write a C# utility to extract each message and put it in an ArrayList.
:20:160212-2359
:21:600******444
:28C:00001/00001
.
.
.
:86:DAILY SETTLEMENT /ENTRY-13 MAR
:62F:D160212GBP1229387,45
:64:D160212GBP1229387,45
:65:D120314GBP1229387,45
:65:D120315GBP1229387,45
:65:D120316GBP1229387,45
:65:D120317GBP1229387,45
:65:D120318GBP1229387,45
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
some more comments in 86_2 segment
this is line2
:20:160212-2359
:21:B***22
:25:60*****88
.
.
.
:86:/ENTRY-13 MAR TRF/REF 6*******64 /ORD/ some line here
*********************** /BNF/ JO 88
:62F:C160212EUR13868931,00
:64:C160212EUR13868931,00
:65:C120314EUR13868931,00
:65:C120315EUR13791849,00
:65:C120316EUR13791849,00
:65:C120317EUR13791849,00
:65:C120318EUR13791849,00
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
some more comments in 86_2 segment.
:20:160212-2359
:21:B****X
:25:6*************1
:28C:00001/00001
:86:STORE1 EUROPE B.V. /ENTRY-15 MAR RTS/REF 6*****6 RTS
SWEPT FROM 9999 1**** XX***********BILLING CHARGES -
28FEB12 TRF/REF 6641XXX43799053 /ITEMCNT/004 /BNF/ /ITEMCNT/004
BILLING CHARGES
:61:1203130313DR10000000,00****288//6*****6
:86:STORE1 CNRTY SRL /ENTRY-13 MAR CLG/REF 66**********6
:61:1*****000,00NT*****9846//6******74
:86:NAME /ENTRY-13 MAR CLG/REF 6******4 LA C****R
**** CASH DEPOSIT STORE1
:61:1203150315DR48531,00NCHGBILLING CHARGES//6641XXX43799053
:86:BILLING CHARGES - 28FEB12 /ENTRY-15 MAR TRF/REF
66******53 /ITEMCNT/004
:62F:C160212EUR0,00
:64:C160212EUR0,00
:65:C120314EUR0,00
:65:C120315EUR0,00
:65:C120316EUR0,00
:65:C120317EUR0,00
:65:C120318EUR0,00
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
{newline}
Actual values have been replaced with the '*' character.
Thanks
Dhiraj Bhavsar

Try this
:20:(.*?):86:
in code
/:20:(.*?):86:/gs
https://regex101.com/r/dW4zS3/1
.*? matches any character zero or more times, as few times as possible, expanding as needed. The s flag is required so that . also matches newlines.
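Note that the lazy pattern above stops at the first :86: after each :20:, while the sample messages contain several :86: tags plus continuation lines. A minimal sketch of an alternative, assuming every message runs from a :20: at the start of a line up to the next such :20: (or the end of the input). It is written in Python for brevity rather than the C# the question mentions, and the sample text and the names MESSAGE and messages are illustrative only:

```python
import re

# One message: from a line starting with :20: up to (not including)
# the next line starting with :20:, or the end of the input.
MESSAGE = re.compile(r"^:20:.*?(?=^:20:|\Z)", re.MULTILINE | re.DOTALL)

text = (
    ":20:160212-2359\n"
    ":21:REF1\n"
    ":86:DAILY SETTLEMENT\n"
    ":62F:D160212GBP1229387,45\n"
    ":86:FORWARD AVAILABLE FUNDS\n"
    "this is line2\n"
    ":20:160212-2359\n"
    ":21:REF2\n"
    ":86:FORWARD AVAILABLE FUNDS\n"
)

# Each list element holds one complete message, continuation lines included.
messages = MESSAGE.findall(text)
```

Because the end of a message is defined by the next :20: rather than by :86:, intermediate :86: tags and free-text lines after the last :86: stay inside the message they belong to.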

Related

Regex to find a specific word between two other specific words

When inspecting the content of an email body, I want to detect when a distribution list name containing "DL" appears in the "To" field or the "CC" field, but not when "DL" appears only in the subject.
Basically, I want my text (DL) detected when it is found between the closest "To:" and the closest "Subject".
The best I can do is the following, but it matches everything from the very first instance of "To:" with a subsequent DL until the very last instance of "Subject":
(?<=To: )(?s:.)*?( DL | DL-)(?s:.)*?(?=Subject:)
Expected result: "DL-" from DL-Musketeers, but not the "DL" in the subject line if the distribution list isn't present.
From: Mouse, Mickey <JMouse@Disney.com<mailto:JMouse@Disney.com>>
Sent: Thursday, May 26, 2022 8:14 AM
To: Mouse, Minnie <DMouse@Disney.com<mailto:DMouse@Disney.com>>
Cc: Disney, Joseph R <JDisney@Disney.com<mailto:JDisney@Disney.com>> DL-Musketeers@Disney.com
Subject: RE: DL commission
Thanks in advance.
I was able to find a solution with help from @Barmar.
What I'm using is:
(?<=To:)(.)*?( DL | DL-)(?s:.)*?(?=Subject:)|(?<=Cc:)(.)*?( DL | DL-)(?s:.)*?(?=Subject:)
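A minimal Python sketch of how that pattern behaves, assuming a regex engine that supports scoped inline flags such as (?s:.) (Python 3.6+ does). The helper name find_dl and both sample emails are made up for illustration. The key point is that the first (.)*? cannot cross a newline, so the DL marker has to sit on the same line as "To:" or "Cc:":

```python
import re

# The pattern from the post, split over two lines for readability.
DL_IN_RECIPIENTS = re.compile(
    r"(?<=To:)(.)*?( DL | DL-)(?s:.)*?(?=Subject:)"
    r"|(?<=Cc:)(.)*?( DL | DL-)(?s:.)*?(?=Subject:)"
)

def find_dl(email_text):
    """Return the DL marker found on a To:/Cc: line, or None."""
    m = DL_IN_RECIPIENTS.search(email_text)
    if m is None:
        return None
    # Group 2 belongs to the To: branch, group 4 to the Cc: branch.
    return (m.group(2) or m.group(4)).strip()

with_dl = (
    "To: Mouse, Minnie\n"
    "Cc: Disney, Joseph R DL-Musketeers@Disney.com\n"
    "Subject: RE: DL commission\n"
)
without_dl = (
    "To: Mouse, Minnie\n"
    "Cc: Disney, Joseph R\n"
    "Subject: RE: DL commission\n"
)

found = find_dl(with_dl)
missing = find_dl(without_dl)
```

In the second email the only "DL" is in the subject line, which has no "To:" or "Cc:" before it on the same line, so nothing is reported.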

Extracting Multiple Blocks of Similar Text

I am trying to parse a report. The following is a sample of the text that I need to parse:
7605625112 DELIVERED N 1 GORDON CONTRACTORS I SIPLAST INC Freight Priority 2000037933 $216.67 1,131 ROOFING MATERIALS
04/23/2021 02:57 PM K WRISHT N 4 CAPITOL HEIGHTS, MD ARKADELPHIA, AR Prepaid 2000037933 -$124.23 170160-00
04/27/2021 12:41 PM 2 40 20743-3706 71923 $.00 055 $.00
2 WBA HOT $62.00 0
$12.92 $92.44
$167.36
7605625123 DELIVERED N 1 SECHRIST HALL CO SIPLAST INC Freight Priority 2000037919 $476.75 871 PAIL,UN1263,PAINT,3,
04/23/2021 02:57 PM S CHAVEZ N 39 HARLINGEN, TX ARKADELPHIA, AR Prepaid 2000037919 -$378.54
04/27/2021 01:09 PM 2 479 78550 71923 $.00 085 $95.35
2 HRL HOT $62.00 21
$13.55 $98.21
$173.76
This is comprised of two or more blocks, each starting with "[0-9]{10}\sDELIVERED" and ending with the last currency string before the next block.
If I test with "(?s)([0-9]{10}\sDELIVERED)(.*)(?<=\$167.36\n)" I successfully get the first block, but if I use "(?s)([0-9]{10}\sDELIVERED)(.*)(?<=\$\d\d\d.\d\d\n)" it grabs everything.
If someone can show me the changes that I need to make to return two or more blocks, I would greatly appreciate it.
* is a greedy quantifier, so it will try to match as many characters as possible. See also Repetition with Star and Plus.
To fix it, you can use this regex:
(?s)(\d{10}\sDELIVERED)((.(?!\d{10}\sDELIVERED))*)(?<=\$\d\d\d.\d\d)
Here I replaced .* with (.(?!\d{10}\sDELIVERED))* so that each character is consumed only if it is not immediately followed by \d{10}\sDELIVERED. The match therefore cannot run into the start of the next block.
See a demo here
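As a rough illustration, the same idea in Python, using a shortened, made-up version of the report. The only changes to the pattern are a non-capturing inner group and an escaped decimal point in the lookbehind; the names BLOCK, report and blocks are illustrative:

```python
import re

# Each character after the block header is consumed only while it is not
# immediately followed by the header of the next block.
BLOCK = re.compile(
    r"(?s)(\d{10}\sDELIVERED)((?:.(?!\d{10}\sDELIVERED))*)(?<=\$\d\d\d\.\d\d)"
)

report = (
    "7605625112 DELIVERED N 1 GORDON CONTRACTORS\n"
    "$12.92 $92.44\n"
    "$167.36\n"
    "7605625123 DELIVERED N 1 SECHRIST HALL CO\n"
    "$13.55 $98.21\n"
    "$173.76\n"
)

# One string per block, each ending at its last $ddd.dd amount.
blocks = [m.group(0) for m in BLOCK.finditer(report)]
```

The lookbehind assumes, as in the question, that the closing amount has exactly three digits before the decimal point; blocks ending in a larger or smaller amount would need a different end condition.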

How to format first 7 rows in this txt file using Regex

I have a text file with data formatted as below. I figured out how to format the second part of the file for upload into a db table, but I'm hitting a wall trying to get just the first 7 lines formatted the same way.
If it wasn't obvious, I'm trying to get it pipe-delimited with the exact same number of columns, so I can easily upload it to the db.
Year: 2019 Period: 03
Office: NY
Dept: Sales
Acct: 111222333
SubAcct: 11122234-8
blahblahblahblahblahblahblah
Status: Pending
1000
AAAAAAAAAA
100,000.00
2000
BBBBBBBBBB
200,000.00
3000
CCCCCCCCCC
300,000.00
4000
DDDDDDDDDD
400,000.00
Some kind folks answered my question about the bottom part. Using the following code, I can format it to look like this:
(.*)\r?\n(.*)\r?\n(.*)(?:\r?\n|$)
substitute with |||||||$1|$2|$3\n
|||||||1000|AAAAAAAAAA|100,000.00
|||||||2000|BBBBBBBBBB|200,000.00
|||||||3000|CCCCCCCCCC|300,000.00
|||||||4000|DDDDDDDDDD|400,000.00
I just need help formatting the top part to look like this, so the entire file has the exact same number of columns.
Year: 2019|Period: 03|Office: NY|Dept: Sales|Acct: 111222333|SubAcct: 11122234-8|blahblahblahblahblahblahblah|Status: Pending|||
I'm ok with having multiple passes on the file to get the desired end result.
I helped you with your previous question, so I will focus on the first part of your file here.
You can use this regex:
\n|\b(?=Period)
Working demo
Use | as the replacement string.
The pattern above inserts the pipe at the word boundary and leaves the space before Period in place. If you don't want to keep that space, you can use:
\n|\s(?=Period)
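A small Python sketch of that substitution, under the assumption that the header is always the first 7 lines and that the trailing "|||" from the desired output is simply appended. The function name format_header and the header_lines parameter are hypothetical:

```python
import re

def format_header(text, header_lines=7):
    """Join the first `header_lines` lines of `text` into one pipe-delimited row."""
    # Restrict the substitution to the header so the data rows are untouched.
    header = "\n".join(text.splitlines()[:header_lines])
    # Replace every newline, and the whitespace before "Period", with a pipe.
    return re.sub(r"\n|\s(?=Period)", "|", header) + "|||"

sample = (
    "Year: 2019 Period: 03\n"
    "Office: NY\n"
    "Dept: Sales\n"
    "Acct: 111222333\n"
    "SubAcct: 11122234-8\n"
    "blahblahblahblahblahblahblah\n"
    "Status: Pending\n"
    "1000\n"
    "AAAAAAAAAA\n"
    "100,000.00\n"
)

row = format_header(sample)
```

Limiting the input to the header lines matters because the pattern would otherwise also turn every newline in the data rows into a pipe.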

slice the middle of a dynamically generated string in Django

Is there a way (Using Django filters or any other language) to find and slice certain parts of a dynamically generated string? I have tried the slice method and the truncate method
{{variable|slice:"130:-60"}} or
{{variable|truncatechars:255 }}
but neither of those methods works exactly right. I am working on weather alerts (provided by the National Weather Service), and each alert comes with a unique ID on the front and (sometimes) on the back too.
The unique ID #'s and length vary between 60 and 130 characters and the ID at the end is longitude and latitude but it's only included about 1/2 the time.
So I am looking for / working on code to "sniff out" and remove the unique ID's and to only provide the text for the user to see.
What is the proper method to do this?
Here is an example of an alert:
INC077-437-75584393-/09584738.EGY/W.0027//KT.0215401321/ 1100 AM CDT WED MAY13 2015 THE FLOODING WILL CONTINUE FOR THE MISSISSIPPI RIVER NEAR ORLANDO FLORIDA. FROM THIS EVENING TO THE END OF TIME AT 600 AM WEDNESDAY THE STAGE WAS 30.5 FEET. FLOOD STAGE IS 30.6 FEET IMPACT BY TONIGHT AT 1000 PM SOME WATER BEGINS TO FILL SOME DITCHES. && LAT...LON 4125 5845 5458 6548 8964 5124 1234 8706 $$
and with code (where I call the variable) I want it to be:
1100 AM CDT WED MAY13 2015 THE FLOODING WILL CONTINUE FOR THE MISSISSIPPI RIVER NEAR ORLANDO FLORIDA. FROM THIS EVENING TO THE END OF TIME AT 600 AM WEDNESDAY THE STAGE WAS 30.5 FEET. FLOOD STAGE IS 30.6 FEET IMPACT BY TONIGHT AT 1000 PM SOME WATER BEGINS TO FILL SOME DITCHES.
but I can't cut or truncate because the length of every weather alert is different and each unique ID is a different # and a different length.
Any help is appreciated!
You can add a custom property to your model, such as:
from django.db import models

class Weather(models.Model):
    alert = models.TextField()

    @property
    def get_id(self):
        # Return everything after the last "/" in the alert text
        return self.alert.split('/')[-1]
And in your template :
<p>{{ weather.get_id }}</p>
You can also create a custom template filter:
from django import template

register = template.Library()

@register.filter(name='get_id')
def get_id(value):
    # Return everything after the last "/" in the string
    return value.split('/')[-1]
And use it in your template this way (the filter expects the alert string, not the model instance):
<p>{{ weather.alert|get_id }}</p>
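Splitting on the last "/" removes only the leading ID; the trailing "&& LAT...LON ... $$" part would still be shown. A minimal sketch of a plain helper that strips both, assuming the trailer always begins with "&&" as in the example alert. The function name strip_alert_ids is hypothetical, and the same body could be used inside the property or the template filter above:

```python
def strip_alert_ids(alert):
    """Return the alert text without the leading ID and the trailing LAT/LON part."""
    # Drop everything up to and including the last "/" (the leading ID).
    body = alert.rsplit("/", 1)[-1]
    # Drop the trailer if present; assumed to start with "&&".
    body = body.split("&&", 1)[0]
    return body.strip()

with_trailer = (
    "INC077-437-75584393-/09584738.EGY/W.0027//KT.0215401321/ "
    "1100 AM CDT WED MAY13 2015 SOME WATER BEGINS TO FILL SOME DITCHES. "
    "&& LAT...LON 4125 5845 5458 6548 $$"
)
without_trailer = (
    "INC077-437-75584393-/09584738.EGY/W.0027//KT.0215401321/ "
    "1100 AM CDT WED MAY13 2015 SOME WATER BEGINS TO FILL SOME DITCHES."
)

clean_a = strip_alert_ids(with_trailer)
clean_b = strip_alert_ids(without_trailer)
```

This relies on the alert text itself never containing "/" or "&&"; if it can, a stricter pattern for the ID would be needed.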

Regex Pattern for String including newline characters

I am looking for a regex pattern that will return a match from %PDF-1.2 to and including %%EOF in the string below.
So far my patterns don't seem to work.
DOCUMENTS ACCEPTED
001//201//0E9136614////ACME 107 PTY LTD//8
**E10 End of validation report**
BDAT 4367 LAST
XSVBOUT
001XSVSEPRXXXOUT_TP.19
ZHDASCRA55 0700 8
ZCO*** TEST DATABASE ***ACME 107 PTY LTD 551824563 APTY LMSH PDF NSW 20111217 PNPC
ZIL 77000030149 Australian Securities and Investments Commission 86768265615 ZUMESOFT SOLUTIONS PTY LTD 61 buxton st north adelaide SA 5006
ZIAProprietary Company 42600 0E9136614 201 TAX INVOICE EXE 0 0E9136614201C PA 20111217 Not Subject to GST - Treasurer's Determination (Exempt Taxes, Fees and Charges)
ZTRENDRA55 5
%PDF-1.2
%????
3495
%%EOF
BDAT 11 LAST
/(?s)(%PDF-1\.2.+%%EOF)/ should solve your problem
If your regex flavor does not support the inline (?s) modifier, use the trailing s modifier instead, as in /(%PDF-1\.2.+%%EOF)/s.
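For illustration, a minimal Python sketch of the same match, where re.DOTALL plays the role of the s modifier. The sample text is a shortened copy of the input from the question and the variable names are illustrative:

```python
import re

text = (
    "ZTRENDRA55 5\n"
    "%PDF-1.2\n"
    "%????\n"
    "3495\n"
    "%%EOF\n"
    "BDAT 11 LAST\n"
)

# re.DOTALL lets "." match newlines, so ".+" can span the whole PDF body.
match = re.search(r"%PDF-1\.2.+%%EOF", text, re.DOTALL)
pdf_part = match.group(0) if match else None
```

Because .+ is greedy, the match extends to the last %%EOF in the input. If the text could contain more than one PDF, the lazy form .+? would stop at the first %%EOF instead.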