Regex Pattern for String including newline characters

I am looking for a regex pattern that will return a match from %PDF-1.2 to and including %%EOF in the string below.
So far my patterns don't seem to work.
DOCUMENTS ACCEPTED
001//201//0E9136614////ACME 107 PTY LTD//8
**E10 End of validation report**
BDAT 4367 LAST
XSVBOUT
001XSVSEPRXXXOUT_TP.19
ZHDASCRA55 0700 8
ZCO*** TEST DATABASE ***ACME 107 PTY LTD 551824563 APTY LMSH PDF NSW 20111217 PNPC
ZIL 77000030149 Australian Securities and Investments Commission 86768265615 ZUMESOFT SOLUTIONS PTY LTD 61 buxton st north adelaide SA 5006
ZIAProprietary Company 42600 0E9136614 201 TAX INVOICE EXE 0 0E9136614201C PA 20111217 Not Subject to GST - Treasurer's Determination (Exempt Taxes, Fees and Charges)
ZTRENDRA55 5
%PDF-1.2
%????
3495
%%EOF
BDAT 11 LAST

/(?s)(%PDF-1\.2.+%%EOF)/ should solve your problem.
If your regex flavor does not support the inline (?s) flag, apply the s (dotall) modifier after the closing delimiter instead, as in /(%PDF-1\.2.+%%EOF)/s.
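As a rough illustration of the same idea in Python, the sketch below uses re.DOTALL, which corresponds to (?s). The text variable is a shortened stand-in for the mail body, and the lazy .+? is an assumption on my part: it stops at the first %%EOF, whereas the greedy .+ above would run to the last one.

```python
import re

# Shortened stand-in for the mail body from the question.
text = (
    "ZTRENDRA55 5\n"
    "%PDF-1.2\n"
    "%????\n"
    "3495\n"
    "%%EOF\n"
    "BDAT 11 LAST\n"
)

# re.DOTALL plays the role of (?s): "." also matches newline characters.
# The lazy ".+?" (an assumption, not part of the answer above) stops at
# the first %%EOF instead of the last one.
pattern = re.compile(r"%PDF-1\.2.+?%%EOF", re.DOTALL)

match = pattern.search(text)
pdf_block = match.group(0) if match else None
print(pdf_block)
```

If the input can contain several PDF blocks, pattern.finditer(text) would return each of them separately.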

Related

Using regex in Google Apps Script to extract number before a string

I receive a Gmail that contains text similar to that below. I only want to extract the number immediately before the string 'symbol(s)'. In this case that number would be 233, but it and the other contents of the email vary every day.
ZBH Zimmer Biomet Holdings Inc Medical Devices Healthcare NYSE 25,250 120.68 0.03 0.02 3,124,139
ZBRA Zebra Technologies Corp Cl A Communication Equipment Technology NASDAQ 19,620 368.14 -11.43 -3.01 626,763
ZION Zions Bancorp N.A. Banks - Regional Financial Services NASDAQ 8,580 57.47 0.46 0.81 1,507,253
ZTS Zoetis Inc Drug Manufacturers - Specialty Healthcare NYSE 83,530 175.14 0.47 0.27 2,152,320
233 symbol(s)
I have figured out how to extract everything before,
(?:(?!symbols(s)).)*
but I don't want everything, only the number '233'.
Thank you
Description
Regex is hard for most people to work with, myself included. Make your life easier: I would split the long text string into words and then search for "symbol(s)"; the number is the word just before it.
Script
function test2() {
  try {
    let text = "ZBH Zimmer Biomet Holdings Inc Medical Devices Healthcare NYSE 25,250 120.68 0.03 0.02 3,124,139 ZBRA Zebra Technologies Corp Cl A Communication Equipment \
Technology NASDAQ 19,620 368.14 -11.43 -3.01 626,763 ZION Zions Bancorp N.A. Banks - Regional Financial Services NASDAQ 8,580 57.47 0.46 0.81 1,507,253 ZTS Zoetis Inc \
Drug Manufacturers - Specialty Healthcare NYSE 83,530 175.14 0.47 0.27 2,152,320 233 symbol(s)";
    // Split into words, then take the word just before "symbol(s)".
    text = text.split(" ");
    let i = text.indexOf("symbol(s)");
    console.log(text[i-1]);
  }
  catch(err) {
    console.log(err);
  }
}
Console.log
8:31:44 AM Notice Execution started
8:31:46 AM Info 233
8:31:44 AM Notice Execution completed
Reference
String.split()
Array.indexOf()
I like TheWizEd's approach, but in case you still want to try a regex, you can use the following:
\d+(?=\ssymbol)
This uses a lookahead assertion. It matches one or more digits that are followed by a whitespace character and the word "symbol", without including that trailing text in the match.
If you really need to include the (s) at the end, you can use this one:
\d+(?=\ssymbol\(s\))
You can test it here with your sample text and it should work: https://regexr.com/
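For anyone who wants to try the lookahead outside Apps Script, here is a minimal Python sketch. The text value is a shortened copy of the sample email, and the sketch uses \d+ rather than \d* so that an empty string can never count as a match.

```python
import re

# Shortened copy of the sample email body.
text = (
    "ZTS Zoetis Inc Drug Manufacturers - Specialty Healthcare NYSE "
    "83,530 175.14 0.47 0.27 2,152,320\n"
    "233 symbol(s)"
)

# \d+ requires at least one digit; the lookahead checks for a whitespace
# character followed by "symbol(s)" without including it in the match.
match = re.search(r"\d+(?=\ssymbol\(s\))", text)
count = int(match.group(0)) if match else None
print(count)
```

Other numbers in the text are skipped because none of them is directly followed by whitespace and the literal symbol(s).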

Extracting Multiple Blocks of Similar Text

I am trying to parse a report. The following is a sample of the text that I need to parse:
7605625112 DELIVERED N 1 GORDON CONTRACTORS I SIPLAST INC Freight Priority 2000037933 $216.67 1,131 ROOFING MATERIALS
04/23/2021 02:57 PM K WRISHT N 4 CAPITOL HEIGHTS, MD ARKADELPHIA, AR Prepaid 2000037933 -$124.23 170160-00
04/27/2021 12:41 PM 2 40 20743-3706 71923 $.00 055 $.00
2 WBA HOT $62.00 0
$12.92 $92.44
$167.36
7605625123 DELIVERED N 1 SECHRIST HALL CO SIPLAST INC Freight Priority 2000037919 $476.75 871 PAIL,UN1263,PAINT,3,
04/23/2021 02:57 PM S CHAVEZ N 39 HARLINGEN, TX ARKADELPHIA, AR Prepaid 2000037919 -$378.54
04/27/2021 01:09 PM 2 479 78550 71923 $.00 085 $95.35
2 HRL HOT $62.00 21
$13.55 $98.21
$173.76
The report consists of two or more blocks, each starting with "[0-9]{10}\sDELIVERED" and ending with the last currency string before the next block.
If I test with "(?s)([0-9]{10}\sDELIVERED)(.*)(?<=\$167.36\n)" I successfully get the first block, but if I use "(?s)([0-9]{10}\sDELIVERED)(.*)(?<=\$\d\d\d.\d\d\n)" it grabs everything.
If someone can show me the changes that I need to make to return two or more blocks I would greatly appreciate it.
* is a greedy quantifier, so it tries to match as many characters as possible. See also Repetition with Star and Plus.
To fix it, you can use this regex:
(?s)(\d{10}\sDELIVERED)((.(?!\d{10}\sDELIVERED))*)(?<=\$\d\d\d\.\d\d)
Here I replaced .* with (.(?!\d{10}\sDELIVERED))* so that each character is consumed only if it is not followed by \d{10}\sDELIVERED; the match therefore stops before the next block begins.
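A small Python sketch of this per-character lookahead, assuming a shortened version of the report as input. The names report and block_re are made up for the example, and the inner group is non-capturing, which does not change what is matched.

```python
import re

# Shortened version of the two sample blocks from the question.
report = (
    "7605625112 DELIVERED N 1 GORDON CONTRACTORS I SIPLAST INC\n"
    "2 WBA HOT $62.00 0\n"
    "$12.92 $92.44\n"
    "$167.36\n"
    "7605625123 DELIVERED N 1 SECHRIST HALL CO SIPLAST INC\n"
    "2 HRL HOT $62.00 21\n"
    "$13.55 $98.21\n"
    "$173.76\n"
)

block_re = re.compile(
    r"(\d{10}\sDELIVERED)"            # block header
    r"((?:.(?!\d{10}\sDELIVERED))*)"  # any char not followed by a new header
    r"(?<=\$\d\d\d\.\d\d)",           # block must end in a $ddd.dd amount
    re.DOTALL,
)

blocks = [m.group(0) for m in block_re.finditer(report)]
for block in blocks:
    print(block)
    print("-----")
```

Note that the lookbehind only accepts amounts with exactly three digits before the decimal point, as in the original pattern; a block whose final amount has a different shape would need a different ending condition.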

Regular expression to debatch MT940 message

I got a message with below structure, where message starts from tag :20: and ends at :86:. I want to write a regular expression to extract the all messages.
I would write a C# utility to extract each message and put it in ArrayList.
:20:160212-2359
:21:600******444
:28C:00001/00001
.
.
.
:86:DAILY SETTLEMENT /ENTRY-13 MAR
:62F:D160212GBP1229387,45
:64:D160212GBP1229387,45
:65:D120314GBP1229387,45
:65:D120315GBP1229387,45
:65:D120316GBP1229387,45
:65:D120317GBP1229387,45
:65:D120318GBP1229387,45
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
some more comments in 86_2 segment
this is line2
:20:160212-2359
:21:B***22
:25:60*****88
.
.
.
:86:/ENTRY-13 MAR TRF/REF 6*******64 /ORD/ some line here
*********************** /BNF/ JO 88
:62F:C160212EUR13868931,00
:64:C160212EUR13868931,00
:65:C120314EUR13868931,00
:65:C120315EUR13791849,00
:65:C120316EUR13791849,00
:65:C120317EUR13791849,00
:65:C120318EUR13791849,00
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
some more comments in 86_2 segment.
:20:160212-2359
:21:B****X
:25:6*************1
:28C:00001/00001
:86:STORE1 EUROPE B.V. /ENTRY-15 MAR RTS/REF 6*****6 RTS
SWEPT FROM 9999 1**** XX***********BILLING CHARGES -
28FEB12 TRF/REF 6641XXX43799053 /ITEMCNT/004 /BNF/ /ITEMCNT/004
BILLING CHARGES
:61:1203130313DR10000000,00****288//6*****6
:86:STORE1 CNRTY SRL /ENTRY-13 MAR CLG/REF 66**********6
:61:1*****000,00NT*****9846//6******74
:86:NAME /ENTRY-13 MAR CLG/REF 6******4 LA C****R
**** CASH DEPOSIT STORE1
:61:1203150315DR48531,00NCHGBILLING CHARGES//6641XXX43799053
:86:BILLING CHARGES - 28FEB12 /ENTRY-15 MAR TRF/REF
66******53 /ITEMCNT/004
:62F:C160212EUR0,00
:64:C160212EUR0,00
:65:C120314EUR0,00
:65:C120315EUR0,00
:65:C120316EUR0,00
:65:C120317EUR0,00
:65:C120318EUR0,00
:86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED
{newline}
Actual values are replaced with '*' character.
Thanks
Dhiraj Bhavsar
Try this
:20:(.*?):86:
in code
/:20:(.*?):86:/gs
https://regex101.com/r/dW4zS3/1
.*? matches any character zero or more times, as few times as possible, expanding as needed; the s flag lets . match newlines as well, and g returns every match.
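One thing to watch for: the lazy pattern stops at the first :86: after each :20:, while the sample messages contain several :86: tags. A different technique, sketched here in Python and not taken from the answer above, is to let each message run until the next line that starts with :20: or the end of the input. The statement text is a shortened stand-in built from the sample.

```python
import re

# Shortened stand-in for the MT940 sample (two messages).
statement = (
    ":20:160212-2359\n"
    ":21:600******444\n"
    ":86:DAILY SETTLEMENT /ENTRY-13 MAR\n"
    ":62F:D160212GBP1229387,45\n"
    ":86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED\n"
    "this is line2\n"
    ":20:160212-2359\n"
    ":21:B***22\n"
    ":86:FORWARD AVAILABLE FUNDS SHOW ITEMS KNOWN BUT NOT YET POSTED\n"
)

# A message starts at a line beginning with ":20:" and runs (lazily) until
# the next such line or the end of the string.
message_re = re.compile(r"^:20:.*?(?=^:20:|\Z)", re.DOTALL | re.MULTILINE)

messages = message_re.findall(statement)
print(len(messages))
```

The same pattern should carry over to C# with RegexOptions.Singleline and RegexOptions.Multiline, using \z for the end of input, though I have not run it there.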

Validate Mobile number using regular expression

I need to validate a mobile number. My requirements:
The number may start with +8801 or 8801 or 01
The next digit can be 1 or 5 or 6 or 7 or 8 or 9
Then there must be exactly 8 digits.
How can I write a regular expression for these conditions?
The mobile numbers I tried:
+8801811419556
01811419556
8801711419556
01611419556
8801511419556
Should be pretty simple:
^(?:\+?88)?01[15-9]\d{8}$
^ - From start of the string
(?:\+?88)? - optional 88, which may be preceded by +
01 - mandatory 01
[15-9] - "1 or 5 or 6 or 7 or 8 or 9"
\d{8} - 8 digits
$ - end of the string
Working example: http://rubular.com/r/BvnSXDOYF8
Update 2020
As BTRC approved 2 new prefixes, 013 for Grameenphone and 014 for Banglalink, updated expression for now:
^(?:\+?88)?01[13-9]\d{8}$
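A quick way to check the updated expression against the numbers from the question is a small Python sketch like the one below. The helper name is_bd_mobile is made up for the example, and the two rejected numbers are my own test cases rather than values from the question.

```python
import re

# The updated expression from above, unchanged.
BD_MOBILE = re.compile(r"^(?:\+?88)?01[13-9]\d{8}$")


def is_bd_mobile(number: str) -> bool:
    """Return True if the whole string matches the pattern."""
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    return BD_MOBILE.fullmatch(number) is not None


samples = [
    "+8801811419556",
    "01811419556",
    "8801711419556",
    "01611419556",
    "8801511419556",
]
results = {number: is_bd_mobile(number) for number in samples}
print(results)
print(is_bd_mobile("01211419556"))  # third digit 2 is not in [13-9]
print(is_bd_mobile("0181141955"))   # one digit short
```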
You may use either of the given regular expressions to validate a Bangladeshi mobile number.
Solution 1:
/(^(\+88|0088)?(01){1}[56789]{1}(\d){8})$/
Robi, Grameen Phone, Banglalink, Airtel and Teletalk operator mobile numbers are allowed.
Solution 2:
/(^(\+8801|8801|01|008801))[15-9]{1}(\d){8}$/
Citycell, Robi, Grameen Phone, Banglalink, Airtel and Teletalk operator mobile numbers are allowed.
Allowed mobile number pattern
+8801812598624
008801812598624
01812598624
01712598624
01919598624
01672598624
01512598624
................
.................
Use the following regular expression; you can test it quickly on regex pal:
[8]*01[15-9]\d{8}
I know that this question was asked a long time ago, but I assume that @G. M. Nazmul Hossain wants to validate a mobile number against a chosen country. I'll show you how to do it with libphonenumber, a free library from Google. It's available for Java, C++ and JavaScript, and there are also forks for PHP and, I believe, other languages.
+880 tells me that this is the country code for Bangladesh. Let's try to validate the example numbers with the following code in Java:
import com.google.i18n.phonenumbers.NumberParseException;
import com.google.i18n.phonenumbers.PhoneNumberUtil;
import com.google.i18n.phonenumbers.Phonenumber.PhoneNumber;

String bdNumberStr = "8801711419556";
PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
boolean isValid = false;
try {
    // "BD" is the default region for Bangladesh (used for numbers without 880 at the beginning)
    PhoneNumber bdNumberProto = phoneUtil.parse(bdNumberStr, "BD");
    isValid = phoneUtil.isValidNumber(bdNumberProto); // returns true
} catch (NumberParseException e) {
    System.err.println("NumberParseException was thrown: " + e.toString());
}
That code will also handle numbers with spaces in them (for example "880 17 11 41 95 56"), or even with 00880 at the beginning (+ is sometimes replaced with 00).
Try it out yourself on demo page. Validates all of provided examples and even more.
Have a look at libphonenumber at:
https://code.google.com/p/libphonenumber/
Bangladeshi phone number (Citycell, Robi, Grameen Phone, Banglalink, Airtel and Teletalk operators) validation using a regular expression:
$pattern = '/(^(\+8801|8801|01|008801))[1-9]{1}(\d){8}$/';
$BangladeshiPhoneNo = "+8801840001417";
if (preg_match($pattern, $BangladeshiPhoneNo)) {
    echo "It is a valid Bangladeshi phone number";
}
**Laravel Bangladeshi Phone No validation for (Citycell, Robi, Grameen Phone, Banglalink, Airtel and Teletalk) and start with +88/88 then 01 then 356789 then 8 digit**
public function rules()
{
    return [
        'mobile' => 'sometimes|regex:/^(?:\+?88)?01[35-9]\d{8}$/',
    ];
}

public function messages()
{
    return [
        'mobile.regex' => 'Mobile no should be bd standard',
    ];
}

Using Perl to extract text from a text file

I have a question related to using regex to pull out data from a text file. I have a text file in the following format:
REPORTING-OWNER:
OWNER DATA:
COMPANY CONFORMED NAME: DOE JOHN
CENTRAL INDEX KEY: 99999999999
FILING VALUES:
FORM TYPE: 4
SEC ACT: 1934 Act
SEC FILE NUMBER: 811-00248
FILM NUMBER: 11530052
MAIL ADDRESS:
STREET 1: 7 ST PAUL STREET
STREET 2: STE 1140
CITY: BALTIMORE
STATE: MD
ZIP: 21202
ISSUER:
COMPANY DATA:
COMPANY CONFORMED NAME: ACME INC
CENTRAL INDEX KEY: 0000002230
IRS NUMBER: 134912740
STATE OF INCORPORATION: MD
FISCAL YEAR END: 1231
BUSINESS ADDRESS:
STREET 1: SEVEN ST PAUL ST STE 1140
CITY: BALTIMORE
STATE: MD
ZIP: 21202
BUSINESS PHONE: 4107525900
MAIL ADDRESS:
STREET 1: 7 ST PAUL STREET SUITE 1140
CITY: BALTIMORE
STATE: MD
ZIP: 21202
I want to save the owner's name (John Doe) and identifier (99999999999) and the company's name (ACME Inc) and identifier (0000002230) as separate variables. However, as you can see, the variable names (CENTRAL INDEX KEY and COMPANY CONFORMED NAME) are exactly the same for both pieces of information.
I've used the following code to extract the owner's information, but I can't figure out how to extract the data for the company. (Note: I read the entire text file into $data).
if($data=~m/^\s*CENTRAL\s*INDEX\s*KEY:\s*(\d*)/m){$cik=$1;}
if($data=~m/^\s*COMPANY\s*CONFORMED\s*NAME:\s*(.*$)/m){$name=$1;}
Any idea as to how I can extract the information for both the owner and the company?
Thanks!
There is a big difference between doing it quick and dirty with regexes (maintenance nightmare), or doing it right.
As it happens, the file you gave looks very much like YAML.
use YAML;
my $data = Load(...);
say $data->{"REPORTING-OWNER"}->{"OWNER DATA"}->{"COMPANY CONFORMED NAME"};
say $data->{"ISSUER"}->{"COMPANY DATA"}->{"COMPANY CONFORMED NAME"};
Prints:
DOE JOHN
ACME INC
Isn't that cool? All in a few lines of safe and maintainable code ☺
my ($ownname, $ownkey, $comname, $comkey) = $data =~ /\bOWNER DATA:\s+COMPANY CONFORMED NAME:\s+([^\n]+)\s*CENTRAL INDEX KEY:\s+(\d+).*\bCOMPANY DATA:\s+COMPANY CONFORMED NAME:\s+([^\n]+)\s*CENTRAL INDEX KEY:\s+(\d+)/ms;
If you're reading this file on a UNIX operating system but it was generated on Windows, then line endings will be indicated by the character pair \r\n instead of just \n, and in this case you should do
$data =~ tr/\r//d;
first to get rid of these \r characters and prevent them from finding their way into $ownname and $comname.
Select both bits of information at the same time so that you know that you're getting the CENTRAL INDEX KEY associated with either the owner or the company.
($name, $cik) = $data =~ /COMPANY\s+CONFORMED\s+NAME:\s+(.+)$\s+CENTRAL\s+INDEX\s+KEY:\s+(.*)$/m;
Instead of trying to match elements in the string, split it into lines, and parse properly into data structure that will let such searches be made easily, like:
$data->{"REPORTING-OWNER"}->{"OWNER DATA"}->{"COMPANY CONFORMED NAME"}
That should be relatively easy to do.
Search for the OWNER DATA: header, read one more line, split on : and take the last field. Do the same for the COMPANY DATA: header, and so on.
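The line-by-line idea can be sketched roughly as follows, here in Python rather than Perl. It assumes that a line with nothing after the colon is a section header and that every other line is a "key: value" pair belonging to the most recent header; parse_sections and sample are made-up names for the illustration.

```python
def parse_sections(text):
    """Group "key: value" lines under the most recent header line."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            # Header such as "OWNER DATA:" starts (or reopens) a section.
            current = sections.setdefault(key, {})
        elif current is not None:
            current[key] = value
    return sections


# Shortened copy of the file from the question.
sample = """\
REPORTING-OWNER:
OWNER DATA:
COMPANY CONFORMED NAME: DOE JOHN
CENTRAL INDEX KEY: 99999999999
ISSUER:
COMPANY DATA:
COMPANY CONFORMED NAME: ACME INC
CENTRAL INDEX KEY: 0000002230
"""

sections = parse_sections(sample)
print(sections["OWNER DATA"]["COMPANY CONFORMED NAME"])
print(sections["COMPANY DATA"]["CENTRAL INDEX KEY"])
```

Because the sections are flat, a header that appears twice (MAIL ADDRESS: in the full file) ends up in one shared entry where later values overwrite earlier ones; keeping them apart would need the parent header as part of the key.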