Extract fixed length string between two numbers - regex

I have this number: 003859389453604802410207622210986832370060. In this instance, I need to extract 07622210986832, which is preceded by 02 and followed by 37.
In the real world, the number (here 07622210986832) is always 14 digits long and is always preceded by 02 and followed by 37, BUT it could appear at any point in a string of random length. All we know is that the number will be there somewhere.
I'm currently using the formula:
=IF(LEN(IFERROR(REGEXEXTRACT(A1:A&"", "02(.*)37")))=14,
However, you will notice in the number sample there is another 02 - "024102".
This is causing an issue.
What I really want to happen is:
Look up 02.
Take the next 14 digits; if digits 15 and 16 are 3 and 7 (37), that is the number we need.
If you find another 02 followed by 14 digits where the next two digits are not 37, ignore it.

Use the pattern 02(\d{14})37. It extracts a sequence of 14 digits preceded by 02 and followed by 37.
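To see how the pattern behaves on the sample from the question, here is a minimal Python sketch. The helper name `extract_code` is made up for illustration; only the pattern itself comes from the answer. Because `\d{14}` is a fixed length, the earlier "02" inside "024102" cannot produce a match: the 14 digits after it are not followed by 37, so the engine moves on to the next candidate.

```python
import re
from typing import Optional

# Pattern from the answer: literal 02, exactly 14 digits (captured), literal 37.
PATTERN = re.compile(r"02(\d{14})37")


def extract_code(text: str) -> Optional[str]:
    """Return the first 14-digit run framed by 02 ... 37, or None if absent.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.search(text)
    return match.group(1) if match else None


sample = "003859389453604802410207622210986832370060"
print(extract_code(sample))  # 07622210986832
```

The same pattern string can be passed unchanged to REGEXEXTRACT in Google Sheets, which returns the contents of the capture group.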

Try it like this:
=ARRAYFORMULA(REGEXEXTRACT(TO_TEXT({A2:A,B2:B,C2:C}), "02(\d{14})37"))
If you want to smash it into one column, then:
=ARRAYFORMULA(TRIM(TRANSPOSE(QUERY(TRANSPOSE(REGEXEXTRACT(TO_TEXT({A2:A,B2:B,C2:C}),
"02(\d{14})37")),,999^99))))


Regex for 10 digit phone number with variable spacing

I need to validate that a string follows these rules:
contains numerals
may optionally contain any number of space characters in any position
may not contain any other kind of character
the first two numerals must be one of the set: 02; 03; 07; 08; 13; 18
and the number of numerals must be exactly 10 unless the first two numerals are 1 and 3, in which case the number of numerals may be 10 or 6.
Essentially these are Australian landline (with area code), free-call and 13 numbers.
Ideally the regex should be as implementation-agnostic as possible.
Examples of valid input:
0299998888
02 99998888
02 9999 8888
02 99 998 888
0299 998 888
0299 998888
131999
131 999
13 19 99
1300123456
1300 123456
1300 123 456
1300 12 34 56
1300 12 34 56
PS. I've checked at least 5 other answers and searched for multiple variations of this question, to no avail.
The nearest I have is:
^(?=\d{10}$)(02|03|04|07|08|13|18)\d+
... however this does not account for spacing and won't accept 6 digit numbers beginning with 13.
Note, in theory, the following is acceptable:
1 3 1999
1 3 1 9 9 9
By this I mean that the first pair of numerals may have a space between them (as bad as that looks).
Following are examples of random numbers that should fail:
13145 (not enough numerals)
1300-123-456 (hyphens not permitted)
9999 8888 (not enough numerals)
(02) 9999 8888 (parentheses not permitted)
You can make a separate pattern for 13 in alternation:
^(?:(?=(?:\s*\d\s*){10}$)(?:0\s*[2378]|1\s*[38])|(?=(?:\s*\d\s*){6}$)1\s*3).*
Demo: https://regex101.com/r/Hkjus2/2
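As a quick check of the pattern above against the examples in the question, here is a small Python sketch. The function name `is_valid_number` is an assumption for illustration; the regex is copied from the answer as-is. The two lookaheads do the counting (exactly 10 or exactly 6 digits, with nothing but digits and whitespace), and the part after each lookahead checks the leading pair.

```python
import re

# Regex from the answer, unchanged.
PHONE = re.compile(
    r"^(?:(?=(?:\s*\d\s*){10}$)(?:0\s*[2378]|1\s*[38])"
    r"|(?=(?:\s*\d\s*){6}$)1\s*3).*"
)


def is_valid_number(text: str) -> bool:
    """Hypothetical wrapper: True if text passes the answer's pattern."""
    return PHONE.match(text) is not None


valid = ["0299998888", "02 99 998 888", "131999", "13 19 99",
         "1300 123 456", "1 3 1999", "1 3 1 9 9 9"]
invalid = ["13145", "1300-123-456", "9999 8888", "(02) 9999 8888"]

for number in valid + invalid:
    print(repr(number), is_valid_number(number))
```

Note that 04 is rejected by this pattern, which follows the rule list in the question rather than the asker's own attempt.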

Regular expression to validate 2 character hex string

I have a source of data that was converted from an Oracle database and loaded into Hadoop storage. One of the columns was a BLOB and therefore had lots of control characters and unreadable/undetectable characters outside of the available character set. I am using Impala to write a regex replace function to parse out some of the characters that the regex library cannot understand. I would like to remove the offending 2-character hex codes BEFORE I use the unhex query function so that I can do the rest of the regex parsing with a "clean" string.
Here's the code I've used so far, which doesn't quite work:
'[2-7]{1}([A-Fa-f]|[0-9]{1})'
I've determined that I only need to capture \u0020-\u007f, or, represented as two-digit hex, 20-7F.
If my string looks like this:
010A000000153020405C00000000143020405CBC000000F53320405C4C010000E12F204058540100002D01
I would like to be able to capture 2 characters at a time (e.g. 01, 0A, 00), evaluate whether or not each pair fits the acceptable range of two-digit hex I mentioned above, and return only what is acceptable.
The correct output should be:
30 20 40 5C 30 20 40 5C 33 20 40 5C 4C 2F 20 40 58 and 54
However, my expression finds the first acceptable digit in my first range (5) and starts the capture from there, which misaligns the matching for the rest of the string. This is what my expression returns:
010A0000001**53**0**20****40****5C**000000001**43**0**20****40****5C**BC000000F**53****32**0**40****5C****4C**010000E1**2F****20****40****58****54**010000**2D**01
I just don't know how to evaluate only two characters at a time in a mixed-length string. And, if they don't fit the expression, iterate to the next two characters. But only in two character increments.
My example: https://regex101.com/r/BZL7t0/1
I have added a positive lookbehind to it, which starts at the beginning of the string and then matches 2 characters at a time. This ensures that the group you're matching is always preceded by a whole number of 2-character groups.
Positive lookbehind:
(?<=^(..)*)
Updated regex:
(?<=^(..)*)([2-7]{1}[A-Fa-f0-9]{1})
Preview:
Regex101

Printing every possible combination but must begin with 1-2 numbers and end with 10 characters of any kind

Is it possible to print every combination that begins with 1 or 2 digits, then one hyphen (-), and finally 10 characters from a-z, A-Z and 0-9?
Ex. 2-ErZI2eQSZ4
Ex. 16-teqOb7MU1g
Each combination would be 12 or 13 characters long.
How long would it take and how big .txt would it be approximately?
If you look at it statistically, there are two sets of combinations: the ones beginning with 1 number, and the ones beginning with 2 numbers. In the former case, there are 10 ways to pick the first number and 62 ways (26 lowercase letters + 26 uppercase letters + 10 digits = 62 characters) to pick each of the 10 characters. So this gives us 10 * 62 ^ 10 possible outcomes for the former case.
The latter case has 10 ways to pick the first number, 10 ways to pick the second number, and 62 ways to pick each of the 10 other characters. So this gives us 10 * 10 * 62 ^ 10. Thus, the .txt file would have 10 * 62 ^ 10 + 10 * 10 * 62 ^ 10 = 110 * 62 ^ 10 lines, which is roughly 9.2 * 10 ^ 19. How long this would take depends on whether you are doing this by hand or by computer. It also depends on the language you are using if you plan to program this (which I sure would, if I had to generate all these combinations).
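The counting argument can be checked with a few lines of Python, which also gives a rough answer to the file-size and time questions. This is a sketch under stated assumptions: one byte per character, a one-byte newline per line, and a purely hypothetical output rate of one billion lines per second (`LINES_PER_SECOND` is an invented figure, not a measurement).

```python
ALPHABET_SIZE = 26 + 26 + 10          # a-z, A-Z, 0-9
SUFFIX_LENGTH = 10

suffixes = ALPHABET_SIZE ** SUFFIX_LENGTH
one_digit_lines = 10 * suffixes       # prefixes 0-9
two_digit_lines = 10 * 10 * suffixes  # prefixes 00-99
total_lines = one_digit_lines + two_digit_lines

# Bytes per line: prefix digits + hyphen + 10 characters + newline (assumed 1 byte).
total_bytes = one_digit_lines * (1 + 1 + 10 + 1) + two_digit_lines * (2 + 1 + 10 + 1)

LINES_PER_SECOND = 1_000_000_000      # assumption, for a rough time estimate only
seconds = total_lines / LINES_PER_SECOND
years = seconds / (365.25 * 24 * 3600)

print(f"lines: {total_lines:.3e}")
print(f"bytes: {total_bytes:.3e}")
print(f"years at the assumed rate: {years:.0f}")
```

Under these assumptions the file would be on the order of 10^21 bytes (around a zettabyte), and writing it would take thousands of years, so enumerating every combination is not practical.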

How to add zero in front of single digit values using REGEX in pentaho

I have the month values in a flat file like
Month
12
11
1
2
8
10
Now I want to add a zero in front of single-digit values and leave double-digit values unchanged.
The output should look like
Month
12
11
01
02
08
10
I am doing this in Pentaho (I will implement it in the Replace in string step).
I am not familiar with Pentaho, but the following regex should work in most languages:
Match : \b([0-9])\b
Replace : 0$1
regex101 demo
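For reference, here is the same match-and-replace as a Python sketch. The function name `pad_single_digits` is made up for illustration, and the replacement syntax differs by engine: Python writes the group reference as `\g<1>` where the answer above uses `$1`. The word boundaries are what keep the digits inside 12, 11 and 10 from matching.

```python
import re


def pad_single_digits(text: str) -> str:
    """Prefix a zero to every digit that stands alone as a whole word.

    Hypothetical helper for illustration only.
    """
    return re.sub(r"\b([0-9])\b", r"0\g<1>", text)


months = "Month\n12\n11\n1\n2\n8\n10"
print(pad_single_digits(months))
```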

Trying to extract the last number on a line, with sets of numbers delimited by spaces

So I've extracted the digits from a log file and it looks like this:
2011 04 13 23 54 14 601 04 13 23 54 14 10 35 1 14 8080 59 250
What I'm trying to get is the last number (250), and it will loop through each line of the log. Once I get the last number from each line, I will do some calculations...I just can't extract that last number at the end of the line. Thanks!
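while (<>) {
    # Capture the run of digits at the very end of the line
    # ($ also matches just before the trailing newline).
    my ($last) = /(\d+)$/;
    # ... use $last in your calculations here
}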
If your data is in an array, @digits, then the last one is $digits[-1].
If your data is in a string, use split to get it into an array.
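Both approaches (anchoring a regex at the end of the line, and splitting then taking the last element) carry over to other languages. Here is a minimal Python sketch of each; the function names are invented for illustration, and the regex version additionally tolerates trailing whitespace, which is an assumption about the log format.

```python
import re
from typing import Optional


def last_number_regex(line: str) -> Optional[int]:
    """Return the final run of digits on the line, or None if there is none."""
    match = re.search(r"(\d+)\s*$", line)
    return int(match.group(1)) if match else None


def last_number_split(line: str) -> Optional[int]:
    """Split on whitespace and take the last field (assumes all fields are digits)."""
    fields = line.split()
    return int(fields[-1]) if fields else None


line = "2011 04 13 23 54 14 601 04 13 23 54 14 10 35 1 14 8080 59 250"
print(last_number_regex(line), last_number_split(line))  # 250 250
```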