Regex to extract data

I'm new to regular expressions and have spent the last two days on this problem.
I have a string like this:
38_285_4461_186_S2A_MSIL2A_20180119T101331_N0206_R022_T32TQQ_20180119T135441
and I need four regular expressions to extract data from this string in four parts:
38
285
4461
186
I have more strings to evaluate and these values vary: each group contains only digits, but the number of digits varies.
An example of the string template is:
xx_xxx_xxxx_xx_S2... (where each x is a digit and the number of digits in each group varies)
I tried the following regex:
^(?:[^_]*\_){1}([^_]*)
Edit:
I need four regular expressions, one per group, and each result must be the full match.
I can't use Java. The regular expressions will be used in GeoServer:
http://docs.geoserver.org/latest/en/user/tutorials/imagemosaic_timeseries/imagemosaic_time-elevationseries.html

You could either use
^(\d+)_(\d+)_(\d+)_(\d+)
or simply split on the _ and use the array parts.
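For comparison, here is a minimal Python sketch of the capture-group pattern, run on the sample string from the question. It reads the four numbers from capture groups, so it does not by itself give a separate full match per group, which the edit asks for.

```python
import re

sample = "38_285_4461_186_S2A_MSIL2A_20180119T101331_N0206_R022_T32TQQ_20180119T135441"

# Four runs of digits at the start of the string, separated by underscores
match = re.match(r"^(\d+)_(\d+)_(\d+)_(\d+)", sample)
if match:
    print(match.groups())  # ('38', '285', '4461', '186')
```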

You don't need a regex to solve this problem; you can simply use split like this:
String[] values = "38_285_4461_186_S2A_MSIL2...".split("_");
// values[0] => 38
// values[1] => 285
// values[2] => 4461
// values[3] => 186
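The same split approach in Python, sketched with the full sample string from the question, keeps only the first four parts:

```python
sample = "38_285_4461_186_S2A_MSIL2A_20180119T101331_N0206_R022_T32TQQ_20180119T135441"

# Split on underscores and keep the four numeric parts at the front
values = sample.split("_")[:4]
print(values)  # ['38', '285', '4461', '186']
```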

Related

Regex match between n and m numbers but as much as possible

I have a set of strings that contain some letters, occasionally a single digit, and then somewhere a run of 2 or 3 digits. I need to match those 2 or 3 digits.
I have this:
\w*(\d{2,3})\w*
but then for strings like
AAA1AAA12A
AAA2AA123A
it matches '12' and '23' respectively, i.e. it fails to pick the three digits in the second case.
How do I get those 3 digits?
Here is how you would do it in Java.
The regex matches a run of 2 or 3 digits with a non-digit on each side.
The while loop uses find() to keep finding matches and prints each captured group. The 1 and the 1223 are ignored.
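import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk";
// a run of 2 or 3 digits with a non-digit on each side
String regex = "\\D(\\d{2,3})\\D";
Matcher m = Pattern.compile(regex).matcher(s);
while (m.find()) {
    System.out.println(m.group(1)); // print only the captured digits
}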
prints
12
21
123
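One limitation of \D(\d{2,3})\D is that it consumes the surrounding non-digits, so it cannot match a run at the very start or end of the string, or two runs separated by a single character. A possible alternative, sketched here in Python rather than Java, replaces the \D delimiters with lookarounds, which check the neighbouring characters without consuming them:

```python
import re

# 2 or 3 digits that are not preceded or followed by another digit
pattern = re.compile(r"(?<!\d)\d{2,3}(?!\d)")

s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk"
print(pattern.findall(s))        # ['12', '21', '123']
print(pattern.findall("12A34"))  # ['12', '34']
```

With \D(\d{2,3})\D, the input 12A34 yields no matches at all, because neither run has a non-digit on both sides.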
It looks like the correct answer is:
\w*?(\d{2,3})\w*
Basically, making the preceding expression lazy does the job.
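The reason is that digits are \w characters too. The greedy \w* first consumes the whole string and then gives characters back only until \d{2,3} can match, which leaves just two digits for the group. The lazy \w*? grows one character at a time and stops at the first position where \d{2,3} can match, so the group gets all three digits. A small Python sketch comparing the two on the strings from the question:

```python
import re

greedy = re.compile(r"\w*(\d{2,3})\w*")
lazy = re.compile(r"\w*?(\d{2,3})\w*")

for s in ["AAA1AAA12A", "AAA2AA123A"]:
    # group(1) holds the captured run of digits
    print(s, greedy.search(s).group(1), lazy.search(s).group(1))
# AAA1AAA12A 12 12
# AAA2AA123A 23 123
```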

RegEx which is working in Java is not working in Oracle script

I have to validate a string against some rules. They are:
The input may contain optional hyphens, but at most 3.
Hyphens should not be counted in length.
The length should be exactly 14 digits.
The string has to be numeric.
The string shouldn't contain more than 5 consecutive repeated digits.
My regular expression, which works as expected in Java, is:
^(?!.*?(\\d)\\1{5})(?=(?:[0-9]-?){14}$)[0-9]+(?:-[0-9]+){0,3}$
I am trying to implement the same logic in an Oracle script like this:
IF(REGEXP_LIKE(<myInput>,'(?=(?:[0-9]-?){14}$)')
AND NOT REGEXP_LIKE(<myInput>,'([0-9])(\1){5}')
AND REGEXP_LIKE(<myInput>,'^[0-9]+(?:-[0-9]+){0,3}$'))
THEN ....
END IF;
The regular expression that detects more than 5 consecutive repeated digits works properly, but (?=(?:[0-9]-?){14}$) and ^[0-9]+(?:-[0-9]+){0,3}$ do not work as expected.
Am I missing anything here?
I tried adding and removing brackets, start-of-line anchors, and end-of-line anchors around the expressions, but with no luck.
Oracle regular expressions do not support lookarounds, so we can enforce your logic with several separate checks instead:
WHERE myInput NOT LIKE '%-%-%-%-%' AND -- 3 hyphens maximum
LENGTH(REPLACE(myInput, '-', '')) = 14 AND -- length 14, not counting hyphens
REGEXP_LIKE(myInput, '^[0-9]+(-[0-9]+)*$') AND -- digits, with hyphens only between digits
NOT REGEXP_LIKE(myInput, '([0-9])\1{5}') -- no digit repeated 6 or more times in a row
Oracle regular expressions do not support positive or negative lookahead, or non-capturing groups, so you need to perform multiple checks for the different tests rather than trying to do it all in one regular expression.
You can do it without (slow) regular expressions using:
IF TRANSLATE( value, 'X0123456789-', 'X') IS NULL
AND LENGTH(REPLACE(value, '-')) = 14
AND LENGTH(value) <= 17
AND value NOT LIKE '%--%'
AND value NOT LIKE '%000000%'
AND value NOT LIKE '%111111%'
AND value NOT LIKE '%222222%'
AND value NOT LIKE '%333333%'
AND value NOT LIKE '%444444%'
AND value NOT LIKE '%555555%'
AND value NOT LIKE '%666666%'
AND value NOT LIKE '%777777%'
AND value NOT LIKE '%888888%'
AND value NOT LIKE '%999999%'
THEN
...
END IF;
This works because:
TRANSLATE( value, 'X0123456789-', 'X') IS NULL checks that the string only contains numeric or hyphen characters.
LENGTH(REPLACE(value, '-')) = 14 checks that the digit string is exactly 14 characters in length.
LENGTH(value) <= 17 checks that the total length is 17 or less and so there can be at most 3 hyphens.
value NOT LIKE '%--%' checks that the hyphens are separated.
value NOT LIKE '%000000%' (etc.) checks that there are not more than 5 continuous repetitive digits.
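To make the list above concrete, here is the same set of checks as a small Python function. The name is_valid is hypothetical and the function is only an illustration of the logic, not a replacement for the PL/SQL. Note that, like the original checks, it accepts a leading or trailing hyphen such as -12345678901234, which the Java pattern in the question would reject.

```python
def is_valid(value: str) -> bool:
    """Hypothetical Python mirror of the TRANSLATE/LENGTH/LIKE checks."""
    digits = value.replace("-", "")
    return (
        all(c in "0123456789-" for c in value)             # only digits and hyphens
        and len(digits) == 14                              # exactly 14 digits
        and len(value) <= 17                               # so at most 3 hyphens
        and "--" not in value                              # hyphens are separated
        and not any(d * 6 in value for d in "0123456789")  # no 6 identical digits in a row
    )

for candidate in ["12345678901234", "1234-5678-9012-34", "11111123456789", "12--345678901234"]:
    print(candidate, is_valid(candidate))
# 12345678901234 True
# 1234-5678-9012-34 True
# 11111123456789 False
# 12--345678901234 False
```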
If you do want to use regular expressions, then:
IF REGEXP_LIKE( value, '^\d+(-\d+){0,3}$')
AND LENGTH(REPLACE(value, '-')) = 14
AND NOT REGEXP_LIKE(value, '(\d)\1{5}')
THEN
...
END IF;
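The three regex-based checks can be tried out in Python, whose re module supports the same \d and \1 backreference syntax. This is a sketch with a hypothetical helper name; it uses re.fullmatch instead of ^...$ because Python's $ also matches just before a trailing newline, and re.ASCII so that \d means only 0-9.

```python
import re

def is_valid_re(value: str) -> bool:
    """Hypothetical Python sketch of the REGEXP_LIKE/LENGTH checks."""
    return (
        # digit groups separated by single hyphens, at most 3 hyphens
        re.fullmatch(r"\d+(-\d+){0,3}", value, re.ASCII) is not None
        and len(value.replace("-", "")) == 14
        # backreference: one digit followed by 5 more copies of itself
        and re.search(r"(\d)\1{5}", value) is None
    )

for candidate in ["1234-5678-9012-34", "-12345678901234", "11111123456789"]:
    print(candidate, is_valid_re(candidate))
# 1234-5678-9012-34 True
# -12345678901234 False
# 11111123456789 False
```

Because the repeated-digit check runs on the raw value, a hyphen inside a run (for example 11111-123456789) breaks the repetition, the same as in the Java pattern.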

Regular Expression for Whole Number (Thousands comma, No decimal)

I need to validate an input filter on a form because I cannot format the output due to the template I am using. This is a price field and I need the output to be uniform. I am able to use a regular expression to validate the input.
I would like the regex to match a whole number with thousands separated by commas. No decimals. No $.
Valid:
* 0 (just zero)
* 100
* 1,000
* 10,000
Not Valid:
* 01 (leading zero)
* 100.50
* 1000
* $10,000
* -10,000 (no negative numbers)
Does anyone know how to do this? I cannot find this and thought it should be a common regex.
I believe you need something similar to the following, using JavaScript:
^(0|[1-9][0-9]{0,2}(,[0-9]{3})*)$
You could use this regex:
^(0|[1-9][0-9]{0,2}(,[0-9]{3})*)$
Below is a Python implementation:
import re

# 0 on its own, or 1-3 leading digits followed by comma-separated groups of 3
regex = r"^(0|[1-9][0-9]{0,2}(,[0-9]{3})*)$"
num = "10,000"
if re.search(regex, num):
    print(True)
else:
    print(False)
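To check the pattern against every example from the question, here is a short Python loop. It uses (0|[1-9][0-9]{0,2}(,[0-9]{3})*), without the ^ and $ anchors because re.fullmatch already anchors both ends. The comma group sits inside the non-zero branch, so an input like 0,000 is rejected; if the comma group instead follows the whole alternation, 0,000 would be accepted.

```python
import re

# Whole numbers with comma thousands separators; 0 is only allowed on its own
pattern = re.compile(r"(0|[1-9][0-9]{0,2}(,[0-9]{3})*)")

valid = ["0", "100", "1,000", "10,000"]
invalid = ["01", "100.50", "1000", "$10,000", "-10,000", "0,000"]

for num in valid + invalid:
    print(num, bool(pattern.fullmatch(num)))
```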

Regular expression to select set of numbers

I want to extract a set of 3-digit numbers from a string. But some numbers are attached to specific text, and those numbers should not be included in the output.
Input:
C123456 577 abcd 173944 C5678541883
The result should be:
577 173 944 188
How can I achieve this?
I assume you only want to match whole numbers, without any other characters, separated by whitespace. If you really want to capture 188 too, and 173944 split into two parts, please leave a comment with more information about what it should actually match (numbers with 3 or 6 characters, etc.).
So, to extract all numbers from a whitespace-separated string, you can use the following regular expression. Make sure to use the global flag /g:
\b([0-9]+)\b
Here is an example:
var text = "C123456 577 abcd 173944 C5678541883";
var regex = /\b([0-9]+)\b/g;
var match = regex.exec(text);
while (match !== null) {
    console.log(match[1]);
    match = regex.exec(text);
}
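In Python, re.findall can replace the exec loop: it returns every non-overlapping match at once (the captured group, since the pattern has exactly one). A minimal sketch on the same input:

```python
import re

text = "C123456 577 abcd 173944 C5678541883"

# \b needs a word boundary on both sides, so digits glued to a letter
# (C123456, C5678541883) are skipped
numbers = re.findall(r"\b([0-9]+)\b", text)
print(numbers)  # ['577', '173944']
```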

Regular expression to capture first n digits from comma separated strings

I quickly found a working multi-line regular expression for my needs, but I'm having trouble converting it to work on a single line.
So, consider this input with regex /^[2-9]\d{1}(?:\s){0}/gm applied:
4126-54D429-001,
5149-A42102-002,
9251-Z48910-003
...
However, when I put it all on one line, I only get the first two digits of the first entry (41) in the output:
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003 ...
How can this regex be written to capture the first two digits of every entry (41, 51, 92, ...) in:
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003 ... ?
This should work.
REGEXP
\b\d{2}(?=\d{2})
INPUT
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003, 7851-Z48910-003
OUTPUT
41
51
92
78
The comma is not essential.
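The same pattern can be used from Python with re.findall, sketched here on the input above. The lookahead (?=\d{2}) checks that two more digits follow without including them in the match.

```python
import re

text = "4126-54D429-001, 5149-A42102-002, 9251-Z48910-003, 7851-Z48910-003"

# Two digits at a word boundary that are followed by two more digits
firsts = re.findall(r"\b\d{2}(?=\d{2})", text)
print(firsts)  # ['41', '51', '92', '78']
```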
This will capture the first two digits of each entry in a group:
(\d{2})[^,]*
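A minimal Python sketch of this capture-group version, on the one-line input from the question. Because [^,]* swallows the rest of the entry up to the next comma, each entry yields at most one match, and re.findall returns only group 1.

```python
import re

text = "4126-54D429-001, 5149-A42102-002, 9251-Z48910-003"

# findall returns group 1: the first two digits of each comma-separated entry
firsts = re.findall(r"(\d{2})[^,]*", text)
print(firsts)  # ['41', '51', '92']
```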