Regex for validating input C 200 50

How do I write a regex for the input below?
C 200 50
C/c can be upper case or lower case.
200 - 0 to 200 range
50 - 0 to 50 range
The three tokens are separated by one or more spaces.
This is what I have tried so far:
public static void main(String[] args) {
String input = "C 200 50";
String regex = "C{1} ([01]?[0-9]?[0-9]|2[0-9][0]|20[0]) ([01]?[0-5]|[0-5][0])";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
boolean found = false;
while (matcher.find()) {
System.out.println("I found the text "+matcher.group()+" starting at index "+
matcher.start()+" and ending at index "+matcher.end());
found = true;
}
}
I am not sure how to allow multiple spaces, or an upper- or lower-case first 'C'.

If you are validating a string, you expect a whole-string match. That means you should use .matches() rather than .find(), because .matches() requires the entire string to match.
To make c match both c and C, you may pass the Pattern.CASE_INSENSITIVE flag to Pattern.compile, or prepend the pattern with the (?i) embedded flag option.
To match one or more spaces, use " +" (a space followed by +); use \\s+ to match any whitespace.
To allow leading zeros, prepend the number-matching parts with 0*.
Hence, you may use
String regex = "(?i)C\\s+0*(\\d{1,2}|1\\d{2}|200)\\s+0*([1-4]?\\d|50)";
See the Java demo:
String input = "C 200 50";
String regex = "(?i)C +0*(\\d{1,2}|1\\d{2}|200) +0*([1-4]?\\d|50)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
boolean found = false;
if (matcher.matches()) {
System.out.println("I found the text "+matcher.group()+" starting at index "+
matcher.start()+" and ending at index "+matcher.end());
found = true;
}
Output:
I found the text C 200 50 starting at index 0 and ending at index 8
If you need a partial match, use the pattern with .find() method in a while block. To match whole words, wrap the pattern with \\b:
String regex = "(?i)\\bC\\s+0*(\\d{1,2}|1\\d{2}|200)\\s+0*([1-4]?\\d|50)\\b";
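As a rough illustration of the whole-string validation described above, the sketch below wraps the answer's pattern in a small helper and runs it on a few inputs. The class name RangeInputValidator, the method isValid and the sample inputs are made up for this example; only the pattern itself comes from the answer.

```java
import java.util.regex.Pattern;

final class RangeInputValidator {
    // Same pattern as in the answer above, compiled once for reuse.
    private static final Pattern INPUT = Pattern.compile(
            "(?i)C\\s+0*(\\d{1,2}|1\\d{2}|200)\\s+0*([1-4]?\\d|50)");

    // matches() succeeds only if the entire string fits the pattern.
    static boolean isValid(String input) {
        return INPUT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs: valid, lower case with extra spaces,
        // leading zeros, out-of-range values, and a missing space.
        String[] samples = { "C 200 50", "c   7 0", "C 007 050", "C 201 50", "C 200 51", "C200 50" };
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + isValid(sample));
        }
    }
}
```

Under these assumptions the first three samples should be accepted and the last three rejected, because 201 and 51 fall outside the ranges and "C200" lacks the separating space.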

Regex greedy to pull only required information

I have the following string:
CF-123/NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123
The regex must be compatible with Java, and it needs to be done in one statement.
I have to pick some pieces of information from this string, each prefixed with a predefined tag, and put them into named groups.
(CF-) confirmationNumber = 123
(NAME-) name = ANUBHAV
(RT-) rate = INR 450
(SI-) specialInformation = No smoking
(SC-) serviceCode = 123
I have written the regex below:
^(CF-(?<confirmationNumber>.*?)(\/|$))?(([^\s]+)(\/|$))?(NAME-(?<name>.*?)(\/|$))?([^\s]+(\/|$))?(RT-(?<rate>.*?)(\/|$))?([^\s]+(\/|$))?(SI-(?<specialInformation>.*?)(\/|$))?([^\s]+(\/|$))?(SC-(?<serviceCode>.*)(\/|$))?
The input can take several forms:
**1st:** CF-123/**Ignore**/NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123
**2nd:** CF-123//NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123
**3rd:** CF-123/NAME-ANUBHAV/RT-INR 450/**Ignore**/SI-No smoking/SC-123
There can be other tags in the string, separated by /, which we don't need to capture in our named groups.
Basically, we need to pick CF-, NAME-, RT-, SI- and SC- and assign their values to confirmationNumber, name, rate, specialInformation and serviceCode. Anything else that appears in between does not need to be captured.
To find the five pieces of information you are interested in, you can use a pattern with named groups and compile it with the regex Pattern class.
Then you can use a regex Matcher to find the groups:
String line = "CF-123/**Ignore**/NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123";
String pattern = "CF-(?<confirmationNumber>[^/]+).*NAME-(?<name>[^/]+).*RT-(?<rate>[^/]+).*SI-(?<specialInformation>[^/]+).*SC-(?<serviceCode>[^/]+).*";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
After that, you can work with the matched groups:
if (m.find( )) {
String confirmationNumber = m.group("confirmationNumber");
String name = m.group("name");
String rate = m.group("rate");
String specialInformation = m.group("specialInformation");
String serviceCode = m.group("serviceCode");
// continue with your processing
} else {
System.out.println("NO MATCH");
}
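To check the pattern above against the three scenarios from the question, a small sketch along these lines may help. The class TagExtractor, the method extract and the plain "Ignore" placeholder tag are assumptions made for illustration; the pattern is the one from the answer, which expects all five tags to be present in this order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagExtractor {
    private static final String[] GROUPS = {
        "confirmationNumber", "name", "rate", "specialInformation", "serviceCode" };

    // Pattern from the answer above; [^/]+ stops each value at the next slash.
    private static final Pattern TAGS = Pattern.compile(
        "CF-(?<confirmationNumber>[^/]+).*NAME-(?<name>[^/]+).*RT-(?<rate>[^/]+)"
        + ".*SI-(?<specialInformation>[^/]+).*SC-(?<serviceCode>[^/]+).*");

    // Returns the five named values, or an empty map when the line does not match.
    static Map<String, String> extract(String line) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = TAGS.matcher(line);
        if (m.find()) {
            for (String group : GROUPS) {
                values.put(group, m.group(group));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] scenarios = {
            "CF-123/Ignore/NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123",
            "CF-123//NAME-ANUBHAV/RT-INR 450/SI-No smoking/SC-123",
            "CF-123/NAME-ANUBHAV/RT-INR 450/Ignore/SI-No smoking/SC-123" };
        for (String scenario : scenarios) {
            System.out.println(extract(scenario));
        }
    }
}
```

Because every tag is mandatory in this pattern, a line that lacks one of them yields an empty map rather than a partial result.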

how to get a number between two characters?

I have this string:
String values="[52,52,73,52],[23,32],[40]";
How can I get only the number 40?
I'm trying the pattern "\\[^[0-9]*$\\]", but I've had no luck.
Can someone provide me with the appropriate pattern?
There is no need to use ^
The correct regex here is \\[([0-9]+)\\]$
If you are sure there is only a single number inside the [], this simple regex would do:
\\[(\\d+)\\]
You could update your pattern to use a capturing group with a + quantifier after the character class, and omit the ^ anchor, which asserts the start of the string.
Move the end-of-string anchor $ to the end of the pattern:
\\[([0-9]+)\\]$
For example:
String regex = "\\[([0-9]+)\\]$";
String string = "[52,52,73,52],[23,32],[40]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
if(matcher.find()) {
System.out.println(matcher.group(1)); // 40
}
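The same pattern can be wrapped in a helper that also reports the "no match" case. This is only a sketch; the class LastBracketNumber and the method find are hypothetical names introduced here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastBracketNumber {
    // A single run of digits in brackets, anchored at the end of the input.
    private static final Pattern LAST = Pattern.compile("\\[([0-9]+)\\]$");

    // Returns the number in the trailing [n] group, or empty if the input
    // does not end with a bracket pair holding exactly one number.
    static Optional<String> find(String values) {
        Matcher m = LAST.matcher(values);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(find("[52,52,73,52],[23,32],[40]"));
        System.out.println(find("[52,52,73,52],[23,32]"));
    }
}
```

The second call has no single-number group at the end, so it should give an empty Optional instead of a partial value.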
Given that you appear to be using Java, I recommend taking advantage of String#split here:
String values = "[52,52,73,52],[23,32],[40]";
String[] parts = values.split("(?<=\\]),(?=\\[)");
String[][] contents = new String[parts.length][];
for (int i=0; i < parts.length; ++i) {
contents[i] = parts[i].replaceAll("[\\[\\]]", "").split(",");
}
// now access any element at any position, e.g.
String forty = contents[2][0];
System.out.println(forty);
What the above snippet generates is a jagged 2D Java String array, where the first index corresponds to the array in the initial CSV, and the second index corresponds to the element inside that array.
Why not just use String.substring if you need the content between the last [ and last ]:
String values = "[52,52,73,52],[23,32],[40]";
String wanted = values.substring(values.lastIndexOf('[')+1, values.lastIndexOf(']'));

Regular expression to match all digits of unknown length except the last 4 digits

There is a number with unknown length and the idea is to build a regular expression which matches all digits except last 4 digits.
I have tried a lot to achieve this but no luck yet.
Currently I have this regex: "^(\d*)\d{0}\d{0}\d{0}\d{0}.*$"
Input: 123456789089775
Expected output: XXXXXXXXXXX9775
I am using it as follows (and this doesn't work):
String accountNumber ="123456789089775";
String pattern = "^(\\d*)\\d{1}\\d{1}\\d{1}\\d{1}.*$";
String result = accountNumber.replaceAll(pattern, "X");
Please suggest how I should approach this problem or give me the solution.
In this case my whole point is to negate the regex : "\d{4}$"
You may use
\G\d(?=\d{4,}$)
See the regex demo.
Details
\G - start of string or end of the previous match
\d - a digit
(?=\d{4,}$) - a positive lookahead that requires 4 or more digits up to the end of the string immediately to the right of the current location.
Java demo:
String accountNumber ="123456789089775";
String pattern = "\\G\\d(?=\\d{4,}$)"; // Or \\G.(?=.{4,}$)
String result = accountNumber.replaceAll(pattern, "X");
System.out.println(result); // => XXXXXXXXXXX9775
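To see how the \G-based pattern behaves on edge cases, here is a minimal sketch. The class AccountMasker, the method mask and the extra sample inputs are assumptions for illustration and are not part of the answer.

```java
final class AccountMasker {
    // Replaces each leading digit that is followed by at least four more
    // digits up to the end of the string; \G keeps matches contiguous from
    // the start, so input with a non-digit prefix is left untouched.
    static String mask(String accountNumber) {
        return accountNumber.replaceAll("\\G\\d(?=\\d{4,}$)", "X");
    }

    public static void main(String[] args) {
        // Hypothetical samples: long number, exactly four digits,
        // five digits, and a value with a non-digit prefix.
        String[] samples = { "123456789089775", "9775", "12345", "abc123456789" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + mask(sample));
        }
    }
}
```

With four or fewer digits nothing is replaced, since the lookahead needs four digits to remain after the one being masked.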
DDeMartini's answer would also accept values with a non-numeric prefix, because "^(.*)" matches input such as abcdef1234 as well, so stick to your \d syntax:
"^(\\d+)(\\d{4}$)"
This works and requires digits only (minimum length 5 characters: at least one digit to mask plus the last four). I tested it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccountNumberPadder {
    // Group 1: leading digits to mask; group 2: the last four digits to keep.
    private static final Pattern LAST_FOUR_DIGITS = Pattern.compile("^(\\d+)(\\d{4})$");
    public static void main(String[] args) {
        String[] accountNumbers = new String[] { "123456789089775", "999775", "1234567890897" };
        for (String accountNumber : accountNumbers) {
            Matcher m = LAST_FOUR_DIGITS.matcher(accountNumber);
            if (m.find()) {
                System.out.println(paddIt(m));
            } else {
                throw new RuntimeException(String.format("Whooaaa - don't work for %s", accountNumber));
            }
        }
    }
    public static String paddIt(Matcher m) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < m.group(1).length(); i++) {
            b.append("X");
        }
        // Append the kept digits directly. String.replace on the whole input
        // would also replace a later repeat of the group 1 text (e.g. "12121212").
        return b.append(m.group(2)).toString();
    }
}
Try:
String pattern = "^(.*)[0-9]{4}$";
Addendum after comment: A refactor to only match full numerics could look like this:
String pattern = "^([0-9]+)[0-9]{4}$";

Regular expression to match n times in which n is not fixed

The pattern I want to match is a sequence of length n where n is right before the sequence.
For example, when the input is "1aaaaa", I want to match the single character "a", as the first number specifies only 1 character is matched.
Similarly, when the input is "2aaaaa", I want to match the first two characters "aa", but not the rest, as the number 2 specifies that two characters will be matched.
I understand a{1} and a{2} will match "a" one or two times. But how to match a{n} in which n is not fixed?
Is it possible to do this type of match using regular expressions?
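This works for repeated groups, each made of a single-digit count followed by a run of lowercase letters:
import re
a = "1aaa2bbbbb1cccccccc4dddddddddddd"
for b in re.findall(r'\d[a-z]+', a):
    n = int(b[0])  # the leading digit is the count
    print(b[1:1 + n])  # the first n letters after the digit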
Output:
a
bb
c
dddd
Although I have done this in Java, it should help you get going in your program.
You can take the first character of the input string as a substring and use it in your regex as the repetition count.
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DynamicRegex {
public static void main(String args[]){
Scanner scan = new Scanner(System.in);
System.out.println("Enter a string: ");
String str = scan.nextLine();
String testStr = str.substring(0, 1); //Get the first character from the string using sub-string.
String pattern = "a{"+ testStr +"}"; //Use the sub-string in your regex as length of the string to match.
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(str);
if(m.find()){
System.out.println(m.group());
}
}
}
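A variation on the two-step idea above, sketched under some assumptions: the count may have several digits, the run may consist of any lowercase letters, and the count fits in an int. The class CountPrefixedRuns and the method extract are names invented for this example. A single regex cannot read the number and use it as its own quantifier, so the count is parsed first and then applied with substring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CountPrefixedRuns {
    // Group 1: the count; group 2: the run of lowercase letters after it.
    private static final Pattern RUN = Pattern.compile("(\\d+)([a-z]+)");

    // For every "<count><letters>" run, keeps the first <count> letters
    // (or the whole run if it is shorter than the count).
    // Assumption: the count is small enough for Integer.parseInt.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            String letters = m.group(2);
            result.add(letters.substring(0, Math.min(n, letters.length())));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("1aaaaa"));
        System.out.println(extract("2aaaaa"));
        System.out.println(extract("1aaa2bbbbb1cccccccc4dddddddddddd"));
    }
}
```

Capping the length with Math.min is a design choice for this sketch; throwing an exception when the run is shorter than the count would be equally reasonable.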

Regex return only digits up to a new line character

I have this text, from which I need to extract only the digits on the first line:
+1-415-655-0001 US TOLL
Access code: 197 703 792
The regex I have extracts all the digits:
\d+
var noteStr = "1-415-655-0001 US TOLL\n\nAccess code: 197 703 792"
findCodeInWord()
func findCodeInWord() -> String?
{
let regex = try! NSRegularExpression(pattern: "\\d+", options: [])
var items = [String]()
regex.enumerateMatchesInString(noteStr, options: [], range: NSMakeRange(0, noteStr.characters.count)) { result, flag, stop in
guard let match = result else {
// result is nil
return
}
let range = match.rangeAtIndex(0)
var matchStr = (noteStr as NSString).substringWithRange(range)
print(matchStr)
}
return items.joinWithSeparator("")
}
But this returns all the digits. I only want it to return 14156550001.
You can extract these numbers with a single regex based on a \G operator and capturing the digits into Group 1:
\G(?:[^\d\n\r]*(\d+))
The pattern captures into Group 1 only the digit sequences (1 or more, with (\d+)) that are on the first line. This works because the \G operator matches at the beginning of the string and then at the end of each successful match, and the [^\d\n\r]* character class matches 0+ characters other than a digit, LF or CR.
Thus, when matching starts, 1 is found and captured, then - is matched with [^\d\n\r]*, then 415 is matched and captured, and so on. When \n is encountered, no further match is found, the \G anchor fails, and the whole regex search stops at the first line.
Swift:
let regex = try! NSRegularExpression(pattern: "\\G(?:[^\\d\n\r]*(\\d+))", options: [])
...
let range = match.rangeAtIndex(1)
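For readers who want to try the \G pattern outside Swift, the same idea can be sketched in Java. The class FirstLineDigits and the method extract are hypothetical names; the pattern is the one from the answer, with the non-capturing group dropped because it is not needed here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstLineDigits {
    // \G ties each match to the end of the previous one, so the chain of
    // matches breaks at the first line break and later digits are ignored.
    private static final Pattern DIGITS = Pattern.compile("\\G[^\\d\\n\\r]*(\\d+)");

    // Concatenates all digit runs found on the first line of the text.
    static String extract(String text) {
        StringBuilder digits = new StringBuilder();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            digits.append(m.group(1));
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        String noteStr = "1-415-655-0001 US TOLL\n\nAccess code: 197 703 792";
        System.out.println(extract(noteStr));
    }
}
```

If the first line holds no digits at all, the first attempt fails and the result should be an empty string, even when later lines contain digits.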
"I have this text in which i need to extract only the digits on the first line"
While regex is very useful at times, for a task as simple as extracting only the number characters from a given string, Swift's native pattern matching is a useful tool. It is appropriate here because the UnicodeScalar representations of the numbers 0 through 9 are in sequence:
var noteStr = "1-415-655-0001 US TOLL\n\nAccess code: 197 703 792"
/* since we're using native pattern matching, let's use a native method
also when extracting the first row (even if this is somewhat simpler
if using Foundation bridged NSString methods) */
if let firstRowChars = noteStr.characters.split("\n").first,
case let firstRow = String(firstRowChars) {
// pattern matching for number characters
let pattern = UnicodeScalar("0")..."9"
let numbers = firstRow.unicodeScalars
.filter { pattern ~= $0 }
.reduce("") { String($0) + String($1) }
print(numbers) // 14156550001
/* Alternatively use .reduce with an inline if clause directly:
let numbers = firstRow.unicodeScalars
.reduce("") { pattern ~= $1 ? String($0) + String($1) : String($0)} */
}