I would like to do pattern matching for the following text in my Word file, but I am not sure how to use the pattern matcher:
(P // TRIF)
(P)
(U//TRIF)
(U)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ExtractDemo {
    public static void main(String[] args) {
        String input = "I have a ( U) but I (P) like my (P//TRIF) better (U//TRIF).";
        Pattern p = Pattern.compile("(P|U|P//TRIF|U//TRIF)");
        Matcher m = p.matcher(input);
        List<String> animals = new ArrayList<String>();
        // Print and collect every match found in the input
        while (m.find()) {
            System.out.println("Found a " + m.group() + ".");
            animals.add(m.group());
        }
    }
}
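Your regex matches U, P, P, U, because the alternatives are tried from left to right and the short P and U succeed before the longer ones are ever attempted.
If you would like to match (P // TRIF) or (P) or (U//TRIF) or (U), you could change the order in your alternation so that the longer alternatives come first: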
(P//TRIF|U//TRIF|P|U)
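To make the effect of the ordering visible, here is a minimal sketch that runs both alternations against the sample sentence from the question. The class name AlternationOrderDemo and the helper findAll are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Collects every match of the given regex in the input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "I have a ( U) but I (P) like my (P//TRIF) better (U//TRIF).";
        // Short alternatives first: the longer ones are never reached.
        System.out.println(findAll("(P|U|P//TRIF|U//TRIF)", input)); // [U, P, P, U]
        // Long alternatives first: the full markings are matched.
        System.out.println(findAll("(P//TRIF|U//TRIF|P|U)", input)); // [U, P, P//TRIF, U//TRIF]
    }
}
```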
If you want to capture the text including the surrounding parentheses in a group, you could try:
(\(\s*(?:P|U|P//TRIF|U//TRIF)\))
public static void main(String args[])
{
    String input = "I have a ( U) but I (P) like my (P//TRIF) better (U//TRIF).";
    Pattern p = Pattern.compile("(\\(\\s*(?:P|U|P//TRIF|U//TRIF)\\))");
    Matcher m = p.matcher(input);
    List<String> animals = new ArrayList<String>();
    while (m.find()) {
        System.out.println("Found a " + m.group() + ".");
        animals.add(m.group());
    }
}
Another way to match this could be
\(\s*[PU](?://TRIF)?\)
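The first sample in the question, (P // TRIF), has spaces around the slashes, which the compact pattern above does not allow. As a sketch, and assuming that optional whitespace should be tolerated around // and before the closing parenthesis, the pattern could be loosened like this. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkingMatcherDemo {
    // Assumed variation of the pattern above: \s* is added around "//" and before ")".
    static final Pattern MARKING =
            Pattern.compile("\\(\\s*[PU](?:\\s*//\\s*TRIF)?\\s*\\)");

    // Returns every marking found in the text, parentheses included.
    static List<String> findMarkings(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = MARKING.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "(P // TRIF) then (P), (U//TRIF), (U) and ( U)";
        System.out.println(findMarkings(text));
    }
}
```

If the markings must be matched strictly without inner spaces, the original pattern is the better choice.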
I have a string that contains something like this number: -7-972/516/57.15
. The expression must return the digits of the number and filter by the first digit. As a result I want to see: 79725165715
. I wrote the expression ^(\D*)+7+(\D*(?:\d\D*){10})$, but it suffers from catastrophic backtracking (it freezes on execution) with long strings like: bt.rfznascvd#rcs.ru,e.zovtrnko#lkn.ru
I wrote a new one that works: \D*7(\d\D*){10}
You just have to use \d. Matches returns every match of the pattern in the input line, and its Count gives you the number of matches. To concatenate the matches into a string I created a small extension method.
For testing your regexes I can recommend regexlib.
namespace CSharpTest
{
    using System.Text;
    using System.Text.RegularExpressions;
    public static class Program
    {
        static void Main(string[] args)
        {
            string input = @"number: -7-972/516/57.15";
            var regex = new Regex(@"\d");
            // One match per digit in the input
            var matches = regex.Matches(input);
            var countOfNumbers = matches.Count;
            var number = matches.ToNumber();
        }
        // Concatenates all matched digits into a single string
        public static string ToNumber(this MatchCollection matches)
        {
            var result = new StringBuilder();
            foreach (Match match in matches)
                result.Append(match.Value);
            return result.ToString();
        }
    }
}
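The same idea can be sketched in Java: strip the non-digits first, then validate what is left with a simple pattern that has no nested quantifiers, so the catastrophic backtracking from the question cannot occur. The rule "exactly 11 digits, starting with 7" is an assumption read off the question, and the names PhoneDigits, digitsOnly and isValid are made up for this example.

```java
class PhoneDigits {
    // Removes every non-digit character, leaving only the digits.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // Assumed rule from the question: a leading 7 followed by exactly ten more digits.
    static boolean isValid(String digits) {
        return digits.matches("7\\d{10}");
    }

    public static void main(String[] args) {
        String digits = digitsOnly("number: -7-972/516/57.15");
        System.out.println(digits + " valid=" + isValid(digits)); // 79725165715 valid=true

        // The long string that froze the original expression is rejected immediately.
        String other = digitsOnly("bt.rfznascvd#rcs.ru,e.zovtrnko#lkn.ru");
        System.out.println("'" + other + "' valid=" + isValid(other)); // '' valid=false
    }
}
```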
There is a number with unknown length and the idea is to build a regular expression which matches all digits except last 4 digits.
I have tried a lot to achieve this but no luck yet.
Currently I have this regex: "^(\d*)\d{0}\d{0}\d{0}\d{0}.*$"
Input: 123456789089775
Expected output: XXXXXXXXXXX9775
which I am using as follows (and this doesn't work):
String accountNumber ="123456789089775";
String pattern = "^(\\d*)\\d{1}\\d{1}\\d{1}\\d{1}.*$";
String result = accountNumber.replaceAll(pattern, "X");
Please suggest how I should approach this problem or give me the solution.
In this case my whole point is to negate the regex : "\d{4}$"
You may use
\G\d(?=\d{4,}$)
See the regex demo.
Details
\G - start of string or end of the previous match
\d - a digit
(?=\d{4,}$) - a positive lookahead that requires 4 or more digits up to the end of the string immediately to the right of the current location.
Java demo:
String accountNumber ="123456789089775";
String pattern = "\\G\\d(?=\\d{4,}$)"; // Or \\G.(?=.{4,}$)
String result = accountNumber.replaceAll(pattern, "X");
System.out.println(result); // => XXXXXXXXXXX9775
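For comparison, the masking can also be done without a regex. This is only a sketch: unlike the pattern above it does not check that the input consists of digits, and the decision to return inputs of four characters or fewer unchanged is an assumption. The class and method names are made up.

```java
import java.util.Arrays;

class AccountMasker {
    // Replaces every character except the last four with 'X'.
    // Inputs of four characters or fewer are returned unchanged (assumed behaviour).
    static String maskAllButLastFour(String accountNumber) {
        int visible = 4;
        if (accountNumber.length() <= visible) {
            return accountNumber;
        }
        char[] chars = accountNumber.toCharArray();
        Arrays.fill(chars, 0, chars.length - visible, 'X');
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(maskAllButLastFour("123456789089775")); // XXXXXXXXXXX9775
    }
}
```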
I am still not allowed to comment as I don't have 50 rep yet, but DDeMartini's answer would also swallow account numbers with a non-numeric prefix, since "^(.*)" matches input like abcdef1234 as well. Stick to your \d syntax:
"^(\\d+)(\\d{4}$)"
seems to work fine and requires digits only (minimum length 5 characters). I tested it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class AccountNumberPadder {
    // Group 1: the leading digits to mask; group 2: the last four digits to keep.
    private static final Pattern LAST_FOUR_DIGITS = Pattern.compile("^(\\d+)(\\d{4})$");
    public static void main(String[] args) {
        String[] accountNumbers = new String[] { "123456789089775", "999775", "1234567890897" };
        for (String accountNumber : accountNumbers) {
            Matcher m = LAST_FOUR_DIGITS.matcher(accountNumber);
            if (m.find()) {
                System.out.println(paddIt(m));
            } else {
                throw new RuntimeException(String.format("Whooaaa - don't work for %s", accountNumber));
            }
        }
    }
    public static String paddIt(Matcher m) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < m.group(1).length(); i++) {
            b.append("X");
        }
        // Append the kept digits directly; String.replace would also replace
        // any later occurrence of group 1 inside the last four digits.
        return b.append(m.group(2)).toString();
    }
}
Try:
String pattern = "^(.*)[0-9]{4}$";
Addendum after comment: A refactor to only match full numerics could look like this:
String pattern = "^([0-9]+)[0-9]{4}$";
I have the following regex
def formula = math:min(math:round($$value1$$ * $$value2$$) )
def m = formula =~ /\$\$\w+\$\$/
println m.group(1)
The above should ideally print $$value1$$.
This regex works fine for the string on regex101.com, but the same regex does not work in Groovy. Ideally it should find two groups, $$value1$$ and $$value2$$, using the Matcher API, but it does not.
Is there anything wrong in this regex?
Assuming formula is:
def formula = 'math:min(math:round($$value1$$ * $$value2$$) )'
I think you just want:
List result = formula.findAll(/\$\$\w+\$\$/)
I tried your regex in Java and it works for me if I remove the / at the beginning and the end of the regex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
    public static void main(String[] args) {
        String regex = "\\$\\$\\w+\\$\\$";
        String test = "math:min(math:round($$value1$$ * $$value2$$) ) ";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(test);
        // Print every $$name$$ placeholder found in the formula
        while (matcher.find()){
            System.out.println(matcher.group());
        }
    }
}
It returns
$$value1$$
$$value2$$
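One detail worth noting: the pattern in the question contains no capturing group, so asking the matcher for group(1) cannot succeed, while group() (the whole match) can. If only the name between the dollar signs is wanted, a capturing group can be added. The following is a sketch in Java; the class PlaceholderNames and the method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderNames {
    // Same pattern as above, with parentheses around \w+ so the name becomes group 1.
    static final Pattern PLACEHOLDER = Pattern.compile("\\$\\$(\\w+)\\$\\$");

    // Returns the names found between $$ markers, in order of appearance.
    static List<String> names(String formula) {
        List<String> result = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(formula);
        while (m.find()) {
            result.add(m.group(1)); // group(0) would be the full $$name$$ text
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(names("math:min(math:round($$value1$$ * $$value2$$) )")); // [value1, value2]
    }
}
```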
I am trying to exclude certain documents from being transported to ES using XDCR.
I have the following regex that filters ABCD and IJ
https://regex101.com/r/gI6sN8/11
Now, I want to use this regex in the XDCR filtering
^(?!.*(ABCD|IJ)).*$
How do I exclude keys using regex?
EDIT:
What if I want to select everything that doesn't contain ABCDE and ABCHIJ?
I tried
https://regex101.com/r/zT7dI4/1
edit:
Sorry, after looking at it further, this method is invalid. For instance, [^B] allows an A to get by, letting AABCD slip through (it matches AA first, then matches BCD with [^AI]+). Please disregard this post.
The demo shows that the method below is invalid.
(disregard this)
You could use a POSIX-style trick to exclude words.
Below is to exclude ABCD and IJ.
You get a sense of the pattern from this.
Basically, you put all the first letters into a negative class
as the first in the alternation list, then handle each word
in a separate alternation.
^(?:[^AI]+|(?:A(?:[^B]|$)|AB(?:[^C]|$)|ABC(?:[^D]|$))|(?:I(?:[^J]|$)))+$
Expanded
^
(?:
     [^AI]+
  |
     (?:                      # Handle 'ABCD'
          A
          (?: [^B] | $ )
       |  AB
          (?: [^C] | $ )
       |  ABC
          (?: [^D] | $ )
     )
  |
     (?:                      # Handle 'IJ'
          I
          (?: [^J] | $ )
     )
)+
$
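The flaw described in the retraction can be reproduced with a few lines of Java. This sketch compares the character-class expression with a plain substring check that serves as the reference for "contains neither ABCD nor IJ". The class and method names are made up for the illustration.

```java
import java.util.regex.Pattern;

class ExclusionCheck {
    // The character-class expression from the post, copied unchanged.
    static final Pattern CLASS_TRICK = Pattern.compile(
            "^(?:[^AI]+|(?:A(?:[^B]|$)|AB(?:[^C]|$)|ABC(?:[^D]|$))|(?:I(?:[^J]|$)))+$");

    // What the expression actually accepts.
    static boolean acceptedByTrick(String key) {
        return CLASS_TRICK.matcher(key).find();
    }

    // Reference behaviour: accept only keys that contain neither word.
    static boolean shouldBeAccepted(String key) {
        return !key.contains("ABCD") && !key.contains("IJ");
    }

    public static void main(String[] args) {
        for (String key : new String[] { "XYZ", "ABCD", "IJ", "AABCD" }) {
            System.out.println(key + ": trick=" + acceptedByTrick(key)
                    + ", expected=" + shouldBeAccepted(key));
        }
    }
}
```

For AABCD the two columns disagree: the expression accepts the key although it contains ABCD, which is exactly the case that invalidates the approach.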
Hopefully one day there will be built-in support for inverting the match expression. In the meantime, here's a Java 8 program that generates regular expressions for inverted prefix matching using basic regex features supported by the Couchbase XDCR filter.
This should work as long as your key prefixes are somehow delimited from the remainder of the key. Make sure to include the delimiter in the input when modifying this code.
Sample output for red:, reef:, green: is:
^([^rg]|r[^e]|g[^r]|re[^de]|gr[^e]|red[^:]|ree[^f]|gre[^e]|reef[^:]|gree[^n]|green[^:])
File: NegativeLookaheadCheater.java
import java.util.*;
import java.util.stream.Collectors;
public class NegativeLookaheadCheater {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("red:", "reef:", "green:");
        System.out.println("^" + invertMatch(input));
    }
    private static String invertMatch(Collection<String> literals) {
        int maxLength = literals.stream().mapToInt(String::length).max().orElse(0);
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < maxLength; i++) {
            terms.addAll(terms(literals, i));
        }
        return "(" + String.join("|", terms) + ")";
    }
    private static List<String> terms(Collection<String> words, int index) {
        List<String> result = new ArrayList<>();
        Map<String, Set<Character>> prefixToNextLetter = new LinkedHashMap<>();
        for (String word : words) {
            if (word.length() > index) {
                String prefix = word.substring(0, index);
                prefixToNextLetter.computeIfAbsent(prefix, key -> new LinkedHashSet<>()).add(word.charAt(index));
            }
        }
        prefixToNextLetter.forEach((literalPrefix, charsToNegate) -> {
            result.add(literalPrefix + "[^" + join(charsToNegate) + "]");
        });
        return result;
    }
    private static String join(Collection<Character> collection) {
        return collection.stream().map(c -> Character.toString(c)).collect(Collectors.joining());
    }
}
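To see what the generated expression does, the sample output for red:, reef: and green: can be tried against a few keys. This is a sketch that uses java.util.regex as a stand-in for the XDCR filter, which is an assumption; the keys and the names InvertedPrefixCheck and passesFilter are made up.

```java
import java.util.regex.Pattern;

class InvertedPrefixCheck {
    // The sample expression printed by the generator for red:, reef:, green:.
    static final Pattern NOT_EXCLUDED = Pattern.compile(
            "^([^rg]|r[^e]|g[^r]|re[^de]|gr[^e]|red[^:]|ree[^f]|gre[^e]"
            + "|reef[^:]|gree[^n]|green[^:])");

    // True when the key does not start with one of the excluded prefixes.
    static boolean passesFilter(String key) {
        return NOT_EXCLUDED.matcher(key).find();
    }

    public static void main(String[] args) {
        String[] keys = { "red:1", "reef:9", "green:x", "blue:1", "reed:1", "redis:1" };
        for (String key : keys) {
            System.out.println(key + " -> " + passesFilter(key));
        }
    }
}
```

Keys beginning with red:, reef: or green: produce no match and are filtered out, while near misses such as reed:1 and redis:1 still pass, which is why the delimiter has to be part of the input literals.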
I have a String with some search items and I want to split them into an array of Strings.
Example:
String text = "java example \"this is a test\" hello world";
I want to get the following results
result[0] = "java";
result[1] = "example";
result[2] = "\"this is a test\"";
result[3] = "hello";
result[4] = "world";
In short, I want to combine text.split(" ") and text.split("\"");
Is there an easy way to code it?
Thanks!
You can use this regex in String#split method:
(?=(([^\"]*\"){2})*[^\"]*$)\\s+
Code:
String text = "java example \"this is a test\" hello world";
String[] tok = text.split("(?=(([^\"]*\"){2})*[^\"]*$)\\s+");
// print the array
System.out.println( Arrays.toString( tok ) );
Output:
[java, example, "this is a test", hello, world]
This regex should match (\\".+?\\")|([^\s]+)
It matches anything within \" including the \" OR single words.
Check here for results: http://www.regexr.com/399a4
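As a sketch of how this expression could be applied in Java, the tokens can be collected with Matcher.find() instead of String#split. The regex is the one given above, written as a Java string literal; the class name QuotedTokenizer and the method tokenize are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTokenizer {
    // Either a quoted phrase (quotes included) or a run of non-whitespace characters.
    static final Pattern TOKEN = Pattern.compile("(\".+?\")|([^\\s]+)");

    // Returns the tokens in order of appearance.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "java example \"this is a test\" hello world";
        System.out.println(tokenize(text)); // [java, example, "this is a test", hello, world]
    }
}
```

Because the quoted alternative comes first, a quoted phrase is consumed as a whole before the single-word alternative gets a chance to split it.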
I think you are a bit confused and there are errors in your code!
Composing your string should be:
String text = "java example \"this is a test\" hello world";
The value of the variable text would then be:
java example "this is a test" hello world
I am rather assuming that you want to extract this into the following array:
result[0] = "java";
result[1] = "example";
result[2] = "\"this is a test\"";
result[3] = "hello";
result[4] = "world";
You can do this by using a regular expression, for example:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
    public static void main(String[] args) {
        String data = "java example \"this is a test\" hello world";
        Pattern p = Pattern.compile("((?:\"[a-z\\s]+\")|[a-z]+)");
        Matcher m = p.matcher(data);
        List<String> lst = new ArrayList<String>();
        // Collect each quoted phrase or single word
        while(m.find()) {
            lst.add(m.group(1));
        }
        // Copy the list into an array of the same size
        String[] result = lst.toArray(new String[lst.size()]);
        for(String s: result) {
            System.out.println(s);
        }
    }
}
The regular expression ((?:\"[a-z\\s]+\")|[a-z]+) will match either:
1) a sequence of characters a to z or whitespace between double quotes, or
2) a sequence of characters a to z.
We then extract these matches one at a time using m.find().