Java regular expression: escaping a multi-line comment containing the character $

final Pattern PATTERN = Pattern.compile("\"[^\"]*\"");
@Test
public void parseCsvTest() {
StringBuffer result = new StringBuffer();
Matcher m = null;
String csv="\"foo$\n" + "bar\"";
try {
m = PATTERN.matcher(csv);
while (m.find()) {
m.appendReplacement(result, m.group().replaceAll("\\R+", ""));
}
m.appendTail(result);
} catch (Exception e) {
e.printStackTrace();
}
String escaped_csv = result.toString();
log.info(escaped_csv);
}
With String csv="\"foo\n" + "bar\"";
I get the expected result: "foobar"
But with String csv="\"foo$\n" + "bar\""; (notice the $ char after foo), the pattern doesn't identify the group. Note: $ is a literal character here, not the end-of-line anchor, although it can be followed by a line break.
I tried PATTERN = Pattern.compile("\"[^\"]*^$?\""); without success: it returns foo and bar on two lines.
Any ideas?

Got it to work with: Pattern.compile("\"*[^$]|\"[^\"]*\"");
Results
csv = "\"foo\n" + "bar\n" + "doe\"" => foobardoe
csv = "\"foo$\n" + "bar\n" + "doe\"" => foo$bardoe
csv = "\"foo$\n" + "bar$\n" + "doe\"" => foo$bar$doe
csv = "\"foo$\n" + "bar$\n" + "doe$\"" => foo$bar$doe$
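A likely cause of the original failure is not the pattern but the replacement: Matcher.appendReplacement treats $ in the replacement string as a group reference, so a replacement such as "foo$bar" raises an IllegalArgumentException (which the catch block in the question only prints). The sketch below keeps the question's original pattern and escapes the replacement with Matcher.quoteReplacement; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; only the pattern comes from the question.
final class QuoteReplacementSketch {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Removes line breaks inside each double-quoted field.
    static String joinQuotedLines(String csv) {
        StringBuffer result = new StringBuffer();
        Matcher m = QUOTED.matcher(csv);
        while (m.find()) {
            String joined = m.group().replaceAll("\\R+", "");
            // quoteReplacement makes '$' and '\' literal in the replacement text
            m.appendReplacement(result, Matcher.quoteReplacement(joined));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinQuotedLines("\"foo$\n" + "bar\"")); // prints "foo$bar"
    }
}
```

With this approach the surrounding quotes are preserved, and a $ anywhere inside the field is copied through unchanged.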

Related

Regex function clarification

I have a string and I have to filter the following:
"#Subject = \"#hb\" + #uv_EmployeeID + \" fdsaas\" + #test"
I have to match only #uv_EmployeeID and #test, not the values inside the inner double quotes.
This works: new Regex(@"[^""]#{1}[a-zA-Z_]+");
You just have to remove the first character from the result, like this:
var reg = new Regex(@"[^""]#{1}[a-zA-Z_]+");
var matches = reg.Matches("#Subject = \"#hb\" + #uv_EmployeeID + \" fdsaas\" + #test");
var empId = matches[0].Value.Substring(1); // #uv_EmployeeID
var test = matches[1].Value.Substring(1); // #test
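The same idea can be sketched in Java with a lookbehind, which checks the preceding character without including it in the match, so no Substring(1) step is needed. This is an illustrative variant rather than the answer's own code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class UnquotedTokenSketch {
    // (?<=[^"]) requires a preceding character that is not a double quote,
    // mirroring the leading [^"] of the pattern above without consuming it.
    private static final Pattern TOKEN = Pattern.compile("(?<=[^\"])#[a-zA-Z_]+");

    static List<String> findTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "#Subject = \"#hb\" + #uv_EmployeeID + \" fdsaas\" + #test";
        System.out.println(findTokens(input)); // prints [#uv_EmployeeID, #test]
    }
}
```

As with the original pattern, #Subject is skipped because nothing precedes it at the start of the string.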

Pattern matching for two forward slashes using regex in Java

I would like to match the following text in my Word file, but I am not sure how to use the pattern matcher:
(P // TRIF)
(P)
(U//TRIF)
(U)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ExtractDemo {
public static void main(String[] args) {
String input = "I have a ( U) but I (P) like my (P//TRIF) better (U//TRIF).";
Pattern p = Pattern.compile("(P|U|P//TRIF|U//TRIF)");
Matcher m = p.matcher(input);
List<String> animals = new ArrayList<String>();
while (m.find()) {
System.out.println("Found a " + m.group() + ".");
animals.add(m.group());
}
}
}
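Your regex matches U, P, P, U, because the alternatives are tried from left to right and the single letters P and U match before the longer alternatives are attempted.
If you would like to match P//TRIF, P, U//TRIF or U, you could change the order in your alternation to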
(P//TRIF|U//TRIF|P|U)
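A small sketch comparing the two orderings on the input from the question; the class and helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class AlternationOrderSketch {
    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "I have a ( U) but I (P) like my (P//TRIF) better (U//TRIF).";
        // Short alternatives first: the longer ones are never reached.
        System.out.println(findAll("(P|U|P//TRIF|U//TRIF)", input)); // [U, P, P, U]
        // Long alternatives first: the full markings are found.
        System.out.println(findAll("(P//TRIF|U//TRIF|P|U)", input)); // [U, P, P//TRIF, U//TRIF]
    }
}
```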
If you want to capture the text including the surrounding parentheses in a group, you could try:
(\(\s*(?:P|U|P//TRIF|U//TRIF)\))
public static void main(String args[])
{
String input = "I have a ( U) but I (P) like my (P//TRIF) better (U//TRIF).";
Pattern p = Pattern.compile("(\\(\\s*(?:P|U|P//TRIF|U//TRIF)\\))");
Matcher m = p.matcher(input);
List<String> animals = new ArrayList<String>();
while (m.find()) {
System.out.println("Found a " + m.group() + ".");
animals.add(m.group());
}
}
Another way to match this could be
\(\s*[PU](?://TRIF)?\)
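The question also lists (P // TRIF) with spaces around the slashes, which the patterns above do not cover. As a possible extension (an assumption about the intent, not part of the answer), optional whitespace can be allowed around // and before the closing parenthesis; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
final class MarkingSketch {
    // Same shape as \(\s*[PU](?://TRIF)?\) with \s* added around "//"
    // and before the closing parenthesis.
    private static final Pattern MARKING =
            Pattern.compile("\\(\\s*[PU](?:\\s*//\\s*TRIF)?\\s*\\)");

    static List<String> findMarkings(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = MARKING.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findMarkings("(P // TRIF) (P) (U//TRIF) (U) ( U)"));
        // prints [(P // TRIF), (P), (U//TRIF), (U), ( U)]
    }
}
```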

String replacing nested JSON in Scala

I have a Scala method that will be given a String like so:
"blah blah sediejdri \"foos\": {\"fizz\": \"buzz\"}, odedrfj49 blah"
And I need to strip the "foos JSON" out of it using pure Java/Scala (no external libs). That is, find the substring matching the pattern:
\"foos\" : {ANYTHING},
...and strip it out, so that the input string is now:
"blah blah sediejdri odedrfj49 blah"
The token to search for will always be \"foos\", but the content inside the JSON curly braces will always be different. My best attempt is:
// Ex: "blah \"foos\": { flim flam }, blah blah" ==> "blah blah blah", etc.
def stripFoosJson(toClean: String): String = {
val regex = ".*\"foos\" {.*},.*"
toClean.replaceAll(regex, "")
}
However, my regex is clearly not correct. Can anyone spot where I'm going awry?
Here are two solutions I came up with; I hope they help. I think you forgot to handle possible whitespace with \s* etc.
object JsonStrip extends App {
// SOLUTION 1, hard way, handles nested braces also:
def findClosingParen(text: String, openPos: Int): Int = {
var closePos = openPos
var parensCounter = 1 // if (parensCounter == 0) it's a match!
while (parensCounter > 0 && closePos < text.length - 1) {
closePos += 1
val c = text(closePos)
if (c == '{') {
parensCounter += 1
} else if (c == '}') {
parensCounter -= 1
}
}
if (parensCounter == 0) closePos else openPos
}
val str = "blah blah sediejdri \"foos\": {\"fizz\": \"buzz\"}, odedrfj49 blah"
val indexOfFoos = str.indexOf("\"foos\"")
val indexOfFooOpenBrace = str.indexOf('{', indexOfFoos)
val indexOfFooCloseBrace = findClosingParen(str, indexOfFooOpenBrace)
// here you would handle if the brace IS found etc...
val stripped = str.substring(0, indexOfFoos) + str.substring(indexOfFooCloseBrace + 2)
println("WITH BRACE COUNT: " + stripped)
// SOLUTION 2, with regex:
val reg = "\"foos\"\\s*:\\s*\\{(.*)\\}\\s*,\\s*"
println("WITH REGEX: " + str.replaceAll(reg, ""))
}
The regex \\"foos\\": \{(.*?)\} should match what you want; in most regex engines you might need to replace " with \". If your JSON can contain nested curly brackets, you can use \\"foos\\": (\{(?>[^{}]|(?1))*\}), which recurses into the first capture group to match balanced pairs of brackets. Note that this only works in engines that support recursion, such as PCRE; others, including java.util.regex, do not.
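Because java.util.regex has no recursion, a brace-counting scan in the spirit of Solution 1 is one way to handle nested braces on the JVM. The following is a minimal Java sketch under two assumptions: braces never appear inside string values, and a single trailing comma plus following spaces should be removed as well. The class and method names are hypothetical.

```java
// Hypothetical helper class for illustration.
final class StripFoosSketch {
    // Removes the first "foos": {...} block, including a trailing comma and spaces.
    // Returns the input unchanged if no balanced block is found.
    static String stripFoos(String text) {
        int start = text.indexOf("\"foos\"");
        if (start < 0) return text;
        int open = text.indexOf('{', start);
        if (open < 0) return text;

        int depth = 0;
        int close = -1;
        for (int i = open; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                close = i; // matching brace for the one at 'open'
                break;
            }
        }
        if (close < 0) return text; // unbalanced braces: leave input alone

        int end = close + 1;
        if (end < text.length() && text.charAt(end) == ',') end++;
        while (end < text.length() && text.charAt(end) == ' ') end++;
        return text.substring(0, start) + text.substring(end);
    }

    public static void main(String[] args) {
        String str = "blah blah sediejdri \"foos\": {\"fizz\": \"buzz\"}, odedrfj49 blah";
        System.out.println(stripFoos(str)); // prints blah blah sediejdri odedrfj49 blah
    }
}
```

Skipping the spaces after the comma avoids the double space that a fixed offset after the closing brace would leave behind.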

Use Meteor Match and Regex to check strings

I'm checking an array of strings for a specific combination of patterns. I'm having trouble using Meteor's Match function and regex literal together. I want to check if the second string in the array is a url.
addCheck = function(line) {
var firstString = _.first(line);
var secondString = line[1]; // second element; _.indexOf(line, 1) would return the position of the value 1
console.log(secondString);
var urlRegEx = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/g;
if ( firstString == "+" && Match.test(secondString, urlRegEx) === true ) {
console.log( "detected: + | line = " + line )
} else {
// do stuff if we don't detect a
console.log( "line = " + line );
}
}
Any help would be appreciated.
Match.test is used to test the structure of a variable. For example: "it's an array of strings, or an object including the field createdAt", etc.
RegExp.test on the other hand, is used to test if a given string matches a regular expression. That looks like what you want.
Try something like this instead:
if ((firstString === '+') && urlRegEx.test(secondString)) {
...
}

Count how many times new line is present?

For example,
string="help/nsomething/ncrayons"
Output:
String word count is: 3
This is what I have, but the program loops through the method several times and it looks like I am only getting the last string created. Here's the code block:
Regex regx = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*", RegexOptions.IgnoreCase);
MatchCollection matches = regx.Matches(output);
//int counte = 0;
foreach (Match match in matches)
{
//counte = counte + 1;
links = links + match.Value + '\n';
if (links != null)
{
string myString = links;
string[] words = Regex.Split(myString, @"\n");
word_count.Text = words.Length.ToString();
}
}
It is \n for newline.
Not sure if regex is a must for your case but you could use split:
string myString = "help/nsomething/ncrayons";
string[] separator = new string[] { "/n" };
string[] result = myString.Split(separator, StringSplitOptions.None);
MessageBox.Show(result.Count().ToString());
Another way using regex:
string myString = "help/nsomething/ncrayons";
string[] words = Regex.Split(myString, @"/n");
word_count.Text = words.Length.ToString();
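For comparison, the same counting idea as a minimal Java sketch, assuming the input contains real newline characters (\n) rather than the literal text /n. The class and method names are hypothetical.

```java
// Hypothetical helper class for illustration.
final class LineCountSketch {
    // Number of newline-separated pieces; limit -1 keeps trailing empty pieces.
    static int countSegments(String text) {
        return text.split("\n", -1).length;
    }

    // Number of newline characters themselves.
    static long countNewlines(String text) {
        return text.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        String s = "help\nsomething\ncrayons";
        System.out.println(countSegments(s)); // prints 3
        System.out.println(countNewlines(s)); // prints 2
    }
}
```

The two counts differ by one: three words are separated by two newlines, so which method fits depends on whether the words or the line breaks are being counted.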