Regex - change commas only in a portion of a string - regex

I am making many changes to an original CSV string that contains many comma delimiters. I need to replace with ";" either only the commas inside the expression ||....|| or only the commas outside it, so that the delimiter inside ||....|| differs from the one used in the rest of the string.
Example:
(.*)(?:\|\|)(?:.*)(,)(?:.*)\|\|
After I use
var regex = /myregex/g;
var str = str.replace(regex, ',')
thanks

You can use
const string = "aba,bjlj,alj,ljlj||name1,name2,name3||jflkj,glfgjlf,jflg,fjlfd||name1,name2||fd,sdfsfd,dfs||name1,name2,name3,name4,name5||";
console.log( string.replace(/\|{2}[\w\W]*?\|{2}/g, (x) => x.replace(/,/g, ';')) );
The regex is
/\|{2}.*?\|{2}/gs // matches any text between two double pipes
/\|{2}[\w\W]*?\|{2}/g // matches any text between two double pipes
/\|{2}.*?\|{2}/g // matches any text but line breaks between two double pipes
Note the . does not match line breaks without the s modifier flag.
The regex matches double pipe, then any zero or more chars, as few as possible up to the next double pipe.
Then, x, the whole match value, is passed as an argument to the anonymous callback function used as a replacement argument, and all commas are replaced with ; only inside the matches.
The "contrary" solution is to match and capture the strings between double pipes and only match commas in all other contexts so that you could keep the captures and replace those commas:
const string = "aba,bjlj,alj,ljlj||name1,name2,name3||jflkj,glfgjlf,jflg,fjlfd||name1,name2||fd,sdfsfd,dfs||name1,name2,name3,name4,name5||";
console.log( string.replace(/(\|{2}[\w\W]*?\|{2})|,/g, (x,y) => y || ';') );
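The note about . and line breaks can be checked with a short sketch. The multi-line input and the swapInside helper are hypothetical and not part of the question; the three patterns are the ones listed above.

```javascript
// Hypothetical input: the ||...|| span contains a line break.
const multiline = "a,b||x,\ny||c,d";

// Replace commas with ";" only inside the spans matched by `re`.
const swapInside = (re) =>
  multiline.replace(re, (m) => m.replace(/,/g, ";"));

// With the s flag, . also matches the line break inside the span.
const withDotAll = swapInside(/\|{2}.*?\|{2}/gs);
// [\w\W] matches any character, line breaks included, without the s flag.
const withClass = swapInside(/\|{2}[\w\W]*?\|{2}/g);
// Without the s flag, . cannot cross the line break, so no span is matched.
const withoutDotAll = swapInside(/\|{2}.*?\|{2}/g);

console.log(JSON.stringify(withDotAll));
console.log(JSON.stringify(withClass));
console.log(JSON.stringify(withoutDotAll));
```

Only the first two variants change the comma after x; the third leaves the string untouched because no match is found.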

Big Thanks.
I also found
var newStr = str.replace(/\|{2}.*?\|{2}/g, function(match) {
return match.replace(/,/g,";");
});
Do you think it is possible to do the contrary and change all the commas outside the occurrences of ||...||?

Related

Capturing a delimiter that isn't in between single quotes

Like the question says, is it possible to use a single Regex string to get a delimiter that isn't in between some quotes?
For example, I want to split this string with the delimiter &:
"example=3&testing='f&tmp'"
should produce
["example=3", "testing='f&tmp'"]
Essentially, things inside single quotes (' ') should remain untouched.
I found out how to get things within quotes with expression: (?:'.*?')
The closest I could get to a tangible solution was: (.[^']&[^'])
It is not an easy task for a String#split, but is quite a feasible task for Matcher#find if you use
[^&\s=]+=(?:'[^']*'|[^\s&]*)
(see this regex demo) and this Java code:
String text = "example=3&testing='f&tmp'";
Pattern p = Pattern.compile("[^&\\s=]+=(?:'[^']*'|[^\\s&]*)");
Matcher m = p.matcher(text);
List<String> res = new ArrayList<>();
while(m.find()) {
res.add(m.group());
}
System.out.println(res);
// => [example=3, testing='f&tmp']
Details
[^&\s=]+ - one or more chars other than &, = and whitespace
= - a = char
(?:'[^']*'|[^\s&]*) - a non-capturing group matching either ', zero or more chars other than ' and then a ', or zero or more chars other than whitespace and &.
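The same match-instead-of-split idea can be sketched in JavaScript, assuming the pattern is carried over unchanged. The splitPairs helper name is hypothetical.

```javascript
// JavaScript port of the pattern above; a regex literal needs no doubled backslashes.
const pairPattern = /[^&\s=]+=(?:'[^']*'|[^\s&]*)/g;

// Hypothetical helper: returns every key=value pair, keeping quoted values intact.
function splitPairs(text) {
  // String#match with the g flag returns all matches, or null when there are none.
  return text.match(pairPattern) || [];
}

console.log(splitPairs("example=3&testing='f&tmp'"));
```

The & inside the single quotes is consumed by the '[^']*' alternative, so it never acts as a delimiter.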

regex to extract substring for special cases

I have a scenario where I want to extract a substring based on the following conditions.
Search for any pattern myvalue=123& and extract myvalue=123
If "myvalue" is present at the end of the line without "&", extract myvalue=123
For example:
The string is abcdmyvalue=123&xyz => then it should return myvalue=123
The string is abcdmyvalue=123 => then it should return myvalue=123
The first scenario works for me with the following regex - myvalue=(.*?(?=[&,""]))
I am looking for how to modify this regex to cover my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in advance!
Some notes about the pattern that you tried
if you want to only match, you can omit the capture group
e* matches 0+ times an e char
the part .*?(?=[&,""]) matches as few chars as possible until it can assert either & , or " to the right, so the positive lookahead expects a single char to be present to the right
You could shorten the pattern to a match only, using a negated character class that matches 0+ times any character except a whitespace char or &
myvalue=[^&\s]*
Regex demo
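A minimal sketch of the shortened pattern applied to both scenarios from the question. The extractMyValue helper name is hypothetical.

```javascript
// Negated character class: stop at the first & or whitespace, or at the end of the string.
const valuePattern = /myvalue=[^&\s]*/;

// Hypothetical helper: returns the matched text, or null when "myvalue=" is absent.
function extractMyValue(line) {
  const m = line.match(valuePattern);
  return m ? m[0] : null;
}

console.log(extractMyValue("abcdmyvalue=123&xyz"));
console.log(extractMyValue("abcdmyvalue=123"));
```

Because the character class needs nothing to follow it, the end-of-line case requires no separate alternative.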
function regex(data) {
var test = data.match(/=(.*)&/);
if (test === null) {
return data.split('=')[1]
} else {
return test[1]
}
}
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
Here is working code. If there is no & after the value, match returns null and the if block runs, where we simply split the string on = and take the value. If an & is present, the regex extracts the value between = and &.
If you want to keep using a single regex, you can do it like this:
var test = data1.match(/=(.*)&|=(.*)/)
const result = test[1] ? test[1] : test[2];
console.log(result);

Regular expression with a specific character and without another

I'm trying to implement the escape character functionality in a macro generator I'm writing in Dart. For example, I would like the program to grab all the occurrences of '&param' in my string and replace it with 'John', unless the '&' character is preceded with the escape character '\'. Example: "My name is &param and my parameter is called \&param." -> "My name is John and my parameter is called &param". What would be the regular expression to catch all the substrings that contain the '&', then my parameter's name, and without the preceding '\'?
It's possible to match that, even avoiding escapes of backslashes, as:
var re = RegExp(r"(?<!(?:^|[^\\])(?:\\{2})*\\)&\w+");
This uses negative lookbehind to find a & followed by word-characters, and not preceded by an odd number of backslashes.
More likely, you want to also recognize double-backslashes and convert them to single-backslashes. That's actually easier if you try to find all matches, because then you know all preceding double-backslashes are part of an earlier match:
var re = RegExp(r"\\\\|(?<!\\)&\w+");
This, when used as re.allMatches will find all occurrences of \\ and &word where the latter is not preceded by an odd number of backslashes.
var _re = RegExp(r"\\\\|(?<!\\)&(\w+)");
String template(String input, Map<String, String> values) {
return input.replaceAllMapped(_re, (m) {
var match = m[0]!;
if (match == r"\\") return r"\";
var replacement = values[m[1]!];
if (replacement != null) return replacement;
// do nothing for undefined words.
return match;
});
}
(You might also want to allow something like &{foo} if parameters can occur next to other characters, like &{amount}USD).
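For comparison, here is a JavaScript sketch of a related approach. Unlike the lookbehind patterns above, this variant consumes each escape sequence (\\ or \&) as a match of its own, so the backslash is dropped and the escaped character is kept. The template function name and the use of a Map for the values are assumptions made for illustration.

```javascript
// Either an escape sequence (backslash followed by \ or &) or a parameter &name.
const paramPattern = /\\([\\&])|&(\w+)/g;

// Hypothetical helper: `values` is a Map from parameter name to replacement text.
function template(input, values) {
  return input.replace(paramPattern, (match, escaped, name) => {
    // Escape sequence: emit the escaped character without its backslash.
    if (escaped !== undefined) return escaped;
    // Known parameter: substitute it; unknown parameters are left as they are.
    return values.has(name) ? values.get(name) : match;
  });
}

const values = new Map([["param", "John"]]);
console.log(template("My name is &param and my parameter is called \\&param.", values));
```

Because an escape sequence is consumed as a whole, a & that follows a doubled backslash is still treated as the start of a parameter.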
To keep the character before &param when it matches a non-backslash character, you need to use so-called capturing groups. These are subexpressions of a regular expression inside parentheses. To use capturing groups in Dart you need to use the method replaceAllMapped. We also have the case where the template starts with &param, and in this case we match at the beginning of the string instead.
Try this:
void main() {
final template = 'My name is &param and my parameter is called \\&param.';
final populatedTemplate = template.replaceAllMapped(RegExp(r'(^|[^\\])&param\b'), (match) {
return '${match.group(1)}John';
});
// Unescape: an escaped \&param becomes the literal text &param.
final result = populatedTemplate.replaceAll(RegExp(r'\\&param\b'), '&param');
print(result);
}

How do I replace the nth occurrence of a special character, say, a pipe delimiter with another in Scala?

I'm new to Spark using Scala and I need to replace every nth occurrence of the delimiter with the newline character.
So far, I have been successful at entering a new line after the pipe delimiter.
I'm unable to replace the delimiter itself.
My input string is
val txt = "January|February|March|April|May|June|July|August|September|October|November|December"
println(txt.replaceAll(".\\|", "$0\n"))
The above statement generates the following output.
January|
February|
March|
April|
May|
June|
July|
August|
September|
October|
November|
December
I referred to the suggestion at https://salesforce.stackexchange.com/questions/189923/adding-comma-separator-for-every-nth-character but when I enter the number in the curly braces, I only end up adding the newline after 2 characters after the delimiter.
I'm expecting my output to be as given below.
January|February
March|April
May|June
July|August
September|October
November|December
How do I change my regular expression to get the desired output?
Update:
My friend suggested I try the following statement
println(txt.replaceAll("(.*?\\|){2}", "$0\n"))
and this produced the following output
January|February|
March|April|
May|June|
July|August|
September|October|
November|December
Now I just need to get rid of the pipe symbol at the end of each line.
You want to move the 2nd bar | outside of the capture group.
txt.replaceAll("([^|]+\\|[^|]+)\\|", "$1\n")
//val res0: String =
// January|February
// March|April
// May|June
// July|August
// September|October
// November|December
Regex Explained (regex is not Scala)
( - start a capture group
[^|] - any character as long as it's not the bar | character
[^|]+ - 1 or more of those (any) non-bar chars
\\| - followed by a single bar char |
[^|]+ - followed by 1 or more of any non-bar chars
) - close the capture group
\\| - followed by a single bar char (not in capture group)
"$1\n" - replace the entire matching string with just the first $1 capture group ($0 is the entire matching string) followed by the newline char
UPDATE
For the general case of N repetitions, regex becomes a bit more cumbersome, at least if you're trying to do it with a single regex formula.
The simplest thing to do (not the most efficient but simple to code) is to traverse the String twice.
val n = 5
txt.replaceAll(s"(\\w+\\|){$n}", "$0\n")
.replaceAll("\\|\n", "\n")
//val res0: String =
// January|February|March|April|May
// June|July|August|September|October
// November|December
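A single-pass variant for general N can be sketched in JavaScript by building the pattern from n. This assumes non-empty fields and n of at least 1; the breakEveryNth helper name is hypothetical.

```javascript
// Hypothetical helper: replace every nth "|" with a newline in one pass.
function breakEveryNth(text, n) {
  // The group captures (n - 1) "field|" chunks plus one more field;
  // the nth bar is matched outside the group and therefore dropped.
  const re = new RegExp(`((?:[^|]+\\|){${n - 1}}[^|]+)\\|`, "g");
  return text.replace(re, "$1\n");
}

const months =
  "January|February|March|April|May|June|July|August|September|October|November|December";

console.log(breakEveryNth(months, 5));
console.log(breakEveryNth(months, 2));
```

A trailing group with fewer than n fields has no bar after it, so it is left as is and no second pass is needed.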
You could first split the string on '|' to get an array of strings, and then loop through it to build the output you need.
val txt = "January|February|March|April|May|June|July|August|September|October|November|December"
val out = txt.split("\\|")
var output: String = ""
for(i<-0 until out.length -1 by 2){
val ref = out(i) + "|" + out(i+1) + "\n"
output = output + ref
}
val finalout = output.replaceAll("\"\"","") //just to remove the starting double quote
println(finalout)

CSV Regex skipping first comma

I am using regex for CSV processing where data can be in Quotes, or no quotes. But if there is just a comma at the starting column, it skips it.
Here is the regex I am using:
(?:,"|^")(""|[\w\W]*?)(?=",|"$)|(?:,(?!")|^(?!"))([^,]*?|)(?=$|,)
Now the example data I am using is:
,"data",moredata,"Data"
Which should have 4 matches ["","data","moredata","Data"], but it always skips the first comma. It is fine if there are quotes on the first column, or if it is not blank, but if it is empty with no quotes, it is ignored.
Here is a sample code I am using for testing purposes, it is written in Dart:
void main() {
String delimiter = ",";
String rawRow = ',,"data",moredata,"Data"';
RegExp exp = new RegExp(r'(?:'+ delimiter + r'"|^")(^,|""|[\w\W]*?)(?="'+ delimiter + r'|"$)|(?:'+ delimiter + '(?!")|^(?!"))([^'+ delimiter + r']*?)(?=$|'+ delimiter + r')');
Iterable<Match> matches = exp.allMatches(rawRow.replaceAll("\n","").replaceAll("\r","").trim());
List<String> row = new List();
matches.forEach((Match m) {
//This checks to see which match group it found the item in.
String cellValue;
if (m.group(2) != null) {
//Data found without speech marks
cellValue = m.group(2);
} else if (m.group(1) != null) {
//Data found with speech marks (so it removes escaped quotes)
cellValue = m.group(1).replaceAll('""', '"');
} else {
//Anything left
cellValue = m.group(0).replaceAll('""', '"');
}
row.add(cellValue);
});
print(row.toString());
}
Investigating your expression
(,"|^")
(""|[\w\W]*?)
(?=",|"$)
|
(,(?!")|^(?!"))
([^,]*?|)
(?=$|,)
(,"|^")(""|[\w\W]*?)(?=",|"$) This part matches quoted strings, which seems to work for you
Going through this part (,(?!")|^(?!"))([^,]*?|)(?=$|,)
(,(?!")|^(?!")) a comma not followed by " OR the start of the line not followed by "
([^,]*?|) zero or more non-comma characters, non-greedy, or an empty alternative after the |; why the |?
(?=$|,) followed by the end of the line or a comma
In CSV, the line ,,,3,4,5 should give 6 matches, but the expression above only gets 5
You could add (^(?=,)) at the beginning of the second part, the part that matches unquoted sections.
Here is the second part with the start-of-line match added and the groups made non-capturing:
(?:^(?=,))|(?:,(?!")|^(?!"))(?:[^,]*?)(?=$|,)
Complete: (?:,"|^")(?:""|[\w\W]*?)(?=",|"$)|(?:^(?=,))|(?:,(?!")|^(?!"))(?:[^,]*?)(?=$|,)
Here is another that might work
(?:(?:"(?:[^"]|"")*"|(?<=,)[^,]*(?=,))|^[^,]+|^(?=,)|[^,]+$|(?<=,)$)
How that works I described here: Build CSV parser using regex
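As a rough check of the last pattern, here is a JavaScript sketch that runs it over the row from the question and over ,,,3,4,5. The parseRow helper and its quote-stripping step are assumptions added for illustration; the pattern itself is copied unchanged.

```javascript
// The last pattern above, written as a JavaScript regex literal with the g flag.
const fieldPattern =
  /(?:(?:"(?:[^"]|"")*"|(?<=,)[^,]*(?=,))|^[^,]+|^(?=,)|[^,]+$|(?<=,)$)/g;

// Hypothetical helper: split one CSV row and unwrap quoted fields.
function parseRow(row) {
  return Array.from(row.matchAll(fieldPattern), (m) => {
    const field = m[0];
    return field.startsWith('"')
      ? field.slice(1, -1).replace(/""/g, '"') // drop outer quotes, unescape ""
      : field;
  });
}

console.log(parseRow(',"data",moredata,"Data"'));
console.log(parseRow(",,,3,4,5"));
```

The empty first column is picked up by the ^(?=,) alternative, which matches an empty string at the start of the row when a comma follows.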