Groovy: Replace combination of , and ) - regex

How do I replace comma and right parantheses at the same time, ,') with ), in groovy?
I tried replaceAll with double escape
value = "('cat','rat',',')";
//Replace ,') with )
value = value.replaceAll('\\,')',')');
Tried these with no luck
How can I replace a string in parentheses using a regex?
How to escape comma and double quote at same time for CSV file?

Your question is a bit cofusing, but to replace ,') you don't need escapes at all. Simply use
def value = "('cat','rat',',')";
println value.replace(",')", ")"); // ('cat','rat',')
However, I think you rather want this result ('cat','rat'). Right?
If so, you can use the following code, using Pattern:
import java.util.regex.Pattern
def value = "('cat','rat',',')";
def pattern = Pattern.compile(",'\\)");
def matcher = pattern.matcher(value);
while (matcher.find()) {
value = matcher.replaceAll(")");
matcher = pattern.matcher(value);
}
println value; // ('cat','rat')
Explanation:
You are creating the second replaceable text with your regex, it's not there when you try to replace it, but get's created as a result of the first replacement. So we create a new matcher in the loop and let it find the string again...

Related

Regex - get list comma separated allow spaces before / after the comma

I try to extract an images array/list from a commit message:
String commitMsg = "#build #images = image-a, image-b,image_c, imaged , image-e #setup=my-setup fixing issue with px"
I want to get a list that contains:
["image-a", "image-b", "image_c", "imaged", "image-e"]
NOTES:
A) should allow a single space before/after the comma (,)
B) ensure that #images = exists but exclude it from the group
C) I also searching for other parameters like #build and #setup so I need to ignore them when looking for #images
What I have until now is:
/(?i)#images\s?=\s?<HERE IS THE MISSING LOGIC>/
I use find() method:
def matcher = commitMsg =~ /(?i)#images\s?=\s?([^,]+)/
if(matcher.find()){
println(matcher[0][1])
}
You can use
(?i)(?:\G(?!^)\s?,\s?|#images\s?=\s?)(\w+(?:-\w+)*)
See the regex demo. Details:
(?i) - case insensitive mode on
(?:\G(?!^)\s?,\s?|#images\s?=\s?) - either the end of the previous regex match and a comma enclosed with single optional whitespaces on both ends, or #images string and a = char enclosed with single optional whitespaces on both ends
(\w+(?:-\w+)*) - Group 1: one or more word chars followed with zero or more repetitions of - and one or more word chars.
See a Groovy demo:
String commitMsg = "#build #images = image-a, image-b,image_c, imaged , image-e #setup=my-setup fixing issue with px"
def re = /(?i)(?:\G(?!^)\s?,\s?|#images\s?=\s?)(\w+(?:-\w+)*)/
def res = (commitMsg =~ re).collect { it[1] }
print(res)
Output:
[image-a, image-b, image_c, imaged, image-e]
An alternative Groovy code:
String commitMsg = "#build #images = image-a, image-b,image_c, imaged , image-e #setup=my-setup fixing issue with px"
def re = /(?i)(?:\G(?!^)\s?,\s?|#images\s?=\s?)(\w+(?:-\w+)*)/
def matcher = (commitMsg =~ re).collect()
for(m in matcher) {
println(m[1])
}

Find index locations by regex pattern and replace them with a list of indexes in Scala

I have strings in this format:
object[i].base.base_x[i] and I get lists like List(0,1).
I want to use regular expressions in scala to find the match [i] in the given string and replace the first occurance with 0 and the second with 1. Hence getting something like object[0].base.base_x[1].
I have the following code:
val stringWithoutIndex = "object[i].base.base_x[i]" // basically this string is generated dynamically
val indexReplacePattern = raw"\[i\]".r
val indexValues = List(0,1) // list generated dynamically
if(indexValues.nonEmpty){
indexValues.map(row => {
indexReplacePattern.replaceFirstIn(stringWithoutIndex , "[" + row + "]")
})
else stringWithoutIndex
Since String is immutable, I cannot update stringWithoutIndex resulting into an output like List("object[0].base.base_x[i]", "object[1].base.base_x[i]").
I tried looking into StringBuilder but I am not sure how to update it. Also, is there a better way to do this? Suggestions other than regex are also welcome.
You couldloop through the integers in indexValues using foldLeft and pass the string stringWithoutIndex as the start value.
Then use replaceFirst to replace the first match with the current value of indexValues.
If you want to use a regex, you might use a positive lookahead (?=]) and a positive lookbehind (?<=\[) to assert the i is between opening and square brackets.
(?<=\[)i(?=])
For example:
val strRegex = """(?<=\[)i(?=])"""
val res = indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
s.replaceFirst(strRegex, row.toString)
}
See the regex demo | Scala demo
How about this:
scala> val str = "object[i].base.base_x[i]"
str: String = object[i].base.base_x[i]
scala> str.replace('i', '0').replace("base_x[0]", "base_x[1]")
res0: String = object[0].base.base_x[1]
This sounds like a job for foldLeft. No need for the if (indexValues.nonEmpty) check.
indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
indexReplacePattern.replaceFirstIn(s, "[" + row + "]")
}

Cannot retrive a group from Scala Regex match

I am struggling with regexps in Scala (2.11.5), I have a followin string to parse (example):
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
I want to extract third numeric value in the string above (it needs to be third after a slash because there can be other groups following), in order to do that I have the following regex pattern:
val pattern = """\/\d+,\d+,(\d+)""".r
I have been trying to retrieve the group for the third sequence of digits, but nothing seems to work for me.
val matchList = pattern.findAllMatchIn(string).foreach(println)
val matchListb = pattern.findAllIn(string).foreach(println)
I also tried using matching pattern.
string match {
case pattern(a) => println(a)
case _ => "What's going on?"
}
and got the same results. Either whole regexp is returned or nothing.
Is there an easy way to retrieve a group form regexp pattern in Scala?
You can use group method of scala.util.matching.Regex.Match to get the result.
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """\/\d+,\d+,(\d+)""".r
val result = pattern.findAllMatchIn(string) // returns iterator of Match
.toArray
.headOption // returns None if match fails
.map(_.group(1)) // select first regex group
// or simply
val result = pattern.findFirstMatchIn(string).map(_.group(1))
// result = Some(14058913)
// result will be None if the string does not match the pattern.
// if you have more than one groups, for instance:
// val pattern = """\/(\d+),\d+,(\d+)""".r
// result will be Some(56)
Pattern matching is usually the easiest way to do it, but it requires a match on the full string, so you'll have to prefix and suffix your regex pattern with .*:
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """.*\/\d+,\d+,(\d+).*""".r
val pattern(x) = string
// x: String = 14058913

Python Replacement of Shortcodes using Regular Expressions

I have a string that looks like this:
my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
And I need it to be converted into this:
new_str = "This sentence has a <b>bolded</b> word, and <b>another</b> one too!"
Is it possible to use Python's string.replace or re.sub method to do this intelligently?
Just capture all the characters before | inside [] into a group . And the part after | into another group. Just call the captured groups through back-referencing in the replacement part to get the desired output.
Regex:
\[([^\[\]|]*)\|([^\[\]]*)\]
Replacemnet string:
<\1>\2</\1>
DEMO
>>> import re
>>> s = "This sentence has a [b|bolded] word, and [b|another] one too!"
>>> m = re.sub(r'\[([^\[\]|]*)\|([^\[\]]*)\]', r'<\1>\2</\1>', s)
>>> m
'This sentence has a <b>bolded</b> word, and <b>another</b> one too!'
Explanation...
Try this expression: [[]b[|](\w+)[]] shorter version can also be \[b\|(\w+)\]
Where the expression is searching for anything that starts with [b| captures what is between it and the closing ] using \w+ which means [a-zA-Z0-9_] to include a wider range of characters you can also use .*? instead of \w+ which will turn out in \[b\|(.*?)\]
Online Demo
Sample Demo:
import re
p = re.compile(ur'[[]b[|](\w+)[]]')
test_str = u"This sentence has a [b|bolded] word, and [b|another] one too!"
subst = u"<bold>$1</bold>"
result = re.sub(p, subst, test_str)
Output:
This sentence has a <bold>bolded</bold> word, and <bold>another</bold> one too!
Just for reference, in case you don't want two problems:
Quick answer to your particular problem:
my_str = "This sentence has a [b|bolded] word, and [b|another] one too!"
print my_str.replace("[b|", "<b>").replace("]", "</b>")
# output:
# This sentence has a <b>bolded</b> word, and <b>another</b> one too!
This has the flaw that it will replace all ] to </b> regardless whether it is appropriate or not. So you might want to consider the following:
Generalize and wrap it in a function
def replace_stuff(s, char):
begin = s.find("[{}|".format(char))
while begin != -1:
end = s.find("]", begin)
s = s[:begin] + s[begin:end+1].replace("[{}|".format(char),
"<{}>".format(char)).replace("]", "</{}>".format(char)) + s[end+1:]
begin = s.find("[{}|".format(char))
return s
For example
s = "Don't forget to [b|initialize] [code|void toUpper(char const *s)]."
print replace_stuff(s, "code")
# output:
# "Don't forget to [b|initialize] <code>void toUpper(char const *s)</code>."

Remove a number from a comma separated string while properly removing commas

FOR EXAMPLE: Given a string... "1,2,3,4"
I need to be able to remove a given number and the comma after/before depending on if the match is at the end of the string or not.
remove(2) = "1,3,4"
remove(4) = "1,2,3"
Also, I'm using javascript.
As jtdubs shows, an easy way is is to use a split function to obtain an array of elements without the commas, remove the required element from the array, and then rebuild the string with a join function.
For javascript something like this might work:
function remove(array,to_remove)
{
var elements=array.split(",");
var remove_index=elements.indexOf(to_remove);
elements.splice(remove_index,1);
var result=elements.join(",");
return result;
}
var string="1,2,3,4,5";
var newstring = remove(string,"4"); // newstring will contain "1,2,3,5"
document.write(newstring+"<br>");
newstring = remove(string,"5");
document.write(newstring+"<br>"); // will contain "1,2,3,4"
You also need to consider the behavior you want if you have repeats, say the string is "1,2,2,4" and I say "remove(2)" should it remove both instances or just the first? this function will remove only the first instance.
Just use multiple substitutions.
s/^$removed,//;
s/,$removed$//;
s/,$removed,/,/;
This will be easier than trying to invent a single replacement that handles all those cases.
string input = "1,2,3,4";
List<string> parts = new List<string>(input.Split(new char[] { ',' }));
parts.RemoveAt(2);
string output = String.Join(",", parts);
Instead of using regex, I would do something like:
- split on comma
- delete the right element
- join with comma
Here is a perl script that does the job:
#!/usr/bin/perl
use 5.10.1;
use strict;
use warnings;
my $toremove = 5;
my $string = "1,2,3,4,5";
my #tmp = split/,/, $string;
#tmp = grep{ $_ != $toremove }#tmp;
$string =join',', #tmp;
say $string;
Output:
1,2,3,4
Javascript has improved since this question was posted.
I use the following regex to remove items from a csv string
let searchStr = "359";
let regex = new RegExp("^" + searchStr + ",?|," + searchStr);
csvStr = csvStr.replace(regex, "");
If the child_id is the start, middle or end, or only item it is replaced.
If the searchStr is at the start of the csvStr it and any trailing comma is replaced. Else if the searchStr is anywhere else in the csvStr it must be preceded with a comma so the searchStr and its preceding comma are replaced by an empty string.