Replacement of strings within 2 strings in regex - regex

I have a string:
dkj a * & &*(&(*(
//#HELLO
^%#&UJNWDUK()C*(v 8*J DK*9
//#HE#$^&&(akls#$98akdjl ak##sjdkja
//
%^&*(//#HELLO//#BYE<><>
//#BYE
^%#&UJNWDUK()C*(v 8*J DK*90K )
//#HELLO
&*^J$XUK 8j8 j jk kk8(&*(
//#BYE
and I need to have 2 groups such as each group must start with //HELLO then there should be a next line and any type of text can follow (.*) but it will end with a //BYE preceded by a line:
1)
//#HELLO
^%#&UJNWDUK()C*(v 8*J DK*9
//#HE#$^&&(akls#$98akdjl ak##sjdkja
//
%^&*(//#HELLO//#BYE<><>
//#BYE
2)
//#HELLO
&*^J$XUK 8j8 j jk kk8(&*(
//#BYE
and replaces the original string to this: (basically adding // to each line of each group)
dkj a * & &*(&(*(
////#HELLO
//^%#&UJNWDUK()C*(v 8*J DK*9
////#HE#$^&&(akls#$98akdjl ak##sjdkja
////
//%^&*(//#HELLO//#BYE<><>
////#BYE
^%#&UJNWDUK()C*(v 8*J DK*90K )
////#HELLO
//&*^J$XUK 8j8 j jk kk8(&*(
////#BYE
Here is my current progress:
I have
\/\/#HELLO\n.*?\/\/#BYE[\n$]
However im not sure how to go about the replacement, I'm thinking separating each line per group using \G after the //#HELLO and ending with //#BYE

It's a bit complex, but this will do it:
Search: (?m)(//#HELLO[\r\n]+|\G(?://#BYE|(?=(?:[^#]|#(?!HELLO[\r\n]+))*#BYE)[^\r\n]*[\r\n]*))
Replace: //$1
In Groovy:
String resultString = subjectString.replaceAll(/(?m)(\/\/#HELLO[\r\n]+|\G(?:\/\/#BYE|(?=(?:[^#]|#(?!HELLO[\r\n]+))*#BYE)[^\r\n]*[\r\n]*))/, '//$1');

For grouping into separate lines use the following regex:
//#HELLO\r(.*[\n\r]+)*//#BYE\r?
\r - Newline character
[\n\r] - Enter characters
*? - Non-greedy match
?- Match 1 or 0 times
You can take out the ? at the end if it always ends with a newline.
You can then use the group (The value inside the brackets) to search and replace.

Related

How to clear all commas except for commas in even position in sheet?

I have multiple rows of string where the string is all wrong. Here is one row an an example of the geometry and output expected:
id
geometry
output
1
POLYGON (( 106.812271, -6.361551, 106.812111, -6.361339, 106.81205, -6.361177, 106.81206, -6.360905, 106.812055, -6.360582, 106.812065, -6.360218, 106.812293, -6.359295, 106.812593, -6.358644, 106.812436, -6.358406, 106.8121515, -6.3582051, 106.8123, -6.357823, 106.81244, -6.357407, 106.812612, -6.356842, 106.812719, -6.356544, 106.81274, -6.356384, 106.812864, -6.356148, 106.813019, -6.356021, 106.813287, -6.355797, 106.813781, -6.355286, 106.814076, -6.354751, 106.814277, -6.354393, 106.814403, -6.354027, 106.814553, -6.353814, 106.814736, -6.353526, 106.814993, -6.353302, 106.81516, -6.353024, 106.815358, -6.35279, 106.815509, -6.352588, 106.815675, -6.352331, 106.8153007, -6.3521138, 106.8151398, -6.3520137, 106.8149789, -6.3518005, 106.8147643, -6.3516939, 106.8144639, -6.3516245, 106.8141527, -6.3515392, 106.8135734, -6.351342, 106.813171, -6.3512034, 106.8123284, -6.3509219, 106.8122418, -6.3511298, 106.8118164, -6.3521534, 106.8116597, -6.3525047, 106.8111849, -6.3535692, 106.8102245, -6.3554942, 106.8093545, -6.3568947, 106.8085097, -6.3580518, 106.80795, -6.358832, 106.8077793, -6.3590429, 106.807668, -6.359441, 106.807499, -6.360346, 106.8072531, -6.3616378, 106.8071476, -6.3622599, 106.8070637, -6.3626798, 106.8070823, -6.3629367, 106.8071207, -6.3634531, 106.8078269, -6.363831, 106.809448, -6.364124, 106.810574, -6.364198, 106.81066, -6.362993, 106.811175, -6.36277, 106.812087, -6.361703, 106.812271, -6.361551))
POLYGON (( 106.812271 -6.361551, 106.812111 -6.361339, 106.81205 -6.361177, 106.81206 -6.360905, 106.812055 -6.360582, 106.812065 -6.360218, 106.812293 -6.359295, 106.812593 -6.358644, 106.812436 -6.358406, 106.8121515 -6.3582051, 106.8123 -6.357823, 106.81244 -6.357407, 106.812612 -6.356842, 106.812719 -6.356544, 106.81274 -6.356384, 106.812864 -6.356148, 106.813019 -6.356021, 106.813287 -6.355797, 106.813781 -6.355286, 106.814076 -6.354751, 106.814277 -6.354393, 106.814403 -6.354027, 106.814553 -6.353814, 106.814736 -6.353526, 106.814993 -6.353302, 106.81516 -6.353024, 106.815358 -6.35279, 106.815509 -6.352588, 106.815675, -6.352331, 106.8153007, -6.3521138, 106.8151398 -6.3520137, 106.8149789 -6.3518005, 106.8147643 -6.3516939, 106.8144639 -6.3516245, 106.8141527 -6.3515392, 106.8135734 -6.351342, 106.813171 -6.3512034, 106.8123284 -6.3509219, 106.8122418 -6.3511298, 106.8118164 -6.3521534, 106.8116597 -6.3525047, 106.8111849 -6.3535692, 106.8102245 -6.3554942, 106.8093545 -6.3568947, 106.8085097 -6.3580518, 106.80795 -6.358832, 106.8077793 -6.3590429, 106.807668 -6.359441, 106.807499 -6.360346, 106.8072531 -6.3616378, 106.8071476 -6.3622599, 106.8070637 -6.3626798, 106.8070823 -6.3629367, 106.8071207 -6.3634531, 106.8078269 -6.363831, 106.809448 -6.364124, 106.810574 -6.364198, 106.81066 -6.362993, 106.811175 -6.36277, 106.812087 -6.361703, 106.812271 -6.361551))
One example is as follows above. I need to get rid of all odd position commas and only keep the even position commas. So that the geometry can become output.
I tried doing a split(text.",") and concatenate however when the columns is blank it returns xxx,,,, which is not what I had in mind.
Since some have more than 200 commas means that I need to have more than 200 columns, is there a simpler way like using regex?Someone please help.
If the second number is always negative, this is simple as replacing , - (comma, space, dash) with (space).
=REGEXREPLACE(B2,", -"," ")
If not,
=REGEXREPLACE(B2,"(-??\d+\.?\d*),(\s*-?\d+\.?\d*)","$1$2")
Capture group #1: (-??\d+\.?\d*)
-?? zero or one of literal dash followed by
\d+ one or more digits followed by
.? zero or one of literal .
\d* zero or more of digits
literal ,
Capture group #2 (\s*-?\d+\.?\d*)
\s* zero or more of space characters
-? zero or one of literal dash followed by
\d+ one or more digits followed by
.? zero or one of literal .
\d* zero or more of digits
Replace with capture groups only: $1$2
try:
=INDEX(REGEXREPLACE(QUERY(FLATTEN(SPLIT(A1, ",")&IF(ISODD(
SEQUENCE(1, COLUMNS(SPLIT(A1, ",")))),, ",")),,9^9), ",$", ))
for array:
=INDEX(IFERROR(BYROW(A1:A3, LAMBDA(x, REGEXREPLACE(QUERY(FLATTEN(SPLIT(x, ",")&
IF(ISODD(SEQUENCE(1, COLUMNS(SPLIT(x, ",")))),, ",")),,9^9), ",$", )))))
An idea to match , and capture any non-commas with an optional comma after:
=REGEXREPLACE(A1; ",([^,]*,?)"; "$1")
Replace with $1 what was captured by the first group - See this demo at regex101

How do I replace the nth occurrence of a special character, say, a pipe delimiter with another in Scala?

I'm new to Spark using Scala and I need to replace every nth occurrence of the delimiter with the newline character.
So far, I have been successful at entering a new line after the pipe delimiter.
I'm unable to replace the delimiter itself.
My input string is
val txt = "January|February|March|April|May|June|July|August|September|October|November|December"
println(txt.replaceAll(".\\|", "$0\n"))
The above statement generates the following output.
January|
February|
March|
April|
May|
June|
July|
August|
September|
October|
November|
December
I referred to the suggestion at https://salesforce.stackexchange.com/questions/189923/adding-comma-separator-for-every-nth-character but when I enter the number in the curly braces, I only end up adding the newline after 2 characters after the delimiter.
I'm expecting my output to be as given below.
January|February
March|April
May|June
July|August
September|October
November|December
How do I change my regular expression to get the desired output?
Update:
My friend suggested I try the following statement
println(txt.replaceAll("(.*?\\|){2}", "$0\n"))
and this produced the following output
January|February|
March|April|
May|June|
July|August|
September|October|
November|December
Now I just need to get rid of the pipe symbol at the end of each line.
You want to move the 2nd bar | outside of the capture group.
txt.replaceAll("([^|]+\\|[^|]+)\\|", "$1\n")
//val res0: String =
// January|February
// March|April
// May|June
// July|August
// September|October
// November|December
Regex Explained (regex is not Scala)
( - start a capture group
[^|] - any character as long as it's not the bar | character
[^|]+ - 1 or more of those (any) non-bar chars
\\| - followed by a single bar char |
[^|]+ - followed by 1 or more of any non-bar chars
) - close the capture group
\\| - followed by a single bar char (not in capture group)
"$1\n" - replace the entire matching string with just the first $1 capture group ($0 is the entire matching string) followed by the newline char
UPDATE
For the general case of N repetitions, regex becomes a bit more cumbersome, at least if you're trying to do it with a single regex formula.
The simplest thing to do (not the most efficient but simple to code) is to traverse the String twice.
val n = 5
txt.replaceAll(s"(\\w+\\|){$n}", "$0\n")
.replaceAll("\\|\n", "\n")
//val res0: String =
// January|February|March|April|May
// June|July|August|September|October
// November|December
You could first split the string using '|' to get the array of string and then loop through it to perform the logic you want and get the output as required.
val txt = "January|February|March|April|May|June|July|August|September|October|November|December"
val out = txt.split("\\|")
var output: String = ""
for(i<-0 until out.length -1 by 2){
val ref = out(i) + "|" + out(i+1) + "\n"
output = output + ref
}
val finalout = output.replaceAll("\"\"","") //just to remove the starting double quote
println(finalout)

Remove the text before second comma ('',") String replace pattern

how can we remove the text before the line that start's with second comma(line 5 in the example),how can i do that using regex?
example :
,
abc,xyz,ggg,nrmr
cde,jjj,kkkk,iiii,tem,posting
234,mm/dd/yy
,
454654,output2,sample
45646,output1,non-sample
16546,225.02
ABC,2.98
expected :
454654,output2,sample
45646,output1,non-sample
16546,225.02
ABC,2.98
It seems you may use
val s = """,
abc,xyz,ggg,nrmr
cde,jjj,kkkk,iiii,tem,posting
234,mm/dd/yy
,
454654,output2,sample
45646,output1,non-sample
16546,225.02
ABC,2.98"""
val res = s.replaceFirst("(?sm)\\A(.*?^,$){2}", "").trim()
println(res)
// =>
// 454654,output2,sample
// 45646,output1,non-sample
// 16546,225.02
// ABC,2.98
See the Scala demo.
Pattern details:
(?sm) - s enables . to match any char in the string including newlines, and m makes ^ and $ match start/end of line respectively
\\A - the start of string
(.*?^,$){2} - 2 occurrences of:
.*? - any 0+ chars as few as possible up to the leftmost
^,$ - line that only contains ,.

Remove optional whitespace when splitting with math operators while keeping them in the result

How to remove whitespace characters from the input string? I am using the following code that
Dim input As String = txtInput.Text
Dim symbol As String = "([-+*/])"
Dim substrings() As String = Regex.Split(input, symbol)
Dim cleaned As String = Regex.Replace(input, "\s", " ")
For Each match As String In substrings
lstOutput.Items.Add(match)
Next
Input: z + x
Output: z, + and x.
I want to get rid of the whitespace in the last item.
You may remove the redundant whitespace while splitting with
\s*([-+*/])\s*
See the regex demo. Also, it is a good idea to trim the input before passing to the regex replace method with .Trim().
Pattern details:
\s* - matches 0+ whitespaces (these will be discarded from the result as they are not captured)
([-+*/]) - Group 1 (captured texts will be output to the resulting array) capturing 1 char: -, +, * or /
\s* - matches 0+ whitespaces (these will be discarded from the result as they are not captured)

Regex to extract value at fixed position index

I have the following string of characters:
73746174652C313A312C310D
|
- extract the value at this position
I would like to extract the value 1 (the 1 at the end of the string) using regex.
So basically a regex that acts as a charAt(index).
I need this solution for a 3rd party application that only supports regular expressions. Note that the application cannot access capture groups and does not support negative lookbehinds.
In C#:
(?<=^.{21})(.)
in JS:
/.(?=.{2}$)/
You could try:
(?<=^.{21}).
It won't work in Javascript, but perhaps it will work in your app.
It means: a single character preceded (?<= ... ) by the beginning of the string ^ plus 21 characters .{21} . So, in the end, it returns the 22th character.
The 22nd character is in capture group 1.
/^.{21}(.)/
But what system are you in that requires this instead of normal string processing?
Depends how you want to match it ( x distance from the beginning or x distance from the end )
/(.).{2}$/ Third from the end (capturing group 1)
/^.{21}(.)/ 22nd character (capturing group 1)
//PHP
$str = '73746174652C313A312C310D';
$char = preg_replace('/(.).{2}$/','$1',$str); //3rd from last
preg_match('/(.).{2}$/',$str,$chars); //3rd from last
$char = $chars[1];
preg_match('/^.{21}(.)/',$str,$chars); //22nd character
$char = $chars[1];
//JS
var str = '73746174652C313A312C310D';
var ch = str.replace(/(.).{2}$/,'$1'); //3rd from last
var ch = str.match(/(.).{2}$/)[1]; //3rd from last
var ch = str.match(/^.{21}(.)/)[1]; //22nd character
If you're having to use the result of the First match: bit of your tool, run it twice:
73746174652C313A312C310D - ^.{21}. = 73746174652C313A312C31
73746174652C313A312C31 - .$ = 1