How can I remove all trailing backslashes from a string in Scala? - regex

I want to remove all trailing backslashes ('\') from a string.
For example:
"ab" -> "ab"
"ab\\\\" -> "ab"
"\\\\ab\\" -> "\\\\ab"
"\\" -> ""
I can do this with the code below, but it does not handle the case where the string consists only of backslashes. Can this be achieved with a different regex?
val str = """\\\\q\\"""
val regex = """^(.*[^\\])(\\+)$""".r
str match {
case regex(rest, slashes) => str.stripSuffix(slashes)
case _ => str
}

Converting my comment as an answer. This should work for removing all trailing backslashes:
str = str.replaceFirst("\\\\+$", "");
\\\\+ matches one or more backslashes (in a regular Java/Scala string literal, a regex that matches a single backslash is written as \\\\).
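As a quick check, here is a minimal sketch that runs this replacement against the four examples from the question, including the all-backslash case. The names `TrailingBackslashRegex` and `strip` are made up for illustration; only the `replaceFirst` call comes from the answer.

```scala
object TrailingBackslashRegex {
  // The regex is \\+$ (one or more backslashes at the end of the input).
  // In a regular string literal every regex backslash is doubled again.
  def strip(s: String): String = s.replaceFirst("\\\\+$", "")
}
```

With the inputs written as in the question, `TrailingBackslashRegex.strip("\\\\ab\\")` should give `"\\\\ab"` and `TrailingBackslashRegex.strip("\\")` should give the empty string.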

While not a regex, I suggest a simpler solution: str.reverse.dropWhile(_ == '\\').reverse

Not using a regex, but you could use String.lastIndexWhere(p: (Char) ⇒ Boolean) to get the position of the last character which is not a '\' in order to substring until this character:
str.substring(0, str.lastIndexWhere(_ != '\\') + 1)
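The all-backslash case works with this approach because `lastIndexWhere` returns -1 when no character satisfies the predicate, so the end index becomes 0. A small sketch of that reasoning (the wrapper names `TrailingBackslashIndex` and `stripByIndex` are hypothetical):

```scala
object TrailingBackslashIndex {
  def stripByIndex(s: String): String = {
    // Index of the last character that is not a backslash, or -1 if there is none.
    val lastKept = s.lastIndexWhere(_ != '\\')
    // With -1 the end index is 0, which yields the empty string.
    s.substring(0, lastKept + 1)
  }
}
```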

If, for some reason, you're committed to a regex solution, it can be done.
val regex = """[^\\]?(\\*)$""".r.unanchored
str match {
case regex(slashes) => str.stripSuffix(slashes)
}

You can do the same with slice function
str.slice(0,str.lastIndexWhere(_ != '\\')+1)

Related

How to replace all '\n' character after 'word' with 'comma' character

Trying to replace all the \n character after the word 'key2:' pattern with comma.
Input String:
key1:value1\nkey2:value2\nvalue22\nvalue222
Expected:
key1:value1\nkey2:value2,value22,value222
Tried:
r'key2:(\n*$)' replace with ','
Any suggestions on how I can do this replacement with a regex? (I am testing on https://rustexp.lpil.uk/.)
I don't think this can easily be done with regex, so I'd propose a simpler way:
let mut s = String::from("key1:value1\nkey2:value2\nvalue22\nvalue222");
let expected = "key1:value1\nkey2:value2,value22,value222";
let key2 = "key2";
let substr_index = s.find(key2).unwrap() + key2.len();
let commas = s[substr_index..].replace("\n", ",");
s.replace_range(substr_index.., &commas);
assert_eq!(s, expected);

Kotlin .split() with multiple regex

Input: """aaaabb\\\\\cc"""
Pattern: ["""aaa""", """\\""", """\"""]
Output: [aaa, abb, \\, \\, \, cc]
How can I split Input to Output using patterns in Pattern in Kotlin?
I found that Regex("(?<=cha)|(?=cha)") keeps the delimiters in the result after splitting, so I tried looping over the patterns, but some of them, such as '\' and '[', need an escaping backslash, so I am not able to split in a loop.
EDIT:
val temp = mutableListOf<String>()
for (e in Input.split(Regex("(?<=\\\\)|(?=\\\\)"))) temp.add(e)
This is what I've been doing, but it does not work for multiple patterns, and it adds an extra "" at the end of temp if Input ends with "\".
You may use the function I wrote for some previous question that splits by a pattern keeping all matched and non-matched substrings:
private fun splitKeepDelims(s: String, rx: Regex, keep_empty: Boolean = true) : MutableList<String> {
var res = mutableListOf<String>() // Declare the mutable list var
var start = 0 // Define var for substring start pos
rx.findAll(s).forEach { // Looking for matches
val substr_before = s.substring(start, it.range.first()) // Substring before match start
if (substr_before.length > 0 || keep_empty) {
res.add(substr_before) // Adding substring before match start
}
res.add(it.value) // Adding match
start = it.range.last()+1 // Updating start pos of next substring before match
}
if ( start != s.length ) res.add(s.substring(start)) // Adding text after last match if any
return res
}
You just need to build a dynamic pattern from your Pattern list items by joining them with |, the alternation operator, while remembering to escape all the items:
val Pattern = listOf("aaa", """\\""", "\\") // Define the list of literal patterns
val rx = Pattern.map{Regex.escape(it)}.joinToString("|").toRegex() // Build a pattern, \Qaaa\E|\Q\\\E|\Q\\E
val text = """aaaabb\\\\\cc"""
println(splitKeepDelims(text, rx, false))
// => [aaa, abb, \\, \\, \, cc]
See the Kotlin demo
Note that between \Q and \E, all chars in the pattern are considered literal chars, not special regex metacharacters.
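For readers working in Scala rather than Kotlin, roughly the same idea can be sketched with `java.util.regex.Pattern.quote`, which wraps a literal in `\Q...\E`. This is an adaptation under assumptions, not the answer's code: the names `SplitKeepDelims` and `split` are invented for the example, and empty pieces between adjacent matches are dropped (the `keep_empty = false` behaviour).

```scala
import java.util.regex.Pattern
import scala.collection.mutable.ListBuffer

object SplitKeepDelims {
  def split(s: String, literals: Seq[String]): List[String] = {
    // Quote every literal so regex metacharacters lose their meaning, then
    // join with | (earlier alternatives win, so list longer literals first).
    val rx = literals.map(l => Pattern.quote(l)).mkString("|").r
    val out = ListBuffer.empty[String]
    var start = 0 // start of the text not yet emitted
    for (m <- rx.findAllMatchIn(s)) {
      if (m.start > start) out += s.substring(start, m.start) // text before the match
      out += m.matched                                        // the delimiter itself
      start = m.end
    }
    if (start < s.length) out += s.substring(start) // text after the last match
    out.toList
  }
}
```

Calling `SplitKeepDelims.split("""aaaabb\\\\\cc""", Seq("aaa", """\\""", "\\"))` should produce the six pieces listed in the question.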

Remove occurrence of backslash in string not followed by another backslash or pipe

I need help creating a regex that handles backslashes under the following conditions:
1> If the string contains a backslash that is not immediately followed by another backslash or a pipe, I should return the string without that backslash.
2> If the string contains a backslash immediately followed by a pipe, I should not remove the backslash.
3> If the string contains a backslash immediately followed by another backslash (\\), I should not remove the backslash.
scala> val str = """Sports\s"""
str: String = Sports\s
scala> str.replaceAll("""\\""", "")
res70: String = Sportss
scala> val str = """Sports\\s"""
str: String = Sports\\s
scala> str.replaceAll("""\\""", "")
res71: String = Sportss
scala> val str = """Sports\\|s"""
str: String = Sports\\|s
scala> str.replaceAll("""\\""", "")
res74: String = Sports|s
In the above tests, whether the string contains a single backslash, a double backslash, or a combination of backslashes and a pipe, the backslashes are removed completely. What should I modify to handle these cases?
A working answer was provided for my earlier question, but here is another case where that solution does not work.
scala> val str = "Spo\\rts\\s"
str: String = Spo\rts\s
scala> str.replaceFirst("""^([^\\|]*)\\([^\\|]*)$""", "$1$2")
res102: String = Spo\rts\s
The expected output should contain no backslashes.
You may use
s.replaceAll("""(\\+[\\|])|\\""", "$1")
See the regex demo online.
Pattern details
(\\+[\\|]) - Capturing group 1:
\\+ - 1 or more \ chars
[\\|] - a \ or | char
| - or
\\ - a backslash
The $1 replacement pattern will insert the value inside capturing group #1.
See the Scala demo:
val p = """(\\+[\\|])|\\"""
println("""Sports\s""".replaceAll(p, "$1")) // => Sportss
println("""Sports\|s""".replaceAll(p, "$1")) // => Sports\|s
println("""Sports\\s""".replaceAll(p, "$1")) // => Sports\\s
println("""Sports|\\s""".replaceAll(p, "$1"))// => Sports|\\s
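The failing input from the question (`Spo\rts\s`) is not part of the demo above. A minimal sketch, assuming the same pattern, that wraps it in a helper so this case can be checked as well (the names `BackslashCleanup` and `dropLoneBackslashes` are hypothetical):

```scala
object BackslashCleanup {
  // Group 1 keeps a run of backslashes that ends in another backslash or a pipe.
  // Any other backslash matches the second alternative; group 1 is then unset,
  // so "$1" replaces it with nothing.
  private val pattern = """(\\+[\\|])|\\"""

  def dropLoneBackslashes(s: String): String = s.replaceAll(pattern, "$1")
}
```

`BackslashCleanup.dropLoneBackslashes("Spo\\rts\\s")` should return `Sportss`, since neither backslash is followed by a backslash or a pipe.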

Scala regex find matches in middle of string [duplicate]

I have written the following code in scala:
val regex_str = "([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
which returns "other", but if I take off the leading underscore:
val regex_str = "([a-z]+)(\\d+)".r
"abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
I get "found". How can I find any ([a-z]+)(\\d+) instead of just at the beginning? I am used to other regex languages where you use a ^ to specify beginning of the string, and the absence of that just gets all matches.
Scala regex patterns are "anchored" by default when used as extractors, i.e. bound to the beginning and end of the target string.
You'll get the expected match with this.
val regex_str = "([a-z]+)(\\d+)".r.unanchored
Maybe you need something like this:
val regex_str = "[^>]([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => println(s"found $a $n")
case _ => println("other")
}
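The leading [^>] consumes the first character of the string (any character except '>'), so this only works when exactly one character precedes the match.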
Hope this helps!
The unapplySeq of the Regex tries to capture the whole input by default (treats the pattern as if it was between ^ and $).
There are two ways to capture inside the input:
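use .*? before and .* after the captures: val regex_str = ".*?([a-z]+)(\\d+).*".r (with a greedy leading .*, the first group would capture only the last letter, "c" for "_abc123")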
do the same with .unanchored: val regex_str = "([a-z]+)(\\d+)".r.unanchored
Otherwise Scala treats regular expression anchors the same way as other languages; this one is an exception made for semantic reasons.
The regex extractor in Scala pattern matching attempts to match the entire string. If you want to skip some junk characters at the beginning and at the end, prepend .*? (a dot with a reluctant quantifier) and append .* to the regex:
val regex_str = ".*?([a-z]+)(\\d+).*".r
val result = "_!+<>__abc123_%$" match {
case regex_str(a, n) => s"found a = '$a', n = '$n'"
case _ => "no match"
}
println(result)
This outputs:
found a = 'abc', n = '123'
Otherwise, don't use the pattern match with the extractor, use "...".r.findAllIn to find all matches.
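To put the options from these answers side by side, here is a minimal sketch. The object and method names (`FindInMiddle`, `firstPair`, `allPairs`) are assumptions for illustration; `unanchored` and `findAllMatchIn` are the standard `scala.util.matching.Regex` members.

```scala
object FindInMiddle {
  // Used as an extractor, this must match the entire input.
  val anchored = "([a-z]+)(\\d+)".r
  // Same pattern; the extractor now accepts the first match anywhere in the input.
  val loose = anchored.unanchored

  def firstPair(s: String): Option[(String, String)] = s match {
    case loose(a, n) => Some((a, n))
    case _           => None
  }

  // findAllMatchIn is not affected by anchoring and yields every non-overlapping match.
  def allPairs(s: String): List[(String, String)] =
    anchored.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList
}
```

For example, `FindInMiddle.firstPair("_abc123")` should return `Some((abc,123))`, while matching the same input against `anchored` in a `case` falls through to the default branch.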

In Scala how can I split a string on whitespaces accounting for an embedded quoted string?

I know Scala can split strings on regex's like this simple split on whitespace:
myString.split("\\s+").foreach(println)
What if I want to split on whitespace, accounting for the possibility that there may be a quoted string in the input (which I wish to be treated as 1 thing)?
"""This is a "very complex" test"""
In this example I want the resulting substrings to be:
This
is
a
very complex
test
While handling quoted expressions with split can be tricky, doing so with Regex matches is quite easy. We just need to match all non-whitespace character sequences with ([^\\s]+) and all quoted character sequences with \"(.*?)\" (toList added in order to avoid reiteration):
import scala.util.matching._
val text = """This is a "very complex" test"""
val regex = new Regex("\"(.*?)\"|([^\\s]+)")
val matches = regex.findAllMatchIn(text).toList
val words = matches.map { _.subgroups.flatMap(Option(_)).fold("")(_ ++ _) }
words.foreach(println)
/*
This
is
a
very complex
test
*/
Note that the solution also counts the quote itself as a word boundary. If you want to inline quoted strings into surrounding expressions, you'll need to add [^\\s]* on both sides of the quoted case and adjust group boundaries correspondingly:
...
val text = """This is a ["very complex"] test"""
val regex = new Regex("([^\\s]*\".*?\"[^\\s]*)|([^\\s]+)")
...
/*
This
is
a
["very complex"]
test
*/
You can also omit quote symbols when inlining a string by splitting a regex group:
...
val text = """This is a ["very complex"] test"""
val regex = new Regex("([^\\s]*)\"(.*?)\"([^\\s]*)|([^\\s]+)")
...
/*
This
is
a
[very complex]
test
*/
In more complex scenarios, when you have to deal with CSV strings, you'd better use a CSV parser (e.g. scala-csv).
For a string like the one in question, when you do not have to deal with escaped quotation marks, nor with any "wild" quotes appearing in the middle of the fields, you may adapt a known Java solution (see Regex for splitting a string using space when not surrounded by single or double quotes):
val text = """This is a "very complex" test"""
val p = "\"([^\"]*)\"|[^\"\\s]+".r
val allMatches = p.findAllMatchIn(text).map(
m => if (m.group(1) != null) m.group(1) else m.group(0)
)
println(allMatches.mkString("\n"))
See the online Scala demo, output:
This
is
a
very complex
test
The regex is rather basic as it contains 2 alternatives, a single capturing group and a negated character class. Here are its details:
\"([^\"]*)\" - ", followed with 0+ chars other than " (captured into Group 1) and then a "
| - or
[^\"\\s]+ - 1+ chars other than " and whitespace.
You only grab .group(1) if Group 1 participated in the match, else, grab the whole match value (.group(0)).
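The same logic can be written without the explicit null check by wrapping the group in `Option`. This is a sketch under the same assumptions as the answer (no escaped quotes, no stray quotes inside fields); the names `QuoteAwareSplit` and `tokenize` are made up for the example.

```scala
object QuoteAwareSplit {
  // Either a double-quoted run (contents captured in group 1) or a run of
  // characters that are neither quotes nor whitespace.
  private val token = "\"([^\"]*)\"|[^\"\\s]+".r

  // group(1) is null when the unquoted alternative matched, so Option(...)
  // falls back to the whole match in that case.
  def tokenize(text: String): List[String] =
    token.findAllMatchIn(text).map(m => Option(m.group(1)).getOrElse(m.matched)).toList
}
```

Unlike an approach that merges two neighbouring tokens, this handles quoted runs of any length, e.g. `QuoteAwareSplit.tokenize("""a "b c d" e""")`.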
This works when each quoted string spans exactly two whitespace-separated words, as in the example:
val xx = """This is a "very complex" test"""
var x = xx.split("\\s+")
for(i <-0 until x.length) {
if(x(i) contains "\"") {
x(i) = x(i) + " " + x(i + 1)
x(i + 1 ) = ""
}
}
val newX= x.filter(_ != "")
for(i<-newX) {
println(i.replace("\"",""))
}
Rather than using split, I used a recursive approach. Treat the input string as a List[Char], then step through, inspecting the head of the list to see if it is a quote or whitespace, and handle accordingly.
def fancySplit(s: String): List[String] = {
def recurse(s: List[Char]): List[String] = s match {
case Nil => Nil
case '"' :: tail =>
val (quoted, theRest) = tail.span(_ != '"')
quoted.mkString :: recurse(theRest drop 1)
case c :: tail if c.isWhitespace => recurse(tail)
case chars =>
val (word, theRest) = chars.span(c => !c.isWhitespace && c != '"')
word.mkString :: recurse(theRest)
}
recurse(s.toList)
}
If the list is empty, you've finished recursion
If the first character is a ", grab everything up to the next quote, and recurse with what's left (after throwing out that second quote).
If the first character is whitespace, throw it out and recurse from the next character
In any other case, grab everything up to the next split character, then recurse with what's left
Results:
scala> fancySplit("""This is a "very complex" test""") foreach println
This
is
a
very complex
test