I want to extract a word from a string and then use that word in my regex.
My string looks like this:
val s = "null_eci_count"
I want to derive the following string from it:
sum(cast((eci is null or eci in ('', '0', 'null', 'NULL')) as int))
Using replaceAll, I derived part of that expression:
scala> s.replaceAll("null_", "sum(cast((").replaceAll("_count"," is null) as int))")
res69: String = sum(cast((eci is null) as int))
Please suggest a way to derive the whole expression.
Capture the middle part of the string (i.e. eci) as a group with .*?_(.*?)_.* and then refer to it in the replacement string with the group reference $1 (Java and Scala replacement strings use $1, not \1).
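A minimal sketch of the group-reference idea, assuming the input always has the null_<column>_count shape from the question. The anchored regex ^null_(.+)_count$ is a slight variation on the one above, and toNullCheck is a hypothetical helper name:

```scala
// Replacement template: $1 is substituted with whatever group 1 captured.
// Only '$' and '\' are special in replacement strings, so the quotes and parentheses are literal.
val template =
  "sum(cast(($1 is null or $1 in ('', '0', 'null', 'NULL')) as int))"

def toNullCheck(name: String): String =
  // ^null_(.+)_count$ captures the middle word (e.g. "eci") as group 1;
  // names that do not match are returned unchanged
  name.replaceAll("^null_(.+)_count$", template)

println(toNullCheck("null_eci_count"))
```

Because the whole name must match, a column that does not follow the naming scheme passes through untouched instead of being partially rewritten.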
How about:
val eci = s.split("_").drop(1).head // "null_eci_count" -> "eci"
val result = s"sum(cast(($eci is null or $eci in ('', '0', 'null', 'NULL')) as int))"
I used an ArrayBuffer to do this:
import scala.collection.mutable.ArrayBuffer
val tgt = spark.sql("select * from ctx_monitor.xpo_click_counts")
// keep the columns from index 4 onward
val a = tgt.columns.slice(4, tgt.columns.length)
val col = ArrayBuffer[String]()
for (e <- a) {
  if (e contains "null") {
    val c = e.replaceFirst("null_", "")
    col += "sum(cast((" + c + " is null or " + c + " in('','0','null','NULL')) as int))"
  }
}
val cols = col.mkString(",")
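The same select list can also be built without a mutable buffer. This is a sketch only: the column names are hypothetical stand-ins for tgt.columns, Spark is left out so it runs on its own, and it assumes names of the form null_<column>_count as in the question (adjust the regex if the real column names differ):

```scala
// Hypothetical column names; in the question these come from tgt.columns
val columns = Seq("date", "null_eci_count", "clicks", "null_xpo_count")

// A Regex used as an extractor must match the whole name; group 1 is <column>
val NullCount = "null_(.+)_count".r

// collect keeps only the matching names and maps each one to its SQL expression
val exprs = columns.collect { case NullCount(c) =>
  s"sum(cast(($c is null or $c in ('', '0', 'null', 'NULL')) as int))"
}

val cols = exprs.mkString(",")
println(cols)
```

Using collect folds the filter (the `contains "null"` check) and the transformation into one step, and the regex match is stricter than a substring test.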
Let's assume I have a string such as:
val a = "aaaabbbcccss"
and I want to collapse only the runs of a's and b's, like this:
"a4b3cccss"
I have tried a.toList.groupBy(identity).mapValues(_.size), but that returns a map with no ordering, so I cannot convert it into the form I want. Is there a function in Scala that can achieve this?
You may use
val a = "aaaabbbcccss"
val p = """([ab])\1*""".r
println(p replaceAllIn (a, m => s"${m.group(1)}${m.group(0).size}") )
The regex matches:
([ab]) - Group 1: a or b
\1* - zero or more occurrences of the char captured into Group 1.
In the replacement part, m.group(1) is the char captured into Group 1 and m.group(0).size is the size of the whole match.
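A sketch of the same regex technique generalized to an arbitrary set of characters. compressRuns is a hypothetical name; it relies on Regex.quote so characters such as '.' are matched literally, and on Regex.quoteReplacement so a '$' or '\' in the input is not read as a group reference:

```scala
import scala.util.matching.Regex

def compressRuns(s: String, chars: Seq[Char]): String =
  if (chars.isEmpty) s
  else {
    // Build (x|y|...)\1* with each character quoted
    val alternatives = chars.map(c => Regex.quote(c.toString)).mkString("|")
    val runs = new Regex(s"($alternatives)\\1*")
    // group(1) is the repeated char, matched.length is the length of the whole run
    runs.replaceAllIn(s, m => Regex.quoteReplacement(s"${m.group(1)}${m.matched.length}"))
  }

println(compressRuns("aaaabbbcccss", Seq('a', 'b')))
```

The fixed [ab] class in the answer is simpler when the characters are known up front; building the alternation is only worth it when they come from input.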
As an alternative, you could write a function that takes your string and a list of characters and processes it recursively: take the run of consecutive identical characters from the front of the list with takeWhile, drop that many characters from the list, and append either the character followed by the run length (if it is in your list) or the run itself, until the list is empty.
def countSimilar(str: String, ch: List[Char]): String = {
  def process(l: List[Char], acc: String = ""): String = {
    l match {
      case Nil => acc
      case h :: _ =>
        // the run of consecutive characters equal to h
        val tw = l.takeWhile(_ == h)
        acc + process(
          l.drop(tw.length),
          // compress the run only if h is one of the requested characters
          if (ch.contains(h)) s"$h${tw.length}" else tw.mkString("")
        )
    }
  }
  process(str.toList)
}
println(countSimilar("aaaabbbcccss", List('a', 'b')))
println(countSimilar("aaaabbbcccssaaaabb", List('a', 'b', 'c')))
That will give you:
a4b3cccss
a4b3c3ssa4b2
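In countSimilar the recursive call is not in tail position, so very long inputs grow the call stack. A tail-recursive variant, sketched here under the hypothetical name countRuns, keeps the result in a StringBuilder that is passed along and uses span to split each run off in one pass:

```scala
import scala.annotation.tailrec

def countRuns(str: String, ch: Set[Char]): String = {
  @tailrec
  def loop(rest: List[Char], acc: StringBuilder): String = rest match {
    case Nil => acc.toString
    case h :: _ =>
      // span returns (leading run equal to h, everything after it)
      val (run, remaining) = rest.span(_ == h)
      if (ch.contains(h)) acc.append(h).append(run.length)
      else acc.append(run.mkString)
      loop(remaining, acc)
  }
  loop(str.toList, new StringBuilder)
}

println(countRuns("aaaabbbcccssaaaabb", Set('a', 'b', 'c')))
```

Taking a Set instead of a List makes the membership check constant time; the output is the same as countSimilar's for these inputs.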
I have a string like this:
val str = "3.2.1"
I want to do some manipulations based on it. Here is what I need to do; suggestions are welcome.
I'm automating a website, and based on this string I need to perform some actions.
So:
for the first digit, I need to choose by value: value = str[0]
for the second digit, I need to choose by value: value = str[0] + "." + str[1]
for the third digit, I need to choose by value: value = str[0] + "." + str[1] + "." + str[2]
In other words, the value for the second field is firstdigit.seconddigit and the value for the third field is firstdigit.seconddigit.thirddigit.
You can use pattern matching for this.
First, create a regex:
@ val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
Then you can use it in a pattern match:
@ "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
If you don't need all the numbers, you can, for example, do this:
@ "1.2.0" match { case pattern(a, _, _) => println(a) }
1
If you want to take just the first two numbers, for example, you can do:
@ val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
To add one more variant to @Lukasz's answer, using value extraction:
@ val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
@ val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be treated as regular vals further in the code.
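One caveat: val pattern(a, b, c) = s throws a MatchError when s does not have exactly three numeric parts (e.g. "3.2"). A sketch that wraps the match in an Option and returns the three values the question needs; versionPrefixes is a hypothetical name:

```scala
val pattern = """(\d+)\.(\d+)\.(\d+)""".r

def versionPrefixes(s: String): Option[(String, String, String)] = s match {
  // the three prefixes: "3", "3.2", "3.2.1"
  case pattern(a, b, c) => Some((a, s"$a.$b", s"$a.$b.$c"))
  case _                => None
}

println(versionPrefixes("3.2.1"))
println(versionPrefixes("3.2"))
```

The caller can then decide what to do with malformed input instead of the automation script failing with an exception.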
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
Try the following (split takes a regular expression, so the dot must be escaped):
scala> "3.2.1".split("\\.")
res0: Array[java.lang.String] = Array(3, 2, 1)
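A short sketch of why the escape matters. String.split treats its argument as a regex, and an unescaped "." matches every character, so every piece between delimiters is empty and all of them are dropped as trailing empty strings. Pattern.quote and Scala's Char overload of split both avoid hand-escaping:

```scala
import java.util.regex.Pattern

val broken  = "3.2.1".split(".")               // empty array: "." matches any character
val escaped = "3.2.1".split("\\.")             // escaped dot: a literal '.'
val quoted  = "3.2.1".split(Pattern.quote(".")) // quoted, no manual escaping needed
val byChar  = "3.2.1".split('.')               // Scala's split(Char) splits on the literal char

println(broken.length)
println(escaped.mkString(","))
```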
This one:
object Splitter {
  def splitAndAccumulate(string: String) = {
    val s = string.split("\\.")
    s.tail.scanLeft(s.head){ case (acc, elem) =>
      acc + "." + elem
    }
  }
}
passes this test:
test("Simple"){
  val t = Splitter.splitAndAccumulate("1.2.3")
  val answers = Seq("1", "1.2", "1.2.3")
  t.zip(answers).foreach{ case (l, r) =>
    assert(l == r)
  }
}
I want to extract the list of IDs from a string with the following pattern:
{(2),(4),(5),(100)}
Note: there are no leading or trailing spaces.
The list can have up to 1000 IDs.
I want to use rich string pattern matching to do this, but after 20 minutes of trying I'm stuck.
Could anyone help me come up with the correct pattern? Much appreciated!
Here's a brute-force string manipulation:
scala> "{(2),(4),(5),(100)}".replaceAll("\\(", "").replaceAll("\\)", "").replaceAll("\\{","").replaceAll("\\}","").split(",")
res0: Array[java.lang.String] = Array(2, 4, 5, 100)
Here's a regex, as @pst noted in the comments. If you don't want the parentheses, change the regular expression to """\d+""".r.
scala> val num = """\(\d+\)""".r
scala> num findAllIn "{(2),(4),(5),(100)}"
res33: scala.util.matching.Regex.MatchIterator = non-empty iterator
scala> res33.toList
res34: List[String] = List((2), (4), (5), (100))
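A sketch of the \d+ variant mentioned above: matching only the digits means there are no parentheses to strip, and each match can be converted straight to an Int. ids is a hypothetical helper name:

```scala
val digits = """\d+""".r

// findAllIn yields each matched substring; convert them to Ints
def ids(s: String): List[Int] = digits.findAllIn(s).map(_.toInt).toList

println(ids("{(2),(4),(5),(100)}"))
```

This scans the input once, so a list of 1000 IDs is not a concern.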
"{(2),(4),(5),(100)}".split ("[^0-9]").filter(_.length > 0).map (_.toInt)
This splits wherever a character is not part of a number and converts only the non-empty pieces.
The character class could be extended to keep dots or minus signs.
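A sketch of that extension, assuming decimals and negative numbers should survive: '.' and '-' are added to the kept characters and the pieces are parsed as Double. numbers is a hypothetical name, and it does no validation, so a piece such as "1-2" would still make toDouble throw:

```scala
// Split on runs of characters that are not digits, '.', or '-'
def numbers(s: String): Array[Double] =
  s.split("[^0-9.-]+").filter(_.nonEmpty).map(_.toDouble)

println(numbers("{(2.5),(-4),(100)}").mkString(", "))
```

The leading "{(" produces one empty piece at the start, which the filter removes; trailing empty pieces are already dropped by split.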
Use an extractor object:
object MyList {
  def apply(l: List[String]): String =
    if (l != Nil) "{(" + l.mkString("),(") + ")}"
    else "{}"

  def unapply(str: String): Some[List[String]] =
    Some(
      if (str.indexOf("(") > 0)
        // text between the first "(" and the last ")", split on ")" + "," + "(" with optional spaces
        str.substring(str.indexOf("(") + 1, str.lastIndexOf(")"))
          .split("\\p{Space}*\\)\\p{Space}*,\\p{Space}*\\(\\p{Space}*")
          .toList
      else Nil
    )
}
// test
"{(1),(2)}" match { case MyList(l) => l }
// res23: List[String] = List(1, 2)
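A sketch of a stricter extractor along the same lines (IdList is a hypothetical name). It matches only the exact {(n),(n),...} shape and yields Ints. Anything else falls through to the next case instead of producing a partial list:

```scala
object IdList {
  // Whole-string shape: "{}" or "{(n)}" or "{(n),(n),...}"
  private val Shape = """\{(\(\d+\)(?:,\(\d+\))*)?\}""".r
  private val Id = """\d+""".r

  def unapply(str: String): Option[List[Int]] = str match {
    case Shape(_) => Some(Id.findAllIn(str).map(_.toInt).toList)
    case _        => None
  }
}

def describe(s: String): String = s match {
  case IdList(ids) => s"ids: ${ids.mkString(",")}"
  case _           => "not an id list"
}

println(describe("{(2),(4),(5),(100)}"))
```

Validating the shape first and extracting the numbers second keeps each regex simple; a single regex cannot return a variable number of groups.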
I am struggling to concatenate a message made of two texts into a single text using a regex in Scala:
original message = "part1 "+" part2"
original message = "part1 " + " part2"
original message = "part 1 "+ " part2"
concatenated message = "part1 part2"
This is the code I am using (to at least replace the + sign with nothing):
val line:String = """"text1"+"text2"""" //My original String which is "text1"+"text2"
val temp_line:String = line.replaceAll("\\+","")
println(temp_line)
It works fine and results in "text1""text2". Is there a way to get the output "text1 text2" using a regex?
Please help. Thanks in advance
This is really not an ideal problem for regexes, but okay:
val Part = """"([^"]*)"(.*$)""".r // Quotes, non quotes, quotes, then the rest
val Plus = """\s*\+\s*(.*)""".r   // Plus with optional spaces, then the rest
def parts(s: String, found: List[String] = Nil): String = s match {
  case Part(p, rest) => rest match {
    case "" => (p :: found).map(_.filter(c => !c.isWhitespace)).reverse.mkString(" ")
    case Plus(more) => parts(more, p :: found)
    case x => throw new IllegalArgumentException(s"$p :$x:")
  }
  case x => throw new IllegalArgumentException(s"|$x|")
}
This just takes the input string apart piece by piece; you can add printlns if you want to see how it works. (Note that + is a special character in regex, so you need to escape it to match it.)
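If the input is trusted to be well formed, a simpler sketch is to grab every quoted segment with findAllMatchIn and join them. joinQuoted is a hypothetical name. Unlike parts, it does not check that only '+' separates the segments, and it trims each segment rather than removing all whitespace inside it, so "part 1 " stays "part 1":

```scala
// A double quote, any run of non-quote characters (group 1), a closing quote
val Quoted = "\"([^\"]*)\"".r

def joinQuoted(s: String): String =
  Quoted.findAllMatchIn(s).map(_.group(1).trim).mkString(" ")

println(joinQuoted("\"part1 \"+\" part2\""))
```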
scala> parts(""""part1 "+" part2"""")
res1: String = part1 part2
scala> parts(""""part1 " + " part2"""")
res2: String = part1 part2
scala> parts(""""part 1 "+ " part2"""")
res3: String = part1 part2
I have a string like {param1=foo}{param2=bar}hello world!
I need to extract an array of (paramName, value) tuples from this string, i.e. something like [(param1, foo), (param2, bar)].
Is it possible in Scala to extract these tuples with only one regex? So far I have only managed to do it like this:
val str = "{param1=foo}{param2=bar}hello world!"
val param = """(?<=\{)(.+?)(?=\})""".r // extract everything between { and }
val keyValue = """(.+)=(.+)""".r // for extracting key and value
val parameters = for (keyValue(key,value) <- param.findAllIn(str).toArray)
yield (key,value)
And it doesn't look sweet.
Also I tried to use
val param = """(?<=\{)(.+?)=(.+?)(?=\})""".r
But it returns param=value as one string.
Here's an expression that will find things like {A=B} where A and B do not contain {, }, or =.
scala> val Re = """\{([^{}=]+)=([^{}=]+)\}""".r
scala> val Re(a,b) = "{param1=foo}"
a: String = param1
b: String = foo
And if you want to find all matches in a string:
scala> val s = "{param1=foo}{param2=bar}hello world!"
scala> Re.findAllIn(s).matchData.map(_.subgroups).toList
res9: List[List[String]] = List(List(param1, foo), List(param2, bar))
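To get the tuples the question asks for directly, each match can be mapped to a pair. This sketch reuses the Re expression above with findAllMatchIn; params is a hypothetical helper name:

```scala
val Re = """\{([^{}=]+)=([^{}=]+)\}""".r

// Each Match exposes its capture groups; pair group 1 (key) with group 2 (value)
def params(s: String): List[(String, String)] =
  Re.findAllMatchIn(s).map(m => m.group(1) -> m.group(2)).toList

println(params("{param1=foo}{param2=bar}hello world!"))
```

Calling .toMap on the result gives keyed lookup if parameter names are unique.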
Without regex you can do:
scala> val str = "{param1=foo}{param2=bar}hello world!"
scala> str split '}' filter(x => x.head =='{' && x.contains('=')) map{x => val Array(key, value) = x.tail split '='; key -> value }
res9: Array[(java.lang.String, java.lang.String)] = Array((param1,foo), (param2,bar))
Or in a clearer way:
// We find different blocks
val str1 = str split '}'
// We remove invalid blocks (end of the String in your case)
val str2 = str1 filter(x => x.head == '{' && x.contains('='))
// We transform the String into a tuple, removing the leading '{'
val str3 = str2 map{x =>
val Array(key, value) = x.tail split '='
key -> value
}