I would be very thankful if anyone could help me on this.
So, I have the following logs/data:
ORA-Balance-Element () =
ORA-Balance-Element-Id () = 1000
ORA-Balance-Element () =
ORA-Balance-Element-Id () = 1001
ORA-Balance-Element () =
ORA-Balance-Element-Id () = 1002
ORA-Balance-Element () =
ORA-Balance-Element-Id () = 1003
ORA-Balance-Element () =
ORA-Balance-Element-Id () = 1004
ORA-Balance-Element () =
ORA-Balance-Element-Id () = 1005
ORA-Balance-Element () =
ORA-Balance-Element-Id () = 1006
ORA-Balance-Element () =
ORA-Balance-Element-Id () = 1007
I want to find a pattern that will match all the numbers after the equal sign (8 in total)
and get the result in a list (I will handle this list in Logstash conf).
So far I tried:
\s*ORA-Balance-Element %{DATA:tmp} = %{DATA:ora_balance_element}\n\s*ORA-Balance-Element-Id %{DATA:tmp} = %{DATA:ora_balance_element_id}\n
but this only finds the first match (the first number, 1000).
Also, I tried:
(\s*ORA-Balance-Element %{DATA:tmp} = %{DATA:ora_balance_element}\n\s*ORA-Balance-Element-Id %{DATA:tmp} = %{DATA:ora_balance_element_id}\n)+
but again I was not able to capture all the matches (this last pattern returns only the final match, 1007).
The result I want is a list, e.g.
l = [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]
Please note that the sample above is a single multiline event, not many events: in Logstash it arrives as one message, not multiple ones.
Can anyone help me solve this?
Thanks a lot!
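For what it's worth, the reason both grok attempts above yield a single value is that a grok capture binds each field once per match; collecting every occurrence needs a repeated scan over the whole event (in Logstash this is usually done with a ruby filter and String#scan). A minimal sketch of that scanning logic in Scala — the object and method names here are mine, not Logstash API:

```scala
object ExtractIds {
  // Matches every numeric id that follows "ORA-Balance-Element-Id () = "
  private val idPattern = """ORA-Balance-Element-Id \(\) = (\d+)""".r

  // Scan the whole multiline event and collect every match,
  // not just the first one.
  def extractIds(event: String): List[Int] =
    idPattern.findAllMatchIn(event).map(_.group(1).toInt).toList
}
```

The same idea — one global regex, all matches collected into a list — is what a scan inside a Logstash ruby filter would do.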
It doesn't matter what arguments I supply for the bufferSize and overflowStrategy parameters of Source.queue, the result is always something like the output at the bottom. I was expecting to see the offer invocations and offer results complete more or less immediately, and to be able to see different processing and offer result messages based on bufferSize and overflowStrategy. What am I doing wrong here?
Code:
def main(args: Array[String]): Unit = {
  implicit val system: ActorSystem = ActorSystem("scratch")
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val executionContext: ExecutionContextExecutor = system.dispatcher
  val start = Instant.now()
  def elapsed = time.Duration.between(start, Instant.now()).toMillis
  val intSource = Source.queue[Int](2, OverflowStrategy.dropHead)
  val intSink = Sink foreach { ii: Int =>
    Thread.sleep(1000)
    println(s"processing $ii at $elapsed")
  }
  val intChannel = intSource.to(intSink).run()
  (1 to 4) map { ii =>
    println(s"offer invocation for $ii at $elapsed")
    (ii, intChannel.offer(ii))
  } foreach { intFutureOfferResultPair =>
    val (ii, futureOfferResult) = intFutureOfferResultPair
    futureOfferResult onComplete { offerResult =>
      println(s"offer result for $ii: $offerResult at $elapsed")
    }
  }
  intChannel.complete()
  intChannel.watchCompletion.onComplete { _ => system.terminate() }
}
Output:
offer invocation for 1 at 72
offer invocation for 2 at 77
offer invocation for 3 at 77
offer invocation for 4 at 77
offer result for 1: Success(Enqueued) at 90
processing 1 at 1084
offer result for 2: Success(Enqueued) at 1084
processing 2 at 2084
offer result for 3: Success(Enqueued) at 2084
processing 3 at 3084
offer result for 4: Success(Enqueued) at 3084
processing 4 at 4084
I can get the expected behavior. By default Akka Streams fuses the queue source and the sink into a single actor, so the offer results are completed on the same thread that is sleeping in the sink; the .async below inserts an asynchronous boundary between the two stages. The fix is replacing:
val intChannel = intSource.to(intSink).run()
with:
val (intChannel, futureDone) = intSource.async.toMat(intSink)(Keep.both).run()
and:
intChannel.watchCompletion.onComplete { _ => system.terminate() }
with:
futureDone.onComplete { _ => system.terminate() }
Fixed Code:
def main(args: Array[String]): Unit = {
  implicit val system: ActorSystem = ActorSystem("scratch")
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val executionContext: ExecutionContextExecutor = system.dispatcher
  val start = Instant.now()
  def elapsed = time.Duration.between(start, Instant.now()).toMillis
  val intSource = Source.queue[Int](2, OverflowStrategy.dropHead)
  val intSink = Sink foreach { ii: Int =>
    Thread.sleep(1000)
    println(s"processing $ii at $elapsed")
  }
  val (intChannel, futureDone) = intSource.async.toMat(intSink)(Keep.both).run()
  (1 to 4) map { ii =>
    println(s"offer invocation for $ii at $elapsed")
    (ii, intChannel.offer(ii))
  } foreach { intFutureOfferResultPair =>
    val (ii, futureOfferResult) = intFutureOfferResultPair
    futureOfferResult onComplete { offerResult =>
      println(s"offer result for $ii: $offerResult at $elapsed")
    }
  }
  intChannel.complete()
  futureDone.onComplete { _ => system.terminate() }
}
Output:
offer invocation for 1 at 84
offer invocation for 2 at 89
offer invocation for 3 at 89
offer invocation for 4 at 89
offer result for 3: Success(Enqueued) at 110
offer result for 4: Success(Enqueued) at 110
offer result for 1: Success(Enqueued) at 110
offer result for 2: Success(Enqueued) at 110
processing 3 at 1102
processing 4 at 2102
I am coming from an R background. I was able to implement the pattern search on a DataFrame column in R, but I am now struggling to do it in Spark Scala. Any help would be appreciated.
The problem statement is broken down into details below, just to describe it appropriately.
DF :
Case Freq
135322 265
183201,135322 36
135322,135322 18
135322,121200 11
121200,135322 8
112107,112107 7
183201,135322,135322 4
112107,135322,183201,121200,80000 2
I am looking for a pattern-search UDF which gives me back all the matches of the pattern, along with the corresponding Freq value from the second column.
Example: for the pattern 135322, I would like to find all the matches in the first column, Case, and return the corresponding Freq numbers from the Freq column, like 265, 36, 18, 11, 8, 4, 2.
For the pattern 112107,112107 it should return just 7, because there is only one matching row.
This is how the end result should look
Case Freq results
135322 265 265+36+18+11+8+4+2
183201,135322 36 36+4+2
135322,135322 18 18+4
135322,121200 11 11+2
121200,135322 8 8+2
112107,112107 7 7
183201,135322,135322 4 4
112107,135322,183201,121200,80000 2 2
What I tried so far:
val text= DF.select("case").collect().map(_.getString(0)).mkString("|")
//search function for pattern search
val valsum = udf((txt: String, pattern : String)=> {
txt.split("\\|").count(_.contains(pattern))
} )
//apply the UDF on the first col
val dfValSum = DF.withColumn("results", valsum( lit(text),DF("case")))
This one works
import common.Spark.sparkSession
import util.control.Breaks._

object playground extends App {
  import org.apache.spark.sql.functions._

  val pattern = "135322,121200" // pattern you want to search for

  // UDF: true if every comma-separated token of the pattern appears in
  // the Case column in order, each at a later index than the previous one
  val coder: ((String, String) => Boolean) = (caseCol: String, pattern: String) => {
    var result = true
    val splitPattern = pattern.split(",")
    val splitCaseCol = caseCol.split(",")
    var foundAtIndex = -1
    for (i <- 0 until splitPattern.length if result) {
      breakable {
        result = false // stays false if this token is never found
        for (j <- foundAtIndex + 1 until splitCaseCol.length) {
          if (splitCaseCol(j) == splitPattern(i)) {
            result = true
            foundAtIndex = j
            break
          }
        }
      }
    }
    result
  }

  // registering the udf
  val udfFilter = udf(coder)

  // reading the input file
  val df = sparkSession.read.option("delimiter", "\t").option("header", "true").csv("output.txt")

  // calling the function and aggregating
  df.filter(udfFilter(col("Case"), lit(pattern))).agg(lit(pattern), sum("Freq")).toDF("pattern", "sum").show
}
if input is
135322,121200
Output is
+-------------+----+
| pattern| sum|
+-------------+----+
|135322,121200|13.0|
+-------------+----+
if input is
135322,135322
Output is
+-------------+----+
| pattern| sum|
+-------------+----+
|135322,135322|22.0|
+-------------+----+
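Stripped of Spark, the core of the UDF above is an ordered-containment check: each comma-separated token of the pattern must appear in the Case value at a strictly later index than the previous token. A self-contained sketch of just that predicate (the object and method names are mine):

```scala
object PatternMatch {
  // True if every comma-separated token of `pattern` appears in
  // `caseCol` in order, each at a strictly later index than the last.
  def matchesInOrder(caseCol: String, pattern: String): Boolean = {
    val caseTokens = caseCol.split(",")
    var from = 0
    pattern.split(",").forall { p =>
      val idx = caseTokens.indexOf(p, from) // first occurrence at or after `from`
      if (idx >= 0) { from = idx + 1; true } else false
    }
  }
}
```

Wrapping exactly this predicate in a udf, filtering the DataFrame with it, and summing Freq over the surviving rows gives the per-pattern totals shown in the expected results.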
If the input arguments are not as expected, I want to exit the program. How should I achieve that? Below is my attempt.
let () =
if ((Array.length Sys.argv) - 1) <> 2 then
exit 0 ; ()
else
()
Thanks.
exit n is the right way to exit a program, but your code has a syntax error: if ... then exit 0; () is parsed as (if ... then exit 0); (). Therefore you get a syntax error around else, since it is no longer paired with a then.
You should write:
let () =
  if ((Array.length Sys.argv) - 1) <> 2 then begin
    exit 0 ; ()
  end else
    ()
or simply,
let () = if Array.length Sys.argv - 1 <> 2 then exit 0
I get this as a response to an API hit.
1735 Queries
Taking 1.001303 to 31.856310 seconds to complete
SET timestamp=XXX;
SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
38 Queries
Taking 1.007646 to 5.284330 seconds to complete
SET timestamp=XXX;
show slave status;
6 Queries
Taking 1.021271 to 1.959838 seconds to complete
SET timestamp=XXX;
SHOW SLAVE STATUS;
2 Queries
Taking 4.825584, 18.947725 seconds to complete
use marketing;
SET timestamp=XXX;
SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
I have extracted this out of the response HTML and have it as a string now. I need to retrieve the values as concisely as possible, so that I get a map of the form Map(query -> "T1 to T2 seconds"). Basically, this is the status of all the slow queries running on the MySQL slave server, and I am building an alert system over it. So, from this entire paragraph in the form of a String, I need to separate out the queries and save the corresponding time range with each of them.
1.001303 to 31.856310 is a time range, and the corresponding query for that range is:
SET timestamp=XXX; SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
I was hoping to save this information in a Scala Map of the form (query: String -> timeRange: String).
Another example:
("use marketing; SET timestamp=XXX; SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified xyz ;"->"4.825584 to 18.947725 seconds")
"""###(.)###(.)\n\n(.*)###""".r.findAllIn(reqSlowQueryData).matchData foreach {m => println("group0"+m.group(1)+"next group"+m.group(2)+m.group(3)}
I am using the above statement to extract the repeating cells so I can manipulate them later, but it doesn't seem to be working.
Thanks in advance! I know there are several ways to do this, but all the ones that come to mind are inefficient and tedious. I need to do the same in Scala. Maybe I can extract recursively using the substring method?
If you want to use Scala, try this:
val regex = """(\d+)\.(\d+).*(\d+)\.(\d+) seconds""".r // extract range (dots escaped so they match literally)
val txt = """
|1735 Queries
|
|Taking 1.001303 to 31.856310 seconds to complete
|
|SET timestamp=XXX; SELECT * FROM ABC_EM WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
|
|38 Queries
|
|Taking 1.007646 to 5.284330 seconds to complete
|
|SET timestamp=XXX; show slave status;
|
|6 Queries
|
|Taking 1.021271 to 1.959838 seconds to complete
|
|SET timestamp=XXX; SHOW SLAVE STATUS;
|
|2 Queries
|
|Taking 4.825584, 18.947725 seconds to complete
|
|use marketing; SET timestamp=XXX; SELECT * FROM ABC WHERE last_modified >= 'XXX' AND last_modified < 'XXX';
""".stripMargin
def logToMap(txt: String) = {
  val (_, map) = txt.lines.foldLeft[(Option[String], Map[String, String])]((None, Map.empty)) {
    (acc, el) =>
      val (taking, map) = acc // taking holds the current range, if any
      taking match {
        case Some(range) if el.trim.nonEmpty => // a range is pending
          (None, map + (el -> range)) // pair the query line with it
        case None =>
          regex.findFirstIn(el) match { // try to extract a range
            case Some(range) => (Some(range), map)
            case _ => (None, map)
          }
        case _ => (taking, map) // probably an empty line
      }
  }
  map
}
Modified ajozwik's answer to work for SQL commands over multiple lines :
val regex = """(\d+)\.(\d+).*(\d+)\.(\d+) seconds""".r // extract range (dots escaped)

def logToMap(txt: String) = {
  val (_, map) = txt.lines.foldLeft[(Option[String], Map[String, String])]((None, Map.empty)) {
    (accumulator, element) =>
      val (taking, map) = accumulator
      taking match {
        case Some(range) if element.trim.nonEmpty =>
          if (element.contains("Queries"))
            (None, map) // next block header: stop accumulating
          else
            (Some(range), map + (range -> (map.getOrElse(range, "") + element)))
        case None =>
          regex.findFirstIn(element) match {
            case Some(range) => (Some(range), map)
            case _ => (None, map)
          }
        case _ => (taking, map)
      }
  }
  println(map)
  map
}
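As an alternative to the line-by-line fold, the block structure ("Taking X to Y seconds to complete" followed by the query text, terminated by the next "N Queries" header) is regular enough for a single multiline regex scan. A sketch under that assumption — the object name and regex are mine, not from either answer:

```scala
object SlowQueryParser {
  // One block: "Taking X to Y seconds to complete" (or "Taking X, Y seconds"),
  // then everything up to the next "N Queries" header or end of input.
  private val block =
    """(?s)Taking\s+(\d+\.\d+)(?:\s+to\s+|,\s*)(\d+\.\d+)\s+seconds to complete\s+(.*?)(?=\n\s*\n\s*\d+ Queries|\z)""".r

  // Map each query (collapsed to one line) to its time range.
  def parse(txt: String): Map[String, String] =
    block.findAllMatchIn(txt).map { m =>
      val query = m.group(3).linesIterator.map(_.trim).filter(_.nonEmpty).mkString(" ")
      query -> s"${m.group(1)} to ${m.group(2)} seconds"
    }.toMap
}
```

This keeps the query as the key and the range as the value, matching the Map(query -> "T1 to T2 seconds") shape the question asked for.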
I would like to generate an outfile that looks like this:
MULTILOG for Windows 7.00.2327.2
Created on: 22 February 2012, 15:02:00
PROBLEM RANDOM,
INDIVIDUAL,
DATA = 'C:\Multilog\TACH_MATH03B_FT_MF.IDM.txt',
NITEMS = 65,
NGROUPS = 1,
NEXAMINEES = 63382;
TEST ALL,
L3;
END ;
2
01
11111111111111111111111111111111111111111111111111111111111111111
N
(T30,65A1)
To produce output like the above, I wrote this code but got some errors. What did I do wrong?
data _null_;
file 'C:\Users\ubishky\Documents\TN try\dry run 2010-11\mcfmath3.txt';
put #1 MULTILOG for Windows 7.00.2327.2;
#1 Created on: &sysdate9, &systime;
#1>PROBLEM RANDOM,;
#10 INDIVIDUAL,;
#10 DATA = 'C:\Multilog\TACH_MATH03B_FT_MF.IDM.txt',;
#10 NITEMS = 64,;
#10 NGROUPS = 1, ;
#10 NEXAMINEES = 63382;
#1>TEST ALL,;
#7 L3;;
#1 >END ;;
#1 2;
#1 01;
#one=REPEAT(1,63);
#1 N;
#1(T30,65A1);
run;
Try adding PUT to the beginning of each of your # statements, and put the literal text in quotes (e.g. put #1 'PROBLEM RANDOM,';) — unquoted words in a PUT statement are treated as variable names.