How to parse Apache Catalina.log - regex

I am trying to find a way to parse a Catalina.log and i am really struggling.
This a piece of the code:
May 12, 2017 2:14:38 PM org.apache.coyote.AbstractProtocol init
SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.31.104-443"]
java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR
at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)
at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:649)
at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:434)
at org.apache.catalina.connector.Connector.initInternal(Connector.java:978)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.core.StandardService.initInternal(StandardService.java:559)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:821)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.startup.Catalina.load(Catalina.java:638)
at org.apache.catalina.startup.Catalina.load(Catalina.java:663)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:253)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:427)
I wanna get
Date = May 12, 2017 2:14:38 PM
class = org.apache.coyote.AbstractProtocol init
Error level = SEVERE
Error Msg = Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.321.224-443"]
Error Msg Body = java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR
at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)....
i don even know where to start :)
any ideas are very welcomed

I have prepared for you the following regex:
((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))\s(.+)(\r)?\n(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|INFO|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n(?=\s+at.+java:\d+\))
You can use the following back reference to capture your groups
DATE -> $1
CLASS -> $4
ERROR_LEVEL -> $6
ERROR_MSG -> $8
ERROR_BODY -> $10
The regex will only fetch strings that met the following conditions:
starts by a date in the format specified in your post
after the date, the first line is composed of the class name
the 2nd line is composed of the error level and the error msg
the 3rd line is your error msg body
followed by a java strack trace of n lines starting by \s at and ending by java:\d+)
The regex works in the following way:
((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))
This part will fetch the date in the format of your post:
3 char month followed by space(s) then 1 or 2 digits, ',' then year in 4 digit
then space(s), then time(column char, followed by space(s) then AM or PM
\s(.+)(\r)?\n
this part of the regex will allow you to get the rest of your first line corresponding to your class
(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|INFO|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n
This part will allow you to get the error level (in this exhaustive list) followed by column and the following 2 lines corresponding to your error msg/body
(?=\s+at.+java:\d+\))
This last part is a condition the enforce that your error is followed by a java stack trace.
You might need to adapt a bit some parts of the regex (like the number of lines of the error body, error message) or the stack trace conditions but I think this is a great starting point for your case.
CHEERS!!!

Related

MultiDelimiterSerDe setup in Amazon Hive

I'm trying to use a multidelimiter in a table insert for a hive job in emr on amazon aws. As explained in this link. The delimiter for the file is "|".
https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
However, I ended up having to use...
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
Instead of the documented...
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe'
in order for it to not give me this error.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.MultiDelimitSerDe
OK. So when I don't get that error, by adding the .contrib, I get this error which is caused by Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1548264520414_0027_1_00, diagnostics=[Task failed, taskId=task_1548264520414_0027_1_00_000021, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1548264520414_0027_1_00_000021_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:328)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:420)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:286)
... 15 more
So I've been reading that you have to add the .jar file.
https://community.hortonworks.com/questions/82189/hive-cannot-see-jar.html
And so I've tried all kinds of things to get this to work. It says that it is adding it it to the class path.
hive> add jar /usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar
> ;
Added [/usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar]
hive> add jar /usr/lib/hive/lib/hive-contrib.jar
> ;
Added [/usr/lib/hive/lib/hive-contrib.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-contrib.jar]
hive> exit;
So I'm not sure what to do. It's acting as if the .jar file for hive-contrib isn't in the class path despite me adding it. I've also tried running...
export HADOOP_USER_CLASSPATH_FIRST=true
which is found here...
How to include jars in Hive (Amazon Hadoop env)
And that doesn't fix it either.
How can I use a multidelimiter SerDe property for a hive job on aws?
Thank you.
I could not get MultiDelimitSerDe to work. Instead, I was lucky in that the delimiter had quotations on either side of the pipe. So it looks like "|". This turns the values between the quotes into strings, so the additional pipes in those column values don't act as delimiters.
"Test | Test2 "|" Test3 | Test 4 | Test 5 "|" Test 6 "
You can see an explanation in the link below. The part that talks about it is in the comments, not the article.
https://www.ericlin.me/2015/07/how-to-create-a-hive-multi-character-delimitered-table/
If I didn't have those quotation marks around the delimiter, I'm not sure how I would have been able to work with a multi delimiter. Especially if I had quotations in any of my fields, but after checking, out of the billions of rows, there is not a single quote.

Regex for extract the Exception Message fields (Used rsyslog as a message source)

I'm creating log parser to parse the log message from different source like rsyslog, logback extension, nxlog etc.
I have to extract exception message fields. But I stuck while generating regex for below test string.
Test String:
2014-10-16 01:32:22,780 ERROR main Sample.main - java.lang.NullPointerException: Sample Log4j Exception
at Sample.errorLevel3(Sample.java:35)
at Sample.errorLevel2(Sample.java:31)
at Sample.errorLevel1(Sample.java:27)
at Sample.main(Sample.java:16)
Note: \n and \t are received as literal #012#011 in the string escaped by rsyslog
Expected Match:
2014-10-16 01:32:22,780
java.lang.NullPointerException
Sample Log4j Exception
----------
Sample.errorLevel3
Sample.java
35
----------
Sample.errorLevel2
Sample.java
31
----------
Sample.errorLevel1
Sample.java
27

How to append the lines using SED or AWK until a particular pattern does not matches?

I want to append the line which does not start with a given pattern. But i am unable to do that using sed. Plz help me to solve the problem using either sed or awk. For an Example:
INPUT:
18:55:42[pool-1-thread-2] INFO jfileupload.download.http.a - Download completed
18:55:42[HTTPDOWNLOAD] ERROR jfileupload.download.ui.DownloadTransferUI -
java.io.IOException: Failed to show URI:file:/home/rahul/Desktop/
at sun.awt.X11.XDesktopPeer.launch(XDesktopPeer.java:114)
at sun.awt.X11.XDesktopPeer.open(XDesktopPeer.java:77)
at java.awt.Desktop.open(Desktop.java:272)
at jfileupload.download.ui.DownloadTransferUI.a(Unknown Source)
at jfileupload.download.http.HTTPDownloadTransfer.a(Unknown Source)
at jfileupload.download.a.a.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
18:55:43[MultiThreadedHttpConnectionManager cleanup] DEBUG org.apache.commons.httpclient.MultiThreadedHttpConnectionManager - ReferenceQueueThread interrupted
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)
Required Output:
18:55:42[pool-1-thread-2] INFO jfileupload.download.http.a - Download completed
18:55:42[HTTPDOWNLOAD] ERROR jfileupload.download.ui.DownloadTransferUI -java.io.IOException: Failed to show URI:file:/home/rahul/Desktop/, at sun.awt.X11.XDesktopPeer.launch(XDesktopPeer.java:114), at sun.awt.X11.XDesktopPeer.open(XDesktopPeer.java:77), at java.awt.Desktop.open,(Desktop.java:272), at jfileupload.download.ui.DownloadTransferUI.a(Unknown Source), at jfileupload.download.http.HTTPDownloadTransfer.a(Unknown Source), at jfileupload.download.a.a.run(Unknown Source), at java.lang.Thread.run(Thread.java:745)
18:55:43[MultiThreadedHttpConnectionManager cleanup] DEBUG org.apache.commons.httpclient.MultiThreadedHttpConnectionManager - ReferenceQueueThread interrupted,java.lang.InterruptedException, at java.lang.Object.wait(Native Method), at java.lang.ref.ReferenceQueue.remove,(ReferenceQueue.java:135), at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151), at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)
In the above input i want to append the next line into current line with , until the new line start with such pattern 18:55:42.
Thanks.
not tried it but this should work
awk '/[0-9]+:[0-9]+:[0-9]+/{x=$0}{a[x]=a[x]?a[x]", "$0:$0}END{for (i in a)print a[i]}' file
.
/[0-9]+:[0-9]+:[0-9]+/ If this pattern is matched
{x=$0} Set x to the value of the line
{a[x]=a[x]?a[x]", "$0:$0} Create associative array with x($0) using ternary
operator to check there is a value in a[x] to
begin with. Adds the current lines value to the
a[x]
END{for (i in a)print a[i]} When all records are processed loop through array
and output values

pig REPLACE gives error

Let's assume that my file is named 'data' and looks like this:
2343234 {23.8375,-2.339921102} {(343.34333,-2.0000022)} 5-23-2013-11-am
I need to convert the 2nd field to a pair of coordinate numbers. So I wrote the follwoing code and called it basic.pig:
A = LOAD 'data' AS (f1:int, f2:chararray, f3:chararray. f4:chararray);
B = foreach A generate STRSPLIT(f2,',').$0 as f5, STRSPLIT(f2,',').$1 as f6;
C = foreach B generate REPLACE(f5,'{',' ') as f7, REPLACE(f6,'}',' ') as f8;
and then used (float) to convert the string to a float. But, the command 'REPLACE' fails to work and I get the following error:
-bash-3.2$ pig -x local basic.pig
2013-06-24 16:38:45,030 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled
Mar 22 2013, 02:13:53 2013-06-24 16:38:45,031 [main] INFO org.apache.pig.Main - Logging error messages to: /home/--/p/--test/pig_1372117125028.log
2013-06-24 16:38:45,321 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/isl/pmahboubi/.pigbootup not found
2013-06-24 16:38:45,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2013-06-24 16:38:46,069 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
Details at logfile: /home/--/p/--test/pig_1372117125028.log
And this is the details of the pig_137..log
Pig Stack Trace
---------------
ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 7, column 0. Encountered: <EOF> after : ""
at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3266)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1134)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:104)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
================================================================================
I've got data like this:
2724 1919 2012-11-18T23:57:56.000Z {(33.80981975),(-118.105289)}
2703 6401 2012-11-18T23:57:56.000Z {(55.83525609),(-4.07733138)}
1200 4015 2012-11-18T23:57:56.000Z {(41.49609152),(13.8411998)}
7104 9227 2012-11-18T23:57:56.000Z {(-24.95351118),(-53.46538723)}
and I can do this:
A = LOAD 'my_tsv_data' USING PigStorage('\t') AS (id1:int, id2:int, date:chararray, loc:chararray);
B = FOREACH A GENERATE REPLACE(loc,'\\{|\\}|\\(|\\)','');
C = LIMIT B 10;
DUMP C;
This error
ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
came to me because I had used different types of quotation marks. I started with ' and ended with ยด or `, and it took quite a while to find what went wrong. So it had nothing to do with line 7 (my script was not so long, and I shortened data to four lines which naturally did not help), nothing to do with column 0, nothing to do with EOF of data, and hardly anything to do with " marks which I didn't use. So quite misleading error message.
I found the cause by using grunt - pig command shell.

problem in decoupling urls.py , while following a tutorial of django

http://docs.djangoproject.com/en/dev/intro/tutorial03/
I was at the step Decoupling the URLconfs where the tutorial illustrates how to decouple urls.py. On doing exactly what it says, i get the following error-
error at /polls/1/
nothing to repeat
Request Method: GET
Request URL: http://localhost:8000/polls/1/
Exception Type: error
Exception Value:
nothing to repeat
Exception Location: C:\jython2.5.1\Lib\re.py in _compile, line 241
Python Executable: C:\jython2.5.1\jython.bat
Python Version: 2.5.1
Python Path: ['E:\\Programming\\Project\\django_app\\mysite', 'C:\\jython2.5.1\\Lib\\site-packages\\setuptools-0.6c11-py2.5.egg', 'C:\\jython2.5.1\\Lib', '__classpath__', '__pyclasspath__/', 'C:\\jython2.5.1\\Lib\\site-packages']
Server time: Mon, 12 Apr 2010 12:02:56 +0530
Check your regex syntax. In particular, see if you are missing an opening parenthesis before a ? towards the beginning of the pattern, as in
r'^?P<poll_id>\d+)/$'
# ^ note the missing parenthesis
The above should read
r'^(?P<poll_id>\d+)/$'
instead.
(An explanation: "nothing to repeat" is a regex error which arises due to an ? regex operator occurring where it is not preceded by something which it can sensibly attach to. The ? in (?P<...>...) is treated specially, but if you forget the opening parenthesis, the regex engine will treat ? in the regular way, which makes no sense right after ^.)