Pig REPLACE gives error

Let's assume that my file is named 'data' and looks like this:
2343234 {23.8375,-2.339921102} {(343.34333,-2.0000022)} 5-23-2013-11-am
I need to convert the 2nd field to a pair of coordinate numbers, so I wrote the following code and called it basic.pig:
A = LOAD 'data' AS (f1:int, f2:chararray, f3:chararray, f4:chararray);
B = foreach A generate STRSPLIT(f2,',').$0 as f5, STRSPLIT(f2,',').$1 as f6;
C = foreach B generate REPLACE(f5,'{',' ') as f7, REPLACE(f6,'}',' ') as f8;
and then used (float) to convert the strings to floats. However, REPLACE fails and I get the following error:
-bash-3.2$ pig -x local basic.pig
2013-06-24 16:38:45,030 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-06-24 16:38:45,031 [main] INFO org.apache.pig.Main - Logging error messages to: /home/--/p/--test/pig_1372117125028.log
2013-06-24 16:38:45,321 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/isl/pmahboubi/.pigbootup not found
2013-06-24 16:38:45,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2013-06-24 16:38:46,069 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
Details at logfile: /home/--/p/--test/pig_1372117125028.log
And these are the details from pig_137..log:
Pig Stack Trace
---------------
ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 7, column 0. Encountered: <EOF> after : ""
at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3266)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1134)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:104)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
================================================================================

I've got data like this:
2724 1919 2012-11-18T23:57:56.000Z {(33.80981975),(-118.105289)}
2703 6401 2012-11-18T23:57:56.000Z {(55.83525609),(-4.07733138)}
1200 4015 2012-11-18T23:57:56.000Z {(41.49609152),(13.8411998)}
7104 9227 2012-11-18T23:57:56.000Z {(-24.95351118),(-53.46538723)}
and I can do this:
A = LOAD 'my_tsv_data' USING PigStorage('\t') AS (id1:int, id2:int, date:chararray, loc:chararray);
B = FOREACH A GENERATE REPLACE(loc,'\\{|\\}|\\(|\\)','');
C = LIMIT B 10;
DUMP C;
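Pig's REPLACE takes a Java regular expression as its second argument, which is why this answer escapes the braces and parentheses (the backslashes are doubled inside the Pig string literal). To check the pattern outside Pig, here is a minimal Python sketch of the same cleanup followed by the split-and-cast step from the question. parse_loc is a hypothetical helper, and it assumes the field holds exactly two comma-separated numbers.

```python
import re

# Same alternation as the Pig answer: remove every {, }, ( and ).
# In a Python raw string one backslash per escape reaches the regex engine.
BRACKETS = re.compile(r"\{|\}|\(|\)")


def parse_loc(loc: str) -> tuple[float, float]:
    """Turn '{(33.80981975),(-118.105289)}' into a (lat, lon) pair of floats."""
    cleaned = BRACKETS.sub("", loc)  # e.g. '33.80981975,-118.105289'
    lat, lon = cleaned.split(",")    # assumes exactly two values
    return float(lat), float(lon)


print(parse_loc("{(33.80981975),(-118.105289)}"))
print(parse_loc("{23.8375,-2.339921102}"))  # the format from the question
```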

This error
ERROR 1000: Error during parsing. Lexical error at line 7, column 0. Encountered: <EOF> after : ""
came to me because I had used mismatched quotation marks: I opened a string with ' and closed it with ´ or `, and it took quite a while to find what went wrong. The message had nothing to do with line 7 (my script was not that long, and shortening the data to four lines naturally did not help), nothing to do with column 0, nothing to do with the end of my data, and hardly anything to do with " marks, which I didn't use. So the error message is quite misleading.
I found the cause by using grunt, the Pig command shell.
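A quick way to hunt for this kind of mistake is to scan the script for characters that look like quotes but are not Pig's ' delimiter, and for ' literals that never close. An unclosed literal would presumably explain the odd position in the message, since the lexer then reads on to the end of the script. The Python sketch below is a hypothetical helper (find_quote_problems is not part of Pig); it assumes string literals do not span lines, treats a backslash inside a literal as escaping the next character, treats -- as a line comment, and ignores /* */ block comments.

```python
# Characters that look like quotes but are not Pig's string delimiter (').
LOOKALIKE_QUOTES = {
    "\u00b4": "acute accent (´)",
    "`": "backtick (`)",
    "\u2018": "left single quotation mark (‘)",
    "\u2019": "right single quotation mark (’)",
}


def find_quote_problems(script: str) -> list[tuple[int, int, str]]:
    """Return (line, column, description) tuples, 1-based, for suspicious quoting."""
    problems = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        in_string = False
        start_col = 0
        i = 0
        while i < len(line):
            ch = line[i]
            if not in_string and line.startswith("--", i):
                break  # the rest of the line is a Pig comment
            if ch in LOOKALIKE_QUOTES:
                problems.append((lineno, i + 1, LOOKALIKE_QUOTES[ch]))
            elif ch == "'":
                if not in_string:
                    in_string, start_col = True, i + 1  # opening quote
                else:
                    in_string = False  # closing quote
            elif ch == "\\" and in_string:
                i += 1  # skip the escaped character
            i += 1
        if in_string:
            problems.append((lineno, start_col, "unterminated ' string"))
    return problems


script = "A = LOAD 'data' AS (f1:int);\nB = FOREACH A GENERATE REPLACE(f2,'{´,' ');\n"
for problem in find_quote_problems(script):
    print(problem)
```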

Related

How to parse Apache Catalina.log

I am trying to find a way to parse Catalina.log and I am really struggling.
This is a piece of the log:
May 12, 2017 2:14:38 PM org.apache.coyote.AbstractProtocol init
SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.31.104-443"]
java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR
at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)
at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:649)
at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:434)
at org.apache.catalina.connector.Connector.initInternal(Connector.java:978)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.core.StandardService.initInternal(StandardService.java:559)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:821)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.startup.Catalina.load(Catalina.java:638)
at org.apache.catalina.startup.Catalina.load(Catalina.java:663)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:253)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:427)
I want to get:
Date = May 12, 2017 2:14:38 PM
class = org.apache.coyote.AbstractProtocol init
Error level = SEVERE
Error Msg = Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.31.104-443"]
Error Msg Body = java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR
at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)....
I don't even know where to start :)
Any ideas are very welcome.
I have prepared the following regex for you:
((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))\s(.+)(\r)?\n(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|INFO|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n(?=\s+at.+java:\d+\))
You can use the following back references to capture your groups:
DATE -> $1
CLASS -> $4
ERROR_LEVEL -> $6
ERROR_MSG -> $8
ERROR_BODY -> $10
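To check the group numbers above, here is a minimal Python sketch that runs the same regex with the re module (which accepts this syntax, including the (?=...) lookahead) on the log excerpt from the question. One assumption: the stack-trace lines start with a tab, as Throwable.printStackTrace writes them; the \s+at lookahead relies on that leading whitespace.

```python
import re

# The regex from the answer, split across raw strings for readability.
CATALINA_ENTRY = re.compile(
    r"((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+"
    r"\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))\s(.+)(\r)?\n"
    r"(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|INFO|DEBUG):\s(.+)(\r)?\n"
    r"(.+)(\r)?\n(?=\s+at.+java:\d+\))"
)

# Excerpt of the log; real stack-trace lines start with a tab.
LOG = (
    "May 12, 2017 2:14:38 PM org.apache.coyote.AbstractProtocol init\n"
    'SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.31.104-443"]\n'
    "java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR\n"
    "\tat org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)\n"
    "\tat org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:649)\n"
)

match = CATALINA_ENTRY.search(LOG)
fields = {
    "date": match.group(1),         # $1
    "class": match.group(4),        # $4
    "error_level": match.group(6),  # $6
    "error_msg": match.group(8),    # $8
    "error_body": match.group(10),  # $10
}
for name, value in fields.items():
    print(f"{name} = {value}")
```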
The regex will only fetch strings that meet the following conditions:
they start with a date in the format shown in your post
after the date, the rest of the first line is the class name
the 2nd line holds the error level and the error message
the 3rd line is your error body
they are followed by a Java stack trace: the next line must start with whitespace and at, and contain java:<line number>)
The regex works in the following way:
((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))
This part fetches the date in the format of your post:
a 3-character month, then space(s), then 1 or 2 digits and a ',', then a 4-digit year,
then space(s), then the time (1 or 2 digits per field, separated by colons), then a space and AM or PM
\s(.+)(\r)?\n
This part captures the rest of your first line, which corresponds to your class.
(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|INFO|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n
This part captures the error level (one of the values in this list), a colon, the error message on the rest of that line, and the error body on the next line.
(?=\s+at.+java:\d+\))
This last part is a lookahead condition that enforces that your error is followed by a Java stack trace; it does not consume the stack trace itself.
You might need to adapt some parts of the regex (such as the number of lines in the error message and body) or the stack-trace condition, but I think this is a good starting point for your case.
CHEERS!!!

awk print lines before while match INFO until match ERROR

I want to print the lines before my /ERROR/ match. The lines to print are all the INFO lines back to the previous ERROR.
So if I had a file:
ERROR this is an error
INFO error found on line 2
INFO error is due to something
ERROR this is another error
I want the /ERROR/ match on "ERROR this is another error" to print:
INFO error found on line 2
INFO error is due to something
ERROR this is another error
Does anyone know how?
Part of my current script:
/CRITICAL/ {
print "\x1b[93;1m"
}
/ERROR/ {
print "\x1b[37m"
}
/ERROR|EMERGENCY|CRITICAL/ {
if (NR == n+1) print "";
n = NR;
print x;print
print "\x1b[0m"
};{x=$0}'
Try this one-liner:
awk 'x;/ERROR/{x=1}' file
Out:
INFO error found on line 2
INFO error is due to something
ERROR this is another error
Long version:
x { print }          # print the line if an earlier line matched ERROR
/ERROR/ { x = 1 }    # mark that an ERROR line has been seen
When a line matches ERROR, x is set to 1. Because the x test runs before the /ERROR/ rule on each line, the first ERROR line itself is not printed, but every line after it is.
Or maybe this; I'm not entirely clear on what output you need:
awk '/ERROR/{x=1; next} x' file
Out:
INFO error found on line 2
INFO error is due to something
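If the goal is literally the output shown in the question (each ERROR together with the INFO lines since the previous ERROR), the logic is to buffer INFO lines and flush the buffer when the next ERROR arrives. Here is a minimal Python sketch of that. errors_with_context is a hypothetical helper; it assumes each line starts with its level, and that an ERROR with no INFO lines before it (such as the first line of the sample) is skipped, which matches the requested output.

```python
def errors_with_context(lines):
    """Yield each ERROR line together with the INFO lines since the previous ERROR."""
    buffered = []
    for line in lines:
        if line.startswith("ERROR"):
            if buffered:  # assumption: skip an ERROR with no INFO lines before it
                yield buffered + [line]
            buffered = []  # start collecting for the next ERROR
        elif line.startswith("INFO"):
            buffered.append(line)


sample = [
    "ERROR this is an error",
    "INFO error found on line 2",
    "INFO error is due to something",
    "ERROR this is another error",
]
for block in errors_with_context(sample):
    print("\n".join(block))
```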

Regex to extract the exception message fields (using rsyslog as the message source)

I'm creating a log parser to parse log messages from different sources such as rsyslog, logback extensions, nxlog, etc.
I have to extract the exception message fields, but I'm stuck writing a regex for the test string below.
Test String:
2014-10-16 01:32:22,780 ERROR main Sample.main - java.lang.NullPointerException: Sample Log4j Exception
at Sample.errorLevel3(Sample.java:35)
at Sample.errorLevel2(Sample.java:31)
at Sample.errorLevel1(Sample.java:27)
at Sample.main(Sample.java:16)
Note: rsyslog escapes control characters, so \n and \t arrive as the literal sequences #012 and #011 in the string.
Expected Match:
2014-10-16 01:32:22,780
java.lang.NullPointerException
Sample Log4j Exception
----------
Sample.errorLevel3
Sample.java
35
----------
Sample.errorLevel2
Sample.java
31
----------
Sample.errorLevel1
Sample.java
27
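One possible approach, sketched in Python, is to use one regex for the header and a second one, applied repeatedly, for the #012#011at frames. It assumes every message follows the layout of the sample (date, time with milliseconds, level, thread, logger, " - ", then Exception: message); HEADER, FRAME and parse_exception are hypothetical names. Frames such as (Native Method) or (Unknown Source) have no line number and are skipped by this frame pattern, and an exception without a ": message" part will not match the header regex.

```python
import re

# Assumed layout: "<date> <time,ms> <LEVEL> <thread> <logger> - <Exception>: <message>"
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \S+ \S+ \S+ - "
    r"([\w.$]+): (.*?)(?=#012|$)"
)
# One stack frame: rsyslog turns "\n\tat " into "#012#011at ".
FRAME = re.compile(r"#011at ([\w.$<>]+)\(([^:()]+):(\d+)\)")


def parse_exception(message: str):
    """Return (timestamp, exception class, message, [(method, file, line), ...]) or None."""
    header = HEADER.match(message)
    if header is None:
        return None
    timestamp, exc_class, exc_message = header.groups()
    frames = [(method, source, int(line)) for method, source, line in FRAME.findall(message)]
    return timestamp, exc_class, exc_message, frames


sample = (
    "2014-10-16 01:32:22,780 ERROR main Sample.main - java.lang.NullPointerException: Sample Log4j Exception"
    "#012#011at Sample.errorLevel3(Sample.java:35)"
    "#012#011at Sample.errorLevel2(Sample.java:31)"
    "#012#011at Sample.errorLevel1(Sample.java:27)"
    "#012#011at Sample.main(Sample.java:16)"
)
print(parse_exception(sample))
```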

How to append lines using sed or awk until a line matches a particular pattern?

I want to append each line that does not start with a given pattern to the previous line, but I am unable to do that using sed. Please help me solve the problem using either sed or awk. For example:
INPUT:
18:55:42[pool-1-thread-2] INFO jfileupload.download.http.a - Download completed
18:55:42[HTTPDOWNLOAD] ERROR jfileupload.download.ui.DownloadTransferUI -
java.io.IOException: Failed to show URI:file:/home/rahul/Desktop/
at sun.awt.X11.XDesktopPeer.launch(XDesktopPeer.java:114)
at sun.awt.X11.XDesktopPeer.open(XDesktopPeer.java:77)
at java.awt.Desktop.open(Desktop.java:272)
at jfileupload.download.ui.DownloadTransferUI.a(Unknown Source)
at jfileupload.download.http.HTTPDownloadTransfer.a(Unknown Source)
at jfileupload.download.a.a.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
18:55:43[MultiThreadedHttpConnectionManager cleanup] DEBUG org.apache.commons.httpclient.MultiThreadedHttpConnectionManager - ReferenceQueueThread interrupted
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)
Required Output:
18:55:42[pool-1-thread-2] INFO jfileupload.download.http.a - Download completed
18:55:42[HTTPDOWNLOAD] ERROR jfileupload.download.ui.DownloadTransferUI -java.io.IOException: Failed to show URI:file:/home/rahul/Desktop/, at sun.awt.X11.XDesktopPeer.launch(XDesktopPeer.java:114), at sun.awt.X11.XDesktopPeer.open(XDesktopPeer.java:77), at java.awt.Desktop.open(Desktop.java:272), at jfileupload.download.ui.DownloadTransferUI.a(Unknown Source), at jfileupload.download.http.HTTPDownloadTransfer.a(Unknown Source), at jfileupload.download.a.a.run(Unknown Source), at java.lang.Thread.run(Thread.java:745)
18:55:43[MultiThreadedHttpConnectionManager cleanup] DEBUG org.apache.commons.httpclient.MultiThreadedHttpConnectionManager - ReferenceQueueThread interrupted,java.lang.InterruptedException, at java.lang.Object.wait(Native Method), at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135), at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151), at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1122)
In the input above, I want to append each following line to the current line, separated by a comma, until a new line starts with a timestamp such as 18:55:42.
Thanks.
I haven't tried it, but this should work:
awk '/^[0-9]+:[0-9]+:[0-9]+/{x=$0; order[++n]=x}{a[x]=a[x]?a[x]", "$0:$0}END{for (i=1; i<=n; i++) print a[order[i]]}' file
/^[0-9]+:[0-9]+:[0-9]+/ If the line starts with a timestamp
{x=$0; order[++n]=x} Set x to the value of the line and remember the
order in which the timestamp lines were seen
{a[x]=a[x]?a[x]", "$0:$0} For every line, append it to the associative array
entry a[x], using the ternary operator to add ", "
only when a[x] already has a value
END{for (i=1; i<=n; i++) print a[order[i]]} When all records are processed, output the
joined lines in input order (a plain for (i in a)
loop visits the keys in an unspecified order)
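The same joining idea in Python, as a minimal sketch: join_continuations is a hypothetical helper, and it assumes a record starts with a timestamp such as 18:55:42 at the beginning of the line. Because it appends to a list instead of keying an associative array by line content, records stay in input order and two identical header lines are not merged.

```python
import re

# A new record starts with a timestamp such as 18:55:42 (assumed from the sample input).
RECORD_START = re.compile(r"^\d+:\d+:\d+")


def join_continuations(lines, separator=", "):
    """Append every line that does not start a new record to the previous record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)  # a new record (or text before the first timestamp)
        else:
            records[-1] += separator + line.strip()  # continuation line, e.g. "\tat ..."
    return records


sample = [
    "18:55:42[pool-1-thread-2] INFO jfileupload.download.http.a - Download completed",
    "18:55:43[MultiThreadedHttpConnectionManager cleanup] DEBUG x - ReferenceQueueThread interrupted",
    "java.lang.InterruptedException",
    "\tat java.lang.Object.wait(Native Method)",
]
print("\n".join(join_continuations(sample)))
```

To process a file, pass the open file object directly, for example join_continuations(open("app.log")), where app.log is a placeholder name.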

doxygen error state 21 with fortran code

I was searching the web for help and didn't find any. That's why I thought it might be a good idea to document my problem here.
I had the following problem while documenting really old (15 to 20 years) FORTRAN code with Doxygen. Each file has the same name as the subroutine it contains, and some of these files gave me this error:
********************************************************************
Error in file FILENAME line: XX, state: 21
********************************************************************
I couldn't figure out what error state 21 means, but after some digging into the code I found the problem. I have a WRITE statement like:
      WRITE(*,'('' THIS IS SOME TEXT ''
     + '' THIS IS SOME MORE TEXT : '',I6,
     + /'' AND EVEN MORE TEXT ! '')')
     + VARIABLE
The problem is the exclamation mark (!) in that line. Doxygen seems to interpret the rest of the line after the exclamation mark as Doxygen syntax rather than FORTRAN code. I changed the statement to:
      WRITE(*,'('' THIS IS SOME TEXT ''
     + '' THIS IS SOME MORE TEXT : '',I6,
     + /'' AND EVEN MORE TEXT ! ''
     + )')VARIABLE
and now everything works fine!
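If a large legacy code base has many such WRITE statements, a small script can list candidate lines to inspect. The Python sketch below (find_bang_in_strings is a hypothetical helper, not a Doxygen feature) assumes fixed-form source: full-line comments start with C, c, * or ! in column 1, a non-blank, non-zero character in column 6 marks a continuation, statements occupy columns 7 to 72, and strings use ' with '' as an escaped quote. It only reports ! characters inside character constants; it does not emulate Doxygen's own parser, so treat the output as a list of places to look at.

```python
def find_bang_in_strings(source: str) -> list[int]:
    """Return 1-based line numbers where '!' appears inside a character constant."""
    hits = []
    in_string = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip() or line[0] in "Cc*!":
            continue  # blank line or full-line comment
        is_continuation = len(line) > 5 and line[5] not in " 0"
        if not is_continuation:
            in_string = False  # a new statement never starts inside a string
        flagged = False
        for ch in line[6:72]:
            if ch == "'":
                in_string = not in_string  # '' toggles twice, so it stays inside the string
            elif ch == "!":
                if not in_string:
                    break  # outside a string, ! starts an inline comment
                if not flagged:
                    hits.append(lineno)
                    flagged = True
    return hits


src = "\n".join([
    "      WRITE(*,'('' THIS IS SOME TEXT ''",
    "     + '' THIS IS SOME MORE TEXT : '',I6,",
    "     + /'' AND EVEN MORE TEXT ! '')')",
    "     + VARIABLE",
    "      X = 1 ! an ordinary inline comment",
])
print(find_bang_in_strings(src))
```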