Logstash filter: compare time using regex

I use this code in a Logstash filter to compare times, but it doesn't work.
if [timecheck] =~ /.*((\[0\]\[0-6\]):\[0-5\]\[0-9\]:\[0-5\]\[0-9\])|((\[1\]\[2-9\]|2\[0-3\]):\[0-5\]\[0-9\]:\[0-5\]\[0-9\]).*/ {
  mutate {
    add_tag => "OVERTIME"
  }
}
else if [timecheck] =~ /.+/ {
  mutate {
    add_tag => "WORKING-HOURS"
  }
}
else {
  mutate { add_tag => "NO-TIMECHECK-MATCH" }
}
Logstash runs, but the regex doesn't match: events always get the WORKING-HOURS tag because the field is not empty.
(I tried the regex on regexr.com and it works well there.)

Don't escape the square brackets: in a regex, \[ matches a literal [ character, while unescaped brackets define a character class.
if [timecheck] =~ /(([0][0-6]):[0-5][0-9]:[0-5][0-9])|(([1][8-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])/ {
  mutate {
    add_tag => "OVERTIME"
    add_field => { "time-work" => "OVERTIME" }
  }
}
else if [timecheck] =~ /.+/ {
  mutate {
    add_tag => "WORKING-HOURS"
    add_field => { "time-work" => "WORKING-HOURS" }
  }
}
else {
  mutate { add_tag => "NO-TIMECHECK-MATCH" }
}
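To see the difference, here is a minimal test pipeline (a sketch, not from the original post; paste a value such as 05:30:00 into stdin and check which tags appear):
input { stdin { } }
filter {
  # Escaped brackets look for the literal text "[0][0-6]...", which never
  # occurs in a plain time value like "05:30:00", so this tag is never added:
  if [message] =~ /\[0\]\[0-6\]:\[0-5\]\[0-9\]:\[0-5\]\[0-9\]/ {
    mutate { add_tag => "ESCAPED-NEVER-MATCHES" }
  }
  # Unescaped brackets are character classes, so "05:30:00" matches here:
  if [message] =~ /[0][0-6]:[0-5][0-9]:[0-5][0-9]/ {
    mutate { add_tag => "CLASS-MATCHES" }
  }
}
output { stdout { codec => rubydebug } }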

Related

Perl merging results of maps

How do I concatenate the output of two maps to form a single flat array?
I was attempting to use this:
my $test = { 'foo' => [
    map {
        { 'i' => "$_" }
    } 0..1,
    map {
        { 'j' => "$_" }
    } 0..1
] };
To achieve the result of this:
my $test = { 'foo' => [
    { 'i' => '0' },
    { 'i' => '1' },
    { 'j' => '0' },
    { 'j' => '1' },
] }
However, this is what I got in $test instead:
{
    'foo' => [
        { 'i' => '0' },
        { 'i' => '1' },
        { 'i' => 'HASH(0x7f90ad19cd30)' },
        { 'i' => 'HASH(0x7f90ae200908)' }
    ]
};
Looks like the result of the second map gets iterated over by the first map.
The list returned by the second map becomes part of the input list for the first one, following 0..1. Parentheses fix that:
use warnings;
use strict;
use Data::Dump;

my $test = {
    'foo' => [
        ( map { { i => $_ } } 0..1 ),
        ( map { { j => $_ } } 0..1 )
    ],
};

dd($test);
since they delimit the expression: now the first map takes only 0..1 as its input list, computes and returns a list, which is then merged with the second map's returned list. (Strictly speaking, you need parentheses only around the first map.)
This prints
{ foo => [{ i => 0 }, { i => 1 }, { j => 0 }, { j => 1 }] }
I've removed unneeded quotes; please restore them if/as needed in your application.
Without parentheses, the map after the comma is taken to be part of the expression generating the input list for the first map, producing the next element(s) of the input list after 0..1, effectively
map { i => $_ } (0..1, LIST);
Consider
my @arr = (
    map { { 'i', $_ } }
        0..1,
        qw(a b),
        map { ( { 'j', $_ } ) } 0..1
);

dd(\@arr);
It prints
[
    { i => 0 },
    { i => 1 },
    { i => "a" },
    { i => "b" },
    { i => { j => 0 } },
    { i => { j => 1 } },
]
This is also seen in your output, where all keys are i (no j keys).

logstash-filter not honoring the regex

I am reading files as input, then passing them to the filter; based on [type], the if/else for output (stdout) follows.
Here is the conf part:
filter {
  if [path] =~ "error" {
    mutate {
      replace => { "type" => "ERROR_LOGS" }
    }
    grok {
      match => { "error_examiner" => "%{GREEDYDATA:err}" }
    }
    if [err] =~ "9999" {
      if [err] =~ "invalid offset" {
        mutate {
          replace => { "type" => "DISCARDED_ERROR_LOGS" }
        }
        grok {
          match => { "message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}/%{DATA:askme_tag}\?%{DATA:paramstr}\] \[%{DATA:reason}\]" }
        }
        date {
          match => [ "date", "YYYY-MM-DD aaa HH:mm:ss" ]
          locale => en
        }
        geoip {
          source => "ip"
          target => "geo_ip"
        }
        kv {
          source => "paramstr"
          trimkey => "&\?\[\],"
          value_split => "="
          target => "params"
        }
      }
      else {
        mutate {
          replace => { "type" => "ACCEPTED_ERROR_LOGS" }
        }
        grok {
          match => {
            "message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{WORD:uptime}\/%{NUMBER:downtime}\] \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}\/%{DATA:askme_tag}\?%{DATA:paramstr}\]"
          }
        }
        date {
          match => [ "date", "YYYY-MM-DD aaa HH:mm:ss" ]
          locale => en
        }
        geoip {
          source => "ip"
          target => "geo_ip"
        }
        kv {
          source => "paramstr"
          trimkey => "&\?\[\],"
          value_split => "="
          target => "params"
        }
      }
    }
    else if [err] =~ "Exception" {
      mutate {
        replace => { "type" => "EXCEPTIONS_IN_ERROR_LOGS" }
      }
      grok {
        match => { "message" => "%{GREEDYDATA}" }
      }
    }
  }
  else if [path] =~ "info" {
    mutate {
      replace => { "type" => "INFO_LOGS" }
    }
    grok {
      match => {
        "info_examiner" => "%{GREEDYDATA:info}"
      }
    }
    if [info] =~ "9999" {
      mutate {
        replace => { "type" => "ACCEPTED_INFO_LOGS" }
      }
      grok {
        match => {
          "message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{WORD:uptime}\/%{NUMBER:downtime}\]( \[%{WORD:qtype}\])?( \[%{NUMBER:outsearch}/%{NUMBER:insearch}\])? \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}/%{DATA:askme_tag}\?%{DATA:paramstr}\]"
        }
      }
      date {
        match => [ "date", "YYYY-MM-DD aaa HH:mm:ss" ]
        locale => en
      }
      geoip {
        source => "ip"
        target => "geo_ip"
      }
      kv {
        source => "paramstr"
        trimkey => "&\?\[\],"
        value_split => "="
        target => "params"
      }
    }
    else {
      mutate { replace => { "type" => "DISCARDED_INFO_LOGS" } }
      grok {
        match => { "message" => "%{GREEDYDATA}" }
      }
    }
  }
}
I have tested the grok regexps on http://grokdebug.herokuapp.com/ and they work.
However, what's not working is this part:
grok {
  match => { "error_examiner" => "%{GREEDYDATA:err}" }
}
if [err] =~ "9999" {
I was wondering what's wrong in there?
Actually, I have fixed it. Here is what I learnt during my initial experiments with Logstash and would like to share, since the documentation and other resources aren't very telling:
"error_examiner" or "info_examiner" won't work; parse the instance/event row in "message" instead.
geoip doesn't work for internal IPs.
kv: you must specify field_split and value_split if the pairs aren't like a=1 b=2; say for a:1&b:2, the field_split is & and the value_split is : (see the sketch below).
stdout prints badly by default if the json codec is chosen; please choose rubydebug.
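For example, for a paramstr like a:1&b:2, the kv filter would look like this (a sketch; the source and target field names are assumptions):
filter {
  kv {
    source      => "paramstr"  # field holding "a:1&b:2"
    field_split => "&"         # separator between pairs
    value_split => ":"         # separator between key and value
    target      => "params"    # store the parsed pairs under [params]
  }
}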
Thanks,

Logstash Multiline filter for websphere/java logs

Hello, I have a problem with my Logstash multiline configuration. I'm parsing WebSphere/Java logs, and multiline doesn't work on some kinds of log entries.
My multiline configuration looks like this. I tried several regexes, but none of them worked.
codec => multiline {
  pattern => "^\A%{SYSLOG5424SD}"
  negate => true
  what => previous
}
This is an example of a log entry that is not parsed the right way:
[1.6.2015 15:02:46:635 CEST] 00000109 BusinessExcep E CNTR0020E: EJB threw an unexpected (non-declared) exception during invocation of method "processCommand" on bean "BeanId(Issz_Produkcia_2.1.63#Ssz_Server_EJB.jar#CommandDispatcherImpl, null)". Exception data: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
at com.ibm.tx.jta.impl.EmbeddableTranManagerImpl.completeTxTimeout(EmbeddableTranManagerImpl.java:62)
at com.ibm.tx.jta.impl.EmbeddableTranManagerSet.completeTxTimeout(EmbeddableTranManagerSet.java:85)
at com.ibm.ejs.csi.TransactionControlImpl.completeTxTimeout(TransactionControlImpl.java:1347)
at com.ibm.ejs.csi.TranStrategy.postInvoke(TranStrategy.java:273)
at com.ibm.ejs.csi.TransactionControlImpl.postInvoke(TransactionControlImpl.java:579)
at com.ibm.ejs.container.EJSContainer.postInvoke(EJSContainer.java:4874)
at sk.sits.upsvar.server.ejb.entitymanagers.EJSLocal0SLDokumentManagerImpl_18dd4eb4.findAllDokumentPripadByCriteriaMap(EJSLocal0SLDokumentManagerImpl_18dd4eb4.java)
at sk.sits.upsvar.server.ejb.DataAccessServiceImpl.executeDokumentCmd(DataAccessServiceImpl.java:621)
at sk.sits.upsvar.server.ejb.DataAccessServiceImpl.executeCmd(DataAccessServiceImpl.java:220)
at sk.sits.upsvar.server.ejb.EJSLocal0SLDataAccessServiceImpl_6e5b0656.executeCmd(EJSLocal0SLDataAccessServiceImpl_6e5b0656.java)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl.processSoloCommand(CommandDispatcherImpl.java:222)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl._processCommand(CommandDispatcherImpl.java:151)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl.processCommand(CommandDispatcherImpl.java:100)
at sk.sits.upsvar.server.ejb.EJSLocal0SLCommandDispatcherImpl_b974dd5c.processCommand(EJSLocal0SLCommandDispatcherImpl_b974dd5c.java)
at sk.sits.upsvar.server.ejb.SszServiceImpl.process(SszServiceImpl.java:146)
at sk.sits.upsvar.server.ejb.EJSRemote0SLSszService_8e2ee81c.process(EJSRemote0SLSszService_8e2ee81c.java)
at sk.sits.upsvar.server.ejb._EJSRemote0SLSszService_8e2ee81c_Tie.process(_EJSRemote0SLSszService_8e2ee81c_Tie.java)
at sk.sits.upsvar.server.ejb._EJSRemote0SLSszService_8e2ee81c_Tie._invoke(_EJSRemote0SLSszService_8e2ee81c_Tie.java)
at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:678)
at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:525)
at com.ibm.rmi.iiop.ORB.process(ORB.java:576)
at com.ibm.CORBA.iiop.ORB.process(ORB.java:1578)
at com.ibm.rmi.iiop.Connection.doRequestWork(Connection.java:3076)
at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2946)
at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:64)
at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1700)
javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
Caused by: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
at com.ibm.tx.jta.impl.EmbeddableTranManagerImpl.completeTxTimeout(EmbeddableTranManagerImpl.java:62)
at com.ibm.tx.jta.impl.EmbeddableTranManagerSet.completeTxTimeout(EmbeddableTranManagerSet.java:85)
at com.ibm.ejs.csi.TransactionControlImpl.completeTxTimeout(TransactionControlImpl.java:1347)
at com.ibm.ejs.csi.TranStrategy.postInvoke(TranStrategy.java:273)
at com.ibm.ejs.csi.TransactionControlImpl.postInvoke(TransactionControlImpl.java:579)
at com.ibm.ejs.container.EJSContainer.postInvoke(EJSContainer.java:4874)
at sk.sits.upsvar.server.ejb.entitymanagers.EJSLocal0SLDokumentManagerImpl_18dd4eb4.findAllDokumentPripadByCriteriaMap(EJSLocal0SLDokumentManagerImpl_18dd4eb4.java)
at sk.sits.upsvar.server.ejb.DataAccessServiceImpl.executeDokumentCmd(DataAccessServiceImpl.java:621)
at sk.sits.upsvar.server.ejb.DataAccessServiceImpl.executeCmd(DataAccessServiceImpl.java:220)
at sk.sits.upsvar.server.ejb.EJSLocal0SLDataAccessServiceImpl_6e5b0656.executeCmd(EJSLocal0SLDataAccessServiceImpl_6e5b0656.java)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl.processSoloCommand(CommandDispatcherImpl.java:222)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl._processCommand(CommandDispatcherImpl.java:151)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl.processCommand(CommandDispatcherImpl.java:100)
at sk.sits.upsvar.server.ejb.EJSLocal0SLCommandDispatcherImpl_b974dd5c.processCommand(EJSLocal0SLCommandDispatcherImpl_b974dd5c.java)
at sk.sits.upsvar.server.ejb.SszServiceImpl.process(SszServiceImpl.java:146)
at sk.sits.upsvar.server.ejb.EJSRemote0SLSszService_8e2ee81c.process(EJSRemote0SLSszService_8e2ee81c.java)
at sk.sits.upsvar.server.ejb._EJSRemote0SLSszService_8e2ee81c_Tie.process(_EJSRemote0SLSszService_8e2ee81c_Tie.java)
at sk.sits.upsvar.server.ejb._EJSRemote0SLSszService_8e2ee81c_Tie._invoke(_EJSRemote0SLSszService_8e2ee81c_Tie.java)
at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:678)
at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:525)
at com.ibm.rmi.iiop.ORB.process(ORB.java:576)
at com.ibm.CORBA.iiop.ORB.process(ORB.java:1578)
at com.ibm.rmi.iiop.Connection.doRequestWork(Connection.java:3076)
at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2946)
at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:64)
at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1700)
It's parsed line by line, and I need it parsed as a single event. I don't know if there is some character dividing the lines.
I tried these patterns:
pattern => "%{DATESTAMP} %{WORD:zone}]"
pattern => "^\["
pattern => "\A"
And a lot more that I don't remember. Can someone who has faced this problem help me?
Thank you a lot.
Here is my full configuration.
input {
  file {
    path => "D:\Log\Logstash\testlog.log"
    type => "LOG"
    start_position => "beginning"
    codec => plain {
      charset => "ISO-8859-1"
    }
    codec => multiline {
      pattern => "^\A%{SYSLOG5424SD}"
      negate => true
      what => previous
    }
  }
}
filter {
  grok {
    match => [ "message", ".*exception.*" ]
    add_tag => "exception"
  }
  mutate {
    remove_tag => "_grokparsefailure"
  }
  grok {
    match => [ "message", "%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* W" ]
    add_tag => "Warning"
    remove_tag => "_grokparsefailure"
  }
  grok {
    match => [ "message", "%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* F" ]
    add_tag => "Fatal"
    remove_tag => "_grokparsefailure"
  }
  grok {
    match => [ "message", "%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* O" ]
    add_tag => "Message"
    remove_tag => "_grokparsefailure"
  }
  grok {
    match => [ "message", "%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* C" ]
    add_tag => "Config"
    remove_tag => "_grokparsefailure"
  }
  #if ("Warning" not in [tags]) {
  grok {
    match => [ "message", "%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* E" ]
    add_tag => "Error"
    remove_tag => "_grokparsefailure"
  }
  #} else {
  grok {
    match => [ "message", "%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD: }\s* I" ]
    add_tag => "Info"
  }
  #}
  grok {
    match => [ "message", "%{DATESTAMP} %{WORD:zone}] %{WORD:ID} %{WORD:CLASS}\s* . (.*\s){0,}%{GREEDYDATA:OBSAH}" ]
    remove_tag => "_grokparsefailure"
  }
  grok {
    match => [ "message", "%{DATESTAMP} %{WORD:zone}] %{WORD:ID} %{WORD:CLASS}\s* . (.*\s){0,}%{WORD:WAS_CODE}:%{GREEDYDATA:OBSAH}" ]
    #"message","%{DATESTAMP} %{WORD:zone}] %{WORD:ID} %{WORD:CLASS}\s* W \s*\[SID:%{WORD:ISSZSID}]%{GREEDYDATA:OBSAH}"]
    remove_tag => "_grokparsefailure"
    add_tag => "was_error"
  }
  if ("was_error" not in [tags]) {
    grok {
      match => [ "message", "%{DATESTAMP} %{WORD:zone}] %{WORD:ID} %{WORD:CLASS}\s* . \s*\[SID:%{WORD:ISSZSID}]%{GREEDYDATA:OBSAH}" ]
      remove_tag => "_grokparsefailure"
    }
    if "_grokparsefailure" not in [tags] {
      if [ISSZSID] != "null" {
        mutate {
          add_tag => "ISSZwithID"
          remove_tag => "_grokparsefailure"
        }
      } else {
        mutate {
          add_tag => "ISSZnull"
          remove_tag => "_grokparsefailure"
        }
      }
    }
  }
}
output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      #protocol => "http"
    }
  }
  stdout {}
}
As assumed, using multiline as a codec alongside another codec is not what it was intended for. I'd use it either as the single codec or as a filter.
Transform your configuration into this and you will get the results you are looking for (with negate => true and what => previous, every line that does not start with a bracketed timestamp is appended to the previous event, so stack-trace lines are glued to their header line):
input {
  file {
    path => "D:\Log\Logstash\testlog.log"
    type => "LOG"
    start_position => "beginning"
    codec => plain { charset => "ISO-8859-1" }
  }
}
filter {
  multiline {
    pattern => "^\A%{SYSLOG5424SD}"
    negate => true
    what => previous
  }
  # ... all other filters
}
output {
  # your output definitions
}
A famous multiline parsing example is the one from Jordan Sissel on MySQL log parsing: https://gist.github.com/jordansissel/3753353
Cheers

logstash: target of repeat operator is not specified

I want to use Logstash to ship my logs, so I downloaded Logstash 1.5.0 RC2 and ran it on Ubuntu with the command:
bin/logstash -f test.conf
Then the console shows this error:
target of repeat operator is not specified: /;(?<Args:method>*);(?<INT:traceid:int>(?:[+-]?(?:[0-9]+)));(?<INT:sTime:int>(?:[+-]?(?:[0-9]+)));(?<INT:eTime:int>(?:[+-]?(?:[0-9]+)));(?<HOSTNAME:hostname>\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b));(?<INT:eoi:int>(?:[+-]?(?:[0-9]+)));(?<INT:ess:int>(?:[+-]?(?:[0-9]+)));(?<Args:args>*)/m
I don't know how to solve this error; maybe you can help me.
My test.conf is as follows:
input { stdin { } }
filter {
  grok {
    match => [ "message", "%{INT:type}" ]
  }
  if [type] == "10" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{Args:method};%{INT:traceid:int};%{INT:sTime:int};%{INT:eTime:int};%{HOSTNAME:hostname};%{INT:eoi:int};%{INT:ess:int};%{Args:args}" ]
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
    ruby {
      code => "event['duration'] = event['eTime'] - event['sTime']"
    }
  }
  if [type] =~ /3[1-6]/ {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{Args:method};%{Args:sessionid};%{INT:traceID:int};%{INT:sTime:int};%{INT:eTime:int};%{HOSTNAME:hostname};%{INT:eoi:int};%{INT:ess:int};URL{%{HttpField:url}};RequestHeader{%{HttpField:ReqHeader}};RequestPara{%{HttpField:ReqPara}};RequestAttr{%{HttpField:ReqAttr}};SessionAttr{%{HttpField:SessionAttr}};ResponseHeader{%{HttpField:ResHeader}}" ]
    }
    kv {
      source => "ReqHeader"
      field_split => ";"
      value_split => ":"
      target => "ReqHeader"
    }
    kv {
      source => "ResHeader"
      field_split => ";"
      value_split => ":"
      target => "ResHeader"
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
    ruby {
      code => "event['duration'] = event['eTime'] - event['sTime']"
    }
  }
  if [type] == "30" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{Args:method};%{Args:sessionid};%{INT:traceID:int};%{INT:sTime:int};%{INT:eTime:int};%{HOSTNAME:hostname};%{INT:eoi:int};%{INT:ess:int};URL{%{HttpField:url}};RequestHeader{%{HttpField:ReqHeader}};RequestPara{%{HttpField:ReqPara}};RequestAttr{%{HttpField:ReqAttr}};SessionAttr{%{HttpField:SessionAttr}}" ]
    }
    kv {
      source => "ReqHeader"
      field_split => ";"
      value_split => ":"
      target => "ReqHeader"
    }
    kv {
      source => "ResHeader"
      field_split => ";"
      value_split => ":"
      target => "ResHeader"
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
    ruby {
      code => "event['duration'] = event['eTime'] - event['sTime']"
    }
  }
  if [type] == "20" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{Args:method};%{INT:traceID:int};%{INT:sTime:int};%{INT:eTime:int};%{HOSTNAME:hostname};%{INT:eoi:int};%{INT:ess:int};%{INT:mtype};%{Args:DBUrl}" ]
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
    ruby {
      code => "event['duration'] = event['eTime'] - event['sTime']"
    }
  }
  if [type] == "21" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{Args:method};%{INT:traceID:int};%{INT:sTime:int};%{INT:eTime:int};%{HOSTNAME:hostname};%{INT:eoi:int};%{INT:ess:int};%{INT:mtype};%{Args:sql};%{Args:bindVariables}" ]
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
    ruby {
      code => "event['duration'] = event['eTime'] - event['sTime']"
    }
  }
  if [type] == "12" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{Args:method};%{INT:traceID:int};%{INT:sTime:int};%{INT:eTime:int};%{HOSTNAME:hostname};%{INT:eoi:int};%{INT:ess:int};%{Args:logStack}" ]
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
    ruby {
      code => "event['duration'] = event['eTime'] - event['sTime']"
    }
  }
  if [type] == "11" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{Args:method};%{INT:traceID};%{INT:sTime:int};%{INT:eTime:int};%{HOSTNAME:hostname};%{INT:eoi:int};%{INT:ess:int};%{Args:errorStack}" ]
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
    ruby {
      code => "event['duration'] = event['eTime'] - event['sTime']"
    }
  }
  if [type] == "50" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{INT:sTime:int};%{HOSTNAME:host};%{Args:JVMName};%{Args:GCName};%{INT:count:int};%{INT:time:int}" ]
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
  }
  if [type] == "51" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{INT:sTime:int};%{HOSTNAME:host};%{Args:JVMName};%{INT:maxheap};%{INT:currentheap};%{INT:commitheap};%{INT:iniheap};%{INT:maxnonheap};%{INT:currentnonheap};%{INT:commitnonheap};%{INT:ininonheap}" ]
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
  }
  if [type] == "52" {
    grok {
      patterns_dir => "./patterns"
      match => [ "message", ";%{INT:sTime:int};%{HOSTNAME:host};%{Args:JVMName};%{Args:iniloadedclasses};%{Args:currentloadedclasses};%{Args:iniunloadedclasses}" ]
    }
    date {
      match => [ "sTime", "UNIX_MS" ]
    }
  }
}
output {
  elasticsearch {
    host => "127.2.96.1"
    protocol => "http"
    port => "8080"
  }
  stdout {
    codec => rubydebug
  }
}
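Judging from the error text, the custom Args pattern expands to a bare quantifier: (?<Args:method>*) gives the * nothing to repeat, which usually means the patterns file defines Args as just * (and likewise for args). Giving the quantifier something to repeat should clear the error. A minimal sketch of the file under patterns_dir => "./patterns" (the exact character classes are assumptions about your semicolon-delimited log format):
# Custom grok patterns: each line is NAME followed by a regex.
# A lone "*" is invalid because the repeat operator has no target;
# anchor it to a character class instead:
Args [^;]*
HttpField [^;{}]*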

How would you use map reduce on this document structure?

If I wanted to count foobar.relationships.friend.count, how would I use map/reduce against this document structure so that the count equals 22 (10 + 12)?
[
    [0] {
        "rank" => nil,
        "profile_id" => 3,
        "20130913" => {
            "foobar" => {
                "relationships" => {
                    "acquaintance" => {
                        "count" => 0
                    },
                    "friend" => {
                        "males_count" => 0,
                        "ids" => [],
                        "females_count" => 0,
                        "count" => 10
                    }
                }
            }
        },
        "20130912" => {
            "foobar" => {
                "relationships" => {
                    "acquaintance" => {
                        "count" => 0
                    },
                    "friend" => {
                        "males_count" => 0,
                        "ids" => [
                            [0] 77,
                            [1] 78,
                            [2] 79
                        ],
                        "females_count" => 0,
                        "count" => 12
                    }
                }
            }
        }
    }
]
In JavaScript, this query gets you the result you expect:
r.db('test').table('test').get(3).do(function(doc) {
    return doc.keys().map(function(key) {
        return r.branch(
            doc(key).typeOf().eq('OBJECT'),
            doc(key)("foobar")("relationships")("friend")("count").default(0),
            0
        )
    }).reduce(function(left, right) {
        return left.add(right)
    })
})
In Ruby, it should be
r.db('test').table('test').get(3).do{ |doc|
    doc.keys().map{ |key|
        r.branch(
            doc.get_field(key).typeOf().eq('OBJECT'),
            doc.get_field(key)["foobar"]["relationships"]["friend"]["count"].default(0),
            0
        )
    }.reduce{ |left, right|
        left + right
    }
}
I would also tend to think that the schema you use is not really well suited; it would be better to use something like:
{
    rank: null,
    profile_id: 3,
    people: [
        {
            id: 20130913,
            foobar: { ... }
        },
        {
            id: 20130912,
            foobar: { ... }
        }
    ]
}
Edit: A simpler way to do it without using r.branch is just to remove the fields that are not objects with the without command.
Ex:
r.db('test').table('test').get(3).without('rank', 'profile_id').do{ |doc|
    doc.keys().map{ |key|
        doc.get_field(key)["foobar"]["relationships"]["friend"]["count"].default(0)
    }.reduce{ |left, right|
        left + right
    }
}.run
I think you will need your own input reader. This site gives a tutorial on how it can be done: http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/
Then you run MapReduce with a mapper:
Mapper<LongWritable, ClassRepresentingMyRecords, Text, IntWritable>
In your map function you extract the value for count and emit it as the value. I'm not sure if you need a key.
In the reducer you add together all the elements with the same key ('count' in your case).
This should get you on your way, I think.