elasticsearch/logstash regex - get the value after a specific key [duplicate] - regex

I have a logfile which looks like this (simplified):
Logline sample
MyLine data={"firstname":"bob","lastname":"the builder"}
I'd like to extract the JSON contained in data and create two fields, one for firstname and one for lastname. However, the output I get is this:
{"message":"Line data={\"firstname\":\"bob\",\"lastname\":\"the builder\"}\r","#version":"1","#timestamp":"2015-11-26T11:38:56.700Z","host":"xxx","path":"C:/logstashold/bin/input.txt","MyWord":"Line","parsedJson":{"firstname":"bob","lastname":"the builder"}}
As you can see
..."parsedJson":{"firstname":"bob","lastname":"the builder"}}
That's not what I need; I need firstname and lastname as separate fields in Kibana, but Logstash isn't extracting them with the json filter.
LogStash Config
input {
file {
path => "C:/logstashold/bin/input.txt"
}
}
filter {
grok {
match => { "message" => "%{WORD:MyWord} data=%{GREEDYDATA:request}"}
}
json{
source => "request"
target => "parsedJson"
remove_field => ["request"]
}
}
output {
file{
path => "C:/logstashold/bin/output.txt"
}
}
Any help greatly appreciated; I'm sure I'm missing something simple.
Thanks

After your json filter, add a mutate filter in order to add the two fields, taking their values from the parsedJson field.
filter {
...
json {
...
}
mutate {
add_field => {
"firstname" => "%{[parsedJson][firstname]}"
"lastname" => "%{[parsedJson][lastname]}"
}
}
}
For your sample log line above that would give:
{
"message" => "MyLine data={\"firstname\":\"bob\",\"lastname\":\"the builder\"}",
"#version" => "1",
"#timestamp" => "2015-11-26T11:54:52.556Z",
"host" => "iMac.local",
"MyWord" => "MyLine",
"parsedJson" => {
"firstname" => "bob",
"lastname" => "the builder"
},
"firstname" => "bob",
"lastname" => "the builder"
}
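Alternatively, if you don't need to keep the nested parsedJson object at all, the json filter writes the parsed keys to the top level of the event when no target is given, so firstname and lastname become first-class fields directly. A minimal sketch of that variant (same grok as above, target simply omitted):
filter {
  grok {
    match => { "message" => "%{WORD:MyWord} data=%{GREEDYDATA:request}" }
  }
  json {
    source => "request"
    # no "target": the parsed keys (firstname, lastname) land at the event root
    remove_field => ["request"]
  }
}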

Related

CakePHP 3.x - save does not work in associated table

I have the table structure below.
Mandator table
class MandatorTable extends Table
{
public function initialize(array $config)
{
$this->table('mandators');
$this->belongsToMany('Seminar', [
'foreignKey' => 'mandator_id',
'targetForeignKey' => 'seminar_id',
'joinTable' => 'mandators_seminars'
]);
}
}
Seminar table
class SeminarTable extends Table
{
public function initialize(array $config)
{
$this->table('seminars');
$this->belongsToMany('Mandator', [
'foreignKey' => 'seminar_id',
'targetForeignKey' => 'mandator_id',
'joinTable' => 'mandators_seminars'
]);
}
}
Both tables are joined through the 'mandators_seminars' table:
mandator_id, seminar_id
When I save data, it is saved in the seminars table but not in the 'mandators_seminars' table.
Query
$seminartable = $this->Seminar->newEntity();
$this->request->data['mandator'][0] = 1;
$seminardata = $this->Seminar->patchEntity($seminartable, $this->request->data);
$this->Seminar->save($seminardata);
Request data
Array
(
[bookable] => test
[released] => aaa
[linkable] => bb
[name] => ccc
[internalnote] => ddd
[abstract] => ttt
[description] => ddd
[Category] => 14
[mandator] => Array
(
[0] => 1
)
[mandator_owner_id] => 1
)
Note that you have the two tables Mandator and Seminar in the singular, but your join table is plural. Check this first. If there is still a problem, see CakePHP Through Associations.
As you can see, your association should look like this:
$this->belongsToMany('Mandator', [
'foreignKey' => 'seminar_id',
'targetForeignKey' => 'mandator_id',
'through' => 'PluginName.MandatorsSeminars',
'joinTable' => 'mandators_seminars',
'className' => 'PluginName.Mandator'
]);
And one more tip: tables should be named in the plural.

How to format Message in logstash before sending HTTP request

I am using logstash to parse log entries from an input log file.
LogLine:
TID: [0] [] [2016-05-30 23:02:02,602] INFO {org.wso2.carbon.registry.core.jdbc.EmbeddedRegistryService} - Configured Registry in 572ms {org.wso2.carbon.registry.core.jdbc.EmbeddedRegistryService}
Grok Pattern:
TID:%{SPACE}\[%{INT:SourceSystemId}\]%{SPACE}\[%{DATA:ProcessName}\]%{SPACE}\[%{TIMESTAMP_ISO8601:TimeStamp}\]%{SPACE}%{LOGLEVEL:MessageType}%{SPACE}{%{JAVACLASS:MessageTitle}}%{SPACE}-%{SPACE}%{GREEDYDATA:Message}
The grok pattern is working fine. Now I want to send the output of this parsing, in a transformed form, to my REST service.
Expected Output:
{
"MessageId": "654656",
"TimeStamp": "2001-12-31T12:00:00",
"CorrelationId": "986565",
"Severity": "NORMAL",
"MessageType": "INFO",
"MessageTitle": "TestTittle",
"Message": "Sample Message",
"MessageDetail": {
"SourceSystemId": "65656",
"ServerIP": "192.168.1.1",
"HostName": "wedev.101",
"ProcessId": "986",
"ProcessName": "JAVA",
"ThreadId": "65656",
"MessageComponentName": "TestComponent"
}
}
Problem Statement:
I want the JSON message that is sent to my REST-based service to be in the above-mentioned format. Is it possible in Logstash to also add some hard-coded values and use the values that I am getting by parsing the logs?
Following is my logstash-conf file:
input {
file {
path => "C:\WSO2Environment\wso2esb-4.8.1\repository\logs\wso2carbon.log"
type => "wso2"
codec => multiline {
charset => "UTF-8"
multiline_tag => "multiline"
negate => true
pattern => "^%{YEAR}\s%{MONTH}\s%{MONTHDAY}\s%{TIME}:\d{3}\s%{LOGLEVEL}"
what => "previous"
}
}
}
filter {
if [type] == "wso2" {
grok {
match => [ "message", "TID:%{SPACE}\[%{INT:SourceSystemId}\]%{SPACE}\[%{DATA:ProcessName}\]%{SPACE}\[%{TIMESTAMP_ISO8601:TimeStamp}\]%{SPACE}%{LOGLEVEL:MessageType}%{SPACE}{%{JAVACLASS:MessageTitle}}%{SPACE}-%{SPACE}%{GREEDYDATA:Message}" ]
add_tag => [ "grokked" ]
}
if !( "_grokparsefailure" in [tags] ) {
date {
match => [ "log_timestamp", "yyyy MMM dd HH:mm:ss:SSS" ]
add_tag => [ "dated" ]
}
}
}
if ( "multiline" in [tags] ) {
grok {
match => [ "message", "Service:(?<log_service>\s[\w]+)[.\W]*Operation:(?<log_operation>\s[\w]+)" ]
add_tag => [ "servicedetails" ]
tag_on_failure => [ "noservicedetails" ]
}
}
}
output {
# stdout { }
http {
url => "http://localhost:8087/messages"
http_method => "post"
format => "json"
}
}
Note:
I still have to configure the multiline format, so please ignore that part in my logstash configuration file.
To add fields to an event, possibly including data parsed from the event, you will probably want to use the add_field functionality that most Logstash filters implement.
The easiest way to do this would be by adding a mutate filter with any add_field functions that you wanted.
mutate {
add_field => {
"foo_%{somefield}" => "Hello world, from %{host}"
}
}
Here's the official reference
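For the nested layout in the expected output above, here is a hedged sketch. The field names are taken from that example; the literal "NORMAL" is a hard-coded value, the %{...} references copy fields produced by the grok pattern in the question, and Logstash's [outer][inner] field-reference syntax in the add_field keys creates the nested MessageDetail object (host is assumed to be present on the event, as it usually is for file input):
mutate {
  add_field => {
    "Severity"                        => "NORMAL"             # hard-coded value
    "[MessageDetail][SourceSystemId]" => "%{SourceSystemId}"  # copied from the field grok extracted
    "[MessageDetail][ProcessName]"    => "%{ProcessName}"
    "[MessageDetail][ServerIP]"       => "%{host}"            # assumes the event carries a host field
  }
}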

logstash-filter not honoring the regex

I am reading files as input and then passing them on to be filtered; based on [type], the if/else for output (stdout) follows.
Here is the conf part:
filter {
if [path] =~ "error" {
mutate {
replace => { "type" => "ERROR_LOGS"}
}
grok {
match => {"error_examiner" => "%{GREEDYDATA:err}"}
}
if [err] =~ "9999" {
if [err] =~ "invalid offset" {
mutate {
replace => {"type" => "DISCARDED_ERROR_LOGS"}
}
grok {
match => {"message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}/%{DATA:askme_tag}\?%{DATA:paramstr}\] \[%{DATA:reason}\]"}
}
date {
match => [ "date", "YYYY-MM-DD aaa HH:mm:ss" ]
locale => en
}
geoip {
source => "ip"
target => "geo_ip"
}
kv {
source => "paramstr"
trimkey => "&\?\[\],"
value_split => "="
target => "params"
}
}
else {
mutate {
replace => {"type" => "ACCEPTED_ERROR_LOGS"}
}
grok {
match => {
"message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{WORD:uptime}\/%{NUMBER:downtime}\] \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}\/%{DATA:askme_tag}\?%{DATA:paramstr}\]"
}
}
date {
match => [ "date" , "YYYY-MM-DD aaa HH:mm:ss" ]
locale => en
}
geoip {
source => "ip"
target => "geo_ip"
}
kv {
source => "paramstr"
trimkey => "&\?\[\],"
value_split => "="
target => "params"
}
}
}
else if [err] =~ "Exception" {
mutate {
replace => {"type" => "EXCEPTIONS_IN_ERROR_LOGS"}
}
grok {
match => { "message" => "%{GREEDYDATA}"}
}
}
}
else if [path] =~ "info" {
mutate {
replace => {"type" => "INFO_LOGS"}
}
grok {
match => {
"info_examiner" => "%{GREEDYDATA:info}"
}
}
if [info] =~ "9999" {
mutate {
replace => {"type" => "ACCEPTED_INFO_LOGS"}
}
grok {
match => {
"message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{WORD:uptime}\/%{NUMBER:downtime}\]( \[%{WORD:qtype}\])?( \[%{NUMBER:outsearch}/%{NUMBER:insearch}\])? \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}/%{DATA:askme_tag}\?%{DATA:paramstr}\]"
}
}
date {
match => [ "date" , "YYYY-MM-DD aaa HH:mm:ss" ]
locale => en
}
geoip {
source => "ip"
target => "geo_ip"
}
kv {
source => "paramstr"
trimkey => "&\?\[\],"
value_split => "="
target => "params"
}
}
else {
mutate {replace => {"type" => "DISCARDED_INFO_LOGS"}}
grok {
match => {"message" => "%{GREEDYDATA}"}
}
}
}
}
I have tested the grok regexps at http://grokdebug.herokuapp.com/ and they work.
However, what's not working is this part:
grok {
match => {"error_examiner" => "%{GREEDYDATA:err}"}
}
if [err] =~ "9999" {
I was wondering what's wrong there?
Actually, I have fixed it. Here is what I'd like to share with other fellows: things I learnt during my initial experiments with Logstash, since the documentation and other resources aren't very telling ...
"error_examiner" or "info_examiner" won't work; parse the instance/event row in "message" instead.
geoip doesn't work for internal IPs.
kv: you must specify field_split and value_split if the pairs aren't like a=1 b=2; for example, if they look like a:1&b:2, then field_split is & and value_split is : (see the sketch below).
stdout, by default, prints badly if the chosen codec is json; please choose rubydebug.
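A small illustrative snippet for the kv and stdout points (the a:1&b:2 shape is the example from the notes above; paramstr and params are the names already used in the question's config):
filter {
  kv {
    source      => "paramstr"
    field_split => "&"   # pairs are separated by &
    value_split => ":"   # key and value are separated by :
    target      => "params"
  }
}
output {
  stdout { codec => rubydebug }   # human-readable, pretty-printed events instead of the default codec
}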
Thanks,

Logstash Multiline filter for websphere/java logs

Hello, I have a problem with my Logstash multiline configuration. I'm parsing WebSphere/Java logs and multiline doesn't work for some kinds of log entries.
My multiline configuration looks like this. I tried several types of regex but none worked.
codec => multiline {
pattern => "^\A%{SYSLOG5424SD}"
negate => true
what => previous
}
This is an example of a log entry that is not parsed the right way:
[1.6.2015 15:02:46:635 CEST] 00000109 BusinessExcep E CNTR0020E: EJB threw an unexpected (non-declared) exception during invocation of method "processCommand" on bean "BeanId(Issz_Produkcia_2.1.63#Ssz_Server_EJB.jar#CommandDispatcherImpl, null)". Exception data: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
at com.ibm.tx.jta.impl.EmbeddableTranManagerImpl.completeTxTimeout(EmbeddableTranManagerImpl.java:62)
at com.ibm.tx.jta.impl.EmbeddableTranManagerSet.completeTxTimeout(EmbeddableTranManagerSet.java:85)
at com.ibm.ejs.csi.TransactionControlImpl.completeTxTimeout(TransactionControlImpl.java:1347)
at com.ibm.ejs.csi.TranStrategy.postInvoke(TranStrategy.java:273)
at com.ibm.ejs.csi.TransactionControlImpl.postInvoke(TransactionControlImpl.java:579)
at com.ibm.ejs.container.EJSContainer.postInvoke(EJSContainer.java:4874)
at sk.sits.upsvar.server.ejb.entitymanagers.EJSLocal0SLDokumentManagerImpl_18dd4eb4.findAllDokumentPripadByCriteriaMap(EJSLocal0SLDokumentManagerImpl_18dd4eb4.java)
at sk.sits.upsvar.server.ejb.DataAccessServiceImpl.executeDokumentCmd(DataAccessServiceImpl.java:621)
at sk.sits.upsvar.server.ejb.DataAccessServiceImpl.executeCmd(DataAccessServiceImpl.java:220)
at sk.sits.upsvar.server.ejb.EJSLocal0SLDataAccessServiceImpl_6e5b0656.executeCmd(EJSLocal0SLDataAccessServiceImpl_6e5b0656.java)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl.processSoloCommand(CommandDispatcherImpl.java:222)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl._processCommand(CommandDispatcherImpl.java:151)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl.processCommand(CommandDispatcherImpl.java:100)
at sk.sits.upsvar.server.ejb.EJSLocal0SLCommandDispatcherImpl_b974dd5c.processCommand(EJSLocal0SLCommandDispatcherImpl_b974dd5c.java)
at sk.sits.upsvar.server.ejb.SszServiceImpl.process(SszServiceImpl.java:146)
at sk.sits.upsvar.server.ejb.EJSRemote0SLSszService_8e2ee81c.process(EJSRemote0SLSszService_8e2ee81c.java)
at sk.sits.upsvar.server.ejb._EJSRemote0SLSszService_8e2ee81c_Tie.process(_EJSRemote0SLSszService_8e2ee81c_Tie.java)
at sk.sits.upsvar.server.ejb._EJSRemote0SLSszService_8e2ee81c_Tie._invoke(_EJSRemote0SLSszService_8e2ee81c_Tie.java)
at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:678)
at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:525)
at com.ibm.rmi.iiop.ORB.process(ORB.java:576)
at com.ibm.CORBA.iiop.ORB.process(ORB.java:1578)
at com.ibm.rmi.iiop.Connection.doRequestWork(Connection.java:3076)
at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2946)
at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:64)
at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1700)
javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
Caused by: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout
at com.ibm.tx.jta.impl.EmbeddableTranManagerImpl.completeTxTimeout(EmbeddableTranManagerImpl.java:62)
at com.ibm.tx.jta.impl.EmbeddableTranManagerSet.completeTxTimeout(EmbeddableTranManagerSet.java:85)
at com.ibm.ejs.csi.TransactionControlImpl.completeTxTimeout(TransactionControlImpl.java:1347)
at com.ibm.ejs.csi.TranStrategy.postInvoke(TranStrategy.java:273)
at com.ibm.ejs.csi.TransactionControlImpl.postInvoke(TransactionControlImpl.java:579)
at com.ibm.ejs.container.EJSContainer.postInvoke(EJSContainer.java:4874)
at sk.sits.upsvar.server.ejb.entitymanagers.EJSLocal0SLDokumentManagerImpl_18dd4eb4.findAllDokumentPripadByCriteriaMap(EJSLocal0SLDokumentManagerImpl_18dd4eb4.java)
at sk.sits.upsvar.server.ejb.DataAccessServiceImpl.executeDokumentCmd(DataAccessServiceImpl.java:621)
at sk.sits.upsvar.server.ejb.DataAccessServiceImpl.executeCmd(DataAccessServiceImpl.java:220)
at sk.sits.upsvar.server.ejb.EJSLocal0SLDataAccessServiceImpl_6e5b0656.executeCmd(EJSLocal0SLDataAccessServiceImpl_6e5b0656.java)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl.processSoloCommand(CommandDispatcherImpl.java:222)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl._processCommand(CommandDispatcherImpl.java:151)
at sk.sits.upsvar.server.ejb.CommandDispatcherImpl.processCommand(CommandDispatcherImpl.java:100)
at sk.sits.upsvar.server.ejb.EJSLocal0SLCommandDispatcherImpl_b974dd5c.processCommand(EJSLocal0SLCommandDispatcherImpl_b974dd5c.java)
at sk.sits.upsvar.server.ejb.SszServiceImpl.process(SszServiceImpl.java:146)
at sk.sits.upsvar.server.ejb.EJSRemote0SLSszService_8e2ee81c.process(EJSRemote0SLSszService_8e2ee81c.java)
at sk.sits.upsvar.server.ejb._EJSRemote0SLSszService_8e2ee81c_Tie.process(_EJSRemote0SLSszService_8e2ee81c_Tie.java)
at sk.sits.upsvar.server.ejb._EJSRemote0SLSszService_8e2ee81c_Tie._invoke(_EJSRemote0SLSszService_8e2ee81c_Tie.java)
at com.ibm.CORBA.iiop.ServerDelegate.dispatchInvokeHandler(ServerDelegate.java:678)
at com.ibm.CORBA.iiop.ServerDelegate.dispatch(ServerDelegate.java:525)
at com.ibm.rmi.iiop.ORB.process(ORB.java:576)
at com.ibm.CORBA.iiop.ORB.process(ORB.java:1578)
at com.ibm.rmi.iiop.Connection.doRequestWork(Connection.java:3076)
at com.ibm.rmi.iiop.Connection.doWork(Connection.java:2946)
at com.ibm.rmi.iiop.WorkUnitImpl.doWork(WorkUnitImpl.java:64)
at com.ibm.ejs.oa.pool.PooledThread.run(ThreadPool.java:118)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1700)
It's parsed line by line and I need it parsed as one event. I don't know if there is some character dividing them.
I tried these patterns:
pattern => "%{DATESTAMP} %{WORD:zone}]"
pattern => "^\["
pattern => "\A"
And a lot more that I don't remember. Can someone who has faced this problem help me?
Thank you a lot.
Here is my full configuration.
input {
file {
path => "D:\Log\Logstash\testlog.log"
type => "LOG"
start_position => "beginning"
codec => plain {
charset => "ISO-8859-1"
}
codec => multiline {
pattern => "^\A%{SYSLOG5424SD}"
negate => true
what => previous
}
}
}
filter {
grok{
match => [ "message",".*exception.*"]
add_tag => "exception"
}
mutate{
remove_tag => "_grokparsefailure"
}
grok {
match => [ "message","%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* W"]
add_tag => "Warning"
remove_tag => "_grokparsefailure"
}
grok {
match => [ "message","%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* F"]
add_tag => "Fatal"
remove_tag => "_grokparsefailure"
}
grok {
match => [ "message","%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* O"]
add_tag => "Message"
remove_tag => "_grokparsefailure"
}
grok {
match => [ "message","%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* C"]
add_tag => "Config"
remove_tag => "_grokparsefailure"
}
#if ("Warning" not in [tags]) {
grok {
match => [ "message","%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD:}\s* E"]
add_tag => "Error"
remove_tag => "_grokparsefailure"
}
#}else {
grok {
match => [ "message","%{DATESTAMP} %{WORD:}] %{WORD:} %{WORD: }\s* I"]
add_tag => "Info"
}
#}
grok {
match => [ "message", "%{DATESTAMP} %{WORD:zone}] %{WORD:ID} %{WORD:CLASS}\s* . (.*\s){0,}%{GREEDYDATA:OBSAH}" ]
remove_tag => "_grokparsefailure"
}
grok {
match => [ "message", "%{DATESTAMP} %{WORD:zone}] %{WORD:ID} %{WORD:CLASS}\s* . (.*\s){0,}%{WORD:WAS_CODE}:%{GREEDYDATA:OBSAH}" ]
#"message","%{DATESTAMP} %{WORD:zone}] %{WORD:ID} %{WORD:CLASS}\s* W \s*\[SID:%{WORD:ISSZSID}]%{GREEDYDATA:OBSAH}"]
remove_tag => "_grokparsefailure"
add_tag => "was_error"
}
if ("was_error" not in [tags]) {
grok {
match => [ "message","%{DATESTAMP} %{WORD:zone}] %{WORD:ID} %{WORD:CLASS}\s* . \s*\[SID:%{WORD:ISSZSID}]%{GREEDYDATA:OBSAH}" ]
remove_tag => "_grokparsefailure"
}
if "_grokparsefailure" not in [tags] {
if [ISSZSID] != "null" {
mutate{
add_tag => "ISSZwithID"
remove_tag => "_grokparsefailure"
}
} else {
mutate{
add_tag => "ISSZnull"
remove_tag => "_grokparsefailure"
}
}
}
}
}
output {
if "_grokparsefailure" not in [tags] {
elasticsearch {
hosts => ["127.0.0.1:9200"]
#protocol => "http"
}
}
stdout {}
}
As assumed, using multiline as a codec alongside another codec is not what it was intended for. I'd rather use it as the single codec, or as a filter.
Transform your configuration into this and you will get the results you are looking for:
input {
file {
path => "D:\Log\Logstash\testlog.log"
type => "LOG"
start_position => "beginning"
codec => plain { charset => "ISO-8859-1" }
}
}
filter {
multiline {
pattern => "^\A%{SYSLOG5424SD}"
negate => true
what => previous
}
# ... all other filters
}
output {
# your output definitions
}
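If you do want to keep multiline as a codec on the input instead, drop the plain codec; the multiline codec accepts a charset option itself, so a single codec can do both jobs. A sketch of that variant under that assumption, reusing the pattern from the question:
input {
  file {
    path => "D:\Log\Logstash\testlog.log"
    type => "LOG"
    start_position => "beginning"
    codec => multiline {
      charset => "ISO-8859-1"          # moved here from the plain codec
      pattern => "^\A%{SYSLOG5424SD}"  # a line starting with [timestamp] begins a new event
      negate  => true
      what    => previous
    }
  }
}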
A famous multiline parsing example is the one from Jordan Sissel on MySQL log parsing: https://gist.github.com/jordansissel/3753353
Cheers

How would you use map reduce on this document structure?

If I wanted to count foobar.relationships.friend.count, how would I use map/reduce against this document structure so that the total equals 22 (10 + 12)?
[
[0] {
"rank" => nil,
"profile_id" => 3,
"20130913" => {
"foobar" => {
"relationships" => {
"acquaintance" => {
"count" => 0
},
"friend" => {
"males_count" => 0,
"ids" => [],
"females_count" => 0,
"count" => 10
}
}
}
},
"20130912" => {
"foobar" => {
"relationships" => {
"acquaintance" => {
"count" => 0
},
"friend" => {
"males_count" => 0,
"ids" => [
[0] 77,
[1] 78,
[2] 79
],
"females_count" => 0,
"count" => 12
}
}
}
}
}
]
In JavaScript, this query gets you the result you expect:
r.db('test').table('test').get(3).do( function(doc) {
return doc.keys().map(function(key) {
return r.branch(
doc(key).typeOf().eq('OBJECT'),
doc(key)("foobar")("relationships")("friend")("count").default(0),
0
)
}).reduce( function(left, right) {
return left.add(right)
})
})
In Ruby, it should be
r.db('test').table('test').get(3).do{ |doc|
doc.keys().map{ |key|
r.branch(
doc.get_field(key).typeOf().eq('OBJECT'),
doc.get_field(key)["foobar"]["relationships"]["friend"]["count"].default(0),
0
)
}.reduce{ |left, right|
left+right
}
}
I would also tend to think that the schema you use is not really well suited; it would be better to use something like:
{
rank: null
profile_id: 3
people: [
{
id: 20130913,
foobar: { ... }
},
{
id: 20130912,
foobar: { ... }
}
]
}
Edit: A simpler way to do it without using r.branch is just to remove the fields that are not objects with the without command.
Ex:
r.db('test').table('test').get(3).without('rank', 'profile_id').do{ |doc|
doc.keys().map{ |key|
doc.get_field(key)["foobar"]["relationships"]["friend"]["count"].default(0)
}.reduce{ |left, right|
left+right
}
}.run
I think you will need your own input reader. This site gives a tutorial on how it can be done: http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/
Then you run MapReduce with a mapper:
Mapper<LongWritable, ClassRepresentingMyRecords, Text, IntWritable>
In your map function you extract the value for count and emit it as the value. Not sure if you need a key?
In the reducer you add together all the elements with the same key (='count' in your case).
This should get you on your way I think.