Perl merging results of maps - list

How do I concatenate the output of two maps to form a single flat array?
I was attempting to use this:
my $test = { 'foo' => [
map {
{ 'i' => "$_" }
} 0..1,
map {
{ 'j' => "$_" }
} 0..1
] };
To achieve the result of this:
my $test = {'foo' => [
{
'i' => '0'
},
{
'i' => '1'
},
{
'j' => '0'
},
{
'j' => '1'
},
]}
However, this is what I got in $test instead:
{
'foo' => [
{
'i' => '0'
},
{
'i' => '1'
},
{
'i' => 'HASH(0x7f90ad19cd30)'
},
{
'i' => 'HASH(0x7f90ae200908)'
}
]
};
Looks like the result of the second map gets iterated over by the first map.

The list returned by the second map becomes part of the input list for the first one, following 0..1,.
Parentheses can fix that:
use warnings;
use strict;
use Data::Dump;
my $test = {
'foo' => [
( map { { i => $_ } } 0..1 ),
( map { { j => $_ } } 0..1 )
],
};
dd($test);
since they delimit the expression: now the first map takes only 0..1 as its input list, computes and returns a list, which is then merged with the second map's returned list. (Strictly speaking, you need parentheses only around the first map.)
This prints
{ foo => [{ i => 0 }, { i => 1 }, { j => 0 }, { j => 1 }] }
I've removed unneeded quotes; please restore them if and as needed in your application.
Without parentheses, the map after the comma is taken to be part of the expression generating the input list for the first map, so it produces the next element(s) of the input list after 0..1; effectively
map { { i => $_ } } (0..1, LIST);
Consider
my @arr = (
map { { 'i', $_ } }
0..1,
qw(a b),
map { ( { 'j', $_ } ) } 0..1
);
dd(\@arr);
It prints
[
{ i => 0 },
{ i => 1 },
{ i => "a" },
{ i => "b" },
{ i => { j => 0 } },
{ i => { j => 1 } },
]
This is also seen in your output, where all keys are i (there are no j keys).

Related

How do I merge two hashes that start with different keys in Perl?

I have two hashes as follows:
my %hash1 = (
'modules' => {
'top0' => {
'instances' => {
'sub_top' => {
'instances' => {
'inst2' => 2,
'inst0' => 0
}
}
}
}
}
);
my %hash2 = (
'modules' => {
'sub_top' => {
'instances' => {
'inst0' => 0,
'inst1' => 1
}
}
}
);
I need to merge these into one hash. I tried using Hash::Merge, but that works only if I start from sub_top onwards.
my $merged_hash = merge(\%{$hash1{modules}{top0}{instances}}, \%{$hash2{modules}});
Dumping the resulting $merged_hash gives:
$VAR1 = {
'sub_top' => {
'instances' => {
'inst2' => 2,
'inst0' => 0,
'inst1' => 1
}
}
};
But I'm missing the top part of hash1:
'modules' => {
'top0' => {
'instances' => {
Desired hash after merging should be like this:
$VAR1 = {
'modules' => {
'top0' => {
'instances' => {
'sub_top' => {
'instances' => {
'inst2' => 2,
'inst0' => 0,
'inst1' => 1
}
}
}
}
}
};
Here is one possible way to merge hashes on a predefined key; see if it satisfies your expectations.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $debug = 0;
my %hash1 = (
'modules' => {
'top0' => {
'instances' => {
'sub_top' => {
'instances' => {
'inst2' => 2,
'inst0' => 0
}
}
}
}
}
);
my %hash2 = (
'modules' => {
'sub_top' => {
'instances' => {
'inst0' => 0,
'inst1' => 1
}
}
}
);
my $key = 'sub_top';
mergeOnKey(\%hash1, \%hash2, $key);
say Dumper(\%hash1);
sub mergeOnKey {
my $h1 = shift;
my $h2 = shift;
my $k = shift;
my $href1 = find_ref($h1,$k);
my $href2 = find_ref($h2,$k);
die "No '$k' key found in hash 1" unless $href1;
die "No '$k' key found in hash 2" unless $href2;
merge($href1,$href2);
}
sub merge {
my $href1 = shift;
my $href2 = shift;
foreach my $key (keys %{$href2}) {
if( ref $href2->{$key} eq ref {} ) {
$href1->{$key} //= {}; # ensure a hashref exists before recursing
merge($href1->{$key},$href2->{$key});
} else {
$href1->{$key} = $href2->{$key} unless defined $href1->{$key};
}
}
say Dumper($href1) if $debug;
}
sub find_ref {
my $href = shift;
my $key = shift;
while( my($k,$v) = each %{$href} ){
say Dumper($v) if $debug;
return $v if $k eq $key;
return find_ref($v,$key) if ref $v eq 'HASH';
}
return 0;
}
Output
$VAR1 = {
'modules' => {
'top0' => {
'instances' => {
'sub_top' => {
'instances' => {
'inst1' => 1,
'inst2' => 2,
'inst0' => 0
}
}
}
}
}
};
If you look at the Synopsis for Hash::Merge, the example shows two hashes that are practically identical in structure. The hashes that you want to merge may have many similarities, but they also have significant differences, so I would only expect you to be able to merge them where their structure is sufficiently "equivalent". Here is the most obvious way I can think of to merge them into the resulting hash you indicate:
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
use Hash::Merge qw(merge);
my %hash1 = (
'modules' => {
'top0' => {
'instances' => {
'sub_top' => {
'instances' => {
'inst2' => 2,
'inst0' => 0
}
}
}
}
}
);
my %hash2 = (
'modules' => {
'sub_top' => {
'instances' => {
'inst0' => 0,
'inst1' => 1
}
}
}
);
# Merge the desired parts of the structure
$hash1{modules}{top0}{instances} =
merge( \%{ $hash1{modules}{top0}{instances} }, \%{ $hash2{modules} } );
# Demonstrate that merge worked as desired
my %desired = (
'modules' => {
'top0' => {
'instances' => {
'sub_top' => {
'instances' => {
'inst2' => 2,
'inst0' => 0,
'inst1' => 1
}
}
}
}
}
);
is_deeply( \%hash1, \%desired, 'Desired hash is created' );
done_testing();

Conditional Inside A Foreach Loop To Filter Value And Have A Fall Back

I have a strange problem and I can't get my head around it. I have tried many different approaches to get the desired result, with no success. What I am looking for is a set of rules with fallbacks that apply when conditions are met in a loop. Below is what I have so far. I always get the fallback as the match, even though country, postcode, and shipping method are defined; it is almost as if these values are being ignored.
If someone can point me in the right direction, I would be grateful. Thanks.
$country = 'GB';
$postCode = "LE5";
$shippingMethod = "subscription_shipping";
$shippingMatrix = array(
array(
"country" => "GB",
"isPostCodeExcluded" => "LE5",
"shippingMethod" => "subscription_shipping",
"carrier" => "Royal Mail",
"carrierService" => "Royal Mail",
),
array(
"country" => "GB",
"isPostCodeExcluded" => false,
"shippingMethod" => "subscription_shipping",
"carrier" => "DHL",
"carrierService" => "DHL",
),
array(
"country" => false,
"isPostCodeExcluded" => false,
"shippingMethod" => "subscription_shipping",
"carrier" => "Fallback",
"carrierService" => "Fallback",
),
array(
"country" => "GB",
"isPostCodeExcluded" => false,
"shippingMethod" => "standard_delivery",
"carrier" => "DPD",
"carrierService" => "DPD",
),
);
$carriers = [];
foreach ($shippingMatrix as $matrix) {
// If only Shipping Method is matched then fall back will be the result
if ($shippingMethod === $matrix['shippingMethod']) {
$carriers = [
$matrix['carrier'],
$matrix['carrierService'],
];
// If only Shipping Method & Country is matched then fall back will be the result DHL
if ($country === $matrix['country']) {
$carriers = [
$matrix['carrier'],
$matrix['carrierService'],
];
// If only Shipping Method & Country & PostCode is matched then fall back will be the result Royal Mail
if ($postCode === $matrix['isPostCodeExcluded']) {
$carriers = [
$matrix['carrier'],
$matrix['carrierService'],
];
}
}
}
}
var_dump($carriers);
The $carriers variable is not using array-append notation, so each matching row overrides the previous values. All you need to do is change $carriers to $carriers[].
Here is an example:
$carriers[] = [
$matrix['carrier'],
$matrix['carrierService'],
];
Thanks for the reply, but this wasn't necessary. After a good think, I realized I had made the novice mistake of mixing my types up.
$carriers = [];
foreach ($shippingMatrix as $matrix) {
if ($shippingMethod === $matrix['shippingMethod'] && $country === $matrix['country']) {
if ($postCode === $matrix['isPostCodeExcluded']) {
$carriers = [
$matrix['carrier'],
$matrix['carrierService'],
];
}
}
if ($shippingMethod === $matrix['shippingMethod'] && $country !== $matrix['country'] && !$country) {
$carriers = [
$matrix['carrier'],
$matrix['carrierService'],
];
}
}

Logstash filter: compare time using regex

I use this code in a Logstash filter to compare times, but it doesn't work.
if [timecheck] =~ /.*((\[0\]\[0-6\]):\[0-5\]\[0-9\]:\[0-5\]\[0-9\])|((\[1\]\[2-9\]|2\[0-3\]):\[0-5\]\[0-9\]:\[0-5\]\[0-9\]).*/ {
mutate {
add_tag => "OVERTIME"
}
}
else if [timecheck] =~ /.+/ {
mutate {
add_tag => "WORKING-HOURS"
}
}
else {
mutate { add_tag => "NO-TIMECHECK-MATCH" }
}
Logstash runs, but the regex doesn't match: it always enters WORKING-HOURS because the field is not empty.
(I tried the regex on regexr.com and it works well.)
Don't escape the square brackets.
if [timecheck] =~ /(([0][0-6]):[0-5][0-9]:[0-5][0-9])|(([1][8-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])/ {
mutate {
add_tag => "OVERTIME"
add_field => { "time-work" => "OVERTIME" }
}
}
else if [timecheck] =~ /.+/ {
mutate {
add_tag => "WORKING-HOURS"
add_field => { "time-work" => "WORKING-HOURS" }
}
}
else {
mutate { add_tag => "NO-TIMECHECK-MATCH" }
}

logstash-filter not honoring the regex

I am reading files as input and then passing them to the filter; based on [type], the if/else for output (stdout) follows.
here is the conf part :
filter {
if [path] =~ "error" {
mutate {
replace => { "type" => "ERROR_LOGS"}
}
grok {
match => {"error_examiner" => "%{GREEDYDATA:err}"}
}
if [err] =~ "9999" {
if [err] =~ "invalid offset" {
mutate {
replace => {"type" => "DISCARDED_ERROR_LOGS"}
}
grok {
match => {"message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}/%{DATA:askme_tag}\?%{DATA:paramstr}\] \[%{DATA:reason}\]"}
}
date {
match => [ "date", "YYYY-MM-DD aaa HH:mm:ss" ]
locale => en
}
geoip {
source => "ip"
target => "geo_ip"
}
kv {
source => "paramstr"
trimkey => "&\?\[\],"
value_split => "="
target => "params"
}
}
else {
mutate {
replace => {"type" => "ACCEPTED_ERROR_LOGS"}
}
grok {
match => {
"message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{WORD:uptime}\/%{NUMBER:downtime}\] \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}\/%{DATA:askme_tag}\?%{DATA:paramstr}\]"
}
}
date {
match => [ "date" , "YYYY-MM-DD aaa HH:mm:ss" ]
locale => en
}
geoip {
source => "ip"
target => "geo_ip"
}
kv {
source => "paramstr"
trimkey => "&\?\[\],"
value_split => "="
target => "params"
}
}
}
else if [err] =~ "Exception" {
mutate {
replace => {"type" => "EXCEPTIONS_IN_ERROR_LOGS"}
}
grok {
match => { "message" => "%{GREEDYDATA}"}
}
}
}
else if [path] =~ "info" {
mutate {
replace => {"type" => "INFO_LOGS"}
}
grok {
match => {
"info_examiner" => "%{GREEDYDATA:info}"
}
}
if [info] =~ "9999" {
mutate {
replace => {"type" => "ACCEPTED_INFO_LOGS"}
}
grok {
match => {
"message" => "\[%{DATA:date}\] \[%{WORD:logtype} \] \[%{WORD:uptime}\/%{NUMBER:downtime}\]( \[%{WORD:qtype}\])?( \[%{NUMBER:outsearch}/%{NUMBER:insearch}\])? \[%{IPORHOST:ip}\]->\[http://search:9999/%{WORD:searchORsuggest}/%{DATA:askme_tag}\?%{DATA:paramstr}\]"
}
}
date {
match => [ "date" , "YYYY-MM-DD aaa HH:mm:ss" ]
locale => en
}
geoip {
source => "ip"
target => "geo_ip"
}
kv {
source => "paramstr"
trimkey => "&\?\[\],"
value_split => "="
target => "params"
}
}
else {
mutate {replace => {"type" => "DISCARDED_INFO_LOGS"}}
grok {
match => {"message" => "%{GREEDYDATA}"}
}
}
}
}
I have tested the grok regexps at http://grokdebug.herokuapp.com/ and they work.
however, what's not working is this part :
grok {
match => {"error_examiner" => "%{GREEDYDATA:err}"}
}
if [err] =~ "9999" {
I was wondering: what's wrong in there?
Actually, I have fixed it. Here is what I learnt in my initial experiments with Logstash and would like to share, since the documentation and other resources aren't very telling:
"error_examiner" or "info_examiner" won't work; parse the instance/event row in "message" instead.
geoip doesn't work for internal IPs.
For kv, you must specify field_split and value_split if the pairs aren't like a=1 b=2; say they are a:1&b:2, then field_split is & and value_split is :.
stdout by default prepends badly if the chosen codec is json; please choose rubydebug instead.
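To make the kv point concrete, here is a minimal sketch of a kv filter for pairs shaped like a:1&b:2 (the field names source and target here are illustrative, not from the original config):

```
kv {
  source      => "paramstr"   # e.g. "a:1&b:2"
  field_split => "&"          # separates one pair from the next
  value_split => ":"          # separates a key from its value
  target      => "params"     # nests the parsed pairs under [params]
}
```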
Thanks,

How would you use map reduce on this document structure?

If I wanted to count foobar.relationships.friend.count, how would I use map/reduce against this document structure so that the count equals 22?
[
[0] {
"rank" => nil,
"profile_id" => 3,
"20130913" => {
"foobar" => {
"relationships" => {
"acquaintance" => {
"count" => 0
},
"friend" => {
"males_count" => 0,
"ids" => [],
"females_count" => 0,
"count" => 10
}
}
}
},
"20130912" => {
"foobar" => {
"relationships" => {
"acquaintance" => {
"count" => 0
},
"friend" => {
"males_count" => 0,
"ids" => [
[0] 77,
[1] 78,
[2] 79
],
"females_count" => 0,
"count" => 12
}
}
}
}
}
]
In JavaScript, this query gets you the result you expect:
r.db('test').table('test').get(3).do( function(doc) {
return doc.keys().map(function(key) {
return r.branch(
doc(key).typeOf().eq('OBJECT'),
doc(key)("foobar")("relationships")("friend")("count").default(0),
0
)
}).reduce( function(left, right) {
return left.add(right)
})
})
In Ruby, it should be
r.db('test').table('test').get(3).do{ |doc|
doc.keys().map{ |key|
r.branch(
doc.get_field(key).typeOf().eq('OBJECT'),
doc.get_field(key)["foobar"]["relationships"]["friend"]["count"].default(0),
0
)
}.reduce{ |left, right|
left+right
}
}
I would also tend to think that the schema you use is not well suited; it would be better to use something like
{
rank: null
profile_id: 3
people: [
{
id: 20130913,
foobar: { ... }
},
{
id: 20130912,
foobar: { ... }
}
]
}
Edit: A simpler way to do it without using r.branch is just to remove the fields that are not objects with the without command.
Ex:
r.db('test').table('test').get(3).without('rank', 'profile_id').do{ |doc|
doc.keys().map{ |key|
doc.get_field(key)["foobar"]["relationships"]["friend"]["count"].default(0)
}.reduce{ |left, right|
left+right
}
}.run
I think you will need your own input reader. This site gives a tutorial on how it can be done: http://bigdatacircus.com/2012/08/01/wordcount-with-custom-record-reader-of-textinputformat/
Then you run MapReduce with a mapper:
Mapper<LongWritable, ClassRepresentingMyRecords, Text, IntWritable>
In your map function you extract the value for count and emit it as the value. (Not sure if you need a key?)
In the reducer you add together all the elements with the same key ('count' in your case).
This should get you on your way, I think.
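The map/emit/reduce idea above can be sketched in plain Java. This is just the summing logic, not real Hadoop Mapper/Reducer classes; the class and method names are made up for illustration:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the map/reduce idea: the "mapper" emits ("count", n) for every
// friend.count it finds in a record, and the "reducer" sums all values that
// share the same key.
public class CountMapReduce {

    // "map" phase: each record yields zero or more (key, value) pairs
    static List<Map.Entry<String, Integer>> map(List<Integer> countsInRecord) {
        return countsInRecord.stream()
                .map(n -> Map.entry("count", n))
                .collect(Collectors.toList());
    }

    // "reduce" phase: group the emitted pairs by key and sum their values
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        return emitted.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        // the two friend.count values from the document: 10 and 12
        List<Map.Entry<String, Integer>> emitted = map(List.of(10, 12));
        System.out.println(reduce(emitted).get("count")); // prints 22
    }
}
```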