Map-Reduce Logs on Hive-Tez - mapreduce

I want to understand the Map-Reduce logs produced after running a query on Hive-Tez. What do the lines after INFO: convey?
Here is a sample:
INFO : Session is already open
INFO : Dag name: SELECT a.Model...)
INFO : Tez session was closed. Reopening...
INFO : Session re-established.
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_14708112341234_1234)
INFO : Map 1: -/- Map 3: -/- Map 4: -/- Map 7: -/- Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13
INFO : Map 1: -/- Map 3: 0/118 Map 4: 0/118 Map 7: 0/1 Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13
INFO : Map 1: 0/118 Map 3: 0/118 Map 4: 0/118 Map 7: 0/1 Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13
INFO : Map 1: 0/118 Map 3: 0/118 Map 4: 0(+5)/118 Map 7: 0/1 Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13
INFO : Map 1: 0/118 Map 3: 0(+5)/118 Map 4: 0(+7)/118 Map 7: 0(+1)/1 Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13
INFO : Map 1: 0/118 Map 3: 0(+15)/118 Map 4: 0(+18)/118 Map 7: 0(+1)/1 Reducer 2: 0/15 Reducer 5: 0/26 Reducer 6: 0/13

The log you posted is the DAG execution log. The DAG consists of the mapper vertices Map 1, Map 3, Map 4, Map 7 and the reducer vertices Reducer 2, Reducer 5, Reducer 6.
Map 1: -/- - this means that the vertex is not initialized and the number of mappers has not been calculated yet.
Map 4: 0(+7)/118 - this means that there are 118 mappers in total; 7 of them are running in parallel, 0 have completed yet, and 118-7=111 are pending.
Reducer 2: 0/15 - this means that there are 15 reducers in total; 0 of them are running and 0 of them have completed (all 15 reducers are pending).
Negative figures (there are none in your example) indicate the number of failed or killed mappers or reducers.
Qubole has an explanation of the Tez log pane: https://docs.qubole.com/en/latest/user-guide/hive/using-hive-on-tez/hive-tez-tuning.html#understanding-log-pane
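If you need to consume these progress lines programmatically, here is a minimal parser sketch in Python. The "completed(+running)/total" layout is assumed from the sample above; the function name and result structure are my own:

import re

# Matches fragments like "Map 1: -/-", "Map 4: 0(+7)/118", "Reducer 2: 0/15"
FRAGMENT = re.compile(r"(\w+ \d+): (?:(-)/-|(-?\d+)(?:\(\+(\d+)\))?/(\d+))")

def parse_progress(line):
    vertices = {}
    for name, uninit, done, running, total in FRAGMENT.findall(line):
        if uninit:
            vertices[name] = None  # vertex not initialized yet
            continue
        done, running, total = int(done), int(running or 0), int(total)
        vertices[name] = {"completed": done, "running": running,
                          "pending": total - done - running, "total": total}
    return vertices

print(parse_progress("Map 1: -/- Map 3: 0(+5)/118 Reducer 2: 0/15"))
# {'Map 1': None,
#  'Map 3': {'completed': 0, 'running': 5, 'pending': 113, 'total': 118},
#  'Reducer 2': {'completed': 0, 'running': 0, 'pending': 15, 'total': 15}}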

Related

How to inspect the structure of a Google AutoML Tables model?

I have trained a model using Google AutoML Tables. I would like to inspect the general structure of the trained model (which algorithms were used, what preprocessing was applied if any, etc.).
In the "Viewing model architecture with Cloud Logging" section of the docs here, I see:
If more than one model was used to create the final model, the
hyperparameters for each model are returned as an entry in the
modelParameters array, indexed by position (0, 1, 2, and so on)
My modelParameters array is as shown below (with the first and last elements expanded). I'm only doing the AutoML Tables quickstart, which uses the Bank marketing open-source dataset, so I'm surprised that it would return such a complex model (25 stacked/ensembled models?). I would think that just model 0 (a single Gradient Boosted Decision Tree with 300 trees and a max depth of 15) would be sufficient.
Also, '25' is a suspiciously round number. Are we sure that the docs are correct and this list isn't actually the best 25, stack-ranked by accuracy score? Is there a better way to understand the end-to-end model (including preprocessing) that Google AutoML Tables is producing?
modelParameters: [
0: {
hyperparameters: {
Center Bias: "False"
Max tree depth: 15
Model type: "GBDT"
Number of trees: 300
Tree L1 regularization: 0
Tree L2 regularization: 0.10000000149011612
Tree complexity: 3
}
}
1: {…}
2: {…}
3: {…}
4: {…}
5: {…}
6: {…}
7: {…}
8: {…}
9: {…}
10: {…}
11: {…}
12: {…}
13: {…}
14: {…}
15: {…}
16: {…}
17: {…}
18: {…}
19: {…}
20: {…}
21: {…}
22: {…}
23: {…}
24: {
hyperparameters: {
Center Bias: "False"
Max tree depth: 9
Model type: "GBDT"
Number of trees: 500
Tree L1 regularization: 0
Tree L2 regularization: 0
Tree complexity: 0.10000000149011612
}
}
]
The docs and your original interpretation of the results are correct. In this case, AutoML Tables created an ensemble of 25 models.
The console also provides the full list of individual models tried during the search process if you click on the "Trials" link; that should be a much larger list than 25.
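If you just want a quick overview of the ensemble itself, one option is to export the log entry as JSON from Cloud Logging and tally it. A minimal sketch in Python ("model_parameters.json" is a hypothetical export; the hyperparameter keys are taken from the expanded entries in the question):

import json
from collections import Counter

# Load a hypothetical JSON export of the modelParameters array shown above
with open("model_parameters.json") as f:
    model_parameters = json.load(f)

# Tally model types, then print per-model tree counts and depths
types = Counter(m["hyperparameters"]["Model type"] for m in model_parameters)
print(len(model_parameters), "models in the ensemble:", dict(types))
for i, m in enumerate(model_parameters):
    hp = m["hyperparameters"]
    print(i, hp["Model type"],
          "trees:", hp.get("Number of trees"),
          "max depth:", hp.get("Max tree depth"))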

C++ TensorRT batch inference gives weird results

Good day everyone!
I have a problem performing batch inference in TensorRT. When the batch size is 1 it works like a charm, but when I change it to any other number it gives out plain garbage.
Step by step: I downloaded TensorRT (5.0) and installed it on my Ubuntu 18.04 laptop with a GTX755M. I then built the samples that came with it and tested the sampleMNIST sample, which worked like a charm. I then proceeded to change every occurrence of mParams.batchSize to 10. Of course I also changed the size of the allocated memory and modified the result printing accordingly. But after I recompiled the sample I got completely weird results - the output says 80% 7, 20% 1 for every given input:
grim#shigoto:~/tensorrt/bin$ ./sample_mnist
Building and running a GPU inference engine for MNIST
Input:
############################
############################
############################
############################
############################
################.*##########
################.=##########
############+###.=##########
###########% ###.=##########
###########% ###.=##########
###########+ *##:-##########
###########= *##= ##########
###########. ###= ##########
##########= =++.-##########
########## =##########
########## :*## =##########
##########:*###% =##########
###############% =##########
################ =##########
################ =##########
###############* *##########
###############= ###########
###############= ###########
###############=.###########
###############++###########
############################
############################
############################
Output:
0:
1: ********
2:
3:
4:
5:
6:
7: **
8:
9:
This output repeats 10 times. I've tried this with different networks but the results were similar: most networks give 1 correct output and plain garbage the other 9 times. The complete sample can be found here. I've tried googling the documentation but I can't understand what I'm doing wrong. Could you please tell me what I'm doing wrong, or how to perform batch inference in TensorRT?
Did you also modify the mnist.prototxt?
Especially this part:
input: "data"
input_shape {
dim: 1
dim: 1
dim: 28
dim: 28
}
I think that should be:
input: "data"
input_shape {
dim: 10
dim: 1
dim: 28
dim: 28
}
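Independent of the prototxt, also double-check that the result printing indexes into the output buffer per image: with batch inference the engine returns batch_size blocks of 10 scores back to back, batch-major. A minimal sketch of that indexing (in Python for brevity; the flat batch-major layout is an assumption matching how TensorRT packs batched outputs):

batch_size, num_classes = 10, 10

def per_image_predictions(output):
    # output is a flat buffer of batch_size * num_classes floats;
    # image b's scores occupy output[b*num_classes:(b+1)*num_classes]
    for b in range(batch_size):
        scores = output[b * num_classes:(b + 1) * num_classes]
        yield b, max(range(num_classes), key=lambda i: scores[i])

dummy = [0.0] * (batch_size * num_classes)
dummy[3 * num_classes + 7] = 1.0  # pretend image 3 is a '7'
for b, digit in per_image_predictions(dummy):
    print("image", b, "-> predicted", digit)

If the host code only ever reads the first block, every image in the batch will appear to produce the same answer, which matches the symptom you describe.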

Unable to find custom Hive InputFormat when using `where 1=1`

I'm using Hive and I'm encountering an exception when I'm performing a query with a custom InputFormat.
When I use the query select * from micmiu_blog; Hive works without problems, but if I use select * from micmiu_blog where 1=1; it seems that the framework cannot find my custom InputFormat class.
I have put the JAR file into "hive/lib" and "hadoop/lib", and I have also put "hadoop/lib" on the CLASSPATH. This is the log:
hive> select * from micmiu_blog where 1=1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1415530028127_0004, Tracking URL = http://hadoop01-master:8088/proxy/application_1415530028127_0004/
Kill Command = /home/hduser/hadoop-2.2.0/bin/hadoop job -kill job_1415530028127_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-11-09 19:53:32,996 Stage-1 map = 0%, reduce = 0%
2014-11-09 19:53:52,010 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1415530028127_0004 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1415530028127_0004_m_000000 (and more) from job job_1415530028127_0004
Task with the most failures(4):
-----
Task ID:
task_1415530028127_0004_m_000000
URL:
http://hadoop01-master:8088/taskdetails.jsp?jobid=job_1415530028127_0004&tipid=task_1415530028127_0004_m_000000
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: cannot find class hiveinput.MyDemoInputFormat
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:564)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:167)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I ran into the same problem just now. You should add your JAR file to the CLASSPATH from the Hive CLI. You can do it like this:
hive> add jar /usr/lib/xxx.jar;
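The jar then stays on the classpath for the rest of that session; you can confirm it with:
hive> list jars;
If you want it available in every session instead of adding it each time, you can also launch the CLI with the aux path (same placeholder path as above):
$ hive --auxpath /usr/lib/xxx.jar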

Group repeated regex?

I would like to match the following text:
PokerStars Hand #95528134282: Tournament #2013004001, $0.10+$0.01 USD Hold'em No Limit - Level VI (100/200) - 2013/03/14 15:35:36 WET [2013/03/14 11:35:36 ET]
Table '2013004001 5898' 9-max Seat #1 is the button
Seat 1: Pucharrin (7250 in chips)
Seat 2: pahol (24180 in chips)
Seat 3: dno16 (2000 in chips)
Seat 4: sogd20i07 (150 in chips) is sitting out
Seat 5: koaollie (13680 in chips)
Seat 6: vovik770 (6307 in chips)
Seat 7: gab341978 (6920 in chips)
Seat 8: 19gow63 (1000 in chips)
Seat 9: pokerplayer (9048 in chips)
pahol: posts small blind 100
dno16: posts big blind 200
*** HOLE CARDS ***
Dealt to pokerplayer [3s 9d]
sogd20i07: folds
koaollie: folds
vovik770: folds
gab341978: folds
19gow63: raises 800 to 1000 and is all-in
pokerplayer: folds
Pucharrin: folds
pahol: raises 1000 to 2000
dno16: calls 1800 and is all-in
*** FLOP *** [4s 7c Ah]
*** TURN *** [4s 7c Ah] [Qs]
*** RIVER *** [4s 7c Ah Qs] [Ks]
*** SHOW DOWN ***
pahol: shows [Qc Qd] (three of a kind, Queens)
dno16: shows [6h 2h] (high card Ace)
pahol collected 2000 from side pot
19gow63: shows [Kd 2s] (a pair of Kings)
pahol collected 3000 from main pot
19gow63 re-buys and receives 1000 chips for $0.10
dno16 re-buys and receives 2000 chips for $0.20
*** SUMMARY ***
Total pot 5000 Main pot 3000. Side pot 2000. | Rake 0
Board [4s 7c Ah Qs Ks]
Seat 1: Pucharrin (button) folded before Flop (didn't bet)
Seat 2: pahol (small blind) showed [Qc Qd] and won (5000) with three of a kind, Queens
Seat 3: dno16 (big blind) showed [6h 2h] and lost with high card Ace
Seat 4: sogd20i07 folded before Flop (didn't bet)
Seat 5: koaollie folded before Flop (didn't bet)
Seat 6: vovik770 folded before Flop (didn't bet)
Seat 7: gab341978 folded before Flop (didn't bet)
Seat 8: 19gow63 showed [Kd 2s] and lost with a pair of Kings
Seat 9: pokerplayer folded before Flop (didn't bet)
And I would like to capture the lines starting with "Seat somenumber: someplayername (somenumber in chips) someoptionaltext" as a group.
I tried the following regex:
PokerStars.*?Level .+? \(\d+\/(\d+)\) (?:.|\s)*?((?:Seat \d+: .*? \(\d+ in chips\)(?:.|\s)*?)+)(?:.|\s)*?(.*?) collected (\d+)
but it only captures the first occurrence, "Seat 1: Pucharrin (7250 in chips)".
How can I change it to capture the group?
Thanks.
Here is a regexp that captures the data you need. If you specify the syntax/ordering of those elements, I can revamp it for you:
(Seat\s(\d+):\s+([\w\d\-\_\.]+)\s+\((\d+)\s+in\s+chips\)(.*))
I tested it against your sample input and it seems to work fine.
-- EDIT --
Unless you want the entire block starting with PokerStars and stopping at the last closing parenthesis.
Then maybe:
if (preg_match('/((?:PokerStars.*\s*Table.*[\r\n]*)(?:Seat.*\s+)+)/i', $subject, $regs)) {
$result = $regs[0];
} else {
$result = "";
}
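Note that a repeated capturing group like (?:Seat ...)+ only retains its last repetition, so no single match can hand you every seat; the usual fix is to match the per-seat pattern globally (preg_match_all in PHP). For illustration, the same idea as a sketch in Python (the file name is hypothetical, and player names are assumed to contain no spaces, as in the sample):

import re

seat_re = re.compile(r"Seat (\d+): (\S+) \((\d+) in chips\)\s*(.*)")
hand = open("hand_history.txt").read()  # the hand text shown above

# finditer plays the role of PHP's preg_match_all: one match per seat line
for m in seat_re.finditer(hand):
    seat, player, chips, extra = m.groups()
    print(seat, player, chips, extra)

The "(N in chips)" part keeps this from also matching the SUMMARY lines such as "Seat 1: Pucharrin (button) folded before Flop".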

Trouble with append-spit

I'm attempting to use clojure.contrib.io's (1.2) append-spit to append to a file (go figure).
If I create a text file on my desktop, as a test, and attempt to append to it in a fresh repl, this is what I get:
user> (append-spit "/Users/ihodes/Desktop/test.txt" "frank")
Backtrace:
0: clojure.contrib.io$assert_not_appending.invoke(io.clj:115)
1: clojure.contrib.io$outputstream__GT_writer.invoke(io.clj:266)
2: clojure.contrib.io$eval1604$fn__1616$G__1593__1621.invoke(io.clj:121)
3: clojure.contrib.io$fn__1660.invoke(io.clj:185)
4: clojure.contrib.io$eval1604$fn__1616$G__1593__1621.invoke(io.clj:121)
5: clojure.contrib.io$append_writer.invoke(io.clj:294)
6: clojure.contrib.io$append_spit.invoke(io.clj:342)
7: user$eval1974.invoke(NO_SOURCE_FILE:1)
8: clojure.lang.Compiler.eval(Compiler.java:5424)
9: clojure.lang.Compiler.eval(Compiler.java:5391)
10: clojure.core$eval.invoke(core.clj:2382)
11: swank.commands.basic$eval_region.invoke(basic.clj:47)
12: swank.commands.basic$eval_region.invoke(basic.clj:37)
13: swank.commands.basic$eval807$listener_eval__808.invoke(basic.clj:71)
14: clojure.lang.Var.invoke(Var.java:365)
15: user$eval1972.invoke(NO_SOURCE_FILE)
16: clojure.lang.Compiler.eval(Compiler.java:5424)
17: clojure.lang.Compiler.eval(Compiler.java:5391)
18: clojure.core$eval.invoke(core.clj:2382)
19: swank.core$eval_in_emacs_package.invoke(core.clj:94)
20: swank.core$eval_for_emacs.invoke(core.clj:241)
21: clojure.lang.Var.invoke(Var.java:373)
22: clojure.lang.AFn.applyToHelper(AFn.java:169)
23: clojure.lang.Var.applyTo(Var.java:482)
24: clojure.core$apply.invoke(core.clj:540)
25: swank.core$eval_from_control.invoke(core.clj:101)
26: swank.core$eval_loop.invoke(core.clj:106)
27: swank.core$spawn_repl_thread$fn__489$fn__490.invoke(core.clj:311)
28: clojure.lang.AFn.applyToHelper(AFn.java:159)
29: clojure.lang.AFn.applyTo(AFn.java:151)
30: clojure.core$apply.invoke(core.clj:540)
31: swank.core$spawn_repl_thread$fn__489.doInvoke(core.clj:308)
32: clojure.lang.RestFn.invoke(RestFn.java:398)
33: clojure.lang.AFn.run(AFn.java:24)
34: java.lang.Thread.run(Thread.java:637)
Which clearly isn't what I wanted.
I was wondering if anyone else has had this problem, or if I'm doing something incorrectly? The file I'm appending to is not open (at least not by me). I'm at a loss.
Thanks so much!
I notice that the relevant functions are marked as deprecated in 1.2, but I'm also under the impression that, as written, they've got some bugs in need of ironing out.
First, a non-deprecated way to do what you were trying to do (which works fine for me):
(require '[clojure.java.io :as io])
(with-open [w (io/writer (io/file "/path/to/file")
:append true)]
(spit w "Foo foo foo.\n"))
(Skipping io/file and simply passing the string to io/writer would work too -- I prefer to use the wrapper partly as a matter of personal taste and partly so that c.j.io doesn't try to treat the string as a URL (only to back out via an exception and go for a file in this case), which is its first choice of interpretation.)
As for why I think clojure.contrib.io might be suffering from a bug:
(require '[clojure.contrib.io :as cio])
(with-bindings {#'cio/assert-not-appending (constantly true)}
(cio/append-spit "/home/windfall/scratch/SO/clj/append-test.txt" "Quux quux quux?\n"))
This does not complain, but neither does it append to the file -- the current contents get replaced instead. I'm not yet sure exactly what the problem is, but switching to clojure.java.io should avoid it. (Clearly this needs further investigation -- deprecated code still shouldn't be buggy -- I'll try to figure it out.)