What input format does GridMix expect? - mapreduce

I use Rumen to mine job-history files, which produces job-trace.json and job-topology.json.
GridMix usage looks like this:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-gridmix-2.7.3.jar -libjars $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.7.3.jar -Dgridmix.compression-emulation.enable=false <iopath> <trace>
<iopath> means the working directory for GridMix, so I feed it file:///home/hadoop/input; <trace> means the trace file extracted from the log files, so I feed it file:///home/hadoop/rumen/job-trace-1hr.json.
Finally, I get the following exceptions:
2019-03-07 16:37:12,495 ERROR [main] gridmix.Gridmix (Gridmix.java:start(534)) - Startup failed. java.io.IOException: Found no satisfactory file in file:/home//hadoop/input
2019-03-07 16:37:13,040 INFO [main] util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 2
2019-03-07 16:37:13,041 INFO [Thread-1] gridmix.Gridmix (Gridmix.java:run(657)) - Exiting...
So what should this parameter look like, and how should I use it?
Does anyone have any ideas?
Thanks.

I found it was my own incorrect usage.
I checked the GridMix parameter documentation, and the problem was that my input data was too small:
gridmix.min.file.size | The minimum size of the input files. The default limit is 128 MiB. Tweak this parameter if you see an error-message like "Found no satisfactory file" while testing GridMix with a relatively-small input data-set.
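One option I did not try would be to lower that threshold instead, by adding the property as another -D generic option to the gridmix command line shown above (the value is in bytes; 10485760 = 10 MiB is just an example):
-Dgridmix.min.file.size=10485760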
So I generated a larger input data set instead, using -generate 10G.
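For reference, a sketch of the full command I ended up with; -generate 10G asks GridMix to synthesize roughly 10 GiB of input data under <iopath> before replaying the trace (the paths are from my setup):
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-gridmix-2.7.3.jar -libjars $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.7.3.jar -Dgridmix.compression-emulation.enable=false -generate 10G file:///home/hadoop/input file:///home/hadoop/rumen/job-trace-1hr.json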
Thanks.

Related

Batch JCL error for CICS web services using the CICS web services assistant tool

I am facing an issue while submitting the job below; can someone please suggest a fix?
Error:
IEF344I KA7LS2W2 INPUT STEP1 SYSUT2 - ALLOCATION FAILED DUE TO DATA FACILITY SYSTEM ERROR
IGD17501I ATTEMPT TO OPEN A UNIX FILE FAILED,
RETURN CODE IS (00000081) REASON CODE IS (0594003D)
FILENAME IS (/ka7a/KA7A.in)
JCL:
//KA7LS2W2 JOB (51,168),'$ACCEPT',CLASS=1,
// MSGCLASS=X,MSGLEVEL=(1,0),NOTIFY=&SYSUID,REGION=0M
// EXPORT SYMLIST=*
// JCLLIB ORDER=SYS2.CI55.SDFHINST
//STEP1 EXEC DFHLS2WS,
// JAVADIR='java/J7.0_64',PATHPREF='',TMPDIR='/ka7a',
// USSDIR='',TMPFILE=&QT.&SYSUID.&QT
//INPUT.SYSUT1 DD *
PDSLIB=//DJPN.KA7A.POC
LANG=COBOL
PGMINT=CHANNEL
PGMNAME=KZHFEN1C
REQMEM=PAYIN
RESPMEM=PAYOUT
MAPPING-LEVEL=2.2
LOGFILE=/home/websrvices/wsbind/payws.log
WSBIND=/home/webservices/wsbind/payws.wsbind
WSDL=/home/webservices/wsdl/payws.wsdl
/*
Based on Return Code 81 / Reason Code 0594003D, the pathname can't be resolved.
The message IGD17501I explains the error. You'll find more information by looking up Reason Code 0594003D.
You can use BPXMTEXT to look up more detail on the reason code.
Executing this command in USS, you'll see:
$ bpxmtext 0594003D
BPXFVLKP 05/14/20
JRDirNotFound: A directory in the pathname was not found
Action: One of the directories specified was not found. Verify that the name
specified is spelled correctly.
Per @phunsoft: the same command can also be executed in TSO, and there it is not case sensitive as it is in USS.
I'd suspect that /ka7a doesn't exist. Is it a case issue? Or perhaps you meant /u/ka7a or /home/ka7a?
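A quick way to check from a USS shell (a sketch; adjust the path to whatever you intend TMPDIR to be):
ls -ld /ka7a          # does the directory exist, and with what case and permissions?
mkdir -m 755 /ka7a    # create it if it is missing (requires appropriate authority)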

cloudbuild.yaml does not unmarshal when using a base64-encoded value in a build trigger

In my cloudbuild.yaml definition, I used to have a secrets section to get environment values from Google KMS. The secretEnv field had keys mapping to encrypted, base64-encoded values:
...
secrets:
- kmsKeyName: <API_PATH>
  secretEnv:
    <KEY>: <ENCRYPTED+BASE64>
I've tried to put this value in a substitution instead, which is replaced when a build trigger is used:
...
secrets:
- kmsKeyName: <API_PATH>
  secretEnv:
    <KEY>: ${_VALUE}
With that, I intend to keep the file generic.
However, the build process keeps failing with the message failed unmarshalling build config cloudbuild.yaml: illegal base64 data at input byte 0. I've checked several times, and the base64 value was not copied incorrectly into the substitution on the trigger.
Thank you in advance.
https://cloud.google.com/cloud-build/docs/configuring-builds/substitute-variable-values
After reading the Using user-defined substitutions section carefully, I've seen that:
The length of a parameter key is limited to 100 bytes and the length of a parameter value is limited to 4000 bytes.
Mine was a 253-character long string.
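For what it's worth, a quick way to check the byte length of the encrypted+base64 value before pasting it into the trigger (assuming a Linux/macOS shell and the value stored in $VALUE):
printf '%s' "$VALUE" | wc -c    # byte count of the substitution value, without a trailing newline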
I managed to reproduce an error similar to yours (exactly this one: "Failed to trigger build: failed unmarshalling build config cloudbuild.yaml: json: cannot unmarshal string into Go value of type map[string]json.RawMessage"); it happened because I had written something like "name:content" instead of "name: content". Notice the white space; it matters.
Then, going back to your point... user-defined substitutions are limited to 255 characters (yes, the docs are currently wrong about this and it has been reported). But, for example, if you use something like:
substitutions:
  _VARIABLE_NAME: cool_really_long_content_but_still_no_255_chars
And then you do this:
steps:
- name: "gcr.io/cloud-builders/docker"
args: ["build", "-t", "gcr.io/$PROJECT_ID/$cool_really_long_content_but_still_no_255_chars", "."]
It will still fail if the expanded string "gcr.io/$PROJECT_ID/$_VARIABLE_NAME" ends up being more than 255 chars, even if your really long content by itself stays under 255 chars. This error appears under Build details > Logs, instead of as the popup you see when you click "Run trigger" in the Build triggers section of Google Cloud Build, which is where the kind of error you reported appears (since in that case the logs show as disabled in the Build details section).
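If it helps while debugging, you can also exercise the same substitution outside the trigger by submitting the build manually; this is only a sketch (the _VALUE name and config path are taken from the snippets above, and the placeholder is yours to fill in):
gcloud builds submit --config=cloudbuild.yaml --substitutions=_VALUE="<ENCRYPTED+BASE64>" .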

Using GATE for a thesis: run-time error

When I try to run a corpus pipeline on my language resources, it throws the error below (even though I follow the order Document Reset, English Tokeniser, Sentence Splitter).
Can someone help me with the process of debugging this run-time error?
Error:
gate.creole.ExecutionException: No sentences or tokens to process in document Password_Safe-window1.txt_0003E
Please run a sentence splitter and tokeniser first!
at gate.creole.POSTagger.execute(POSTagger.java:257)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.SerialController.runComponent(SerialController.java:225)
at gate.creole.SerialController.executeImpl(SerialController.java:157)
at gate.creole.SerialAnalyserController.executeImpl(SerialAnalyserController.java:223)
at gate.creole.SerialAnalyserController.execute(SerialAnalyserController.java:126)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1759)
at java.lang.Thread.run(Thread.java:745)
Edit:
The files are not empty. When I tried to implement @dedek's suggestion, it threw no errors, but it raised one more problem, as follows:
Exception in thread "ApplicationViewer1" java.lang.OutOfMemoryError: Java heap space
I think it is because your document is empty.
Can you confirm that?
There is a run-time parameter failOnMissingInputAnnotations on the POSTagger; set it to false and it should be OK.
See also the docs:
failOnMissingInputAnnotations - if set to false, the PR will not fail with an ExecutionException if no input Annotations are found and instead only log a single warning message per session and a debug message per document that has no input annotations (run-time, default = true).
Concerning the OutOfMemoryError: Java heap space
See the following questions, as well as the heap-size sketch after this list:
Getting OOM while using GATE on large data set
GATE PersistenceManager.loadObjectFromFile outofmemory error while loading .gapp files
JAVA PermGem memory
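As a starting point for the heap problem, the standard JVM flags below raise the maximum heap. Where exactly you pass them depends on how you launch GATE (the GUI launcher script vs. your own GATE Embedded application), so treat the paths and class name as placeholders:
# -Xmx sets the maximum Java heap; raise it further if you still hit OutOfMemoryError
java -Xms1g -Xmx4g -cp "$GATE_HOME/bin/gate.jar:$GATE_HOME/lib/*" my.thesis.PipelineRunner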

Pig's "dump" is not working on AWS

I am trying Pig commands on AWS EMR, but even small commands do not work as I expect. Here is what I did.
Save the following 6 lines as ~/a.csv.
1,2,3
4,2,1
8,3,4
4,3,3
7,2,5
8,4,3
Start Pig
Load the csv file.
grunt> A = load './a.csv' using PigStorage(',');
16/01/06 13:09:09 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
Dump the variable A.
grunt> dump A;
But this command fails. I expected it to produce the 6 tuples described in a.csv. The dump command prints a lot of INFO lines and ERROR lines; the ERROR lines are the following.
91711 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
16/01/06 13:10:08 ERROR pigstats.PigStats: ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
91711 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
16/01/06 13:10:08 ERROR mapreduce.MRPigStatsUtil: 1 map reduce job(s) failed!
[...skipped...]
Input(s):
Failed to read data from "hdfs://ip-xxxx.eu-central-1.compute.internal:8020/user/hadoop/a.csv"
Output(s):
Failed to produce result in "hdfs://ip-xxxx.eu-central-1.compute.internal:8020/tmp/temp-718505580/tmp344967938"
[...skipped...]
91718 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
16/01/06 13:10:08 ERROR grunt.Grunt: ERROR 1066: Unable to open iterator for alias A. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
(I have changed the IP-like parts.) The error message seems to say that the load operator also fails.
I have no idea why even the dump operator fails. Can you give me any advice?
Note
I also tried using TABs in a.csv instead of commas and executing A = load './a-tab.csv';, but it did not help.
I also tried $ pig -x local, then A = load 'a.csv' using PigStorage(','); and dump A;. Then I get:
Input(s):
Failed to read data from "file:///home/hadoop/a.csv"
If I use the full path, namely A = load '/home/hadoop/a.csv' using PigStorage(',');, then I get
Input(s):
Failed to read data from "/home/hadoop/a.csv"
I have encountered the same problem. You may try su root to use the root user, then run ./bin/pig from PIG_HOME to start Pig in MapReduce mode. Alternatively, you can stay with the current user and run sudo ./bin/pig from PIG_HOME, but then you must export JAVA_HOME and HADOOP_HOME in the ./bin/pig file.
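For example, near the top of $PIG_HOME/bin/pig you could add something like the lines below (the paths are placeholders; point them at your actual JDK and Hadoop installations):
# added near the top of $PIG_HOME/bin/pig
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # your JDK location
export HADOOP_HOME=/usr/lib/hadoop                   # your Hadoop location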
If you want to use your local file system, you have to start Pig in step 2 as below:
bin/pig -x local
If you start it just as bin/pig, it will look for the file in HDFS. That's why you get the error Failed to read data from "hdfs://ip-xxxx.eu-central-1.compute.internal:8020/user/hadoop/a.csv".
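Alternatively, if you want to stay in MapReduce mode (plain bin/pig), copy the local file into your HDFS home directory first so that the relative path './a.csv' resolves; a sketch, assuming you run as the hadoop user:
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put ~/a.csv /user/hadoop/a.csv
hdfs dfs -ls /user/hadoop        # confirm the file is there before running dump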

Weka: how to see the LibSVM training parameters

I am running LibSVM through Weka. Its output accuracy looks good to me, so I am planning to write an SVM model myself. However, Weka didn't output any training parameters, such as the number of support vectors, so I cannot do anything with it. Searching the web, I found somebody saying it should print parameters like the following:
optimization finished, #iter = 27
nu = 0.058475864943863545
obj = -1.871013102744184, rho = -0.19357337828800944
nSV = 9, nBSV = 0
Total nSV = 9
But how come I didn't see any of them? Is there a step that I missed? Please help me. Thanks a lot.
Weka writes the output you mentioned to stderr.
So if you have started weka.sh or weka.bat from a terminal (or "command window" if you are on Windows), you should see that output appear in your terminal window after clicking "Classify".
If you want to access this information from scripts, you can redirect the output to a file and read that file in.
Here is how to edit the startup file weka.sh / weka.bat.
Edit this line (it is probably the last line) in order to write log info to a file instead of the terminal window:
java -cp $CP -Xmx8092m weka.gui.GUIChooser 2>>/opt/weka-stable/weka.log &
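After that change, the LibSVM training output (nSV, rho, and so on) ends up in that log file, so you can follow it while the classifier runs; the path is the one used in the line above and will differ on your installation:
tail -f /opt/weka-stable/weka.log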
You can also add a properties file to your home directory to add more fine-grained behaviour.
https://weka.wikispaces.com/Properties+file
(You can probably also access this information via the Weka Java API somehow, but you did not ask for that.)