SymmetricDS symadmin send-script error - database-replication

The command for sending a SQL script to a node or node group works fine, but the target node fails when parsing the file itself.
Here is the log from the target node:
2014-08-27 16:51:12,130 ERROR [station-001] [DataLoaderService] [station-001-pull-1] Failed to load batch 000-31 because: In file: inline evaluation of: ``DROP TABLE ofep.PRODUCT_RESTRICTIONS;'' Encountered "ofep" at line 1, column 12.
java.lang.RuntimeException: In file: inline evaluation of: ``DROP TABLE ofep.PRODUCT_RESTRICTIONS;'' Encountered "ofep" at line 1, column 12.
at org.jumpmind.symmetric.io.data.writer.DatabaseWriter.script(DatabaseWriter.java:919)
at org.jumpmind.symmetric.io.data.writer.DatabaseWriter.write(DatabaseWriter.java:196)
at org.jumpmind.symmetric.io.data.writer.DatabaseWriter.write(DatabaseWriter.java:167)
at org.jumpmind.symmetric.io.data.writer.NestedDataWriter.write(NestedDataWriter.java:64)
at org.jumpmind.symmetric.model.ProcessInfoDataWriter.write(ProcessInfoDataWriter.java:65)
at org.jumpmind.symmetric.io.data.writer.NestedDataWriter.write(NestedDataWriter.java:64)
at org.jumpmind.symmetric.io.data.writer.TransformWriter.write(TransformWriter.java:217)
at org.jumpmind.symmetric.io.data.DataProcessor.forEachDataInTable(DataProcessor.java:194)
at org.jumpmind.symmetric.io.data.DataProcessor.forEachTableInBatch(DataProcessor.java:164)
at org.jumpmind.symmetric.io.data.DataProcessor.process(DataProcessor.java:114)
at org.jumpmind.symmetric.service.impl.DataLoaderService$LoadIntoDatabaseOnArrivalListener.end(DataLoaderService.java:779)
at org.jumpmind.symmetric.io.data.writer.StagingDataWriter.notifyEndBatch(StagingDataWriter.java:75)
at org.jumpmind.symmetric.io.data.writer.AbstractProtocolDataWriter.end(AbstractProtocolDataWriter.java:220)
at org.jumpmind.symmetric.io.data.DataProcessor.process(DataProcessor.java:124)
at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromTransport(DataLoaderService.java:407)
at org.jumpmind.symmetric.service.impl.DataLoaderService.loadDataFromPull(DataLoaderService.java:265)
at org.jumpmind.symmetric.service.impl.PullService.execute(PullService.java:129)
at org.jumpmind.symmetric.service.impl.NodeCommunicationService$2.run(NodeCommunicationService.java:307)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: In file: inline evaluation of: ``DROP TABLE ofep.PRODUCT_RESTRICTIONS;'' Encountered "ofep" at line 1, column 12.
at bsh.Parser.generateParseException(Parser.java:6068)
at bsh.Parser.jj_consume_token(Parser.java:5939)
at bsh.Parser.BlockStatement(Parser.java:2780)
at bsh.Parser.Line(Parser.java:147)
at bsh.Interpreter.Line(Interpreter.java:1000)
at bsh.Interpreter.eval(Interpreter.java:635)
at bsh.Interpreter.eval(Interpreter.java:739)
at bsh.Interpreter.eval(Interpreter.java:728)
at org.jumpmind.symmetric.io.data.writer.DatabaseWriter.script(DatabaseWriter.java:916)
... 20 more
2014-08-27 16:51:12,470 ERROR [station-001] [DataLoaderService] [station-001-pull-1] Failed while parsing batch
java.lang.RuntimeException: In file: inline evaluation of: ``DROP TABLE ofep.PRODUCT_RESTRICTIONS;'' Encountered "ofep" at line 1, column 12.
(stack trace identical to the one above)
The script contains only one statement: “DROP TABLE ofep.PRODUCT_RESTRICTIONS;”
Could you please help me?
Thanks,
Ayman

The symadmin tool has three different send subcommands:
send-sql Send SQL statement to node
send-schema Send schema change to node
send-script Send script to node
You used send-script, which sends BeanShell (BSH) scripts; the target node hands the payload to the BeanShell interpreter, which is why your raw SQL fails to parse at "ofep".
What you want to use is send-sql.
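For illustration, the fixed invocation would look something along these lines; the engine name below is hypothetical, and option names vary between SymmetricDS releases, so verify them with symadmin --help:

# engine and node IDs are examples; check `symadmin --help` for your version's options
symadmin --engine corp-000 send-sql "DROP TABLE ofep.PRODUCT_RESTRICTIONS" --node station-001

send-sql ships the statement as SQL to be executed by the target's data loader, instead of evaluating it as a BeanShell script.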

Related

MultiDelimiterSerDe setup in Amazon Hive

I'm trying to use a multi-character delimiter in a table insert for a Hive job on EMR on Amazon AWS, as explained in this link. The delimiter for the file is "|".
https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
However, I ended up having to use...
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
Instead of the documented...
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe'
in order to avoid this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.MultiDelimitSerDe
OK. So when I avoid that error by adding the .contrib, I get a different failure, whose root cause is: Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1548264520414_0027_1_00, diagnostics=[Task failed, taskId=task_1548264520414_0027_1_00_000021, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1548264520414_0027_1_00_000021_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:328)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:420)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:286)
... 15 more
So I've been reading that you have to add the .jar file.
https://community.hortonworks.com/questions/82189/hive-cannot-see-jar.html
And so I've tried all kinds of things to get this to work. It says that it is adding it to the class path.
hive> add jar /usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar
> ;
Added [/usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar]
hive> add jar /usr/lib/hive/lib/hive-contrib.jar
> ;
Added [/usr/lib/hive/lib/hive-contrib.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-contrib.jar]
hive> exit;
So I'm not sure what to do. It's acting as if the .jar file for hive-contrib isn't in the class path despite me adding it. I've also tried running...
export HADOOP_USER_CLASSPATH_FIRST=true
which is found here...
How to include jars in Hive (Amazon Hadoop env)
And that doesn't fix it either.
How can I use a multi-character delimiter SerDe for a Hive job on AWS?
Thank you.
I could not get MultiDelimitSerDe to work. Instead, I was lucky in that the delimiter had quotation marks on either side of the pipe, so it looks like "|". This turns the values between the quotes into strings, so the additional pipes in those column values don't act as delimiters.
"Test | Test2 "|" Test3 | Test 4 | Test 5 "|" Test 6 "
You can see an explanation in the link below. The part that talks about it is in the comments, not the article.
https://www.ericlin.me/2015/07/how-to-create-a-hive-multi-character-delimitered-table/
If I didn't have those quotation marks around the delimiter, I'm not sure how I would have been able to handle a multi-character delimiter, especially if I had quotation marks in any of my fields; but after checking, out of the billions of rows, there is not a single quote.
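For anyone in a similar spot: one way to exploit that quoted-pipe layout is Hive's built-in OpenCSVSerde, which accepts a custom separator and quote character. A minimal sketch with hypothetical table/column names and S3 path (note that OpenCSVSerde reads every column as STRING, so you'd cast afterwards):

-- table/column names and S3 path are placeholders
CREATE EXTERNAL TABLE my_table (col1 STRING, col2 STRING, col3 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = '|', 'quoteChar' = '"')
LOCATION 's3://my-bucket/my-prefix/';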

How can a table from the Glue Catalog raise NullPointerException?

I'm currently trying to join two tables together in AWS Glue from an RDS instance, and after successfully crawling the database structure I'm setting up my job with:
# Standard Glue job setup implied by the snippet (assumed boilerplate, not in the original post):
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

table1 = (
    glueContext.create_dynamic_frame
    .from_catalog(database="transact", table_name="transact_table1")
)
table1.printSchema()
print "Count: ", table1.count()

table2 = (
    glueContext.create_dynamic_frame
    .from_catalog(database="transact", table_name="transact_table2")
)
table2.printSchema()
print "Count: ", table2.count()
Strangely, the job fails with:
File "/mnt/yarn/usercache/root/appcache/application_1520463557704_0008/container_1520463557704_0008_01_000001/PyGlue.zip/awsglue/dynamicframe.py", line 275, in count
File "/mnt/yarn/usercache/root/appcache/application_1520463557704_0008/container_1520463557704_0008_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1520463557704_0008/container_1520463557704_0008_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/yarn/usercache/root/appcache/application_1520463557704_0008/container_1520463557704_0008_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o64.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 26, ip-10-1-2-4.ec2.internal, executor 1): java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:427)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$13.apply(JdbcUtils.scala:425)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:286)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:268)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
The curious part is that it only happens in the second count(), because I can see in the logs that it correctly prints out the table1 schema and count. And it also prints the table2 schema.
I did this because I was trying to Join.apply those dynamic frames together and it was failing with NullPointerException too. What gives? Am I missing a bit of configuration perhaps?
UPDATE 1:
It appears that it's a problem with that second table in particular. Picking some other table from the catalog to serve as table2 makes the Job succeed.
So I'll morph the question into: how must a table from the catalog be different from the others that makes it raise this error?
print "Count: ", table2count()
is it a typo error while posting the question to SO, if not please check the code ? Shouldn't it be table2.count() ?
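For reference, the Join transform mentioned in the question is invoked along these lines once both frames load cleanly; the key column names here are hypothetical:

from awsglue.transforms import Join

# Join the two catalog-backed dynamic frames on their key columns
# (column names are hypothetical)
joined = Join.apply(table1, table2, 'id', 'table1_id')
print "Joined count: ", joined.count()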

How to parse Apache Catalina.log

I am trying to find a way to parse a Catalina.log and I am really struggling.
Here is a piece of the log:
May 12, 2017 2:14:38 PM org.apache.coyote.AbstractProtocol init
SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.31.104-443"]
java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR
at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)
at org.apache.tomcat.util.net.AbstractEndpoint.init(AbstractEndpoint.java:649)
at org.apache.coyote.AbstractProtocol.init(AbstractProtocol.java:434)
at org.apache.catalina.connector.Connector.initInternal(Connector.java:978)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.core.StandardService.initInternal(StandardService.java:559)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.core.StandardServer.initInternal(StandardServer.java:821)
at org.apache.catalina.util.LifecycleBase.init(LifecycleBase.java:102)
at org.apache.catalina.startup.Catalina.load(Catalina.java:638)
at org.apache.catalina.startup.Catalina.load(Catalina.java:663)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.load(Bootstrap.java:253)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:427)
I want to get
Date = May 12, 2017 2:14:38 PM
class = org.apache.coyote.AbstractProtocol init
Error level = SEVERE
Error Msg = Failed to initialize end point associated with ProtocolHandler ["http-apr-10.1.31.104-443"]
Error Msg Body = java.lang.Exception: Connector attribute SSLCertificateFile must be defined when using SSL with APR
at org.apache.tomcat.util.net.AprEndpoint.bind(AprEndpoint.java:490)....
I don't even know where to start :)
Any ideas are very welcome.
I have prepared for you the following regex:
((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))\s(.+)(\r)?\n(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n(?=\s+at.+java:\d+\))
You can use the following back references to capture your groups:
DATE -> $1
CLASS -> $4
ERROR_LEVEL -> $6
ERROR_MSG -> $8
ERROR_BODY -> $10
The regex will only fetch strings that meet the following conditions:
starts with a date in the format specified in your post
after the date, the first line is composed of the class name
the 2nd line is composed of the error level and the error msg
the 3rd line is your error msg body
followed by a Java stack trace of n lines, starting with \s+at and ending with java:\d+)
The regex works in the following way:
((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))
This part fetches the date in the format of your post:
a 3-character month, followed by space(s), then 1 or 2 digits, a comma, and a 4-digit year,
then space(s), then the time (colon-separated), followed by space(s), then AM or PM.
\s(.+)(\r)?\n
This part of the regex captures the rest of the first line, corresponding to your class.
(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|DEBUG):\s(.+)(\r)?\n(.+)(\r)?\n
This part captures the error level (from the list shown), followed by a colon, and then the next two lines, corresponding to your error msg and body.
(?=\s+at.+java:\d+\))
This last part is a lookahead condition that enforces that your error is followed by a Java stack trace.
You might need to adapt some parts of the regex (like the number of lines of the error body and error message) or the stack-trace condition, but I think this is a great starting point for your case.
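To show it in action, here is a minimal Python sketch that applies the regex with the numbered groups above; it assumes the log sits in a file named catalina.log and that the stack-trace lines keep their leading indentation, as they do in the raw log:

import re

# The regex from above, split across lines for readability
CATALINA_RE = re.compile(
    r'((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'
    r'\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{1,2}:\d{1,2}\s(AM|PM))'
    r'\s(.+)(\r)?\n'
    r'(FATAL|SEVERE|ERROR|WARN(ING)?|INFO|CONFIG|DEBUG):\s(.+)(\r)?\n'
    r'(.+)(\r)?\n'
    r'(?=\s+at.+java:\d+\))'
)

with open('catalina.log') as f:  # file name is an assumption
    text = f.read()

for m in CATALINA_RE.finditer(text):
    print('Date        =', m.group(1))   # $1
    print('Class       =', m.group(4))   # $4
    print('Error level =', m.group(6))   # $6
    print('Error msg   =', m.group(8))   # $8
    print('Error body  =', m.group(10))  # $10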
CHEERS!!!

Start token not found error while using JsonSerDe

I am trying to import JSON data from S3 and, after making some queries, export the output as JSON back to S3. However, I get an "org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected" error at the Hive step on the EMR cluster. To understand what the problem is, I simplified the Hive script and the JSON data, but it keeps giving the same error. How can I solve this problem?
Cluster configuration:
Release: emr-5.3.1
Hive version: 2.1.1
Hadoop distribution: Amazon 2.7.3
Service Role: EMR_DefaultRole
MasterInstanceType: m4.large
The content of the simplified JSON data:
[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
Hive script:
DROP TABLE IF EXISTS SOURCE;
DROP TABLE IF EXISTS DESTINATION;
CREATE EXTERNAL TABLE SOURCE(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://myPath/subPath/';
CREATE EXTERNAL TABLE DESTINATION(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://anotherPath/subPath/';
INSERT OVERWRITE TABLE DESTINATION SELECT MyID, MyField FROM SOURCE;
And here is the stack trace:
Vertex failed, vertexName=Map 4, vertexId=vertex_1278452616863_0001_1_00, diagnostics=[Task failed, taskId=task_1278452616863, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1278452616863:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:383)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
... 17 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:183)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
... 18 more
Caused by: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:169)
... 21 more
Thanks.
The JSON should start with { and not with an array ([).
I tried this approach and updated my JSON file with the structure:
{"MyID":"FOO123","MyField":"FOO"},
{"MyID":"BAR123","MyField":"BAR"}
but after doing so, I noticed only the first object was being inserted into the table.
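That follow-up behavior is expected: this JsonSerDe is line-oriented, so each line of the file must be exactly one self-contained JSON object, with no enclosing array and no comma between lines. The trailing comma after the first object is most likely what stopped the second row from loading. A layout like this should load both rows:

{"MyID":"FOO123","MyField":"FOO"}
{"MyID":"BAR123","MyField":"BAR"}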

Tasks fail after BAM admin password change

After changing the default password for the admin user in WSO2 BAM 4.1.0, tasks fail with the following error:
TID: [0] [BAM] [2013-06-20 16:56:15,464] ERROR {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl} - Error while executing Hive script.
Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl}
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:355)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:250)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
TID: [0] [BAM] [2013-06-20 16:56:15,467] ERROR {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask} - Error while executing script : am_stats_analyzer_460 {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask}
org.wso2.carbon.analytics.hive.exception.HiveExecutionException: Error while executing Hive script.Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl.execute(HiveExecutorServiceImpl.java:117)
at org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask.execute(HiveScriptExecutorTask.java:60)
at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:56)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Reverting back the password to its original value solves the issue.
How do I change the password for the admin user and keep the tasks working?
Have you changed the username and password in the hive script am_stats_analyzer? The defaults are admin/admin; check the hive script and update the password accordingly. The properties are as follows:
"cassandra.ks.username" = "admin",
"cassandra.ks.password" = "xxxxx",
Check if that fixes your issue.
In order to solve the issue I had to perform the following steps:
edit the file [BAM_HOME]/repository/conf/etc/cassandra-auth.xml and change the password value to the new password.
edit the file [BAM_HOME]/repository/conf/datasources/master-datasources.xml and change the password value of the WSO2BAM_CASSANDRA_DATASOURCE datasource to the new password.
restart the BAM: the Hive tasks now run without errors.
where the new password is the password I assigned to the admin user.
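For step 2, the relevant fragment of master-datasources.xml looks roughly like this (surrounding elements abbreviated; the exact layout may differ in your release, so verify against your own file):

<datasource>
    <name>WSO2BAM_CASSANDRA_DATASOURCE</name>
    <definition type="RDBMS">
        <configuration>
            <!-- set this to the new admin password -->
            <username>admin</username>
            <password>NEW_PASSWORD</password>
        </configuration>
    </definition>
</datasource>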
Moreover, the Main \ Manage \ Cassandra Keyspaces \ List page in the BAM UI, which was raising the following error, is now fixed:
org.wso2.carbon.cassandra.mgt.ui.CassandraAdminClientException: Error retrieving keyspace names !
(...)
Caused by: org.apache.axis2.AxisFault: InvalidRequestException(why:You have not logged in)
(...)
Sorry I couldn't follow up on the question earlier; anyway, glad your problem is sorted now! Keep on trying BAM and don't hesitate to holler if you run into any issues.
Thanks,
Shariq.