Flink defaults to Kryo serialization even for POJOs and Avro SpecificRecords - state

I am trying to do a POC of Flink State Schema Evolution. I am using Flink 1.15.0 and Java 11.
I tried to create 3 data classes - one for each serialization type:
io.peleg.kryo.User - Uses java.time.Instant class which I know is not supported for POJO serialization in Flink.
io.peleg.pojo.User - Uses only classic wrapped primitives - Integer, Long, String. The getters, setters and constructors are generated using Lombok.
io.peleg.avro.User - Generated from Avro schema using Avro Maven Plugin.
For each class I wrote a stream job that uses a time window to buffer elements and turn them into a list.
For each class I tried to do the following:
Run a job
Stop with savepoint
Add a field to the data class
Submit using savepoint
For all data classes the submit with savepoint failed with this exception:
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:255)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:700)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:676)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:643)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:917)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_3983d6bb2f0a45b638461bc99138f22f_(2/2) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:346)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:164)
... 11 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore heap backend
at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:172)
at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:106)
at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:143)
at org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:74)
at org.apache.flink.runtime.state.StateBackend.createKeyedStateBackend(StateBackend.java:140)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:329)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
... 13 more
Caused by: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index 83 out of bounds for length 3
Serialization trace:
favoriteColor (io.peleg.avro.User)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:402)
at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.readKVStateData(HeapSavepointRestoreOperation.java:219)
at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.readKeyGroupStateData(HeapSavepointRestoreOperation.java:149)
at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.restore(HeapSavepointRestoreOperation.java:125)
at org.apache.flink.runtime.state.heap.HeapSavepointRestoreOperation.restore(HeapSavepointRestoreOperation.java:57)
at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:169)
... 20 more
Caused by: java.lang.IndexOutOfBoundsException: Index 83 out of bounds for length 3
at java.base/jdk.internal.util.Preconditions.outOfBounds(Unknown Source)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Unknown Source)
at java.base/jdk.internal.util.Preconditions.checkIndex(Unknown Source)
at java.base/java.util.Objects.checkIndex(Unknown Source)
at java.base/java.util.ArrayList.get(Unknown Source)
at com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:805)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:728)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
... 31 more
I expected this exception would be thrown for io.peleg.kryo.User since Flink does not support state schema evolution for Kryo serialized classes.
But it seems to me like for all classes it ended up using the Kryo serializer instead of the POJO or Avro serializers.
I ran the job on a Flink cluster I spun up using docker compose:
version: "2.2"
services:
jobmanager:
image: flink:latest
ports:
- "8081:8081"
command: jobmanager
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager:
image: flink:latest
depends_on:
- jobmanager
command: taskmanager
scale: 1
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 2
My entire code is public on GitHub here.
What I would like to achieve is succusfuly running a job from a savepoint of an older version of a POJO with less/more fields.

I did an experiment with this POJO class:
public class Event {
public long id;
public String data;
public Instant timestamp;
public Event() {}
public Event(final long id, final String data, final Instant timestamp ) {
this.id = id;
this.data = data;
this.timestamp = timestamp;
}
...
}
The image below is from the IntelliJ debugger. As you can see, Flink supplied an InstantSerializer, and is using its POJOSerializer for this Event class.
I'm not sure where you went wrong, but you can use
env.getConfig().disableGenericTypes();
to disable the Kyro fallback, which should help with these experiments.

Related

Error instantiating snitch class 'org.apache.cassandra.locator.Ec2Snitch'

I'm having hard time setting up 2 node Cassandra cluster on Ec2 instances. This is 2.2.19 version. I cannot upgrade due to some other dependencies involved.
The Ec2 instances are in private subnet. Assigned static private ips
Here is my cassandra.yaml
cluster_name: 'Test-cluster'
data_file_directories:
- /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
# Addresses of hosts that are deemed contact points.
# Cassandra nodes use this list of hosts to find each other and learn
# the topology of the ring. You must change this if you are running
# multiple nodes!
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
# seeds is actually a comma-delimited list of addresses.
# Ex: "<ip1>,<ip2>,<ip3>"
- seeds: "${private_ip}"
listen_address: ${private_ip}
start_native_transport: true
native_transport_port: 9042
storage_port: 7000
num_tokens: 32
ssl_storage_port: 9042
start_rpc: true
rpc_address: ${private_ip}
rpc_port: 9160
broadcast_rpc_address: ${private_ip}
endpoint_snitch: Ec2Snitch
partitioner: org.apache.cassandra.dht.RandomPartitioner
Here is my system.log
INFO [main] 2021-06-07 18:42:41,900 DatabaseDescriptor.java:327 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2021-06-07 18:42:42,022 DatabaseDescriptor.java:437 - Global memtable on-heap threshold is enabled at 251MB
INFO [main] 2021-06-07 18:42:42,023 DatabaseDescriptor.java:441 - Global memtable off-heap threshold is enabled at 251MB
ERROR [main] 2021-06-07 18:42:42,049 CassandraDaemon.java:787 - Exception encountered during startup
org.apache.cassandra.exceptions.ConfigurationException: Error instantiating snitch class 'org.apache.cassandra.locator.Ec2Snitch'.
at org.apache.cassandra.utils.FBUtilities.construct(FBUtilities.java:551) ~[apache-cassandra-2.2.19.jar:2.2.19]
at org.apache.cassandra.utils.FBUtilities.construct(FBUtilities.java:529) ~[apache-cassandra-2.2.19.jar:2.2.19]
at org.apache.cassandra.config.DatabaseDescriptor.createEndpointSnitch(DatabaseDescriptor.java:741) ~[apache-cassandra-2.2.19.jar:2.2.19]
at org.apache.cassandra.config.DatabaseDescriptor.applyConfig(DatabaseDescriptor.java:465) ~[apache-cassandra-2.2.19.jar:2.2.19]
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:133) ~[apache-cassandra-2.2.19.jar:2.2.19]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:599) [apache-cassandra-2.2.19.jar:2.2.19]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:774) [apache-cassandra-2.2.19.jar:2.2.19]
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Ec2Snitch was unable to execute the API call. Not an ec2 node?
at org.apache.cassandra.locator.Ec2Snitch.awsApiCall(Ec2Snitch.java:79) ~[apache-cassandra-2.2.19.jar:2.2.19]
at org.apache.cassandra.locator.Ec2Snitch.<init>(Ec2Snitch.java:55) ~[apache-cassandra-2.2.19.jar:2.2.19]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_282]
at java.lang.Class.newInstance(Class.java:442) ~[na:1.8.0_282]
at org.apache.cassandra.utils.FBUtilities.construct(FBUtilities.java:536) ~[apache-cassandra-2.2.19.jar:2.2.19]
Note: When I change snitch to SimpleSnitch it actually works.
Please help!!
Answering my own question
Ec2snitch uses IMDVs1 to get metadata http://169.254.169.254/latest/meta-data/placement/availability-zone to determine certain properties.
I created Ec2 instances through terraform where my code has
metadata_options {
http_endpoint = "enabled"
http_tokens = "enabled"
}
The above code forces to use imdsv2 only which is causing the issue. Ec2snitch couldn't get metadata by simple curl command.
Solution:
metadata_options {
http_endpoint = "enabled"
http_tokens = "optional"
}
If you are doing through console, when launching instance, make sure meta data version is set to V1 and V2

Error while processing a persisted job checkDuplicateCSetKey

TID: [-1] [] [2019-11-22 13:18:34,362] WARN {org.apache.ode.scheduler.simple.SimpleScheduler} - Error while processing a persisted job: Job hqejbhcnphreqf4l2mpcoj time: 2019-11-22 13:18:31 WEST transacted: true persisted: true details: JobDetails( instanceId: null mexId: hqejbhcnphreqf4l2mpcoi processId: {http://wso2.org/bps/sample}my-process-7 type: INVOKE_INTERNAL channel: null correlatorId: null correlationKeySet: null retryCount: 4 inMem: false detailsExt: {enqueue=false}) {org.apache.ode.scheduler.simple.SimpleScheduler}
java.lang.NullPointerException
at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.checkDuplicateCSetKey(BpelRuntimeContextImpl.java:621)
at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.checkDuplicateCSets(BpelRuntimeContextImpl.java:578)
at org.apache.ode.bpel.runtime.PICK$WAITING$2.onRequestRcvd(PICK.java:300)
at sun.reflect.GeneratedMethodAccessor1427.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.ode.jacob.vpu.JacobVPU$JacobThreadImpl.run(JacobVPU.java:451)
at org.apache.ode.jacob.vpu.JacobVPU.execute(JacobVPU.java:139)
at org.apache.ode.bpel.engine.BpelRuntimeContextImpl.execute(BpelRuntimeContextImpl.java:1002)
at org.apache.ode.bpel.engine.PartnerLinkMyRoleImpl.invokeNewInstance(PartnerLinkMyRoleImpl.java:208)
at org.apache.ode.bpel.engine.BpelProcess$1.invoke(BpelProcess.java:283)
at org.apache.ode.bpel.engine.BpelProcess.invokeProcess(BpelProcess.java:224)
at org.apache.ode.bpel.engine.BpelProcess.invokeProcess(BpelProcess.java:279)
at org.apache.ode.bpel.engine.BpelProcess.handleJobDetails(BpelProcess.java:434)
at org.apache.ode.bpel.engine.BpelEngineImpl.sendMyRoleFault(BpelEngineImpl.java:835)
at org.apache.ode.bpel.engine.BpelEngineImpl.onScheduledJob(BpelEngineImpl.java:581)
at org.apache.ode.bpel.engine.BpelServerImpl.onScheduledJob(BpelServerImpl.java:467)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:633)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob$1.call(SimpleScheduler.java:627)
at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:298)
at org.apache.ode.scheduler.simple.SimpleScheduler.execTransaction(SimpleScheduler.java:253)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:627)
at org.apache.ode.scheduler.simple.SimpleScheduler$RunJob.call(SimpleScheduler.java:611)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(T
The problem is due to the data:
A BPEL process had 3 versions (v1, v2, v3)
Version v1 has been removed (from the registry and bpel / -123) but these instances still remained in the database
An old instance of version v1 remained in ACTIVE status with an id correlation = 400200 (for example).
When starting a new instance of version v3 with a correlation id = 400200 the exception is raised.
Indeed apache ODE to each new instance looks for if there is an instance in active status and carrying the same correlation id (checkDuplicatCS ..). In our context, Apache ODE finds an instance of version v1 and goes back NullpointerException because it does not find the process v1 in its registry.
Solution: Clean the old instances in Active status of version v1.

MultiDelimiterSerDe setup in Amazon Hive

I'm trying to use a multidelimiter in a table insert for a hive job in emr on amazon aws. As explained in this link. The delimiter for the file is "|".
https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
However, I ended up having to use...
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
Instead of the documented...
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe'
in order for it to not give me this error.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.MultiDelimitSerDe
OK. So when I don't get that error, by adding the .contrib, I get this error which is caused by Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1548264520414_0027_1_00, diagnostics=[Task failed, taskId=task_1548264520414_0027_1_00_000021, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1548264520414_0027_1_00_000021_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:328)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:420)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:286)
... 15 more
So I've been reading that you have to add the .jar file.
https://community.hortonworks.com/questions/82189/hive-cannot-see-jar.html
And so I've tried all kinds of things to get this to work. It says that it is adding it it to the class path.
hive> add jar /usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar
> ;
Added [/usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar]
hive> add jar /usr/lib/hive/lib/hive-contrib.jar
> ;
Added [/usr/lib/hive/lib/hive-contrib.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-contrib.jar]
hive> exit;
So I'm not sure what to do. It's acting as if the .jar file for hive-contrib isn't in the class path despite me adding it. I've also tried running...
export HADOOP_USER_CLASSPATH_FIRST=true
which is found here...
How to include jars in Hive (Amazon Hadoop env)
And that doesn't fix it either.
How can I use a multidelimiter SerDe property for a hive job on aws?
Thank you.
I could not get MultiDelimitSerDe to work. Instead, I was lucky in that the delimiter had quotations on either side of the pipe. So it looks like "|". This turns the values between the quotes into strings, so the additional pipes in those column values don't act as delimiters.
"Test | Test2 "|" Test3 | Test 4 | Test 5 "|" Test 6 "
You can see an explanation in the link below. The part that talks about it is in the comments, not the article.
https://www.ericlin.me/2015/07/how-to-create-a-hive-multi-character-delimitered-table/
If I didn't have those quotation marks around the delimiter, I'm not sure how I would have been able to work with a multi delimiter. Especially if I had quotations in any of my fields, but after checking, out of the billions of rows, there is not a single quote.

connection was closed in ignite map reduce tasks

Environment:
ignite server:
centos6.5 with kernel 2.6.32-431.el6.x86_64
ignite version 1.9
hadoop version 2.6.2
3 server nodes with each having '-Xms16g -Xmx16g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=256m' set when started
I run a map reduce test job with ignite map reduce. The job is simply getting the average number for each people. The data is like:
Jack 0.35
Tom 0.78
Lily 0.92
Jack 0.28
Tom 0.18
...
At first, I generated a data set of 100M lines. It's about 2.53GB. The job finished correctly in about 30s. Then I generated a data set of 1 Billion lines, about 25.3GB. The job always failed with exceptions. I tried several times but the same result.
The ignite server node threw exception below:
[15:06:56,804][ERROR][sys-#2740%null%][GridTcpRestProtocol] Failed to process client request [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]], msg=GridClientTaskRequest [taskName=o.a.i.i.processors.hadoop.proto.HadoopProtocolJobStatusTask, arg=HadoopProtocolTaskArguments []]]
class org.apache.ignite.IgniteCheckedException: Failed to send message (connection was closed): GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]]
at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7239)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:170)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:119)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1$1.apply(GridTcpRestNioListener.java:264)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1$1.apply(GridTcpRestNioListener.java:261)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:228)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:261)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:229)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.processors.rest.GridRestProcessor$2$1.apply(GridRestProcessor.java:158)
at org.apache.ignite.internal.processors.rest.GridRestProcessor$2$1.apply(GridRestProcessor.java:155)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.util.future.GridFutureChainListener.applyCallback(GridFutureChainListener.java:78)
at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:70)
at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:30)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler$2.apply(GridTaskCommandHandler.java:294)
at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler$2.apply(GridTaskCommandHandler.java:257)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1579)
at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1547)
at org.apache.ignite.internal.processors.task.GridTaskWorker.reduce(GridTaskWorker.java:1157)
at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:942)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:996)
at org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1221)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1222)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:850)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$2100(GridIoManager.java:108)
at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:790)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to send message (connection was closed): GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]]
at org.apache.ignite.internal.util.nio.GridNioServer.send0(GridNioServer.java:554)
at org.apache.ignite.internal.util.nio.GridNioServer.send(GridNioServer.java:494)
at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionWrite(GridNioServer.java:3036)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:118)
at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionWrite(GridNioCodecFilter.java:94)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:118)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionWrite(GridNioFilterChain.java:264)
at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionWrite(GridNioFilterChain.java:189)
at org.apache.ignite.internal.util.nio.GridNioSessionImpl.send(GridNioSessionImpl.java:108)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:258)
... 40 more
The job client threw exception below:
java.io.IOException: Failed to get job status: job_1fbf9083-9a44-4be9-9199-695a97652dc2_0002
at org.apache.ignite.internal.processors.hadoop.impl.proto.HadoopClientProtocol.getJobStatus(HadoopClientProtocol.java:197)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:611)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1357)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1318)
at com.tscloud.sdk.test.ignite.MRTest.run(MRTest.java:81)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.tscloud.sdk.test.ignite.MRTest.main(MRTest.java:53)
Caused by: class org.apache.ignite.internal.client.impl.connection.GridClientConnectionResetException: Failed to perform request (connection failed): /172.31.68.204:11211
at org.apache.ignite.internal.client.impl.connection.GridClientConnection.getCloseReasonAsException(GridClientConnection.java:491)
at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.close(GridClientNioTcpConnection.java:339)
at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.close(GridClientNioTcpConnection.java:299)
at org.apache.ignite.internal.client.impl.connection.GridClientConnectionManagerAdapter$NioListener.onDisconnected(GridClientConnectionManagerAdapter.java:630)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionClosed(GridNioFilterChain.java:253)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:93)
at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionClosed(GridNioCodecFilter.java:70)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:93)
at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionClosed(GridNioServer.java:3005)
at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionClosed(GridNioFilterChain.java:147)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2306)
at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:929)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2026)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1863)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1568)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
The job configuration is below:
Configuration configuration = new Configuration();
configuration.set(MRConfig.FRAMEWORK_NAME, IgniteHadoopClientProtocolProvider.FRAMEWORK_NAME);
configuration.set(MRConfig.MASTER_ADDRESS, "172.31.68.202:11211");
configuration.set("fs.igfs.impl", "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");
configuration.set("fs.default.name", "igfs://igfs#172.31.68.202/");
I checked nodes status after the job failed using ignitevisorcmd.sh. All server nodes were OK, but there were sometime one of the node server was down. I did not know why it behaved like this.
Any help is appreciated.
Edit(2017-05-16):
I changed the hadoop core-site.xml and add hadoop.tmp.dir property as below
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/hadoop-2.6.2/tmp</value>
</property>
Then I reformatted hdfs and uploaded the 25.3GB data file. I run the test successfully. It turns out my hdfs has something wrong. Reformatting namenode solves the problem.
Before above steps, I tried checking the jvm heap usage by VisualVM.
One of the server node visualvm monitor snapshot
If your node was down temporarily, then it is reasonable that a job would fail-over to another node or fail altogether if there are no other nodes. I would check that your network was not reset or down. Also, you should check for the presence of any software firewalls between your nodes (it is best to disable operating system firewalls).

Start token not found error while using JsonSerDe

I am trying to import a JSON data from S3, and after making some queries, export the output as JSON format to S3 again. However, I get the "org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected" error at hive step on EMR cluster. In order to understand what the problem is, I simplify the Hive script and JSON data, but it keeps giving the same error. How can I solve this problem?
Cluster configuration:
Release: emr-5.3.1
Hive version: 2.1.1
Hadoop distribution: Amazon 2.7.3
Service Role: EMR_DefaultRole
MasterInstanceType: m4.large
The content of the simplifed JSON data:
[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
Hive script:
DROP TABLE IF EXISTS SOURCE;
DROP TABLE IF EXISTS DESTINATION;
CREATE EXTERNAL TABLE SOURCE(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://myPath/subPath/';
CREATE EXTERNAL TABLE DESTINATION(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://anotherPath/subPath/';
INSERT OVERWRITE TABLE DESTINATION SELECT MyID, MyField FROM SOURCE;
And here is the stack trace:
Vertex failed, vertexName=Map 4, vertexId=vertex_1278452616863_0001_1_00, diagnostics=[Task failed, taskId=task_1278452616863, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1278452616863:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:383)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
... 17 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:183)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
... 18 more
Caused by: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:169)
... 21 more
Thanks.
JSON should start with { and not with array ([)
I tried with this approach updated my JSON file with structure as
{"MyID":"FOO123","MyField":"FOO"},
{"MyID":"BAR123","MyField":"BAR"}
but after done, I noticed only the first object is being inserted into the table.