Start token not found error while using JsonSerDe - amazon-web-services

I am trying to import JSON data from S3 and, after running some queries, export the output in JSON format back to S3. However, I get the "org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected" error at the Hive step on the EMR cluster. To understand what the problem is, I simplified the Hive script and the JSON data, but it keeps giving the same error. How can I solve this problem?
Cluster configuration:
Release: emr-5.3.1
Hive version: 2.1.1
Hadoop distribution: Amazon 2.7.3
Service Role: EMR_DefaultRole
MasterInstanceType: m4.large
The content of the simplified JSON data:
[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
Hive script:
DROP TABLE IF EXISTS SOURCE;
DROP TABLE IF EXISTS DESTINATION;
CREATE EXTERNAL TABLE SOURCE(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://myPath/subPath/';
CREATE EXTERNAL TABLE DESTINATION(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://anotherPath/subPath/';
INSERT OVERWRITE TABLE DESTINATION SELECT MyID, MyField FROM SOURCE;
And here is the stack trace:
Vertex failed, vertexName=Map 4, vertexId=vertex_1278452616863_0001_1_00, diagnostics=[Task failed, taskId=task_1278452616863, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1278452616863:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:383)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
... 17 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:183)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:128)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
... 18 more
Caused by: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:169)
... 21 more
Thanks.

The JSON should start with { and not with an array ([).

I tried this approach and updated my JSON file to the following structure:
{"MyID":"FOO123","MyField":"FOO"},
{"MyID":"BAR123","MyField":"BAR"}
but after doing so, I noticed that only the first object was being inserted into the table.
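The HCatalog JsonSerDe maps each input line to exactly one JSON object, so both the enclosing array and the separating commas throw it off. A file it can read looks like this, with one complete object per line and no commas between records:
{"MyID":"FOO123","MyField":"FOO"}
{"MyID":"BAR123","MyField":"BAR"}
If the source data arrives as a single JSON array, one hedged way to flatten it, assuming jq is available on the machine preparing the upload (file names here are illustrative):
# emit each array element as one compact JSON object per line
jq -c '.[]' source.json > records.json
aws s3 cp records.json s3://myPath/subPath/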

Related

Migrating data from Teradata to BigQuery

I'm running Teradata Express on VMware Player. In order to migrate the data to BigQuery, I'm trying to follow the official documentation covering the BigQuery transfer job.
https://cloud.google.com/bigquery-transfer/docs/teradata-migration
I have downloaded the required jars and am trying to start the migration process after successful initialization. Below are the config file contents generated after successful initialization.
TDExpress1620_Sles11:~/Desktop # cat testpath/bq.config
{
  "agent-id": "84f7cf04-2133-4c5a-ae43-dd22a30281ba",
  "transfer-configuration": {
    "project-id": "106730138445",
    "location": "us",
    "id": "5e1c7eda-0000-249c-aab6-94eb2c06245a"
  },
  "source-type": "teradata",
  "console-log": false,
  "silent": false,
  "teradata-config": {
    "connection": {
      "host": "localhost"
    },
    "local-processing-space": "testpath",
    "database-credentials-file-path": "",
    "max-local-storage": "200GB",
    "gcs-upload-chunk-size": "32MB",
    "use-tpt": false,
    "max-sessions": 0,
    "spool-mode": "NoSpool",
    "max-parallel-upload": 1,
    "max-parallel-extract-threads": 1,
    "session-charset": "UTF8",
    "max-unload-file-size": "2GB"
  }
}
After running the command to start the migration process, I run into the error below.
TDExpress1620_Sles11:~/Desktop # java -cp terajdbc4.jar:mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --configuration-file=testpath/bq.config
Reading data from gs://data_transfer_agent/latest/version
WARNING: Failed to get the latest released agent version with error: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Exception in thread "main" com.google.api.gax.rpc.UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:69)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1349)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:398)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1024)
at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:670)
at com.google.common.util.concurrent.ForwardingListenableFuture.addListener(ForwardingListenableFuture.java:45)
at com.google.api.core.ApiFutureToListenableFuture.addListener(ApiFutureToListenableFuture.java:52)
at com.google.common.util.concurrent.Futures.addCallback(Futures.java:1330)
at com.google.api.core.ApiFutures.addCallback(ApiFutures.java:63)
at com.google.api.gax.grpc.GrpcExceptionCallable.futureCall(GrpcExceptionCallable.java:67)
at com.google.api.gax.rpc.AttemptCallable.call(AttemptCallable.java:86)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57)
at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)
at com.google.cloud.bigquery.datatransfer.v1.DataTransferServiceClient.getTransferConfig(DataTransferServiceClient.java:741)
at com.google.cloud.bigquery.datatransfer.v1.DataTransferServiceClient.getTransferConfig(DataTransferServiceClient.java:718)
at com.google.cloud.bigquery.dms.gcloud.TransferServiceClient.getTransferConfig(TransferServiceClient.java:62)
at com.google.cloud.bigquery.dms.gcloud.TransferServiceClient.getDestinationBucketName(TransferServiceClient.java:94)
at com.google.cloud.bigquery.dms.Agent.main(Agent.java:235)
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
at io.grpc.Status.asRuntimeException(Status.java:533)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:490)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:700)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:399)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:500)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:65)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:592)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:508)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:632)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
... 7 more
Caused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: null: bigquerydatatransfer.googleapis.com/2404:6800:4009:810:0:0:0:200a:443
at io.grpc.netty.shaded.io.netty.channel.unix.Errors.throwConnectException(Errors.java:104)
at io.grpc.netty.shaded.io.netty.channel.unix.Socket.connect(Socket.java:257)
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel.doConnect0(AbstractEpollChannel.java:730)
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel.doConnect(AbstractEpollChannel.java:715)
at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.connect(AbstractEpollChannel.java:557)
at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1340)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:532)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:517)
at io.grpc.netty.shaded.io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
at io.grpc.netty.shaded.io.grpc.netty.WriteBufferingAndExceptionHandler.connect(WriteBufferingAndExceptionHandler.java:136)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:532)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.access$1000(AbstractChannelHandlerContext.java:38)
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext$9.run(AbstractChannelHandlerContext.java:522)
at io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:333)
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
Caused by: java.net.NoRouteToHostException
... 19 more
Please let me know if somebody has successfully implemented this or can figure out the exact issue I am facing.

MultiDelimiterSerDe setup in Amazon Hive

I'm trying to use a multi-character delimiter in a table insert for a Hive job in EMR on Amazon AWS, as explained in the link below. The delimiter for the file is "|".
https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe
However, I ended up having to use...
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
Instead of the documented...
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe'
in order for it to not give me this error.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.MultiDelimitSerDe
OK. So when I avoid that error by adding the .contrib, I get this error instead, caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1548264520414_0027_1_00, diagnostics=[Task failed, taskId=task_1548264520414_0027_1_00_000021, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1548264520414_0027_1_00_000021_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:354)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:328)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:420)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:286)
... 15 more
So I've been reading that you have to add the .jar file.
https://community.hortonworks.com/questions/82189/hive-cannot-see-jar.html
And so I've tried all kinds of things to get this to work. It says that it is adding it to the class path.
hive> add jar /usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar
> ;
Added [/usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-contrib-2.3.3-amzn-1.jar]
hive> add jar /usr/lib/hive/lib/hive-contrib.jar
> ;
Added [/usr/lib/hive/lib/hive-contrib.jar] to class path
Added resources: [/usr/lib/hive/lib/hive-contrib.jar]
hive> exit;
So I'm not sure what to do. It's acting as if the hive-contrib .jar isn't on the class path despite my adding it. I've also tried running...
export HADOOP_USER_CLASSPATH_FIRST=true
which is found here...
How to include jars in Hive (Amazon Hadoop env)
And that doesn't fix it either.
How can I use a multi-delimiter SerDe for a Hive job on AWS?
Thank you.
I could not get MultiDelimitSerDe to work. Instead, I was lucky in that the delimiter had quotation marks on either side of the pipe, so it looks like "|". This turns the values between the quotes into strings, so the additional pipes inside those column values don't act as delimiters.
"Test | Test2 "|" Test3 | Test 4 | Test 5 "|" Test 6 "
You can see an explanation in the link below. The part that talks about it is in the comments, not the article.
https://www.ericlin.me/2015/07/how-to-create-a-hive-multi-character-delimitered-table/
If I didn't have those quotation marks around the delimiter, I'm not sure how I would have been able to work with a multi-character delimiter, especially if I had quotation marks in any of my fields; but after checking, out of the billions of rows there is not a single quote.
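For completeness, if MultiDelimitSerDe itself is needed, the Hive wiki page linked in the question documents the pattern roughly as follows; this is a minimal sketch with made-up table and column names, using the contrib package name that the question found necessary on EMR:
CREATE TABLE test_multi (id STRING, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim" = "[,]")
STORED AS TEXTFILE;
The multi-character delimiter string goes in the field.delim serde property ("[,]" in the wiki's example).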

ORA error in wso2 apim-analytics server

1) I'm getting the error below in the wso2carbon logs when I try to configure the WSO2 APIM-Analytics (2.1) server with an Oracle DB (12c). I have tried ojdbc6.jar and ojdbc7.jar in the lib folder, but the error is still there.
error:
Caused by: java.lang.RuntimeException: ORA-28040: No matching authentication protocol
2) Is there any REST API available for WSO2 APIM-Analytics, similar to the DAS server, to extract data?
full error:
ERROR {org.wso2.carbon.analytics.spark.core.AnalyticsTask} - Error while executing the scheduled task for the script: APIM_LAST_ACCESS_TIME_SCRIPT {org.wso2.carbon.analytics.spark.core.AnalyticsTask}
org.wso2.carbon.analytics.spark.core.exception.AnalyticsExecutionException: Exception in executing query create temporary table APILastAccessSummaryData using CarbonJDBC options (dataSource "WSO2AM_STATS_DB", tableName "API_LAST_ACCESS_TIME_SUMMARY", schema "tenantDomain STRING , apiPublisher STRING , api STRING , version STRING , userId STRING , context STRING , max_request_time LONG ", primaryKeys "tenantDomain,apiPublisher,api" )
at org.wso2.carbon.analytics.spark.core.internal.SparkAnalyticsExecutor.executeQueryLocal(SparkAnalyticsExecutor.java:764)
at org.wso2.carbon.analytics.spark.core.internal.SparkAnalyticsExecutor.executeQuery(SparkAnalyticsExecutor.java:721)
at org.wso2.carbon.analytics.spark.core.CarbonAnalyticsProcessorService.executeQuery(CarbonAnalyticsProcessorService.java:201)
at org.wso2.carbon.analytics.spark.core.CarbonAnalyticsProcessorService.executeScript(CarbonAnalyticsProcessorService.java:151)
at org.wso2.carbon.analytics.spark.core.AnalyticsTask.execute(AnalyticsTask.java:60)
at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:67)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: ORA-28040: No matching authentication protocol
thanks,
Santosh
This was an issue identified in Oracle, and the workaround is to set SQLNET.ALLOWED_LOGON_VERSION=8 in the $crs_home/network/admin/sqlnet.ora file. [1]
[1] https://community.softwaregrp.com/t5/UCMDB-and-UD-Practitioners-Forum/ORA-28040-No-matching-authentication-protocol/m-p/253403
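A minimal sketch of that workaround ($crs_home depends on the installation; the parameter name is taken from the answer above):
# $crs_home/network/admin/sqlnet.ora
# allow clients that authenticate with the older, pre-12c protocol (e.g. ojdbc6)
SQLNET.ALLOWED_LOGON_VERSION=8
Note that on 12c the server-side parameter may instead be named SQLNET.ALLOWED_LOGON_VERSION_SERVER; check the Oracle documentation for the exact release.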

hive table query Error in configuring object

I'm loading a log file from S3 into Hive running on EMR, but I'm getting all NULLs back when viewing the data.
I created the table like this:
create external table coglogs (
HostID string,
ProcessID string,
Time string,
TimeZoneOffset string,
SessionID string,
RequestID string,
SubRequestID string,
StepID string,
Thread string,
Component string,
BuildNumber string,
Level string,
Logger string,
Operation string,
ObjectType string,
ObjectPath string,
Status string,
Message string,
LogData string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s %12$s %13$s %14$s %15$s %16$s %17$s %18$s %19$s"
)
location 's3n://infinilog/cogserver_hive.log';
and then load the data:
load data inpath 's3n://infinilog/cogserver.log' into table coglogs;
I checked the regex on rubular.com and it seems to be correct, so why am I not getting any data back in the Hive table?
Here is an example line from the log file:
101.196.242.160:9300 11329 2016-01-27 18:35:14.132 -5 http-9300-297 caf 2047 2 Audit.dispatcher.caf Request Warning secure error - found userCapabilities
101.196.242.160:9300 11329 2016-01-27 18:35:14.195 -5 F3820773ADD59754DEAFAFDFA0D2C3F2CDDF715483446C99B16BFA8040D0DDA4 9ss8Csyswh28w8sGC4hhvvMy2hw9d48Cd42y92y8 http-9300-299 CM 6102 3 Audit.Other.cms.CM QUERY REPORT /Public Folders/1A Reporting/PT/Reports/Summary Dashboard Success
EDIT:
So there is data in the table (though I see a lot of rows with all column values as "None"), but when I run something like this:
select count(*)
from coglogs
where status = 'Failure'
or any query, I get this back:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:42)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:153)
... 22 more
Caused by: java.lang.NoSuchFieldError: stringTypeInfo
at org.apache.hadoop.hive.contrib.serde2.RegexSerDe.initialize(RegexSerDe.java:115)
at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:138)
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:122)
... 22 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
In Hive you have to use "\\" instead of "\" in the regex. Hive unescapes the string literal before handing it to the SerDe, so every regex escape such as \d, \S, or \t must be written as \\d, \\S, and \\t.
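For illustration, a hedged sketch of the corrected clause covering just the first four tab-separated fields of the pattern from the question; the remaining capture groups follow the same doubling:
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([\\d+]\\S+[\\d+])\\t(\\d+)\\t([\\d+]\\S+[\\d+] [\\d+]\\S+[\\d+])\\t(-[\\d+])"
)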

Runtime exception while executing sqoop job

I am trying to execute a Sqoop job in BigInsights, importing data from an Oracle DB to HDFS. Below is the Sqoop command, which starts executing the mapper and then stops after some time.
sudo sqoop import --connect "jdbc:oracle:thin:@(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = ***.**.**.***)(PORT = 1908)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = GPRSDB1) ) )" --username **** --password ***** --query "select distinct pr_con_id from s_org_ext where \$CONDITIONS" --where "PAR_DIVN_ID ='1-1POG9V' AND rownum< 100" --target-dir /user/bigsql/crm/s_contact_mh --fields-terminated-by ',' --m 1
Below is the error:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLException: The Network Adapter could not establish the connection
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.sql.SQLException: The Network Adapter could not establish the connection
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)
... 9 more
Caused by: java.sql.SQLException: The Network Adapter could not establish the connection
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:199)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:480)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:413)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:508)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:203)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:33)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:510)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:213)
... 10 more
Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:328)
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:421)
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:630)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:206)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:966)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:292)
... 18 more
Caused by: java.net.ConnectException: Connection timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at java.net.Socket.connect(Socket.java:528)
at java.net.Socket.<init>(Socket.java:425)
at java.net.Socket.<init>(Socket.java:208)
at oracle.net.nt.TcpNTAdapter.connect(TcpNTAdapter.java:127)
at oracle.net.nt.ConnOption.connect(ConnOption.java:126)
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:306)
... 23 more
Please help me out to solve this issue. Thanks in advance.
Sqoop's eval tool only runs against the RDBMS and returns you the result set; Hadoop does not come into the picture.
Sqoop's import tool, by contrast, imports the data from the RDBMS and loads it into HDFS. For Sqoop import to work as desired, the DB server must be reachable from all nodes in the cluster.
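A quick, hedged way to verify that reachability before re-running the job is to probe the Oracle listener port from every data node (the host is masked as in the question; the port comes from the connect string):
# run on each data node in the cluster
nc -zv <oracle-host> 1908
If this times out from the data nodes, the firewall or routing between the cluster and the DB server is the likely cause, which matches the "Connection timed out" at the bottom of the stack trace.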