Sqoop issue with the Teradata connector powered by Teradata and Avro - HDFS

I'm trying to use Sqoop to import a table from Teradata into HDFS as an Avro data file, but I'm having issues.
Everything works fine when importing as a text file. However, when I add --as-avrodatafile to the end of my Sqoop command, I get an NPE:
ERROR sqoop.Sqoop: Got exception running Sqoop java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnInfo(SqlManager.java:275)
at org.apache.sqoop.manager.ConnManager.getColumnInfo(ConnManager.java:393)
at org.apache.sqoop.orm.ClassWriter.getColumnInfo(ClassWriter.java:1854)
at org.apache.sqoop.orm.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:71)
at org.apache.sqoop.orm.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:65)
at com.cloudera.connector.teradata.imports.BaseImportJob.configureInputFormat(BaseImportJob.java:165)
at com.cloudera.connector.teradata.imports.TableImportJob.configureInputFormat(TableImportJob.java:32)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:258)
at com.cloudera.connector.teradata.TeradataManager.importTable(TeradataManager.java:273)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
I'm not sure what is going wrong.
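For reference, the command I'm running has roughly the following shape (host, database, credentials and paths below are placeholders rather than my real values):

sqoop import \
  --connect jdbc:teradata://<teradata-host>/DATABASE=<db> \
  --username <user> --password <password> \
  --table <TABLE_NAME> \
  --target-dir /user/<me>/<table> \
  --as-avrodatafile

The same command without --as-avrodatafile completes without errors.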

Related

Error using dump file on GCP's Database Migration Service for MySQL: No database selected

I'm receiving the following error during the migration process using GCP Database Migration Service for MySQL, now using a dump file:
Error importing data: generic::unknown: exit status 1 ERROR 1046
(3D000) at line 22: No database selected
Any clue?
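Since ERROR 1046 generally means that no USE <database> statement has taken effect by the time the failing line runs, one quick check I can do on the dump (just a diagnostic sketch; the file name is a placeholder) is:

head -n 30 dump.sql | grep -i -E 'CREATE DATABASE|^USE '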

Assigning a spark-deep-learning external jar to Spark with Python on Amazon EMR

I've been trying to get the spark-deep-learning library working on my EMR cluster so that I can read images in parallel with Python 2.7. I have been searching for quite some time now and have failed to reach a solution. I have tried setting different configuration settings in the conf for the SparkSession, and I get the following error when trying to create a SparkSession object:
ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
The above was the result when using a Jupyter notebook.
I also tried submitting the .py file with spark-submit, adding the jar I need as the value for --jars, --driver-class-path, and --conf spark.executor.extraClassPath, as discussed in this link. Here is the command I submit along with the resulting import error:
bin/spark-submit --jars /home/hadoop/spark-deep-learning-0.2.0-spark2.1-s_2.11.jar \
--driver-class-path /home/hadoop/spark-deep-learning-0.2.0-spark2.1-s_2.11.jar \
--conf spark.executor.extraClassPath=/home/hadoop/spark-deep-learning-0.2.0-spark2.1-s_2.11.jar \
/home/hadoop/RunningCode6.py
Traceback (most recent call last):
File "/home/hadoop/RunningCode6.py", line 74, in <module>
from sparkdl import KerasImageFileTransformer
ImportError: No module named sparkdl
The library works fine in standalone mode, but I keep getting one of the errors above when I use cluster mode.
I really hope someone can help me solve this, because I've been staring at it for weeks now and I need to get it working.
Thanks!
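One more variant I can think of trying (an assumption on my part; I have not confirmed it works on EMR) is letting Spark resolve the package itself via --packages, so that both the JVM classpath and the Python import path are populated, instead of passing a local jar:

bin/spark-submit \
  --packages databricks:spark-deep-learning:0.2.0-spark2.1-s_2.11 \
  /home/hadoop/RunningCode6.py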

Read HBase data into Spark via Apache Phoenix

Being a noob to working with Spark, Phoenix and HBase, I was trying a few examples, as listed out here and here.
I created the data as per the "us_population" example here.
However, on trying to query the table thus created in Phoenix/HBase via Spark, I get the following error:
scala> val rdd = sc.phoenixTableAsRDD("us_population", Seq("CITY", "STATE", "POPULATION"), zkUrl = Some("random_aws.internal:2181"))
java.lang.NoClassDefFoundError: org/apache/phoenix/jdbc/PhoenixDriver
at org.apache.phoenix.spark.PhoenixRDD.<init>(PhoenixRDD.scala:40)
at org.apache.phoenix.spark.SparkContextFunctions.phoenixTableAsRDD(SparkContextFunctions.scala:39)
... 52 elided
Caused by: java.lang.ClassNotFoundException: org.apache.phoenix.jdbc.PhoenixDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 54 more
I'm unsure why this error is popping up. Any help would be greatly appreciated!
P.S. I load Spark with the following command:
spark-shell --jars /usr/lib/phoenix/phoenix-spark-4.9.0-HBase-1.2.jar
I am attempting this on a tiny AWS EMR cluster of 1 master and 1 name node (both r4.xlarge with 20 GB SSD external storage).
The exception you got is due to the class org.apache.phoenix.jdbc.PhoenixDriver missing from the Spark executors' classpath.
Try adding phoenix-core-4.9.0-HBase-1.2.jar when you start spark-shell:
spark-shell --jars /usr/lib/phoenix/phoenix-spark-4.9.0-HBase-1.2.jar,/usr/lib/phoenix/phoenix-core-4.9.0-HBase-1.2.jar
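If the driver still cannot be found on the executors, another sketch worth trying (the paths are assumptions based on a typical EMR Phoenix install) is to put the core jar on the executor classpath explicitly as well:

spark-shell \
  --jars /usr/lib/phoenix/phoenix-spark-4.9.0-HBase-1.2.jar,/usr/lib/phoenix/phoenix-core-4.9.0-HBase-1.2.jar \
  --conf spark.executor.extraClassPath=/usr/lib/phoenix/phoenix-core-4.9.0-HBase-1.2.jar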

Syntax error on incremental append with Sqoop in PuTTY

When I try an incremental append with Sqoop in PuTTY, it throws a syntax error.
mysql> sqoop import --connect 'jdbc:mysql://localhost:3306/retail_db'
--username retail_dba --password cloudera
--table sample --target-dir /Aravind1/sqoopdemo01
--check-column id --incremental append
--last-value 2 -m 1;
Error:
ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near 'sqoop import --connect
'jdbc:mysql://localhost:3306/retail_db' --usernameretail_' at line 1
I am trying to append a new record inserted in the table "sample" into the target file. Can anyone help me out with this issue?
Thanks,
Aravind
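One thing that may be worth checking (a sketch, not a verified answer): sqoop is an operating-system command, so the same import would normally be typed at the PuTTY shell prompt rather than inside the mysql> client, for example:

sqoop import --connect 'jdbc:mysql://localhost:3306/retail_db' \
  --username retail_dba --password cloudera \
  --table sample --target-dir /Aravind1/sqoopdemo01 \
  --check-column id --incremental append \
  --last-value 2 -m 1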

Old-style mapred API in HBase does not work

I have a MapReduce job which takes an HBase table as the output destination
of my reduce job. My reducer class implements the TableMap interface in the
org.apache.hadoop.hbase.mapred package, and I used the initTableReduceJob()
function in the TableMapReduceUtil class from the
org.apache.hadoop.hbase.mapred package to configure my job.
But when I run my job, I get the following error at the reduce stage:
java.lang.NullPointerException
at org.apache.hadoop.mapred.Task.getFsStatistics(Task.java:1099)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:442)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
My HBase version is 0.94.0 and my Hadoop version is 1.0.1.
I found a post similar to my question at:
https://forums.aws.amazon.com/thread.jspa?messageID=394846
Could anyone give me a hint about why this happened? Should I just stick
with the org.apache.hadoop.hbase.mapreduce package?
This error suggests that you may be running HBase on the local filesystem without HDFS. Try installing or running Hadoop HDFS. The org.apache.hadoop.mapred API appears to require HDFS.
As a possible convenience, you may try the Cloudera development VM, which packages both.
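A quick way to check this (the config path is an assumption; adjust it for your install) is to look at where hbase.rootdir points and whether HDFS itself responds:

grep -A 1 'hbase.rootdir' /path/to/hbase/conf/hbase-site.xml
hadoop fs -ls /

If hbase.rootdir resolves to a file:/// URL, HBase is writing to the local filesystem rather than HDFS.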