oozie error: Accessing local file system is not allowed - hdfs

A Sqoop import action is giving an error when run as an Oozie job.
I am using a pseudo-distributed Hadoop cluster.
I followed these steps:
1. Started the Oozie server
2. Edited the job.properties and workflow.xml files
3. Copied workflow.xml into HDFS
4. Ran the Oozie job (see the submission command below)
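For reference, step 4 would typically use the standard Oozie client invocation (the URL below assumes Oozie's default port 11000; adjust to your setup):
oozie job -oozie http://localhost:11000/oozie -config job.properties -run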
My job.properties file:
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
examplesRoot=examples
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/hduser/${examplesRoot}/apps/sqoop
My workflow.xml file:
<action name="sqoop-node">
<sqoop xmlns="uri:oozie:sqoop-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/hduser/${examplesRoot}/output-data/sqoop"/>
<!--<mkdir path="${nameNode}/user/hduser/${examplesRoot}/output-data"/>-->
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<command>import --connect "jdbc:mysql://localhost/db" --username user --password pass --table "table" --where "Conditions" --driver com.mysql.jdbc.Driver --target-dir ${nameNode}/user/hduser/${examplesRoot}/output-data/sqoop -m 1</command>
<!--<file>db.hsqldb.properties#db.hsqldb.properties</file>
<file>db.hsqldb.script#db.hsqldb.script</file>-->
</sqoop>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
I expected the job to run without errors, but it got killed with the following error:
UnsupportedOperationException: Accessing local file system is not allowed.
I don't understand where I went wrong or why the job is not allowed to complete.
Can anyone help me solve this issue?

The Oozie sharelib (which holds the Sqoop action's dependencies) is stored on HDFS, so the server needs to know how to reach the Hadoop cluster. Accessing a sharelib stored on the local filesystem is not allowed; see CVE-2017-15712.
Please review conf/hadoop-conf/core-site.xml and make sure it does not point at the local filesystem. For example, if your HDFS namenode listens on port 9000 on localhost, configure fs.defaultFS accordingly:
<configuration>
...
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
...
</configuration>
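As a quick sanity check (assuming the hdfs client is available on the Oozie server host), you can print which filesystem that configuration actually resolves to:
hdfs getconf -confKey fs.defaultFS
If this prints a file:/// URI rather than an hdfs:// one, the server is still pointed at the local filesystem.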
Alternatively, you can remove the RawLocalFileSystem class (the dummy implementation) and restart the server, but this is not recommended: it leaves the server vulnerable to CVE-2017-15712.
Hope this helps.

Related

hadoop Web UI localhost:50070 can not open

Ubuntu 16.04.1 LTS
Hadoop 3.3.1
I am trying to set up Hadoop in pseudo-distributed mode following an online tutorial, using the steps below.
Step 1: Setting up Hadoop
1. Add the following to /etc/profile:
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
2. In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, set
export JAVA_HOME=/opt/jdk1.8.0_261
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoop/pseudo/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoop/pseudo/hdfs/datanode</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Step 2: Verify Hadoop
1. $ hdfs namenode -format
2.
sudo apt-get install ssh
ssh-keygen -t rsa
ssh-copy-id hadoop@ubuntu
cd ~/hadoop/sbin
start-dfs.sh
3. start-yarn.sh
4. Open http://localhost:50070/ in Firefox on the local machine.
Unable to connect
Firefox can't establish a connection to the server at localhost:50070.
The site could be temporarily unavailable or too busy. Try again in a few moments.
If you are unable to load any pages, check your computer's network connection.
If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.
5. Opening http://localhost:8088/ in Firefox returns the same error as port 50070.
When I run the jps command, it returns:
hadoop#ubuntu:~/hadoop/sbin$ jps
32501 DataNode
32377 NameNode
32682 SecondaryNameNode
32876 Jps
Since Hadoop 3.0.0-alpha1 there has been a change in the default port configuration:
http://localhost:50070
was moved to
http://localhost:9870
See https://issues.apache.org/jira/browse/HDFS-9427.
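As a quick check (assuming a default single-node setup), you can confirm the NameNode web UI now answers on the new port:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870/
A 200 here means the UI is up.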

Bad Request - Invalid Hostname in IIS8

I am working on an ASP.NET Core application, but my project stops running in debug mode (using F5). I need to host it on local IIS to debug the code. When running it locally I get the error "Bad Request - Invalid Hostname".
You can try the steps below to solve this problem:
Exit the IIS Express instance currently running.
Open IIS Express's applicationhost.config, located at C:\Users\\Documents\IISExpress\config\applicationhost.config
Find the entry for the particular site you are developing (e.g. "Test" running on port 6306), e.g.:
<site name="test" id="1">
<application path="/" applicationPool="gratedAppPool">
<virtualDirectory path="/" physicalPath="" />
</application>
<bindings>
<binding protocol="http" bindingInformation="*:6306:localhost" />
</bindings>
</site>
Replace bindingInformation="*:6306:localhost" with bindingInformation="*:6306:*".
Save the file.
Start a command prompt in administrator mode and run the following command.
netsh http add urlacl url=http://*:6306/ user=Everyone
Now debug the site again and you should be able to access the URL using the host name.
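If you later need to undo the reservation (same port as above), netsh can remove it again:
netsh http delete urlacl url=http://*:6306/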

Unable to connect Cassandra Cluster in AWS from EC2 instance

I set up a Cassandra cluster using the DataStax AMI in AWS and ran the Cassandra service. I am trying to connect to this Cassandra service from another EC2 instance where Titan is installed. The Titan server version is 0.4.4; I also tried 0.5.3 but got the same error.
Cassandra is the backend storage for Titan.
The error is:
20366 [main] WARN com.tinkerpop.rexster.config.GraphConfigurationContainer - Could not load graph graph. Please check the XML configuration.
20367 [main] WARN com.tinkerpop.rexster.config.GraphConfigurationContainer - GraphConfiguration could not be found or otherwise instantiated: [com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration]. Ensure that it is in Rexster's path.
com.tinkerpop.rexster.config.GraphConfigurationException: GraphConfiguration could not be found or otherwise instantiated: [com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration]. Ensure that it is in Rexster's path.
at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:137)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.<init>(GraphConfigurationContainer.java:54)
at com.tinkerpop.rexster.server.XmlRexsterApplication.reconfigure(XmlRexsterApplication.java:99)
at com.tinkerpop.rexster.server.XmlRexsterApplication.<init>(XmlRexsterApplication.java:47)
at com.tinkerpop.rexster.Application.<init>(Application.java:96)
at com.tinkerpop.rexster.Application.main(Application.java:188)
Caused by: java.lang.IllegalArgumentException: Could not instantiate implementation: com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager
at com.thinkaurelius.titan.diskstorage.Backend.instantiate(Backend.java:355)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:367)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:311)
at com.thinkaurelius.titan.diskstorage.Backend.<init>(Backend.java:121)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1173)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:75)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:40)
at com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration.configureGraphInstance(TitanGraphConfiguration.java:25)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:119)
Configuration file:
<rexster>
<http>
<server-port>7182</server-port>
<server-host>0.0.0.0</server-host>
<base-uri>http://localhost</base-uri>
<web-root>public</web-root>
<character-set>UTF-8</character-set>
<enable-jmx>false</enable-jmx>
<enable-doghouse>true</enable-doghouse>
<max-post-size>2097152</max-post-size>
<max-header-size>8192</max-header-size>
<upload-timeout-millis>30000</upload-timeout-millis>
<thread-pool>
<worker>
<core-size>8</core-size>
<max-size>8</max-size>
</worker>
<kernal>
<core-size>4</core-size>
<max-size>4</max-size>
</kernal>
</thread-pool>
<io-strategy>leader-follower</io-strategy>
</http>
<rexpro>
<server-port>7180</server-port>
<server-host>0.0.0.0</server-host>
<session-max-idle>1790000</session-max-idle>
<session-check-interval>3000000</session-check-interval>
<connection-max-idle>180000</connection-max-idle>
<connection-check-interval>3000000</connection-check-interval>
<enable-jmx>false</enable-jmx>
<thread-pool>
<worker>
<core-size>8</core-size>
<max-size>8</max-size>
</worker>
<kernal>
<core-size>4</core-size>
<max-size>4</max-size>
</kernal>
</thread-pool>
<io-strategy>leader-follower</io-strategy>
</rexpro>
<shutdown-port>7183</shutdown-port>
<shutdown-host>127.0.0.1</shutdown-host>
<graphs>
<graph>
<graph-name>graph</graph-name>
<graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
<graph-location>/tmp/titan</graph-location>
<graph-read-only>false</graph-read-only>
<properties>
<storage.hostname>ec2-52-22-199-210.amazonaws.com</storage.hostname>
<storage.backend>cassandra</storage.backend>
</properties>
<extensions>
<allows>
<allow>tp:gremlin</allow>
</allows>
</extensions>
</graph>
</graphs>
</rexster>

Scalatra app on Openshift - setting Jetty IP

I'm trying to deploy a minimal Scalatra application on Openshift with DIY cartridge. I've managed to get SBT working, but when it comes to container:start, I get the error:
FAILED SelectChannelConnector#0.0.0.0:8080: java.net.SocketException: Permission denied
Apparently, the embedded Jetty tries to open a socket at 0.0.0.0, which is prohibited by Openshift (you can only open ports on $OPENSHIFT_INTERNAL_IP). How can I tell Jetty exactly which IP it should listen on?
Yes, you are right about $OPENSHIFT_INTERNAL_IP. So edit ${jetty.home}/etc/jetty.xml and set jetty.host in the connector section as follows:
...
<Set name="connectors">
<Array type="org.mortbay.jetty.Connector">
<Item>
<New class="org.mortbay.jetty.nio.SelectChannelConnector">
<Set name="host"><SystemProperty name="jetty.host" />$OPENSHIFT_INTERNAL_IP</Set>
<Set name="port"><SystemProperty name="jetty.port" default="8080"/></Set>
...
</New>
</Item>
</Array>
</Set>
hth
I've never used Openshift, so I'm groping a bit here.
Do you have a jetty.host set?
You may need to set up a jetty.xml file and set it in there. See http://docs.codehaus.org/display/JETTY/Newbie+Guide+to+Jetty for how to set the host. You can tell the xsbt web plugin about jetty.xml by setting your project up like this:
https://github.com/JamesEarlDouglas/xsbt-web-plugin/wiki/Settings
Alternately, you may be able to pass the parameter to Jetty during startup. That'd look like this: -Djetty.host="yourhostname"
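For example, with a standalone Jetty whose jetty.xml reads the jetty.host system property (as in the snippet above), startup might look like this sketch:
java -Djetty.host=$OPENSHIFT_INTERNAL_IP -Djetty.port=8080 -jar start.jar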
To get running with Jetty 9.2.13.v20150730 on Openshift with the DIY cartridge, you have to run on Java 8 and bind to $OPENSHIFT_INTERNAL_IP, as follows. First ssh onto the host and download a JDK 8 with:
cd $OPENSHIFT_DATA_DIR
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u5-b13/jdk-8u5-linux-x64.tar.gz
tar -zxf jdk-8u5-linux-x64.tar.gz
export PATH=$OPENSHIFT_DATA_DIR/jdk1.8.0_05/bin:$PATH
export JAVA_HOME="$OPENSHIFT_DATA_DIR/jdk/jdk1.8.0_05"
java -version
Then in your .openshift/action_hooks/start ensure you have the same exported variables, with something like:
# see http://stackoverflow.com/a/23895161/329496 to install jdk1.8 no DIY cartridge
export JAVA_HOME="$OPENSHIFT_DATA_DIR/jdk/jdk1.8.0_05"
export PATH=$OPENSHIFT_DATA_DIR/jdk1.8.0_05/bin:$PATH
nohup java -cp ${OPENSHIFT_REPO_DIR}target/dependency/jetty-runner.jar org.eclipse.jetty.runner.Runner --host ${OPENSHIFT_DIY_IP} --port ${OPENSHIFT_DIY_PORT} ${OPENSHIFT_REPO_DIR}/target/thinbus-srp-spring-demo.war > ${OPENSHIFT_LOG_DIR}server.log 2>&1 &
(Note that jdk-8u20-linux-x64.tar.gz has also been reported to work so you may want to check for the latest available.)
That setup does not need a jetty.xml, as it sets --host and --port to bind to the correct interface and runs the built war file. What it does require is that jetty-runner.jar is copied out of the ivy cache into the target folder. With Maven, to do that you add something like:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>copy</goal>
</goals>
<configuration>
<artifactItems>
<artifactItem>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-runner</artifactId>
<version>${jetty.version}</version>
<destFileName>jetty-runner.jar</destFileName>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
Google suggests that the SBT equivalent is simply retrieveManaged := true. You can ssh to the host and run find to figure out where the jetty-runner.jar dependency has been copied, then update the start command accordingly.
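As a sketch of the SBT side (the jetty-runner version is an assumption; match it to your Jetty), the build.sbt additions might be:
retrieveManaged := true
libraryDependencies += "org.eclipse.jetty" % "jetty-runner" % "9.2.13.v20150730" % "runtime"
With retrieveManaged := true, sbt copies managed jars under lib_managed/, which is where find should turn up jetty-runner.jar.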

AppFabric ErrorCode<ERRCA0017><ES0006>:

I have installed AppFabric on the server. I have created a cluster of a single computer. I have also created a cache named "Gagan".
I used the following commands, in order:
Use-CacheCluster -Provider xml -ConnectionString \NB-GJANJUA\Cache
Start-CacheCluster
The result is that the cache service is up and running, so far so good.
I then setup my web.config file like below
<?xml version="1.0"?>
<configuration>
<configSections>
<section name="dataCacheClient"
type="Microsoft.ApplicationServer.Caching.DataCacheClientSection,
Microsoft.ApplicationServer.Caching.Core, Version=1.0.0.0,
Culture=neutral, PublicKeyToken=31bf3856ad364e35"
allowLocation="true"
allowDefinition="Everywhere"/>
</configSections>
<!-- cache client -->
<dataCacheClient>
<!-- cache host(s) -->
<hosts>
<host
name="NB-GJANJUA.com"
cachePort="22233"/>
</hosts>
</dataCacheClient>
<system.web>
<compilation debug="true" targetFramework="4.0" >
<assemblies>
<add assembly="System.Core, Version=3.5.0.0, Culture=neutral, PublicKeyToken=B77A5C561934E089"/>
<add assembly="System.Web.Extensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35"/>
<add assembly="System.Data.DataSetExtensions, Version=3.5.0.0, Culture=neutral, PublicKeyToken=B77A5C561934E089"/>
<add assembly="System.Xml.Linq, Version=3.5.0.0, Culture=neutral, PublicKeyToken=B77A5C561934E089"/>
<add assembly="Microsoft.ApplicationServer.Caching.Client, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
<add assembly="Microsoft.ApplicationServer.Caching.Core, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
</assemblies>
</compilation>
<sessionState mode="Custom" customProvider="SessionStore" cookieless="true">
<providers>
<add name="SessionStore" type="Microsoft.ApplicationServer.Caching.DataCacheSessionStoreProvider" cacheName="Gagan" />
</providers>
</sessionState>
</system.web>
<system.webServer>
<modules runAllManagedModulesForAllRequests="true"/>
</system.webServer>
</configuration>
But as soon as I launch my site, it comes up with this error:
Parser Error Message: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified Cache servers are unavailable, which could be caused by busy network or servers. Ensure that security permission has been granted for this client account on the cluster and that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Retry later.)
Source Error:
Line 44: <sessionState mode="Custom" customProvider="SessionStore" cookieless="true">
Line 45: <providers>
Line 46: <add name="SessionStore" type="Microsoft.ApplicationServer.Caching.DataCacheSessionStoreProvider" cacheName="Gagan" />
Line 47: </providers>
Line 48: </sessionState>
Is there something that I am missing?
Note: I have already referenced the
Microsoft.ApplicationServer.Caching.Client and
Microsoft.ApplicationServer.Caching.Core assemblies.
Thanks for your time and patience.
With regards,
Gagan Janjua
I was also having this error. Just to test the client in development, I switched off security using the following AppFabric PowerShell commands:
Stop-CacheCluster
Set-CacheClusterSecurity -SecurityMode None -ProtectionLevel None
Start-CacheCluster
Also set the following in the client application's web.config:
<dataCacheClient>
<securityProperties mode="None" protectionLevel="None"/>
</dataCacheClient>
This is not a production scenario, but the above error disappears when these settings are applied.
I had a similar issue, running IIS 7.5 on Windows Server 2008 R2. I resolved it by issuing the following commands in PowerShell (started from the Windows AppFabric folder in Start, All Programs):
New-Cache -CacheName NameOfCacheAsSetInWebConfig -TimeToLive 30
Grant-CacheAllowedClientAccount "IIS AppPool\NameOfAppPoolRunningSite"
Once I did that, I was all set.
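To verify the cache exists after creating it (run in the same Caching Administration PowerShell window), Get-Cache should list it:
Get-Cache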
Have you granted access to the cache for whatever user your website is running as?
Grant-CacheAllowedClientAccount Gagan
I solved this problem as follows:
1. Launch Windows Task Manager and note which User Name your w3wp.exe is running under. In my case it was: ASP.NET v4.0
2. Launch Start -> All Programs -> Windows Server App Fabric -> IIS Manager.
3. In IIS, select the machine name and then Application Pools on the top left-hand side.
4. Under Application Pools, verify that ASP.NET v4.0 exists.
5. Launch Start -> All Programs -> Windows Server App Fabric -> Caching Administration Windows PowerShell.
6. Type the following command at the prompt: Grant-CacheAllowedClientAccount "ASP.NET v4.0"
7. Restart the web application; the following error went away:
ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified Cache servers are unavailable, which could be caused by busy network or servers. Ensure that security permission has been granted for this client account on the cluster and that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Retry later.)
I had this problem and it was just that the cache cluster was down after a reboot. I didn't realize that you have to manually set the service to start automatically in Services.
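As a sketch (the service name below is my assumption for a default AppFabric install; check the Services console for the exact name), you can set the caching service to start automatically from PowerShell:
Set-Service -Name AppFabricCachingService -StartupType Automatic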
Commenting out the following in the config fixed it for me:
<sessionState customProvider="AppFabricCacheSessionStoreProvider" mode="Custom" timeout="90">
<providers>
<add name="AppFabricCacheSessionStoreProvider" type="Microsoft.ApplicationServer.Caching.DataCacheSessionStoreProvider" cacheName="Session" sharedId="SharedApp" />
</providers>
</sessionState>
By default the worker processes will be set up as an IIS user, and those users need access. In your Caching Administration Windows PowerShell, type the following:
Grant-CacheAllowedClientAccount IIS_IUSRS
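To confirm the grant afterwards, the same window can list the allowed accounts (cmdlet name as provided by the AppFabric caching module, to the best of my knowledge):
Get-CacheAllowedClientAccounts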