Incorrect filesystem path when running an Oozie script from the terminal - HDFS

So I have defined an Oozie workflow in Cloudera that is supposed to move a file "/user/petter/file.txt" to another location on HDFS.
I have then defined job.properties as:
```
emailTo=petter.hultin@blabla.com
oozie.wf.application.path=hdfs:///user/petter/workflowdef.xml
oozieLauncherJavaOpts=-Xmx1500m
```
I run from the terminal:
```
oozie job -oozie http://oozienode:11000/oozie -config job.properties -run
```
but the job fails with
```
Cannot access: /user/hue/oozie/workspaces/hue-oozie-1452553957.19/hdfs://${nameNode}/user/file.txt
```
How do I specify the absolute HDFS path for an Oozie script, i.e. so that it doesn't look into /user/hue...?
The workflowdef.xml is:
```
<workflow-app name="blabla" xmlns="uri:oozie:workflow:0.5">
    <start to="fork-7f89"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="fs-8f6b">
        <fs>
            <move source='hdfs://${nameNode}/user/petter/file.txt' target='/user/petter/anotherlocation/file.txt'/>
        </fs>
```
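The error suggests that ${nameNode} is never substituted, so the literal string is treated as a path relative to the Hue workspace. As a sketch of one likely fix (the NameNode host and port below are placeholders), define nameNode in job.properties and reference paths as ${nameNode}/..., without repeating the hdfs:// scheme in the workflow, since the variable already carries it:
```
# job.properties -- sketch; replace namenodehost:8020 with your actual NameNode address
nameNode=hdfs://namenodehost:8020
oozie.wf.application.path=${nameNode}/user/petter/workflowdef.xml
```
With that in place, the move source in workflowdef.xml would be written as source='${nameNode}/user/petter/file.txt'.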

Related

Wazuh syscheck agent SQL error on CentOS 7: FIM is not working

I have Wazuh v3.13.3 installed on CentOS 7.
My syscheck module configuration:
```
<syscheck>
  <disabled>no</disabled>
  <!-- Frequency that syscheck is executed default every 12 hours -->
  <frequency>43200</frequency>
  <scan_on_start>yes</scan_on_start>
  <alert_new_files>yes</alert_new_files>
  <!-- Directories to check (perform all possible verifications) -->
  <directories check_all="yes">/etc,/usr/bin,/usr/sbin</directories>
  <directories check_all="yes">/bin,/sbin,/boot</directories>
  <directories check_all="yes" realtime="yes">/root</directories>
  <!-- Files/directories to ignore -->
  <ignore>/etc/mtab</ignore>
  <ignore>/etc/hosts.deny</ignore>
  <ignore>/etc/mail/statistics</ignore>
  <ignore>/etc/random-seed</ignore>
  <ignore>/etc/random.seed</ignore>
  <ignore>/etc/adjtime</ignore>
  <ignore>/etc/httpd/logs</ignore>
  <ignore>/etc/utmpx</ignore>
  <ignore>/etc/wtmpx</ignore>
  <ignore>/etc/cups/certs</ignore>
  <ignore>/etc/dumpdates</ignore>
  <ignore>/etc/svc/volatile</ignore>
  <ignore>/sys/kernel/security</ignore>
  <ignore>/sys/kernel/debug</ignore>
  <ignore>/dev/core</ignore>
  <!-- File types to ignore -->
  <ignore type="sregex">^/proc</ignore>
  <ignore type="sregex">.log$|.swp$</ignore>
  <!-- Check the file, but never compute the diff -->
  <nodiff>/etc/ssl/private.key</nodiff>
  <skip_nfs>yes</skip_nfs>
</syscheck>
```
Adding a new file to the /root directory:
```
[root@host ossec]# date; echo "date" > ~/newfile.txt
Sat May 7 17:01:48 UTC 2022
```
Agent log messages:
```
2022/05/07 17:01:48 ossec-syscheckd[26052] fim_db.c:558 at fim_db_exec_simple_wquery(): ERROR: SQL ERROR: cannot commit - no transaction is active
2022/05/07 17:01:48 ossec-syscheckd[26052] fim_db.c:558 at fim_db_exec_simple_wquery(): ERROR: SQL ERROR: cannot commit - no transaction is active
2022/05/07 17:01:48 ossec-syscheckd[26052] fim_db.c:558 at fim_db_exec_simple_wquery(): ERROR: SQL ERROR: cannot commit - no transaction is active
2022/05/07 17:01:48 ossec-syscheckd: ERROR: SQL ERROR: (8)attempt to write a readonly database
2022/05/07 17:01:48 ossec-syscheckd: ERROR: SQL ERROR: (8)attempt to write a readonly database
```
and I see no messages about the new file in the logs.
The infrastructure is too big to upgrade to Wazuh 4.x.
How can I solve this issue?
Thank you.
The message ERROR: SQL ERROR: (8)attempt to write a readonly database indicates either a problem with database permissions or that the FIM database fim.db does not exist. Please check that the following files exist on the agent and have the following permissions, user, and group:
```
[drwxr-x--- ossec ossec ] /var/ossec/queue/fim
[drwxr-x--- ossec ossec ] /var/ossec/queue/fim/db
[-rw-rw---- root  ossec ] /var/ossec/queue/fim/db/fim.db
[-rw-rw---- root  ossec ] /var/ossec/queue/fim/db/fim.db-journal
```
If fim.db does not exist, the agent recreates it when it is restarted.
If the fim/ or fim/db/ directories do not exist, create them with mkdir, assign them the ownership and permissions shown above (drwxr-x--- ossec ossec), and then restart the agent.
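A minimal sketch of that repair, assuming a default /var/ossec install prefix and a systemd-managed wazuh-agent service:
```
# Recreate the FIM directories with the expected ownership and mode (drwxr-x---),
# then restart the agent so it rebuilds fim.db on startup.
mkdir -p /var/ossec/queue/fim/db
chown ossec:ossec /var/ossec/queue/fim /var/ossec/queue/fim/db
chmod 750 /var/ossec/queue/fim /var/ossec/queue/fim/db
systemctl restart wazuh-agent
```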

bash: spark-submit: command not found while executing a DAG in AWS Managed Apache Airflow

I have to run a Spark job (I am new to Spark) and I am getting the following error:
```
[2022-02-16 14:47:45,415] {{bash.py:135}} INFO - Tmp dir root location: /tmp
[2022-02-16 14:47:45,416] {{bash.py:158}} INFO - Running command: spark-submit --class org.xyz.practice.driver.PractitionerDriver s3://pfdt-poc-temp/xyz_test/org.xyz.spark-xy_mvp-1.0.0-SNAPSHOT.jar
[2022-02-16 14:47:45,422] {{bash.py:169}} INFO - Output:
[2022-02-16 14:47:45,423] {{bash.py:173}} INFO - bash: spark-submit: command not found
[2022-02-16 14:47:45,423] {{bash.py:177}} INFO - Command exited with return code 127
[2022-02-16 14:47:45,437] {{taskinstance.py:1482}} ERROR - Task failed with exception
```
What has to be done? Here is the relevant DAG code:
```
def run_spark(**kwargs):
    import pyspark
    sc = pyspark.SparkContext()
    df = sc.textFile('s3://demoairflowpawan/people.txt')
    logging.info('Number of lines in people.txt = {0}'.format(df.count()))
    sc.stop()

spark_task = BashOperator(
    task_id='spark_java',
    bash_command='spark-submit --class {{ params.class }} {{ params.jar }}',
    params={'class': 'org.xyz.practice.driver.PractitionerDriver',
            'jar': 's3://pfdt-poc-temp/xyz_test/org.xyz.spark-xy_mvp-1.0.0-SNAPSHOT.jar'},
    dag=dag
)
```
The question is: why do you expect spark-submit to be there?
If you created the default Airflow pods, then they come with the Airflow code only.
You can check an example for Spark and Airflow here - https://medium.com/codex/executing-spark-jobs-with-apache-airflow-3596717bbbe3 - where they state specifically that "Spark binaries must be added and mapped".
So you need to figure out how to download the Spark binaries to the existing Airflow pod, roughly along these lines:
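A hedged sketch of that download, assuming Spark 3.1.2 and an /opt install prefix (both placeholders to match to your environment), run inside the worker image or its Dockerfile:
```
# Fetch Spark binaries and put spark-submit on the PATH.
curl -fsSL https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz -o /tmp/spark.tgz
tar -xzf /tmp/spark.tgz -C /opt
export SPARK_HOME=/opt/spark-3.1.2-bin-hadoop3.2
export PATH="$SPARK_HOME/bin:$PATH"
spark-submit --version   # should now resolve instead of exiting with 127
```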
Alternatively, you can create another k8s job which does the spark-submit, and have your DAG activate that job.
Sorry for the high-level answer...

Executing HiveQL in an EMR cluster

I have created an EMR cluster through the AWS CLI:
```
aws emr create-cluster --applications Name=Hive Name=HBase Name=Hue Name=Hadoop Name=ZooKeeper \
  --tags Name="EMR-Atlas" --release-label emr-5.16.0 \
  --ec2-attributes SubnetId=subnet-xxxxx,KeyName=atlas-emr-dif \
  --use-default-roles --ebs-root-volume-size 100 \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.xlarge \
                    InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.xlarge \
  --log-uri s3://xxx/logs/new-log \
  --steps Name="Run Remote Script",Jar=command-runner.jar,Args=[bash,-c,"curl https://s3.amazonaws.com/aws-bigdata-blog/artifacts/aws-blog-emr-atlas/apache-atlas-emr.sh -o /tmp/script.sh; chmod +x /tmp/script.sh; /tmp/script.sh"]
```
Then I established an SSH tunnel for Hue:
```
ssh -L 8888:localhost:8888 -i key.pem hadoop@<EMR Master IP Address>
```
I created a Hive table through Hue:
```
CREATE EXTERNAL TABLE us_disease
(
    YearStart int,
    StratificationCategory2 string,
    GeoLocation string,
    ResponseID string,
    LocationID int,
    TopicID string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://XXXX/data/USHealthcare/'
TBLPROPERTIES ("skip.header.line.count"="1");
```
I am able to fetch records with a SELECT statement through Hue.
But if I try to execute the SELECT statement through an HQL script, it fails.
I tried it in the following way: my HQL is a plain SELECT statement,
```
select * from us_disease limit 10;
```
and I have stored it in S3 as hive.hql.
I then executed the HQL through a step in the EMR cluster.
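For reference, a Hive step of that kind can be submitted from the CLI roughly like this (the cluster ID is a placeholder; the script path is taken from the log below):
```
aws emr add-steps --cluster-id j-XXXXXXXX \
  --steps Type=Hive,Name="Run hive.hql",ActionOnFailure=CONTINUE,Args=[-f,s3://dif-test/data-governance/hql/hive.hql]
```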
Log:
```
INFO redirectError to /mnt/var/log/hadoop/steps/s-xxxxxxxx/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-xxxxxxxx
INFO ProcessRunner started child process 30597 :
hadoop 30597 5505 0 11:40 ? 00:00:00 bash /usr/lib/hadoop/bin/hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar hive-script --run-hive-script --args -f s3://dif-test/data-governance/hql/hive.hql
2021-03-30T11:40:36.318Z INFO HadoopJarStepRunner.Runner: startRun() called for s-xxxxxxxx Child Pid: 30597
INFO Synchronously wait child process to complete : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO waitProcessCompletion ended with exit code 127 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 2 seconds
2021-03-30T11:40:36.437Z INFO Step created jobs:
2021-03-30T11:40:36.438Z WARN Step failed with exitCode 127 and took 2 seconds
stderr:
/usr/lib/hadoop/bin/hadoop: line 169: /etc/alternatives/jre/bin/java: No such file or directory
```
Any help appreciated. Thank you.
The issue got fixed after I updated the EMR version. Previously I was using emr-5.16.0; I changed to emr-5.32.0.
Modified code:
```
aws emr create-cluster --applications Name=Hive Name=HBase Name=Hue Name=Hadoop Name=ZooKeeper \
  --tags Name="EMR-Atlas" --release-label emr-5.32.0 \
  --ec2-attributes SubnetId=subnet-xxxx,KeyName=atlas-emr-dif \
  --use-default-roles --ebs-root-volume-size 100 \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge \
                    InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
  --log-uri s3://xxx/xxx/new-log \
  --steps Name="Run Remote Script",Jar=command-runner.jar,Args=[bash,-c,"curl https://s3.amazonaws.com/aws-bigdata-blog/artifacts/aws-blog-emr-atlas/apache-atlas-emr.sh -o /tmp/script.sh; chmod +x /tmp/script.sh; /tmp/script.sh"]
```
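The stderr above points at a broken JRE symlink rather than at Hive itself; on a live cluster, a quick check on the master node would look something like this (a sketch):
```
# Verify the JRE path that the hadoop wrapper script expects actually exists.
ls -l /etc/alternatives/jre/bin/java
java -version
```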

Need to remove whitespace and the PS dir when running from Ant using filterchain and a replaceregex pattern

Here is a sample log file that I am trying to parse through:
```
Added Change Sets
Component PS
9476: Build changes to make for Ant task [Nov 12, 2015 12:02 PM]
Work Item 9476: Build changes to make for Ant task
    /PS/build/AntTaskHelper.xml
9582: Testing for EBF and migration script changes [Nov 12, 2015 12:02 PM]
Work Item 9582: Testing for EBF and migration script changes
    /PS/database/ebf-migration/EBF-RTC-9582.sql
    /PS/database/sif-internal-migration-scripts/RTC-9582.sql
9583: PKB PKG and Image File testing [Nov 12, 2015 12:02 PM]
Work Item 9583: PKB PKG and Image File testing
    /PS/database/src/program-units/RTC-9583-PKG_CDT.pkb
    /templates/Images/RTC-9583-ABAKER.TIF
    /templates/Templates/RTC-9583-A100_1_20090101.xdp
```
Ultimately I need the results to show the following:
```
/database/ebf-migration/EBF-RTC-9582.sql
/database/sif-internal-migration-scripts/RTC-9582.sql
/database/src/program-units/RTC-9583-PKG_CDT.pkb
/templates/Images/RTC-9583-ABAKER.TIF
/templates/Templates/RTC-9583-A100_1_20090101.xdp
```
My regular expression works perfectly well in a sample regex tester, but the output is not quite what I need when it runs in the build.
Here's my target:
```
<target name="Parse">
    <loadfile property="textFile" srcfile="${deployDir}\buildChanges1.txt">
        <filterchain>
            <linecontainsregexp>
                <regexp pattern="((/database/(ebf-migration|sif-internal-migration-scripts/|src/program-units/))|(/templates/)).*" />
            </linecontainsregexp>
            <replaceregex pattern="((/database/(ebf-migration|sif-internal-migration-scripts/|src/program-units/))|(/templates/)).*" replace="\0"/>
        </filterchain>
    </loadfile>
    <echo message="value based on regex = ${textFile}"/>
</target>
```
Here's the output from the build:
```
Parse:
[echo] value based on regex = /PS/database/ebf-migration/EBF-RTC-9582.sql
[echo] /PS/database/sif-internal-migration-scripts/RTC-9582.sql
[echo] /PS/database/src/program-units/RTC-9583-PKG_CDT.pkb
[echo] /templates/Images/RTC-9583-ABAKER.TIF
[echo] /templates/Templates/RTC-9583-A100_1_20090101.xdp
```
Any help on getting this to run would be greatly appreciated.
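For comparison, the intended transformation (strip leading whitespace and the /PS component, keep the rest of the path) can be sketched outside Ant with sed; the alternation below is an assumption based on the sample data:
```
# Print only the path lines, dropping indentation and an optional leading /PS.
sed -nE 's@^[[:space:]]*(/PS)?(/(database|templates)/.*)@\2@p' buildChanges1.txt
```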

knife bootstrap returning error

Running:
```
knife bootstrap ec2-54-221-16-158.compute-1.amazonaws.com --sudo -x chef -P chef -N server --run-list 'role[inicial]'
```
My recipes/default.rb:
script "teste de script" do
interpreter "bash"
cwd "/home/ubuntu"
code <<-EOH
as-create-launch-config LcTiagoN --image-id ami-0521316c --instance-type t1.micro --key tiagov
EOH
end
My roles/inicial.rb:
```
name "inicial"
run_list "recipe[my_cookbook]"
```
The following error occurs:
```
ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '127'
---- Begin output of "bash" "/tmp/chef-script20140501-8463-12uvvvl" ----
STDOUT:
STDERR: /tmp/chef-script20140501-8463-12uvvvl: line 1: as-create-launch-config: command not found
---- End output of "bash" "/tmp/chef-script20140501-8463-12uvvvl" ----
```
However, when I run the same command (as-create-launch-config LcTiagoN --image-id ami-0521316c --instance-type t1.micro --key tiagov) directly while logged in to the Amazon instance, it executes successfully.
Any suggestions?
Sounds like a problem with the PATH environment. Did you log in as "chef" when running the as-create-launch-config command manually?
The best advice I can offer is to include the full path to the command in the bash script. For example:
script "teste de script" do
..
code <<-EOH
/path/to/this/cmd/as-create-launch-config ...
EOH
end
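One way to find that absolute path is a quick check on the instance (a sketch; run it as the user for whom the command already works):
```
# Locate the command, then hard-code the reported path in the recipe.
which as-create-launch-config
echo "$PATH"   # compare with the PATH the Chef run sees
```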