Ansible playbook with nested python scripts - python-2.7

I am trying to execute an Ansible playbook that uses the script module to run a custom Python script.
This custom Python script imports another Python script.
When the playbook runs, the ansible command fails while trying to import the util script. I am new to Ansible, please help!
helloWorld.yaml:
- hosts: all
  tasks:
    - name: Create a directory
      script: /ansible/ems/ansible-mw-tube/modules/createdirectory.py "{{arg1}}"
createdirectory.py -- Script configured in YAML playbook
#!/bin/python
import sys
import os
from hello import HelloWorld

class CreateDir:
    def create(self, dirName, HelloWorldContext):
        output = HelloWorld.createFolder(HelloWorldContext, dirName)
        print output
        return output

def main(dirName, HelloWorldContext):
    c = CreateDir()
    c.create(dirName, HelloWorldContext)

if __name__ == "__main__":
    HelloWorldContext = HelloWorld()
    main(sys.argv[1], HelloWorldContext)
HelloWorldContext = HelloWorld()
hello.py -- util script which is imported in the main script written above
#!/bin/python
import os
import sys
class HelloWorld:
    def createFolder(self, dirName):
        print dirName
        if not os.path.exists(dirName):
            os.makedirs(dirName)
        print dirName
        if os.path.exists(dirName):
            return "sucess"
        else:
            return "failure"
Ansible executable command
ansible-playbook -v -i /ansible/ems/ansible-mw-tube/inventory/helloworld_host /ansible/ems/ansible-mw-tube/playbooks/helloWorld.yml -e "arg1=/opt/logs/helloworld"
Ansible version
ansible --version
[WARNING]: log file at /opt/ansible/ansible.log is not writeable and we cannot create it, aborting
ansible 2.2.0.0
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides

The script module copies the script to the remote server and executes it there through the shell. It doesn't transfer the util script, because it has no way of knowing that createdirectory.py imports hello.py, so the import fails on the remote host.
You have several options, such as using copy to move both files to the server and shell to execute them. But since all you seem to be doing is creating a directory, the file module can do that for you with no scripts necessary.
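If you would rather keep using the script module, one more option is to make the script self-contained, so that only one file has to be transferred. The following is a minimal sketch of that idea, assuming the helper logic from hello.py can simply be inlined; the function name create_folder is hypothetical and not part of the original scripts.

```python
#!/usr/bin/env python
"""Self-contained variant of createdirectory.py (hypothetical sketch).

The helper from hello.py is inlined so the script module only has to
copy this one file to the remote host.
"""
import os
import sys


def create_folder(dir_name):
    """Create dir_name if it is missing and report the outcome."""
    if not os.path.exists(dir_name):
        os.makedirs(dir_name)
    return "success" if os.path.isdir(dir_name) else "failure"


# Only act when a directory argument was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(create_folder(sys.argv[1]))
```

The playbook task itself would stay unchanged, since the script path and the "{{arg1}}" argument are the same.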

Related

How to use DBT with AWS Managed Airflow?

Hope you are doing well.
I wanted to check whether anyone has gotten dbt up and running in AWS MWAA (Airflow).
I have tried this one and this Python package without success; each fails for one reason or another (can't find the dbt path, etc.).
Has anyone managed to use MWAA (Airflow 2) and dbt without having to build a Docker image and host it somewhere?
Thank you!
I've managed to solve this by doing the following steps:
Add dbt-core==0.19.1 to your requirements.txt
Add DBT cli executable into plugins.zip
#!/usr/bin/env python3
# EASY-INSTALL-ENTRY-SCRIPT: 'dbt-core==0.19.1','console_scripts','dbt'
__requires__ = 'dbt-core==0.19.1'
import re
import sys
from pkg_resources import load_entry_point
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(
        load_entry_point('dbt-core==0.19.1', 'console_scripts', 'dbt')()
    )
And from here you have two options:
Setting dbt_bin operator argument to /usr/local/airflow/plugins/dbt
Add /usr/local/airflow/plugins/ to the $PATH by following the docs
Environment variable setter example:
from airflow.plugins_manager import AirflowPlugin
import os
os.environ["PATH"] = os.getenv(
    "PATH") + ":/usr/local/airflow/.local/lib/python3.7/site-packages:/usr/local/airflow/plugins/"

class EnvVarPlugin(AirflowPlugin):
    name = 'env_var_plugin'
The plugins zip content:
plugins.zip
├── dbt (DBT cli executable)
└── env_var_plugin.py (environment variable setter)
Using the PyPI airflow-dbt-python package has simplified setting up dbt on MWAA for us, as it avoids needing to amend PATH environment variables in the plugins file. However, I've yet to have a successful dbt run via either the airflow-dbt or airflow-dbt-python packages, as the MWAA worker seems to have a read-only filesystem, so as soon as dbt tries to compile to the target directory, the following error occurs:
  File "/usr/lib64/python3.7/os.py", line 223, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/usr/local/airflow/dags/dbt/target'
This is how I managed to do it:
import os
from airflow.decorators import dag, task

# default_args is assumed to be defined elsewhere in the DAG file
@dag(**default_args)
def dbt_dag():
    @task()
    def run_dbt():
        from dbt.main import handle_and_check
        # Point dbt at writable locations before it starts
        os.environ["DBT_TARGET_DIR"] = "/usr/local/airflow/tmp/target"
        os.environ["DBT_LOG_DIR"] = "/usr/local/airflow/tmp/logs"
        os.environ["DBT_PACKAGE_DIR"] = "/usr/local/airflow/tmp/packages"
        succeeded = True
        try:
            args = ['run', '--whatever', 'bla']
            results, succeeded = handle_and_check(args)
            print(results, succeeded)
        except SystemExit as e:
            if e.code != 0:
                raise e
        if not succeeded:
            raise Exception("DBT failed")
Note that my dbt_project.yml has the following paths; this avoids an OSError when dbt tries to write to read-only paths:
target-path: "{{ env_var('DBT_TARGET_DIR', 'target') }}" # directory which will store compiled SQL files
log-path: "{{ env_var('DBT_LOG_DIR', 'logs') }}" # directory which will store dbt logs
packages-install-path: "{{ env_var('DBT_PACKAGE_DIR', 'packages') }}" # directory which will store dbt packages
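The environment-variable approach can be wrapped in a small helper that also creates the directories before dbt runs. This is only a sketch: the function name point_dbt_at_writable_dirs and the use of a temporary base directory are assumptions for illustration, while the three variable names are the ones read by env_var() in the dbt_project.yml lines above.

```python
import os
import tempfile


def point_dbt_at_writable_dirs(base_dir):
    """Set the DBT_* variables read by dbt_project.yml and create the dirs.

    base_dir is assumed to be writable by the worker (hypothetical choice).
    """
    paths = {
        "DBT_TARGET_DIR": os.path.join(base_dir, "target"),
        "DBT_LOG_DIR": os.path.join(base_dir, "logs"),
        "DBT_PACKAGE_DIR": os.path.join(base_dir, "packages"),
    }
    for var, path in paths.items():
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        os.environ[var] = path
    return paths


# Demonstration against a throwaway temporary directory.
paths = point_dbt_at_writable_dirs(tempfile.mkdtemp())
```

On MWAA the base directory would be whatever writable location you settle on, such as the /usr/local/airflow/tmp path used in the answer.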
Combining the answers from @Yonatan Kiron and @Ofer Helman works for me.
I just needed to fix these 3 files:
requirements.txt
plugins.zip
dbt_project.yml
In my requirements.txt I specify the versions I want, and it looks like this:
airflow-dbt==0.4.0
dbt-core==1.0.1
dbt-redshift==1.0.0
Note that, as of v1.0.0, pip install dbt is no longer supported and will raise an explicit error. Since v0.13, the PyPI package named dbt was a simple "pass-through" of dbt-core. (See https://docs.getdbt.com/dbt-cli/install/pip#install-dbt-core-only)
For my plugins.zip I add a file env_var_plugin.py that looks like this
from airflow.plugins_manager import AirflowPlugin
import os
os.environ["DBT_LOG_DIR"] = "/usr/local/airflow/tmp/logs"
os.environ["DBT_PACKAGE_DIR"] = "/usr/local/airflow/tmp/dbt_packages"
os.environ["DBT_TARGET_DIR"] = "/usr/local/airflow/tmp/target"
class EnvVarPlugin(AirflowPlugin):
    name = 'env_var_plugin'
And finally I add this in my dbt_project.yml
log-path: "{{ env_var('DBT_LOG_DIR', 'logs') }}" # directory which will store dbt logs
packages-install-path: "{{ env_var('DBT_PACKAGE_DIR', 'dbt_packages') }}" # directory which will store dbt packages
target-path: "{{ env_var('DBT_TARGET_DIR', 'target') }}" # directory which will store compiled SQL files
And as stated in the airflow-dbt github, (https://github.com/gocardless/airflow-dbt#amazon-managed-workflows-for-apache-airflow-mwaa) configure the dbt task like below:
dbt_bin='/usr/local/airflow/.local/bin/dbt',
profiles_dir='/usr/local/airflow/dags/{DBT_FOLDER}/',
dir='/usr/local/airflow/dags/{DBT_FOLDER}/'

Error while triggering a python script using os.system. Script takes sys.argv arguments

Simple script1.py that takes arguments and calls script2.py by passing them to os.system() :
#! /usr/bin/env python
import os
import sys
os.system("script2.py sys.argv[1] sys.argv[2]")
Running this :
./script1.py "arg1" "arg2"
Getting this single error :
sh: 1: script2.py: not found
Both scripts are present in the same directory.
I applied chmod 777 to both script1.py and script2.py, so both are executable.
Both scripts call the same interpreter installed at /usr/bin/env python.
When I try these :
os.system("./script2.py sys.argv[1] sys.argv[2]")
os.system("python script2.py sys.argv[1] sys.argv[2]")
With these, sys.argv[1] and sys.argv[2] are treated as literal strings instead of being replaced by the argument values.
Have you tried with:
./script2.py "arg1" "arg2"
inside the os.system call?
UPDATE 2
Try with:
import os
import sys
# Substitute the real argument values into the command string
call_with_args = "./script2.py '%s' '%s'" % (sys.argv[1], sys.argv[2])
os.system(call_with_args)
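If the arguments can contain quotes or spaces, building a shell string by hand gets fragile. A sketch of an alternative, assuming the standard-library subprocess module is acceptable here: pass the arguments as a list, so no shell quoting is involved. The helper name run_script and the inline child program that stands in for script2.py are hypothetical.

```python
import subprocess
import sys


def run_script(script_args):
    """Run a child Python process with the given argument list; return its stdout."""
    # Each list element reaches the child as one argv entry, untouched by a shell.
    return subprocess.check_output([sys.executable] + list(script_args)).decode()


# Stand-in for script2.py: a one-line child program that echoes its arguments.
child = "import sys; print('|'.join(sys.argv[1:]))"
output = run_script(["-c", child, "arg 1", "it's arg2"])
print(output.strip())
```

In script1.py the same idea would look like subprocess.call(["./script2.py", sys.argv[1], sys.argv[2]]), which also sidesteps the "not found" error because the path is given explicitly.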

PyCharm can't find 'SPARK_HOME' when imported from a different file

I've two files.
test.py
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark import SQLContext
class Connection():
    conf = SparkConf()
    conf.setMaster("local")
    conf.setAppName("Remote_Spark_Program - Leschi Plans")
    conf.set('spark.executor.instances', 1)
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    print ('all done.')

con = Connection()
test_test.py
from test import Connection
sparkConnect = Connection()
When I run test.py the connection is made successfully, but running test_test.py gives:
raise KeyError(key)
KeyError: 'SPARK_HOME'
The KeyError is raised when SPARK_HOME is not set in the environment of the running process. So it's better to add it to your bashrc, and to check it and set up the paths in your code. Add this at the top of your test.py:
import os
import sys

# Read the Spark root path; fail early with a clear message if it is missing
SPARK_HOME = os.environ.get('SPARK_HOME', None)
if SPARK_HOME is None:
    raise EnvironmentError("SPARK_HOME is not set")

# Add PySpark/py4j to the Python path before importing pyspark
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))

pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
if "pyspark-shell" not in pyspark_submit_args:
    pyspark_submit_args += " pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

import pyspark
from pyspark import SparkContext, SparkConf, SQLContext
Also add the following at the end of your ~/.bashrc file (open it with vim ~/.bashrc if you are using a Linux-based OS):
# needed for Apache Spark
export SPARK_HOME="/opt/spark"
export IPYTHON="1"
export PYSPARK_PYTHON="/usr/bin/python3"
export PYSPARK_DRIVER_PYTHON="ipython3"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH"
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
export CLASSPATH="$CLASSPATH:/opt/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar"
Note:
In the bashrc code above, I have set SPARK_HOME to /opt/spark; use the location where you keep your Spark folder (the one downloaded from the website).
Also, I'm using python3; change it to python in the bashrc if you are using Python 2.x.
I was using IPython for easy testing at runtime, e.g. loading the data once and testing the code many times. If you are using a plain text editor, let me know and I will update the bashrc accordingly.
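When a script is launched from an IDE, the shell's ~/.bashrc may not have been read, so the variable can still be missing even after the export is added. A minimal sketch of a fallback set from Python, assuming /opt/spark (the path from the bashrc above) is where Spark lives; the helper name ensure_spark_home is hypothetical.

```python
import os


def ensure_spark_home(default="/opt/spark"):
    """Return SPARK_HOME, setting it to `default` only if it is not already set."""
    # setdefault leaves an existing value untouched.
    return os.environ.setdefault("SPARK_HOME", default)


spark_home = ensure_spark_home()
print("SPARK_HOME =", spark_home)
```

This has to run before the first pyspark object is created, because that is the point at which the environment variable is looked up.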

ConfigParser does not load sections when run via crontab

When I run the Python script from the command line, everything works just fine, but when the script is run from cron, ConfigParser returns an empty list of sections:
me = singleton.SingleInstance()
######### Accessing the configuration file #######################################
config = ConfigParser.RawConfigParser()
config.read('./matrix.cfg')
sections = config.sections()
######### Build the current map ##################################################
print sections
Here is the cron job
* * * * * /usr/bin/python /etc/portmatrix/matrix.py | logger
and here is the output
Feb 12 12:59:01 dns01 CRON[30879]: (root) CMD (/usr/bin/python /etc/portmatrix/matrix.py | logger)
Feb 12 12:59:01 dns01 logger: []
ConfigParser tries to read the file ./matrix.cfg.
The path ./ means the current working directory of the process.
So what assumption do you make about the current directory when the script is run from cron? (I guess you have a file /etc/portmatrix/matrix.cfg and you assume that ./ means "in the same directory as the running script"; this, however, is not true.)
The simple fix is to provide the full path to the configuration file. E.g.:
config = ConfigParser.RawConfigParser()
config.read('/etc/portmatrix/matrix.cfg')
I ran into a similar situation and I was able to fix it thanks to umläute's answer.
I'm using os however, so in your case, this would be something like:
import os
...
base_path = os.path.dirname(os.path.realpath(__file__))
config = ConfigParser.RawConfigParser()
config.read(os.path.join(base_path, 'matrix.cfg'))
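One reason this failure is easy to miss is that read() does not raise when a file is missing; it returns the list of files it managed to parse. The sketch below uses the Python 3 configparser module (the code in the question uses the Python 2 ConfigParser name), and the helper load_config is hypothetical.

```python
import configparser
import os


def load_config(path):
    """Read an INI file and raise if it could not be found or parsed."""
    config = configparser.RawConfigParser()
    loaded = config.read(path)  # list of files that were successfully read
    if not loaded:
        raise IOError("config file not found: %s" % os.path.abspath(path))
    return config


# A missing file is silently skipped by read(), so sections() is just empty.
probe = configparser.RawConfigParser()
print(probe.read("./does-not-exist.cfg"), probe.sections())
```

Failing loudly like this would have put the resolved absolute path in the cron log instead of an empty list.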

Django timer thread

I would like to compute some information in my Django application on a regular basis.
I need to select and insert data each second and want to use Django ORM.
How can I do this?
In a shell script, set the DJANGO_SETTINGS_MODULE variable and call a python script
export DJANGO_SETTINGS_MODULE=yourapp.settings
python compute_some_info.py
In compute_some_info.py, set up django and import your modules (look at how the manage.py script sets up to run Django)
#!/usr/bin/env python
import sys
try:
    import settings # Assumed to be in the same directory.
except ImportError:
    sys.stderr.write("Error: Can't find the file 'settings.py'")
    sys.exit(1)
sys.path = sys.path + ['/yourapphome']
from yourapp.models import YourModel
YourModel.compute_some_info()
Then call your shell script in a cron job.
Alternatively, you can keep one process running and sleeping between iterations (better if the job has to run every second). You would still want this to live outside the web server, in its own process that is set up this way.
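The run-and-sleep variant could be sketched as follows. This is an illustration under assumptions, not the answerer's code: run_every is a hypothetical helper, and the lambda stands in for a call such as YourModel.compute_some_info(). It schedules each run against a fixed timeline so that the time spent in the task does not accumulate as drift.

```python
import time


def run_every(interval, task, iterations):
    """Call task() `iterations` times, aiming for one call per `interval` seconds."""
    next_run = time.monotonic()
    for _ in range(iterations):
        task()
        # Schedule against the original timeline so task duration adds no drift.
        next_run += interval
        delay = next_run - time.monotonic()
        if delay > 0:
            time.sleep(delay)


# Short interval and a few iterations, purely for demonstration.
calls = []
run_every(0.01, lambda: calls.append(time.monotonic()), iterations=3)
print(len(calls))
```

For the real job the interval would be 1 and the loop would run indefinitely, inside the standalone process described above.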
One way to do it would be to create a custom command, and invoke python manage.py your_custom_command from cron or windows scheduler.
http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
For example, create myapp/management/commands/myapp_task.py which reads:
from django.core.management.base import NoArgsCommand
class Command(NoArgsCommand):
    def handle_noargs(self, **options):
        print 'Doing task...'
        # invoke the functions you need to run on your project here
        print 'Done'
Then you can run it from cron like this:
export DJANGO_SETTINGS_MODULE=myproject.settings; export PYTHONPATH=/path/to/project_parent; python manage.py myapp_task