How to use DBT with AWS Managed Airflow?

Hope you are doing well.
I wanted to check if anyone has gotten dbt up and running with AWS MWAA (Airflow).
I have tried this one and this Python package without success, but each fails for one reason or another (can't find the dbt path, etc.).
Has anyone managed to use MWAA (Airflow 2) and DBT without having to build a Docker image and placing it somewhere?
Thank you!

I've managed to solve this by doing the following steps:
Add dbt-core==0.19.1 to your requirements.txt
Add the dbt CLI executable into plugins.zip:
#!/usr/bin/env python3
# EASY-INSTALL-ENTRY-SCRIPT: 'dbt-core==0.19.1','console_scripts','dbt'
__requires__ = 'dbt-core==0.19.1'
import re
import sys

from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(
        load_entry_point('dbt-core==0.19.1', 'console_scripts', 'dbt')()
    )
And from here you have two options:
Set the dbt_bin operator argument to /usr/local/airflow/plugins/dbt
Add /usr/local/airflow/plugins/ to the $PATH by following the docs
Environment variable setter example:
from airflow.plugins_manager import AirflowPlugin
import os

os.environ["PATH"] = os.getenv(
    "PATH") + ":/usr/local/airflow/.local/lib/python3.7/site-packages:/usr/local/airflow/plugins/"


class EnvVarPlugin(AirflowPlugin):
    name = 'env_var_plugin'
The plugins.zip content:
plugins.zip
├── dbt (DBT cli executable)
└── env_var_plugin.py (environment variable setter)
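To build the archive, both files go at the root of the zip; for example, run from the directory containing them:
zip plugins.zip dbt env_var_plugin.py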

Using the PyPI airflow-dbt-python package has simplified the setup of dbt on MWAA for us, as it avoids needing to amend the PATH environment variable in the plugins file. However, I've yet to have a successful dbt run via either the airflow-dbt or airflow-dbt-python package, as the MWAA worker filesystem appears to be read-only: as soon as dbt tries to compile to the target directory, the following error occurs:
File "/usr/lib64/python3.7/os.py", line 223, in makedirs
mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/usr/local/airflow/dags/dbt/target'

This is how I managed to do it:
import os

from airflow.decorators import dag, task  # Airflow 2 TaskFlow API


@dag(**default_args)  # default_args defined elsewhere in the DAG file
def dbt_dag():
    @task()
    def run_dbt():
        from dbt.main import handle_and_check

        os.environ["DBT_TARGET_DIR"] = "/usr/local/airflow/tmp/target"
        os.environ["DBT_LOG_DIR"] = "/usr/local/airflow/tmp/logs"
        os.environ["DBT_PACKAGE_DIR"] = "/usr/local/airflow/tmp/packages"
        succeeded = True
        try:
            args = ['run', '--whatever', 'bla']
            results, succeeded = handle_and_check(args)
            print(results, succeeded)
        except SystemExit as e:
            if e.code != 0:
                raise e
        if not succeeded:
            raise Exception("DBT failed")

    run_dbt()


dag = dbt_dag()  # instantiate the DAG so Airflow picks it up
Note that my dbt_project.yml has the following paths; this is to avoid an OS exception when dbt tries to write to read-only paths:
target-path: "{{ env_var('DBT_TARGET_DIR', 'target') }}" # directory which will store compiled SQL files
log-path: "{{ env_var('DBT_LOG_DIR', 'logs') }}" # directory which will store dbt logs
packages-install-path: "{{ env_var('DBT_PACKAGE_DIR', 'packages') }}" # directory which will store dbt packages

Combining the answers from @Yonatan Kiron and @Ofer Helman worked for me.
I just needed to fix these 3 files:
requirements.txt
plugins.zip
dbt_project.yml
In my requirements.txt I specify the versions I want; it looks like this:
airflow-dbt==0.4.0
dbt-core==1.0.1
dbt-redshift==1.0.0
Note that, as of v1.0.0, pip install dbt is no longer supported and will raise an explicit error. Since v0.13, the PyPI package named dbt was a simple "pass-through" of dbt-core. (See https://docs.getdbt.com/dbt-cli/install/pip#install-dbt-core-only.)
For my plugins.zip I add a file env_var_plugin.py that looks like this:
from airflow.plugins_manager import AirflowPlugin
import os

os.environ["DBT_LOG_DIR"] = "/usr/local/airflow/tmp/logs"
os.environ["DBT_PACKAGE_DIR"] = "/usr/local/airflow/tmp/dbt_packages"
os.environ["DBT_TARGET_DIR"] = "/usr/local/airflow/tmp/target"


class EnvVarPlugin(AirflowPlugin):
    name = 'env_var_plugin'
And finally I add this in my dbt_project.yml:
log-path: "{{ env_var('DBT_LOG_DIR', 'logs') }}" # directory which will store dbt logs
packages-install-path: "{{ env_var('DBT_PACKAGE_DIR', 'dbt_packages') }}" # directory which will store dbt packages
target-path: "{{ env_var('DBT_TARGET_DIR', 'target') }}" # directory which will store compiled SQL files
And, as stated in the airflow-dbt GitHub README (https://github.com/gocardless/airflow-dbt#amazon-managed-workflows-for-apache-airflow-mwaa), configure the dbt task like below:
dbt_bin='/usr/local/airflow/.local/bin/dbt',
profiles_dir='/usr/local/airflow/dags/{DBT_FOLDER}/',
dir='/usr/local/airflow/dags/{DBT_FOLDER}/'
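Put together, the task definition might look like this; a minimal sketch assuming airflow-dbt's DbtRunOperator, with {DBT_FOLDER} standing in for your dbt project folder under dags/:
from airflow_dbt.operators.dbt_operator import DbtRunOperator

# Sketch: paths follow the answer above; replace {DBT_FOLDER} with your project folder.
dbt_run = DbtRunOperator(
    task_id='dbt_run',
    dbt_bin='/usr/local/airflow/.local/bin/dbt',
    profiles_dir='/usr/local/airflow/dags/{DBT_FOLDER}/',
    dir='/usr/local/airflow/dags/{DBT_FOLDER}/',
)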


How do I use "discover" to run tests in my "tests" directory?

Using Django/Python 3.7. I read here -- How do I run all Python unit tests in a directory? -- that I could use a "discover" command to find tests in a specified directory. I want to have a "tests" folder, so I created one and then ran:
(venv) localhost:myproject davea$ python -m unittest discover tests
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/main.py", line 100, in __init__
    self.parseArgs(argv)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/main.py", line 124, in parseArgs
    self._do_discovery(argv[2:])
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/main.py", line 244, in _do_discovery
    self.createTests(from_discovery=True, Loader=Loader)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/main.py", line 154, in createTests
    self.test = loader.discover(self.start, self.pattern, self.top)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/loader.py", line 344, in discover
    raise ImportError('Start directory is not importable: %r' % start_dir)
ImportError: Start directory is not importable: 'tests'
This is odd to me because I have an (empty) __init__.py file ...
(venv) localhost:myproject davea$ ls web/tests/
__init__.py model_tests.py
What else do I need to do to get my test directory recognized?
Edit: Below are the contents of model_tests.py ...
from django.conf import settings
from django.test import TestCase
from django.core import management


def setup():
    print("setup")
    management.call_command('loaddata', 'test_data.yaml', verbosity=0)


def teardown():
    management.call_command('flush', verbosity=0, interactive=False)


class ModelTest(TestCase):
    # Verify we can correctly calculate the amount of taxes when we are working
    # with a state whose tax rates are defined in our test data
    def test_calculate_tax_rate_for_defined_state(self):
        state = "MN"
        income = 30000
        taxes = IndividualTaxBracket.objects.get_taxes_owed(state, income)
        print(taxes)
        self.assertTrue(taxes > 0, "Failed to calculate taxes owed properly.")
I think you are having some confusion about the discover command. According to the docs:
Unittest supports simple test discovery. In order to be compatible
with test discovery, all of the test files must be modules or packages
(including namespace packages) importable from the top-level directory
of the project (this means that their filenames must be valid
identifiers).
It means all the test files must be importable from the directory from which you are running the command (the directory that holds your web directory). To make sure of this, all test files must live in valid Python packages (directories containing __init__.py).
Secondly, you are running the command python -m unittest discover tests, which is wrong. You don't have to add tests at the end. unittest's discover command supports 4 options. You can read more about them here.
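For example, you can pass the start directory, file pattern, and top-level directory explicitly. A sketch (the -p override is only needed because model_tests.py does not match the default test*.py pattern):
python3 -m unittest discover -s web/tests -p "*_tests.py" -t .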
I have the following directory structure.
web
├── __init__.py
└── tests
├── __init__.py
└── test_models.py
And I am running the following command:
python3 -m unittest discover
With the following results:
...
----------------------------------------------------------------------
Ran 3 tests in 0.000s
OK
First things first: having an __init__.py is not unusual, because the __init__.py tells Python that the directory is a package; it's usual to have an empty __init__.py file. I had the same error, and fixed it by renaming my directory ..
Should a file named tests.py exist as a sibling of the tests module, that would probably cause the mentioned ImportError, and removing tests.py should fix it.
If unit tests are still not discovered, a couple of questions are in order:
1) does the test module contain at least one class derived from django.test.TestCase?
2) and in that case, does that class contain at least one method whose name starts with "test_"?
Please note that the name of any file containing a unit test should start with "test".
So model_tests.py will not work; such a file is generally used to set up some fake models, but unit tests should reside elsewhere.
You can discover and run tests with this management command:
python manage.py test
or
python manage.py test appname
Is there any particular reason for using python -m unittest discover instead? I think that could work too, but then you'll have to manually bootstrap the Django environment.
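That bootstrapping would look roughly like this; a sketch, with myproject.settings as a placeholder for your real settings module:
import os
import django

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')  # placeholder module path
django.setup()  # initialize the app registry so test modules can import models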
For completeness ...
You already know this from here:
The names of your tests and files have to match a specific pattern in order to be discoverable by discover().
But then you got this error:
"django.core.exceptions.ImproperlyConfigured: Requested settings, but settings are not configured"
That means Django wasn't able to find its settings while running your tests. You can tell where to find settings using an environment variable:
DJANGO_SETTINGS_MODULE='myproyect.settings' python3 -m unittest discover
Reference: https://docs.djangoproject.com/en/2.2/topics/settings/#designating-the-settings
On the other hand ...
You should be running your Django tests with
./manage.py test
this will search for tests automatically using the same mechanism as discover(), and since you would be running a Django command, you get some benefits over invoking unittest directly.
@Nafees Anwar asked: How does setting the environment variable configure settings?
At the very beginning of the model_tests.py file there is the line from django.conf import settings. While creating the settings LazyObject instance, Django will search for that environment variable. Read the code for more detail.
I'll post here a snippet from that code for illustration.
# django.conf module.
ENVIRONMENT_VARIABLE = "DJANGO_SETTINGS_MODULE"


class LazySettings(LazyObject):
    """
    A lazy proxy for either global Django settings or a custom settings object.
    The user can manually configure settings prior to using them. Otherwise,
    Django uses the settings module pointed to by DJANGO_SETTINGS_MODULE.
    """
    def _setup(self, name=None):
        """
        Load the settings module pointed to by the environment variable. This
        is used the first time we need any settings at all, if the user has not
        previously configured the settings manually.
        """
        settings_module = os.environ.get(ENVIRONMENT_VARIABLE)
        if not settings_module:
            desc = ("setting %s" % name) if name else "settings"
            raise ImproperlyConfigured(
                "Requested %s, but settings are not configured. "
                "You must either define the environment variable %s "
                "or call settings.configure() before accessing settings."
                % (desc, ENVIRONMENT_VARIABLE))
        self._wrapped = Settings(settings_module)
So if you do:
from django.conf import settings
having that environment variable set, the statement
settings.configure()
will fail with RuntimeError('Settings already configured.')

Python is not fetching username from my .env file on Windows

1) !pip install python-dotenv
2) from dotenv import load_dotenv, find_dotenv
3) # find .env automatically by walking up directories until it's found
dotenv_path = find_dotenv()
# load up the entries as environment variables
load_dotenv(dotenv_path)
4) import os
KAGGLE_USERNAME = os.environ.get("KAGGLE_USERNAME")
print(KAGGLE_USERNAME)
Output: None
But the expected output is the username from my .env file.
What is the issue here?
I recently faced this issue.
The problem was that I was running this inside a virtual environment, and the dotenv package failed to locate the .env file via the plain find_dotenv() call. To overcome this, use:
dotenv_path = find_dotenv(usecwd=True)
Hopefully this will work.
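Putting it together, the loading code becomes (a sketch of the fix applied to the snippet above):
import os
from dotenv import load_dotenv, find_dotenv

# search from the current working directory instead of the caller's location
dotenv_path = find_dotenv(usecwd=True)
load_dotenv(dotenv_path)

print(os.environ.get("KAGGLE_USERNAME"))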

Ansible playbook with nested python scripts

I am trying to execute an Ansible playbook which uses the script module to run a custom Python script.
This custom Python script imports another Python script.
On execution of the playbook, the Ansible command fails while trying to import the util script. I am new to Ansible; please help!
helloWorld.yaml:
- hosts: all
  tasks:
    - name: Create a directory
      script: /ansible/ems/ansible-mw-tube/modules/createdirectory.py "{{arg1}}"
createdirectory.py -- Script configured in YAML playbook
#!/bin/python
import sys
import os
from hello import HelloWorld


class CreateDir:
    def create(self, dirName, HelloWorldContext):
        output = HelloWorld.createFolder(HelloWorldContext, dirName)
        print output
        return output


def main(dirName, HelloWorldContext):
    c = CreateDir()
    c.create(dirName, HelloWorldContext)


if __name__ == "__main__":
    HelloWorldContext = HelloWorld()
    main(sys.argv[1], HelloWorldContext)
hello.py -- util script which is imported in the main script written above
#!/bin/python
import os
import sys


class HelloWorld:
    def createFolder(self, dirName):
        print dirName
        if not os.path.exists(dirName):
            os.makedirs(dirName)
            print dirName
        if os.path.exists(dirName):
            return "success"
        else:
            return "failure"
Ansible executable command
ansible-playbook -v -i /ansible/ems/ansible-mw-tube/inventory/helloworld_host /ansible/ems/ansible-mw-tube/playbooks/helloWorld.yml -e "arg1=/opt/logs/helloworld"
Ansible version
ansible --version
[WARNING]: log file at /opt/ansible/ansible.log is not writeable and we cannot create it, aborting
ansible 2.2.0.0
config file = /etc/ansible/ansible.cfg
configured module search path = Default w/o overrides
The script module copies the script to the remote server and executes it there using the shell command. It can't find the util script because it doesn't transfer that file; it doesn't know that it needs to.
You have several options, such as using copy to move both files to the server and shell to execute them. But since what you seem to be doing is creating a directory, the file module can do that for you with no scripts necessary.
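For the directory use case, a minimal sketch of the playbook using the file module, with no scripts to ship at all:
- hosts: all
  tasks:
    - name: Create a directory
      file:
        path: "{{ arg1 }}"
        state: directory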

How to get ansible.module_utils to resolve my custom directory

Certain variables in ansible.cfg seem not to be taking effect.
Ansible 2.2.1.0
Python 2.7.10
Mac OS Version 10.12.5
We have created a custom class that will be used in our custom Ansible module. The class lives here:
/sites/utils/local/ansible/agt_module_utils/ldapData.py
/sites/utils/local/ansible/agt_module_utils/__init__.py
The class inside ldapData.py is named:
class ldapDataClass(object):
In all cases, __init__.py is a zero-byte file.
Our Ansible module is located here:
/sites/utils/local/ansible/agt_modules/__init__.py
/sites/utils/local/ansible/agt_modules/agtWeblogic.py
The import statement in agtWeblogic looks as follows:
from ansible.module_utils.ldapData import ldapDataClass
I have also tried:
from ldapData import ldapDataClass
The config file has the following lines:
library = /sites/utils/local/ansible/att_modules
module_utils = /sites/utils/local/ansible/att_module_utils
When running our module, the modules directory IS resolved but the module_utils directory is not resolved. When the include is "from ansible.module_utils.ldapData import ldapDataClass" the failure is before ansible even connects to the remote machine. When the include is "from ldapData import ldapDataClass" the failure is on the remote machine. Below I am showing the failure "on the remote machine" (scroll right to see full error):
nverkland@local> ansible ecomtest37 -m agtWeblogic -a "action=stop instances=tst37-shop-main" -vvv
ecomtest37 | FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_name": "agtWeblogic"
    },
    "module_stderr": "",
    "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_FwHCmh/ansible_module_attWeblogic.py\", line 63, in <module>\r\n from ldapData import ldapDataClass\r\nImportError: No module named ldapData\r\n",
    "msg": "MODULE FAILURE"
}
If I move the ldapData.py file into the "ansible installed" module_utils directory (/Library/Python/2.7/site-packages/ansible/module_utils/ on my Mac) the module runs fine. What have I done incorrectly in my config file that has prevented the use of my "custom" module_utils directory?
Thanks.
module_utils path is configurable since Ansible 2.3: changelog, PR.
I guess you have to upgrade or refactor the module.
Update: working example
Project tree:
.
├── ansible
│   └── ansible.cfg
├── module_utils
│   └── mycommon.py
└── modules
    └── test_module.py
ansible.cfg:
[defaults]
library = ../modules
module_utils = ../module_utils
mycommon.py:
class MyCommonClass(object):
    @staticmethod
    def hello():
        return 'world'
test_module.py:
#!/usr/bin/python

from ansible.module_utils.basic import *
from ansible.module_utils.mycommon import MyCommonClass

module = AnsibleModule(
    argument_spec=dict()
)

module.exit_json(changed=True, hello=MyCommonClass.hello())
Execution:
$ ansible localhost -m test_module
[WARNING]: Host file not found: /etc/ansible/hosts
[WARNING]: provided hosts list is empty, only localhost is available
localhost | SUCCESS => {
    "changed": true,
    "hello": "world"
}
Ansible version:
$ ansible --version
ansible 2.3.1.0
config file = /<masked>/ansible/ansible.cfg
configured module search path = [u'../modules']
python version = 2.7.10 (default, Feb 7 2017, 00:08:15) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]
Konstantin's notes helped. I managed to solve the riddle.
In my config file I had:
module_utils = /sites/utils/local/ansible/agt_module_utils
This causes Ansible to expose my class at:
import ansible.agt_module_utils.ldapData (working)
Until now I had been attempting to find the class at:
import ansible.module_utils.ldapData (broken)

python ConfigParser.NoSectionError - not working on server

Python 2.7
Django 1.10
settings.ini file (located at /opts/myproject/settings.ini):
[settings]
DEBUG: True
SECRET_KEY: '5a88V*GuaQgAZa8W2XgvD%dDogQU9Gcc5juq%ax64kyqmzv2rG'
In my Django settings file I have:
import os
from ConfigParser import RawConfigParser
config = RawConfigParser()
config.read('/opts/myproject/settings.ini')
SECRET_KEY = config.get('settings', 'SECRET_KEY')
DEBUG = config.get('settings', 'DEBUG')
The setup works fine locally, but when I deploy to my server I get the following error if I try to run any Django management commands:
ConfigParser.NoSectionError: No section: 'settings'
If I go into a Python shell locally, type in the above imports, and read the file, I get back:
['/opts/myproject/settings.ini']
On server I get back:
[]
I have tried changing config.read() to config.readfp() as suggested on here, but it didn't work.
Any help or advice is appreciated.
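One way to narrow this down: config.read() silently skips files it cannot open and returns the list of files it actually parsed, so an empty list means the file was not found or not readable on the server. A quick diagnostic sketch using the path from the question:
import os

print(os.path.exists('/opts/myproject/settings.ini'))      # does the file exist on the server?
print(os.access('/opts/myproject/settings.ini', os.R_OK))  # can this user read it?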