When using GCSToBigQueryOperator, this error occurs:
Broken DAG: [/opt/airflow/dags/injest_data.py] Traceback (most recent call last):
File "/opt/airflow/dags/injest_data.py", line 79, in <module>
> "sourceUris": [f"gs://{BUCKET_NAME}/*"],
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 397, in apply_defaults
raise AirflowException(f"missing keyword arguments {display}")
airflow.exceptions.AirflowException: missing keyword arguments 'bucket', 'destination_project_dataset_table', 'source_objects'
And when I tried to change to BigQueryCreateExternalTableOperator, this other error occurred:
Broken DAG: [/opt/airflow/dags/injest_data.py] Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 411, in apply_defaults
result = func(self, **kwargs, default_args=default_args)
File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/baseoperator.py", line 760, in __init__
f"Invalid arguments were passed to {self.__class__.__name__} (task_id: {task_id}). "
airflow.exceptions.AirflowException: Invalid arguments were passed to BigQueryCreateExternalTableOperator (task_id: bq_external_table_task). Invalid arguments were:
**kwargs: {'tables_resouces': {'tableReferences': {'projectId': 'de-projects-373304', 'datasetId': 'stockmarket_dataset', 'tableId': 'stockmarket_ex
I have tried changing the Google operators and even tried a different method to upload the data to BigQuery, but it says the schema doesn't exist. Please help me understand what I am doing wrong. Thanks in advance for your help; below is the code causing the error:
bq_external_table_task = BigQueryCreateExternalTableOperator(
    task_id="bq_external_table_task",
    tables_resouces={
        "tableReferences": {
            "projectId": PROJECT_ID,
            "datasetId": BIGQUERY_DATASET,
            "tableId": f"{DATASET}_external_table",
        },
        "externalDataConfiguration": {
            "autodetect": True,
            "sourceFormat": f"{INPUT_FILETYPE.upper()}",
            "sourceUris": [f"gs://{BUCKET_NAME}/*"],
        },
    },
)
There is no sourceUris named parameter in GCSToBigQueryOperator; it should be source_objects. Kindly check the operator's parameters in the official documentation:
GCSToBigQueryOperator
Your BigQueryCreateExternalTableOperator also has wrong parameter names: tables_resouces should be table_resource. You can check this operator's parameters in the official documentation as well:
BigQueryCreateExternalTableOperator
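For reference, here is a minimal sketch of both operators with the corrected parameter names, reusing the variables from your snippet; the destination table name and the import paths are assumptions based on recent google provider versions.

from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateExternalTableOperator

# GCSToBigQueryOperator takes bucket + source_objects instead of sourceUris.
gcs_to_bq_task = GCSToBigQueryOperator(
    task_id="gcs_to_bq_task",
    bucket=BUCKET_NAME,
    source_objects=["*"],  # object paths relative to the bucket
    destination_project_dataset_table=f"{PROJECT_ID}.{BIGQUERY_DATASET}.{DATASET}_table",  # assumed name
    source_format=INPUT_FILETYPE.upper(),
    autodetect=True,
)

# BigQueryCreateExternalTableOperator takes table_resource; note that the
# BigQuery Table resource key is tableReference (singular).
bq_external_table_task = BigQueryCreateExternalTableOperator(
    task_id="bq_external_table_task",
    table_resource={
        "tableReference": {
            "projectId": PROJECT_ID,
            "datasetId": BIGQUERY_DATASET,
            "tableId": f"{DATASET}_external_table",
        },
        "externalDataConfiguration": {
            "autodetect": True,
            "sourceFormat": INPUT_FILETYPE.upper(),
            "sourceUris": [f"gs://{BUCKET_NAME}/*"],
        },
    },
)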
I am trying to execute Python code on a Dataproc cluster via Airflow orchestration.
I am using Airflow 1.10.12, and DataprocWorkflowTemplateInstantiateInlineOperator to instantiate a Dataproc cluster and pass some parameters (templated params as well). The main objective is to run some prediction code.
Note that I upgraded this code to use airflow.providers.google.cloud.operators.dataproc instead of airflow.contrib.operators.dataproc_operator to import DataprocInstantiateInlineWorkflowTemplateOperator, because the former introduced the parameters kwarg that, in theory, permits passing arguments to the cluster. Using the latter in my other scripts, I have no errors, but I cannot pass parameters to the cluster.
...
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocInstantiateInlineWorkflowTemplateOperator,
)
...
workflow_seg_members = make_workflow_template(
    region=REGION,
    dataproc_job_bucket=DATAPROC_JOB_BUCKET,
    python_main_executable_path="segmentation_members/seg_members_prediction.py",
)
op_seg_members_prediction = DataprocInstantiateInlineWorkflowTemplateOperator(
    task_id="seg_members_prediction",
    project_id=PROJECT_ID,
    region=REGION,
    template=workflow_seg_members,
    parameters={
        "execution_date_str": "{{ds_nodash}}",
        "project": "<REDACTED>",
        "dataset": "<REDACTED>",
        "features_table_prefix": "global_features",
        "output_table_prefix": "seg_members_output",
        "path_to_model": "segmentation_members/DecisionTreeClassifier.pkl",
        "bucket_name": "<REDACTED>",
        "model_designation": "Segmentation Members",
    },
)
In seg_members_prediction.py, I use argparse to create the needed arguments.
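Something along these lines (a hypothetical sketch, not the actual file):

# seg_members_prediction.py (hypothetical sketch of the argparse wiring)
import argparse

parser = argparse.ArgumentParser()
for name in ("execution_date_str", "project", "dataset", "features_table_prefix",
             "output_table_prefix", "path_to_model", "bucket_name", "model_designation"):
    parser.add_argument("--" + name)
args = parser.parse_args()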
The error I am getting is:
TypeError: Parameter to MergeFrom() must be instance of same class: expected google.cloud.dataproc.v1beta2.OrderedJob got str.
My questions are:
How do I fix this MergeFrom() exception?
Is this the right approach to pass parameters from airflow to my dataproc cluster?
Here is the complete stack:
File "/usr/local/lib/airflow/airflow/providers/google/common/hooks/base_google.py", line 373, in inner_wrapper
return func(self, *args, **kwargs)
File "/usr/local/lib/airflow/airflow/providers/google/cloud/hooks/dataproc.py", line 712, in instantiate_inline_workflow_template
metadata=metadata,
File "/opt/python3.6/lib/python3.6/site-packages/google/cloud/dataproc_v1beta2/gapic/workflow_template_service_client.py", line 488, in instantiate_inline_workflow_template
request_id=request_id,
TypeError: Parameter to MergeFrom() must be instance of same class: expected google.cloud.dataproc.v1beta2.OrderedJob got str.
[2022-06-13 12:36:34,696] {taskinstance.py:1197} INFO - Marking task as UP_FOR_RETRY. dag_id=ds_seg_members_integration, task_id=seg_members_prediction, execution_date=20220610T162234, start_date=20220613T123629, end_date=20220613T123634
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/usr/local/lib/airflow/airflow/bin/airflow", line 37, in <module>
args.func(args)
File "/usr/local/lib/airflow/airflow/utils/cli.py", line 76, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/airflow/airflow/bin/cli.py", line 561, in run
_run(args, dag, ti)
File "/usr/local/lib/airflow/airflow/bin/cli.py", line 480, in _run
pool=args.pool,
File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 986, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/airflow/airflow/providers/google/cloud/operators/dataproc.py", line 1748, in execute
metadata=self.metadata,
File "/usr/local/lib/airflow/airflow/providers/google/common/hooks/base_google.py", line 373, in inner_wrapper
return func(self, *args, **kwargs)
File "/usr/local/lib/airflow/airflow/providers/google/cloud/hooks/dataproc.py", line 712, in instantiate_inline_workflow_template
metadata=metadata,
File "/opt/python3.6/lib/python3.6/site-packages/google/cloud/dataproc_v1beta2/gapic/workflow_template_service_client.py", line 488, in instantiate_inline_workflow_template
request_id=request_id,
TypeError: Parameter to MergeFrom() must be instance of same class: expected google.cloud.dataproc.v1beta2.OrderedJob got str.
EDIT:
I tried running the following code but still got the same error:
op_seg_members_prediction = DataprocInstantiateInlineWorkflowTemplateOperator(
    task_id="seg_members_prediction",
    project_id=PROJECT_ID,
    region=REGION,
    template=workflow_seg_members,
)
op_seg_members_prediction.execute(context="DEBUG")
After browsing the apache-airflow-providers-google docs, I discovered it isn't compatible with Airflow 1:
You can install this package on top of an existing Airflow 2
installation (see Requirements below for the minimum Airflow version
supported) via pip install apache-airflow-providers-google
The package supports the following python versions: 3.7,3.8,3.9,3.10
So I upgraded to Airflow 2 in my local environment, and the MergeFrom() error stopped occurring.
This still leaves the question: how do you pass parameters from Airflow to Dataproc on Airflow 1?
I am trying to make a SignalProcessor as per the Haystack documentation; here is my code:
from haystack.signals import RealtimeSignalProcessor
from products.models import ProductCreateModel
from django.db import models
from star_ratings.models import Rating

class BatchingSignalProcessor(RealtimeSignalProcessor):
    def handle_save(self):
        using_backends = self.connection_router.for_write(instance=instance)
        for using in using_backends:
            try:
                index = self.connections[using].get_unified_index().get_index(instance.__class__)
                index.update_object(instance, using=using)
            except NotHandled:
                # TODO: Maybe log it or let the exception bubble?
                pass

    def setup(self):
        models.signals.post_save.connect(self.handle_save, sender=Rating)
Full error:
Unhandled exception in thread started by <function check_errors.<locals>.wrapper at 0x00000240811ED400>
Traceback (most recent call last):
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\utils\autoreload.py", line 225, in wrapper
fn(*args, **kwargs)
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\core\management\commands\runserver.py", line 112, in inner_run
autoreload.raise_last_exception()
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\utils\autoreload.py", line 248, in raise_last_exception
raise _exception[1]
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\core\management\__init__.py", line 327, in execute
autoreload.check_errors(django.setup)()
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\utils\autoreload.py", line 225, in wrapper
fn(*args, **kwargs)
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\__init__.py", line 24, in setup
apps.populate(settings.INSTALLED_APPS)
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\apps\registry.py", line 120, in populate
app_config.ready()
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\haystack\apps.py", line 28, in ready
self.signal_processor = signal_processor_class(connections, connection_router)
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\haystack\signals.py", line 20, in __init__
self.setup()
File "c:\Users\lenovo\Desktop\My_Django_Stuff\bekaim\search\signals.py", line 21, in setup
models.signals.post_save.connect(self.handle_save, sender=Rating)
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\db\models\signals.py", line 28, in connect
weak=weak, dispatch_uid=dispatch_uid,
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\db\models\signals.py", line 23, in _lazy_method
return partial_method(sender)
File "C:\Users\lenovo\AppData\Local\conda\conda\envs\myDjangoEnv\lib\site-packages\django\dispatch\dispatcher.py", line 90, in connect
raise ValueError("Signal receivers must accept keyword arguments (**kwargs).")
ValueError: Signal receivers must accept keyword arguments (**kwargs)
Here you can see the documentation: https://django-haystack.readthedocs.io/en/v2.2.0/signal_processors.html#custom-signalprocessors
Here is what I found in the Django documentation, but I am unable to figure out a solution: https://docs.djangoproject.com/en/2.0/_modules/django/dispatch/dispatcher/
How do I resolve this error?
TL;DR:
You should replace
def handle_save(self):
with
def handle_save(self, **kwargs):
The thing is, I don't think you're looking at the right documentation. I'd like to suggest this instead, where it states:
Notice that the function takes a sender argument, along with wildcard keyword arguments (**kwargs); all signal handlers must take these arguments.
We’ll look at senders a bit later, but right now look at the **kwargs argument. All signals send keyword arguments, and may change those keyword arguments at any time. In the case of request_finished, it’s documented as sending no arguments, which means we might be tempted to write our signal handling as my_callback(sender).
This would be wrong – in fact, Django will throw an error if you do so. That’s because at any point arguments could get added to the signal and your receiver must be able to handle those new arguments.
Basically, the error handling can be seen in the source code for the connect method:
# Check for **kwargs
if not func_accepts_kwargs(receiver):
    raise ValueError("Signal receivers must accept keyword arguments (**kwargs).")
So, since your specified receiver (handle_save in this case) does not accept **kwargs, it inevitably raises this error.
Your confusion comes from the fact that you're overriding Haystack's built-in handle_save method. As you can see in the source code, it originally included **kwargs, but by overriding it you changed the method's signature, resulting in the error.
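Putting it together, a minimal sketch of the corrected processor; the signature mirrors Haystack's built-in handle_save (sender, instance, **kwargs), and NotHandled comes from haystack.exceptions:

from django.db import models
from haystack.exceptions import NotHandled
from haystack.signals import RealtimeSignalProcessor
from star_ratings.models import Rating

class BatchingSignalProcessor(RealtimeSignalProcessor):
    def handle_save(self, sender, instance, **kwargs):
        # The receiver now accepts **kwargs, as django.dispatch requires.
        using_backends = self.connection_router.for_write(instance=instance)
        for using in using_backends:
            try:
                index = self.connections[using].get_unified_index().get_index(instance.__class__)
                index.update_object(instance, using=using)
            except NotHandled:
                pass  # maybe log it or let the exception bubble

    def setup(self):
        models.signals.post_save.connect(self.handle_save, sender=Rating)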
I have been writing command-line programs with argparse for some time now, and I am trying to write this one in such a way that when the user supplies the following on the command line:
$python my_script.py -h
a help section (usage) is printed that shows the main parser's help as well as brief overviews of the subparsers.
But right now, any time I type the previous line into my terminal, I get no usage and instead a massive traceback ending in the following error:
TypeError: expected string or buffer
I have never hit this error before with argparse-based command-line programs. Furthermore, if I supply the name of one of the subparsers,
$python my_script.py subparserA -h
I get a print-out of the subparser's usage. The same holds true for other subparsers.
So why is it not possible for me to get the usage for the main parser? This worked for me before so I don't know why it's not working now. I really would like for the user to be able to look at an overview of the different subparsers available.
My basic code is currently set up in the following way:
import argparse
import sys
if __name__ == "__main__":
Parser = argparse.ArgumentParser(prog= "My_program")
Parser.description= "This program does A and B things."
subparsers= Parser.add_subparsers(help= "SubparserA does A things and SubparserB does B things", dest='mode')
subparserA= subparsers.add_parser("subparserA", help= "Additional explanation of what A things entail")
subparserA.add_arguments("-foo", required=True, help= "foo is needed for SubparserA to work")
subparserB= subparsers.add_parser("subparserB", help="Additional explanation of what B things entail")
subparserB.add_argument("-bar", required=True, help= "bar is needed for SubparserB to work")
args= Parser.parse_args()
if args.mode == "subparserA":
###do things pertinent to subparserA
elif args.mode== "subparserB":
###do things pertinent to subparserB
else:
argparse.print_help()
argparse.ArgumentError("too few arguments")
UPDATE
Here is the full traceback of the error:
Traceback (most recent call last):
File "my_program.py", line 164, in <module>
args= Parser.parse_args()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 1701, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 1733, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 1939, in _parse_known_args
start_index = consume_optional(start_index)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 1879, in consume_optional
take_action(action, args, option_string)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 1807, in take_action
action(self, namespace, argument_values, option_string)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 996, in __call__
parser.print_help()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 2340, in print_help
self._print_message(self.format_help(), file)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 2314, in format_help
return formatter.format_help()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 281, in format_help
help = self._root_section.format_help()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 211, in format_help
func(*args)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 485, in _format_text
return self._fill_text(text, text_width, indent) + '\n\n'
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/argparse.py", line 621, in _fill_text
text = self._whitespace_matcher.sub(' ', text).strip()
TypeError: expected string or buffer
You should be using
Parser.print_help()
Parser.error('too few arguments')
That is, use methods of the existing Parser object.
When I run your script
1019:~/mypy$ python stack46754855.py
Traceback (most recent call last):
File "stack46754855.py", line 10, in <module>
subparserA= subparsers.add_parser("subparserA", help= "Additional explanation of what A things entail", dest= 'mode')
File "/usr/lib/python2.7/argparse.py", line 1066, in add_parser
parser = self._parser_class(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'dest'
dest is not a valid parameter for the add_parser method. It is a valid, and useful, parameter for add_subparsers:
subparsers= Parser.add_subparsers(dest='mode')
It also objects to the add_arguments method (it should be add_argument).
After correcting those, I get:
1022:~/mypy$ python stack46754855.py
usage: My_program [-h] {subparserA,subparserB} ...
My_program: error: too few arguments
In Py2, choosing a subparser is required. It is optional in Py3 (a bug), allowing the script to run on to the invalid argparse.print_help call:
1022:~/mypy$ python3 stack46754855.py
Traceback (most recent call last):
File "stack46754855.py", line 27, in <module>
argparse.print_help()
AttributeError: module 'argparse' has no attribute 'print_help'
With the change I suggested above:
1025:~/mypy$ python3 stack46754855.py
usage: My_program [-h] {subparserA,subparserB} ...
This program does A and B things.
positional arguments:
{subparserA,subparserB}
SubparserA does A things and SubparserB does B things
subparserA Additional explanation of what A things entail
subparserB Additional explanation of what B things entail
optional arguments:
-h, --help show this help message and exit
usage: My_program [-h] {subparserA,subparserB} ...
My_program: error: too few arguments
The second usage comes from the Parser.error call.
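Putting the fixes together, a corrected sketch of the script:

# Corrected sketch: add_arguments -> add_argument, and the module-level
# argparse.print_help() / argparse.ArgumentError(...) calls replaced with
# methods on the Parser object.
import argparse

if __name__ == "__main__":
    Parser = argparse.ArgumentParser(prog="My_program")
    Parser.description = "This program does A and B things."
    subparsers = Parser.add_subparsers(
        help="SubparserA does A things and SubparserB does B things", dest="mode")
    subparserA = subparsers.add_parser(
        "subparserA", help="Additional explanation of what A things entail")
    subparserA.add_argument("-foo", required=True,
                            help="foo is needed for SubparserA to work")
    subparserB = subparsers.add_parser(
        "subparserB", help="Additional explanation of what B things entail")
    subparserB.add_argument("-bar", required=True,
                            help="bar is needed for SubparserB to work")
    args = Parser.parse_args()
    if args.mode == "subparserA":
        pass  # do things pertinent to subparserA
    elif args.mode == "subparserB":
        pass  # do things pertinent to subparserB
    else:
        Parser.print_help()
        Parser.error("too few arguments")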
I can't reproduce your
massive traceback and the following error:
TypeError: expected string or buffer
I need to see that traceback (or part of it) to see what exactly is raising the error. That's not a normal argparse error; certainly it isn't one that argparse traps and reroutes.
More on the required/not required subparser behavior at How to Set a Default Subparser using Argparse Module with Python 2.7
Use + instead of , for a multi-line help string in parser.add_argument. If you have split your argument help across multiple lines using ',', you will see this issue:
parser.add_argument("xml", help=("long help here",
                                 " long help second line"))
This will result in the above exception. Instead:
parser.add_argument("xml", help=("long help here" +
                                 " long help second line"))
Why does this code not work and instead raise an AttributeError?
internship = parser.find_all('a', attrs = {'title': lambda job: job.startswith('Internship')})
while this one works:
internship = parser.find_all('a', attrs = {'title': lambda job: job and job.startswith('Internship')})
This is the error that I got from the first snippet:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\bs4\element.py", line 1299, in find_all
return self._find_all(name, attrs, text, limit, generator, **kwargs)
File "C:\Python27\lib\site-packages\bs4\element.py", line 549, in _find_all
found = strainer.search(i)
File "C:\Python27\lib\site-packages\bs4\element.py", line 1690, in search
found = self.search_tag(markup)
File "C:\Python27\lib\site-packages\bs4\element.py", line 1662, in search_tag
if not self._matches(attr_value, match_against):
File "C:\Python27\lib\site-packages\bs4\element.py", line 1722, in _matches
return match_against(markup)
File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'startswith'
In the first line of code, you get the attribute error because the code assumes that job contains a string, which has the method startswith(). But the lambda is called for every <a> tag, including tags that have no title attribute, and for those job contains None.
In the second line of code, you don't get the attribute error because the code tests whether job is None before calling startswith() on it. Another (not quite equivalent but arguably better) way to express
lambda job: job and job.startswith('Internship')
is
lambda job: job.startswith('Internship') if job else False
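A self-contained illustration with made-up HTML (the tag contents are assumptions for the demo):

from bs4 import BeautifulSoup

html = '<a title="Internship A">x</a><a href="#">no title attribute</a>'
parser = BeautifulSoup(html, "html.parser")

# The unguarded lambda would crash on the second tag, whose title is None;
# the guarded one simply skips it.
internship = parser.find_all('a', attrs={'title': lambda job: job and job.startswith('Internship')})
print(internship)  # [<a title="Internship A">x</a>]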
When I use %matplotlib inline in my program, I get a ValueError. What does this error mean, and how can I resolve it?
Here is the error:
Traceback (most recent call last):
File "main.py", line 40, in <module>
ct.iloc[:-1,:-1].plot(kind='bar',stacked=True,color=['red','blue'],grid='false')
File "/usr/lib/python2.7/dist-packages/pandas/tools/plotting.py", line 1735, in plot_frame
plot_obj.generate()
File "/usr/lib/python2.7/dist-packages/pandas/tools/plotting.py", line 907, in generate
self._adorn_subplots()
File "/usr/lib/python2.7/dist-packages/pandas/tools/plotting.py", line 1012, in _adorn_subplots
ax.grid(self.grid)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 2176, in grid
b = _string_to_bool(b)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 54, in _string_to_bool
raise ValueError("string argument must be either 'on' or 'off'")
ValueError: string argument must be either 'on' or 'off'
When asking a question you should follow these guidelines, https://stackoverflow.com/help/mcve, instead of just posting a traceback.
That said, tracebacks can be very useful, and by following yours you can figure out the problem.
The final line of your traceback tells you that one of the string arguments you are passing must be either 'on' or 'off'. That points to the grid option, which is supposed to be a boolean.
I tested this like so:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([23,4],[4,6])
plt.grid('false')
giving the same error you got.
To fix this, use either grid='off' or grid=False as the option. In my example above, that means changing the call to plt.grid('off').
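Applied to the call in your traceback (assuming the same ct frame), that means passing a real boolean instead of the string 'false':

ct.iloc[:-1, :-1].plot(kind='bar', stacked=True, color=['red', 'blue'], grid=False)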