Dataflow - dynamic create disposition in Apache Beam - google-cloud-platform

I want to dynamically choose one of the CreateDisposition options depending on the arguments.
In the DataflowPipelineOptions I am accepting the load type in a ValueProvider via arguments. However, I am not able to get the string out of the ValueProvider to decide which create disposition option to use.
withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
I want 'CREATE_IF_NEEDED' to be dynamic, i.e. I want to replace it with something like the following. Note that this is just pseudocode; I am looking for a solution here.
create_disp = options.getLoad()
withCreateDisposition(create_disp)

You can pass a program argument representing the createDisposition.
Program argument (CREATE_NEVER or CREATE_IF_NEEDED):
--bqCreateDisposition=CREATE_NEVER
In your options interface in Java, you can declare the field as an enum (here with a default value of CREATE_IF_NEEDED):
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

public interface MyOptions extends PipelineOptions {

    @Description("BQ create disposition")
    @Default.Enum("CREATE_IF_NEEDED")
    BigQueryIO.Write.CreateDisposition getBqCreateDisposition();

    void setBqCreateDisposition(BigQueryIO.Write.CreateDisposition value);
}
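Then, in the pipeline, you pass the parsed option straight through instead of the hard-coded constant, e.g. withCreateDisposition(options.getBqCreateDisposition()).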

Related

How can we pass two arguments in a Terraform import script?

I need to import an existing order into the Terraform state.
For example, consider that I need to pass both ID and Environment values to the import.
If we have to pass only one argument, say ID, we can use the script below:
terraform import hashicups_order.sample {id}
In my case I need to pass two arguments, say id and environmentValue. So how can we do that?
terraform import hashicups_order.sample {id} {one more argument???}
terraform import has the following form:
terraform import [options] ADDRESS ID
ID is a single value (not multiple values) uniquely identifying the resource to be imported.
If you wish to pass any other values to the import, you have to use -var in [options], as explained in the docs.
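For example (with made-up values; the variable name depends on your configuration), the environment could be supplied as a variable alongside the import ID:
terraform import -var="environment=dev" hashicups_order.sample 12345
Note that -var sets an input variable used by your configuration; the positional ID remains the only identifier of the imported resource.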

How to instantiate a class one time and access it in views

I have a class that should only be instantiated once. I had it in myapp/__init__.py, but each time Django starts it runs twice. It also runs when I migrate models, when I don't need it to.
I've read about the ready function (https://docs.djangoproject.com/en/dev/ref/applications/#django.apps.AppConfig.ready), but I cannot access the instantiated class outside of apps.py.
Here is my current workflow:
in __init__.py:
from .my_module import ResourceHeavyClass
resource_heavy_instance = ResourceHeavyClass()
in my views.py
from . import resource_heavy_instance
This currently works, but I only want to load the module when the server starts, not when I make migrations. Appreciate any tips/advice.
You could make use of a SimpleLazyObject to postpone the creation until you really need it. For example:
from django.utils.functional import SimpleLazyObject

from .my_module import ResourceHeavyClass

class SomeClass:
    resource_heavy_instance = SimpleLazyObject(ResourceHeavyClass)
Now as long as you do not fetch SomeClass.resource_heavy_instance, the ResourceHeavyClass will not be created.
So if you, for example, have a method, you can use it like:
def some_method():
    resource_heavy_instance = SomeClass.resource_heavy_instance
Here, when you call some_method, it fetches the attribute, and that will indeed construct the object. But as long as the attribute is not fetched, no ResourceHeavyClass object is created. Once constructed, it will not be created a second time.
So as long as the attribute is not fetched merely by importing the file (only from inside functions and other deferred code), we are safe.
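For example, a view can pull the instance through the lazy wrapper, so it is only built on the first request that touches it, not at import or migration time. A minimal sketch, assuming SomeClass lives in a hypothetical module myapp/lazy_resources.py:
# myapp/views.py
from django.http import HttpResponse

from .lazy_resources import SomeClass  # hypothetical module holding SomeClass

def my_view(request):
    # First access triggers ResourceHeavyClass(); later accesses reuse it.
    heavy = SomeClass.resource_heavy_instance
    return HttpResponse(str(heavy))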

How to pass parameter to dataflow template for pipeline construction

I am trying to make an ancestor query like this example and convert it to a template version.
The problem is that the parameter ancestor_id is used by the function make_query during pipeline construction.
If I don't pass it when creating and staging the template, I get RuntimeValueProviderError: RuntimeValueProvider(option: ancestor_id, type: int).get() not called from a runtime context. But if I pass it at template creation, it behaves like a StaticValueProvider that never changes when I execute the template.
What is the correct way to pass a parameter to a template for pipeline construction?
import apache_beam as beam
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud.proto.datastore.v1 import entity_pb2
from google.cloud.proto.datastore.v1 import query_pb2
from googledatastore import helper as datastore_helper
from googledatastore import PropertyFilter

KIND = 'MyKind'            # placeholder
PROJECT_ID = 'my-project'  # placeholder

class TestOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument('--ancestor_id', type=int)

def make_query(ancestor_id):
    ancestor = entity_pb2.Key()
    datastore_helper.add_key_path(ancestor, KIND, ancestor_id)
    query = query_pb2.Query()
    datastore_helper.set_kind(query, KIND)
    datastore_helper.set_property_filter(query.filter, '__key__', PropertyFilter.HAS_ANCESTOR, ancestor)
    return query

pipeline_options = PipelineOptions()
test_options = pipeline_options.view_as(TestOptions)
with beam.Pipeline(options=pipeline_options) as p:
    # .get() here runs at construction time, which raises the RuntimeValueProviderError
    entities = p | ReadFromDatastore(PROJECT_ID, make_query(test_options.ancestor_id.get()))
There are two problems here.
First, the ValueProvider.get() method can only be called in a run-time method such as DoFn.process(). See example.
Second, your challenge is that you are using the Google Cloud Datastore IO (a query from Datastore). As of today (May 2018), the official documentation indicates that Datastore IO does NOT accept runtime template parameters yet.
For Python in particular, the following connectors accept runtime parameters:
File-based IOs: textio, avroio, tfrecordio
A workaround: you can probably first run a query without any templated parameters to get a PCollection of entities. Since transforms can accept templated parameters, you might then be able to use one as a filter, as sketched below. But this depends on your use case and it may not be applicable to you.
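A minimal sketch of that workaround, reusing PROJECT_ID, p and test_options from the question's code. Here make_all_query is a hypothetical helper that queries the kind without any ancestor filter, and the key-path check is illustrative only:
import apache_beam as beam

class HasAncestor(beam.DoFn):
    """Keeps only entities whose key path contains the given ancestor id."""
    def __init__(self, ancestor_id):
        self.ancestor_id = ancestor_id  # a ValueProvider, not a plain int

    def process(self, entity):
        # Calling .get() is legal here because process() runs at run time.
        ancestor_id = self.ancestor_id.get()
        if any(elem.id == ancestor_id for elem in entity.key.path):
            yield entity

entities = (p
            | ReadFromDatastore(PROJECT_ID, make_all_query())  # no ancestor filter
            | beam.ParDo(HasAncestor(test_options.ancestor_id)))
Note that this reads the whole kind and filters afterwards, so it can be considerably more expensive than a real ancestor query.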

Surprising behaviour using TemplateEngine with variable "URL" interpreted as class

Given the following Groovy code:
import groovy.text.SimpleTemplateEngine

def engine = new SimpleTemplateEngine()
def propMap = [URL: "http://stackoverflow.com", URL2: "http://stackoverflow.com"]
def result = engine.createTemplate('''
${URL}
${URL2}
''').make(propMap) as String
println(result)
the output is
class java.net.URL
http://stackoverflow.com
Somehow URL ends up being interpreted as class java.net.URL (which Groovy seems to be auto-importing), but why? And can a variable named URL be used in this context?
Groovy performs several default imports, which include java.net.*. The implicit import of java.net.URL apparently shadows your template variable.
You can use this. to tell Groovy explicitly to use your variable instead of java.net.URL:
${this.URL}
${URL2}
I also tried to use an alias for the import, like this:
import java.net.URL as JavaURL
but it didn't really help, because both the implicit (URL) and the explicit (JavaURL) names still resolved to the class.

django import a view function

I have a Django application xxx which does a number of things.
I also have a separate application yyy, which wants to call one of the functions of xxx.
Is there a way for me to import the functions?
For example, in yyy can I say
from toplevel.xxx import doit
Or what is the best approach? I don't want to duplicate code.
Of course you can do it. With a proper import and parameters, it works like any other function call.
# app: app1
# someview.py
def a_view(request, someparam):
    # some code here
    ...

# app: app2
# otherview.py
from app1.someview import a_view

def another_view(request):
    param = 1
    return a_view(request, param)
That is it, as an example.
UPDATE: I wish to mention that your function a_view() does not have to take a parameter at all, so you can also call functions with no parameters. But if your function does have parameters, you have to pass them just as you would within a single application. A sketch of an alternative that avoids duplicating code follows below.
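If the shared logic is not really tied to the request/response cycle, another way to avoid duplicating code is to extract it into a plain helper module that both apps import. A minimal sketch; the module name app1/services.py and the function do_it are hypothetical:
# app1/services.py (hypothetical shared-helper module)
def do_it(param):
    # shared business logic, independent of any view
    return param * 2

# app2/otherview.py
from django.http import HttpResponse

from app1.services import do_it

def another_view(request):
    # both apps can call the same helper without duplicating it
    return HttpResponse(str(do_it(1)))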