How to pass parameter to dataflow template for pipeline construction


I am trying to make an ancestor query like this example and convert it to a template version.
The problem is that the parameter ancestor_id is needed by the function make_query during pipeline construction.
If I don't pass it when I create and stage the template, I get RuntimeValueProviderError: RuntimeValueProvider(option: ancestor_id, type: int).get() not called from a runtime context. But if I pass it when creating the template, it seems to behave like a StaticValueProvider whose value never changes when I execute the template.
What is the correct way to pass a parameter to a template for pipeline construction?
import apache_beam as beam
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud.proto.datastore.v1 import entity_pb2
from google.cloud.proto.datastore.v1 import query_pb2
from googledatastore import helper as datastore_helper
from googledatastore import PropertyFilter

# KIND and PROJECT_ID are assumed to be defined elsewhere.

class TestOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument('--ancestor_id', type=int)

def make_query(ancestor_id):
    ancestor = entity_pb2.Key()
    datastore_helper.add_key_path(ancestor, KIND, ancestor_id)
    query = query_pb2.Query()
    datastore_helper.set_kind(query, KIND)
    datastore_helper.set_property_filter(query.filter, '__key__', PropertyFilter.HAS_ANCESTOR, ancestor)
    return query

pipeline_options = PipelineOptions()
test_options = pipeline_options.view_as(TestOptions)
with beam.Pipeline(options=pipeline_options) as p:
    # .get() is called here, at pipeline construction time, which raises the error above.
    entities = p | ReadFromDatastore(PROJECT_ID, make_query(test_options.ancestor_id.get()))

There are two problems.
First, ValueProvider.get() can only be called from a run-time method such as DoFn.process(). See example.
Second, you are using Google Cloud Datastore IO (a query from Datastore). As of today (May 2018), the official documentation indicates that Datastore IO does NOT accept runtime template parameters yet.
For Python in particular, the documentation states:
The following connectors accept runtime parameters.
File-based IOs: textio, avroio, tfrecordio
A workaround: you can probably first run a query without any templated parameters to get a PCollection of entities. Then, since any transform can accept a templated parameter, you might be able to use the parameter as a filter. This depends on your use case, though, and it may not be applicable to you.
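The construction-time versus run-time distinction behind this workaround can be illustrated without Beam at all. The following is only a toy model: ToyRuntimeValue and FilterByAncestor are hypothetical names invented for this sketch and are not Beam classes. It shows a step that stores the parameter object when it is built and calls get() only while processing elements.

```python
class ToyRuntimeValue(object):
    """Toy stand-in for a template parameter. NOT Beam's ValueProvider."""

    def __init__(self, name):
        self.name = name
        self._value = None
        self._ready = False

    def set_at_runtime(self, value):
        # Models the template being executed with a concrete parameter value.
        self._value = value
        self._ready = True

    def get(self):
        if not self._ready:
            raise RuntimeError('%s.get() not called from a runtime context' % self.name)
        return self._value


class FilterByAncestor(object):
    """Models a DoFn-like step: keeps the parameter object, reads it per element."""

    def __init__(self, ancestor_id):
        # Store the parameter object itself; do not call get() here.
        self.ancestor_id = ancestor_id

    def process(self, entity):
        # get() is only called once elements are flowing, i.e. at "run time".
        if entity['ancestor_id'] == self.ancestor_id.get():
            yield entity


ancestor_id = ToyRuntimeValue('ancestor_id')
step = FilterByAncestor(ancestor_id)   # "construction": no get() call yet

ancestor_id.set_at_runtime(7)          # "execution": the value arrives
entities = [{'id': 1, 'ancestor_id': 7}, {'id': 2, 'ancestor_id': 9}]
kept = [e for entity in entities for e in step.process(entity)]
print(kept)
```

In a real pipeline the same shape would apply to a DoFn that receives the value provider in its constructor and calls get() inside process(), at the cost of reading all entities before filtering.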

Related

Dataflow- dynamic create disposition Apache Beam

I want to dynamically choose among the create disposition options depending on the arguments.
In the DataflowPipelineOptions I accept the load type as a ValueProvider argument. However, I am not able to get the string out of the ValueProvider to decide which create disposition option to use.
withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
I want 'CREATE_IF_NEEDED' to be dynamic. I want to replace it with something like the following. Note that this is just pseudocode; I am looking for a solution.
create_disp = options.getLoad()
withCreateDisposition(create_disp)

You can pass a program argument representing the create disposition.
Program argument (CREATE_NEVER or CREATE_IF_NEEDED):
--bqCreateDisposition=CREATE_NEVER
In the options interface in Java, you can declare the field as an enum (in this case the default value is CREATE_IF_NEEDED):
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Default.Enum;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

public interface MyOptions extends PipelineOptions {
    @Description("BQ create disposition")
    @Default.Enum("CREATE_IF_NEEDED")
    BigQueryIO.Write.CreateDisposition getBqCreateDisposition();

    void setBqCreateDisposition(BigQueryIO.Write.CreateDisposition value);
}

How to instantiate a class one time and access it in views

I have a class that should be instantiated once. I had it in myapp/__init__.py, but each time Django starts it runs twice. It also runs when I migrate models, when I don't need it to.
I've read about the ready function https://docs.djangoproject.com/en/dev/ref/applications/#django.apps.AppConfig.ready, but I cannot access the instantiated class outside of apps.py.
Here is my current workflow:
In myapp/__init__.py:
from .my_module import ResourceHeavyClass
resource_heavy_instance = ResourceHeavyClass()
In my views.py:
from . import resource_heavy_instance
This currently works, but I only want to load the module when the server starts, not when I make migrations. I'd appreciate any tips or advice.

You could make use of a SimpleLazyObject to postpone the creation until you really need it. For example:
from .my_module import ResourceHeavyClass
from django.utils.functional import SimpleLazyObject

class SomeClass:
    resource_heavy_instance = SimpleLazyObject(ResourceHeavyClass)
Now, as long as you do not use SomeClass.resource_heavy_instance, the ResourceHeavyClass object is not created.
So if you have a function, for example, you can use it like this:
def some_method():
    resource_heavy_instance = SomeClass.resource_heavy_instance
    # the real object is constructed on first use of resource_heavy_instance
When some_method is called and the object is actually used, the ResourceHeavyClass object is constructed. Until then, no ResourceHeavyClass object is created. Once constructed, it is not created a second time.
So as long as the object is not used merely by importing the module (that is, it is only used inside functions or other code that runs later), we are safe.
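The same "create on first use, then reuse" behaviour can be sketched with the standard library alone. Note that this swaps SimpleLazyObject for a cached accessor function built with functools.lru_cache (Python 3), so it is an alternative technique rather than the approach above. ResourceHeavyClass here is a hypothetical stand-in, and get_resource_heavy_instance is a name made up for this sketch.

```python
from functools import lru_cache


class ResourceHeavyClass:
    """Hypothetical stand-in for the expensive class from the question."""
    instances_created = 0

    def __init__(self):
        ResourceHeavyClass.instances_created += 1


@lru_cache(maxsize=None)
def get_resource_heavy_instance():
    # The constructor runs on the first call only; later calls reuse the result.
    return ResourceHeavyClass()


# Defining (or importing) the accessor creates nothing.
created_at_import = ResourceHeavyClass.instances_created

first = get_resource_heavy_instance()
second = get_resource_heavy_instance()
print(created_at_import, ResourceHeavyClass.instances_created, first is second)
```

A view would then call get_resource_heavy_instance() inside its body, so running migrations never triggers the constructor.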

how to call event from another module in tkinter

I made a test app to demonstrate this. I am trying to have the ButtonRelease-1 event call a function inside another file. I am getting an error when trying to run the app:
TypeError: listb() takes exactly 2 arguments (1 given)
This is a pretty straightforward error, but I cannot fix it in this specific situation. I am basically just having the event print the clicked item. Is the event not working because the function inside the other file is not recognizing the event?
Anyway, I am curious how to fix this code so it works. The function has to stay in another file. This would be easy if it were in the same file, but it cannot be.
start.py
from Tkinter import *
import example_funcs as EF

class Page_three(Frame):
    def __init__(self):
        Frame.__init__(self)
        self.pack()
        self.listboxs()

    def listboxs(self):
        self.z = Listbox(self)
        self.z.grid()
        for item in range(1, 10):
            self.z.insert(END, item)
        self.z.bind("<ButtonRelease-1>", EF.listb(self))

root = Tk()
app = Page_three()
app.mainloop()
example_funcs.py
from Tkinter import *
import Tkinter as tk

def listb(self, event):
    selection = self.z.curselection()
    print selection
self is passed so that the function can access the instance's attributes; if I do not pass the instance, I get an error that my listbox variable cannot be found.

Passing EF.listb(self) doesn't do what you want it to do. It doesn't partially bind the self parameter to the instance you're calling it from and then let the event parameter get filled in by the callback. Instead, it just calls the function immediately (before the bind call is made), and you get an error about using the wrong number of arguments.
There are a few different ways you could fix this issue.
One option would be to manually bind the self parameter to the listb function using functools.partial:
import example_funcs as EF
import functools

class Page_three(Frame):
    ...
    def listboxs(self):
        ...
        self.z.bind("<ButtonRelease-1>", functools.partial(EF.listb, self))  # bind self
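The effect of functools.partial can be checked without opening a window. In the sketch below, FakeWidget is a hypothetical stand-in for a Tkinter widget that only stores the callback and later calls it with a single event argument, which is how bound callbacks are invoked. The listb function mirrors the two-argument shape of the one in example_funcs.py.

```python
import functools


def listb(page, event):
    # Same shape as example_funcs.listb: the instance first, the event second.
    return (page, event)


class FakeWidget(object):
    """Hypothetical stand-in for a Tkinter widget; it only stores the callback."""

    def bind(self, sequence, callback):
        self.callback = callback

    def fire(self, event):
        # A bound callback is called with exactly one argument: the event.
        return self.callback(event)


page = object()
widget = FakeWidget()

# partial(listb, page) is a new callable that still expects the event argument.
widget.bind("<ButtonRelease-1>", functools.partial(listb, page))
result = widget.fire("click")
print(result == (page, "click"))
```

Writing listb(page) in the bind call instead would call the function immediately with one argument and raise the TypeError from the question.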
Another approach would be to make listb an actual method in your class, so that you can reference it as a method on self. That could look like this:
import example_funcs as EF

class Page_three(Frame):
    ...
    def listboxs(self):
        ...
        self.z.bind("<ButtonRelease-1>", self.listb)  # refer to a method without calling it

    listb = EF.listb  # add the function from the other module as a method on this class
If listb isn't used anywhere else though, then defining it in another module and copying it over here would be pretty silly. You should just move the definition into this class instead of adding a reference to it after the fact. On the other hand, if listb is being used in several different classes, it suggests that the classes should be using some kind of inheritance to share the method, rather than crudely copying references to the one definition around.

"Lazy load" of data from a context processor

In each view of my application I need to have a navigation menu prepared. So right now, in every view, I execute a complicated query and store the menu in a dictionary which is passed to a template. In the templates, the variable holding the data is wrapped in a "cache" block, so even though the queries are quite costly, it doesn't bother me.
But I don't want to repeat myself in every view. I guessed that the best place to prepare the menu is my own context processor. So I wrote one, but I noticed that even when I don't use the data from the context processor, the queries used to prepare the menu are executed. Is there a way to "lazy load" such data from the context processor, or do I have to use the "low level" cache in the context processor? Or maybe there's a better solution to my problem?

Django has a SimpleLazyObject. In Django 1.3, this is used by the auth context processor (source code). This makes user available in the template context for every request, but the user is only accessed when the template contains {{ user }}.
You should be able to do something similar in your context processor.
from django.utils.functional import SimpleLazyObject

def my_context_processor(request):
    def complicated_query():
        result = do_stuff()
        return result

    return {
        'result': SimpleLazyObject(complicated_query)
    }
If you pass a callable object into the template context, Django will evaluate it when it is used in the template. This provides one simple way to do laziness: just pass in callables.
def my_context_processor(request):
    def complicated_query():
        result = do_stuff()
        return result

    return {'my_info': complicated_query}
The problem with this is that it does not memoize the call: if you use it multiple times in a template, complicated_query gets called multiple times.
The fix is to use something like SimpleLazyObject as in the other answer, or to use something like functools.lru_cache:
from functools import lru_cache

def my_context_processor(request):
    @lru_cache()
    def complicated_query():
        result = do_stuff()
        return result

    return {'my_info': complicated_query}
You can now use my_info in your template, and it will be evaluated lazily, just once.
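The difference in call counts can be seen in a small standalone sketch. It assumes nothing about Django: a loop stands in for a template that uses the variable three times, and plain_query, cached_query, and the calls dictionary are names made up for illustration.

```python
from functools import lru_cache

calls = {'plain': 0, 'cached': 0}


def plain_query():
    # Runs its body every time it is called.
    calls['plain'] += 1
    return ('home', 'about')


@lru_cache()
def cached_query():
    # Runs its body on the first call only; later calls return the cached result.
    calls['cached'] += 1
    return ('home', 'about')


# Nothing has run yet: both callables are lazy until first use.
calls_before_use = dict(calls)

# Model a template that uses the variable three times.
for _ in range(3):
    plain_query()
    cached_query()

print(calls_before_use, calls)
```

Because the decorated function in the answer is defined inside the context processor, each request gets a fresh cache, so results are not shared between requests.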
Or, if the function already exists, you would do it like this:
from somewhere import complicated_query

def my_context_processor(request):
    return {'my_info': lru_cache()(complicated_query)}
I would prefer this method over SimpleLazyObject because the latter can produce some strange bugs sometimes.
(I was the one who originally implemented LazyObject and SimpleLazyObject, and discovered for myself that there is a curse on any code artefact labelled simple.)

django import a view function

I have a Django application xxx which does a number of things.
I also have a separate application yyy, which wants to call one of the functions of xxx.
Is there a way for me to import the functions?
For example, in yyy can I say
from toplevel.xxx import doit
Or what is the best approach? I don't want to duplicate code.

Of course you can do it. With a proper import and the right parameters, it works. For example:
# app: app1
# someview.py
def a_view(request, someparam):
    # some code here
    pass

# app: app2
# otherview.py
from app1.someview import a_view

def another_view(request):
    param = 1
    return a_view(request, param)  # return whatever a_view returns
UPDATE: I should mention that your function a_view() does not have to take a parameter at all, so you can also call functions with no parameters. If your function does have parameters, you have to pass them just as you would within a single application.
