Google Dataflow: create templates with runtime parameters

In Dataflow, I need to pass a start date and an end date as runtime arguments, query BigQuery for that date range, and write the output to day-wise folders.
When we use ValueProvider, calling getStartDate().get() throws java.lang.RuntimeException: Not called from a runtime context. If I hardcode a value for the case where getStartDate().isAccessible() is false, the template is generated, but the runtime arguments are not reflected in the job: it always runs with the value that was hardcoded at template creation.
Any suggestions?

BigQueryIO accepts a ValueProvider for the query. The easiest way to do this is to pass the query text itself as the runtime value.
NestedValueProvider can help you build the query string from another ValueProvider; unfortunately, it supports only one input ValueProvider at a time. So you could concatenate your start and end dates into a single runtime value and then split it inside the NestedValueProvider's translation function.
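The concatenate-then-split idea can be sketched in plain Python. This is only an illustration of the translation step, not the Beam API: build_query, the '|' separator, and the table and column names are all hypothetical. In a real template, a function like this would be the translator handed to NestedValueProvider, so it runs only once the runtime value is available.

```python
from datetime import date


def build_query(date_range: str) -> str:
    """Turn a combined 'START|END' runtime value into a query string.

    Assumption: both dates arrive as ISO strings (YYYY-MM-DD) joined by '|'.
    The table and column names below are placeholders, not from the question.
    """
    start_text, end_text = date_range.split("|")
    # Parsing validates the input and keeps arbitrary text out of the query.
    start = date.fromisoformat(start_text.strip())
    end = date.fromisoformat(end_text.strip())
    if start > end:
        raise ValueError("start date must not be after end date")
    return (
        "SELECT * FROM `my_project.my_dataset.my_table` "
        f"WHERE event_date BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'"
    )


print(build_query("2020-12-01|2020-12-10"))
```

Passing one combined argument keeps the template to a single input ValueProvider, which is the restriction mentioned above.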

Related

Passing dynamic arguments into another pipeline Google Data Fusion

I am trying to query BigQuery to get the max date of two columns in Google Data Fusion and pass the result into another pipeline as runtime arguments.
select max(datecolumn1) as passargmnt1, max(datecolumn2) as passargmnt2 from dummy table;
After some research, it looks like the BigQuery Argument Setter might help, but the documentation is not much help.
Can anyone provide some detail on how to achieve this? A better alternative solution would also be welcome.
DK
I tried the BigQuery Execute plugin and chose Row As Arguments, but that didn't help.

Google Cloud Data Fusion - Dynamic arguments based on functions

Good morning all,
I'm looking for a way in Google Data Fusion to make the name of a source file stored on GCS dynamic. The files to be processed are named according to their value date, for example: 2020-12-10_data.csv
I need to set the filename dynamically so that the pipeline uses the correct file every day (something like this: ${ new Date().getFullYear()... }_data.csv).
I managed to use runtime arguments by specifying the date as a string (2020-12-10), but not with a function.
More generally, is there any documentation on how to enter dynamic parameters with ready-made or custom "functions"? (I couldn't find any.)
Thanks in advance for your help.
There is a ready-made workaround: give the "BigQuery Execute" plugin a try.
Steps:
1. Put the query below in SQL:
select cast(current_date as string) ||'_data.csv' as filename
--for output '2020-12-15_data.csv'
2. Set Row As Arguments to 'true'.
3. Now use the resulting argument via ${filename} wherever you want to.

Handling invalid dates in Oracle

I am writing simple SELECT queries which involve parsing a date out of a string.
The dates are typed in manually by users in a web application and are stored as strings in the database.
I have a CASE statement to handle the various date formats and use the correct format specifier in the TO_DATE function accordingly.
However, users sometimes enter something that's not a valid date (e.g. 13-31-2013) by mistake, and then the entire query fails. Is there any way to handle such rogue records and replace them with some default date in the query, so that the entire query does not fail because of a single invalid date record?
I have already tried regular expressions, but AFAIK they are not quite reliable when it comes to handling leap years and 30/31-day months.
I don't have privileges to create stored procedures or anything like that. It's just a plain, simple SELECT query executed from my application.
This is a client task..
The DB will give you an error for an invalid date (the DB does not have a "TO_DATE_AND_FIX_IF_NOT_CORRECT" function).
If you've got this error, it means you have already tried to convert something that is not a valid date.
I recommend doing the conversion to a date on your application server and, if your code raises an exception, sending a default date to the DB.
That way you also send the DB an object of type DbDate and not a string.
This achieves two goals:
1. The dates will always be what you want them to be (from the client).
2. You close the door on SQL injection attacks.
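The application-side conversion described above could look roughly like the following Python sketch. The function name parse_or_default, the list of accepted formats, and the default date are assumptions made for illustration; the point is only that the standard date parser already rejects impossible dates, including wrong leap days, so no regex is needed.

```python
from datetime import date, datetime

# Assumption: the default date and the accepted formats are illustrative;
# use whatever your application actually needs.
DEFAULT_DATE = date(2000, 1, 1)
FORMATS = ("%m-%d-%Y", "%m/%d/%Y")


def parse_or_default(text: str, default: date = DEFAULT_DATE) -> date:
    """Return the parsed date, or `default` if no format matches."""
    for fmt in FORMATS:
        try:
            # strptime raises ValueError for month 13, Feb 30, Feb 29 in a
            # non-leap year, and so on.
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return default


print(parse_or_default("13-31-2013"))  # invalid month, so the default is used
print(parse_or_default("02-29-2012"))  # valid leap day
```

The resulting date object would then be passed to the query as a bind parameter rather than concatenated into the SQL text.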
It sounds like in your case you should write the function I mentioned.
It should look something like this:
create or replace function TO_DATE_SPECIAL(in_date in varchar2) return date is
  ret_val date;
begin
  -- Try to convert the string using the expected format.
  ret_val := to_date(in_date, 'MM-DD-YYYY');
  return ret_val;
exception
  -- Any conversion error falls back to a default date.
  when others then
    return to_date('01-01-2000', 'MM-DD-YYYY');
end;
Within the query, use the new function instead of "to_date".
That way, instead of failing, it will give you back a default date.
-> There is no IsDate function, so you'll have to create your own function for it.
I hope you've got the idea and how to use it; if not, let me know.
I ended up using a crazy regex that checks leap years and 30/31-day months as well.
Here it is:
((^(0?[13578]|1[02])[\/.-]?(0?[1-9]|[12][0-9]|3[01])[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^(0?[469]|11)[\/.-]?(0?[1-9]|[12][0-9]|30)[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^([0]?2)[\/.-]?(0?[1-9]|1[0-9]|2[0-8])[\/.-]?(18|19|20){0,1}[0-9]{2}$)|(^([0]?2)[\/.-]?29[\/.-]?(((18|19|20){0,1}(04|08|[2468][048]|[13579][26]))|2000|00)$))
It is a modified version of the answer by McKay here.
It's not the most efficient, but it works. I'll wait to see if I get a better alternative.

Oracle Service Bus DB Adapter

I am trying to use the Oracle Service Bus DB Adapter to create a REST-based service. There are four parameters in the query, of which only 2 are passed at any time. For example:
http://www.example.com/findPerson/personId=&birthDt=&ss=&lastname=
birthDt is always passed, but only 1 of the other 3 is passed. The other parameters are empty.
To do a database lookup, all I need is birthDt and 1 of the other 3.
Is there a way in OSB to do a conditional select based on what is passed in? Should I use a Select, "Query By Example", or "Invoke a stored procedure" that returns what I need?
In the response to the REST service call, I need to return several elements in an XML format.
You could create a stored procedure in the backend that takes all the parameters as input (with 3 of them declared 'default null'):
create or replace procedure my_procedure
(p_parm1 in varchar2 default null, etc ..
In the stored procedure, you check which parameters are filled in to construct your select statement.
In the XQuery on the OSB, you will need to check which parameters from your REST call are filled in, and map these to the optional parameters of your stored procedure call.
Or you can use the 'select statement' option in the DB adapter and use a construction like this:
select *
from my_table
where kolom1 = :p_name or :p_name is null
Now you can expand the whole query in the same way for each of your input parameters.
In this case you also need an XQuery in the OSB to map your REST parameters to the select statement parameters.
I think the easiest way is to pass the whole query-parameter string into your XQuery and use substring/substring-after etc. to extract the different parameters and their values, then map these values to the input XML payload of your DB adapter call.
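As a rough illustration of the "= :p or :p is null" pattern together with the parameter mapping, here is a small Python sketch. It is not OSB or XQuery code: SQLite stands in for the real database, and the person table, its columns, and the to_params helper are assumptions made up for the example. Empty REST parameters are mapped to NULL, which switches off the corresponding condition.

```python
import sqlite3
from urllib.parse import parse_qs

# Assumption: table and column names are invented for this example, and
# SQLite is used only to demonstrate the query pattern.
QUERY = """
    SELECT person_id, last_name
    FROM person
    WHERE birth_dt = :birth_dt
      AND (person_id = :person_id OR :person_id IS NULL)
      AND (ss = :ss OR :ss IS NULL)
      AND (last_name = :last_name OR :last_name IS NULL)
"""


def to_params(query_string: str) -> dict:
    """Map REST query parameters to bind values; empty ones become None."""
    parsed = parse_qs(query_string, keep_blank_values=True)
    names = {"personId": "person_id", "birthDt": "birth_dt",
             "ss": "ss", "lastname": "last_name"}
    return {bind: (parsed.get(rest, [""])[0] or None)
            for rest, bind in names.items()}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id TEXT, birth_dt TEXT, ss TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [("1", "1980-01-01", "111", "Smith"), ("2", "1980-01-01", "222", "Jones")],
)

params = to_params("personId=&birthDt=1980-01-01&ss=&lastname=Jones")
rows = conn.execute(QUERY, params).fetchall()
print(rows)
```

With birthDt and lastname filled in, only the Jones row is returned; the conditions on personId and ss are skipped because their bind values are NULL.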

Custom Date Aggregate Function

I want to sort my Store models by their opening times. The Store model has an is_open function that checks the store's opening time ranges and returns a boolean indicating whether it is open. The problem is that I don't want to sort my queryset manually, for efficiency reasons. I thought that if I wrote a custom annotate function, I could filter the query more efficiently.
So I googled and found that I can extend Django's aggregate class. From what I understood, I have to use predefined SQL functions like MAX, AVG, etc. The thing is, I want to check whether today's date falls within a given list of time intervals. Can anyone tell me which SQL function name I should use?
Edit
I'd like to put the code here, but it's real spaghetti. It is a page of code that only generates time intervals and checks for the suitable one.
I want to avoid:
alg = lambda s: not (s.is_open() and s.reachable)
sorted(stores, key=alg)
and replace it with:
Store.objects.annotate(is_open=CheckOpen(datetime.today())).order_by('is_open')
But I'm totally lost as to how to write CheckOpen...
Have a look at the Django docs for the QuerySet extra() method.