Update Dataflow Streaming job with Session and Sliding window embedded in DF - google-cloud-platform

In my use case, I'm performing both a session window and a sliding window inside a Dataflow job. My sliding window size is 10 hours with a slide period of 4 minutes. Since I'm applying a grouping and a max function on top of that, the window fires a pane every 4 minutes, and the output then goes into a session window with its own triggering logic. Below is the code for the same.
Window<Map<String, String>> windowMap = Window.<Map<String, String>>into(
        SlidingWindows.of(Duration.standardHours(10)).every(Duration.standardMinutes(4)));

Window<Map<String, String>> windowSession = Window
        .<Map<String, String>>into(Sessions.withGapDuration(Duration.standardHours(10)))
        .discardingFiredPanes()
        .triggering(Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(5))))
        .withAllowedLateness(Duration.standardSeconds(10));
I would like to add logging to some steps for debugging, so I'm trying to update the currently running streaming job with the options below:
options.setRegion("asia-east1");
options.setUpdate(true);
options.setStreaming(true);
Previously the job had around 10k elements, and after updating the existing pipeline with the configuration above, I no longer see that much data in the steps of the updated Dataflow job. Please help me understand whether an update preserves the previous job's data, since I don't see the previous step counts in the updated job.
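As a side note on the window arithmetic in the question: the sketch below is plain Python, independent of the Beam API, and `sliding_windows` is a hypothetical helper, but it illustrates how sliding-window assignment works. With a 10-hour size and a 4-minute period, every element lands in size/period = 150 overlapping windows, which multiplies the apparent element counts shown per step.

```python
def sliding_windows(ts_s: int, size_s: int, period_s: int) -> list:
    """Return start times (in seconds) of all sliding windows containing ts_s.

    Windows start at multiples of period_s; an element with timestamp ts_s
    belongs to every window where start <= ts_s < start + size_s.
    """
    # Latest window start at or before the element's timestamp.
    last_start = ts_s - (ts_s % period_s)
    starts = []
    s = last_start
    while s > ts_s - size_s:
        starts.append(s)
        s -= period_s
    return sorted(starts)

# 10-hour windows sliding every 4 minutes, as in the question.
windows = sliding_windows(1000, size_s=10 * 3600, period_s=4 * 60)
```

Each element therefore contributes to 150 panes, so per-step element counts in the Dataflow UI are not directly comparable to the raw input count.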

Related

Pytorch Lightning Tensorboard Logger Across Multiple Models

I'm relatively new to Lightning, and to loggers versus manually tracking metrics. I am trying to train two distinct models and have their accuracy and loss plotted on the same charts in TensorBoard (or any other logger) within Colab.
What I have right now is basically:
trainer1 = pl.Trainer(gpus=n_gpus, max_epochs=n_epochs, progress_bar_refresh_rate=20, num_sanity_val_steps=0)
trainer2 = pl.Trainer(gpus=n_gpus, max_epochs=n_epochs, progress_bar_refresh_rate=20, num_sanity_val_steps=0)
trainer1.fit(Model1, train_loader, val_loader)
trainer2.fit(Model2, train_loader, val_loader)
#Then later:
%load_ext tensorboard
%tensorboard --logdir lightning_logs/
What I'd like to see at this point are those logged metrics charted together on the same chart; any help would be appreciated. I've spent some time trying to toy with this, but I'm a bit out of my depth. Thank you!
The exact chart used for logging a specific metric depends on the key name you provide in the .log() call (it's a feature that Lightning inherits from TensorBoard itself):
def validation_step(self, batch, _):
    # This string decides which chart to use in the TB web interface
    #         vvvvvvvvv
    self.log('valid_acc', acc)
Just use the same string for both .log() calls and have both runs saved in the same directory:
logger = TensorBoardLogger(save_dir='lightning_logs/', name='model1')
logger = TensorBoardLogger(save_dir='lightning_logs/', name='model2')
If you run tensorboard --logdir ./lightning_logs pointing at the parent directory, you should be able to see both metrics in the same chart under the key valid_acc.
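To make the resulting directory layout concrete, here is a small sketch. It assumes Lightning's TensorBoardLogger convention of writing runs under save_dir/name/version_<n>; the `tb_run_dir` helper is hypothetical and only mirrors that layout:

```python
import os

def tb_run_dir(save_dir: str, name: str, version: int = 0) -> str:
    # Mirrors Lightning's TensorBoardLogger layout: save_dir/name/version_<n>.
    # TensorBoard treats each such leaf directory as a separate "run".
    return os.path.join(save_dir, name, f"version_{version}")

run1 = tb_run_dir("lightning_logs", "model1")
run2 = tb_run_dir("lightning_logs", "model2")
```

Because both runs live under lightning_logs/, pointing --logdir at that parent directory picks them up as two runs, and the shared valid_acc tag overlays them on one chart.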

Power Query | Loop with delay

I'm new to Power Query and trying to do the following:
Get updates from a server.
Transform them.
Post the data back.
While the code works just fine, I'd like it to run every N minutes until the application closes.
Also, the LastMessageId variable should be re-evaluated after each call of GetUpdates(), and I need to somehow call GetUpdates() again with it.
I've tried Function.InvokeAfter but couldn't figure out how to run it more than once.
Recursion blows the stack, of course.
The only solution I see is List.Generate, but I struggle to understand how it can be used with a delay.
let
    // Get list of records
    GetUpdates = (optional offset as number) as list => 1,
    Updates = GetUpdates(),
    // Store last update_id
    LastMessageId = List.Last(Updates)[update_id],
    // Prepare and respond
    Process = (item as record) as record =>
    // Map Process function to each item in the list of records
    Map = List.Transform(Updates, each Process(_))
in
    Map
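For intuition, the offset-based polling loop being described (fetch, process, re-evaluate the last id, wait, fetch again with that id) looks like this as a plain-Python sketch. This is only an illustration of the control flow Power Query's one-shot evaluation model makes awkward; `get_updates` is a hypothetical stand-in for the real server call:

```python
import time

def get_updates(offset=None):
    # Hypothetical stand-in for the real GetUpdates HTTP call:
    # returns two new records whose ids continue from the given offset.
    base = 0 if offset is None else offset
    return [{"update_id": base + i, "text": f"msg {base + i}"} for i in range(1, 3)]

def poll(n_rounds: int, delay_s: float = 0.01):
    """Poll repeatedly, re-evaluating last_id after each round and
    passing it back into the next get_updates() call."""
    last_id = None
    processed = []
    for _ in range(n_rounds):
        updates = get_updates(last_id)
        processed.extend(u["text"] for u in updates)  # "Transform it"
        last_id = updates[-1]["update_id"]            # re-evaluate LastMessageId
        time.sleep(delay_s)                           # the N-minute delay
    return last_id, processed

last_id, processed = poll(3, delay_s=0.0)
```

The key point is that the offset is ordinary mutable loop state, which is exactly what a single M expression cannot express without List.Generate or recursion.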
Power BI does not support continuous automatic re-loading of data in Desktop.
Online, you can force a refresh as often as every 15 minutes using DirectQuery (1).
Alternative methods:
You could do this in Excel and use VBA to re-execute the query on a schedule
Streaming data in Power BI (2)
Streaming data with Flow and Power BI (3)
1: Supported DirectQuery Sources
2: Realtime Streaming in PowerBI
3: Streaming data with Flow
4: Don't forget to enable historic logging!

Azure webjob is always "Running"

I just created an Azure WebJob and scheduled it to run every minute:
0 */1 * * * *
This is the code
var host = new JobHost();
Console.WriteLine("Starting program...");
var unityContainer = new UnityContainer();
unityContainer.RegisterType<ProgramStarter, ProgramStarter>();
unityContainer.RegisterType<IOutgoingEmailRepository, OutgoingEmailRepository>();
unityContainer.RegisterType<IOutgoingEmailService, OutgoingEmailService>();
unityContainer.RegisterType<IDapperHelper, DapperHelper>();
//var game = unityContainer.Resolve<IOutgoingEmailRepository>();
var program = unityContainer.Resolve<ProgramStarter>();
program.Run().Wait();
Console.WriteLine("All done....");
host.RunAndBlock();
The problem is that the status never changes to "success". Am I doing something wrong? Below are the app settings I use; should I change them? I also noticed that it runs only the first time, and I believe that's because it never ends.
You could check your WebJob logs in Kudu.
If you use the above job in a RunAndBlock scenario, then your job has to be continuous, meaning the process runs all the time.
But you're using a triggered WebJob here, not a continuous one, so the RunAndBlock method cannot be used.
WEBJOBS_IDLE_TIMEOUT - Time in seconds after which we'll abort a running triggered job's process if it's idle, with no CPU time or output (only for triggered jobs).
In addition, I notice that you set WEBJOBS_IDLE_TIMEOUT to 100000. That value is so large that your WebJob will not be stopped for a very long time while it sits idle.
You could also change the grace period of a job by specifying it (in seconds) in the settings.job file where the name of the setting is stopping_wait_time like so:
{ "stopping_wait_time": 60 }
For more details, please refer to this doc.
Hope this helps.

Jupyter ipywidgets behave inconsistently when displaying and refreshing

I've made a jupyter notebook that displays some graphs and dataframes. I can then change the widget values to refresh the dataframes and graphs.
I'm running into two problems:
Every time I run the book for the first time after opening Jupyter notebook, I get this:
Yet when I run all cells again, I get the desired output:
Every time I use the widgets and make a change, the notebook eventually refreshes correctly, but for some reason it repeatedly outputs the top few graphs, seems to clear the output, shows them again, clears again, and after a few iterations of this finally displays everything. Here is my event handler:
def handle_submit(sender):
    clear_output(wait=True)
    start = w.slider_start.value.strftime('%Y-%m-%d')
    end = w.slider_end.value.strftime('%Y-%m-%d')
    zval = z.value
    yval = y.value
    xval = x.value
    df_dlm, df_dgo, df_stats = show_stats(zval, yval, xval, start, end)
    df_e = show_stats_exp(zval, yval, xval, start, end)
    display_dataframe(df_stats)
    plot_lines_e(df_e)
    dfx = prepare_data(df_dlm, df_dgo)
    plot_lines(dfx)
    plot_scatter(dfx)
where the display_dataframe, plot_lines, and plot_scatter functions each contain a display-style call to actually show the graph, since %matplotlib inline does not play well with ipywidgets. It feels like that has something to do with it, but I'm not sure how to get around it.
FYI, the graphs are done in Bokeh; certain dataframes are plotted as matplotlib objects.
Thanks for your help!
That warning is just a bug that should be fixed in the next version of ipywidgets, I believe. In the meantime, the only thing you can do is wait ~10-20 seconds after loading or reloading the notebook until the "trusted" box appears in the right-hand corner, as pictured below.
Try executing all of your code only after you see that box come up.

how to get the next scheduled trigger fire time in Quartz.net

This is my first Quartz.NET project. I have done my basic homework, all my cron triggers fire correctly, and life is good. However, I'm having a hard time finding a property in the API doc. I know it's there; I just cannot find it. How do I get the exact time a trigger is scheduled to fire? If I have a trigger set for, say, 8:00 AM every day, where in the trigger class is this 8:00 AM stored?
_quartzScheduler.ScheduleJob(job, trigger);
Program.Log.InfoFormat(
    "Job {0} will trigger next time at: {1}", job.FullName, trigger.WhatShouldIPutHere?);
So far I have tried
GetNextFireTimeUtc(), StartTimeUTC and return value of _quartzScheduler.ScheduleJob() shown above. Nothing else on http://quartznet.sourceforge.net/apidoc/topic645.html
The triggers fire at their scheduled times correctly; this is just cosmetics. Thank you.
As jhouse said, ScheduleJob returns the next scheduled time.
I am using Quartz.net 1.0.3. and everything works fine.
Remember that Quartz.net uses UTC date/time format.
I've used this cron expression: "0 0 8 1/1 * ? *".
DateTime ft = sched.ScheduleJob(job, trigger);
If I print ft.ToString("dd/MM/yyyy HH:mm"), I get 09/07/2011 07:00, which is not right, because I've scheduled my trigger to fire every day at 8 AM (I am in London).
If I print ft.ToLocalTime().ToString("dd/MM/yyyy HH:mm"), I get what I expect: 09/07/2011 08:00.
You should get what you're after from GetNextFireTimeUtc() (the value from that method should be accurate after having called ScheduleJob()). Also, the ScheduleJob() method returns the date of the first fire time.
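The UTC-to-local conversion behind the 07:00-vs-08:00 discrepancy can be checked with a quick sketch. Python's stdlib is used here purely to illustrate the time-zone arithmetic; the assumption is only that 9 July 2011 falls inside British Summer Time (UTC+1):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Quartz.NET reports fire times in UTC. 2011-07-09 is during British
# Summer Time (UTC+1), so a 07:00 UTC fire time is 08:00 in London.
utc_fire = datetime(2011, 7, 9, 7, 0, tzinfo=timezone.utc)
local_fire = utc_fire.astimezone(ZoneInfo("Europe/London"))
```

This is exactly why printing the raw return value of ScheduleJob looks "wrong" while ToLocalTime() gives the expected 8 AM.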