Apache Beam GroupByKey doesn't seem to output anything

I have a Beam pipeline running on Google Dataflow that takes input from Pub/Sub and groups it by key before processing the time-series data. When I run the pipeline, it seems unable to get past the GroupByKey step, and the remaining steps never run. (See attached screenshot for console output at the Group By Key step)
Window<KV<String, Data>> window = Window.<KV<String, Data>>into(FixedWindows.of(Duration.standardHours(1)))
    .triggering(Repeatedly.forever(pastEndOfWindow()))
    .withAllowedLateness(Duration.standardSeconds(10))
    .discardingFiredPanes();
PCollection<KV<String, List<Data>>> keyToDataList = data.apply("Add Event Timestamp", WithTimestamps.of(new EventTimestampFunction()))
    .apply("Windowing", window)
    .apply("Group by Key", GroupByKey.create())
    .apply("Sort by date", ParDo.of(new SortDataFn()));
The strange thing is that the code used to work, but after I modified a small part downstream while testing, the GroupByKey stopped working. Worse, reverting to the previous git version didn't help either. I tried cleaning the build and copying the previously working files into a new folder, and neither worked. I'm tearing my hair out, so if anyone has insight into this issue I'd really appreciate the help. Thanks in advance!

Related

How to detect windows update status

I want to detect the current Windows 10 update status programmatically.
I tried wuapi and it works well, but it has some problems.
First, it takes a long time to get update information.
Second, it cannot be used offline.
Is there any other method to detect the current Windows 10 update status?
Is there a registry key or system file I can check?
I tried Procmon to analyse this, but there are too many files and registry keys linked with Windows Update.
Thank you...
There is no documented way to access the search results that Automatic Updates is using (the results that the Windows Update page in Settings displays).
However, there are two things that might be of use to you:
You can use IAutomaticUpdatesResults::LastInstallationSuccessDate to immediately see the last time the computer installed updates successfully. If all you want to know is "Is this PC processing updates successfully?", then this may be all you need.
You can use a Windows Update API search to see what updates are needed. Here's a script you can use as a starting point. If you use this script as written, it will go online to find newly-released updates, which isn't what you want in your scenario. But you can set your IUpdateSearcher object's Online property to false before calling Search. Doing that will perform an offline scan, in which WU just re-evaluates the updates it already knows about. This will work offline and will also return faster results.
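The offline scan described above could look roughly like this in Python. This is a minimal sketch, not a tested recipe: it assumes `session` is a `Microsoft.Update.Session` COM object, and the helper name `count_pending_updates` is hypothetical.

```python
def count_pending_updates(session, online=False, criteria="IsInstalled=0"):
    """Return how many updates match `criteria`.

    `session` is assumed to be a Microsoft.Update.Session COM object.
    This helper is hypothetical; it is not part of the Windows Update API.
    """
    searcher = session.CreateUpdateSearcher()
    # Online = False asks WU to re-evaluate only the updates it already knows about.
    searcher.Online = online
    result = searcher.Search(criteria)
    return result.Updates.Count
```

On Windows with the third-party pywin32 package installed, the session could be created with `win32com.client.Dispatch("Microsoft.Update.Session")` and passed to this function.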
"COM API
The COM API is a good way to directly access Windows Update without having to parse logs. Applications of this API range from finding available updates on the computer to installing and uninstalling updates.
You could use the Microsoft.Update.Session class to run an update search and then count the number of updates available to see if there are any updates for the computer.
PowerShell Example:
$updateObject = New-Object -ComObject Microsoft.Update.Session
$updateObject.ClientApplicationID = "Serverfault Example Script"
$updateSearcher = $updateObject.CreateUpdateSearcher()
$searchResults = $updateSearcher.Search("IsInstalled=0")
Write-Host $searchResults.Updates.Count
If the returned result is more than 0 then there are updates for the computer that need to be installed and/or downloaded. You can easily update the powershell script to fit your application.
Just a heads up, it appears that the search function is not async so it would freeze your application while searching. In that case you will want to make it async."
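One way to keep an application responsive, sketched here in Python rather than PowerShell, is to run the blocking search on a worker thread. The sketch only assumes that `blocking_search` is some slow callable; `start_search` and `slow_search` are hypothetical names. With real COM objects the worker thread would probably need its own COM initialization and its own session object, which is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def start_search(blocking_search, *args):
    """Run `blocking_search(*args)` on a worker thread and return a Future."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(blocking_search, *args)
    # Do not block the caller; the already-submitted task still runs to completion.
    executor.shutdown(wait=False)
    return future


def slow_search(criteria):
    """Stand-in for a slow update search (hypothetical, for illustration only)."""
    time.sleep(0.2)
    return "results for " + criteria
```

The caller can carry on with other work and call `future.result()` (or poll `future.done()`) when it actually needs the search results.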
Registry method
Source:
https://serverfault.com/questions/891188/is-it-possible-to-detect-the-windows-update-status-via-registry-to-see-if-the-s

How do I solve the runtime exception: Attempted to get side input window for GlobalWindow from non-global WindowFn

I am struggling to figure out how I can resolve an issue I am seeing with this Dataflow job. I saw a similar thread in the Apache Beam archives, but I did not quite understand how to use that information.
Essentially, data is being streamed into BigQuery (which works). I am trying to write these BQ rows into Spanner in the same Dataflow job, which raises the following runtime exception:
java.lang.IllegalArgumentException: Attempted to get side input window for GlobalWindow from non-global WindowFn
org.apache.beam.sdk.transforms.windowing.PartitioningWindowFn$1.getSideInputWindow(PartitioningWindowFn.java:47) ....
The relevant section of the Dataflow graph can be seen here [data flow graph], and the code I am using to write to Spanner is here:
sensorReports
    .apply("WindowSensorReportByMonth",
        Window.<TableRow>into(FixedWindows.of(Duration.standardMinutes(5))).withAllowedLateness(Duration.ZERO).discardingFiredPanes()
            .triggering(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardMinutes(1)))
            .discardingFiredPanes())
    .apply("CreateSensorReportMutation", ParDo.of(new RowToMutationTransform()))
    .apply("Write to Spanner",
        SpannerIO.write()
            .withDatabaseId(propertiesUtils.getSpannerDBId())
            .withInstanceId(propertiesUtils.getSpannerInstanceId())
            .withProjectId(propertiesUtils.getSpannerProjectId())
            .withBatchSizeBytes(0));
SpannerIO.write() internally reads the DB schema using a global window and uses it as a side input, so your non-globally-windowed Mutations clash with it.
You could put all your Mutations into a global window before passing them to SpannerIO.write():
    .apply("To Global Window", Window.into(new GlobalWindows()))
but in Beam versions 2.5-2.8, this will result in either an error or nothing ever being written (as SpannerIO never supported streaming pipelines).
Edited answer:
However, SpannerIO in Beam before version 2.9.0 does not support streaming pipelines. Versions 2.4 and earlier did, provided you don't pass a windowed PCollection to it.
You will be pleased to hear that all of this is fixed in version 2.9 (release in progress), where SpannerIO both supports streaming writes and handles the windowing correctly.

How to stop GStreamer DEBUG logs

I set GST_DEBUG=7 and then start a media application, so I keep getting logs in the file I configured with "GST_DEBUG_FILE":"/var/log/gst-log".
Q. If I close the media application, the logs stop (which is fine), but is there any way to stop the GST_DEBUG logs from the CLI?
Please correct me if my thinking is off; I am new to this.
If I understand correctly, you would like to change the value of the GST_DEBUG environment variable while the GStreamer process is running. If so, in general you can't do that. There are, however, some ways to work around it. Look at this and this question.

Jython 2.5.3 and time.sleep

I'm developing a small in house alternative to Tripwire, so I've coded a small script to hash files in a JBoss EAP server, and store the path and the hash in a MySQL database.
Every day the script compares the hashes in the filesystem with those saved in the DB, so any change is logged and finally reported using JasperServer.
The script runs at night using cron. To avoid a large number of scripts querying the DB at the same time, it calls time.sleep(RANDOM_NUMBER_OF_SECONDS) before doing the fun stuff. Sometimes, though, time.sleep seems to sleep forever and the script ends without any error; I check the mail cron sends and no error is logged. Any help would be appreciated. I'm using jython-standalone-2.5.3, IBM's JDK and RHEL 5.6 running inside VMware.
I just found http://bugs.jython.org/issue1974, and a code comment there seems to suggest that OS signals can cause this behavior, but I'm not sure whether this is my case.
If you want to see the code, check it out at http://code.google.com/p/pysnapshot/
Luis García Bustos.
I don't see why you think time.sleep() would reduce the number of scripts querying the DB at the same time.
IMO it is better to use cron to call the program periodically. When it starts, it should check whether a "semaphore" file, for example /tmp/snapshot_working.txt, exists in the /tmp/ directory. If there is no semaphore file, it creates one and writes something like "snapshot started: 2012-12-05 22:00:00" to it. When the program finishes checking, it should remove this file. If the program finds a semaphore file at start, it can either just stop or check whether the date and time saved in the file look "old". If they do, it removes the file and starts normally, logging that an "old" file was found (so an administrator can find such long-running snapshots and terminate them).
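The semaphore-file idea could be sketched as follows. This is only an illustration under a few assumptions: it targets Python 3 (it will not run unchanged on Jython 2.5), the function names and the six-hour threshold are hypothetical, and it uses the file's modification time instead of parsing the date written inside the file.

```python
import os
import time


def acquire_lock(path, max_age_seconds=6 * 3600):
    """Try to create the semaphore file; return True if this run may proceed."""
    try:
        # O_EXCL makes creation fail if the file already exists (atomic check-and-create).
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        age = time.time() - os.path.getmtime(path)
        if age <= max_age_seconds:
            return False  # another run looks active, so stop here
        # Stale semaphore: report it, remove it, then create a fresh one.
        print("old semaphore file found: %s" % path)
        os.remove(path)
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    with os.fdopen(fd, "w") as handle:
        handle.write("snapshot started: " + time.strftime("%Y-%m-%d %H:%M:%S"))
    return True


def release_lock(path):
    """Remove the semaphore file once the work is done."""
    if os.path.exists(path):
        os.remove(path)
```

A script would call `acquire_lock("/tmp/snapshot_working.txt")` first, exit if it returns False, and call `release_lock` in a `finally` block so the file is removed even when the check fails partway.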
The only reason to use time.sleep() in your case is if you want to run such a script during normal working hours without mounting a denial-of-service attack on your DB. For example, after making 100 DB queries you can sleep briefly to give the DB time to serve other users' queries. But I think the sooner the program finishes, the better.
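A short pause after every batch of queries might look like the sketch below. The function name, the `execute` callback, and the default values are assumptions for illustration; `sleep` is injectable only so the pause can be observed without actually waiting.

```python
import time


def run_throttled(queries, execute, batch_size=100, pause_seconds=1.0, sleep=time.sleep):
    """Call `execute(query)` for each query, pausing after every `batch_size` calls."""
    count = 0
    for query in queries:
        execute(query)
        count += 1
        if count % batch_size == 0:
            # Give the DB a moment to serve other users' queries.
            sleep(pause_seconds)
    return count
```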

Selenium wait for download?

I'm trying to test the happy-path for a piece of code which takes a long time to respond, and then begins writing a file to the response output stream, which prompts a download dialog in browsers.
The problem is that this process has failed in the past, throwing an exception after this long amount of work. Is there a way in selenium to wait-for-download or equivalent?
I could throw in a Thread.sleep, but that would be inaccurate and unnecessarily slow down the test run.
What should I do, here?
I had the same problem and came up with a workaround. While the download is in progress, a temp file with a '.part' extension exists. So, as long as that temp file is there, Python can wait 10 seconds and then check again whether the file has been downloaded.
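import os
from time import sleep

# `driver` is the Selenium WebDriver that started the download.
while True:
    if os.path.isfile('ts.csv.part'):
        # Download still in progress.
        sleep(10)
    elif os.path.isfile('ts.csv'):
        # Download finished.
        break
    else:
        # Download has not started yet.
        sleep(10)
driver.close()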
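The loop above waits forever if the download never starts or fails. A variant with a timeout could look like this; it is a sketch that keeps the same assumption about a '.part' temp file, and the function name and default values are hypothetical.

```python
import os
import time


def wait_for_download(path, timeout=120.0, poll_interval=1.0):
    """Return True once `path` exists and `path + '.part'` is gone, else False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path) and not os.path.isfile(path + ".part"):
            return True
        time.sleep(poll_interval)
    return False
```

A test could then assert on `wait_for_download('ts.csv')` and fail with a clear message instead of hanging.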
So you have two problems here:
You need to cause the browser to download the file
You need to measure when the downloaded file is complete
Neither problem can be directly solved by Selenium (yet - 2.0 may help), but both are solvable. The first problem can be solved by GUI automation toolkits, such as AutoIt. It can also be solved by simply sending an automated keypress at the OS level that simulates the Enter key (works for Firefox, a little harder on some versions of Chrome and Safari). If you're using Java, you can use Robot to do that. Other languages have similar toolkits.
The second issue is probably best solved with some sort of proxy solution. For example, if your browser was configured to go through a proxy and that proxy had an API, you could query the proxy with that API to ask when network activity had ended.
That's what we do at http://browsermob.com, which is a startup I founded that uses Selenium to do load testing. We've released some of the proxy code as open source, available at http://browsermob.com/tools.
But two problems still persist:
You need to configure the browser to use the proxy. In Selenium 2 this is easier, but it's possible to do it with Selenium 1 as well. The key is just making sure that your browser launcher brings up the browser with the right profile/settings.
There currently is no API for BrowserMob proxy to tell you when network traffic has stopped! This is a big hole in the concept of the project that I want to fix as soon as I get the time. However, if you're keen to help out, join the Google Group and I can definitely point you in the right direction.
Hope that helps you identify your various options. Best of luck!
This is a Chrome-only solution for controlling downloads with JavaScript.
Using WebDriver (Selenium 2), it can be done within Chrome's chrome:// pages, which are HTML/CSS/JavaScript:
driver.get("chrome://downloads/");
waitElement(By.cssSelector("#downloads-summary-text"));
// the next JavaScript snippet cancels the last/current download
// if your test ends in a file attachment download,
// you'll very likely need this if you have more re-instantiated tests left
((JavascriptExecutor) driver).executeScript("downloads.downloads_[0].cancel_();");
There are other Download.prototype functions in "chrome://downloads/downloads.js".
This suits you if you just need to test some info note, e.g. one triggered when a file attachment download starts, and not the file itself.
Naturally you need to control step 1 (mentioned by Patrick above), and by this you control step 2 FOR THE TEST, not for the functionality of the actual file download completion / cancel.
See also: "Javascript: Cancel/Stop Image Requests", which relates to stopping the browser.
This falls under the "things that can't be automated" category. Selenium is built with JavaScript, and due to JavaScript sandbox restrictions it can't access downloads.
Selenium 2 might be able to do this once Alerts/Prompts have been implemented, but this won't happen for a little while yet.
If you want to check for the download dialog, try AutoIt. I use it for uploading and downloading files. Using AutoIt with Se RC is easier.
def file_downloaded?(file)
  while File.file?(file) == false
    p "File downloading in progress..."
    sleep 1
  end
end
*Ruby Syntax