How to compress before writing custom ETW logs?

We are logging custom events from our application to ETW. We now need to find the best compression technique, as a lot of data will be written to the file and the .etl file is to be sent with an error report to the server.
Where do I specify this? In System.Diagnostics.Tracing.EventSource itself, when setting it up?

Related

Convert a stream into mini-batches for loading into BigQuery

I would like to build the following pipeline:
pub/sub --> dataflow --> bigquery
The data is streaming, but I would like to avoid streaming it directly into BigQuery, so I was hoping to batch up small chunks on the Dataflow workers and then write them into BQ as a load job when they reach a certain size/time.
I cannot find any examples of how to do this using the python apache beam SDK - only Java.
This is work in progress. The FILE_LOADS method is currently only available for batch pipelines (with the use_beam_bq_sink experiments flag; it will become the default in the future).
For streaming pipelines, however, as seen in the code, it will raise a NotImplementedError with the message:
File Loads to BigQuery are only supported on Batch pipelines.
There is an open JIRA ticket where you can follow the progress.
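For comparison, the Java SDK already exposes this path through BigQueryIO's FILE_LOADS method with a triggering frequency. Below is a minimal sketch of that route; the project, topic, table, and triggering values are placeholder assumptions, not values from your pipeline.

```java
// Minimal sketch (Beam Java SDK): Pub/Sub -> batching -> BigQuery load jobs.
// Project, topic, table, and triggering values are placeholder assumptions.
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.joda.time.Duration;

public class PubSubToBigQueryLoadJobs {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    options.as(StreamingOptions.class).setStreaming(true);
    Pipeline p = Pipeline.create(options);

    p.apply("ReadFromPubSub",
            PubsubIO.readStrings().fromTopic("projects/my-project/topics/my-topic"))
     .apply("ToTableRows", MapElements
            .into(TypeDescriptor.of(TableRow.class))
            .via((String msg) -> new TableRow().set("message", msg)))
     .setCoder(TableRowJsonCoder.of())
     .apply("WriteViaLoadJobs", BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            // Buffer elements and start a load job every 10 minutes.
            .withTriggeringFrequency(Duration.standardMinutes(10))
            .withNumFileShards(1)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```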

Update Wowza StreamPublisher schedule via REST API (or alternative)

Just getting started with Wowza Streaming Engine.
Objective:
Set up a streaming server which live streams existing video (from S3) at a pre-defined schedule (think of a tv channel that linearly streams - you're unable to seek through).
Create a separate admin app that manages that schedule and updates the streaming app accordingly.
Accomplish this with as little custom Java as possible.
Questions:
Is it possible to fetch / update streamingschedule.smil with the Wowza Streaming Engine REST API?
There are methods to retrieve and update specific SMIL files via the REST API, but they only seem to apply to SMIL files created through the manager; streamingschedule.smil, after all, has to be created by hand.
Alternatively, is it possible to reference a streamingschedule.smil that exists on an S3 bucket? (In a similar way footage can be linked from S3 buckets with the use of the MediaCache module)
A comment here (search for '3a') seems to indicate it's possible, but there's a lot of noise in that thread.
What I've done:
Set up Wowza Streaming Engine 4.4.1 on EC2
Enabled REST API documentation
Created a separate S3 bucket and filled it with pre-recorded footage
Enabled MediaCache on the server which points to the above S3 bucket
Created a customised VOD edge application, with AppType set to Live and StreamType set to live in order to be able to point to the above (as suggested here)
Created a StreamPublisher module with a streamingschedule.smil file
The above all works and I have a working schedule with linearly streaming content pulled from an S3 bucket. Just need to be able to easily manipulate that schedule without having to manually edit the file via SSH.
So close! TIA
To answer your questions:
No. However, you can update it by creating an HTTP provider and having it handle the modifications to that schedule (a sketch is included below). Should you want more flexibility here, you can even extend the scheduler module to not require that file at all.
Yes. You would have to modify the ServerListenerStreamPublisher solution to accomplish it. Currently it only looks at the local filesystem to read the streamingschedule.smil file.
Thanks,
Matt
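To illustrate the HTTP-provider idea from the first answer, here is a minimal sketch. The class and interface names follow the standard Wowza HTTP-provider pattern, but the request-handling calls and the content path are assumptions, not a tested solution.

```java
// Minimal sketch of a Wowza HTTP provider that accepts a POSTed SMIL body and
// overwrites streamingschedule.smil. Request/response calls follow the usual
// Wowza HTTP-provider examples; verify the details against your Wowza version.
import com.wowza.wms.http.HTTProvider2Base;
import com.wowza.wms.http.IHTTPRequest;
import com.wowza.wms.http.IHTTPResponse;
import com.wowza.wms.vhost.IVHost;

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ScheduleUpdateProvider extends HTTProvider2Base {

    @Override
    public void onHTTPRequest(IVHost vhost, IHTTPRequest req, IHTTPResponse resp) {
        try {
            if (!"POST".equalsIgnoreCase(req.getMethod())) {
                resp.setResponseCode(405);
                return;
            }
            // Assumed location of the schedule used by ServerListenerStreamPublisher.
            String smilPath = vhost.getHomePath() + "/content/streamingschedule.smil";

            try (InputStream body = req.getInputStream()) {
                Files.copy(body, Paths.get(smilPath), StandardCopyOption.REPLACE_EXISTING);
            }

            resp.setResponseCode(200);
            try (OutputStream out = resp.getOutputStream()) {
                out.write("schedule updated\n".getBytes("UTF-8"));
            }
        } catch (Exception e) {
            resp.setResponseCode(500);
        }
    }
}
```

A provider like this is typically registered under HTTPProviders in conf/VHost.xml and should be locked down with one of the admin authentication methods before exposing it.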

Informatica real-time jobs from a web service

We have a requirement to load data from a web service into the target database using Informatica. The web service will be initiated by the source application whenever there is a change on the source side. On the Informatica side, we have to trigger the loading job whenever we receive the web-service call, instead of using scheduled/batch jobs.
Please let me know if there is any option to achieve this using PowerExchange.
You could make use of the HTTP transformation to load the data from the web service.
There is a demo on this in the Informatica Marketplace. Download the file there to get the complete implementation steps - https://marketplace.informatica.com/solutions/mapping_web_service_using_http_transformation
And with respect to triggering the workflows ad hoc, maybe you can make use of file watchers. Whenever there is a web-service request, you can arrange to have a file transferred to your source location that indicates a new request (a sketch of that trigger is included below). I am not sure if this is possible in your case; it would be great if you could provide more details. However, there is another demo here explaining how to implement a file watcher to auto-trigger your workflows that could help -
https://marketplace.informatica.com/solutions/mapping_email_on_non_occurrence_of_event
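For the file-watcher idea, the triggering side can be as simple as dropping an empty indicator file into the directory the watcher is polling. A rough sketch; the directory and file name are placeholder assumptions.

```java
// Minimal sketch: on receiving a web-service request, drop an indicator file
// into the directory an Informatica file watcher / event-wait task is polling.
// The directory and file name are placeholder assumptions.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WorkflowTrigger {

    private static final Path WATCH_DIR = Paths.get("/data/infa/triggers");

    /** Call this from the web-service handler after persisting the payload. */
    public static void dropIndicatorFile(String requestId) throws IOException {
        Files.createDirectories(WATCH_DIR);
        // The file watcher only cares that the file appears; its content is irrelevant.
        Path indicator = WATCH_DIR.resolve("new_request_" + requestId + ".ind");
        Files.write(indicator, new byte[0]);
    }
}
```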

Drupal 7: updating content via a service (external source)

I need to create a service that can receive, at any given time, an XML feed containing data related to a content type.
Could someone please advise me which modules I should use to develop a solution?
So:
Another server will post an XML feed with instructions to add/delete/update content in the XML.
I will need to update the content type from the posted XML feed.
I have previously used the Migrate module, but that runs on my side through cron or manually. The main difference here is that I could receive the post from the other server at any given time, and possibly multiple concurrent posts.
This sounds like a job for the Services 3 module. You can write a resource module for it that parses your XML file and does the work for you. The Services module handles the REST/RPC connections for you.
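To illustrate the flow from the posting server's side, here is a rough sketch of sending the XML to a custom Services resource. The endpoint path is hypothetical and depends entirely on how the resource module is configured.

```java
// Minimal sketch of what the posting server could send to a custom Services 3
// resource. The endpoint path (/api/v1/feed) is a hypothetical example.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FeedPoster {
    public static void main(String[] args) throws Exception {
        String xml = "<items><item action=\"update\"><nid>42</nid><title>Updated title</title></item></items>";

        HttpURLConnection conn =
                (HttpURLConnection) new URL("https://example.com/api/v1/feed").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml");

        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }

        // The resource module on the Drupal side parses this body and performs the
        // add/update/delete against the content type.
        System.out.println("Response code: " + conn.getResponseCode());
    }
}
```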

How data gets into the HDFS file system

I am trying to understand how data from multiple sources and systems gets into HDFS. I want to push web-server log files from 30+ systems. These logs are sitting on 18 different servers.
Thx
Veer
You can create a map-reduce job. The input for your mapper would be a file sitting on a server, and your reducer would determine the HDFS path to put the file at. You can either aggregate all of your files in your reducer, or simply write each file as-is at the given path.
You can use Oozie to schedule the job, or you can run it on demand by submitting the map-reduce job on the server that hosts the JobTracker service.
You could also create a Java application that uses the HDFS API. The FileSystem object can be used to do standard file-system operations, like writing a file to a given path (see the sketch below).
Either way, the write has to go through the HDFS API, because the NameNode is responsible for splitting the file into blocks and deciding which DataNodes they are written to.
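For the second approach, a minimal sketch of pushing one log file with the FileSystem API; the NameNode URI and the paths are placeholder assumptions.

```java
// Minimal sketch of pushing a local web-server log into HDFS with the FileSystem API.
// The NameNode URI and paths are placeholder assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class LogUploader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster's NameNode (placeholder host/port).
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode.example.com:8020"), conf);

        Path local = new Path("/var/log/httpd/access.log");
        Path remote = new Path("/data/weblogs/server01/access.log");

        // copyFromLocalFile asks the NameNode where to place the blocks and then
        // streams them to the DataNodes.
        fs.copyFromLocalFile(local, remote);
        fs.close();
    }
}
```

You would run something like this from each of the 18 servers (or from a central host that can reach them), which keeps the ingestion logic outside the cluster itself.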