I want to create a streaming job running on Dataflow.
Its function will be to receive files from Google Cloud Storage (path: gs://mybucket...) and transfer them to a server running Windows Server.
Can anybody suggest a solution?
Windows is not supported as a storage destination. Dataflow has a set of connectors that are used for data transfers/storage, and a Windows server is not one of them.
This link lists the connectors that are currently available:
Apache Beam Built-in I/O Transforms
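For the Cloud Storage side, which is supported, a minimal sketch (assuming the Beam Java SDK) of a streaming job that watches a bucket for new files could look like the following; the bucket pattern and poll interval are placeholders, and delivery to the Windows box would still have to happen outside Dataflow, e.g. by writing to a supported sink that the Windows server then pulls from:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Watch;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

public class GcsWatchPipeline {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadNewFiles",
            TextIO.read()
                .from("gs://mybucket/input/*")  // placeholder file pattern
                // watchForNewFiles makes the source unbounded, i.e. a streaming job
                .watchForNewFiles(Duration.standardSeconds(30), Watch.Growth.never()))
     .apply("Process",
            MapElements.into(TypeDescriptors.strings())
                .via((String line) -> line));   // forward to a supported sink here

    p.run();
  }
}

Run it with --runner=DataflowRunner and the usual --project/--region/--tempLocation flags; the last hop to the Windows server would then be something like gsutil (the Cloud SDK runs on Windows) pulling from whatever sink the pipeline writes to.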
Related
I've configured a local (Java) environment for executing an Apache Beam pipeline following the official doc.
The sample project (WordCount) works perfectly, but now I'd like to change it to read input data from a Pub/Sub topic and write the output to BigQuery.
I've already created my Pub/Sub topic and my BigQuery dataset, but my question is: how can I configure REMOTE input (the Pub/Sub topic) and output (BigQuery) for a locally running pipeline (for debugging purposes)?
See the PubsubIO / PubsubClient classes (or com.google.cloud.pubsub.v1).
For remote access you'd most likely need to register an external IP.
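As a concrete illustration, here is a minimal sketch of that wiring with the Beam Java SDK and the default DirectRunner; the project, topic, table, and schema names are placeholders, and it assumes GOOGLE_APPLICATION_CREDENTIALS points at a service account with Pub/Sub and BigQuery access, in which case the locally running pipeline talks to the remote services directly:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class PubsubToBigQueryLocal {
  public static void main(String[] args) {
    // No runner flag needed: the DirectRunner is the default for a local run.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    TableSchema schema = new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("message").setType("STRING")));

    p.apply("ReadFromPubSub",
            PubsubIO.readStrings().fromTopic("projects/my-project/topics/my-topic"))
     .apply("ToTableRow",
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((String msg) -> new TableRow().set("message", msg)))
     .setCoder(TableRowJsonCoder.of())
     .apply("WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")
                .withSchema(schema)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();  // blocks; stop it with Ctrl+C when done debugging
  }
}

Because the input is unbounded, BigQueryIO falls back to streaming inserts, so rows should appear in the table while the pipeline is still running locally.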
I'm new to GCP and I'm trying to build an ETL flow that will load data from files into BigQuery. It seems to me that the best solution would be to use gsutil. The steps I see today are:
1. (done) Download the .zip file from the SFTP server to a virtual machine
2. (done) Unpack the file
3. Upload the files from the VM to Cloud Storage
4. (done) Automatically load the files from Cloud Storage into BigQuery
Steps 1 and 2 would be performed on a schedule, but I would like step 3 to be event-driven: when files are copied to a specific folder, gsutil should send them to the specified bucket in Cloud Storage. Any ideas how this can be done?
Assuming you're running on a Linux VM, you might want to check out inotifywait, as mentioned in this question -- you can run it as a background process to try it out, e.g. bash /path/to/my/inotify/script.sh &, and then set it up as a daemon once you've tested it and have it working to your liking.
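If a shell script turns out to be awkward to keep running, a rough Java alternative (just an illustration of the same idea, not what the answer above describes) is to watch the directory with java.nio.file.WatchService and upload each newly created file with the google-cloud-storage client; the bucket name and watch directory below are placeholders:

import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.*;

public class WatchAndUpload {
  public static void main(String[] args) throws Exception {
    Path watchDir = Paths.get("/data/unzipped");  // placeholder directory from step 2
    String bucket = "my-bucket";                  // placeholder bucket name
    Storage storage = StorageOptions.getDefaultInstance().getService();

    WatchService watcher = FileSystems.getDefault().newWatchService();
    watchDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

    while (true) {
      WatchKey key = watcher.take();  // blocks until a filesystem event arrives
      for (WatchEvent<?> event : key.pollEvents()) {
        Path created = watchDir.resolve((Path) event.context());
        // In practice you'd wait until the file is fully written before uploading.
        BlobInfo blob = BlobInfo.newBuilder(bucket, created.getFileName().toString()).build();
        storage.create(blob, Files.readAllBytes(created));
        System.out.println("Uploaded " + created + " to gs://" + bucket);
      }
      key.reset();
    }
  }
}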
In AWS, how do I stream log files from a Windows box to S3 in real time?
I thought of configuring a Kinesis Firehose delivery stream with S3 as the destination, but later realized there is no Windows agent for Kinesis.
But I feel we can achieve the same thing using API calls. Any pointers or code examples would help.
Language: .NET
Platform: Windows Server 2012 R2
This is a product from SAP and it uses log4net. Is there any example of how to configure the XML?
Note: currently we use the aws sync command to push log files to S3 from a scheduled task (which runs every minute). I feel there should be a better way to implement this scenario.
Thanks,
Vijay
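For reference, the kind of API call being described looks roughly like this with the AWS SDK (shown in Java rather than .NET to match the other examples in this thread; the .NET client exposes an equivalent PutRecord call). The delivery stream name and log line are placeholders, and credentials are assumed to come from the default provider chain:

import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FirehosePutExample {
  public static void main(String[] args) {
    AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();
    String logLine = "2016-01-01 12:00:00 INFO something happened\n";  // placeholder log line
    PutRecordRequest request = new PutRecordRequest()
        .withDeliveryStreamName("my-log-stream")  // placeholder Firehose delivery stream
        .withRecord(new Record()
            .withData(ByteBuffer.wrap(logLine.getBytes(StandardCharsets.UTF_8))));
    firehose.putRecord(request);  // Firehose then delivers the record to its S3 destination
  }
}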
We are running a real-time streaming application on Hortonworks using Kafka and Spark Streaming in an on-premise cluster setup.
We have a requirement to push some event-triggered data from Spark Streaming or Kafka to the S3 file system on AWS.
Any pointers around this will be appreciated.
You can save using the s3a:// scheme and the Hadoop FileSystem API, e.g. something a bit like:
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new URI("s3a://bucket1"), sparkContext.hadoopConfiguration)
val out = fs.create(new Path("/dir/dest/mydata"), true)
out.write("whatever you want to persist".getBytes("UTF-8")) // write() takes a byte array
out.close()
It can be a bit tricky setting up the classpath, but everything should be set up in HDP for this. More precisely, if it isn't, I get to field the support calls :)
Just getting started with Wowza Streaming Engine.
Objective:
Set up a streaming server which live streams existing video (from S3) at a pre-defined schedule (think of a tv channel that linearly streams - you're unable to seek through).
Create a separate admin app that manages that schedule and updates the streaming app accordingly.
Accomplish this with as little custom Java as possible.
Questions:
Is it possible to fetch / update streamingschedule.smil with the Wowza Streaming Engine REST API?
There are methods to retrieve and update specific SMIL files via the REST API, but they only seem to apply to files created through the manager; after all, streamingschedule.smil has to be created by hand.
Alternatively, is it possible to reference a streamingschedule.smil that exists on an S3 bucket? (In a similar way footage can be linked from S3 buckets with the use of the MediaCache module)
A comment here (search for '3a') seems to indicate it's possible, but there's a lot of noise in that thread.
What I've done:
Set up Wowza Streaming Engine 4.4.1 on EC2
Enabled REST API documentation
Created a separate S3 bucket and filled it with pre-recorded footage
Enabled MediaCache on the server which points to the above S3 bucket
Created a customised VOD edge application, with AppType set to Live and StreamType set to live in order to be able to point to the above (as suggested here)
Created a StreamPublisher module with a streamingschedule.smil file
All of the above works, and I have a schedule linearly streaming content pulled from an S3 bucket. I just need to be able to manipulate that schedule easily without manually editing the file over SSH.
So close! TIA
To answer your questions:
No. However, you can update it by creating an HTTP provider and having it handle the modifications to that schedule (a rough sketch of such a provider follows below). Should you want more flexibility here, you can even extend the scheduler module so that it doesn't require that file at all.
Yes. You would have to modify the ServerListenerStreamPublisher solution to accomplish it. Currently it only looks at the local filesystem to read the streamingschedule.smil file.
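For the first point, a rough sketch of such an HTTP provider, assuming the standard HTTProvider2Base base class; the schedule path is a placeholder, request validation is left out, and the provider still has to be registered under HTTPProviders in VHost.xml (and the StreamPublisher told to reload the schedule):

import com.wowza.wms.http.HTTProvider2Base;
import com.wowza.wms.http.IHTTPRequest;
import com.wowza.wms.http.IHTTPResponse;
import com.wowza.wms.vhost.IVHost;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ScheduleUpdateProvider extends HTTProvider2Base {

  // Placeholder path to the schedule read by the StreamPublisher module.
  private static final Path SCHEDULE =
      Paths.get("/usr/local/WowzaStreamingEngine/content/streamingschedule.smil");

  @Override
  public void onHTTPRequest(IVHost vhost, IHTTPRequest req, IHTTPResponse resp) {
    // Reuse the authentication hook provided by HTTProvider2Base.
    if (!doHTTPAuthentication(vhost, req, resp))
      return;

    try {
      byte[] reply;
      if ("POST".equalsIgnoreCase(req.getMethod())) {
        // Treat the request body as the new SMIL document and overwrite the schedule.
        InputStream in = req.getInputStream();
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1)
          body.write(chunk, 0, n);
        Files.write(SCHEDULE, body.toByteArray());
        // The StreamPublisher still has to be told to reload the schedule at this point.
        reply = "schedule updated\n".getBytes(StandardCharsets.UTF_8);
      } else {
        // Anything else just returns the current schedule.
        reply = Files.readAllBytes(SCHEDULE);
      }
      OutputStream out = resp.getOutputStream();
      out.write(reply);
    } catch (Exception e) {
      e.printStackTrace();  // minimal error handling for the sketch
    }
  }
}

Your separate admin app can then GET/POST against this provider instead of editing the file over SSH.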
Thanks,
Matt