Hi, I have created an Apache Beam pipeline, tested it, and run it from inside Eclipse, both locally and using the Dataflow runner. I can see in the Eclipse console that the pipeline is running, and I also see the details, i.e. the logs, on the console.
Now, how do I deploy this pipeline to GCP so that it keeps working irrespective of the state of my machine? For example, if I run it using mvn compile exec:java, the console shows it is running, but I cannot find the job in the Dataflow UI.
Also, what will happen if I kill the process locally? Will the job on the GCP infrastructure also be stopped? How do I know whether a job has been triggered on the GCP infrastructure, independent of my machine's state?
The output of mvn compile exec:java with arguments is as follows:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/ThakurG/.m2/repository/org/slf4j/slf4j-jdk14/1.7.14/slf4j-jdk14-1.7.14.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/ThakurG/.m2/repository/org/slf4j/slf4j-nop/1.7.25/slf4j-nop-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
Jan 08, 2018 5:33:22 PM com.trial.apps.gcp.df.ReceiveAndPersistToBQ main
INFO: starting the process...
Jan 08, 2018 5:33:25 PM com.trial.apps.gcp.df.ReceiveAndPersistToBQ createStream
INFO: pipeline created::Pipeline#73387971
Jan 08, 2018 5:33:27 PM com.trial.apps.gcp.df.ReceiveAndPersistToBQ main
INFO: pie crated::Pipeline#73387971
Jan 08, 2018 5:54:57 PM com.trial.apps.gcp.df.ReceiveAndPersistToBQ$1 apply
INFO: Message received::1884408,16/09/2017,A,2007156,CLARK RUBBER FRANCHISING PTY LTD,A ,5075,6,Y,296,40467910,-34.868095,138.683535,66 SILKES RD,,,PARADISE,5075,0,7.4,5.6,18/09/2017 2:09,0.22
Jan 08, 2018 5:54:57 PM com.trial.apps.gcp.df.ReceiveAndPersistToBQ$1 apply
INFO: Payload from msg::1884408,16/09/2017,A,2007156,CLARK RUBBER FRANCHISING PTY LTD,A ,5075,6,Y,296,40467910,-34.868095,138.683535,66 SILKES RD,,,PARADISE,5075,0,7.4,5.6,18/09/2017 2:09,0.22
Jan 08, 2018 5:54:57 PM com.trial.apps.gcp.df.ReceiveAndPersistToBQ$1 apply
This is the Maven command I'm using from the command prompt:
`mvn compile exec:java -Dexec.mainClass=com.trial.apps.gcp.df.ReceiveAndPersistToBQ -Dexec.args="--project=analyticspoc-XXX --stagingLocation=gs://analytics_poc_staging --runner=DataflowRunner --streaming=true"`
This is the piece of code I'm using to create the pipeline and set the options on it:
PipelineOptions options = PipelineOptionsFactory.create();
DataflowPipelineOptions dfOptions = options.as(DataflowPipelineOptions.class);
dfOptions.setRunner(DataflowRunner.class);
dfOptions.setJobName("gcpgteclipse");
dfOptions.setStreaming(true);
// Then create the pipeline.
Pipeline pipeL = Pipeline.create(dfOptions);
Can you clarify what exactly you mean by "console shows it is running" and by "can not find the job using Dataflow UI"?
If your program's output prints the message:
To access the Dataflow monitoring console, please navigate to https://console.developers.google.com/project/.../dataflow/job/....
Then your job is running on the Dataflow service. Once it's running, killing the main program will not stop the job - all the main program does is periodically poll the Dataflow service for the status of the job and new log messages. Following the printed link should take you to the Dataflow UI.
If this message is not printed, then perhaps your program is getting stuck somewhere before actually starting the Dataflow job. If you include your program's output, that will help debugging.
To deploy a pipeline to be executed by Dataflow, you specify the runner and project execution parameters through the command line or via the DataflowPipelineOptions class. runner must be set to DataflowRunner (for Apache Beam 2.x.x) and project must be set to your GCP project ID. See Specifying Execution Parameters. If you do not see the job in the Dataflow Jobs UI list, then it is definitely not running in Dataflow.
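As a rough sketch (assuming Apache Beam 2.x with the Dataflow runner dependency on the classpath; the class name DeployToDataflow is just a placeholder), parsing the options from the command-line arguments could look like this:

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DeployToDataflow {
  public static void main(String[] args) {
    // Parse --project, --stagingLocation, --runner, etc. from the command line
    // (the values passed via -Dexec.args) instead of creating empty options.
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);

    Pipeline pipeline = Pipeline.create(options);
    // ... apply transforms here ...
    pipeline.run();
  }
}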
If you kill the process that deploys a job to Dataflow, then the job will continue to run in Dataflow. It will not be stopped.
This is trivial, but to be absolutely clear, you must call run() on the Pipeline object in order for it to be executed (and therefore deployed to Dataflow). The return value of run() is a PipelineResult object which contains various methods for determining the status of a job. For example, you can call pipeline.run().waitUntilFinish(); to force your program to block execution until the job is complete. If your program is blocked, then you know the job was triggered. See the PipelineResult section of the Apache Beam Java SDK docs for all of the available methods.
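For illustration, a minimal sketch using the pipeL variable from the question (Beam 2.x PipelineResult API):

import org.apache.beam.sdk.PipelineResult;

// Submit the job to the Dataflow service; this returns once the job has been
// accepted, and the job keeps running on GCP even if the local process is killed.
PipelineResult result = pipeL.run();

// Optionally block until the job completes; killing the local process at this
// point only stops the status polling, not the Dataflow job itself.
PipelineResult.State finalState = result.waitUntilFinish();
System.out.println("Job finished in state: " + finalState);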
My app runs fine on my local machine, which has 16 GB of RAM, using the 'heroku local' command to start both the dyno and the workers from the Procfile. The background jobs queued in Delayed Job are processed one by one and then the table is emptied. When I run on Heroku, it fails to execute the background processing at all. It gets stuck with the following out-of-memory message in my logfile:
2016-04-03T23:48:06.382070+00:00 app[web.1]: Using rack adapter
2016-04-03T23:48:06.382149+00:00 app[web.1]: Thin web server (v1.6.4 codename Gob Bluth)
2016-04-03T23:48:06.382154+00:00 app[web.1]: Maximum connections set to 1024
2016-04-03T23:48:06.382155+00:00 app[web.1]: Listening on 0.0.0.0:7557, CTRL+C to stop
2016-04-03T23:48:06.711418+00:00 heroku[web.1]: State changed from starting to up
2016-04-03T23:48:37.519962+00:00 heroku[worker.1]: Process running mem=541M(105.8%)
2016-04-03T23:48:37.519962+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2016-04-03T23:48:59.317063+00:00 heroku[worker.1]: Process running mem=708M(138.3%)
2016-04-03T23:48:59.317063+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2016-04-03T23:49:21.449475+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2016-04-03T23:49:21.449325+00:00 heroku[worker.1]: Process running mem=829M(161.9%)
2016-04-03T23:49:24.273557+00:00 app[worker.1]: rake aborted!
2016-04-03T23:49:24.273587+00:00 app[worker.1]: Can't modify frozen hash
2016-04-03T23:49:24.274764+00:00 app[worker.1]: /app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.6/lib/active_record/attribute_set/builder.rb:45:in `[]='
2016-04-03T23:49:24.274771+00:00 app[worker.1]: /app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.6/lib/active_record/attribute_set.rb:39:in `write_from_user'
I know that R14 is an out-of-memory error, so I have two questions:
Is there any way that Delayed Job can be tuned to take less memory? There will be some disk swapping involved, but at least it will run.
Why do I keep getting the rake aborted! Can't modify frozen hash error (lines 4 and 5 from the bottom of the log shown above)? I do not get it in my local environment. What does it mean? Is it memory related?
Thanks in advance for your time. I am running Rails 4.2.6 and delayed_job 4.1.1 as shown below:
→ gem list | grep delayed
delayed_job (4.1.1)
delayed_job_active_record (4.1.0)
delayed_job_web (1.2.10)
Bharat
I found the problem. I am posting my solution here for those who may be running into similar problems.
I increased the Heroku worker memory by using a standard-2X dyno, meaning I gave it 1 GB of memory, to remove the memory quota problem. That made the R14 errors go away, but I still continued to get the
rake aborted!
Can't modify frozen hash
error, and then the program would crash. So the problem was clearly here. After much research, I found that the previous programmer had used the 'workless' gem to reduce Heroku charges. The workless gem puts Heroku workers to sleep when they are not being used, so no charges are incurred while the workers are idle.
What I did not mention in my original question is that I had upgraded the app from Rails 3.2.9 to Rails 4.2.6. My research also showed that the workless gem had not been updated in the last three years and there was no mention of Rails 4 on its site. So the chances were that it might not work well with Rails 4.2.6 and Heroku.
I saw some lines in my stack trace that were related to the workless gem. This was a clue to see what would happen if I removed this gem from production. So I removed it and redeployed.
The frozen hash error went away and my delayed_job worker ran successfully to completion on Heroku.
The lesson for me was to read the log carefully and check all the dependencies :)
Hope this helps.
I am attempting to deploy some changes to a LoopBack app running on a remote Ubuntu box on top of strong-pm.
The changes that I make locally are not being reflected in what gets deployed to the server. Here are the commands I execute:
$slc build
$slc deploy http://IPADDRESS deploy
to which I get a successful deploy message which looks like this:
peter#peters-MacBook-Pro ~/Desktop/projects/www/places-api master slc deploy http://IPADDRESS deploy
Counting objects: 5740, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5207/5207), done.
Writing objects: 100% (5740/5740), 7.14 MiB | 2.80 MiB/s, done.
Total 5740 (delta 1555), reused 150 (delta 75)
To http://IPADDRESS:8701/api/services/1/deploy/default
* [new branch] deploy -> deploy
Deployed `deploy` as `placesAPI` to `http://IPADDRESS:8701/`
Checking the deployed files on the server here:
/var/lib/strong-pm/svc/1/work
I can see that the changes I made to the local app are not reflected in what has just been deployed to the server.
In order to check that the changes are reflected in the build, I checked out the deploy git repository, like so:
git checkout deploy
Inspecting the files here, I can see that the changes I made are present.
Does anyone know why the changes are not reflected in what is deployed to the server?
I know this is an old post, but for anyone getting this issue: I just encountered the same problem.
Finally, I used slc arc and tried to Build from there.
Make sure that the "Fully qualified path to archive" field has a correct value.
It should be something like
../project-1.0.0.tgz
I have an issue with my Cloud Foundry installation on vSphere. After an upgrade to 1.6, I started to get "migrator is not current" errors in the Cloud Controller clock and worker components. They do not come up anymore.
[2015-12-10 11:36:19+0000] ------------ STARTING cloud_controller_clock_ctl at Thu Dec 10 11:36:19 UTC 2015 --------------
[2015-12-10 11:36:23+0000] rake aborted!
[2015-12-10 11:36:23+0000] Sequel::Migrator::NotCurrentError: migrator is not current
[2015-12-10 11:36:23+0000] Tasks: TOP => clock:start
[2015-12-10 11:36:23+0000] (See full trace by running task with --trace)
After googling this, I only found this mailing list thread: https://lists.cloudfoundry.org/archives/list/cf-bosh#lists.cloudfoundry.org/message/GIOTVF2A77KREO4ESHSY7ZXZJKM5ZULA/. Can I migrate my Cloud Controller DB manually? Does anyone know how to fix this? I'd be very grateful!
I am new to Google Cloud Platform, so I might be asking simple questions.
I was testing the StarterPipeline and WordCount examples using the Cloud Dataflow API, and although they work locally, both fail when I try to run the pipelines on the Cloud Dataflow service.
I've verified that all required APIs are enabled and that I am successfully authenticated.
There are NO messages in the log files, and the only thing I see is that the request stages class files on Cloud Storage and then reports "Job finished with status FAILED" before starting the worker pool (log below).
Any thoughts and suggestions would be greatly appreciated!
Thanks, Vladimir
Sep 15, 2015 4:13:09 PM com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner fromOptions
INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 45 files. Enable logging at DEBUG level to see which files will be staged.
Sep 15, 2015 4:13:09 PM com.google.cloud.dataflow.sdk.Pipeline applyInternal
WARNING: Transform AnonymousParDo2 does not have a stable unique name. This will prevent updating of pipelines.
Sep 15, 2015 4:13:09 PM com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner run
INFO: Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
Sep 15, 2015 4:13:09 PM com.google.cloud.dataflow.sdk.util.PackageUtil stageClasspathElements
INFO: Uploading 45 files from PipelineOptions.filesToStage to staging location to prepare for execution.
Sep 15, 2015 4:13:19 PM com.google.cloud.dataflow.sdk.util.PackageUtil stageClasspathElements
INFO: Uploading PipelineOptions.filesToStage complete: 0 files newly uploaded, 45 files cached
Dataflow SDK version: 1.0.0
Sep 15, 2015 4:13:20 PM com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner run
INFO: To access the Dataflow monitoring console, please navigate to https://console.developers.google.com/project/XXXXXXXXXXXXXXXXXXX/dataflow/job/2015-09-15_07_13_20-12403932015881940310
Submitted job: 2015-09-15_07_13_20-12403932015881940310
Sep 15, 2015 4:13:20 PM com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner run
INFO: To cancel the job using the 'gcloud' tool, run:
> gcloud alpha dataflow jobs --project=XXXXXXXXXXXXXXXXXXX cancel 2015-09-15_07_13_20-12403932015881940310
Sep 15, 2015 4:13:27 PM com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner run
INFO: Job finished with status FAILED
Exception in thread "main" com.google.cloud.dataflow.sdk.runners.DataflowJobExecutionException: Job 2015-09-15_07_13_20-12403932015881940310 failed with status FAILED
at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.run(BlockingDataflowPipelineRunner.java:155)
at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.run(BlockingDataflowPipelineRunner.java:56)
at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:176)
at com.google.cloud.dataflow.starter.StarterPipeline.main(StarterPipeline.java:68)