Working link for Google Cloud pipeline components docs? - google-cloud-platform

Does anyone have a working link for the Google Cloud pipeline components docs? The link on the GitHub page under "ReadTheDocs page" is broken. I tried some other tutorial notebooks, such as this one, and the link under "The components are documented here." seems to be broken too.
Edit:
The link is up now.

Pipelines support both KFP (Kubeflow Pipelines) and TFX (TensorFlow Extended) definitions. You can find the documentation here.
You can find useful resources here, especially this notebook.
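For orientation, here is a minimal sketch of a pipeline definition, assuming the kfp v2 SDK (the component and pipeline names are illustrative, not taken from the docs above):

    from kfp import dsl, compiler

    # A lightweight Python-function component; kfp wraps it as a containerized step.
    @dsl.component
    def hello(name: str) -> str:
        return f"Hello, {name}!"

    # A pipeline wires components together into a DAG.
    @dsl.pipeline(name="hello-pipeline")
    def hello_pipeline(name: str = "world"):
        hello(name=name)

    # Compile to a spec that Vertex AI Pipelines (or another KFP backend) can run.
    compiler.Compiler().compile(hello_pipeline, "hello_pipeline.json")

The google-cloud-pipeline-components package layers prebuilt GCP-service components on top of this same model.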

Related

Leveraging AWS Neptune Gremlin Client Library

We're looking to leverage the Neptune Gremlin client library to get automatic load balancing and endpoint refreshes.
There is a blog article here: https://aws.amazon.com/blogs/database/load-balance-graph-queries-using-the-amazon-neptune-gremlin-client/
There is also a repo containing the code here:
https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-gremlin-client
However, the artifacts aren't published anywhere. Is it still possible to do this? Ideally, we'd avoid vendoring the code into our codebase, since we would then forfeit updates.
The artifacts for several of the tools in that repo can be found here.
https://github.com/awslabs/amazon-neptune-tools/releases/tag/amazon-neptune-tools-1.2

Mapping dependencies/requirements for GCP APIs/services

Does anyone know of a way to map the dependencies or requirements of any GCP API?
E.g. enabling container.googleapis.com would automatically enable compute.googleapis.com and others, all mapped in a single chart/table/text/anything.
The GCP docs don't specify any such dependencies for any API (from what I have seen so far). So I'm looking either for a doc that specifies this, a gcloud command, or a completely different tool that can help map it.
We don't have any public external documentation around service dependencies for now, so please open a Feature Request; refer to this link.
Did you open a Feature Request as suggested? If so, can you share the link?
As a faint consolation, you can have a look at this article, from which we can tell that the API interdependency information was once available through the serviceusage API.
There you'll find a diagram as of October 2020 (see screenshot below).
One workaround could be to use the Service Usage API. Its disable method has a disableDependentServices field, which disables all services that depend on the service being disabled.
You could enable a bunch of services in GCP, disable a service, and observe which dependent services are also disabled.
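A hedged sketch of that workaround using the google-api-python-client (PROJECT_ID is a placeholder; the diffing strategy is just the idea above, not an official procedure):

    from googleapiclient import discovery

    # Client for the Service Usage API (uses application-default credentials).
    serviceusage = discovery.build("serviceusage", "v1")
    parent = "projects/PROJECT_ID"  # placeholder project

    def enabled_services():
        # Set of services currently enabled on the project.
        resp = serviceusage.services().list(
            parent=parent, filter="state:ENABLED"
        ).execute()
        return {s["config"]["name"] for s in resp.get("services", [])}

    before = enabled_services()
    # Disable one service plus everything that depends on it.
    serviceusage.services().disable(
        name=f"{parent}/services/compute.googleapis.com",
        body={"disableDependentServices": True},
    ).execute()
    after = enabled_services()
    print("dependents of compute:", before - after - {"compute.googleapis.com"})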
I did end up opening a feature request for this and the fact that I had to do so still boggles the mind.

Invalid arguments when creating new datalab instance

I am following the quickstart tutorial for datalab here, within the GCP console. When I try to run
datalab beta create-gpu datalab-instance-name
in step 3, I receive the following error:
write() argument must be str, not bytes
Can anyone help explain why this is the case and how to fix it?
Thanks
Referring to the official documentation, before creating a Datalab instance, the corresponding APIs should be enabled: the Google Compute Engine and Cloud Source Repositories APIs. To do so, visit Products -> APIs and Services -> Library and search for those APIs. Additionally, make sure that billing is enabled for your Google Cloud project.
You can also enable the APIs by running the following command, which will give you a prompt to enable them:
datalab list
I did some research and found that the same issue has been reported on the GitHub page. If enabling the APIs doesn't work, the best option would be to contribute (add a comment) to the mentioned GitHub topic to make it more visible to the Datalab engineering team.
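For background on the error itself: "write() argument must be str, not bytes" is Python 3's complaint when bytes are written to a text-mode stream, a common symptom of Python 2-era tooling being run under Python 3. A minimal reproduction (an illustration, not the actual datalab code):

    # A file opened in text mode ("w") expects str; bytes trigger the error.
    with open("out.txt", "w") as f:
        f.write(b"hello")
        # TypeError: write() argument must be str, not bytes

    # Writing bytes requires binary mode instead:
    with open("out.bin", "wb") as f:
        f.write(b"hello")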

Mocks for AWS SimpleWorkflowService and ElasticMapReduce

Are there any mocks for AWS SWF or EMR available anywhere? I tried looking at some other AWS API mocks, such as https://github.com/atlassian/localstack/ or https://github.com/treelogic-swe/aws-mock, but they don't have SWF or EMR, which are the services that would be really painful to reproduce. I'm just not sure if anyone has heard of a way to locally test things that depend on those services.
The "moto" project (https://github.com/spulec/moto) groups mocks for the "boto" library (the official python sdk for AWS), and it has mocks for basic things in SWF (disclaimer: I'm the author who contributed them) and EMR.
If you happen to work in Python, they're ready to use via a @mock_swf decorator (use 0.4.x for boto 2.x, or 1.x for boto 3.x). If you work with another language, moto supports a server mode that mimics an AWS endpoint. The SWF service is not provided out of the box yet, but with a minor change in "moto/backends.py" you should be able to try using it. I think the EMR service works out of the box.
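For instance, a minimal test sketch, assuming moto 1.x with boto3 (the domain name is illustrative):

    import boto3
    from moto import mock_swf

    # The decorator patches boto3 so SWF calls hit moto's in-memory backend.
    @mock_swf
    def test_register_and_list_domains():
        client = boto3.client("swf", region_name="us-east-1")
        client.register_domain(
            name="test-domain",
            workflowExecutionRetentionPeriodInDays="1",
        )
        domains = client.list_domains(registrationStatus="REGISTERED")["domainInfos"]
        assert domains[0]["name"] == "test-domain"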
Should you have any issue with the SWF mocks in this project, you can file an issue on the GitHub project; don't hesitate to cc me directly (@jbbarth), and I can probably help improve them.

Is there documentation available for Google Cloud Dataflow?

Google Cloud Dataflow was released in June 2014 (more information in this blog post), but I can't find any technical documentation in the developers section of the cloud.google.com website: https://cloud.google.com/developers/
Does someone know where I can find more information and technical documentation about this product?
I'm really interested in how the topology works: is it static or dynamic? etc.
Google Cloud Dataflow is now in Alpha. The documentation is publicly available here: https://cloud.google.com/dataflow/. Follow the documentation link.
Please note that while in Alpha, access to the managed service is limited to invite only. You can request access via the link above, using the "Apply for Alpha" button.
The Cloud Dataflow SDK for Java has also been made public and open-sourced on GitHub here: https://github.com/GoogleCloudPlatform/DataflowJavaSDK. Please note that you can download the SDK and run your Dataflow programs locally without having to execute them on the managed service. Local pipeline execution is a great way to get a feel for the programming model, but understand that local execution is not parallelized.
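For a flavor of local execution, here is a minimal sketch in today's Apache Beam Python SDK, the descendant of the Dataflow SDK (the Python SDK did not exist when this answer was written); with no runner specified, it runs on the in-process DirectRunner:

    import apache_beam as beam

    # With no runner specified, Beam uses the local DirectRunner:
    # the whole pipeline executes in-process, no managed service involved.
    with beam.Pipeline() as p:
        (
            p
            | "Create" >> beam.Create(["a", "b", "a"])
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )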
We are also moving support over to StackOverflow. Please use the tag: google-cloud-dataflow.
Cheers - Eric
Google Cloud Dataflow is currently in private beta. You can apply here. Documentation is provided upon approval.