Can Tika call out to Tesseract in Foundry? - foundry-code-repositories

Apache Tika can call out to Tesseract OCR, but it relies on finding the Tesseract binary on the machine through an environment variable. Is it possible for Foundry pipelines to bundle Tesseract with the transform's dependencies in such a way that Tika can find it? Alternatively, is it possible to specify the Docker image used for compute?
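Tika typically invokes the tesseract command and relies on an ordinary PATH lookup to find it, unless a Tesseract path is configured explicitly. The following Python sketch shows that mechanism. It assumes the binary has been bundled in a directory shipped with the job. That directory and both function names are hypothetical, and none of this is a Foundry or Tika API. The sketch prepends the bundle directory to PATH in a copy of the environment, then checks that "tesseract" resolves there.

```python
import os
import shutil


def env_with_bundled_dir(bundle_dir, base_env=None):
    """Return a copy of the environment with bundle_dir prepended to PATH.

    bundle_dir is a hypothetical directory shipped with the job that
    contains the tesseract executable.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PATH", "")
    env["PATH"] = bundle_dir + os.pathsep + existing if existing else bundle_dir
    return env


def find_tesseract(env):
    """Resolve 'tesseract' using env's PATH, as a PATH-based lookup would."""
    return shutil.which("tesseract", path=env.get("PATH", ""))


if __name__ == "__main__":
    env = env_with_bundled_dir("/opt/bundle/bin")  # hypothetical location
    print(find_tesseract(env))  # None unless a tesseract binary is found on PATH
```

The modified environment only helps a process that receives it. For example, you could pass it as subprocess.run([...], env=env) to whatever process starts Tika. Whether a Foundry transform allows launching such a process is a separate question.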

Related

How to Run WSO2 Streaming Integrator Editor

I installed WSO2 Streaming Integrator using a Docker container. The tutorial says nothing about how to install the editor, but I need the WSO2 SI editor.
Please give some pointers on how to install the editor for developing Siddhi applications.
Currently, a docker image is not available for the SI tooling distribution. You can use the Zip Archive of Streaming Integrator tooling from here to create a docker image.

Using a newer version of Node.js in a Ruby project with Cloud Foundry

My project is using the latest ruby-buildpack which currently loads nodejs 6.14.4. I'd like to use a more current version of nodejs. What's the best way to get it exposed to the application? Does multi-buildpacks solve this problem, and if so, do I list the nodejs buildpack before or after the ruby buildpack in the manifest file? Or, would it be better to package a custom buildpack?
What's the best way to get it exposed to the application? Does multi-buildpacks solve this problem,
I think multi-buildpacks should work for you. You can add the Node.js buildpack as a supply buildpack, which tells it to install Node.js in whatever version you want. The Ruby buildpack then runs, and Node.js should be available on the path while it runs, so you can use it to do whatever you want.
and if so, do I list the nodejs buildpack before or after the ruby buildpack in the manifest file
The last buildpack should be the buildpack which supplies the command to start your app. Only the final buildpack is allowed to pick the command which starts your app. Other buildpacks, called supply buildpacks, only contribute/install dependencies.
It sounds like that should be the Ruby buildpack in your case.
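The ordering rule can be made concrete with a small Python helper that renders the buildpacks section of a manifest.yml. Supply buildpacks come first, and the final buildpack, which picks the start command, comes last. The app name "my-app" and the helper are hypothetical. nodejs_buildpack and ruby_buildpack are the usual system buildpack names, but they may differ on your foundation.

```python
def render_manifest(app_name, supply_buildpacks, final_buildpack):
    """Render a minimal Cloud Foundry manifest.yml as text.

    Supply buildpacks only install dependencies, so they come first; the
    final buildpack, which picks the start command, is always listed last.
    """
    lines = ["---", "applications:", "- name: " + app_name, "  buildpacks:"]
    for buildpack in list(supply_buildpacks) + [final_buildpack]:
        lines.append("  - " + buildpack)
    return "\n".join(lines) + "\n"


print(render_manifest("my-app", ["nodejs_buildpack"], "ruby_buildpack"))
```

For this case the output lists nodejs_buildpack before ruby_buildpack, which matches the answer's advice that the Ruby buildpack should be last.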
Or, would it be better to package a custom buildpack?
I'd strongly advise against this. Forking and maintaining a buildpack is a lot of work. Let other people do this work for you and you'll be a lot happier :)

How to run C++ files on the Google Cloud Functions?

As far as I am aware, Google Cloud Functions only allows you to deploy Node.js or Python scripts.
Question: How would I be able to deploy a simple Hello_World.cpp on Google Cloud Functions? For example, writing a hello world HTTP function.
What are alternative methods to do this? I want to use a serverless approach, since it's the cheapest method, which is why I'm going with Google Cloud Functions. Would I have to use Docker in order to run C++ files? I've been stuck on this for a while, and any guidance or help would be appreciated.
You can compile your C++ function into a WebAssembly module using Emscripten, then call it from a small piece of Node.js glue code.
I built an example for you here:
https://github.com/ArthurSonzogni/gcloud-cpp-starter
You can run C++ code through Node.js on Google Cloud Functions (tested with Node.js 10).
For how to use C++ with N-API (node-addon-api), see https://medium.com/#atulanand94/beginners-guide-to-writing-nodejs-addons-using-c-and-n-api-node-addon-api-9b3b718a9a7f
Use https://console.cloud.google.com/functions and click CREATE FUNCTION to upload a .zip, or deploy with 'gcloud functions deploy --runtime nodejs10 --trigger-http'.
The trick is that when you zip the files, you need to remove the /build and /node_modules folders. Then, from the command line, cd to the folder containing index.js and run 'zip your_name.zip -r *'.
P.S. When I use 'firebase deploy --only functions', it fails because it doesn't recognize the addon.node file format (in fact, it shouldn't read this file at all, because it needs to be recompiled). I think deploying with the gcloud functions command line and a .gcloudignore listing /build and /node_modules will succeed: https://cloud.google.com/functions/docs/deploying/filesystem
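The zip step described above can also be scripted. Here is a minimal Python sketch that uses only the standard library. It archives the folder containing index.js with paths relative to that folder and skips the top-level build/ and node_modules/ directories. The function name is hypothetical. Unlike the shell glob in 'zip your_name.zip -r *', this version also picks up dotfiles such as .gcloudignore.

```python
import os
import zipfile

# Top-level folders that are rebuilt on deploy and must not be uploaded
EXCLUDED_TOP_LEVEL_DIRS = {"build", "node_modules"}


def zip_function_source(src_dir, zip_path):
    """Zip the contents of src_dir (the folder holding index.js) into zip_path.

    Archive paths are relative to src_dir, and the top-level build/ and
    node_modules/ folders are skipped. Returns the archived paths, sorted.
    """
    archived = []
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            if root == src_dir:
                # Prune in place so os.walk never descends into excluded folders
                dirs[:] = [d for d in dirs if d not in EXCLUDED_TOP_LEVEL_DIRS]
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # never add the archive to itself
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
                archived.append(arcname)
    return sorted(archived)
```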
HOW DOES IT WORK
I think that when you deploy Node.js source code to Cloud Functions, it runs npm install, and your C++ code is compiled as part of that (as if npm run build ran automatically after npm install).
You can't use C++ on Cloud Functions, period. You can only use Node.js 6.14, Node.js 8.11.1 (beta) and Python 3.7 (also beta).
If you wish to use C++ in GCP with a serverless approach, my best suggestion would be to run your own custom runtime in App Engine. You would still need to configure some instance options, but you don't have to manage servers and so on.
For this you can only use the App Engine flexible environment (or, of course, a standard VM architecture on Compute Engine). Extract from the docs (https://cloud.google.com/appengine/docs/flexible/):
Runtimes - The flexible environment includes native support for Java 8
(with no web-serving framework), Eclipse Jetty 9, Python 2.7 and Python 3.6,
Node.js, Ruby, PHP, .NET core, and Go. Developers can customize these
runtimes or provide their own runtime by supplying a custom Docker image
or Dockerfile from the open source community.
As an interesting side note, Google Serverless Containers will let you deploy your Dockerized application in a serverless flavour (in fact, it's built on top of Google Cloud Functions technology). It's currently in alpha.

Import 'setup' module error while deploying to App Engine via the Google Cloud SDK

I am writing after a lot of searching and trial and error with no luck.
I am trying to deploy a service to App Engine.
As you may know, deploying to App Engine is usually a two-step process:
1. Deploy to the local dev app server
2. If step 1 succeeds, deploy to the cloud
My problems are with step 1 when I include third-party Python libraries such as numpy, sklearn, gcloud, etc.
I am trying to deploy a service to the local dev app server. When I import numpy or any other third-party library in my main.py script, it throws an error saying it cannot find the module.
I am using the Cloud SDK and have two Python distributions: the default Python 2.7 and Anaconda with Python 2.7. When I change the path to look for modules in the Anaconda distribution, it fails to find the module ‘setup’ required by the Cloud SDK.
Is there a way to install the Cloud SDK for the Anaconda distribution?
Any help/pointers will be much appreciated!
When using the App Engine Python standard environment, you can install pure-Python third-party libraries with pip by vendoring them, as explained here.
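With vendoring as documented for the python27 runtime, you run 'pip install -t lib -r requirements.txt' and then call vendor.add('lib') in appengine_config.py. The effect is that the lib folder becomes importable. The sketch below approximates that effect with the standard library only. It is a hypothetical helper, not App Engine's own implementation, and it avoids Python-3-only syntax so it also runs under Python 2.7.

```python
import os
import site
import sys


def add_vendor_dir(path):
    """Make packages installed with 'pip install -t <path>' importable.

    A rough standard-library stand-in for the documented appengine_config.py
    pattern (vendor.add('lib')); not App Engine's own implementation.
    """
    path = os.path.abspath(path)
    if not os.path.isdir(path):
        raise ValueError("vendor directory not found: %s" % path)
    site.addsitedir(path)  # adds path to sys.path and processes any .pth files
    # Move the directory near the front so vendored copies take precedence
    if path in sys.path:
        sys.path.remove(path)
    sys.path.insert(1, path)
```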
There are also a number of libraries included in the python27 runtime, which can be requested using the libraries directive in your app.yaml, as explained here.
If there's a library that is not pure Python (i.e., it uses C extensions) that you want to use in your project, and it's not part of this list, then your only option is to use a flexible VM. If you want to use Anaconda, you should consider customizing the runtime for your flexible VM.
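To decide which side of that line a library falls on, you can look for compiled extension modules in its installed package directory. The following Python sketch uses a hypothetical helper and a simple heuristic. It flags files ending in .so or .pyd. An empty result only suggests that the package is pure Python. A package can still depend on native code in other ways, for example by loading a system library through ctypes.

```python
import os

# File suffixes used by compiled extension modules (Linux/macOS .so, Windows .pyd)
NATIVE_SUFFIXES = (".so", ".pyd")


def find_native_extensions(package_dir):
    """Return compiled extension files under package_dir, relative and sorted.

    An empty list suggests a pure-Python package that can be vendored into the
    standard environment; a non-empty list means it relies on C extensions.
    """
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(NATIVE_SUFFIXES):
                found.append(os.path.relpath(os.path.join(root, name), package_dir))
    return sorted(found)
```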

GhostScript in Azure

I'm in the process of moving an on-premises app to Azure and am struggling with one aspect: GhostScript. We use GhostScript to convert PDFs to multipage TIFFs. At present this is deployed in an Azure VM, but a WebApp and WebJob seem like a better fit from a management point of view. In all of my testing I've been unable to get a job to run the GhostScript exe.
Has anyone been able to run GhostScript or any third party exe in a WebJob?
I have tried packaging the GhostScript exe, lib and dll into a ZIP file, unzipping it to Path.GetTempPath(), and then using a new System.Diagnostics.Process to run the exe with the required parameters. This didn't work: the process refused to start, exiting with code -1073741819.
Any help or suggestions would be appreciated.
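Reading the exit code above as an unsigned 32-bit Windows status value makes it more informative. -1073741819 + 2^32 = 3221225477 = 0xC0000005, which is STATUS_ACCESS_VIOLATION. So the process crashed with an access violation; the executable was found and launched. Here is a small Python helper for that conversion (the helper name is hypothetical):

```python
def as_ntstatus(exit_code):
    """Reinterpret a signed 32-bit Windows process exit code as unsigned."""
    return exit_code & 0xFFFFFFFF


# -1073741819 + 2**32 == 3221225477 == 0xC0000005 (STATUS_ACCESS_VIOLATION)
print(hex(as_ntstatus(-1073741819)))  # 0xc0000005
```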
We got it to work here:
Converting PDFs to Multipage Tiff files Using Azure WebJobs. The key was putting the Ghostscript assemblies in the root of the project and setting "Copy always". This is what allows them to be pushed to the Azure server, and to end up in the correct place, when you publish the project.
Also, we needed to download the file to be processed by Ghostscript to the local Azure WebJob temp directory. Its path can be obtained with the following code:
var webJobsPath = System.Environment.GetEnvironmentVariable("WEBJOBS_PATH");
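For comparison, here is a Python sketch of the same two steps: resolve the working directory from WEBJOBS_PATH, then invoke Ghostscript. This is not the C# code from the linked article, and the function names are hypothetical. The sketch assumes a Ghostscript console executable, such as gswin64c.exe on Windows or gs elsewhere. It uses the tiffg4 device, which produces bilevel CCITT G4 output; tiff24nc or tiffgray would suit color or grayscale pages. When the output file name contains no %d, Ghostscript's TIFF devices write all pages into a single multipage file.

```python
import os
import subprocess
import tempfile


def webjob_work_dir():
    """Directory to download input files into; falls back to the system temp
    dir when not running inside an Azure WebJob (WEBJOBS_PATH unset)."""
    return os.environ.get("WEBJOBS_PATH") or tempfile.gettempdir()


def ghostscript_tiff_args(gs_exe, pdf_path, tiff_path, dpi=300):
    """Build an argv that converts every PDF page into one multipage TIFF."""
    return [
        gs_exe,
        "-dSAFER", "-dBATCH", "-dNOPAUSE",  # sandboxed, non-interactive run
        "-sDEVICE=tiffg4",
        "-r%d" % dpi,
        "-sOutputFile=%s" % tiff_path,      # no %d: all pages go into one file
        pdf_path,
    ]


def convert(gs_exe, pdf_path, tiff_path):
    # Relative paths are resolved against the WebJob working directory;
    # check=True raises CalledProcessError on a non-zero exit code
    args = ghostscript_tiff_args(gs_exe, pdf_path, tiff_path)
    subprocess.run(args, check=True, cwd=webjob_work_dir())
```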