Generate Informatica mappings to save development effort

I have 100 files on Amazon S3, in various folders under 5 different buckets, that need to be loaded into 100 different Snowflake target tables. I can import the physical source and target objects and build the pipelines manually, but that will take a long time.
Is there any alternative way to generate the 100 mappings using a script or command to save development effort?
I am using Informatica Developer version 10.2.
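(Not Informatica-specific tooling, just an illustration of the general scripting idea: export one working mapping from the Developer tool as XML, then clone that export per source/target pair and re-import the generated files. The names below, TEMPLATE_MAPPING.xml, mapping_list.csv and the __SOURCE_PATH__ style placeholders, are hypothetical, and the re-import step depends on what your 10.2 installation supports, for example the Developer tool's object import or infacmd's object import/export utility.)

```python
# Hypothetical sketch: clone an exported mapping XML once per source/target pair.
# "TEMPLATE_MAPPING.xml", "__SOURCE_PATH__", "__TARGET_TABLE__" and
# "mapping_list.csv" are illustrative placeholders, not Informatica-defined names.
import csv
from pathlib import Path

template = Path("TEMPLATE_MAPPING.xml").read_text()

# mapping_list.csv: one row per file, e.g. "s3 path,target table,mapping name"
with open("mapping_list.csv", newline="") as f:
    for source_path, target_table, mapping_name in csv.reader(f):
        mapping_xml = (
            template.replace("__SOURCE_PATH__", source_path)
                    .replace("__TARGET_TABLE__", target_table)
                    .replace("__MAPPING_NAME__", mapping_name)
        )
        out = Path("generated") / f"{mapping_name}.xml"
        out.parent.mkdir(exist_ok=True)
        out.write_text(mapping_xml)
        # Each generated file can then be imported back into the Model repository
        # (via the Developer tool's import or infacmd's object import/export);
        # check the 10.2 documentation for the exact command and flags.
```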

Related

Where to store spaCy (or other libraries') models in GCP

My team heavily uses spaCy, BERT, and other model-based NLP tools. Where should I store these models (en_core_web_lg and such), so that:
It is only stored once (pricing reasons)
Multiple Notebook projects can access it
I have tried uploading the models to a Cloud Storage bucket, since pandas can open files directly from a bucket, but that is not the case for spaCy.
I would like to avoid solutions where each notebook downloads the models locally from the bucket every time it runs.
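(As an illustration, not part of the original question: if the bucket is exposed to the notebooks as a file system, for example via a Cloud Storage FUSE mount, spaCy can load the model directly from that path, so it is stored once and never re-downloaded. The mount point and version folder below are hypothetical.)

```python
# Minimal sketch, assuming the models bucket is mounted (e.g. via Cloud Storage
# FUSE) at a hypothetical shared path /gcs/nlp-models visible to every notebook.
import spacy

# spacy.load() also accepts a path to an unpacked model package directory
# (the folder containing config.cfg / meta.json), not just a package name.
# The version folder name will differ depending on the model release you store.
MODEL_PATH = "/gcs/nlp-models/en_core_web_lg/en_core_web_lg-3.7.1"

nlp = spacy.load(MODEL_PATH)
doc = nlp("Models are stored once and read from the shared mount.")
print([(ent.text, ent.label_) for ent in doc.ents])
```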

Simultaneous deploys from GitHub with Cloud Build for a multi-tenant architecture

My company is developing a web application and has decided that a multi-tenant architecture would be most appropriate for isolating individual client installs. An install represents an organization (a nonprofit, for example), not an individual user's account. Each install would consist of several Cloud Run applications bucketed into an individual GCP project.
We want to take advantage of Cloud Build's GitHub support to deploy from our main branch in GitHub to each individual client install. So far, I've been able to get this setup working across two GCP projects, where Cloud Build runs in each project and deploys to that project's Cloud Run services at roughly the same time and with the same duration. (Cloud Build does some processing unique to each client install, so the build processes are not performing redundant work.)
My specific question is: can we scale this deployment technique up? Is there any constraint preventing us from using Cloud Build in multiple GCP projects to deploy to our client installs, or will we hit issues as we continue to add more GCP projects? I know this technique works for 2 installs so far, but will it work for 20 or 200?
You are limited to 10 concurrent builds per project, but if you run one Cloud Build per project, there are no limitations or known issues.
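(A sketch of how such a fan-out could be scripted, not something from the answer above: submitting the same build configuration to every tenant project with gcloud. The tenant-projects.txt file and cloudbuild.yaml are assumed to already exist in your setup.)

```python
# Sketch: fan out the same Cloud Build configuration to many tenant projects.
# Assumes gcloud is installed and authenticated, and that each project has
# Cloud Build enabled. "tenant-projects.txt" (one project ID per line) and
# "cloudbuild.yaml" are hypothetical names.
import subprocess
from concurrent.futures import ThreadPoolExecutor

with open("tenant-projects.txt") as f:
    projects = [line.strip() for line in f if line.strip()]

def submit(project_id: str) -> int:
    # --async returns immediately; the build itself runs inside that project.
    return subprocess.call([
        "gcloud", "builds", "submit",
        "--project", project_id,
        "--config", "cloudbuild.yaml",
        "--async",
        ".",
    ])

with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(submit, projects))

print(f"Submitted {len(results)} builds, failures: {sum(1 for r in results if r)}")
```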

Serverless/cloud solution (ideally AWS) for moving zip files from a vendor FTP to AWS S3 on a monthly basis

We have a vendor that provides us with data files (4-5 files, ~10 GB each) on a monthly basis. They provide these files on their FTP site, which we connect to using the username and password they give us.
We download the zip files, unzip them, extract the relevant files, gzip them, upload them to our S3 bucket, and from there push the data to Redshift.
Currently I have a Python script running on an EC2 instance that does all of this, but I am sure there's a better "serverless" solution out there (ideally in the AWS environment) that can do this for me, since this doesn't seem to be a very unique use case.
I am looking for recommendations / alternative solutions for processing these files.
Thank you.
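(For reference, the workflow described above, pulling a zip from the FTP site, extracting the relevant members, gzipping them and uploading to S3, boils down to roughly the sketch below; the host, credentials, bucket name and the ".csv" selection rule are placeholders.)

```python
# Rough sketch of the current EC2-style workflow: pull a zip from the vendor
# FTP, extract the relevant members, gzip them, and push them to S3.
# FTP_HOST, the credentials, BUCKET and the ".csv" filter are placeholders.
import ftplib
import gzip
import shutil
import zipfile
from pathlib import Path

import boto3

FTP_HOST = "ftp.vendor.example.com"
FTP_USER, FTP_PASS = "user", "password"
BUCKET = "my-data-bucket"

work = Path("/tmp/vendor")
work.mkdir(parents=True, exist_ok=True)

# 1. Download the monthly zip from the vendor FTP site.
local_zip = work / "monthly.zip"
with ftplib.FTP(FTP_HOST, FTP_USER, FTP_PASS) as ftp:
    with open(local_zip, "wb") as fh:
        ftp.retrbinary("RETR monthly.zip", fh.write)

# 2. Extract only the relevant members and gzip them.
s3 = boto3.client("s3")
with zipfile.ZipFile(local_zip) as zf:
    for name in zf.namelist():
        if not name.endswith(".csv"):      # placeholder selection rule
            continue
        extracted = zf.extract(name, work)
        gz_path = Path(extracted + ".gz")
        with open(extracted, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

        # 3. Upload the gzipped file to S3 (Redshift COPY can read it from there).
        s3.upload_file(str(gz_path), BUCKET, f"vendor/{gz_path.name}")
```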

Data science workflow with large geospatial datasets

I am relatively new to the docker approach so please bear with me.
The goal is to ingest large geospatial datasets into Google Earth Engine using an open-source, replicable approach. I got everything working on my local machine and on a Google Compute Engine instance, but would like to make the approach accessible to others as well.
The large static geospatial files are currently stored on Amazon S3 (NetCDF4) and Google Cloud Storage (GeoTIFF). I need a couple of Python-based modules to convert and ingest the data into Earth Engine using a command line interface. This has to happen only once. The data conversion is not very heavy and can be done by one fat instance (32 GB RAM, 16 cores takes 2 hours); there is no need for a cluster.
My question is how I should deal with large static datasets in Docker. I thought of the following options but would like to know best practices.
1) Use Docker and mount the Amazon S3 and Google Cloud Storage buckets to the Docker container.
2) Copy the large datasets into a Docker image and use Amazon ECS.
3) Just use the AWS CLI.
4) Use Boto3 in Python.
5) A fifth option that I am not yet aware of.
The Python modules I use are, among others: python-GDAL, pandas, earth-engine, and subprocess.
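(Purely as an illustration of options 3/4, not a recommendation from the thread: pulling a NetCDF file from S3 with Boto3, converting it with GDAL, staging it in Cloud Storage, and handing it to the Earth Engine CLI. The bucket names, file names and asset ID are placeholders.)

```python
# Sketch of the "Boto3 + CLI" route: download a NetCDF from S3, convert it to
# GeoTIFF with GDAL, stage it in Cloud Storage, and ingest via the Earth
# Engine CLI. Bucket names, file names and the asset ID are placeholders.
import subprocess

import boto3

S3_BUCKET = "my-netcdf-bucket"
S3_KEY = "climate/precip_2020.nc"
GCS_STAGING = "gs://my-ee-staging/precip_2020.tif"
EE_ASSET = "users/someuser/precip_2020"

# 1. Pull the source NetCDF from S3.
boto3.client("s3").download_file(S3_BUCKET, S3_KEY, "/tmp/precip_2020.nc")

# 2. Convert NetCDF -> GeoTIFF (gdal_translate ships with GDAL).
subprocess.run(
    ["gdal_translate", "-of", "GTiff", "/tmp/precip_2020.nc", "/tmp/precip_2020.tif"],
    check=True,
)

# 3. Stage the GeoTIFF in Cloud Storage, since Earth Engine ingests from GCS.
subprocess.run(["gsutil", "cp", "/tmp/precip_2020.tif", GCS_STAGING], check=True)

# 4. Kick off the Earth Engine ingestion.
subprocess.run(
    ["earthengine", "upload", "image", "--asset_id", EE_ASSET, GCS_STAGING],
    check=True,
)
```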

Deploy multiple Content Delivery servers with the same configuration

I am building out a Sitecore farm with multiple Content Delivery servers. In the current process, I stand up the CD server and go through the manual steps of commenting out connection strings and enabling or disabling config files for each virtual machine/CD server, as detailed here:
https://doc.sitecore.net/Sitecore%20Experience%20Platform/xDB%20configuration/Configure%20a%20content%20delivery%20server
But since I have multiple servers, is there any sort of global configuration file where I could dictate the settings I want (essentially a settings template for CD servers), or a tool where I could load my desired settings/template for which config files are enabled/disabled, etc.? I have used the SIM tool for instance installation, but I am unsure whether it offers loading a pre-determined "template" for a CD server.
It just seems inefficient to have to stand up a server and then configure each one manually versus a more automated process (e.g. akin to Sitecore Azure, but in this case I need to install the VMs on-prem).
There's nothing directly in Sitecore to achieve what you want. Depending on what tools you are using, though, there are some options to reach that goal.
Visual Studio / Build Server
You can make use of SlowCheetah config transforms to configure non-web.config files such as ConnectionStrings and AppSettings. You will need a different build profile for each environment you wish to create a build for, and add the appropriate config transforms and overrides. SlowCheetah is available as a NuGet package to add to your projects and also as a Visual Studio plugin which provides additional tooling to help add the transforms.
Continuous Deployment
If you are using a continuous deployment tool like Octopus Deploy then you can substitute variables in files on a per environment and machine role basis (e.g. CM vs CD). You also have the ability to write custom PowerShell steps to modify/transform/delete files as required. Since this can also run on a machine role basis you can write a step to remove unnecessary connection strings (master, reporting, tracking.history) on CD environments as well as delete the other files specified in the Sitecore Configuration Guide.
Sitecore Config Overrides
Anything within the <sitecore> node in web.config can be modified and patched using the Include File Patching Facilities built into Sitecore. If you have certain settings which need to be modified or deleted for a CD environment, you can create a CD-specific override, which I place in /website/App_Config/Include/z.ProjectName/WebCD and use a post-deployment PowerShell script in Octopus Deploy to delete this folder on CM environments. There are examples of patches within the Include folder, such as SwitchToMaster.config. In theory you could write a patch file to remove all the config sections mentioned in the deployment guide, but it would be easier to write a PowerShell step to delete these instead.
I tend to use all the above to aid in deploying to various environments for different server roles (CM vs CD).
I strongly recommend you take a look at Desired State Configuration, which will do exactly what you're talking about. You need to set up the actual configuration at least once, of course, but then it can be deployed to as many machines as you'd like. Changes to the config are automatically propagated to all machines built from it, and any changes made directly to the machines (referred to as configuration drift) are automatically corrected. This can be combined with Azure, which now has the capability to act as a "pull server" through its Automation features.
There's a lot of reading to do to get up to speed with this feature-set but it will solve your problem.
This is not a Sitecore tool per se.