When I migrate workflows from the Development repository to the Production repository, there is a check box "Retain persistent mapping variable values.." in the migration wizard. What happens if I check this box? Does it keep the existing values in the Production environment, or does it replace them with values from the Development repository?
If you check the option Retain persistent values for mapping variables, the existing values from the target folder will be kept.
Otherwise, they will be replaced with values from the source folder.
Quote from the Repository Guide:
Retain persistent values for mapping variables. You can choose to retain existing values or replace them with values from the source folder.
Retain persistent values for workflow variables. You can choose to retain existing values or replace them with values from the source folder.
I am trying to replicate an Informatica PowerCenter mapping in Informatica Cloud. When looking at the target table properties, I found the attribute "Is Staged" in the target connection object definition.
The property Truncate Target Table can be inferred easily: it means truncate the table before it is loaded with data. What does the property "Is Staged" mean?
If "Is Staged" is checked, then as the name says, Informatica will stage the data into a staging area as a flat file, then read from the file and load it into the target table. If it is unchecked, data is loaded through a direct target-writing pipeline.
This is done to make sure data is extracted from the source as soon as possible, and if there is a failure during the load, you can restart and re-load from the staged file.
However, this option is available only for certain data sources, and you also need to set up a staging directory.
As part of code deployment to production, we need to copy all tables from a BigQuery dataset to the production environment. However, both the UI option and the bq command-line option copy the data too. How do I copy all the BigQuery tables at once from the non-prod to the prod environment without the data?
Kindly suggest.
Posting my comment as an answer:
I don't know of any way to achieve what you want directly, but there is a possible workaround:
You first need to create the dataset in the destination project and then run CREATE TABLE new_project.dataset.xx AS SELECT * FROM old_project.dataset.xx WHERE 1=0.
You also need to make sure to specify the partition field. This works well for datasets with just a few tables; for larger datasets you can script this operation in Python or whatever else you use.
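The scripted version of the workaround above could look like the following minimal sketch. The `schema_only_copy_sql` helper is a made-up name; the driver function assumes the `google-cloud-bigquery` client library and valid credentials, and remember that CREATE TABLE AS SELECT does not carry over partitioning, so the partition field still has to be added to the DDL by hand.

```python
def schema_only_copy_sql(src_project: str, dst_project: str,
                         dataset: str, table: str) -> str:
    """Build a DDL statement that clones a table's schema without rows,
    using the WHERE 1=0 trick described above. Add a PARTITION BY clause
    manually for partitioned tables."""
    return (
        f"CREATE TABLE `{dst_project}.{dataset}.{table}` "
        f"AS SELECT * FROM `{src_project}.{dataset}.{table}` WHERE 1=0"
    )

def copy_all_schemas(src_project: str, dst_project: str, dataset: str) -> None:
    # Hypothetical driver: requires `pip install google-cloud-bigquery`
    # and application credentials for both projects.
    from google.cloud import bigquery
    client = bigquery.Client(project=dst_project)
    for t in client.list_tables(f"{src_project}.{dataset}"):
        client.query(
            schema_only_copy_sql(src_project, dst_project, dataset, t.table_id)
        ).result()  # wait for each DDL statement to finish
```

The destination dataset must already exist before running this, as noted above.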
We are considering using AWS Neptune as a graph DB solution.
I am coming from the Django world, so I am used to using DB migrations a lot.
I could not find any info about how AWS Neptune handles change management on the DB.
I.e., what happens if I want to reload a backup from a month ago and there have been schema changes since then? How do we track these changes?
Should we write custom scripts?
Unlike an RDBMS and some other data stores, Amazon Neptune (and many other graph DBs, for that matter) is called "schemaless", meaning there is no need to explicitly define or maintain a schema. The schema is implicitly defined by the data stored in the database. In the case you mentioned, restoring a backup, there is no need for a migration/change script to be run. When you restore the backup, the schema will be defined by the restored data.
This "schemaless" nature of the database allows applications to begin adding new entity types and data properties without any sort of ETL process. However, it also means that the application does need to manage some sort of schema internally to maintain sanity over the data being stored (e.g. first_name and firstName could both be used and would be separate properties).
I want to create an audit that can be re-used across multiple mappings to capture the source record count and target record count, where the source database is Oracle and the target database is SQL Server.
We are using it for source-to-staging mappings.
It's all there, in Metadata tables. There's no need to add anything that will make your loads longer and more complex.
You can review this Framework for some ideas.
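To illustrate the "it's all in the metadata tables" point: PowerCenter's repository views already record per-run row counts. A query along these lines (held in a string here; the REP_SESS_LOG view name and its column names are my recollection of the standard repository views, so verify them against your repository version before relying on this) pulls source and target counts without touching the mappings at all.

```python
# Assumed repository view and columns -- check REP_SESS_LOG in your
# PowerCenter repository before using. Each session run already carries
# its source-side and target-side row counts.
SESSION_COUNTS_SQL = """
SELECT subject_area,
       session_name,
       successful_source_rows,   -- rows read from the source
       failed_source_rows,
       successful_rows,          -- rows written to the target
       failed_rows
FROM   rep_sess_log
ORDER  BY actual_start DESC
"""
```

Run it against the repository database with whatever client you use for Oracle or SQL Server.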
Rather new to AWS Data Pipeline, so any help will be appreciated. I have used the pipeline template RDStoS3CopyActivity to extract all contents of a table in RDS MySQL. It seems to be working well. But there are 90 other tables to be extracted and dumped to S3. I cannot imagine creating 90 pipelines, or one for each table.
What is the best approach to resolving this task? How could the pipeline be instructed to iterate through a list of the table names?
I am not sure if this will ever get a response. However, in this early stage of exploration, I have developed a pipeline that seems to fit a preliminary purpose -- extracting from 10 RDS MySQL tables and copying each to its respective sub-bucket on S3.
The logic is rather simple.
Configure the connection for the RDS MySQL instance.
Extract data by filling in the "Select Query" field for each table.
Drop in a Copy Activity and link it to each table above. It runs on a specified EC2 instance; if you're running an expensive query, make sure you choose an EC2 instance type with enough CPU and memory. This step copies the extracted dump, which lives temporarily in the EC2 tmp filesystem, to a designated S3 bucket you will set up next.
Finally, set up the designated target S3 destination.
By default, data extracted and loaded to S3 bucket will be comma separated. If you need it to be tab delimited, then in the last target S3 destination:
- Add an optional field.. > select Data Format.
- Create a new Tab Separated. This will appear under the category of 'Others'.
- Give it a name. I call it Tab Separated.
- Type: TSV. Hover mouse over 'Type' to learn more of other data formats.
- Column separator: \t (I could have left this blank, as the type was already specified as TSV)
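Rather than clicking the steps above together per table, the pipeline definition can be generated. This sketch builds the SqlDataNode -> CopyActivity -> S3DataNode trio for each table as plain pipeline-definition objects (the object types and field names follow the AWS Data Pipeline object reference; the `rds_mysql`, `Ec2Instance`, `TabSeparated`, and bucket names are placeholders I made up):

```python
# Generate per-table AWS Data Pipeline objects so 10 -- or 90 -- tables
# don't have to be configured by hand. Shared objects (the RdsDatabase,
# the Ec2Resource, and the TSV data format) are assumed to be defined
# once elsewhere in the same pipeline definition.
def table_objects(table: str) -> list:
    """Build the source node, destination node, and copy activity for one table."""
    return [
        {"id": f"Source_{table}", "type": "SqlDataNode",
         "database": {"ref": "rds_mysql"},            # shared RdsDatabase object
         "table": table,
         "selectQuery": f"SELECT * FROM {table}"},
        {"id": f"Dest_{table}", "type": "S3DataNode",
         "directoryPath": f"s3://my-bucket/{table}/",  # assumed bucket layout
         "dataFormat": {"ref": "TabSeparated"}},       # the TSV format set up above
        {"id": f"Copy_{table}", "type": "CopyActivity",
         "runsOn": {"ref": "Ec2Instance"},             # shared Ec2Resource object
         "input": {"ref": f"Source_{table}"},
         "output": {"ref": f"Dest_{table}"}},
    ]

TABLES = ["customers", "orders", "items"]  # extend to all 90 tables
pipeline_objects = [obj for t in TABLES for obj in table_objects(t)]
```

The resulting list can be serialized to JSON and uploaded with `put-pipeline-definition`, instead of assembling each table in the console.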
If the tables are all in the same RDS instance, why not use a SqlActivity pipeline with a SQL statement containing multiple unload commands to S3?
You can write just one query and use one pipeline.
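The multi-unload statement for that single SqlActivity could be generated like this. Note the assumption: this relies on Aurora MySQL's `SELECT ... INTO OUTFILE S3` syntax, which plain RDS MySQL does not support, and the bucket name is a placeholder.

```python
# Generate one unload statement per table for a single SqlActivity.
# Assumes Aurora MySQL (SELECT ... INTO OUTFILE S3 is Aurora-only)
# and an IAM role on the cluster that allows writing to the bucket.
def unload_statement(table: str, bucket: str = "my-export-bucket") -> str:
    return (
        f"SELECT * FROM {table} "
        f"INTO OUTFILE S3 's3://{bucket}/{table}/{table}' "
        "FIELDS TERMINATED BY '\\t'"   # tab-delimited, matching the TSV setup above
    )

# One script, one pipeline: join the statements for all tables.
script = ";\n".join(unload_statement(t) for t in ["customers", "orders"])
```

The joined `script` then goes into the SqlActivity's script field.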