Duplicate value in column error - Azure Analysis Services - Power BI

I have a table in my model which extracts data from an ADLS location in Azure Data Lake.
I am getting the following error while deploying the model:
Column 'xy_id' in Table 'ABC' contains a duplicate value '' and this is not allowed for columns on the one side of a many-to-one relationship or for columns that are used as the primary key of a table.
I have checked the ADLS file for duplicates. There are no duplicate values. I have also checked the count and distinct count in AAS, and they are the same.
All the tables are getting processed successfully. The error comes on the "Deploy Metadata" step and the deployment fails.
There are only 3 tables in the model. I have created one-to-many relationships from table ABC to the other 2 tables.
Can anyone suggest any fixes? I am not able to figure out why I am facing this error.
Thanks in advance.

You are interacting with different systems, which store data in different ways. By default, Azure Analysis Services is case INsensitive.
This means that AAA and aAa are exactly the same value. Therefore, it will cause issues if:
the field is defined as Unique in AAS
the field is defined as Primary Key in AAS
the field is on the one side of a one-to-many relationship
You can either solve this issue in the data source or change the database collation setting in SSMS.
However, as a general best practice, please define relationships only on integer fields. For example, you could create a surrogate key in the data source.
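As a rough way to confirm this, assuming you can query (or stage) the ADLS extract with SQL, a sketch like the one below finds values that only differ by case and shows one way to build an integer surrogate key. The staging_abc table name is just a placeholder for wherever the extract lands:
-- Values of xy_id that collapse into the same key once case is ignored
SELECT UPPER(xy_id) AS normalized_id, COUNT(*) AS occurrences
FROM staging_abc
GROUP BY UPPER(xy_id)
HAVING COUNT(*) > 1;
-- One way to avoid the issue entirely: an integer surrogate key generated in the source,
-- used for the relationships instead of the text column
SELECT ROW_NUMBER() OVER (ORDER BY xy_id) AS xy_sk, xy_id
FROM (SELECT DISTINCT xy_id FROM staging_abc) AS d;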

Related

Informatica Power Center - ERROR: "Target table [TABLE_NAME] has no keys specified."

Hi everyone,
I have a problem in Informatica PowerCenter.
In my mapping I have 5 objects:
1x Source Table
1x Source Qualifier
1x Expression Transformation
1x Update Strategy
1x Target Table
The source and target tables have no primary key, so why does Informatica PowerCenter expect a key?
I have tried changing the "Treat source rows as" property of my workflow session from "Insert" to "Data driven", and it works.
You have an Update Strategy in your mapping, which requires some key to be defined on the target. Informatica fires a query like
UPDATE tgt SET col = ? WHERE KEY = ?
The first question mark is the column being updated and the second is the key.
You can set unique columns as the primary key.
If you don't have a primary key or unique keys in the target, please define all columns as keys except the updatable columns.
Alternatively, you can use a target update override to write the SQL that updates the target, but there too you have to set up a query similar to the one above (see the sketch below).
"Data driven" should be set as the "Treat source rows as" property.
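As far as I remember the syntax, a target update override looks roughly like this; :TU refers to the ports of the Target transformation, and the table and column names here are made up:
UPDATE customer_dim
SET    customer_name = :TU.customer_name,
       customer_city = :TU.customer_city
WHERE  customer_id = :TU.customer_id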
In Informatica, the ports marked as keys in the Target transformation indicate what should be used to build the UPDATE statement in the DB. They have nothing to do physically with the real primary key defined in the database itself. Usually you use the same columns as keys in Informatica and in the DB, but this is not necessary. The DB is unaware of what is set in Informatica and vice versa.
It's even perfectly valid to have the same database table defined multiple times in Informatica and to have different mappings that update the data using different columns as keys.
Note however that if you use Update Strategy you have to define which columns to use as keys.

BigQuery Cannot Modify Partitioned Table Schema

Per the BigQuery documentation I am attempting to modify a table's schema by adding a field. The table in question is a partition slice (partitioned by day). I am planning on performing the action on every slice.
Per the documentation (https://cloud.google.com/bigquery/docs/managing-partitioned-tables), I should be able to add a field to a partitioned table like any other table. However, whenever I attempt to add a field to a partitioned table, I am met with this error:
Could not edit table schema.: Cannot change partitioned/clustered table to non partitioned/clustered table.
I am not able to find any good information on what this error means, or what I'm doing wrong. I have successfully added a field to a non-partitioned table. Does the community have any good ideas to help me troubleshoot?
I understand that you are using the update_table method to update the schema in Python; correct me if I'm wrong. You have to do it with the patch API; you can try that API to get a better view of how to do it.
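Not the patch API route described above, but as an alternative sketch: as far as I know, BigQuery's standard SQL DDL can also add a nullable column to a partitioned table. The project, dataset, table, and column names below are placeholders.
ALTER TABLE `my_project.my_dataset.my_partitioned_table`
ADD COLUMN new_field STRING;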

Hash distribution on identity column

Creating a table in Azure SQL Data Warehouse, I would like to use hash distribution on an identity column, but I get an error that
Cannot insert explicit value for identity column in table 'Table_ff4d8c5d544f4e26a31dbe71b44851cb_11' when IDENTITY_INSERT is set to OFF.
Is this not possible? And if not, why? And is there a work-around? (And where does this odd table name come from?)
Thanks!
You cannot use an IDENTITY column as the hash distributed column in your table.
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-identity#limitations
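A minimal sketch of the usual workaround: keep the IDENTITY column, but hash-distribute on a different column (the table and column names below are made up):
CREATE TABLE dbo.FactSales
(
    SalesKey    INT IDENTITY(1, 1) NOT NULL,
    CustomerKey INT NOT NULL,
    Amount      DECIMAL(18, 2) NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);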
In SQL DW, the name you give to your table is its logical name, not its physical name. Logical metadata such as table names is maintained centrally on the control node so that operations such as table renames are quick and painless. However, SQL DW is still bound by the rules of table creation: we need to make sure the table name is unique both now and in the future. Therefore the physical names contain GUIDs to deliver that uniqueness.
That said, the error you see here is not ideal. It would be helpful if you could post a repro so that we can improve the experience for you.
You are also welcome to post a feature request on our uservoice channel for hash distribution on the IDENTITY column. https://feedback.azure.com/forums/307516-sql-data-warehouse

How can I create a model with ActiveRecord capabilities but without an actual table behind?

I think this is a recurring question on the Internet, but unfortunately I'm still unable to find a successful answer.
I'm using Ruby on Rails 4 and I would like to create a model that interfaces with a SQL query, not with an actual table in the database. For example, let's suppose I have two tables in my database: Questions and Answers. I want to build a report that contains statistics from both tables. For that purpose, I have a complex SQL statement that takes data from these tables to build up the statistics. However, the SELECT used in the SQL statement does not take values directly from either the Answers or the Questions table, but from nested SELECTs.
So far I've been able to create the StatItem model without any migration, but when I try StatItem.find_by_sql("...nested selects...") the system complains about a nonexistent table stat_items in the database.
How can I create a model whose instance's data is retrieved from a complex query and not from a table? If it's not possible, I could create a temporary table to store the data in there. In such case, how can I tell the migration file to not create such table (it would be created by the query)?
How about creating a materialized view from your complex query and following this tutorial:
ActiveRecord + PostgreSQL Materialized Views
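A sketch only, assuming PostgreSQL: wrap the reporting query in a materialized view named after the model's table. The SELECT body below is just a runnable placeholder for your nested SELECTs over Questions and Answers, and the id column is there because ActiveRecord expects one.
CREATE MATERIALIZED VIEW stat_items AS
SELECT row_number() OVER () AS id,
       q.questions_stat,
       a.answers_stat
FROM (SELECT 1 AS questions_stat) q,  -- placeholder for the nested SELECT over questions
     (SELECT 2 AS answers_stat) a;    -- placeholder for the nested SELECT over answers
-- Refresh it whenever the underlying data changes:
REFRESH MATERIALIZED VIEW stat_items;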
Michael Kohl and his proposal of materialized views gave me an idea, which I initially discarded because I wrongly thought that a single database connection could be shared by two processes, but after reading about how Rails processes requests, I think my solution is fine.
STEP 1 - Create the model without migration
rails g model StatItem --migration=false
STEP 2 - Create a temporary table called stat_items
#First, drop any existing table created by older requests (database connections are kept open by the server process(es))
ActiveRecord::Base.connection.execute('DROP TABLE IF EXISTS stat_items')
#Second, create the temporary table with the desired columns (notice: a dummy column called 'id:integer' should exist in the table)
ActiveRecord::Base.connection.execute('CREATE TEMP TABLE stat_items (id integer, ...)')
STEP 3 - Execute an SQL statement that inserts rows in stat_items
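For example, run something like the SQL below through ActiveRecord::Base.connection.execute, as in STEP 2. The statistics columns here are made-up placeholders and must match whatever columns the CREATE TEMP TABLE statement used:
INSERT INTO stat_items (id, questions_count, answers_count)
SELECT row_number() OVER (), q.cnt, a.cnt
FROM (SELECT count(*) AS cnt FROM questions) q,
     (SELECT count(*) AS cnt FROM answers) a;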
STEP 4 - Access the table using the model, as usual
For example:
StatItem.find_by_...
Any comments/improvements are highly appreciated.

Foreign key relationships are lost during syncing using MS Sync Framework

I have seen several posts on this site, and on others, stating that the problem is usually caused by the order in which the tables are added to the Configuration of the SyncAgent on the client side, or to the SyncAdapter on the provider side. I believe I have the ordering of the tables correct in both of these places (I have an N-tier architecture: a web service that provides the sync functionality).
Does anyone know of any other potential cause for this behavior?
Also: syncing works for all tables except one. For some reason, that table is created on the client but no records are transferred, even on the initial sync when the database is created on the client. Any ideas?
Any help would really be appreciated. (getting this sync functionality to work, and then the data entities for the client to use based on the synced data, is turning into a life mission. Don't you just love working with (massive) Frameworks?)
Thanks very much for whatever you can suggest.
[UPDATE: I have found the problem that caused the records for one table to be omitted from the sync, while the records from all the other tables were synced. The InsertId column for the table in question was full of NULL values, and UniqueIdentifier data can't be compared to NULL. The other tables don't have an InsertId column, because they are for download only. Still, the main problem of no Foreign Key relationships persists]
OK, I found this statement:
By default, the following constraints are not copied to the client: FOREIGN KEY constraints, UNIQUE constraints, and DEFAULT constraints
in this document: http://msdn.microsoft.com/en-us/library/bb726037.aspx
So, it appears I have to "manually" create the relationships, once the schema is created on the client.
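For anyone else hitting this, a sketch of what that manual step could look like on the client database once the schema has been provisioned; the table and column names are made up:
ALTER TABLE OrderItems
ADD CONSTRAINT FK_OrderItems_Orders
FOREIGN KEY (OrderId) REFERENCES Orders (OrderId);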
It is crucial that you add the adapters to the server-side provider in the correct order. You also need to make sure that you avoid multi-table circular references, or you will need to write some complicated multi-pass synchronization logic to first sync the tables without the foreign keys and then add the foreign keys after the fact. Perhaps a circular reference is why you are losing just the one table. There is a good discussion of the issue here: http://www.8bit.rs/blog/2009/12/replicating-self-referencing-tables-and-circular-foreign-keys-with-microsoft-sync-framework/.
When I was working on this same problem last month, I found that, using the INFORMATION_SCHEMA, you can write a pretty good stored procedure to dynamically determine the relationship hierarchy for use in setting up a generic synchronization provider. Let me know if you are interested in something like this; a rough starting point is sketched below.
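Something along these lines (SQL Server flavour; adjust to your schema) lists each foreign key with its child and parent tables, which you can then use to order the adapters:
SELECT rc.CONSTRAINT_NAME,
       fk.TABLE_NAME AS child_table,
       pk.TABLE_NAME AS parent_table
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS fk ON fk.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS pk ON pk.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
ORDER BY parent_table, child_table;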
One workaround for syncing foreign key relationships is explained in my answer here: Sync Framework 2.1 Foreign key constraints.