I have a production table with a huge amount of data. I recently added a new column to it and loaded surrogate values for the new rows; in the existing rows the column is NULL.
Now I want to update this new column with incremental values (max value + 1) where it is NULL. Can anyone please provide a solution for this scenario?
Thanks in advance...
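One common way to do this kind of backfill is to number the NULL rows on top of the current MAX in a staging step and then join the new keys back in a single UPDATE. Below is a minimal sketch using SQLite through Python's sqlite3 as a stand-in for the production database (the table name big_table, the columns id/surr_key, and the sample values are all invented; it also assumes a SQLite recent enough to support window functions, and the real statement would be written in your own database's dialect):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, surr_key INTEGER)")
con.executemany("INSERT INTO big_table VALUES (?, ?)",
                [(1, 101), (2, 102), (3, None), (4, None)])

# Materialise MAX(surr_key) + 1, + 2, ... for the NULL rows first,
# then join the new keys back in a single UPDATE.
con.execute("""
    CREATE TEMP TABLE new_keys AS
    SELECT id,
           (SELECT MAX(surr_key) FROM big_table)
             + ROW_NUMBER() OVER (ORDER BY id) AS new_key
    FROM big_table
    WHERE surr_key IS NULL
""")
con.execute("""
    UPDATE big_table
    SET surr_key = (SELECT new_key FROM new_keys
                    WHERE new_keys.id = big_table.id)
    WHERE surr_key IS NULL
""")
```

Materialising the numbered keys first avoids any ambiguity about MAX being re-evaluated while the UPDATE is in flight.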
Related
I have a BigQuery table with around 2M rows which were loaded from a JSON file. The JSON file actually has 10 fields, but the table has 7 columns per the initial DDL. I have now altered the table and added the remaining three columns. After the alteration, the newly added columns are filled with NULL.
Now I want to backfill those three newly added columns in the existing 2M rows with the actual data from the JSON file. How can I bulk-update the table so that the existing column values remain untouched and only the new column values are updated?
Note: the table has a streaming buffer enabled and is NOT partitioned.
Now I want to backfill the data in existing 2M rows but for only those three newly added columns with actual data from json file.
Since loading data is free of charge, I'd reload the whole table with WRITE_TRUNCATE option to overwrite the existing data.
What you said confuses me because:
If your 2M rows in the BQ table have the same data as what's in the JSON file, why do you care whether they are touched or not?
If your 2M rows in the BQ table have been altered in some way, how do you expect the rows in the JSON file to match the altered data on a per-row basis (to backfill the missing columns)?
--
Update: based on the comment, it seems that the loaded rows have been altered in some way. Then:
For your existing data, if there is no (logical) primary key you can use to match the rows, then it is technically impossible to "match and update".
If your existing data does have a logical primary key, and you don't mind the cost, you could load the full JSON file into a temporary table and then use DML to backfill the missing columns.
For your future data loading, if you want the loading to be incremental (on rows or on columns), it is better to keep your loaded table untouched so that it represents the 'full fact', and to keep the 'altered rows' in a separate table, assuming you have a logical primary key to match them.
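The temporary-table route from the second point can be sketched as follows, using SQLite through Python's sqlite3 in place of BigQuery (the tables main/staging, the id key, and the three new columns b/c/d are invented for illustration; in BigQuery itself you would more likely write a single MERGE statement against the staging table):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Existing table, simplified to (id, a) plus the three newly
# added columns (b, c, d), which are currently NULL.
con.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, a TEXT,"
            " b TEXT, c TEXT, d TEXT)")
con.executemany("INSERT INTO main (id, a) VALUES (?, ?)",
                [(1, "x"), (2, "y")])

# Load the full JSON file into a temporary staging table.
lines = ['{"id": 1, "a": "x", "b": "b1", "c": "c1", "d": "d1"}',
         '{"id": 2, "a": "y", "b": "b2", "c": "c2", "d": "d2"}']
records = [json.loads(line) for line in lines]
con.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY,"
            " b TEXT, c TEXT, d TEXT)")
con.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)",
                [(r["id"], r["b"], r["c"], r["d"]) for r in records])

# Backfill only the new columns, matching on the logical primary key;
# the existing columns are never written.
con.execute("""
    UPDATE main
    SET b = (SELECT b FROM staging WHERE staging.id = main.id),
        c = (SELECT c FROM staging WHERE staging.id = main.id),
        d = (SELECT d FROM staging WHERE staging.id = main.id)
    WHERE b IS NULL
""")
```

The WHERE b IS NULL guard restricts the UPDATE to rows that still need backfilling, so re-running it is harmless.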
I want to create a second table from the first table, filtering by dates and other variables as follows. How can I create this?
The following are the expected table and the original table,
Go to Edit Queries. Let's say our base table is named RawData. Add a blank query and use this expression to copy your RawData table:
=RawData
The new table will be RawDataGrouped. Now select the new table and go to Home > Group By and use the following settings:
The result will be the following table. Note that I didn't use exactly the values you used, to keep this sample minimal:
You can now also create a relationship between these two tables (on the Index column) to use cross-filtering between them.
For example, you could show the grouped data and use the relationship to display the RawData rows in a subreport (or custom tooltip).
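Since the original screenshots of the Group By settings are not reproduced here, a rough SQL equivalent of that step, run through Python's sqlite3, may help (the RawData columns and values are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE RawData (Idx INTEGER, Category TEXT, Value REAL)")
con.executemany("INSERT INTO RawData VALUES (?, ?, ?)",
                [(1, "A", 10.0), (2, "A", 5.0), (3, "B", 7.0)])

# Equivalent of Home > Group By on Category, aggregating Value
# (sum) and counting the grouped rows.
grouped = con.execute("""
    SELECT Category, SUM(Value) AS Total, COUNT(*) AS RowCount
    FROM RawData
    GROUP BY Category
    ORDER BY Category
""").fetchall()
```

In Power Query the same result comes from the Group By dialog with two aggregations (Sum of Value, Count Rows).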
I assume you are looking for a calculated table. Below is a workaround for that:
In the Query Editor you can create a duplicate of the existing (original) table, then select Date Filters -> Is Earliest from the filter menu in the corner of the Date column of the new duplicate table. The table should then contain only the rows that have the minimum date in that column.
Note: this table is dynamic and will give updated results as the data in the original table changes, but you have to refresh both tables.
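For illustration, the "Is Earliest" filter behaves like this minimal Python sketch (the dates and items are made up):

```python
rows = [
    {"Date": "2024-01-02", "Item": "B"},
    {"Date": "2024-01-01", "Item": "A"},
    {"Date": "2024-01-01", "Item": "C"},
]

# "Is Earliest" keeps only the rows whose Date equals the minimum date.
earliest = min(r["Date"] for r in rows)
filtered = [r for r in rows if r["Date"] == earliest]
```

Because the filter is a query step rather than a one-off copy, adding a row with an even earlier date to the source changes which rows survive on the next refresh.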
Original Table:
Desired Table:
When I added a new entry to it, after refreshing the dataset I got the result below (this implies it recalculates based on each data change in the original source):
New data entry:
Output:
I'm working on a way to compare 2 tables in Power BI.
I'm joining the 2 tables on the primary key and creating custom columns that check whether the old and new values are equal.
This doesn't seem like the most efficient way of doing things, and I can't even color-code the matrix because some values aren't integers.
Any suggestions?
I did a big project like this last year, comparing two versions of a data warehouse (SQL database).
I tackled most of it in the Query Editor (actually using Power Query for Excel, but that's the same as PBI's Query Editor).
My key technique was to first create a Query for each table and use Unpivot Other Columns on everything apart from the Primary Key columns. This transforms each table into rows of Attribute, Value pairs. You can filter Attribute to just the columns you want to compare.
Then in a new Query you can Merge & Expand the "old" and "new" Queries, joining on the Primary Key columns plus the Attribute column. Then add Filter or Add Column steps to get to your final output.
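The unpivot-then-merge technique can be sketched in plain Python (the tables, the id key, and the column names are invented; Power Query's Unpivot Other Columns and Merge steps do the same thing over real queries):

```python
def unpivot(rows, key_cols):
    """Turn each row dict into (key, attribute, value) triples,
    mirroring Power Query's Unpivot Other Columns."""
    triples = []
    for row in rows:
        key = tuple(row[k] for k in key_cols)
        for col, val in row.items():
            if col not in key_cols:
                triples.append((key, col, val))
    return triples

old = [{"id": 1, "name": "widget", "qty": 5, "price": 9.99}]
new = [{"id": 1, "name": "widget", "qty": 7, "price": 9.99}]

old_vals = {(k, a): v for k, a, v in unpivot(old, ["id"])}
new_vals = {(k, a): v for k, a, v in unpivot(new, ["id"])}

# "Merge" the two unpivoted queries on key + Attribute and
# keep only the mismatching attribute values.
diffs = [(k, a, old_vals.get((k, a)), v)
         for (k, a), v in new_vals.items()
         if old_vals.get((k, a)) != v]
```

The payoff is that every compared column becomes a row, so one generic equality test replaces a hand-written custom column per field.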
Today is my first day to use PowerBI 2.0 Desktop.
Is there any way to add new columns from external data into the existing table in my PowerBI?
Or is there any way to add new columns from another table in PowerBI?
It seems that, in Power Query, the Add Custom Column, Add Index Column and Duplicate Column commands all operate on existing columns of the same table.
You can use Merge Queries to join together two queries, which will let you bring in the other table's columns.
Also, Add Custom Column accepts an arbitrary expression, so you can reference other tables in that expression. For example, if Table1 and Table2 have the same number of rows, I could copy over Table2's column as follows:
Add an Index Column. Let's call it Index.
Add a Custom Column with the following expression: Table2[ColumnName]{[Index]}
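Assuming the two tables really do line up row for row, the index trick behaves like this Python sketch (Product and Price are invented column names):

```python
table1 = [{"Product": "A"}, {"Product": "B"}]
table2 = [{"Price": 10}, {"Price": 20}]

# Equivalent of adding an Index column to Table1 and then a Custom
# Column with the expression Table2[Price]{[Index]}: the index is a
# 0-based row position used to pick the matching row from Table2.
result = [dict(row, Index=i, Price=table2[i]["Price"])
          for i, row in enumerate(table1)]
```

This is purely positional, so it silently pairs the wrong rows if either table is sorted or filtered differently; a Merge on a real key is safer when one exists.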
In my current project I'm checking a MySQL database: the database is updated by another program, and my C++ program needs to select only the new rows. It is not going to be a small table (>10,000 rows), so I do not want to scan every row, e.g. by checking a column like isNew = 0 or 1. I already found:
Query to find tables modified in the last hour
http://www.codediesel.com/mysql/how-to-check-when-a-mysql-table-was-last-updated/
However, in this example you can only get the table which is updated. How can I only select the new rows from a table?
How can I only select the new rows from a table?
Assuming "new rows" means newly inserted rows, and if you can change the database schema, you could use an auto-increment column. Your program remembers the largest id from each result set and uses it in the next query:
select * from table where id > 123
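A sketch of that polling pattern, using SQLite through Python's sqlite3 instead of MySQL (the events table and payload values are invented; with MySQL you would declare the column AUTO_INCREMENT and run the same id > ? query):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT)")
con.executemany("INSERT INTO events (payload) VALUES (?)",
                [("a",), ("b",)])

def fetch_new(con, last_id):
    """Return rows inserted since last_id, plus the new high-water mark."""
    rows = con.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_id,)).fetchall()
    return rows, (rows[-1][0] if rows else last_id)

rows, last_id = fetch_new(con, 0)            # first poll sees both rows
con.execute("INSERT INTO events (payload) VALUES ('c')")
new_rows, last_id = fetch_new(con, last_id)  # second poll sees only 'c'
```

Because id is the primary key, the id > ? predicate is an index range scan, so each poll touches only the new rows rather than the whole table.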
I would recommend adding an isNew column to the table with default value 1 and adding an index on it. The index prevents your query from scanning all rows. After you have processed a row, set its isNew to 0.
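That isNew approach can be sketched the same way, again with SQLite standing in for MySQL (the queue table and payloads are invented; note that the partial index shown is a SQLite feature, and in MySQL you would create a plain index on isNew):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY,"
            " payload TEXT, isNew INTEGER DEFAULT 1)")
# A partial index keeps the isNew = 1 lookup cheap even when
# almost every row has already been processed.
con.execute("CREATE INDEX idx_new ON queue (isNew) WHERE isNew = 1")
con.executemany("INSERT INTO queue (payload) VALUES (?)",
                [("a",), ("b",)])

# Fetch the unprocessed rows, then mark them as seen.
new_rows = con.execute(
    "SELECT id, payload FROM queue WHERE isNew = 1").fetchall()
con.executemany("UPDATE queue SET isNew = 0 WHERE id = ?",
                [(rid,) for rid, _ in new_rows])
```

Compared with the auto-increment high-water mark, this variant needs no client-side state, at the cost of an extra UPDATE per processed batch.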