SQL Server exception: Received an invalid column length from the bcp client for colid modify_time - azure-sqldw

I am getting the error below when using the SQL Server bulk copy (bcp) client to load data into Azure SQL Data Warehouse.
Exact exception:
com.microsoft.sqlserver.jdbc.SQLServerException: 107096;Received an invalid column length from the bcp client for colid modify_time.
Loading the same data into Azure SQL Database works correctly; the error occurs only when loading into Azure SQL Data Warehouse.
It also happens only for timestamp columns.
When I created the table in Azure SQL Data Warehouse, the column was created like this:
name        | type      | warehouse type | precision | length | java sql type
------------+-----------+----------------+-----------+--------+--------------
modify_time | datetime2 | -9             | 27        | 54     | -9
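The bulk load is done with the following sample code:
import com.microsoft.sqlserver.jdbc.ISQLServerBulkRecord;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;

// conn is an open java.sql.Connection to the data warehouse
SQLServerBulkCopy copy = new SQLServerBulkCopy(conn);
copy.setDestinationTableName("my_table");
copy.writeToServer(new ISQLServerBulkRecord() {
    // Overridden methods: column metadata (getColumnType, getPrecision, getScale, ...), next(), getRowData()
});
copy.close();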
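The metadata row above can be decoded with plain Java. `-9` is the value of `java.sql.Types.NVARCHAR` (a timestamp column would report `java.sql.Types.TIMESTAMP`, which is 93), and a precision of 27 with a length of 54 matches the text form of a `datetime2(7)` value: 27 characters at 2 bytes each in UTF-16. In other words, the column appears to be described as a Unicode string rather than as a timestamp. The sketch below only illustrates that arithmetic; the class and method names are hypothetical, and it is an observation about the metadata, not a confirmed cause of the bcp error.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Types;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class Datetime2Metadata {
    // Text layout of a datetime2(7) value: date, space, time, 7 fractional digits.
    static final DateTimeFormatter DATETIME2_7 =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSS");

    static String asDatetime2Text(LocalDateTime value) {
        return value.format(DATETIME2_7);
    }

    public static void main(String[] args) {
        String text = asDatetime2Text(LocalDateTime.of(2020, 6, 25, 12, 34, 56, 123_456_700));
        System.out.println(text);                 // 2020-06-25 12:34:56.1234567
        System.out.println(text.length());        // 27 characters -> the "precision" column
        // UTF-16 stores each of these characters in 2 bytes -> the "length" column
        System.out.println(text.getBytes(StandardCharsets.UTF_16LE).length); // 54
        System.out.println(Types.NVARCHAR);       // -9, the reported "java sql type"
        System.out.println(Types.TIMESTAMP);      // 93, what a timestamp column would report
    }
}
```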

Related

AWS Athena query result in JSON format

I created an AWS Athena table that contains some rows.
Example data:
first_name | age
=================
a          | 20
b          | 30
c          | 35
When I query the data, the results are saved in CSV format in S3:
SELECT * FROM table1
I would like to query the data and get the result in JSON format instead.
The reason is that I need to pass that JSON data to another application for further processing.
Is there a way to get query result in JSON format?

AWS DynamoDB: High read latency using query on GSI

I have a DynamoDB table with Date, City, and other attributes as columns. I have configured a GSI with Date as the hash key. The table stores 27 attributes for each of 350 cities, recorded daily.
+------------+------------+-------------+-------------+
| Date       | City       | Attribute1  | Attribute27 |
+------------+------------+-------------+-------------+
| 25-06-2020 | Boston     | someValue   | someValue   |
| 25-06-2020 | NY         | someValue   | someValue   |
| 25-06-2020 | Chicago    | someValue   | someValue   |
+------------+------------+-------------+-------------+
I have a Lambda proxy integration set up in API Gateway. The Lambda function receives a 7-day date range in the request. Each date in this range is used to query DynamoDB (using a query input) to get all the items for that day. The results for each day are consolidated for the week and then sent back as a JSON response.
The latency seen in Postman is around 1.5 s after increasing the Lambda memory to 1024 MB, even though only 76 MB is being used.
Is there any way to improve the performance? The DynamoDB table is already using on-demand capacity.
You don't say whether you are running the queries in parallel.
If not, do so.
You also don't say what CloudWatch shows for query latency; as Marcin mentioned, DAX can help reduce that.
Nor do you mention what CloudWatch shows for Lambda execution time. There are various articles about optimizing Lambda.
Whatever is left is networking, and there is not much you can do about that. One thing to consider is reusing DB connections across invocations of your Lambda.
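Running the seven per-day queries in parallel, and creating expensive objects once per container rather than once per request, can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the real DynamoDB call (for example through the AWS SDK's DynamoDB client) is replaced by a hypothetical `queryDay` function, and the class name and thread-pool size are illustrative choices, not part of the original setup.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelWeekQuery {
    // Created once per container (static), so warm invocations reuse it instead of
    // rebuilding it; the same idea applies to a database or SDK client.
    // Daemon threads so the pool never keeps the JVM alive on its own.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(7, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // queryDay is a hypothetical stand-in for the real per-day DynamoDB query.
    static List<String> queryWeek(LocalDate start, Function<LocalDate, List<String>> queryDay) {
        List<CompletableFuture<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            LocalDate day = start.plusDays(i);
            // Start all seven queries immediately instead of one after another.
            futures.add(CompletableFuture.supplyAsync(() -> queryDay.apply(day), POOL));
        }
        // Joining in list order keeps the consolidated result ordered by day.
        return futures.stream()
                .map(CompletableFuture::join)
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Fake query that sleeps 200 ms per day to mimic network latency.
        Function<LocalDate, List<String>> fakeQuery = day -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return List.of(day + ":Boston", day + ":NY");
        };
        long t0 = System.nanoTime();
        List<String> items = queryWeek(LocalDate.of(2020, 6, 25), fakeQuery);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(items.size() + " items in " + ms + " ms");
    }
}
```

With seven fake 200 ms queries, the elapsed time should be close to one query's latency rather than seven times it, provided the pool actually runs them concurrently.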

Using Dataprep to write to just a date partition in a date partitioned table

I'm using a BigQuery view to fetch yesterday's data from a BigQuery table and then trying to write into a date partitioned table using Dataprep.
My first issue was that Dataprep would not correctly pick up DATE type columns, but converting them to TIMESTAMP works (thanks Elliot).
However, when setting a BigQuery output table in Dataprep, there are only three options: Append, Truncate, or Drop existing table. If the table is date-partitioned and you use Truncate, it removes all existing data, not just the data in that partition.
Is there another way to do this that I should be using? My alternative is using Dataprep to overwrite a table and then using Cloud Composer to run some SQL pushing this data into a date partitioned table. Ideally, I'd want to do this just with Dataprep but that doesn't seem possible right now.
The data I'm ingesting is simple. In one flow:
+------------+--------+
| date | name |
+------------+--------+
| 2018-08-08 | Josh1 |
| 2018-08-08 | Josh2 |
+------------+--------+
In the other flow:
+------------+--------+
| date | name |
+------------+--------+
| 2018-08-09 | Josh1 |
| 2018-08-09 | Josh2 |
+------------+--------+
It overwrites the data in both cases.
You can create a table partitioned on a DATE column.
Data written to a partitioned table is automatically delivered to the appropriate partition based on the date value (expressed in UTC) in the partitioning column.
Use Append so that the new data is added to the matching partitions.
You can create the table using the bq command:
bq mk --table --expiration [INTEGER1] --schema [SCHEMA] --time_partitioning_field date [PROJECT_ID]:[DATASET].[TABLE]
--time_partitioning_field defines which column is used for partitioning.
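The "expressed in UTC" rule matters once the partitioning column is a TIMESTAMP (as in the question, after converting DATE columns to TIMESTAMP): a value is routed by its UTC calendar date, not its local one. The sketch below only illustrates that documented rule in Java; `PartitionRouting` and `partitionFor` are hypothetical names, it is not BigQuery code, and it assumes daily partitions identified in the `yyyyMMdd` form.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class PartitionRouting {
    private static final DateTimeFormatter PARTITION_ID = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Convert the timestamp to UTC first, then take its calendar date as the daily partition.
    static String partitionFor(OffsetDateTime value) {
        return value.withOffsetSameInstant(ZoneOffset.UTC).toLocalDate().format(PARTITION_ID);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(OffsetDateTime.parse("2018-08-08T10:00:00Z")));      // 20180808
        // 23:30 at UTC-5 is 04:30 UTC the next day, so it lands in the 2018-08-09 partition.
        System.out.println(partitionFor(OffsetDateTime.parse("2018-08-08T23:30:00-05:00"))); // 20180809
    }
}
```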

Power BI: incremental data load using an OData feed

Is there any way to save the previous data before it is overwritten by a data refresh?
Steps I have taken:
I created a table and appended it to Table A.
I created a column called DateTime with the function
DateTime.LocalNow()
Now my problem is how to save the previous data before the refresh. I need to preserve the timestamps of both the previous data and the current data.
For example:
Before refreshing:
Table A:
|Columnname x| DateTime | ....
| value | 23.03.2016 23:00
New Table:
|Columnname x| DateTime | ....
| value | 23.03.2016 23:00
After refreshing:
Table A:
|Columnname x| DateTime | ....
| value | 23.03.2016 23:00
| value 2 | 23.03.2016 23:01
New Table:
|Columnname x| DateTime | ....
| value | 23.03.2016 23:00
| value 2 | 23.03.2016 23:01
kind regards
Incremental refreshes in the Power BI service or Power BI Desktop aren't currently supported, but please vote for this feature. (Update: see that link for information on a preview feature that does this.)
If you need this behavior, load these rows into a database and then load the database incrementally. The load into Power BI will still be a full load of the table(s).
This is now available in Power BI Premium.
From the docs:
Incremental refresh enables very large datasets in the Power BI Premium service with the following benefits:
Refreshes are faster. Only data that has changed needs to be refreshed. For example, refresh only the last 5 days of a 10-year dataset.
Refreshes are more reliable. For example, it is not necessary to maintain long-running connections to volatile source systems.
Resource consumption is reduced. Less data to refresh reduces overall consumption of memory and other resources.
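The "refresh only the last 5 days of a 10-year dataset" idea boils down to a date-range filter: Power BI filters the source with `RangeStart`/`RangeEnd` datetime parameters, and only rows inside that window are reloaded. The Java sketch below shows that windowing logic conceptually; `IncrementalWindow`, `Row`, and `rowsToRefresh` are hypothetical names, and this is not how Power BI manages its partitions internally.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalWindow {
    static final class Row {
        final LocalDate date;
        final String value;
        Row(LocalDate date, String value) { this.date = date; this.value = value; }
    }

    // Rows in the last `days` calendar days (today included) are reloaded;
    // older rows are left untouched, which is what makes the refresh cheaper.
    static List<Row> rowsToRefresh(List<Row> rows, LocalDate today, int days) {
        LocalDate rangeStart = today.minusDays(days - 1); // inclusive
        LocalDate rangeEnd = today.plusDays(1);           // exclusive
        return rows.stream()
                .filter(r -> !r.date.isBefore(rangeStart) && r.date.isBefore(rangeEnd))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2020, 1, 10);
        List<Row> history = new ArrayList<>();
        // One row per day for ten years, standing in for a large dataset.
        for (LocalDate d = today.minusYears(10); !d.isAfter(today); d = d.plusDays(1)) {
            history.add(new Row(d, "value"));
        }
        List<Row> refreshed = rowsToRefresh(history, today, 5);
        System.out.println("Reloading " + refreshed.size() + " of " + history.size() + " rows");
    }
}
```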

Sync Framework: Map a single table into multiple tables

I have two tables like the following:
On server:
Orders Table  | OrderDetails Table
--------------+-------------------
Id            | Id
OrderDate     | OrderId
ServerName    | Product
              | Quantity
On client:
Orders Table  | OrderDetails Table
--------------+-------------------
Id            | Id
OrderDate     | OrderId
              | Product
              | Quantity
              | ClientName
I need to sync the [Server].[Orders Table].[ServerName] to [Client].[OrderDetails Table].[ClientName]
The question:
What is the correct and efficient way to do this?
I know that deprovisioning and reprovisioning with a different configuration is one way of doing it.
I just want to know the correct way.
Thanks.
EDIT:
The other columns of each table should sync normally ([Server].[Orders Table].[Id] to [Client].[Orders Table].[Id], ...).
The mapping strategy also sometimes changes depending on the row of data (which is being sent/received).
Sync Framework is not an ETL tool. Simply put, its database sync works per table.
If you really want to force it to do what you want, you can intercept the ChangesSelected event for the OrderDetails table, look up the extra column from the other table, and then dynamically add the column to the dataset before it is applied on the other side.
See this link on how to manipulate the change dataset.
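The enrichment step described in that answer (look up `ServerName` from Orders by `OrderId`, then add it as `ClientName` to each selected OrderDetails change) can be sketched conceptually in Java. Sync Framework itself is a .NET library, so this is not its API: the change set is modeled here as a hypothetical list of column-name-to-value maps, and `EnrichOrderDetails`/`addClientName` are illustrative names standing in for what a ChangesSelected handler would do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EnrichOrderDetails {
    // orderServerNames: Orders.Id -> Orders.ServerName, looked up on the server side.
    // changes: selected OrderDetails rows, each a column-name -> value map.
    static List<Map<String, Object>> addClientName(List<Map<String, Object>> changes,
                                                   Map<Integer, String> orderServerNames) {
        List<Map<String, Object>> enriched = new ArrayList<>();
        for (Map<String, Object> row : changes) {
            // Copy so the original change rows are not modified.
            Map<String, Object> copy = new LinkedHashMap<>(row);
            Integer orderId = (Integer) row.get("OrderId");
            // Extra column the client table expects but the server table lacks.
            copy.put("ClientName", orderServerNames.get(orderId));
            enriched.add(copy);
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<Integer, String> serverNames = new HashMap<>();
        serverNames.put(1, "SRV-A");
        Map<String, Object> detail = new LinkedHashMap<>();
        detail.put("Id", 10);
        detail.put("OrderId", 1);
        detail.put("Product", "Widget");
        detail.put("Quantity", 3);
        System.out.println(addClientName(List.of(detail), serverNames));
        // prints [{Id=10, OrderId=1, Product=Widget, Quantity=3, ClientName=SRV-A}]
    }
}
```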