I have a timestamp in YYYY-MM-DD HH:MM:SS format in a CSV file stored in S3, but when I use the timestamp data type to load it into a Redshift database using Glue, the timestamp column is null. That format appears to be valid, but I've also tried YYYYMMDD HHMMSS and YYMMDD HH:MM:SS formats as well, just in case.
My mapping in Glue goes from timestamp to timestamp, and the column data type in the table is also timestamp. Example of the data in CSV format:
1,2016 Summer,2016-06-22 00:00:00
Actual Output:
Line | Term | Date
-----+-------------+------------
1 | 2016 Summer |
Expected Output:
Line | Term | Date
-----+-------------+---------------------
1 | 2016 Summer | 2016-06-22 00:00:00
This seems like it should be a straightforward task, but I can't get it right, so if anyone can spot my mistake(s), that would be greatly appreciated.
Code:
val datasource37 = glueContext.getCatalogSource(database = "data", tableName = "term", redshiftTmpDir = "", transformationContext = "datasource37").getDynamicFrame()
val applymapping37 = datasource37.applyMapping(mappings = Seq(("id", "bigint", "id", "bigint"), ("name", "string", "name", "varchar(256)"), ("date", "timestamp", "date_start", "timestamp")), caseSensitive = false, transformationContext = "applymapping37")
val resolvechoice37 = applymapping37.resolveChoice(choiceOption = Some(ChoiceOption("make_cols")), transformationContext = "resolvechoice37")
val dropnullfields37 = resolvechoice37.dropNulls(transformationContext = "dropnullfields37")
val datasink37 = glueContext.getJDBCSink(catalogConnection = "dataConnection", options = JsonOptions("""{"dbtable": "term", "database": "data"}"""), redshiftTmpDir = args("TempDir"), transformationContext = "datasink37").writeDynamicFrame(dropnullfields37)
I ended up mapping from string -> timestamp and it worked. Glue had automatically mapped it from timestamp -> timestamp, so I assumed that was right.
Example:
val applymapping37 = datasource37.applyMapping(
  mappings = Seq(
    ("id", "bigint", "id", "bigint"),
    ("name", "string", "name", "varchar(256)"),
    ("date", "string", "date_start", "timestamp")
  ),
  caseSensitive = false, transformationContext = "applymapping37")
Related
I have a list of tables (in my actual data) with different columns which, after combining, give a table of 15 columns. In the actual data, the list of tables comes from several previous steps, each of which takes less than a second, but Table.Combine() alone takes almost 2 minutes with an input of about 1200 rows. To keep the example small, I show below an output of 4 columns only.
Is there a faster alternative way to get the same output given by Table.Combine()? Thanks for any help.
This is the code of the query I have so far.
let
Tables = {
Table.FromRecords({[Name = "Bob", Phone = "123-4567"],
[Name = "",Phone = ""]
}),
Table.FromRecords({[Fax = "987-6543", Phone = "838-7171"],
[Fax = "", Phone = "233-687"],
[Fax = "", Phone = "544-778"]
}),
Table.FromRecords({[Cell = "543-7890"],
[Cell = ""],
[Cell = ""]
})
},
CombinedTable = Table.Combine(Tables)
in
CombinedTable
The current output is:
UPDATE
This is the entire query, with Table.Buffer() added in step group5
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jVNdb4JAEPwvPFty3KHUR1C4Sg+0x5lqqSF+tGlqH5q0mv787hkal41FEsLNLjczu5NQlk6kpqP7yixnceU5Pad+Vr3SCQF4qI4A9FE9AiBQXcyj6qTWlBkDYKiOSd1C8wkNTyPpdK174DntHpzscUucB8R5iOqE6rU6B1ecOXEeEueOUXmE5nejBS1uCZHt6C5JnCge3mReiiMLJ/mNELidARgZreMCNTWAsQozPMgC0N2DFbHK8unUTTOr9XxgTGzzuVIn9AKtRZrDe7M5uod3xrj7uf8I2MD9Or7u1t9rxA2VmsJRGIPui//vX/CK8rOX7zHLFXBgbntM9L+T01koSUaRnvQNiciEanw1Ir37gSLZ7mx/Uw/qcbFvDKjjFD7Bijq1Eywf64uC87egWwzS/xP3hmfG6hc=", BinaryEncoding.Base64), Compression.Deflate)),
let
_t = ((type nullable text) meta [Serialized.Text = true])
in
type table [COL1 = _t, COL2 = _t, COL3 = _t, COL4 = _t]
),
fx = each not List.IsEmpty(List.RemoveItems(_,{"",null})),
group0 = Table.Group(Source, "COL2", {"n", each _}, 0, (x, y) => Byte.From(y = "" or y = null)),
group1 = Table.TransformColumns(
group0,
{
"n",
each
let
a = Table.Skip(_),
b = Table.FirstN(a, each [COL3] = "" or [COL3] = null),
c = Table.Skip(a, Table.RowCount(b))
in
[a = a, b = b, c = c]
}
),
group2 = Table.TransformColumns(
group1,
{"n", each Table.ToColumns(Table.Transpose([b])) & Table.ToColumns([c])}
),
group3 = Table.TransformColumns(group2, {"n", each List.Select(_, fx)}),
group4 = Table.TransformColumns(group3, {"n", each Table.FromColumns(_)}),
group5 = Table.Buffer( Table.TransformColumns(group4, {"n", each Table.PromoteHeaders(_)}) ) ,
combine = Table.Combine(group5[n]),
Custom1 = Table.SelectRows(combine, each fx(Record.ToList(_)))
in
Custom1
The purpose of this query is to tabulate data that appears in repeated blocks and sub-blocks, in the way I show below.
This is the output given by the query.
No, but try wrapping the initial table definitions in Table.Buffer() as you go along:
let
a= Table.Buffer(Table.FromRecords({[Name = "Bob", Phone = "123-4567"],[Name = "",Phone = ""]})),
b= Table.Buffer(Table.FromRecords({[Fax = "987-6543", Phone = "838-7171"], [Fax = "", Phone = "233-687"],[Fax = "", Phone = "544-778"]})),
c= Table.Buffer(Table.FromRecords({[Cell = "543-7890"],[Cell = ""],[Cell = ""]})),
CombinedTable = Table.Combine({a,b,c})
in CombinedTable
I created a stream to read data from a CSV file and write it to PostgreSQL. It does everything except insert the data into the database.
My CSV consists of: 1,test,1
My stream:
#App:name("StockFile")
#App:description('test ...')
#source(type='file',
dir.uri='file:C:\file',
action.after.process='NONE',
#map(type='csv'))
define stream IntputStream (Amount int, Location string, ProductId int);
#store(type = 'rdbms',
jdbc.url = "jdbc:postgresql://localhost:5432/postgres",
username = "xxx",
password = "xxx",
jdbc.driver.name = "org.postgresql.Driver",
table.name = 'Test',
operation = 'insert',
#map(type = 'keyvalue' ))
define stream outputstream (Amount int, Location string, ProductId int);
#info(name = 'Save stock records')
from IntputStream
select Amount,Location,ProductId
insert into outputstream;
The @store annotation must be attached to a table definition, not a stream definition, so the correct way to define outputstream is:
@store(type = 'rdbms',
    jdbc.url = "jdbc:postgresql://localhost:5432/postgres",
    username = "xxx",
    password = "xxx",
    jdbc.driver.name = "org.postgresql.Driver",
    table.name = 'Test',
    operation = 'insert',
    @map(type = 'keyvalue'))
define table outputstream (Amount int, Location string, ProductId int);
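For context, here is a minimal sketch of how the full corrected app could fit together (the source settings and connection placeholders are taken from the question; the store annotation is trimmed to the core connection options):
@App:name("StockFile")
@App:description('test ...')

@source(type='file',
    dir.uri='file:C:\file',
    action.after.process='NONE',
    @map(type='csv'))
define stream IntputStream (Amount int, Location string, ProductId int);

@store(type = 'rdbms',
    jdbc.url = "jdbc:postgresql://localhost:5432/postgres",
    username = "xxx",
    password = "xxx",
    jdbc.driver.name = "org.postgresql.Driver",
    table.name = 'Test')
define table outputstream (Amount int, Location string, ProductId int);

@info(name = 'Save stock records')
from IntputStream
select Amount, Location, ProductId
insert into outputstream;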
I am attempting (and can successfully manage) to connect to an API and loop through several iterations of the API call in order to grab the next_page value, put it in a list, and then call the list.
Unfortunately, when this is published to the Power BI service I am unable to refresh there, and indeed 'Data Source Settings' tells me I have a 'hand-authored query'.
I have attempted to follow Chris Webb's blog post on the usage of query parameters and RelativePath, but if I use this I just get a constant loop of the first page that's hit.
StartEpochTime is a helper to ensure I only grab data less than 3 months old.
let
iterations = 10000, // Number of MAXIMUM iterations
url = "https://www.zopim.com/api/v2/" & "incremental/" & "chats?fields=chats(*)" & "&start_time=" & Number.ToText( StartEpochTime ),
FnGetOnePage =
(url) as record =>
let
Source1 = Json.Document(Web.Contents(url, [Headers=[Authorization="Bearer MY AUTHORIZATION KEY"]])),
data = try Source1[chats] otherwise null, //get the data of the first page
next = try Source1[next_page] otherwise null, // check whether there is another page
res = [Data=data, Next=next]
in
res,
GeneratedList =
List.Generate(
()=>[i=0, res = FnGetOnePage(url)],
each [i]<iterations and [res][Data]<>null,
each [i=[i]+1, res = FnGetOnePage([res][Next])],
each [res][Data])
in
    GeneratedList
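StartEpochTime itself isn't shown here; purely as an illustration, a helper along these lines (assuming Unix epoch seconds and approximating 3 months as 90 days) could produce it:
let
    // seconds since the Unix epoch, right now (UTC)
    NowEpoch = Duration.TotalSeconds(DateTimeZone.UtcNow() - #datetimezone(1970, 1, 1, 0, 0, 0, 0, 0)),
    // roughly three months ago
    StartEpochTime = Number.RoundDown(NowEpoch - 90 * 24 * 60 * 60)
in
    StartEpochTime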
Lookups
If Source1 exists, but [chats] may not, you can simplify
= try Source1[chats] otherwise null
to
= Source1[chats]?
Plus, you don't lose non-lookup errors.
See the operators section of the M language specification (m-spec-operators).
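A tiny sketch of the difference, using a hypothetical record rather than the query above:
let
    Source1 = [status = "ok"],                      // hypothetical record with no [chats] field
    withTry = try Source1[chats] otherwise null,    // null, but real errors would also be hidden
    withLookup = Source1[chats]?                    // null only because the field is missing
in
    [UsingTry = withTry, UsingLookup = withLookup]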
Chris Webb method
The query should be something closer to this:
let
Headers = [
Accept="application/json"
],
BaseUrl = "https://www.zopim.com", // very important
Options = [
RelativePath = "api/v2/incremental/chats",
Headers = [
Accept="application/json"
],
Query = [
fields = "chats(*)",
            start_time = Number.ToText( StartEpochTime )
        ]
    ],
    Response = Web.Contents(BaseUrl, Options),
Result = Json.Document(Response) // skip if it's not JSON
in
Result
Here's an example of a reusable Web.Contents helper function:
let
/*
from: <https://github.com/ninmonkey/Ninmonkey.PowerQueryLib/blob/master/source/WebRequest_Simple.pq>
Wrapper for Web.Contents returns response metadata
for options, see: <https://learn.microsoft.com/en-us/powerquery-m/web-contents#__toc360793395>
Details on preventing "Refresh Errors", using 'Query' and 'RelativePath':
- Not using Query and Relative path cause refresh errors:
<https://blog.crossjoin.co.uk/2016/08/23/web-contents-m-functions-and-dataset-refresh-errors-in-power-bi/>
- You can opt-in to Skip-Test:
<https://blog.crossjoin.co.uk/2019/04/25/skip-test-connection-power-bi-refresh-failures/>
- Debugging and tracing the HTTP requests
<https://blog.crossjoin.co.uk/2019/11/17/troubleshooting-web-service-refresh-problems-in-power-bi-with-the-power-query-diagnostics-feature/>
update:
- MaybeErrResponse: Quick example of parsing an error result.
- Raw text is returned, this is useful when there's an error
- now response[json] does not throw, when the data isn't json to begin with (false errors)
*/
WebRequest_Simple
= (
base_url as text,
optional relative_path as nullable text,
optional options as nullable record
)
as record =>
let
headers = options[Headers]?, //or: ?? [ Accept = "application/json" ],
merged_options = [
Query = options[Query]?,
RelativePath = relative_path,
ManualStatusHandling = options[ManualStatusHandling]? ?? { 400, 404, 406 },
Headers = headers
],
bytes = Web.Contents(base_url, merged_options),
response = Binary.Buffer(bytes),
response_metadata = Value.Metadata( bytes ),
status_code = response_metadata[Response.Status]?,
response_text = Text.Combine( Lines.FromBinary(response,null,null, TextEncoding.Utf8), "" ),
json = Json.Document(response),
IsJsonX = not (try json)[HasError],
Final = [
request_url = response_metadata[Content.Uri](),
response_text = response_text,
status_code = status_code,
metadata = response_metadata,
IsJson = IsJsonX,
response = response,
json = if IsJsonX then json else null
]
in
Final,
tests = {
WebRequest_Simple("https://httpbin.org", "json"), // expect: json
WebRequest_Simple("https://www.google.com"), // expect: html
WebRequest_Simple("https://httpbin.org", "/headers"),
WebRequest_Simple("https://httpbin.org", "/status/codes/406"), // exect 404
WebRequest_Simple("https://httpbin.org", "/status/406"), // exect 406
WebRequest_Simple("https://httpbin.org", "/get", [ Text = "Hello World"])
},
FinalResults = Table.FromRecords(tests,
type table[
status_code = Int64.Type, request_url = text,
metadata = record,
response_text = text,
IsJson = logical, json = any,
response = binary
],
MissingField.Error
)
in
FinalResults
I am working on a custom connector for Power BI to get data from multiple APIs.
The user should be able to select the API instance, then see the list of all available tables for that API in a list field, and then write the desired query in the Query field.
Currently I can show the list of tables when I hardcode them in my code.
I want to dynamically show the tables depending on the selected API.
for example:
API 1 => table1, table2
API 2 => table3, table4, table5...
I use a function to get this list.
My question is how to render the list of tables dynamically, without hardcoding it.
The hardcoded list of tables:
MyQueryType = type function (
APIType as (type text meta [
Documentation.FieldCaption = "API name",
Documentation.FieldDescription = "Enter the API name",
Documentation.AllowedValues = { "BS_PARAMETER", "Elastic API DSL", "Elastic API SQL" }
//Documentation.AllowedValues = Indices2
]),
PartnerCodeType as (type text meta [
Documentation.FieldCaption = "Partner code",
Documentation.FieldDescription = "Enter the partner code",
Documentation.AllowedValues = { "jp_demo", "bilan_sanguin" }
]),
TableType as (type text meta [
Documentation.FieldCaption = "Table",
Documentation.FieldDescription = "Enter you query",
Documentation.AllowedValues = { "table1", "table2", "table3", "table4", "table5" }
]),
QueryType as (type text meta [
Documentation.FieldCaption = "Query",
Documentation.FieldDescription = "Enter you query",
Documentation.SampleValues = { "{ ""query"": { ""match_all"":{} }}" },
Formatting.IsMultiLine = true
])
)
as table meta [
Documentation.Name = "My Custom Query",
Documentation.LongDescription = "Long Desc"
];
I created a function Indices() that gets the list of tables for each API, and then I call it in the MyQueryType function:
MyQueryType = type function (
APIType as (type text meta [
Documentation.FieldCaption = "API name",
Documentation.FieldDescription = "Enter the API name",
Documentation.AllowedValues = { "BS_PARAMETER", "Elastic API DSL", "Elastic API SQL" }
//Documentation.AllowedValues = Indices2
]),
PartnerCodeType as (type text meta [
Documentation.FieldCaption = "Partner code",
Documentation.FieldDescription = "Enter the partner code",
Documentation.AllowedValues = { "jp_demo", "bilan_sanguin" }
]),
TableType as (type text meta [
Documentation.FieldCaption = "Table",
Documentation.FieldDescription = "dec",
Documentation.AllowedValues = Indices()
]),
QueryType as (type text meta [
Documentation.FieldCaption = "Query",
Documentation.FieldDescription = "Enter you query",
Documentation.SampleValues = { "{ ""query"": { ""match_all"":{} }}" },
Formatting.IsMultiLine = true
])
)
as table meta [
Documentation.Name = "My Custom Query",
Documentation.LongDescription = "Long Desc"
];
//Other code snippet
//Get ES Metadata: indexes
Indices =
let
api_url = "http://MyURL:9200/_cat/indices?format=json",
response = Web.Contents(api_url),
source = Json.Document(response),
#"Converted to Table" = Table.FromList(source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"index"}, {"Column1.index"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Column1",{{"Column1.index", type text}}),
mylist = Table.ToList(#"Expanded Column1")
//mylistb = Table.TransformColumnTypes(mylist,{{"Column1.index", type text}})
//totext = Text.Combine(mylist, ",")
in
//#"Changed Type";
mylist;
So I'm working on a website, and I want to have some kind of a summary page to display the data that I have. Let's say I have these models:
class IceCream(TimeStampedModel):
    name = models.CharField(max_length=255)
    color = models.CharField(max_length=255)

class Cupcake(TimeStampedModel):
    name = models.CharField(max_length=255)
    icing = models.CharField(max_length=255)
So on this page, users will be able to input a date range for the summary. I'm using DRF to serialize the data and to display them on the view actions. After I receive the filter dates, I will filter out the IceCream objects and Cupcake objects using the created field from TimeStampedModel.
@action(detail=False, methods=['get'])
def dessert_summary(self, request, **kwargs):
    start_date = self.request.query_params.get('start_date')
    end_date = self.request.query_params.get('end_date')
    cupcakes = Cupcake.objects.filter(created__date__range=[start_date, end_date])
    ice_creams = IceCream.objects.filter(created__date__range=[start_date, end_date])
After filtering, I want to count the total cupcakes and the total ice creams that were created within that period of time. But I also want to group them by date and display the total count of both ice creams and cupcakes for each date. So I tried to annotate the querysets like this:
cupcakes = cupcakes.annotate(date=TruncDate('created'))
cupcakes = cupcakes.values('date')
cupcakes = cupcakes.annotate(total_cupcakes=Count('id'))
ice_creams = ice_creams.annotate(date=TruncDate('created'))
ice_creams = ice_creams.values('date')
ice_creams = ice_creams.annotate(total_ice_creams=Count('id'))
So I want the result to be something like this:
{
'summary': [{
'date': "2020-09-24",
'total_ice_creams': 10,
'total_cupcakes': 7,
'total_dessert': 17
}, {
'date': "2020-09-25',
'total_ice_creams': 6,
'total_cupcakes': 5,
'total_dessert': 11
}]
}
But right now this is what I am getting:
{
'summary': [{
'cupcakes': [{
'date': "2020-09-24",
'total_cupcakes': 10,
}, {
'date': "2020-09-25",
'total_cupcakes': 5,
}],
'ice_creams': [{
'date': "2020-09-24",
'total_ice_creams': 7,
}, {
'date': "2020-09-27",
'total_ice_creams': 6,
}]
}]
}
What I want to ask is how do I get all the dates of both querysets, sum the ice creams and cupcakes, and return the data like the expected result? Thanks in advance for your help!
So here's what you can do:
Gather the ice cream/cupcake count data into dictionaries keyed by date (the keys must match the annotation names from your querysets):
icecream_dict = {obj['date']: obj['total_ice_creams'] for obj in ice_creams}
cupcakes_dict = {obj['date']: obj['total_cupcakes'] for obj in cupcakes}
Create a sorted list with all the dates:
all_dates = sorted(set(list(icecream_dict.keys()) + list(cupcakes_dict.keys())))
Create a list with an item for each date and its counts:
result = []
for each_date in all_dates:
total_ice_creams = icecream_dict.get(each_date, 0)
total_cupcakes = cupcakes_dict.get(each_date, 0)
res = {
'date': each_date,
'total_ice_creams': total_ice_creams,
'total_cupcakes': total_cupcakes,
'total_dessert': total_ice_creams + total_cupcakes
}
result.append(res)
# check the result
print(result)
Hint: If you plan to add more dessert-like models, consider having a base model Dessert that you could query directly instead of querying each dessert model separately, as sketched below.
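A minimal sketch of that idea using multi-table inheritance (assuming TimeStampedModel comes from django-model-utils; the names and field sizes here are illustrative, not your actual code):
from django.db import models
from django.db.models import Count
from django.db.models.functions import TruncDate
from model_utils.models import TimeStampedModel  # assumption: where TimeStampedModel lives

class Dessert(TimeStampedModel):
    # concrete base model: every IceCream/Cupcake row also creates a Dessert row
    name = models.CharField(max_length=255)

class IceCream(Dessert):
    color = models.CharField(max_length=255)

class Cupcake(Dessert):
    icing = models.CharField(max_length=255)

def dessert_summary(start_date, end_date):
    # one query gives per-date totals across all dessert types
    return (
        Dessert.objects
        .filter(created__date__range=[start_date, end_date])
        .annotate(date=TruncDate('created'))
        .values('date')
        .annotate(
            total_ice_creams=Count('icecream'),  # rows that have an IceCream child
            total_cupcakes=Count('cupcake'),     # rows that have a Cupcake child
            total_dessert=Count('id'),
        )
        .order_by('date')
    )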