Presto - reducing an array of structs

Presto - reducing an array of structs - amazon-athena

I'm trying to reduce an array of complex types, however I'm running into a syntax error (maybe this isn't even supported?).
SYNTAX_ERROR: line 2:1: Unexpected parameters (array(row(count
double,name varchar)), integer,
com.facebook.presto.sql.analyzer.TypeSignatureProvider#16881774,
com.facebook.presto.sql.analyzer.TypeSignatureProvider#1718b83d) for
function reduce. Expected: reduce(array(T), S, function(S,T,S),
function(S,R)) T, S, R
The complex type is defined as counters array<struct<count:double,name:string>> in the table. I have tried selecting reduce(counters, 0, (state, counter) -> state + counter.count , s -> s) and reduce(counters, 0, (state, counter) -> state + counter['count'] , s -> s), however neither work.

Your approach is correct (tested with Presto 0.205):
presto:default> desc t;
Column | Type | Extra | Comment
----------+----------------------------------------+-------+---------
counters | array(row(count double, name varchar)) | |
presto:default> select * from t;
counters
--------------------------------------------
[{count=1.0, name=a}, {count=2.0, name=b}]
presto:default> select reduce(
counters, 0,
(state, counter) -> state + counter.count,
state -> state) from t;
_col0
-------
3.0
(1 row)
You tagged the question prestodb and amazon-athena. If you are trying this on Athena, please bear in mind that Athena is based on Presto 0.172 (which was released April 2017)

Related

How to repeatedly insert arguments from a list into a function until the list is empty?

Using R, I am working with simulating the outcome from an experiment where participants choose between two options (A or B) defined by their outcomes (x) and probabilities of winning the outcome (p). I have a function "f" that collects its arguments in a matrix with the columns "x" (outcome) and "p" (probability):
f <- function(x, p) {
t <- matrix(c(x,p), ncol=2)
colnames(t) <- c("x", "p")
t
}
I want to use this function to compile a big list of all the trials in the experiment. One way to do this is:
t1 <- list(1A=f(x=c(10), p=c(0.8)),
1B=f(x=c(5), p=c(1)))
t2 <- list(2A=f(x=c(11), p=c(0.8)),
2B=f(x=c(7), p=c(1)))
.
.
.
tn <- list(nA=f(x=c(3), p=c(0.8)),
nB=f(x=c(2), p=c(1)))
Big_list <- list(t1=t1, t2=t2, ... tn=tn)
rm(t1, t2, ... tn)
However, I have very many trials, which may change in future simulations, why repeating myself in this way is intractable. I have my trials in an excel document with the following structure:
| Option | x | p |
|---- |------| -----|
| A | 10 | 0.8 |
| B | 7 | 1 |
| A | 9 | 0.8 |
| B | 5 | 1 |
|... |...| ...|
I am trying to do some kind of loop which takes "x" and "p" from each "A" and "B" and inserts them into the function f, while skipping two rows ahead after each iteration (so that each option is only inserted once). This way, I want to get a set of lists t1 to tn while not having to hardcode everything. This is my best (but still not very good) attempt to explain it in pseudocode:
TRIALS <- read.excel(file_with_trials)
for n=1 to n=(nrows(TRIALS)-1) {
t(*PRINT 'n' HERE*) <- list(
(*PRINT 'n' HERE*)A=
f(x=c(*INSERT COLUMN 1, ROW n FROM "TRIALS"*),
p=c(*INSERT COLUMN 2, ROW n FROM "TRIALS"*)),
(*PRINT 'Z' HERE*)B=
f(x=c(*INSERT COLUMN 1, ROW n+1 FROM "TRIALS"*),
p=c(*INSERT COLUMN 2, ROW n+1 FROM "TRIALS"*)))
}
Big_list <- list(t1=t1, t2=t2, ... tn=tn)
That is, I want the code to create a numbered set of lists by drawing x and p from each pair of rows until my excel file is empty.
Any help (and feedback on how to improve this question) is greatly appreciated!

antlr visitor: lookup of reserved words efficiently

I'm learning Antlr. At this point, I'm writing a little stack-based language as part of my learning process -- think PostScript or Forth. An RPN language. For instance:
10 20 mul
This would push 10 and 20 on the stack and then perform a multiply, which pops two values, multiplies them, and pushes 200. I'm using the visitor pattern. And I find myself writing some code that's kind of insane. There has to be a better way.
Here's a section of my WaveParser.g4 file:
any_operator:
value_operator |
stack_operator |
logic_operator |
math_operator |
flow_control_operator;
value_operator:
BIND | DEF
;
stack_operator:
DUP |
EXCH |
POP |
COPY |
ROLL |
INDEX |
CLEAR |
COUNT
;
BIND is just the bind keyword, etc. So my visitor has this method:
antlrcpp::Any WaveVisitor::visitAny_operator(Parser::Any_operatorContext *ctx);
And now here's where I'm getting to the very ugly code I'm writing, which leads to the question.
Value::Operator op = Value::Operator::NO_OP;
WaveParser::Value_operatorContext * valueOp = ctx->value_operator();
WaveParser::Stack_operatorContext * stackOp = ctx->stack_operator();
WaveParser::Logic_operatorContext * logicOp = ctx->logic_operator();
WaveParser::Math_operatorContext * mathOp = ctx->math_operator();
WaveParser::Flow_control_operatorContext * flowOp = ctx->flow_control_operator();
if (valueOp) {
if (valueOp->BIND()) {
op = Value::Operator::BIND;
}
else if (valueOp->DEF()) {
op = Value::Operator::DEF;
}
}
else if (stackOp) {
if (stackOp->DUP()) {
op = Value::Operator::DUP;
}
...
}
...
I'm supporting approximately 50 operators, and it's insane that I'm going to have this series of if statements to figure out which operator this is. There must be a better way to do this. I couldn't find a field on the context that mapped to something I could use in a hashmap table.
I don't know if I should make every one of my operators have a separate rule, and use the corresponding method in my visitor, or if what else I'm missing.
Is there a better way?

With ANTLR, it's usually very helpful to label components of your rules, as well as the high level alternatives.
If part of a parser rule can only be one thing with a single type, usually the default accessors are just fine. But if you have several alternatives that are essentially alternatives for the "same thing", or perhaps you have the same sub-rule reference in a parser rule more than one time and want to differentiate them, it's pretty handy to give them names. (Once you start doing this and see the impact to the Context classes, it'll become pretty obvious where they provide value.)
Also, when rules have multiple top-level alternatives, it's very handy to give each of them a label. This will cause ANTLR to generate a separate Context class for each alternative, instead of dumping everything from every alternative into a single class.
(making some stuff up just to get a valid compile)
grammar WaveParser
;
any_operator
: value_operator # val_op
| stack_operator # stack_op
| logic_operator # logic_op
| math_operator # math_op
| flow_control_operator # flow_op
;
value_operator: op = ( BIND | DEF);
stack_operator
: op = (
DUP
| EXCH
| POP
| COPY
| ROLL
| INDEX
| CLEAR
| COUNT
)
;
logic_operator: op = (AND | OR);
math_operator: op = (ADD | SUB);
flow_control_operator: op = (FLOW1 | FLOW2);
AND: 'and';
OR: 'or';
ADD: '+';
SUB: '-';
FLOW1: '>>';
FLOW2: '<<';
BIND: 'bind';
DEF: 'def';
DUP: 'dup';
EXCH: 'exch';
POP: 'pop';
COPY: 'copy';
ROLL: 'roll';
INDEX: 'index';
CLEAR: 'clear';
COUNT: 'count';

Django ORM calculations between records

Is it possible to perform calculations between records in a Django query?
I know how to perform calculations across records (e.g. data_a + data_b). Is there way to perform say the percent change between data_a row 0 and row 4 (i.e. 09-30-17 and 09-30-16)?
+-----------+--------+--------+
| date | data_a | data_b |
+-----------+--------+--------+
| 09-30-17 | 100 | 200 |
| 06-30-17 | 95 | 220 |
| 03-31-17 | 85 | 205 |
| 12-31-16 | 80 | 215 |
| 09-30-16 | 75 | 195 |
+-----------+--------+--------+
I am currently using Pandas to perform these type of calculations, but would like eliminate this additional step if possible.

I would go with a Database cursor raw SQL
(see https://docs.djangoproject.com/en/2.0/topics/db/sql/)
combined with a Lag() window function as so:
result = cursor.execute("""
select date,
data_a - lag(data_a) over (order by date) as data_change,
from foo;""")
This is the general idea, you might need to change it according to your needs.

There is no row 0 in a Django database, so we'll assume rows 1 and 5.
The general formula for calculation of percentage as expressed in Python is:
((b - a) / a) * 100
where a is the starting number and b is the ending number. So in your example:
a = 100
b = 75
((b - a) / a) * 100
-25.0
If your model is called Foo, the queries you want are:
(a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
values_list says "get just these fields" and flat=True means you want a simple list of values, not key/value pairs. By assigning it to the (a, b) tuple and using the __in= clause, you get to do this as a single query rather than as two.
I would wrap it all up into a standalone function or model method:
def pct_change(id_1, id_2):
# Get a single column from two rows and return percentage of change
(a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
return ((b - a) / a) * 100
And then if you know the row IDs in the db for the two rows you want to compare, it's just:
print(pct_change(233, 8343))
If you'd like to calculate progressively the change between row 1 and row 2, then between row 2 and row 3, and so on, you'd just run this function sequentially for each row in a queryset. Because row IDs might have gaps we can't just use n+1 to compute the next row. Instead, start by getting a list of all the row IDs in a queryset:
rows = [r.id for r in Foo.objects.all().order_by('date')]
Which evaluates to something like
rows = [1,2,3,5,6,9,13]
Now for each elem in list and the next elem in list, run our function:
for (index, row) in enumerate(rows):
if index < len(rows):
current, next_ = row, rows[index + 1]
print(current, next_)
print(pct_change(current, next_))

Haskell, how to implement SQL like operations?

I’m trying to do some SQL-like operations with Haskell, but I have no idea about what data structures to use. I have 3 different tables: customer, sales, and order. The schemas are below:
Customer
custid — integer (primary key)
name — string
Example:
1|Samson Bowman
2|Zelda Graves
3|Noah Hensley
4|Noelle Haynes
5|Paloma Deleon
Sales
orderid — integer (primary key)
custid — integer
date — string
Example:
1|3|20/3/2014
2|4|25/4/2014
3|5|17/7/2014
4|9|5/1/2014
5|5|9/6/2014
Order
orderid — integer
item — string
Example:
2|gum
4|sandals
3|pen
1|gum
2|pen
3|chips
1|pop
5|chips
What i want to do is to “merge” these three tables into a new table, and the schema of new table is:
Customername Order# Date Items
Samson Bowman 17 20/3/2014 shoes, socks, milk
Samson Bowman 34 19/5/2014 gum, sandals, butter, pens, pencils
Paloma Deleon 41 6/1/2014 computer
…
So yeah, it is very SQL like. I know the SQL is very simple, but how can I implement this without SQL but instead using built-in data structure?
TEXT PRINT ERROR
When i run the function , it shows the following error:
Couldn't match type `[Char]' with `Char'
Expected type: Customer -> String
Actual type: Customer -> [String]
In the first argument of `map', namely `formatCustomer'
In the second argument of `($)', namely `map formatCustomer result'
And i am thinking that the return type of condense is [Customer], but formatCustomer uses only Customer. is this the reason?

All of your associations are one-to-many, and they don’t refer to eachother; it is strictly hierarchical. Customers have sales, sales have orders. Given that, you probably wouldn’t store each bit of information separately, but hierarchically as it truly is. I might put it into data types like this:
data Customer = Customer { customerName :: String
, sales :: [Sale]
} deriving (Eq, Read, Show)
data Sale = Sale { saleDate :: Day
, soldItems :: [String]
} deriving (Eq, Read, Show)
This will probably be very easy to manipulate from within Haskell, and, as a bonus, it’s very easy to turn into the table you wanted to end up with, simply because it’s so close to that in the first place.
But maybe I’ve misinterpreted your question and you’re not just asking for the best data structure to hold it, but how to convert from your flat data structure into this sort of structure. Fortunately, that’s easy enough. Since everything is keyed, I’d construct a Map and start unionWithing things in, or even better, do both at once with fromListWith. To put that more concretely, say you have these data structures:
data DBCustomer = DBCustomer { dbCustomerName :: String
, dbCustomerID :: Int
} deriving (Eq, Read, Show)
data DBSale = DBSale { saleOrderID :: Int
, saleCustomerID :: Int
, dbSaleDate :: Day
} deriving (Eq, Read, Show)
data DBOrder = DBOrder { dbOrderID :: Int
, dbOrderItem :: String
} deriving (Eq, Read, Show)
If I wanted a function with the type [DBSale] -> [DBOrder] -> [Sale], I could write it easily enough:
condense :: [DBSale] -> [DBOrder] -> [Sale]
condense dbSales dbOrders = flip map dbSales $ \dbSale ->
Sale (dbSaleDate dbSale)
$ fromMaybe [] (Map.lookup (saleOrderID dbSale) ordersByID) where
ordersByID = Map.fromListWith (++) . flip map dbOrders
$ \dbOrder -> (dbOrderID dbOrder, [dbOrderItem dbOrder])
Here I’m discarding the customer ID since there’s no slot in Sale for that, but you could certainly throw in another Map and get whole Customer objects out:
condense :: [DBCustomer] -> [DBSale] -> [DBOrder] -> [Customer]
condense dbCustomers dbSales dbOrders = flip map dbCustomers $ \dbCustomer ->
Customer (dbCustomerName dbCustomer)
$ lookupDef [] (dbCustomerID dbCustomer) salesByCustomerID where
lookupDef :: (Ord k) => a -> k -> Map.Map k a -> a
lookupDef def = (fromMaybe def .) . Map.lookup
salesByCustomerID = Map.fromListWith (++) . flip map dbSales
$ \dbSale -> (saleCustomerID dbSale,
[ Sale (dbSaleDate dbSale)
$ lookupDef [] (saleOrderID dbSale)
ordersByID])
ordersByID = Map.fromListWith (++) . flip map dbOrders
$ \dbOrder -> (dbOrderID dbOrder, [dbOrderItem dbOrder])
Printing
This should be reasonably easy. We’ll use Text.Printf since it makes putting things in columns easier. On the whole, each row in the result is a Sale. First, we can try formatting a single row:
formatSale :: Customer -> Sale -> String
formatSale customer sale = printf "%-16s%-8d%-10s%s"
(customerName customer)
(orderID sale)
(show $ saleDate sale)
(intercalate "," $ soldItems sale)
(Actually, we discarded the order ID; if you want to preserve that in your output, you’ll have to add that into the Sale data structure.) Then to get a list of lines for each customer is easy:
formatCustomer :: Customer -> [String]
formatCustomer customer = map (formatSale customer) $ sales customer
And then to do it for all customers and print it out, if customers was the output of condense:
putStr . unlines $ concatMap formatCustomer customers

I have some similar problems and the best I found to do SQL Join operations is to use the align function from the these package, combined with Map (where the key is on what you want to join).
The result of align will give you a map or list of These a b which is either an a, a b or both. That's pretty neat.

Operating on an F# List of Union Types

This is a continuation of my question at F# List of Union Types. Thanks to the helpful feedback, I was able to create a list of Reports, with Report being either Detail or Summary. Here's the data definition once more:
module Data
type Section = { Header: string;
Lines: string list;
Total: string }
type Detail = { State: string;
Divisions: string list;
Sections: Section list }
type Summary = { State: string;
Office: string;
Sections: Section list }
type Report = Detail of Detail | Summary of Summary
Now that I've got the list of Reports in a variable called reports, I want to iterate over those Report objects and perform operations based on each one. The operations are the same except for the cases of dealing with either Detail.Divisions or Summary.Office. Obviously, I have to handle those differently. But I don't want to duplicate all the code for handling the similar State and Sections of each.
My first (working) idea is something like the following:
for report in reports do
let mutable isDetail = false
let mutable isSummary = false
match report with
| Detail _ -> isDetail <- true
| Summary _ -> isSummary <- true
...
This will give me a way to know when to handle Detail.Divisions rather than Summary.Office. But it doesn't give me an object to work with. I'm still stuck with report, not knowing which it is, Detail or Summary, and also unable to access the attributes. I'd like to convert report to the appropriate Detail or Summary and then use the same code to process either case, with the exception of Detail.Divisions and Summary.Office. Is there a way to do this?
Thanks.

You could do something like this:
for report in reports do
match report with
| Detail { State = s; Sections = l }
| Summary { State = s; Sections = l } ->
// common processing for state and sections (using bound identifiers s and l)
match report with
| Detail { Divisions = l } ->
// unique processing for divisions
| Summary { Office = o } ->
// unique processing for office

The answer by #kvb is probably the approach I would use if I had the data structure you described. However, I think it would make sense to think whether the data types you have are the best possible representation.
The fact that both Detail and Summary share two of the properties (State and Sections) perhaps implies that there is some common part of a Report that is shared regardless of the kind of report (and the report can either add Divisions if it is detailed or just Office if if is summary).
Something like that would be better expressed using the following (Section stays the same, so I did not include it in the snippet):
type ReportInformation =
| Divisions of string list
| Office of string
type Report =
{ State : string;
Sections : Section list
Information : ReportInformation }
If you use this style, you can just access report.State and report.Sections (to do the common part of the processing) and then you can match on report.Information to do the varying part of the processing.
EDIT - In answer to Jeff's comment - if the data structure is already fixed, but the view has changed, you can use F# active patterns to write "adaptor" that provides access to the old data structure using the view that I described above:
let (|Report|) = function
| Detail dt -> dt.State, dt.Sections
| Summary st -> st.State, st.Sections
let (|Divisions|Office|) = function
| Detail dt -> Divisions dt.Divisions
| Summary st -> Office st.Office
The first active pattern always succeeds and extracts the common part. The second allows you to distinguish between the two cases. Then you can write:
let processReport report =
let (Report(state, sections)) = report
// Common processing
match report wiht
| Divisions divs -> // Divisions-specific code
| Office ofc -> // Offices-specific code
This is actually an excellent example of how F# active patterns provide an abstraction that allows you to hide implementation details.

kvb's answer is good, and probably what I would use. But the way you've expressed your problem sounds like you want classic inheritance.
type ReportPart(state, sections) =
member val State = state
member val Sections = sections
type Detail(state, sections, divisions) =
inherit ReportPart(state, sections)
member val Divisions = divisions
type Summary(state, sections, office) =
inherit ReportPart(state, sections)
member val Office = office
Then you can do precisely what you expect:
for report in reports do
match report with
| :? Detail as detail -> //use detail.Divisions
| :? Summary as summary -> //use summary.Office
//use common properties

You can pattern match on the Detail or Summary record in each of the union cases when you match and handle the Divisions or Office value with a separate function e.g.
let blah =
for report in reports do
let out = match report with
| Detail({ State = state; Divisions = divisions; Sections = sections } as d) ->
Detail({ d with Divisions = (handleDivisions divisions) })
| Summary({ State = state; Office = office; Sections = sections } as s) ->
Summary( { s with Office = handleOffice office })
//process out

You can refactor the code to have a utility function for each common field and use nested pattern matching:
let handleReports reports =
reports |> List.iter (function
| Detail {State = s; Sections = ss; Divisions = ds} ->
handleState s
handleSections ss
handleDivisions ds
| Summary {State = s; Sections = ss; Office = o} ->
handleState s
handleSections ss
handleOffice o)
You can also filter Detail and Summary to process them separately in different functions:
let getDetails reports =
List.choose (function Detail d -> Some d | _ -> None) reports
let getSummaries reports =
List.choose (function Summary s -> Some s | _ -> None) reports

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Presto - reducing an array of structs - amazon-athena

Related

How to repeatedly insert arguments from a list into a function until the list is empty?

antlr visitor: lookup of reserved words efficiently

Django ORM calculations between records

Haskell, how to implement SQL like operations?

Operating on an F# List of Union Types

Categories

Resources