I’m trying to do some SQL-like operations with Haskell, but I have no idea about what data structures to use. I have 3 different tables: customer, sales, and order. The schemas are below:
Customer
custid — integer (primary key)
name — string
Example:
1|Samson Bowman
2|Zelda Graves
3|Noah Hensley
4|Noelle Haynes
5|Paloma Deleon
Sales
orderid — integer (primary key)
custid — integer
date — string
Example:
1|3|20/3/2014
2|4|25/4/2014
3|5|17/7/2014
4|9|5/1/2014
5|5|9/6/2014
Order
orderid — integer
item — string
Example:
2|gum
4|sandals
3|pen
1|gum
2|pen
3|chips
1|pop
5|chips
What I want to do is “merge” these three tables into a new table with the following schema:
Customername     Order#  Date       Items
Samson Bowman    17      20/3/2014  shoes, socks, milk
Samson Bowman    34      19/5/2014  gum, sandals, butter, pens, pencils
Paloma Deleon    41      6/1/2014   computer
…
So yes, it is very SQL-like. I know the SQL is very simple, but how can I implement this without SQL, using built-in data structures instead?
TEXT PRINT ERROR
When I run the function, it shows the following error:
Couldn't match type `[Char]' with `Char'
Expected type: Customer -> String
Actual type: Customer -> [String]
In the first argument of `map', namely `formatCustomer'
In the second argument of `($)', namely `map formatCustomer result'
I think the return type of condense is [Customer], but formatCustomer takes only a single Customer. Is this the reason?
All of your associations are one-to-many, and they don’t refer to each other; it is strictly hierarchical. Customers have sales, sales have orders. Given that, you probably wouldn’t store each bit of information separately, but hierarchically as it truly is. I might put it into data types like this:
import Data.Time.Calendar (Day)  -- Day comes from the time package
data Customer = Customer { customerName :: String
                         , sales :: [Sale]
                         } deriving (Eq, Read, Show)
data Sale = Sale { saleDate :: Day
                 , soldItems :: [String]
                 } deriving (Eq, Read, Show)
This will probably be very easy to manipulate from within Haskell, and, as a bonus, it’s very easy to turn into the table you wanted to end up with, simply because it’s so close to that in the first place.
But maybe I’ve misinterpreted your question and you’re not just asking for the best data structure to hold it, but how to convert from your flat data structure into this sort of structure. Fortunately, that’s easy enough. Since everything is keyed, I’d construct a Map and start unionWithing things in, or even better, do both at once with fromListWith. To put that more concretely, say you have these data structures:
data DBCustomer = DBCustomer { dbCustomerName :: String
                             , dbCustomerID :: Int
                             } deriving (Eq, Read, Show)
data DBSale = DBSale { saleOrderID :: Int
                     , saleCustomerID :: Int
                     , dbSaleDate :: Day
                     } deriving (Eq, Read, Show)
data DBOrder = DBOrder { dbOrderID :: Int
                       , dbOrderItem :: String
                       } deriving (Eq, Read, Show)
If I wanted a function with the type [DBSale] -> [DBOrder] -> [Sale], I could write it easily enough:
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)
condense :: [DBSale] -> [DBOrder] -> [Sale]
condense dbSales dbOrders = flip map dbSales $ \dbSale ->
    Sale (dbSaleDate dbSale)
      $ fromMaybe [] (Map.lookup (saleOrderID dbSale) ordersByID)
  where
    -- group the items of all order rows under their order ID
    ordersByID = Map.fromListWith (++) . flip map dbOrders
      $ \dbOrder -> (dbOrderID dbOrder, [dbOrderItem dbOrder])
Here I’m discarding the customer ID since there’s no slot in Sale for that, but you could certainly throw in another Map and get whole Customer objects out:
condense :: [DBCustomer] -> [DBSale] -> [DBOrder] -> [Customer]
condense dbCustomers dbSales dbOrders = flip map dbCustomers $ \dbCustomer ->
    Customer (dbCustomerName dbCustomer)
      $ lookupDef [] (dbCustomerID dbCustomer) salesByCustomerID
  where
    lookupDef :: (Ord k) => a -> k -> Map.Map k a -> a
    lookupDef def = (fromMaybe def .) . Map.lookup
    salesByCustomerID = Map.fromListWith (++) . flip map dbSales
      $ \dbSale -> (saleCustomerID dbSale,
                    [ Sale (dbSaleDate dbSale)
                        $ lookupDef [] (saleOrderID dbSale)
                                    ordersByID ])
    ordersByID = Map.fromListWith (++) . flip map dbOrders
      $ \dbOrder -> (dbOrderID dbOrder, [dbOrderItem dbOrder])
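To see what Map.fromListWith (++) does on its own, here is a minimal sketch that groups the Order rows from the question by order ID. The names orderRows and itemsByOrder are made up for this illustration, and the rows are written as plain tuples rather than DBOrder values.

```haskell
import qualified Data.Map as Map

-- (order ID, item) rows, copied from the Order example in the question.
orderRows :: [(Int, String)]
orderRows =
  [ (2, "gum"), (4, "sandals"), (3, "pen"), (1, "gum")
  , (2, "pen"), (3, "chips"), (1, "pop"), (5, "chips") ]

-- Wrap each item in a singleton list, then let fromListWith merge
-- the lists of all rows that share an order ID.
itemsByOrder :: Map.Map Int [String]
itemsByOrder =
  Map.fromListWith (++) [ (oid, [item]) | (oid, item) <- orderRows ]
```

Note that fromListWith passes the newer value as the first argument, so with (++) later rows end up in front (order 2 becomes ["pen","gum"]). If the input order matters, flip (++) keeps it.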
Printing
This should be reasonably easy. We’ll use Text.Printf since it makes putting things in columns easier. On the whole, each row in the result is a Sale. First, we can try formatting a single row:
formatSale :: Customer -> Sale -> String
formatSale customer sale = printf "%-16s%-8d%-10s%s"
    (customerName customer)
    (orderID sale)
    (show $ saleDate sale)
    (intercalate "," $ soldItems sale)
(Actually, we discarded the order ID; if you want to preserve that in your output, you’ll have to add it to the Sale data structure.) Then getting a list of lines for each customer is easy:
formatCustomer :: Customer -> [String]
formatCustomer customer = map (formatSale customer) $ sales customer
And then to do it for all customers and print it out, if customers was the output of condense:
putStr . unlines $ concatMap formatCustomer customers
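As a rough sketch of the row layout with the order ID included, the following formats one line from flat values. formatRow and its argument list are hypothetical and not part of the types above; the date is passed as already-formatted text, and the column widths are just a guess at something readable.

```haskell
import Data.List (intercalate)
import Text.Printf (printf)

-- Hypothetical flattened row: customer name, order ID, date text, items.
-- "%-16s" left-aligns a string in 16 columns; "%-8d" does the same for an Int.
formatRow :: String -> Int -> String -> [String] -> String
formatRow name orderId date items =
  printf "%-16s%-8d%-12s%s" name orderId date (intercalate ", " items)
```

In GHCi, putStrLn (formatRow "Samson Bowman" 1 "20/3/2014" ["gum", "pop"]) prints the name, order number, date, and items in aligned columns.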
I had some similar problems, and the best way I found to do SQL join operations is to use the align function from the these package, combined with Map (where the key is whatever you want to join on).
The result of align is a map or list of These a b values, each of which holds an a, a b, or both. That's pretty neat.
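To illustrate the idea without depending on the these package, here is a sketch that builds the same kind of result with Data.Map.Merge.Strict from containers. The local These type and the alignMaps function are stand-ins written for this example, not the package's own definitions; the sample maps reuse values from the question.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Merge.Strict (mapMissing, merge, zipWithMatched)

-- Local stand-in for the These type from the `these` package.
data These a b = This a | That b | These a b
  deriving (Eq, Show)

-- Full outer join on the key: every key from either map survives.
alignMaps :: Ord k => Map.Map k a -> Map.Map k b -> Map.Map k (These a b)
alignMaps =
  merge (mapMissing (\_ a -> This a))           -- key only in the left map
        (mapMissing (\_ b -> That b))           -- key only in the right map
        (zipWithMatched (\_ a b -> These a b))  -- key in both maps

-- Customer ID -> name.
customers :: Map.Map Int String
customers = Map.fromList [(3, "Noah Hensley"), (4, "Noelle Haynes")]

-- Customer ID -> sale date; customer 9 has no customer row in the sample data.
saleDates :: Map.Map Int String
saleDates = Map.fromList [(3, "20/3/2014"), (9, "5/1/2014")]
```

Filtering the result for These values gives an inner join, while keeping This or That values gives the left or right outer variants.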
I have my model from models.persistentmodels
...
Thing
    title Text
    price Int
    kosher Bool
    optionalstuff [Text] Maybe
    createdat UTCTime
    updatedat UTCTime
    deriving Show
...
It contains two time fields, which are UTCTime.
I am receiving via AJAX what is almost a Thing, in JSON. But the user JSON should not have createdat and updatedat or kosher. So we need to fill them in.
postNewEventR = do
    inputjson <- requireCheckJsonBody :: Handler Value
    ...
    -- get rawstringofthings from inputjson
    ...
    let objectsMissingSomeFields = case (decode (BL.fromStrict $ TE.encodeUtf8 rawstringofthings) :: Maybe [Object]) of
            Nothing -> error "Failed to get a list of raw objects."
            Just x -> x
    now <- liftIO getCurrentTime
    -- Solution needs to go here:
    let objectsWithAllFields = objectsMissingSomeFields
    -- We hope to be done
    let things = case (eitherDecode $ encode objectsWithAllFields) :: Either String [Thing] of
            Left err -> error $ "Failed to get things because: " <> err
            Right xs -> xs
The error "Failed to get things" comes here because the JSON objects we parsed are missing fields that are needed in the model.
Solution
let objectsWithAllFields = Import.map (tackOnNeccessaryThingFields now True) objectsMissingSomeFields
So we take the current object and tack on the missing fields e.g. kosher and createdat.
But there is a difference between the way UTCTime is shown and the way aeson parses UTCTime. So when I write the UTCTime into an aeson String, I need to print it in the format that aeson expects later:
tackOnNeccessaryThingFields :: UTCTime -> Bool -> Object -> Object
tackOnNeccessaryThingFields t b hm = G.fromList $ (G.toList hm) <> [
    ("createdat", String (pack $ formatTime defaultTimeLocale "%FT%T%QZ" t)),
    ("updatedat", String (pack $ formatTime defaultTimeLocale "%FT%T%QZ" t)),
    ("kosher", Bool b)
    ]
tackOnNeccessaryThingFields _ _ _ = error "This isn't an object."
After this fix, the object has all the fields needed to make the record, so the code gives [Thing].
And also the code runs without runtime error, instead of failing to parse the tshow t as UTCTime.
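To make the difference concrete, here is a small sketch using only the time package. It applies the same "%FT%T%QZ" format string as above; isoTimestamp and sampleTime are names invented for this example, and the sample instant is arbitrary.

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..), picosecondsToDiffTime)
import Data.Time.Format (defaultTimeLocale, formatTime)

-- Render a UTCTime as date, 'T', time of day, optional fraction, and 'Z'.
isoTimestamp :: UTCTime -> String
isoTimestamp = formatTime defaultTimeLocale "%FT%T%QZ"

-- An arbitrary sample instant: 1.5 seconds after midnight on 2014-03-20 (UTC).
sampleTime :: UTCTime
sampleTime =
  UTCTime (fromGregorian 2014 3 20) (picosecondsToDiffTime 1500000000000)
```

Comparing show sampleTime with isoTimestamp sampleTime in GHCi shows the two layouts side by side: the Show instance separates date and time with a space and ends in " UTC", while the formatted version uses the 'T' separator and a trailing 'Z'.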
Note:
This aeson GitHub issue about the problem is closed, but the parser does not seem to be any more permissive: https://github.com/bos/aeson/issues/197
Thanks to Artyom:
https://artyom.me/aeson#records-and-json-generics
Thanks to Pbrisbin:
https://pbrisbin.com/posts/writing_json_apis_with_yesod/
Thanks to Snoyman:
For everything
SAS has a procedure called rank that assigns a "rank" to each row in a data frame according to its position in the ordered values of a variable, more or less; but the rank is not just the position: one has to tell the procedure how many groups to use in the ranking. The rank is actually the group to which the row belongs.
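I call this a dense ranking below, although in SQL terms splitting the ordered rows into a fixed number of groups is closer to NTILE(n) than to DENSE_RANK.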
Example (the salary variable is included for generality, but it is not used in this example):
Say we have this data frame:
If we rank by age using 4 groups, SAS would give us this:
It is easier to understand what happened if we sort the data by the variable we ranked:
Now we can see why rank gives us the position in an ordered set, kind of.
The rank procedure is very useful, but I couldn't find in Deedle's docs how to perform it. Is there a direct way to do it in Deedle, or do I need to create my own extension?
I suppose I could do it using these functions:
SortRows(frame, key)
chunk size series
I wrote my own extension:
type Frame<'TRowKey, 'TColumnKey
            when 'TRowKey : equality
            and 'TColumnKey : equality> with
    static member denseRank column (groups:int) rankName frame =
        let frameLength = Frame.countRows frame |> float
        let chunkSize = frameLength / (float groups) |> Math.Ceiling |> int64
        let sorted =
            frame
            |> Frame.sortRows column
        let ranks =
            Frame.getCol column frame
            |> Series.map(fun k _ ->
                int ((Frame.indexForKey k sorted) / chunkSize)
            )
        let clone = frame.Clone()
        clone.AddColumn(rankName, ranks)
        clone
where indexForKey is this other custom extension:
    // index for row with key
    // index starting at 0
    static member indexForKey (key:'K) (frame:Frame<'K,_>) : int64 =
        frame.RowIndex.Locate key
        |> frame.RowIndex.AddressOperations.OffsetOf
I tried this other definition hoping that it would run faster. It is slightly faster, but not by a lot; any comments on performance issues are welcome:
    static member denseRank column (groups:int) rankName frame =
        let frameLength = Frame.countRows frame
        let chunkSize = (float frameLength) / (float groups) |> Math.Ceiling
        let sorted =
            frame
            |> Frame.sortRows column
        let sortedKeys = Frame.getRowKeys sorted
        let ranksArr = Array.zeroCreate frameLength
        sortedKeys
        |> Seq.iteri (fun index _ -> ranksArr.[index] <- index / (int chunkSize))
        let ranks = Series(sortedKeys, ranksArr)
        let clone = frame.Clone()
        clone.AddColumn(rankName, ranks)
        clone
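The arithmetic behind both versions (sort, then divide each position by the chunk size) can be seen in isolation in the following sketch, written in Haskell over a plain list of (key, value) pairs rather than a Deedle frame. groupRank is a name made up for this illustration, and it assumes the number of groups is positive.

```haskell
import Data.List (sortOn)

-- Assign each (key, value) row a 0-based group number after sorting by value.
-- Assumes groups > 0. The chunk size is ceiling (n / groups), matching the
-- Math.Ceiling step in the F# code; max 1 guards against an empty input.
groupRank :: Ord v => Int -> [(k, v)] -> [(k, Int)]
groupRank groups rows =
  [ (key, position `div` chunkSize)
  | (position, (key, _)) <- zip [0 ..] (sortOn snd rows) ]
  where
    n = length rows
    chunkSize = max 1 ((n + groups - 1) `div` groups)
```

For example, five rows split into 2 groups give a chunk size of 3, so the three smallest values land in group 0 and the remaining two in group 1. The result is returned in sorted order, so it would still have to be joined back to the original rows by key.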
I would like to know how to filter a whole list out of list of lists
Example: [ ["Bob", "Baker", "male", "70000"],
["Alice", "Allen", "female", "82000"] ]
And now I would like to filter the list which contains female. So output would be:
["Alice", "Allen", "female", "82000"]
Thanks
Ankur's answer will certainly solve your problem, but I would like to make a suggestion that could make your life easier. It seems that you're storing all your data as strings in lists, but really what you'd like is a data type that could hold all this data in a more organized fashion, which can be done using Haskell data types, something like:
data Person = Person {
    firstName :: String,
    lastName :: String,
    gender :: String,
    salary :: String
    } deriving (Eq, Show)
Then you could easily filter your data with filter (("female" ==) . gender) a. While this is a bit more code up front, later on if you were to add a "title" field for "Mr", "Mrs", etc., then it wouldn't matter if you added it as the first field or the last field; this code would still work. Also, if for whatever reason you had an invalid value like ["Bob", "Baker", "male", "70000", "female"], Ankur's solution would give you an incorrect result, but with a custom data type, this would not even compile.
You could further improve your data structure with a few tweaks. I would suggest making a data type for gender, and then use Int or Double for the salary field, so you would have
data Gender = Male | Female deriving (Eq, Show, Read)
data Person = Person {
    firstName :: String,
    lastName :: String,
    gender :: Gender,
    salary :: Int
    } deriving (Eq, Show)
filtGender :: Gender -> [Person] -> [Person]
filtGender gend people = filter ((gend ==) . gender) people
main :: IO ()
main = do
    let people = [Person "Bob" "Baker" Male 70000,
                  Person "Alice" "Allen" Female 82000]
    putStr "The females are: "
    print $ filtGender Female people
    putStr "The males are: "
    print $ filtGender Male people
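If the data still arrives as lists of strings, one possible bridge to the typed version is a small parser that rejects malformed rows. The sketch below repeats the type definitions so that it stands alone; parsePerson and parseGender are hypothetical helpers, and the lowercase "male"/"female" spellings are taken from the example rows in the question.

```haskell
import Text.Read (readMaybe)

data Gender = Male | Female deriving (Eq, Show, Read)

data Person = Person
  { firstName :: String
  , lastName  :: String
  , gender    :: Gender
  , salary    :: Int
  } deriving (Eq, Show)

-- Hypothetical helper: accept only rows with exactly four fields,
-- a recognised gender, and a numeric salary.
parsePerson :: [String] -> Maybe Person
parsePerson [first, lastN, g, s] =
  Person first lastN <$> parseGender g <*> readMaybe s
parsePerson _ = Nothing

-- Map the strings used in the question's rows to the Gender type.
parseGender :: String -> Maybe Gender
parseGender "male"   = Just Male
parseGender "female" = Just Female
parseGender _        = Nothing
```

With this, a row such as ["Bob", "Baker", "male", "70000", "female"] yields Nothing instead of silently matching a filter for "female", and Data.Maybe.mapMaybe parsePerson keeps only the well-formed rows.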
Prelude> let a = [ ["Bob", "Baker", "male", "70000"], ["Alice", "Allen", "female", "82000"] ]
Prelude> filter (elem "female") a
[["Alice","Allen","female","82000"]]
This is a continuation of my question at F# List of Union Types. Thanks to the helpful feedback, I was able to create a list of Reports, with Report being either Detail or Summary. Here's the data definition once more:
module Data
type Section = { Header: string;
                 Lines: string list;
                 Total: string }
type Detail = { State: string;
                Divisions: string list;
                Sections: Section list }
type Summary = { State: string;
                 Office: string;
                 Sections: Section list }
type Report = Detail of Detail | Summary of Summary
Now that I've got the list of Reports in a variable called reports, I want to iterate over those Report objects and perform operations based on each one. The operations are the same except for the cases of dealing with either Detail.Divisions or Summary.Office. Obviously, I have to handle those differently. But I don't want to duplicate all the code for handling the similar State and Sections of each.
My first (working) idea is something like the following:
for report in reports do
    let mutable isDetail = false
    let mutable isSummary = false
    match report with
    | Detail _ -> isDetail <- true
    | Summary _ -> isSummary <- true
    ...
This will give me a way to know when to handle Detail.Divisions rather than Summary.Office. But it doesn't give me an object to work with. I'm still stuck with report, not knowing which it is, Detail or Summary, and also unable to access the attributes. I'd like to convert report to the appropriate Detail or Summary and then use the same code to process either case, with the exception of Detail.Divisions and Summary.Office. Is there a way to do this?
Thanks.
You could do something like this:
for report in reports do
    match report with
    | Detail { State = s; Sections = l }
    | Summary { State = s; Sections = l } ->
        // common processing for state and sections (using bound identifiers s and l)
    match report with
    | Detail { Divisions = l } ->
        // unique processing for divisions
    | Summary { Office = o } ->
        // unique processing for office
The answer by @kvb is probably the approach I would use if I had the data structure you described. However, I think it would make sense to consider whether the data types you have are the best possible representation.
The fact that both Detail and Summary share two of the properties (State and Sections) perhaps implies that there is some common part of a Report that is shared regardless of the kind of report (and the report can either add Divisions if it is detailed or just Office if it is a summary).
Something like that would be better expressed using the following (Section stays the same, so I did not include it in the snippet):
type ReportInformation =
    | Divisions of string list
    | Office of string
type Report =
    { State : string;
      Sections : Section list
      Information : ReportInformation }
If you use this style, you can just access report.State and report.Sections (to do the common part of the processing) and then you can match on report.Information to do the varying part of the processing.
EDIT - In answer to Jeff's comment - if the data structure is already fixed, but the view has changed, you can use F# active patterns to write an "adaptor" that provides access to the old data structure using the view that I described above:
let (|Report|) = function
    | Detail dt -> dt.State, dt.Sections
    | Summary st -> st.State, st.Sections
let (|Divisions|Office|) = function
    | Detail dt -> Divisions dt.Divisions
    | Summary st -> Office st.Office
The first active pattern always succeeds and extracts the common part. The second allows you to distinguish between the two cases. Then you can write:
let processReport report =
    let (Report(state, sections)) = report
    // Common processing
    match report with
    | Divisions divs -> // Divisions-specific code
    | Office ofc -> // Offices-specific code
This is actually an excellent example of how F# active patterns provide an abstraction that allows you to hide implementation details.
kvb's answer is good, and probably what I would use. But the way you've expressed your problem sounds like you want classic inheritance.
type ReportPart(state, sections) =
    member val State = state
    member val Sections = sections
type Detail(state, sections, divisions) =
    inherit ReportPart(state, sections)
    member val Divisions = divisions
type Summary(state, sections, office) =
    inherit ReportPart(state, sections)
    member val Office = office
Then you can do precisely what you expect:
for report in reports do
    match report with
    | :? Detail as detail -> //use detail.Divisions
    | :? Summary as summary -> //use summary.Office
    //use common properties
You can pattern match on the Detail or Summary record in each of the union cases, and handle the Divisions or Office value with a separate function, e.g.
let blah =
    for report in reports do
        let out = match report with
                  | Detail({ State = state; Divisions = divisions; Sections = sections } as d) ->
                      Detail({ d with Divisions = (handleDivisions divisions) })
                  | Summary({ State = state; Office = office; Sections = sections } as s) ->
                      Summary( { s with Office = handleOffice office })
        //process out
You can refactor the code to have a utility function for each common field and use nested pattern matching:
let handleReports reports =
    reports |> List.iter (function
        | Detail {State = s; Sections = ss; Divisions = ds} ->
            handleState s
            handleSections ss
            handleDivisions ds
        | Summary {State = s; Sections = ss; Office = o} ->
            handleState s
            handleSections ss
            handleOffice o)
You can also filter Detail and Summary to process them separately in different functions:
let getDetails reports =
    List.choose (function Detail d -> Some d | _ -> None) reports
let getSummaries reports =
    List.choose (function Summary s -> Some s | _ -> None) reports
For example, I have an Erlang record:
-record(state, {clients
}).
Can I make the clients field a list, so that I can keep values in it as in a normal list? And how can I add values to this list?
Thank you.
Maybe you mean something like:
-module(reclist).
-export([empty_state/0, some_state/0,
add_client/1, del_client/1,
get_clients/1]).
-record(state,
{
clients = [] ::[pos_integer()],
dbname ::string()
}).
empty_state() ->
#state{}.
some_state() ->
#state{
clients = [1,2,3],
dbname = "QA"}.
del_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = lists:delete(Client, C)}.
add_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = [Client|C]}.
get_clients(#state{clients = C, dbname = _D}) ->
C.
Test:
1> reclist:empty_state().
{state,[],undefined}
2> reclist:some_state().
{state,[1,2,3],"QA"}
3> reclist:add_client(4).
{state,[4,1,2,3],"QA"}
4> reclist:del_client(2).
{state,[1,3],"QA"}
::[pos_integer()] means that the type of the field is a list of positive integer values, starting from 1; it's the hint for the analysis tool dialyzer, when it performs type checking.
Erlang also allows you to use pattern matching on records:
5> reclist:get_clients(reclist:some_state()).
[1,2,3]
Further reading:
Records
Types and Function Specifications
dialyzer(1)
@JUST MY correct OPINION's answer made me remember that I love how Haskell goes about getting the values of the fields in the data type.
Here's a definition of a data type, stolen from Learn You a Haskell for Great Good!, which leverages record syntax:
data Car = Car {company :: String
               ,model :: String
               ,year :: Int
               } deriving (Show)
It creates the functions company, model and year, which look up fields in the data type. We first make a new car:
ghci> Car "Toyota" "Supra" 2005
Car {company = "Toyota", model = "Supra", year = 2005}
Or, using record syntax (the order of fields doesn't matter):
ghci> Car {model = "Supra", year = 2005, company = "Toyota"}
Car {company = "Toyota", model = "Supra", year = 2005}
ghci> let supra = Car {model = "Supra", year = 2005, company = "Toyota"}
ghci> year supra
2005
We can even use pattern matching:
ghci> let (Car {company = c, model = m, year = y}) = supra
ghci> "This " ++ c ++ " " ++ m ++ " was made in " ++ show y
"This Toyota Supra was made in 2005"
I remember there were attempts to implement something similar to Haskell's record syntax in Erlang, but not sure if they were successful.
Some posts, concerning these attempts:
In Response to "What Sucks About Erlang"
Geeking out with Lisp Flavoured Erlang. However I would ignore parameterized modules here.
It seems that LFE uses macros, which are similar to what Scheme (Racket, for instance) provides when you want to create a new value of some structure:
> (define-struct car (company model year))
> (define supra (make-car "Toyota" "Supra" 2005))
> (car-model supra)
"Supra"
I hope we'll have something close to Haskell record syntax in the future, that would be really practically useful and handy.
Yasir's answer is the correct one, but I'm going to show you WHY it works the way it works so you can understand records a bit better.
Records in Erlang are a hack (and a pretty ugly one). Using the record definition from Yasir's answer...
-record(state,
{
clients = [] ::[pos_integer()],
dbname ::string()
}).
...when you instantiate this with #state{} (as Yasir did in empty_state/0 function), what you really get back is this:
{state, [], undefined}
That is to say your "record" is just a tuple tagged with the name of the record (state in this case) followed by the record's contents. Inside BEAM itself there is no record. It's just another tuple with Erlang data types contained within it. This is the key to understanding how things work (and the limitations of records to boot).
Now when Yasir did this...
add_client(Client) ->
S = some_state(),
C = S#state.clients,
S#state{clients = [Client|C]}.
...the S#state.clients bit translates into code internally that looks like element(2,S). You're using, in other words, standard tuple manipulation functions. S#state.clients is just a symbolic way of saying the same thing, but in a way that lets you know what element 2 actually is. It's syntactic saccharine that's an improvement over keeping track of individual fields in your tuples in an error-prone way.
Now for that last S#state{clients = [Client|C]} bit, I'm not absolutely positive as to what code is generated behind the scenes, but it is likely just straightforward stuff that does the equivalent of {state, [Client|C], element(3,S)}. It:
tags a new tuple with the name of the record (provided as #state),
copies the elements from S (dictated by the S# portion),
except for the clients piece overridden by {clients = [Client|C]}.
All of this magic is done via a preprocessing hack behind the scenes.
Understanding how records work behind the scenes is beneficial both for understanding code written using records as well as for understanding how to use them yourself (not to mention understanding why things that seem to "make sense" don't work with records -- because they don't actually exist down in the abstract machine...yet).
If you are only adding or removing single items from the clients list in the state you could cut down on typing with a macro.
-record(state, {clients = [] }).
-define(AddClientToState(Client,State),
State#state{clients = lists:append([Client], State#state.clients) } ).
-define(RemoveClientFromState(Client,State),
State#state{clients = lists:delete(Client, State#state.clients) } ).
Here is a test escript that demonstrates:
#!/usr/bin/env escript
-record(state, {clients = [] }).
-define(AddClientToState(Client,State),
State#state{clients = lists:append([Client], State#state.clients)} ).
-define(RemoveClientFromState(Client,State),
State#state{clients = lists:delete(Client, State#state.clients)} ).
main(_) ->
%Start with a state with a empty list of clients.
State0 = #state{},
io:format("Empty State: ~p~n",[State0]),
%Add foo to the list
State1 = ?AddClientToState(foo,State0),
io:format("State after adding foo: ~p~n",[State1]),
%Add bar to the list.
State2 = ?AddClientToState(bar,State1),
io:format("State after adding bar: ~p~n",[State2]),
%Add baz to the list.
State3 = ?AddClientToState(baz,State2),
io:format("State after adding baz: ~p~n",[State3]),
%Remove bar from the list.
State4 = ?RemoveClientFromState(bar,State3),
io:format("State after removing bar: ~p~n",[State4]).
Result:
Empty State: {state,[]}
State after adding foo: {state,[foo]}
State after adding bar: {state,[bar,foo]}
State after adding baz: {state,[baz,bar,foo]}
State after removing bar: {state,[baz,foo]}