Multiple dependencies when trying to achieve the third normal form - foreign-keys

Imagine I got the following DB:
a //primary key
b
c
d
The following functional dependencies hold:
a -> bcd
b -> cd
c -> bd
What should I do to bring it to third normal form?
I tried to separate it as follows:
a -> b //this b is the foreign key to the b of the other tables
b -> c
b -> d
Is it correct?

You are thinking about it the wrong way. You do not play around with the dependencies (unless this is some toy HW problem that specifically tells you to); you want to split the table up so that all tables are in 3NF. In your case, this would be (I think!!):
ab
bc
cd
Here the first letter of each table is its key. Now, an example of why you do not play with dependencies:
Say this database was of people and held their SS, BDate, and Name. You could then say that SS -> BDate, Name, since it is pretty much true that your SS number is unique to you. Now, when you play around with dependencies, you play around with what the data means. It's not really up to you to say that SS number can determine your name; it simply is. Saying SS -> BDate and eliminating the Name attribute is simply false.
Similarly, with your database, although ABCD don't mean anything, their dependencies are fixed and not to be changed. So, that was my super long way of saying: split the tables, don't touch the dependencies!! =)
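One way to sanity-check a split like the one above is to compute attribute closures under the original dependencies. The following is only a minimal Python sketch; the helper names closure and is_key and the constant FDS are made up for illustration, and it only checks that each proposed key determines every column of its table, not the full 3NF definition.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we learn the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The dependencies from the question, unchanged: a -> bcd, b -> cd, c -> bd.
FDS = [("a", "bcd"), ("b", "cd"), ("c", "bd")]


def is_key(candidate, table, fds=FDS):
    """True if candidate determines every column of table."""
    return set(table) <= closure(candidate, fds)


# Proposed tables, each mapped to its assumed key.
tables = {"ab": "a", "bc": "b", "cd": "c"}
for table, key in tables.items():
    print(table, key, is_key(key, table))
```

Note that the dependencies themselves are never edited here; only the grouping of columns into tables changes.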

Getting circular dependency warning upon formulating username creation

So I am trying to create a sheet to help our HR Department create the emails for new hires. One of the issues is we use a format of First Initial Last Name as our naming scheme, but if you don't check it can double up with common last names. HR usually does not check for previous emails that currently exist.
Basic recreation I am trying to do is this:
Username: IF(F2<>"", F2, IF(COUNTIF(A:A, D2) > 1, E2, D2))
First Choice: LEFT(B2, 1) & B3
Second Choice: B2 & B3
What I want for A2:
So basically, if Override is set, I want it to use that. If no override is set, I want to check whether First Choice is already found in column A, and if it is already used, use Second Choice. I keep getting a circular dependency. I even tried having the calculation done in column G, which works, but once I try to set A2 to G2, it gives the circular dependency error again.
you can outsmart it...
paste in A2 cell:
=ARRAYFORMULA(IF(F2:F<>"", F2:F,
IF(COUNTIFS(IF(F2:F<>"", F2:F, D2:D),
IF(F2:F<>"", F2:F, D2:D), ROW(A2:A), "<="&ROW(A2:A))=1,
IF(F2:F<>"", F2:F, D2:D), E2:E)))
paste in D2 cell:
=ARRAYFORMULA(LOWER(LEFT(B2:B, 1)&C2:C))
paste in E2 cell:
=ARRAYFORMULA(LOWER(B2:B&C2:C))
If you are getting a circular dependency you may just need to change the calculation settings.
Go to File > Spreadsheet Settings > Calculation and switch Iterative Calculation on
Let me know if this doesn't work!
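For readers who want the intended rule spelled out apart from the spreadsheet, here is a rough Python sketch of the logic described in the question: use the override if present, otherwise the first choice, and fall back to the second choice when the first is already taken by an earlier row. The function name assign_usernames and the sample people are invented for illustration, and the sketch checks against usernames already assigned, which is a slight simplification of what the array formula counts.

```python
def assign_usernames(people):
    """people is a list of (first_name, last_name, override) tuples, in sheet order."""
    taken = set()
    result = []
    for first, last, override in people:
        first_choice = (first[:1] + last).lower()   # first initial + last name
        second_choice = (first + last).lower()      # full first name + last name
        if override:
            username = override
        elif first_choice not in taken:
            username = first_choice
        else:
            username = second_choice
        taken.add(username)
        result.append(username)
    return result


people = [
    ("John", "Smith", ""),
    ("Jane", "Smith", ""),
    ("Jim", "Smith", "jim.smith"),
]
print(assign_usernames(people))
# ['jsmith', 'janesmith', 'jim.smith']
```

Because each row only looks at the rows before it, there is no circular reference: a row never depends on its own result.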

Modify current list by adding an element - Haskell 101

I'd like to add an element on the "movies" list of a data type Director variable called billy.
type Name = String
type Movie = String
data Director = Director {name :: Name, movies :: [Movie]}
  deriving (Show)
let billy = Director "Billy J." ["Good movie 1"]
--addMovieToDirector :: Movie -> Director -> Director
addMovieToDirector m (Director n ms) = Director n (m:ms)
The problem is previous function doesn't update billy's list of movies, it creates a new Director with the desired list (the changes are not stored on billy). How can I operate on billy's list without creating another Director? I understand, that Haskell works with constants, but then should I create a different 'billy' "variable" every time I modify the list?
Thanks!
What you would like to do can be described as "in-place modification", or "using mutable data".
There are ways to do this in Haskell. Since in-place modification of anything is almost always considered a "side effect", such things can only be done in a monad such as IO or ST, or with dirty tricks like unsafePerformIO.
These are somewhat advanced topics, and at a beginner level it is arguably beneficial to think about Haskell values as being totally immutable.
So yes, you can't modify variables. Actually there are no "variables" at all.
Think about billy as a name for a value, not a variable.
All a function can do in Haskell is to take arguments, and calculate some result without any side effects.
This is probably the biggest mental barrier for people coming from imperative languages: "how should I work with data if I can't modify it"?
The answer is: you should structure your program like a giant assembly line: raw materials (raw data, initial parameters, etc.) are put on the line at the beginning (the first function you call), and each workstation (function) does something useful (returns a value), consuming the result of the previous workstation. At the end, something valuable might fall off the line.
What I described is simple function composition: if you need to apply a, then b, then c to a value x, you can write it as (c . b . a) x, or c (b (a x)), or c $ b $ a x.
This way, you can write programs without ever changing anything explicitly, only describing how to create new things out of old ones.
This sounds awfully inefficient, and indeed, functional programming has some performance implications (let alone laziness). However, the compiler is smart enough to figure out a whole lot about programs written in Haskell and to optimize them in certain ways.
I hope it'll all make sense soon. :)
Oh, and welcome to Haskell. ;)
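For readers coming from an imperative language, the "name for a value" idea above can be mimicked outside Haskell too. The following is only a rough Python analogy, not Haskell code; the class and the add_movie function are illustrative stand-ins for the Director type and addMovieToDirector from the question.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Director:
    name: str
    movies: tuple  # a tuple, so the list of movies cannot be changed in place


def add_movie(movie, director):
    """Return a new Director with movie prepended; the argument is left untouched."""
    return replace(director, movies=(movie,) + director.movies)


billy = Director("Billy J.", ("Good movie 1",))
billy2 = add_movie("Good movie 2", billy)  # a new value with a new name

print(billy)
print(billy2)
```

The old value stays available under its old name, and the rest of the program simply continues with the new one.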
You can use a State monad if you want to have a mutable state in your program, for some reason. Here's an example:
module Main where

-- Control.Monad.State comes from the mtl package.
import Control.Monad.State

type Name = String
type Movie = String

data Director = Director {name :: Name, movies :: [Movie]}
  deriving (Show)

addMovieToDirector :: Movie -> Director -> Director
addMovieToDirector m (Director n ms) = Director n (m:ms)

-- Replace the state with an updated Director, then return the new state.
handleDirector :: Name -> State Director Director
handleDirector m = do
  director <- get
  put (addMovieToDirector m director)
  returnDirector

returnDirector :: State Director Director
returnDirector = do
  director <- get
  return director

startState :: Director
startState = Director "Billy J." ["Good movie 1"]

main :: IO ()
main = print $ evalState (handleDirector "Good movie 2") startState
The printed result will be
Director {name = "Billy J.", movies = ["Good movie 2","Good movie 1"]}
Here the handleDirector function, of type Name -> State Director Director, carries a state of type Director and produces a "result" value that is, again, a Director. get reads the state, put replaces it, and evalState runs the constructed State computation from an initial state and returns its result.

Informatica Row Count

I am new to Informatica and I am confused.
I have data from a flat file and need to do some transformation for it. I just need a general idea on how to actually do it.
Say I have data that looks like this:
COL1 COL2 COL3 COL4
A B C D
A B B B
G G G G
B D D X
F F F F
B B A D
1) I need to transfer only rows that have A or B in the first column.
2) I need the count of the rows that are A, and a separate count of the rows that are B.
3) I need a comparison of the count of A and the count of B. If the counts do not match, I need an email sent.
Can someone give me a link to something helpful or tell me exactly what types of transformation / logic I should be using? Thanks
There are multiple ways. Here's a simple one, step by step:
1) Use a filter on the Source Qualifier to get just the data you need.
2) Separate into two pipelines using a Router Transformation with two groups defined as COL1='A' and COL1='B'.
3) Use an Aggregator Transformation to get the counts (for each pipeline).
4) Use an Expression Transformation to add a dummy port, e.g. joinPort = 1 (for each pipeline).
5) Join the pipelines on the dummy port using a Joiner Transformation.
6) Use an Expression Transformation to compare the results.
Sending the email is a separate story:
1) Use a workflow variable, e.g. wfSendEmail, initialized to 0, and a mapping variable, e.g. mSendEmail.
2) On the session Components tab, do the pre-session variable assignment and assign wfSendEmail to mSendEmail.
3) In the Expression Transformation from step 6 above, use the SETVARIABLE function to set mSendEmail to 1 if the counts do not match.
4) On the session Components tab, do the post-session variable assignment and assign the mSendEmail value to wfSendEmail.
5) Add an Email task with the condition wfSendEmail=1 on the link from the session.
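To make the data flow of the mapping easier to follow, here is the same filter / count / compare logic as a small Python sketch run on the sample rows from the question. It is not Informatica code; the variable names are made up, and send_email merely plays the role of the wfSendEmail flag.

```python
from collections import Counter

# Sample rows from the question (COL1..COL4).
rows = [
    ("A", "B", "C", "D"),
    ("A", "B", "B", "B"),
    ("G", "G", "G", "G"),
    ("B", "D", "D", "X"),
    ("F", "F", "F", "F"),
    ("B", "B", "A", "D"),
]

# Filter: keep only rows whose first column is A or B.
kept = [row for row in rows if row[0] in ("A", "B")]

# Router + Aggregator: one count per group.
counts = Counter(row[0] for row in kept)

# Expression: compare the two counts and raise the flag on a mismatch.
send_email = counts["A"] != counts["B"]

print(len(kept), counts["A"], counts["B"], send_email)
# 4 2 2 False
```

With this sample the two counts happen to match, so no email would be sent.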

OCaml dictionary update

I am new to OCaml and am trying to learn how to update a dictionary and deal with if/else conditionals.
I wrote the following code to check whether the dictionary has some key. If not, add a default value for that key. Finally print it out.
module MyUsers = Map.Make(String)
let myGraph = MyUsers.empty;;
let myGraph = MyUsers.add "test" "preset" myGraph in
try
let mapped = MyUsers.find "test" myGraph
with
Not_found -> let myGraph = MyUsers.add "test" "default" myGraph in
Printf.printf "value for the key is now %s\n" (MyUsers.find "test" myGraph)
The error message I have now is syntax error for line 6: with
What is wrong here? Also, when should I use in, ;, or ;;?
I have done some google searches and understand that in seems to define some scope before the next ;;. But it is still very vague to me. Could you please explain it more clearly?
Your immediate problem is that, except at the top level, let must be followed by in. The expression looks like let variable = expression1 in expression2. The idea is that the given variable is bound to the value of expression1 in the body of expression2. You have a let with no in.
It's hard to answer your more general question. It's easier to work with some specific code and a specific question.
However, a semicolon ; is used to separate two expressions that you want evaluated in sequence. The first should have type unit (meaning that it doesn't produce a useful value).
In my opinion, the double semicolon ;; is used only in the top-level to tell the interpreter that you're done typing and that it should evaluate what you've given it so far. Some people use ;; in actual OCaml code, but I do not.
Your code indicates strongly that you're thinking imperatively about OCaml maps. OCaml maps are immutable; that is, you can't change the value of a map. You can only produce a new map with different contents than the old one.

Is it possible to detect and handle string collisions among grouped values when grouping in Hadoop Pig?

Assuming I have lines of data like the following that show user names and their favorite fruits:
Alice\tApple
Bob\tApple
Charlie\tGuava
Alice\tOrange
I'd like to create a pig query that shows the favorite fruit of each user. If a user appears multiple times, then I'd like to show "Multiple". For example, the result with the data above should be:
Alice\tMultiple
Bob\tApple
Charlie\tGuava
In SQL, this could be done something like this (although it wouldn't necessarily perform very well):
select user, case when count(fruit) > 1 then 'Multiple' else max(fruit) end
from FruitPreferences
group by user
But I can't figure out the equivalent PigLatin. Any ideas?
Write an "Aggregate Function" Pig UDF (scroll down to "Aggregate Functions"). This is a user-defined function that takes a bag and outputs a scalar. So basically, your UDF would take in the bag, determine whether there is more than one item in it, and transform it accordingly with an if statement.
I can think of a way of doing this without a UDF, but it is definitely awkward. After your GROUP, use SPLIT to split your data set into two: one in which the count is 1 and one in which the count is more than one:
SPLIT grouped INTO one IF COUNT(fruit) == 1, more IF COUNT(fruit) > 1;
Then, separately use FOREACH ... GENERATE on each to transform it:
one = FOREACH one GENERATE name, MAX(fruit); -- hack using MAX to get the item
more = FOREACH more GENERATE name, 'Multiple';
Finally, union them back:
out = UNION one, more;
I haven't really found a better way of handling the same data set in two different ways based on some conditional, like you want. I typically do some sort of split/recombine like I did here. I believe Pig will be smart and make a plan that doesn't use more than 1 M/R job.
Disclaimer: I can't actually test this code at the moment, so it may have some mistakes.
Update:
In looking harder, I was reminded of the bicond operator and I think that will work here.
b = FOREACH a GENERATE name, (COUNT(fruit)==1 ? MAX(fruit) : 'Multiple');
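Since the Pig snippets above are untested, it may help to pin down the expected result independently. The following Python sketch (not Pig Latin; favorite_fruits is a made-up name) applies the same group-then-conditional rule to the sample lines from the question.

```python
from collections import defaultdict


def favorite_fruits(lines):
    """Group tab-separated 'user<TAB>fruit' lines by user, like GROUP ... BY name."""
    by_user = defaultdict(list)
    for line in lines:
        user, fruit = line.split("\t")
        by_user[user].append(fruit)
    # Same rule as the bincond: a single entry is kept, anything else is 'Multiple'.
    return {
        user: fruits[0] if len(fruits) == 1 else "Multiple"
        for user, fruits in by_user.items()
    }


lines = ["Alice\tApple", "Bob\tApple", "Charlie\tGuava", "Alice\tOrange"]
print(favorite_fruits(lines))
# {'Alice': 'Multiple', 'Bob': 'Apple', 'Charlie': 'Guava'}
```

Like the SQL version in the question, this reports "Multiple" whenever a user has more than one row, even if the rows name the same fruit.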