How to find the mean and the variance of the normal distribution obtained using the ADVI method in PyMC?

I am using the ADVI method to find the posterior distribution. How can I find the mean and the std of the normal posterior distribution that ADVI produces in PyMC, rather than computing them from samples drawn from that approximation?

You can call approx.mean.eval() and approx.std.eval(), and then reference approx.ordering to map index slices to the parameters of your model.
Using pymc==4.1.6, and having defined a pm.Model called model, I can do the following:
with model:
    # fit the model with ADVI
    approx = pm.fit(method="advi")

# posterior means, flattened across all model parameters
approx_mu = approx.mean.eval()
approx_mu_dict = {
    param: approx_mu[slice_]
    for (param, (_, slice_, _, _))
    in approx.ordering.items()
}

# posterior standard deviations, mapped back the same way
approx_std = approx.std.eval()
approx_std_dict = {
    param: approx_std[slice_]
    for (param, (_, slice_, _, _))
    in approx.ordering.items()
}

Related

Power Query Table.TransformRows won't return table

In the M language documentation for Table.TransformRows, it lists the type signature as Table.TransformRows(table as table, transform as function) as list
followed by the slightly cryptic comment
If the return type of the transform function is specified, then the result will be a table with that row type.
However, whenever I define my function, Power Query always returns me a list instead of a table.
For instance, this query:
let
    func = (row) as record => [B = Number.ToText(row[a])] as record,
    #"firstTable" = Table.FromRecords({
        [a = 1],
        [a = 2],
        [a = 3],
        [a = 4],
        [a = 5]}),
    #"myTable" = Table.TransformRows(#"firstTable", func)
in
    #"myTable"
returns a list, not a table.
I had wondered if I needed to create a record type which specifies the type of each record entry, but if I try
myType = type [B = text],
func = (row) as record => [B = Number.ToText(row[a])] as myType
then it tells me that my type identifier is invalid on the second line. Finally, I tried to set the function return type using
myType = type [B = text],
func1 = (row) as record => [B = Number.ToText(row[a])] as myType,
funcType = Type.ForFunction([ReturnType = myType, Parameters = [X = type number]], 1),
func = Value.ReplaceType(func1,funcType)
but Table.TransformRows still returns a list.
I am aware that I am able to use Table.FromRecords or Table.FromRows to turn the result of Table.TransformRows into a table, but I am currently experiencing some performance issues with these functions and was trying to cut them out to see if that would fix those issues.
Typing inline
returns a list, not a table. I had wondered if I needed to create a record type which specifies the type of each record entry
// you had this
record_list = { [ a = 1 ], [ a = 2 ], [ a = 3 ], [ a = 4 ], [ a = 5 ] },
firstTable = Table.FromRecords(
    record_list
)
This ends up as a column of type any. To fix that:
// declare the type (the a column holds numbers)
firstTable = Table.FromRecords(
    record_list,
    type table[a = number]
)
Whenever a function takes a parameter columns as any, you can use any of these formats:
"Name"
{ "Name1", "Name2" }
type table[ Name1 = text, Name2 = number ]
Invalid type identifier
then it tells me that my type identifier is invalid on the second line
The as operator only works on primitive data types.
If it were to work, this would mean asserting the record is a myType while the function's return asserts the type is record, which is a mix of types. The function already has a final assert, so you don't need one on your inner calls.
func = (row) as record => [B = Number.ToText(row[a])] as something
Replacing Function Types does not change behavior
Finally, I tried to set the function return type using ... Value.ReplaceType
Using Value.ReplaceType on a function does not modify the actual types or behavior. That's just changing the metadata.
Ben Gribaudo, Power Query series, part 18 (Custom Type Systems): "As type ascription can only change information about a function, not the function's behavior at the language level, ascribing a type onto a function that specifies different argument or return assertions has no effect on the behavior of the function. The mashup engine always uses the type assertions specified when the function was originally defined, even if another type is later ascribed onto it."
Check out Ben's Power Query series. It's high quality. It goes into far more detail than other places.
Performance
I am aware that I am able to use Table.FromRecords or Table.FromRows to turn the result of Table.TransformRows into a table, but I am currently experiencing some performance issues
Missing column data types can cause issues. But you really should try the query diagnostics; they will tell you which steps are taking the most time.
Based on your example, you might want Table.TransformColumns instead of transforming rows.
Also make sure your query is folding!
Example using Table.TransformRows with typing
let
    // sample data, list of records like: [ Number = 0 ]
    numberRecords = List.Transform( {0..10}, each [Number = _] ),
    Source = Table.FromRecords(
        numberRecords,
        type table[Number = number]
    ),
    transform_SquareNumbers = (source as table) =>
        let
            rows = Table.TransformRows(
                source, (row) as record =>
                    [
                        Number = Number.Power(
                            row[Number], 2
                        )
                    ]
            ),
            result = Table.FromRecords(rows),
            source_type = Value.Type(source)
        in
            Value.ReplaceType(result, source_type),
    FinalTable = transform_SquareNumbers(Source)
in
    FinalTable

In Spark, does the filter function turn the data into tuples?

Just wondering, does filter turn the data into tuples? For example:
val filesLines = sc.textFile("file.txt")
val split_lines = filesLines.map(_.split(";"))
val filteredData = split_lines.filter(x => x(4)=="Blue")
//from here if we wanted to map the data would it be using tuple format ie. x._3 OR x(3)
val blueRecords = filteredData.map(x => x._1, x._2)
OR
val blueRecords = filteredData.map(x => x(0), x(1))
No. All filter does is take a predicate function and apply it: any datapoint in the set that returns false when passed through the predicate is not passed on to the resultant set. So the data remains the same:
filesLines   // RDD[String] (lines of the file)
split_lines  // RDD[Array[String]] (lines delimited by semicolon)
filteredData // RDD[Array[String]] (lines delimited by semicolon where the 5th item is "Blue")
So, to use filteredData, you will have to access the data as an array, using parentheses with the appropriate index.
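For example, a quick sketch of the mapping from the question using positional access (note the extra parentheses, so that map receives one function returning a tuple):
val blueRecords = filteredData.map(x => (x(0), x(1))) // Array indexing; the result is RDD[(String, String)]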
filter will not change the RDD; the filtered data will still be an RDD[Array[String]].

Saving partial spark DStream window to HDFS

I am counting values in each window to find the top values, and I want to save only the 10 most frequent values of each window to HDFS rather than all the values.
eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) -> 1),StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
val counts = eegStreams(a).map(x => (math.round(x.toDouble), 1)).reduceByKeyAndWindow(_ + _, _ - _, Seconds(4), Seconds(4))
val sortedCounts = counts.map(_.swap).transform(rdd => rdd.sortByKey(false)).map(_.swap)
//sortedCounts.foreachRDD(rdd =>println("\nTop 10 amplitudes:\n" + rdd.take(10).mkString("\n")))
sortedCounts.map(tuple => "%s,%s".format(tuple._1, tuple._2)).saveAsTextFiles("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))
I can print the top 10 as above (commented out).
I have also tried
sortedCounts.foreachRDD{ rdd => ssc.sparkContext.parallelize(rdd.take(10)).saveAsTextFile("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))}
but I get the following error; the StreamingContext is not serializable:
15/01/05 17:12:23 ERROR actor.OneForOneStrategy: org.apache.spark.streaming.StreamingContext
java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext
Can you try this?
sortedCounts.foreachRDD(rdd => rdd.filterWith(ind => ind)((v, ind) => ind <= 10).saveAsTextFile(...))
Note: I didn't test the snippet...
Your first version should work. Just declare the StreamingContext with @transient (@transient val ssc = ...) where it is first created.
The second version won't work because a StreamingContext cannot be serialized in a closure.
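For illustration, a minimal, untested sketch of that suggestion; sparkConf here is an assumed configuration object and the output path is abbreviated:
import org.apache.spark.streaming.{Seconds, StreamingContext}

// @transient keeps the StreamingContext out of the serialized closure state
@transient val ssc = new StreamingContext(sparkConf, Seconds(4))

sortedCounts.foreachRDD { rdd =>
  ssc.sparkContext.parallelize(rdd.take(10)).saveAsTextFile("hdfs://.../user/hduser/output/" + (a + 1))
}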

Insertion order of a list based on order of another list

I have a sorting problem in Scala that I could certainly solve with brute-force, but I'm hopeful there is a more clever/elegant solution available. Suppose I have a list of strings in no particular order:
val keys = List("john", "jill", "ganesh", "wei", "bruce", "123", "Pantera")
Then I receive the values for these keys at random (full disclosure: I'm experiencing this problem in an akka actor, so events are not in order):
def receive:Receive = {
case Value(key, otherStuff) => // key is an element in keys ...
And I want to store these results in a List where the Value objects appear in the same order as their key fields in the keys list. For instance, I may have this list after receiving the first two Value messages:
List(Value("ganesh", stuff1), Value("bruce", stuff2))
ganesh appears before bruce merely because he appears earlier in the keys list. Once the third message is received, I should insert it into this list in the correct location per the ordering established by keys. For instance, on receiving wei I should insert him into the middle:
List(Value("ganesh", stuff1), Value("wei", stuff3), Value("bruce", stuff2))
At any point during this process, my list may be incomplete but in the expected order. Since the keys are redundant with my Value data, I throw them away once the list of values is complete.
Show me what you've got!
I assume you want no worse than O(n log n) performance. So:
val order = keys.zipWithIndex.toMap
var part = collection.immutable.TreeSet.empty[Value](
  math.Ordering.by(v => order(v.key))
)
Then you just add your items.
scala> part = part + Value("ganesh", 0.1)
part: scala.collection.immutable.TreeSet[Value] =
TreeSet(Value(ganesh,0.1))
scala> part = part + Value("bruce", 0.2)
part: scala.collection.immutable.TreeSet[Value] =
TreeSet(Value(ganesh,0.1), Value(bruce,0.2))
scala> part = part + Value("wei", 0.3)
part: scala.collection.immutable.TreeSet[Value] =
TreeSet(Value(ganesh,0.1), Value(wei,0.3), Value(bruce,0.2))
When you're done, you can .toList it. While you're building it, you probably don't want to, since updating a list in random order so that it is in a desired sorted order is an obligatory O(n^2) cost.
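For instance, picking up the REPL session above (result shown for illustration), the finished list comes out already in keys order:
scala> part.toList
res0: List[Value] = List(Value(ganesh,0.1), Value(wei,0.3), Value(bruce,0.2))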
Edit: with your example of seven items, my solution takes about 1/3 the time of Jean-Philippe's. For 25 items, it's 1/10th the time. 1/30th for 200 (which is the difference between 6 ms and 0.2 ms on my machine).
If you can use a ListMap instead of a list of tuples to store values while they're gathered, this could work. ListMap preserves insertion order.
class MyActor(keys: List[String]) extends Actor {
  def initial(values: ListMap[String, Option[Value]]): Receive = {
    case v @ Value(key, otherStuff) =>
      val updated = values.updated(key, Some(v))
      if (updated.forall(_._2.isDefined))
        context.become(valuesReceived(updated.collect { case (_, Some(value)) => value }.toSeq))
      else
        context.become(initial(updated))
  }
  def valuesReceived(values: Seq[Value]): Receive = { case _ => } // whatever you need
  def receive = initial(ListMap(keys.map(k => k -> (None: Option[Value])): _*))
}
(warning: not compiled)
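For context, a hypothetical sketch of wiring this up; the Value case class and the ActorSystem are illustrative, not from the question:
import akka.actor.{Actor, ActorSystem, Props}
import scala.collection.immutable.ListMap

case class Value(key: String, otherStuff: Double) // assumed message shape

val system = ActorSystem("ordering-demo")
val collector = system.actorOf(Props(new MyActor(keys)))
collector ! Value("ganesh", 0.1) // arrival order does not matter
collector ! Value("bruce", 0.2)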

Calling external services in scala code with dependencies

I am facing a major issue with my design at this juncture. My method is trying to accomplish the following:
1. Insert the passed-in object into the database.
2. Get the autoincremented id from the insert and use it to call webservice1 along with the object.
3. Get the result from webservice1 and call webservice2 with the original object and some response from webservice1.
4. Combine the results from webservice1 and webservice2 and write them into the database.
5. Get the resulting autoincremented id from the last insert and call webservice3 with the original object, which eventually results in the success or failure of the operation.
I want to design this in a flexible manner since the requirements are in flux, and I do not want to keep modifying my logic with every change. I do realize some amount of change is inevitable, but I would like to minimize the damage and respect the open-closed principle.
My initial take was as follows:
def complexOperation(someObject: T) =
  dbService.insertIntoDb(someObject) match {
    case Left(e: Exception) => Left(e)
    case Right(id: Int) => webService.callWebService1(id, someObject) match {
      case Left(e: Exception) => Left(e)
      case Right(r: SomeResponse1) => webService.callWebservice2(r, someObject) match {
        case Left(e: Exception) => webService.rollbackService1(); Left(e)
        case Right(context: ResponseContext) => dbService.insertContextIntoDb(context) match {
          case Left(e: Exception) => Left(e)
          case Right(id: Int) => webService.callWebservice3(id, someObject) match {
            case Left(e: Exception) => webService.rollbackService3(); Left(e)
            case Right(r: Response) => Right(r)
          }
        }
      }
    }
  }
As you can see, this is a tangled mess. I can neither unit test it, nor extend it, nor easily debug it if things spiral out of control. This code serves its purpose, but it would be great to get some ideas on how I should refactor it to make the lives of the people who inherit my code a little easier.
Thanks
Have a look at scala.util.Try. It's available in Scala 2.10, which may or may not be available to you as an option, but the idea of it is perfect for your scenario.
What you have in your code example is what I like calling the "pyramid" of nesting. The best solution to this is to use flat-mapping wherever you can. But obviously that's an issue when you have stuff like Either[Exception, Result] at every step. That's where Try comes in. Try[T] is essentially a replacement for Either[Exception, T], and it comes with all of the flatMap-ing goodness that you need.
Assuming you can either change the return type of those webService calls, or provide some implicit conversion from Either[Exception, Result] to Try[Result], your code block would become something more like...
for {
  id       <- dbService.insertIntoDb(someObject)
  r        <- webService.callWebService1(id, someObject)
  context  <- webService.callWebservice2(r, someObject)
  id2      <- dbService.insertContextIntoDb(context)
  response <- webService.callWebservice3(id2, someObject).recoverWith {
                case e: Exception => webService.rollbackService3(); Failure(e)
              }
} yield response
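For instance, a minimal sketch of such an implicit conversion, assuming the services keep returning Either[Exception, T] (the object name here is illustrative):
import scala.util.{Try, Success, Failure}

object EitherConversions {
  // lift an Either[Exception, T] into a Try[T] so the calls above can flatMap
  implicit def eitherToTry[T](e: Either[Exception, T]): Try[T] = e match {
    case Right(value) => Success(value)
    case Left(ex)     => Failure(ex)
  }
}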
Lift has a similar mechanism in net.liftweb.common.Box. It's like Option, but with a container for Exceptions too.
Edit: It looks like you can use the left or right method of an Either, and it will let you use flatMap-ing almost exactly the way I described with Try. The only difference is that the end result is an Either[Exception, Result] instead of a Try[Result]. Check out LeftProjection for details/examples.
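To illustrate the projection idea, a small self-contained sketch (parse is a made-up helper):
def parse(s: String): Either[Exception, Int] =
  try Right(s.toInt) catch { case e: Exception => Left(e) }

val sum = for {
  a <- parse("1").right
  b <- parse("2").right
} yield a + b
// sum == Right(3); a Left at any step would short-circuit, much like Failure with Try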
You can use a for-comprehension to reduce the noise in the code.
@Dylan had the right idea above. Let me see if I can help translate what you want to do into idiomatic Scala 2.9.1 code.
This version doesn't attempt any rollbacks:
// 1: No rollbacks, just returns the first exception in Left
def complexOperation1(someObject: T): Either[Exception, Response] = {
  for {
    id       <- dbService.insertIntoDb(someObject).right
    r        <- webService.callWebService1(id, someObject).right
    context  <- webService.callWebservice2(r, someObject).right
    id2      <- dbService.insertContextIntoDb(context).right
    response <- webService.callWebservice3(id2, someObject).right
  } yield response
}
Now, let's try to do the rollbacks exactly as you had them above:
// 2: Rolls back all web services and returns first exception in Left
def complexOperation2(someObject: T): Either[Exception, Response] = {
  for {
    id      <- dbService.insertIntoDb(someObject).right
    r       <- webService.callWebService1(id, someObject).right
    context <- webService.callWebservice2(r, someObject).left.map { e =>
                 webService.rollbackService1()
                 e
               }.right
    id2      <- dbService.insertContextIntoDb(context).right
    response <- webService.callWebservice3(id2, someObject).left.map { e =>
                  webService.rollbackService3()
                  e
                }.right
  } yield response
}
If you define a function which does the effect (the rollback) on the left, it gets a little cleaner and easier to test, for example:
// 3: Factor out the side-effect of doing the rollbacks on Left
def rollbackIfLeft[T](f: => Either[Exception, T], r: => Unit): Either[Exception, T] = {
  val result = f
  result.left.foreach(_ => r) // do the rollback if any exception occurred
  result
}
def complexOperation3(someObject: T): Either[Exception, Response] = {
  for {
    id      <- dbService.insertIntoDb(someObject).right
    r       <- webService.callWebService1(id, someObject).right
    context <- rollbackIfLeft(webService.callWebservice2(r, someObject),
                 webService.rollbackService1()).right
    id2      <- dbService.insertContextIntoDb(context).right
    response <- rollbackIfLeft(webService.callWebservice3(id2, someObject),
                  webService.rollbackService3()).right
  } yield response
}
You can try out rollbackIfLeft in the Scala REPL to get a sense of it:
scala> rollbackIfLeft(Right(42), println("hey"))
res28: Either[Exception,Int] = Right(42)
scala> rollbackIfLeft(Left(new RuntimeException), println("ERROR!"))
ERROR!
res29: Either[Exception,Nothing] = Left(java.lang.RuntimeException)
Hope this helps!