How to handle probability notation in python? - python-2.7

I'm new to Python 2.7. I would like to write a function that identifies the variables in a given probability expression.
For example, given the probability P(A,B,C|D,E,F) as a string input, the function should return a list of events ['A','B','C'] and a list of conditioning variables ['D','E','F']. If it is impossible to return two lists at the same time, returning a list of two lists would be fine.
In summary:
Input:
somefunction('P(A,B,C|D,E,F)')
Expected output: [['A','B','C'],['D','E','F']]
Thank you in advance

Here is a simple brute-force implementation. As @fjarri pointed out, if you want to do anything more complex you might need a parser library (such as pyparsing) or at least some regular expressions.
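def somefunction(s):
    # Strip whitespace first so the slice indices refer to the same string.
    s = s.strip()
    # Keep only the text between the opening and the final parenthesis.
    inner = s[s.index("(") + 1:-1]
    left, right = inner.split("|")
    return [left.split(","), right.split(",")]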
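As a rough sketch of the regular-expression route mentioned above: the pattern below assumes the input always has the form P(events|conditions) with comma-separated names, and the function name parse_probability is made up for illustration.

```python
import re

# Assumed input shape: P(<events>|<conditions>), names separated by commas.
_PATTERN = re.compile(r"^\s*P\(([^|()]*)\|([^|()]*)\)\s*$")

def parse_probability(text):
    """Return [events, conditions] parsed from a string like 'P(A,B|C)'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("not in P(...|...) form: %r" % text)
    # Split each side on commas and drop surrounding whitespace.
    return [[name.strip() for name in group.split(",")]
            for group in match.groups()]

print(parse_probability("P(A,B,C|D,E,F)"))
```

Unlike the slicing version, this raises a ValueError on input that does not match the assumed shape instead of silently returning something odd.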

Related

Why does model.wv.similarity() in Word2Vec give different results from model.wv.most_similar()?

I have trained a Word2Vec model and I am trying to use it.
When I query the most similar words to '动力', I get output like this:
动力系统 0.6429724097251892
驱动力 0.5936785936355591
动能 0.5788494348526001
动力车 0.5579575300216675
引擎 0.5339343547821045
推动力 0.5152761936187744
扭力 0.501279354095459
新动力 0.5010953545570374
支撑力 0.48610919713974
精神力量 0.47970670461654663
But the problem is that when I call model.wv.similarity('动力','动力系统'), I get 0.0, which does not match
0.6429724097251892
What confused me more is that the similarity between '动力' and '驱动力' came out as
3.689349e+19
Why is that? Have I misunderstood what similarity means?
The code is:
res = model.wv.most_similar('动力')
for r in res:
    print(r[0],r[1])
print(model.wv.similarity('动力','动力系统'))
print(model.wv.similarity('动力','驱动力'))
print(model.wv.similarity('动力','动能'))
output:
动力系统 0.6429724097251892
驱动力 0.5936785936355591
动能 0.5788494348526001
动力车 0.5579575300216675
引擎 0.5339343547821045
推动力 0.5152761936187744
扭力 0.501279354095459
新动力 0.5010953545570374
支撑力 0.48610919713974
精神力量 0.47970670461654663
0.0
3.689349e+19
2.0
I have written a function to replace the model.wv.similarity method.
def Similarity(w1, w2, model):
    # Look up both word vectors, then compute their cosine similarity.
    A = model.wv[w1]
    B = model.wv[w2]
    return sum(A*B) / (pow(sum(pow(A,2)), 0.5) * pow(sum(pow(B,2)), 0.5))
Here w1 and w2 are the words you input, and model is the Word2Vec model you have trained.
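The cosine formula used in that function can be tried out on plain Python lists, without a trained model. This is only a sketch: the name cosine_similarity and the toy vectors are illustrative and are not taken from gensim.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for word vectors (not real model output).
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal
```

Up to floating-point rounding, a cosine similarity always lies between -1 and 1, so values such as 3.689349e+19 or 2.0 cannot be valid results and point to a problem elsewhere.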
Using the similarity method directly from the model is deprecated. It contains some extra logic that normalizes the vectors before evaluating the result.
You should use wv directly: as the documentation states, it does not matter how the word vectors were trained, so they should be treated as an independent structure. The model is just the means of obtaining them.
Here is a short discussion that should give you a starting point if you want to investigate further.
It may be an encoding issue, where you are not actually comparing the same tokens.
Try the following, to see if it gives results closer to what you expect.
res = model.wv.most_similar('动力')
for r in res:
    print(r[0],r[1])
print(model.wv.similarity('动力', res[0][0]))
print(model.wv.similarity('动力', res[1][0]))
print(model.wv.similarity('动力', res[2][0]))
If it does, you could look further into why the model might be reporting strings which print as 动力系统 (etc), but don't match your typed-in-code string literals like '动力系统' (etc). For example:
print(res[0][0]=='动力系统')
print(type(res[0][0]))
print(type('动力系统'))
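To illustrate how two strings can print identically and still compare unequal, here is a small Python 3 sketch based on Unicode normalization. It uses a Latin example and the helper name same_after_nfc is made up; whether anything like this is actually happening with the tokens in the question is not known.

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# The two strings render the same but differ as code point sequences.
print(composed == decomposed)

def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_after_nfc(composed, decomposed))
```

If the vocabulary keys and the typed literals differ in a way like this, comparing their repr() or their encoded bytes usually makes the difference visible.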

Printing Values from a list without spaces in python 2.7

Suppose I have a list like
list1 = ['A','B','1','2']
When I print it out, I want the output to be
AB12
And not
A B 1 2
So far I have tried
(a) print list1,
(b) for i in list1:
        print i,
(c) for i in list1:
        print "%s", %i
But none seem to work.
Can anyone suggest an alternative method?
Thank you.
From your comments on @jftuga's answer, I guess that the input you posted is not the one you're testing with: your list has mixed contents.
My answer will fix it for you:
lst = ['A','B',1,2]
print("".join([str(x) for x in lst]))
or
print("".join(map(str,lst)))
I'm not joining the items directly, since not all of them are strings; I convert each one to a string first. The first form uses a list comprehension rather than a generator expression: in CPython, str.join needs the whole sequence up front anyway, so a generator would not save memory here.
This also works for lists that contain only strings, of course (converting a value that is already a str adds essentially no overhead, even though I believed otherwise in the first version of this answer: Should I avoid converting to a string if a value is already a string?)
Try this:
a = "".join(list1)
print(a)
This will give you: AB12
Also, since list is a built-in Python class, do not use it as a variable name.
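The two answers can be combined into one small helper. This is a sketch in Python 3 syntax; join_items is a made-up name, and the last line relies on print being a function (on Python 2.7 that requires from __future__ import print_function at the top of the file).

```python
def join_items(items):
    """Concatenate the items with no separator, converting each to str."""
    return "".join(str(item) for item in items)

mixed = ['A', 'B', 1, 2]   # example data with both strings and integers
print(join_items(mixed))

# Alternative: let print() do the conversion and use an empty separator.
print(*mixed, sep="")
```

Both calls print AB12 for this list.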

Applying regexp and finding the highest number in a list

I have got a list of different names. I have a script that prints out the names from the list.
import json
import urllib2
# Fetch the JSON list of names from the API (Python 2.7).
req = urllib2.Request('http://some.api.com/')
req.add_header('AUTHORIZATION', 'Token token=hash')
response = urllib2.urlopen(req).read()
json_content = json.loads(response)
for name in json_content:
    print name['name']
Output:
Thomas001
Thomas002
Alice001
Ben001
Thomas120
I need to find the largest number that follows the name Thomas. Is there a simple way to apply a regexp to all the elements that contain "Thomas" and then apply max(list) to them? The only way I have come up with is to go through each element in the list, match a regexp for Thomas, strip the letters, and put the remaining numbers into a new list, but this seems pretty bulky.
You don't need regular expressions, and you don't need sorting. As you said, max() is fine. To be safe in case the list contains names like "Thomasson123", you can use:
names = ((x['name'][:6], x['name'][6:]) for x in json_content)
max(int(b) for a, b in names if a == 'Thomas' and b.isdigit())
The first assignment creates a generator expression, so there will be only one pass over the sequence to find the maximum.
You don't need a regex. Just store the names in a list and apply the sorted function to it.
>>> l = ['Thomas001',
...      'Thomas002',
...      'Alice001',
...      'Ben001',
...      'Thomas120']
>>> [i for i in sorted(l) if i.startswith('Thomas')][-1]
'Thomas120'
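For comparison, here is what the regular-expression approach from the question could look like. It is a sketch only: max_suffix is a made-up name, and the extra entries 'Thomasson123' and 'Thomas9' are added to the sample data to show the edge cases.

```python
import re

def max_suffix(names, prefix):
    """Largest integer suffix among names of the form <prefix><digits>."""
    pattern = re.compile(r"^" + re.escape(prefix) + r"(\d+)$")
    numbers = [int(m.group(1)) for m in map(pattern.match, names) if m]
    return max(numbers) if numbers else None

names = ["Thomas001", "Thomas002", "Alice001", "Ben001",
         "Thomas120", "Thomasson123", "Thomas9"]
print(max_suffix(names, "Thomas"))
```

This prints 120. With the same sample data, sorting the strings and taking the last entry that starts with 'Thomas' would return 'Thomasson123', because the comparison is lexicographic and does not check that only digits follow the name.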

Removing the floating part from Ids list

I have a list in the format
"4186.0,7573.0,4300.0,9479.0,9488.0,10642.0,7987.0,9480.0 "
Is there a function available in ColdFusion that removes every ".0" from the numbers in one go?
Thank you.
You could use map() from the UnderscoreCF library to gracefully solve this problem (in CF 10 or Railo 4).
_ = new Underscore();
listOfNums = "4186.0,7573.0,4300.0,9479.0,9488.0,10642.0,7987.0,9480.0 ";
arrayOfNums = _.map(listOfNums, function(num){
    return round(num);
});
result = arrayToList(arrayOfNums);
map() produces a new array of values by mapping each value in the collection through a transformation function. This allows you to have more control over the results.
Note: I wrote UnderscoreCF.
There isn't a simple function to do this, but there are a number of things you can do.
You can loop through the list and numberFormat() each item, placing it back in the list or creating a new list. This is inefficient, both in processing and in programming.
Because your list is just a string, you can replace the decimal part of your numbers with a simple string replace: replace("123.0,456.0", ".0", "", "ALL"). If your list ever grows different decimal digits other than ".0", you can upgrade that replace function to a regular expression to catch patterns of numbers.
I usually use INT to drop the decimal of a number like barnyr suggested but if you wanted to treat it as a single string and not a list you could use reReplace (to elaborate on Nathan Strutz's idea) and do something like:
<cfset listOfNums = "4186.0,7573.540,4300.434,9479.,9488.0,10642.0,7987.0,9480.0">
<cfset listOfNums = reReplace(listOfNums, "\.[0-9]*", "", "all")>
Result is: 4186,7573,4300,9479,9488,10642,7987,9480
It also removes the decimal point even if no digits follow it.
It sounds like the Int() function (the equivalent of floor() in most other languages) may be what you want: http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7f89.html
You'll still need to iterate over the list, applying Int() to each item.

String formatting issue when using a function

I have what I believe to be an embarrassingly simple problem, but three hours of googling and checking stackoverflow have not helped.
Let's say I have a very simple piece of code:
def secret_formula(started):
    jelly_beans = started*500
    jars = jelly_beans/1000
    crates = jars/100
    return jelly_beans,jars,crates
start_point = 10000
print("We'd have {} beans, {} jars, and {} crates.".format(secret_formula(start_point)))
What happens is I get the "IndexError: tuple index out of range". So I just print the secret_formula function to see what it looks like, and it looks like this:
(5000000, 5000.0, 50.0)
Basically, it is treating the output as one 'thing' (I am still very new, sorry if my language is not correct). My question is, why does it treat it like this and how do I make it pass the three outputs (jelly_beans, jars, and crates) so that it formats the string properly?
Thanks!
The string's format method takes a variable number of arguments, but secret_formula returns a single tuple, so format receives one argument for three placeholders. You want to expand that tuple into separate arguments. This is done using the following syntax:
print("We'd have {} beans, {} jars, and {} crates.".format(*secret_formula(start_point)))
The important part is the * character. It tells Python to unpack the iterable that follows into separate positional arguments for the function call.
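A short Python 3 sketch of two equivalent ways to pass the three values; the variable names template, line_star, and line_named are only illustrative.

```python
def secret_formula(started):
    jelly_beans = started * 500
    jars = jelly_beans / 1000
    crates = jars / 100
    return jelly_beans, jars, crates

template = "We'd have {} beans, {} jars, and {} crates."

# Option 1: unpack the returned tuple directly into format().
line_star = template.format(*secret_formula(10000))

# Option 2: unpack into named variables first, then pass them one by one.
beans, jars, crates = secret_formula(10000)
line_named = template.format(beans, jars, crates)

print(line_star)
```

Both options build the same string. The second is a little longer but can be easier to read when the values are used again later.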