Conditional quit multiprocess in python - python-2.7

I'm trying to build a python script that runs several processes in parallel. Basically, the processes are independent, work on different folders and leave their output as text files in those folders. But in some special cases, a process might terminate with a special (boolean) status. If so, I want all the other processes to terminate right away. What is the best way to do this?
I've fiddled with multiprocessing.Condition() and multiprocessing.Manager after reading the excellent tutorial by Doug Hellmann:
http://pymotw.com/2/multiprocessing/communication.html
However, I do not seem to understand how to get a multiprocessing process to monitor a status indicator and quit if it takes a special value.
To exemplify this, I've written the small script below. It somewhat does what I want, but ends in an exception. Suggestions for more elegant ways to proceed are gratefully welcomed:
br,
Gro
import multiprocessing
def input(i):
    """arbitrary chosen value(8) gives special status = 1"""
    if i == 8:
        value = 1
    else:
        value = 0
    return value
def sum_list(list):
    """accumulative sum of list"""
    sum = 0
    for x in range(len(list)):
        sum = sum + list[x]
    return sum
def worker(list,i):
    value = input(i)
    list[i] = value
    print 'processname',i
if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    list = mgr.list([0]*20)
    jobs = [ multiprocessing.Process(target=worker, args=(list,i))
             for i in range(20)
             ]
    for j in jobs:
        j.start()
        sumlist = sum_list(list)
        print sumlist
        if sumlist == 1:
            break
    for j in jobs:
        j.join()
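A minimal sketch of one common pattern (not from the original post): a shared multiprocessing.Event serves as the status flag. Each worker checks it between chunks of work and returns early once it is set, and the worker that hits the special status sets it. The names worker, run, special and winner are illustrative, and the real work is simulated with time.sleep. Separately, the exception at the end of the posted script most likely comes from calling join() on processes that were never started because the break left the start loop early; in this sketch every process is started before any is joined.

```python
import multiprocessing
import time


def worker(i, special, stop_event, winner):
    """Simulated job for folder i; polls the shared stop flag between steps."""
    for step in range(50):
        if stop_event.is_set():
            return  # another worker reported the special status: quit early
        time.sleep(0.01)  # stand-in for one chunk of real work
        if i == special and step == 5:
            winner.value = i  # record which worker raised the flag
            stop_event.set()  # ask every other worker to stop
            return


def run(n=20, special=8):
    """Start n workers; return the index that triggered the stop, or -1."""
    stop_event = multiprocessing.Event()
    winner = multiprocessing.Value('i', -1)  # -1 means "nobody triggered"
    jobs = [multiprocessing.Process(target=worker,
                                    args=(i, special, stop_event, winner))
            for i in range(n)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()  # safe: every process in jobs was started above
    return winner.value


if __name__ == '__main__':
    print(run())
```

This is cooperative: a worker only notices the flag at its next check. If the work cannot be split into chunks, the main process can instead call Process.terminate() on the remaining workers, but the multiprocessing documentation warns that terminating a process that is using a pipe or queue can corrupt it.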

Related

Increase recursion limit and stack size in python 2.7

I'm working with large trees and need to increase the recursion limit on Python 2.7.
Using sys.setrecursionlimit(10000) crashes my kernel, so I figured I needed to increase the stack size.
However I don't know how large the stack size should be. I tried 100 MiB like this threading.stack_size(104857600), but the kernel still dies. Giving it 1 GiB throws an error.
I haven't worked with the threading module yet, so am I using it wrong by just putting the above statement at the beginning of my script? I'm not doing any kind of parallel processing; everything is done in the same thread.
My computer has 128 GB of physical RAM and runs Windows 10; I use the IPython console in Spyder.
The error displayed is simply:
Kernel died, restarting
Nothing more.
EDIT:
Full code to reproduce the problem. Building the tree works well, though it takes quite long; the kernel only dies during the recursive execution of treeToDict(), which reads the whole tree into a dictionary. Maybe there is something wrong with the code of that function. The tree is a non-binary tree:
import pandas as pd
import threading
import sys
import random as rd
import itertools as it
import string
threading.stack_size(104857600)
sys.setrecursionlimit(10000)
class treenode:
    # class to build the tree
    def __init__(self,children,name='',weight=0,parent=None,depth=0):
        self.name = name
        self.weight = weight
        self.children = children
        self.parent = parent
        self.depth = depth
        self.parentname = parent.name if parent is not None else ''
def add_child(node,name):
    # add element to the tree
    # if it already exists at the given node increase weight
    # else add a new child
    for i in range(len(node.children)):
        if node.children[i].name == name:
            node.children[i].weight += 1
            newTree = node.children[i]
            break
    else:
        newTree = treenode([],name=name,weight=1,parent=node,depth=node.depth+1)
        node.children.append(newTree)
    return newTree
def treeToDict(t,data):
    # read the tree into a dictionary
    if t.children != []:
        for i in range(len(t.children)):
            data[str(t.depth)+'_'+t.name] = [t.name, t.children[i].name, t.depth, t.weight, t.parentname]
    else:
        data[str(t.depth)+'_'+t.name] = [t.name, '', t.depth, t.weight, t.parentname]
    for i in range(len(t.children)):
        treeToDict(t.children[i],data)
# Create random dataset that leads to very long tree branches:
# A is an index for each set of data B which becomes one branch
rd.seed(23)
testSet = [''.join(l) for l in it.combinations(string.ascii_uppercase[:20],2)]
A = []
B = []
for i in range(10):
    for j in range(rd.randint(10,6000)):
        A.append(i)
        B.append(rd.choice(testSet))
dd = {"A":A,"B":B}
data = pd.DataFrame(dd)
# The maximum length should be above 5500, use another seed if it's not:
print data.groupby('A').count().max()
# Create the tree
root = treenode([],name='0')
for i in range(len(data.values)):
    if i == 0:
        newTree = add_child(root,data.values[i,1])
        oldses = data.values[i,0]
    else:
        if data.values[i,0] == oldses:
            newTree = add_child(newTree,data.values[i,1])
        else:
            newTree = add_child(root,data.values[i,1])
            oldses = data.values[i,0]
result={}
treeToDict(root,result)
PS: I'm aware that treeToDict() is faulty in that it overwrites entries, because there can be duplicate keys. For this problem, however, that bug is unimportant.
In my experience, your problem is not the stack size but the algorithm itself.
Tree traversal can be implemented without recursion at all: implement a stack-based depth-first search (or a queue-based breadth-first search) instead.
Python-like pseudo-code might look like this:
def traverse_tree(root):
    stack = [root]
    while stack:
        cur = stack.pop()
        cur.do_some_awesome_stuff()
        # extend, not append: push each child, not the list of children
        stack.extend(cur.get_children())
This approach scales to trees of any depth, because the pending nodes are kept in an ordinary list on the heap instead of on the call stack.
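Applied to the question's tree, the same idea gives a non-recursive replacement for treeToDict(). This is a sketch, not the answerer's code: treeToDict_iterative is a hypothetical name, and treenode is a copy of the question's class so the block runs on its own. Children are pushed in reverse so nodes are visited in the same pre-order as the recursive version (so the same duplicate key wins), and, like the original loop, each entry keeps only the last child's name.

```python
class treenode(object):
    # copy of the question's node class
    def __init__(self, children, name='', weight=0, parent=None, depth=0):
        self.name = name
        self.weight = weight
        self.children = children
        self.parent = parent
        self.depth = depth
        self.parentname = parent.name if parent is not None else ''


def treeToDict_iterative(root):
    """Read the tree into a dict using an explicit stack instead of recursion."""
    data = {}
    stack = [root]
    while stack:
        t = stack.pop()
        # the original loop rewrote this key once per child, so only the last child survived
        last_child = t.children[-1].name if t.children else ''
        data[str(t.depth) + '_' + t.name] = [t.name, last_child, t.depth,
                                             t.weight, t.parentname]
        # reversed() so the first child is popped next (same order as the recursion)
        stack.extend(reversed(t.children))
    return data


# Usage: one branch far deeper than the default recursion limit
root = treenode([], name='0')
node = root
for i in range(50000):
    child = treenode([], name='n%d' % i, weight=1, parent=node, depth=node.depth + 1)
    node.children.append(child)
    node = child
result = treeToDict_iterative(root)
```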
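On the stack-size part of the question: threading.stack_size() only sets the stack size for threads created after the call, so calling it at the top of a script does nothing for code that keeps running in the main thread. If you want to keep the recursion, a common workaround is to run it inside a new thread. A sketch, with run_with_big_stack and depth as illustrative names; the 64 MiB stack and the recursion limit are arbitrary values that may need tuning, and very large sizes can be rejected by the platform (as the 1 GiB attempt in the question was).

```python
import sys
import threading


def depth(n):
    """Deliberately deep recursion: returns n after n nested calls."""
    return 0 if n == 0 else 1 + depth(n - 1)


def run_with_big_stack(func, args=(), stack_mb=64, recursion_limit=100000):
    """Call func(*args) in a fresh thread whose stack is stack_mb MiB."""
    result = {}

    def target():
        result['value'] = func(*args)  # stays unset if func raises

    sys.setrecursionlimit(recursion_limit)  # process-wide, not per thread
    old_size = threading.stack_size(stack_mb * 1024 * 1024)  # affects new threads only
    try:
        t = threading.Thread(target=target)
        t.start()
        t.join()
    finally:
        threading.stack_size(old_size)  # restore the previous setting (0 = default)
    return result.get('value')


if __name__ == '__main__':
    print(run_with_big_stack(depth, (20000,)))
```

For the question's code this would be called as run_with_big_stack(treeToDict, (root, result)); treeToDict fills result in place, so the return value is simply None.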

Python 2.7 - use waitKeys to capture user input (string) of a specified length

Please note: I am a novice trying to learn.
I've searched for ages but haven't found an answer to my problem.
Basically, I'm displaying a number of alphabetical characters on a screen. The number of characters increases in steps (5, 7, 9).
What I need is for the loop to pause and wait for the user to type the characters they've just seen. So far, the code I have only lets the user enter ONE character (or keypress), and I can't figure out how to make it keep waiting until the user has entered a specified number of characters. My code is below:
letter5.draw()
win.flip()
respClock.reset()
core.wait(info['letterTime'])
win.flip()
#wait for response
respList = waitKeys(maxWait = float('inf'), keyList = letters)
keys = respList[0]
I think a while loop may work here, but I've not managed to come up with a piece of code that works properly.
Thanks for any help!
Figured it out on my own and thought I'd share:
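resp = ''
done = False
# keyList (alpha) must include 'backspace' and 'return' for those branches to fire
while len(resp) < 4 and not done:
    respList = waitKeys(maxWait = float('inf'), keyList = alpha)
    key = respList[0]
    if len(key) == 1:
        resp += key
    elif key == 'space':
        resp += ''  # spaces are ignored
    elif key == 'backspace' and len(resp) > 0:
        resp = resp[0:-1]
    if key == 'return':
        done = True  # Enter ends the response early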
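The same loop can be pulled out of PsychoPy so it is easy to test. In this sketch, collect_response and next_key are hypothetical names: next_key is any zero-argument callable that returns one key name per call, for example lambda: event.waitKeys(keyList=alpha)[0], assuming alpha also contains 'backspace' and 'return'. Enter ends the response early, and any other non-letter key is ignored.

```python
def collect_response(next_key, n_chars):
    """Build a typed response one keypress at a time, up to n_chars letters."""
    resp = ''
    while len(resp) < n_chars:
        key = next_key()
        if key == 'return':
            break  # let the participant finish early
        elif key == 'backspace':
            resp = resp[:-1]  # slicing an empty string is harmless
        elif len(key) == 1:
            resp += key  # an ordinary letter
        # anything else ('space', 'lshift', ...) is ignored
    return resp


if __name__ == '__main__':
    keys = iter(['a', 'b', 'backspace', 'c', 'd', 'e'])
    print(collect_response(lambda: next(keys), 4))  # acde
```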

Mt4 Probability Script

I'm fairly new to Python.
I have made a simple script that imports price feeds from MT4.
My idea is to turn this into some sort of probability indicator that shows, next to the bid and ask, the probability of the price moving in each direction,
for example:
SYMBOL   TIME   BID            ASK
USD/CAD  22:19  1.30451 60%^   1.30D39 40%v
The probability is recalculated over a specific period, for example every hour, so each hour it gives a new probability for the direction.
It looks for two patterns, A and B:
Pattern A represents a bullish pattern.
Pattern B represents a bearish pattern.
Basically, it estimates how likely A or B is to recur,
and which of the two has the higher chance of recurring.
Here is where I am stuck:
I have no idea how to put that together...
Here is what I have so far:
import datetime
import numpy as np
import pandas as pd
import sklearn
from pandas.io.data import DataReader
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.lda import LDA
from sklearn.metrics import confusion_matrix
from sklearn.qda import QDA
from sklearn.svm import LinearSVC, SVC
import dde_client as ddec
import time
QUOTE_client = ddec.DDEClient('MT4', 'QUOTE')
symbols = ['USDCAD', 'GOLD','EURUSD', 'SILVER', 'US30Cash', ]
for i in symbols:
    QUOTE_client.advise(i)
def Get_quote():
while 1:
time.sleep(1)
print "Symbol\tDATE\t\tTIME\tBID\tASK"
for i in symbols:
current_quote = QUOTE_client.request(i).split(" ")
day, _time, bid, ask = (current_quote[0], current_quote[1],
current_quote[2], current_quote[3])
print i + ":\t" + day + "\t" + _time +"\t" +bid+ "\t" + ask
break
time.sleep(1)
return Get_quote()
continue
def create_lagged_series(cuurent_quote,start_time, end_time, lags=1):
    ts = DataReader(cuurent_quote,symbols,
                    start_time-datetime.timedelta(hours=1),
                    end_time
                    )
    tslag = pd.DataFrame(index=ts.index)
    ts['Today'] = ts['Adj Close']
    tslag["Volume"] = ts["Volume"]
    for i in xrange(0, lags):
        tslag["Lag%s" % str(i+1)] = ts["Adj Close"].shift(i+1)
    tsret = pd.DataFrame(index=tslag.index)
    tsret["Volume"] = tslag["Volume"]
    tsret["Today"] = tslag["Today"].pct_change()*100.0
    for i in xrange(0, lags):
        if (abs(x) < 0.0001):
            tsret["Today"][i] = 0.0001
    for i in xrange(0,lags):
        tsret["Lag%s" % str(i+1)] = \
            tslag["Lag%s" % str(i+1)].pct_change()*100.0
    tsret["Direction"] = np.sign(tsret["Today"])
    tsret = tsret[tsret.index >= start_time]
    return tsret
if __name__ == "__main__":
    snpert = create_lagged_series(len('GOLD', Get_quote(), 1))
    X = snpret[["Lag1","Lag2"]]
    y = snpret["Direction"]
    start_test = cuurernt_quote
    X_train = X[X.index < start_test]
    X_test = X[X.index >= start_test]
    y_train = y[y.index < start_test]
    y_test = y[y.index >= start_test]
    print "Hit Rates/Confusion Matrices:\n"
    models = [ ( "LR", LogisticRegression() ),
               ( "LDA", LDA() ),
               ( "QDA", QDA() ),
               ( "LSVC", LinearSVC() ),
               ( "RSVM", SVC( C = 1000000.0,
                              cache_size = 200,
                              class_weight = None,
                              coef0 = 0.0,
                              degree = 3,
                              gamma = 0.0001,
                              kernel = 'rbf',
                              max_iter = -1,
                              probability = False,
                              random_state = None,
                              shrinking = True,
                              tol = 0.001,
                              verbose = False
                              )
                 ),
               ( "RF", RandomForestClassifier( n_estimators = 1000,
                                               criterion = 'gini',
                                               max_depth = None,
                                               min_samples_split = 2,
                                               min_samples_leaf = 1,
                                               max_features = 'auto',
                                               bootstrap = True,
                                               oob_score = False,
                                               n_jobs = 1,
                                               random_state = None,
                                               verbose = 0
                                               )
                 )
               ]
    # Iterate through the models
    for m in models:
        # Train each of the models on the training set
        m[1].fit(X_train, y_train)
        pred = m[1].predict(X_test)
        print "%s:\n%0.3f" % (m[0], m[1].score(X_test, y_test))
        print "%s\n" % confusion_matrix(pred, y_test)
Here is just my MT4 price feed script on its own:
import dde_client as ddec
import time
__author__ = 'forex Ticker'
print __author__
QUOTE_client = ddec.DDEClient('MT4', 'QUOTE')
symbols = ['USDCAD', 'GOLD','EURUSD', 'SILVER', 'US30Cash', ]
for i in symbols:
QUOTE_client.advise(i)
while 1:
time.sleep(1)
print "Symbol\tDATE\t\tTIME\tBID\tASK"
for i in symbols:
current_quote = QUOTE_client.request(i).split(" ")
day, _time, bid, ask = (current_quote[0], current_quote[1],
current_quote[2], current_quote[3])
print i + ":\t" + day + "\t" + _time +"\t" +bid+ "\t" + ask
break
time.sleep(1)
continue
As to fixing your algorithm, I suggest that rather than building it from scratch and hacking various libraries together, you start from a working example and modify it to your liking. You don't have to understand it completely, but you need something to start with.
I would even abstract away the MT4 and quote-reading logic and just put some test numbers (fake or sampled) in a CSV or TXT file. Start from a working example that can recognize A and B patterns in this file. Once you have that, define how your own algorithm differs and adjust it.
When that works, the final step is the MT4 integration. It looks like the DDE server is only meant for data export, not for building an indicator. Consider an alternative framework that links MT4 to Python: not only can you build an on-chart indicator with it, you can also trade automatically using your algorithm.
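As a starting point for that offline step, here is a sketch that reads bid prices from a CSV and estimates, for every pattern of recent moves, how often the next move was up. Everything here is an assumption for illustration: the CSV layout (a header row with a bid column), the names load_prices and direction_probabilities, and especially the definition of a "pattern" as the sequence of up/down signs of the last few price changes, which stands in for the asker's patterns A and B. It is a plain frequency count, not a tested trading model.

```python
import csv
import io


def load_prices(f):
    """Read a CSV file object with a header row and a 'bid' column."""
    return [float(row['bid']) for row in csv.DictReader(f)]


def direction_probabilities(prices, window=3):
    """Empirical P(next move is up) for each pattern of `window` moves.

    A pattern is the tuple of signs of consecutive price changes; an
    unchanged price is counted as '-' (a simplification).
    """
    moves = ['+' if b > a else '-' for a, b in zip(prices, prices[1:])]
    counts = {}  # pattern -> (times followed by an up move, times seen)
    for i in range(len(moves) - window):
        pattern = tuple(moves[i:i + window])
        up, total = counts.get(pattern, (0, 0))
        counts[pattern] = (up + int(moves[i + window] == '+'), total + 1)
    return dict((p, float(up) / total) for p, (up, total) in counts.items())


if __name__ == '__main__':
    sample = io.StringIO(u"time,bid\n22:00,1.3040\n22:01,1.3045\n22:02,1.3043\n"
                         u"22:03,1.3047\n22:04,1.3049\n22:05,1.3046\n")
    probs = direction_probabilities(load_prices(sample), window=2)
    for pattern, p in sorted(probs.items()):
        print("%s -> P(up next) = %.2f" % (''.join(pattern), p))
```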
Q: How to put that together...? A: Have a realistic plan, ideally before one puts money on the table. That can save you from even starting something nonsensical or from aiming at unrealistic targets.
No one would be harmed if the plan were the first working document, elaborated and agreed on by all parties involved, on HOW the "great and cool new disruptive vision" WILL BE CREATED.
Organise your further work in steps, and always add budget controls, be it in [man*weeks] or [k$], for what one is willing to spend on each item.
One ought to be able to decide on the feasibility and survivability of the initially great & cool idea.
Plan carefully within the principal phases, on the MQL4/5 side as well as for Python and the other components:
X [man*weeks] on [System Integration Architecture],
Y [man*weeks] on [Integration Model Design],
Z [man*weeks] on [Integration Model Prototype],
U [man*weeks] on [Integration Model Testing],
V [man*weeks] on [Integration Model Release],
W [man*weeks] on [Integration Model Production Ecosystem]
S [man*weeks] on [Design Cycles on finding best Predictions Model]
T [man*weeks] on [Design Cycles on finding good Trading Strategies for Predictions]
Items not to forget in the early architecture decisions:
0) Forget about using MQL4/5 examples; you would put yourself at risk in a sub-millisecond-domain battle with hundreds of millions of USD in fight and motion.
1) Forget about using a Custom Indicator in the MQL4/5 MetaTrader Terminal (it is blocking).
2) Forget about using DDE integration; some operating systems do not support it at all.
3) Forget about using pandas (even for AI/ML-model prototyping), as nanoseconds matter a lot in the ML process. pandas is a great toy, but it does not deliver the performance that real-world trading needs for ML-model tuning.
4) Forget about start-end logic; the AI/ML engines have to be separate in order to efficiently train/validate/test for their best generalisation abilities in vast hyperparameter state spaces. A loop like for m in models: can live in the source code, but not in reality. One instrument may take (and does take) a few tens of [CPU-core*days] of runtime for parameter optimisation on COTS hardware, so count with realistic numbers here for proper budgeting of each of the [S]+[T] cycles.
Epilogue:
Anyway, it is a smart programme, if approved as financially feasible. You may like other posts on low-latency MT4 AI/ML integration for algorithmic trading.

Multithreading in Python with the threading and queue modules

I have a file with hundreds of thousands of lines, each of which needs to undergo the same process (calculating a covariance). I was going to use multithreading because it takes quite long as is. All the examples and tutorials I have seen, however, have been fairly complicated for what I want to do. If anyone could point me to a good tutorial that explains how to use the two modules together, that would be great.
Whenever I have to process something in parallel, I use something similar to this (I just ripped this out of an existing script):
#!/usr/bin/env python2
# This Python file uses the following encoding: utf-8
import os, sys, time
from multiprocessing import Queue, Manager, Process, Value, Event, cpu_count
class ThreadedProcessor(object):
    def __init__(self, parser, input_file, output_file, threads=cpu_count()):
        self.parser = parser
        self.num_processes = threads
        self.input_file = input_file
        self.output_file = output_file
        self.shared_proxy = Manager()
        self.input_queue = Queue()
        self.output_queue = Queue()
        self.input_process = Process(target=self.parse_input)
        self.output_process = Process(target=self.write_output)
        self.processes = [Process(target=self.process_row) for i in range(self.num_processes)]
        self.input_process.start()
        self.output_process.start()
        for process in self.processes:
            process.start()
        self.input_process.join()
        for process in self.processes:
            process.join()
        self.output_process.join()
    def parse_input(self):
        for index, row in enumerate(self.input_file):
            self.input_queue.put([index, row])
        for i in range(self.num_processes):
            self.input_queue.put('STOP')
    def process_row(self):
        for index, row in iter(self.input_queue.get, 'STOP'):
            self.output_queue.put([index, row[0], self.parser.parse(row[1])])
        self.output_queue.put('STOP')
    def write_output(self):
        current = 0
        buffer = {}
        for works in range(self.num_processes):
            for index, id, row in iter(self.output_queue.get, 'STOP'):
                if index != current:
                    buffer[index] = [id] + row
                else:
                    self.output_file.writerow([id] + row)
                    current += 1
                    while current in buffer:
                        self.output_file.writerow(buffer[current])
                        del buffer[current]
                        current += 1
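Basically, you have two processes managing the file I/O. One reads the input and puts each row on the input queue; the other reads from the "done" queue and writes to your output file, buffering results that arrive out of order so rows are written in their original order. The remaining processes (in this case one per CPU core, as reported by cpu_count()) all take elements from the input queue and process them. Processes are used rather than threads because in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so a CPU-bound calculation such as a covariance would not get faster with the threading module.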
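If each line can be processed independently and you only need the results back in input order, multiprocessing.Pool.imap may be a shorter alternative to managing the queues yourself. This is a sketch, not the answerer's code: the line format "x1,x2,...;y1,y2,..." and the names line_covariance and process_lines are assumptions, and it computes the sample covariance (dividing by n - 1).

```python
import multiprocessing


def line_covariance(line):
    """Sample covariance of the two series on one line ("x1,x2,...;y1,y2,...")."""
    xs_text, ys_text = line.strip().split(';')
    xs = [float(v) for v in xs_text.split(',')]
    ys = [float(v) for v in ys_text.split(',')]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def process_lines(lines, processes=None, chunksize=1000):
    """Apply line_covariance to every line in parallel, keeping input order."""
    pool = multiprocessing.Pool(processes)  # None means one worker per CPU core
    try:
        # imap returns results in input order; chunksize batches lines per task
        return list(pool.imap(line_covariance, lines, chunksize))
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    print(process_lines(["1,2,3;2,4,6", "1,2,3;3,2,1"], processes=2))  # [2.0, -1.0]
```

For a real file, an open file object can be passed directly as lines, since imap accepts any iterable.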

Numerical regression testing

I'm working on a scientific computing code (written in C++), and in addition to performing unit tests for the smaller components, I'd like to do regression testing on some of the numerical output by comparing to a "known-good" answer from previous revisions. There are a few features I'd like:
Allow comparing numbers to a specified tolerance (for both roundoff error and looser expectations)
Ability to distinguish between ints, doubles, etc., and to ignore text if necessary
Well-formatted output to tell what went wrong and where: in a multi-column table of data, only show the column entry that differs
Return EXIT_SUCCESS or EXIT_FAILURE depending on whether the files match
Are there any good scripts or applications out there that do this, or will I have to roll my own in Python to read and compare output files? Surely I'm not the first person with this kind of requirement.
[The following is not strictly relevant, but it may factor into the decision of what to do. I use CMake and its embedded CTest functionality to drive unit tests that use the Google Test framework. I imagine that it shouldn't be hard to add a few add_custom_command statements in my CMakeLists.txt to call whatever regression software I need.]
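You should go for PyUnit, which is now part of the standard library under the name unittest. It supports everything you asked for. The tolerance check, e.g., is done with assertAlmostEqual(), which compares to a number of decimal places (places=7 by default) or, from Python 2.7 on, to an absolute delta.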
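A sketch of that suggestion, assuming the reference and new output are comma- or whitespace-separated numbers (inlined here as strings; in practice you would read both files). parse_numbers and RegressionTest are illustrative names. Note that delta= is an absolute tolerance; for a relative one you would scale it by the reference value yourself.

```python
import unittest


def parse_numbers(text):
    """Split comma- or whitespace-separated text into floats."""
    return [float(tok) for tok in text.replace(',', ' ').split()]


class RegressionTest(unittest.TestCase):
    # Hypothetical reference ("known-good") and new output.
    expected_text = "1.0 2.0 3.0"
    actual_text = "1.0, 2.0000000001, 3.0"

    def test_matches_reference(self):
        expected = parse_numbers(self.expected_text)
        actual = parse_numbers(self.actual_text)
        self.assertEqual(len(expected), len(actual), "wrong number of columns")
        for col, (e, a) in enumerate(zip(expected, actual)):
            # delta= is an absolute tolerance (Python 2.7+)
            self.assertAlmostEqual(e, a, delta=1e-8,
                                   msg="column %d differs" % col)
```

Running the file with python -m unittest exits with a non-zero status when a test fails, which is what a CTest add_test entry needs to report failure.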
The ndiff utility may be close to what you're looking for: it's like diff, but it will compare text files of numbers to a desired tolerance.
I ended up writing a Python script to do more or less what I wanted.
#!/usr/bin/env python
import sys
import re
from optparse import OptionParser
from math import fabs
# one or more commas, semicolons or whitespace characters, so "1, 2" splits cleanly
splitPattern = re.compile(r'[,;\s]+')
class FailObject(object):
    def __init__(self, options):
        self.options = options
        self.failure = False
    def fail(self, brief, full = ""):
        print ">>>> ", brief
        if self.options.verbose and full != "":
            print "     ", full
        self.failure = True
    def exit(self):
        if (self.failure):
            print "FAILURE"
            sys.exit(1)
        else:
            print "SUCCESS"
            sys.exit(0)
def numSplit(line):
    fields = splitPattern.split(line.strip())
    if fields[-1] == "":
        del fields[-1]
    numList = [float(a) for a in fields]
    return numList
def softEquiv(ref, target, tolerance):
    if (fabs(target - ref) <= fabs(ref) * tolerance):
        return True
    #if the reference number is zero, allow tolerance
    if (ref == 0.0):
        return (fabs(target) <= tolerance)
    #if reference is non-zero and it failed the first test
    return False
def compareStrings(f, options, expLine, actLine, lineNum):
    ### check that they're a bunch of numbers
    try:
        exp = numSplit(expLine)
        act = numSplit(actLine)
    except ValueError, e:
        # print "It looks like line %d is made of strings (exp=%s, act=%s)." \
        #         % (lineNum, expLine, actLine)
        if (expLine != actLine and options.checkText):
            f.fail( "Text did not match in line %d" % lineNum )
        return
    ### check the ranges
    if len(exp) != len(act):
        f.fail( "Wrong number of columns in line %d" % lineNum )
        return
    ### soft equiv on each value
    for col in range(0, len(exp)):
        expVal = exp[col]
        actVal = act[col]
        if not softEquiv(expVal, actVal, options.tol):
            f.fail( "Non-equivalence in line %d, column %d"
                    % (lineNum, col) )
    return
def run(expectedFileName, actualFileName, options):
    # message reporter
    f = FailObject(options)
    expected = open(expectedFileName)
    actual = open(actualFileName)
    lineNum = 0
    while True:
        lineNum += 1
        # test the raw lines for EOF: a blank line in the middle is "\n", not ""
        expRaw = expected.readline()
        actRaw = actual.readline()
        ## check that the files haven't ended,
        # or that they ended at the same time
        if expRaw == "":
            if actRaw != "":
                f.fail("Tested file ended too late.")
            break
        if actRaw == "":
            f.fail("Tested file ended too early.")
            break
        compareStrings(f, options, expRaw.rstrip(), actRaw.rstrip(), lineNum)
        #print "%3d: %s|%s" % (lineNum, expRaw[0:10], actRaw[0:10])
    f.exit()
################################################################################
if __name__ == '__main__':
    parser = OptionParser(usage = "%prog [options] ExpectedFile NewFile")
    parser.add_option("-q", "--quiet",
                      action="store_false", dest="verbose", default=True,
                      help="Don't print status messages to stdout")
    parser.add_option("--check-text",
                      action="store_true", dest="checkText", default=False,
                      help="Verify that lines of text match exactly")
    parser.add_option("-t", "--tolerance",
                      action="store", type="float", dest="tol", default=1.e-15,
                      help="Relative error when comparing doubles")
    (options, args) = parser.parse_args()
    if len(args) != 2:
        print "Usage: numdiff.py EXPECTED ACTUAL"
        sys.exit(1)
    run(args[0], args[1], options)
I know I'm quite late to the party, but a few months ago I wrote the nrtest utility in an attempt to make this workflow easier. It sounds like it might help you too.
Here's a quick overview. Each test is defined by its input files and its expected output files. Following execution, output files are stored in a portable benchmark directory. A second step then compares this benchmark to a reference benchmark. A recent update has enabled user extensions, so you can define comparison functions for your custom data.
I hope it helps.