How to identify rectangular price congestion in the stock market? (C++)

In the field of stock market technical analysis there is the concept of rectangular price congestion levels, that is: the price goes up and down, essentially never breaking the previous high and low price levels for some time, forming the figure of a rectangle. E.g.: http://cf.ydcdn.net/1.0.0.25/images/invest/congestion%20area.jpg.
Edit: to be clearer: the stock market, as well as the forex market, is made of sets of movements called "impulse" and "correction", the first being in the direction of the stock's current trend and the other in the opposite direction. When the stock is moving in the direction of the trend, the impulse movement is always bigger than the following correction, but sometimes the correction ends up being the same size as the impulse. So for example, in a stock with a positive trend, the impulse movement took the price from $10,00 to $15,00, and then a correction appeared, dropping the price to $12,00. When the new impulse appeared, though, instead of passing the previous high value ($15,00), it stopped exactly on it, and was followed by a new correction that dropped the price exactly to the previous low price ($12,00). So now we may draw two parallel horizontal lines in the stock's graph: one at the $15,00 price and the other at $12,00, forming a channel inside which the price is "congested". And if we draw two vertical bars at the extreme sides, we have a rectangle: one that has its top bar at the high level and its bottom bar at the low one.
I'm trying to create an algorithm in C++/Qt capable of detecting such patterns in candlestick data held in a list container (using Qt -> QList), but currently I'm doing research to see if anybody knows of someone who has already written such code, so that I can save a lot of effort and time in developing the algorithm.
So my first question is: does anybody know of open-source code that can detect such a figure? Obviously it doesn't have to be for exactly these conditions; if there is code that does a similar task and only needs some adjustments from me, that would be fine.
On the other hand, how could I create such an algorithm anyway? It's clear that the key point is to detect the high and low levels and then just watch for when those levels are 'broken' to detect the end of the figure, but how could I do that in an efficient way? Today the best thing I'm able to do is to detect high and low levels using time as a parameter (e.g. "the highest price in four candles"), and this with very expensive code.

Technical analysis is very vague and subjective, and hard to code in a program when everyone sees different things in the same chart. A good start would be to use some cost function, such as choosing the levels that minimize the sum of squared distances, which penalizes large deviations more than smaller ones.

You should use the idea of 'hysteresis' thresholding: you create a 4-state state machine for how the price breaks the low (L) or high (H) levels. The transitions are: (first time it reaches a new low level) L->L, (return to the low level) H->L, (new high level) H->H, and then (return to the high level) L->H.
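A minimal sketch of the hysteresis idea, in Python for brevity rather than C++/Qt. It assumes the high and low levels are already known (for example from the first impulse and correction) and simplifies the state machine to "which level was touched last". The names `track_rectangle` and `tol` are made up for this illustration. A touch is only counted when the previous touch was on the opposite level, so small wiggles around one level are not counted twice, and the figure ends at the first price that breaks either level by more than the tolerance.

```python
def track_rectangle(prices, high, low, tol=0.05):
    """Follow prices inside a [low, high] channel.

    Returns (touches, break_index): touches is a list of (index, "H" or "L")
    for each alternating touch of a level; break_index is the index of the
    first price outside the channel, or None if the channel never breaks.
    """
    last_touched = None  # hysteresis state: the level touched most recently
    touches = []
    for i, price in enumerate(prices):
        # A price beyond either level (plus tolerance) ends the figure.
        if price > high + tol or price < low - tol:
            return touches, i
        # Register a touch only when it alternates with the previous one.
        if abs(price - high) <= tol and last_touched != "H":
            last_touched = "H"
            touches.append((i, "H"))
        elif abs(price - low) <= tol and last_touched != "L":
            last_touched = "L"
            touches.append((i, "L"))
    return touches, None
```

With the example from the question (levels 15 and 12), a caller could treat the rectangle as confirmed once each level has been touched at least twice, and read the end of the figure from `break_index`.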

Related

Q-learning to learn minesweeping behavior

I am attempting to use Q-learning to learn minesweeping behavior on a discrete version of Mat Buckland's smart sweepers (the original is available here: http://www.ai-junkie.com/ann/evolved/nnt1.html) for an assignment. The assignment limits us to 50 iterations of 2000 moves on a grid that is effectively 40x40, with the mines resetting and the agent being spawned in a random location each iteration.
I've attempted performing Q-learning with penalties for moving, rewards for sweeping mines and penalties for not hitting a mine. The sweeper agent seems unable to learn how to sweep mines effectively within the 50 iterations, because it learns that going to a specific cell is good, but after the mine is gone it is no longer rewarded for going to that cell, and is instead penalized with the movement cost.
I wanted to try providing rewards only when all the mines were cleared, in an attempt to make the environment static, as there would only be two states: not all mines collected, or all mines collected. I am struggling to implement this, though: because the agent has only 2000 moves per iteration and is able to backtrack, it never manages to sweep all the mines in an iteration within the limit, with or without rewards for collecting mines.
Another idea I had was to have an effectively new Q matrix for each mine, so once a mine is collected, the sweeper transitions to that matrix and operates off that where the current mine is excluded from consideration.
Are there any better approaches that I can take with this, or perhaps more practical tweaks to my own approach that I can try?
A more explicit explanation of the rules:
The map edges wrap around, so moving off the right edge of the map will cause the bot to appear on the left edge etc.
The sweeper bot can move up, down, left or right from any map tile.
When the bot collides with a mine, the mine is considered swept and then removed.
The aim is for the bot to learn to sweep all mines on the map from any starting position.
Given that the sweeper can always see the nearest mine, this should be pretty easy. From your question I assume your only problem is finding a good reward function and representation for your agent state.
Defining a state
Absolute positions are rarely useful in a random environment, especially if the environment is infinite like in your example (since the bot can drive over the borders and reappear on the other side). This means that the size of the environment isn't needed for the agent to operate (we will actually need it to simulate the infinite space, though).
A reward function calculates its return value based on the current state of the agent compared to its previous state. But how do we define a state? Let's see what we actually need in order to operate the agent the way we want to.
The position of the agent.
The position of the nearest mine.
That is all we need. Now, I said earlier that absolute positions are bad. This is because they make the Q table (you call it the Q matrix) static and very fragile to randomness. So let's try to completely eliminate absolute positions from the reward function and replace them with relative positions. Luckily, this is very simple in your case: instead of using the absolute positions, we use the relative position between the nearest mine and the agent.
Now we don't deal with coordinates anymore, but with vectors. Let's calculate the vector between our points: v = pos_mine - pos_agent. This vector gives us two very important pieces of information:
the direction in which the nearest mine is, and
the distance to the nearest mine.
And these are all we need to make our agent operational. Therefore, an agent state can be defined as
State: Direction x Distance
of which distance is a floating point value and direction either a float that describes the angle or a normalized vector.
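As a rough illustration of this state, the sketch below computes v = pos_mine - pos_agent on a map whose edges wrap around, then splits it into a direction (an angle) and a distance. The function names and the 40x40 default size are assumptions made for this example. For a tabular Q table the resulting pair would probably have to be discretized as well, which is not shown here.

```python
import math

def wrapped_delta(a, b, size):
    """Shortest signed displacement from a to b on a ring of the given size."""
    d = (b - a) % size
    # Going the other way around the map is shorter when d exceeds half the size.
    return d - size if d > size / 2 else d

def relative_state(agent, mine, width=40, height=40):
    """Return (direction, distance) from the agent to the mine on a wrapping map.

    agent and mine are (x, y) tuples; direction is an angle in radians.
    """
    dx = wrapped_delta(agent[0], mine[0], width)
    dy = wrapped_delta(agent[1], mine[1], height)
    distance = math.hypot(dx, dy)
    direction = math.atan2(dy, dx)
    return direction, distance
```

Because of the wrap-around, an agent at x = 39 sees a mine at x = 1 as two tiles away to the right, not 38 tiles away to the left.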
Defining a reward function
Given our newly defined state, the only thing we care about in our reward function is the distance. Since all we want is to move the agent towards mines, the distance is all that matters. Here are a few guesses how the reward function could work:
If the agent sweeps a mine (distance == 0), return a huge reward (ex. 100).
If the agent moves towards a mine (distance is shrinking), return a neutral (or small) reward (ex. 0).
If the agent moves away from a mine (distance is increasing), return a negative reward (ex. -1).
Theoretically, since we penalize moving away from a mine, we don't even need rule 1 here.
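The three rules above could be written down roughly as follows. This is only a sketch: the function name is made up, the reward values are the example values from the list, and treating an unchanged distance like moving away is an assumption of this example that the rules themselves do not settle.

```python
def reward(prev_distance, distance):
    """Reward for one move, based on the distance to the nearest mine.

    prev_distance is the distance before the move, distance the one after it.
    """
    if distance == 0:
        return 100   # rule 1: the mine was swept
    if distance < prev_distance:
        return 0     # rule 2: moved towards the mine
    return -1        # rule 3: moved away (or, by assumption, stood still)
```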
Conclusion
The only thing left is determining a good learning rate and discount so that your agent performs well after 50 iterations. But, given the simplicity of the environment, this shouldn't even matter that much. Experiment.

terminology and references for detecting light pulses in a field of light

Given a video with a fixed background containing a lot of variation in light I am trying to detect pulses of light that occur for relatively short spans of time. When the video is played it is pretty easy for a person to distinguish the light pulses but if only shown a still frame it would be impossible to distinguish a pulse from background light.
I would like to know if there is specific terminology in machine vision that I can use to search for algorithms used to solve this problem. Also if you have any references for papers or open source software that solves this problem that would be great.
Edit: More context
The video itself is of a biological process that occurs at the sub-cellular level and while the background is fixed there is also a significant amount of random signal noise at the pixel level (there doesn't appear to be significant correlation in the noise between neighboring pixels). Note that the variation I refer to in the first paragraph is true variation and not signal noise. Since I mentioned that the process is biological it's probably also worth saying that there is no movement going on; these are just pulses of light. Also, the pulses themselves occupy enough pixels so that it is easy to discern their relative sizes.
From statistics, you could look into change point detection. The essential idea being that most of the time each (x,y) point or region, if you define some granularity of regions, has an intensity I(x,y), where I(x,y) is random, but either bounded or stochastic with some assumed distribution (e.g. normal with a given mean and standard deviation), and then it is observed with an intensity that is anomalous for that distribution. Anomaly detection would also apply, but the time series nature is more appropriate.
(If you want to go more into the statistical methodologies, it would be far more appropriate to discuss this on the statistics Stack Exchange site.)
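To make the change point idea a little more concrete, here is a small sketch of a one-sided CUSUM detector applied to the intensity time series of a single pixel or region. CUSUM is one common change point technique, chosen here only as an example. The function name, the parameters `k` and `h`, and the assumption that the first `baseline_len` frames contain no pulse are all made up for this illustration.

```python
from statistics import mean, stdev

def cusum_alarms(series, baseline_len, k=0.5, h=5.0):
    """Return the frame indices at which an upward intensity change is flagged.

    series: intensity of one pixel/region over time.
    baseline_len: number of initial frames assumed to be pulse-free.
    k: slack (in standard deviations) ignored as normal fluctuation.
    h: alarm threshold on the accumulated evidence.
    """
    base = series[:baseline_len]
    mu, sigma = mean(base), stdev(base)  # sigma must be non-zero
    evidence = 0.0
    alarms = []
    for t in range(baseline_len, len(series)):
        z = (series[t] - mu) / sigma
        # Accumulate only deviations above the slack; never go below zero.
        evidence = max(0.0, evidence + z - k)
        if evidence > h:
            alarms.append(t)
            evidence = 0.0  # restart after an alarm
    return alarms
```

Running this per pixel (or per region) gives a set of flagged (x, y, t) points, which could then be grouped spatially into pulses.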
If you look into astronomical applications, you can find papers on supernova and pulsar detection.
Update 1. Just to clarify the astronomical analogies, if the pulse is repeating, then papers on pulsars or satellites may be most appropriate. If the pulse is one-time, then papers on supernova detection would be better. If the pulse is bursty, and spatially clustered, then meteor strike detection would be better. Although spatial time series analysis, especially change point or anomaly detection, is useful, it's best to have an understanding of the stochastic phenomena of interest in order to narrow down the detection methodology.
To continue the notion of applying statistics: you might consider gridding each image frame into rectangular neighborhoods. At each time t, compute the variance (or standard deviation) of the neighborhood. Presumably, the unexcited neighborhoods will exhibit some common distribution of intensity (i.e. uniform, but most likely some form of gaussian). The presence of pulse pixels will bias that distribution in some way. When comparing a neighborhood at time t and t-1, a significant change in mean intensity (or a change in the variance, etc.) would indicate an excited neighborhood.
You might also consider looking at other measures, such as skewness and kurtosis. Assuming the initial, unexcited distribution is gaussian, the "shape" parameters could also identify differences in the pixel populations.
*Note that I'm assuming a grayscale image for simplicity, but the same principles may be applied to an RGB image.
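A minimal sketch of the gridding idea, assuming grayscale frames stored as lists of rows. The function names, the block size and the `mean_jump` threshold are assumptions for illustration. It computes the mean and variance of each rectangular neighborhood and flags the neighborhoods whose mean intensity rose noticeably between time t-1 and t; the variance is returned as well so that it could be compared in the same way.

```python
from statistics import mean, pvariance

def block_stats(frame, block):
    """Map (block_row, block_col) -> (mean, variance) of that neighborhood."""
    rows, cols = len(frame), len(frame[0])
    stats = {}
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            pixels = [frame[i][j]
                      for i in range(r, min(r + block, rows))
                      for j in range(c, min(c + block, cols))]
            stats[(r // block, c // block)] = (mean(pixels), pvariance(pixels))
    return stats

def excited_blocks(prev_frame, frame, block=2, mean_jump=10.0):
    """Return the neighborhoods whose mean intensity rose by more than mean_jump."""
    before = block_stats(prev_frame, block)
    after = block_stats(frame, block)
    return sorted(key for key in after
                  if after[key][0] - before[key][0] > mean_jump)
```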
Assuming a completely static scene with no object and camera motion, then any color deviation would be due to lighting changes.
If you detect an abrupt color/intensity change at particular pixels (i.e. a brightness change above a certain allowable threshold), then this should be due to the light source turning on/off.
If you are only interested in point light sources, then any change in a region larger than the maximum apparent light source should be considered as coming from something else (e.g. the sun suddenly revealed from behind clouds).

How to exploit periodicity to reduce noise of a signal?

100 periods have been collected from a 3 dimensional periodic signal. The wavelength slightly varies. The noise of the wavelength follows Gaussian distribution with zero mean. A good estimate of the wavelength is known, that is not an issue here. The noise of the amplitude may not be Gaussian and may be contaminated with outliers.
How can I compute a single period that approximates 'best' all of the collected 100 periods?
Time-series, ARMA, ARIMA, Kalman Filter, autoregression and autocorrelation seem to be keywords here.
UPDATE 1: I have no idea how time-series models work. Are they prepared for varying wavelengths? Can they handle non-smooth true signals? If a time-series model is fitted, can I compute a 'best estimate' for a single period? How?
UPDATE 2: A related question is this. Speed is not an issue in my case. Processing is done off-line, after all periods have been collected.
Origin of the problem: I am measuring acceleration during human steps at 200 Hz. After that I am trying to double integrate the data to get the vertical displacement of the center of gravity. Of course the noise introduces a HUGE error when you integrate twice. I would like to exploit periodicity to reduce this noise. Here is a crude graph of the actual data (y: acceleration in g, x: time in second) of 6 steps corresponding to 3 periods (1 left and 1 right step is a period):
My interest is now purely theoretical, as http://jap.physiology.org/content/39/1/174.abstract gives a pretty good recipe for what to do.
We have used wavelets for noise suppression with similar signal measured from cows during walking.
I don't think the noise is so much of a problem here, and the biggest peaks represent actual changes in the acceleration during walking.
I suppose that the angle of the leg, and thus of the accelerometer, changes during your experiment and you need to account for that in order to calculate the distance, i.e. you need to know the orientation of the accelerometer at each time step. See e.g. this technical note for one way to account for the angle.
If you need to get accurate measures of the position, the best solution would be to get an accelerometer with a magnetometer, which also measures orientation. Something like this should work: http://www.sparkfun.com/products/10321.
EDIT: I have looked into this a bit more in the last few days because a similar project is in my to do list as well... We have not used gyros in the past, but we are doing so in the next project.
The inaccuracy in the positioning doesn't come from the white noise, but from the inaccuracy and drift of the gyro. And the error then accumulates very quickly due to the double integration. Intersense has a product called Navshoe, that addresses this problem by zeroing the error after each step (see this paper). And this is a good introduction to inertial navigation.
Periodic signal without noise has the following property:
f(a) = f(a+k), where k is the wavelength.
The next bit of information that is needed is that your signal is composed of separate samples. Every bit of information you've collected is based on samples, which are values of the function f(). From the 100 samples taken at the same position in each period, you can get the mean value:
1/n * sum(s_i), where i is in range [0..n-1] and n = 100.
This needs to be done for every dimension of your data. If you use 3d data, it will be applied 3 times. Result would be (x,y,z) points. You can find value of s_i from the periodic signal equation simply by doing
s_i(a).x = f(a+k*i).x
s_i(a).y = f(a+k*i).y
s_i(a).z = f(a+k*i).z
If the wavelength is not accurate, this will give you additional source of error or you'll need to adjust it to match the real wavelength of each period. Since
k*i = k+k+...+k
if the wavelength varies, you'll need to use
k_1+k_2+k_3+...+k_i
instead of k*i.
Unfortunately, with errors in the wavelength, there will be big problems keeping this k_1..k_i chain in sync with the actual data. You'd actually need to know how to recognize the starting position of each period in your actual data. You possibly need to mark them by hand.
Now, all the mean values you calculated would be functions like this:
m(a) :: R->(x,y,z)
Now this is a curve in 3d space. More complex error models will be left as an exercise for the reader.
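The averaging described above can be sketched as follows for one dimension; it would be applied separately to x, y and z. The sketch assumes the start index of every period is already known (for example marked by hand), and it handles the varying wavelength by linearly resampling each period onto the same number of phase points before taking the mean. The function names and the resampling step are assumptions of this example, not part of the derivation above.

```python
from statistics import mean

def resample(period, n):
    """Linearly interpolate one period onto n evenly spaced phase points.

    period must include the first sample of the next period as its last
    element, so that phase 1.0 is well defined.
    """
    span = len(period) - 1  # number of sample intervals in this period
    out = []
    for j in range(n):
        pos = j * span / n          # fractional sample position for phase j/n
        lo = int(pos)
        frac = pos - lo
        out.append(period[lo] * (1 - frac) + period[lo + 1] * frac)
    return out

def mean_period(signal, starts, n=50):
    """Average all periods of a 1-D signal, given the start index of each period.

    starts must also contain the start index of the period following the
    last complete one, and that index must lie inside the signal.
    """
    periods = [signal[a:b + 1] for a, b in zip(starts, starts[1:])]
    resampled = [resample(p, n) for p in periods]
    return [mean(column) for column in zip(*resampled)]
```

Since the question mentions outliers in the amplitude, replacing `mean` with `statistics.median` in the last line might be worth trying as a more robust variant.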
If you have a copy of Curve Fitting Toolbox, localized regression might be a good choice.
Curve Fitting Toolbox supports both lowess and loess localized regression models for curve and surface fitting.
There is an option for robust localized regression.
The following blog post shows how to use cross validation to estimate an optimal span parameter for a localized regression model, as well as techniques to estimate confidence intervals using a bootstrap.
http://blogs.mathworks.com/loren/2011/01/13/data-driven-fitting/

Kalman Tracking - Measurement variance

I'm doing some work on tracking moving objects using a ceiling mounted downward facing camera. I've got to the point where I can detect the position of the desired object in each frame.
I'm looking into using a Kalman filter to track the object's position and speed through the scene and I've reached a stumbling block. I've set up my system and have all the required parts of the Kalman filter except the measurement variance.
I want to be able to assign a meaningful variance to each measurement to allow the correction phase to use the new information in a sensible manner. I have several measures assigned to my detected objects which could in theory be useful in determining how accurate the position should be and it seems logical to try and combine them to derive a suitable variance.
Am I approaching this in the right manner and if so, can anyone point me in the right direction to continue?
Any help greatly appreciated.
I think you are right. According to this post:
Sensor fusioning with Kalman filter
determining the variance is 100% experimental. It seems to me you have everything you need to get good estimates of the variance.
Sorry for the late reply. I have personally encountered the same problem in a previous project. I found the advice given by Gustaf Hendeby in his Sensor Fusion lecture slides (page 10 of the slides) extremely valuable.
To summarize:
(1) The ratio between your measurement noise and your process noise determines your filter's behavior. A high measurement noise/process noise ratio makes your filter slower (more low-pass filtering), which will usually allow smoother tracking; vice versa, if you set your measurement noise low, the filter follows the measurements more closely, which tends to give more jitter.
(2) There are numerous papers in the literature that discuss how to set these noise models properly. However, usually a lot of "tuning" is needed, depending on your application. Usually the measurement noise is what we can measure/characterize based on the hardware specification. Therefore a recommendation is to fix "R" (the measurement noise covariance) and tune "Q" (the process noise covariance).
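To see the effect of the two noise terms, here is a deliberately tiny scalar Kalman filter with a random-walk process model, where each measurement carries its own variance. This is a sketch, not the tracker from the question: the function name, the one-dimensional state and the parameter names are assumptions for illustration.

```python
def kalman_1d(measurements, variances, q, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk model.

    measurements: observed positions.
    variances: measurement variance R for each observation.
    q: process noise variance Q, added at every step.
    x0, p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z, r in zip(measurements, variances):
        p = p + q                # predict: uncertainty grows by Q
        gain = p / (p + r)       # Kalman gain: how much to trust this measurement
        x = x + gain * (z - x)   # correct the estimate towards the measurement
        p = (1 - gain) * p       # uncertainty shrinks after the correction
        estimates.append(x)
    return estimates
```

With the same Q, a measurement given a small variance pulls the estimate strongly towards itself, while one given a large variance barely moves it. Per-measurement confidence measures would enter through the `variances` argument.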

detecting pauses in a spoken word audio file using pymad, pcm, vad, etc

First I am going to broadly state what I'm trying to do and ask for advice. Then I will explain my current approach and ask for answers to my current problems.
Problem
I have an MP3 file of a person speaking. I'd like to split it up into segments roughly corresponding to a sentence or phrase. (I'd do it manually, but we are talking hours of data.)
If you have advice on how to do this programmatically, or on some existing utilities, I'd love to hear it. (I'm aware of voice activity detection and I've looked into it a bit, but I didn't see any freely available utilities.)
Current Approach
I thought the simplest thing would be to scan the MP3 at certain intervals and identify places where the average volume was below some threshold. Then I would use some existing utility to cut up the mp3 at those locations.
I've been playing around with pymad and I believe that I've successfully extracted the PCM (pulse code modulation) data for each frame of the mp3. Now I am stuck because I can't really seem to wrap my head around how the PCM data translates to relative volume. I'm also aware of other complicating factors like multiple channels, big endian vs little, etc.
Advice on how to map a group of pcm samples to relative volume would be key.
Thanks!
PCM is a time-frame-based encoding of sound. For each time frame, you get a peak level. (If you want a physical reference for this: the peak level corresponds to the distance the microphone membrane was moved out of its resting position at that given time.)
Let's forget that PCM can use unsigned values for 8 bit samples, and focus on signed values. If the value is > 0, the membrane was on one side of its resting position; if it is < 0, it was on the other side. The bigger the dislocation from rest (no matter to which side), the louder the sound.
Most voice classification methods start with one very simple step: They compare the peak level to a threshold level. If the peak level is below the threshold, the sound is considered background noise.
Looking at the parameters in Audacity's Silence Finder, the silence level should be that threshold. The next parameter, Minimum silence duration, is obviously the length of a silence period that is required to mark a break (or in your case, the end of a sentence).
If you want to code a similar tool yourself, I recommend the following approach:
Divide your sound sample into discrete sets of a specific duration. I would start with 1/10, 1/20 or 1/100 of a second.
For each of these sets, compute the maximum peak level.
Compare this maximum peak to a threshold (the silence level in Audacity). The threshold is something you have to determine yourself, based on the specifics of your sound sample (loudness, background noise etc). If the max peak is below your threshold, this set is silence.
Now analyse the series of classified sets: calculate the length of each silence in your recording (length = number of consecutive silent sets * length of a set). If it is above your Minimum silence duration, assume that you have the end of a sentence here.
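The four steps above could look roughly like this in Python, assuming the PCM data has already been decoded into a flat list of signed integer samples for a single channel. The function name and the default values for the window length, threshold and minimum silence duration are placeholders that would need tuning for a real recording.

```python
def find_pauses(samples, rate, window_s=0.05, threshold=500, min_silence_s=0.3):
    """Return (start_seconds, end_seconds) for each sufficiently long pause.

    samples: signed PCM samples of one channel.
    rate: sample rate in Hz.
    """
    window = max(1, int(rate * window_s))
    # Steps 1-3: split into sets and classify each set by its maximum peak.
    silent = [max(abs(s) for s in samples[i:i + window]) < threshold
              for i in range(0, len(samples), window)]
    # Step 4: look for runs of silent sets that last long enough.
    pauses = []
    run_start = None
    for idx, is_silent in enumerate(silent + [False]):  # sentinel closes a final run
        if is_silent and run_start is None:
            run_start = idx
        elif not is_silent and run_start is not None:
            if (idx - run_start) * window / rate >= min_silence_s:
                pauses.append((run_start * window / rate, idx * window / rate))
            run_start = None
    return pauses
```

The returned times could then be handed to whatever tool is used to cut the MP3.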
The main point in coding this yourself instead of continuing to use Audacity is that you can improve your classification by using advanced analysis methods. One very simple metric you can apply is called the zero crossing rate: it just counts how often the sign switches in your given set of peak levels (i.e. how often your values cross the 0 line). There are many more, all of them more complex, but it may be worth the effort. Have a look at discrete cosine transformations, for example...
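For reference, the zero crossing rate of one set of signed samples can be computed in a couple of lines. The function name is made up, and normalizing by the number of adjacent sample pairs is just one convention.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)
```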
Just wanted to update this. I'm having moderate success using Audacity's Silence Finder. However, I'm still interested in this problem. Thanks.
PCM is a way of encoding a sampled waveform. It is encoded as a series of numbers, one per sample, where each number gives the amplitude of the signal at that instant. (A scheme in which each bit only indicates an increase (1) or a decrease (0) of the function is delta modulation, not PCM.)
To estimate amplitude, plot the waveform, then take the absolute value of the samples so that everything lies above the x axis. Then you should be able to estimate the amplitude of the wave at different points. Once you've done that, you should be able to pick out the spots where the amplitude is lower.
You may also try to use a Fourier transform to estimate where the signals are most distinct.