I would like to track some traffic signs from a video. I found a nice way to solve the problem here: Link
My question now is: how should I handle new incoming blobs? For one blob I can define a search region, but what if a second sign appears in the next frame? How should I handle that?
From what I understand of the paper you linked, this system is already designed to track several signs at a time, appearing and disappearing. See the end of §2:
the latest tracked blobs are stored in a temporary memory. Blobs in frame (t+1) are matched with those in the temporary memory (...) thus, when a sign disappears in particular frames, it could be tracked in the next frame when it appears again.
The next section (§3, blob matching) explains how you "recognize" the signs you are tracking from one frame to the next. But if you can match (recognize) them, you can also fail to recognize them, which means there are new signs: they must then be added to the memory.
I think (though I may be wrong) that what is misleading you is the "search region reduction". That region reduction is done independently for every sign/blob (see §2, "bounding boxes are determined"), so it doesn't matter how many signs there are.
The algorithm is then the following :
for each frame :
detect "blobs" (= traffic sign candidates) using the Kalman filters
for each blob :
match this blob with the already known blobs using the ring partitioned method described in §3
if the blob doesn't match, add it to the memory as a new blob
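A minimal C++ sketch of one iteration of that loop. The detector is left out, and the §3 ring-partitioned matcher is replaced by a plain centroid-distance test, which is my assumption for illustration, not the paper's method:

```cpp
#include <vector>

struct Blob {
    int id;
    double x, y;   // centroid; real code would also carry the bounding box
};

// Stand-in for the §3 ring-partitioned matcher: a plain centroid-distance
// test.  This is an assumption for illustration, not the paper's method.
bool matches(const Blob& candidate, const Blob& known) {
    double dx = candidate.x - known.x, dy = candidate.y - known.y;
    return dx * dx + dy * dy < 25.0;   // within 5 px of the known blob
}

// One frame of the loop above: match each detected blob against the
// temporary memory; anything unmatched is a new sign and joins the memory.
void trackFrame(std::vector<Blob>& memory, std::vector<Blob> detected, int& nextId) {
    for (Blob& d : detected) {
        Blob* hit = nullptr;
        for (Blob& m : memory)
            if (matches(d, m)) { hit = &m; break; }
        if (hit) {                // known sign: refresh its stored state
            d.id = hit->id;
            *hit = d;
        } else {                  // no match: a new sign entered the scene
            d.id = nextId++;
            memory.push_back(d);
        }
    }
}
```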
The article doesn't cover when to remove a blob from the "latest known blobs" memory. The algorithm is made to work even if a blob goes missing for a few frames and then reappears (hidden by a truck or an electric pole, for example), and it makes no assumption about the movement, so we can't infer that signs will disappear toward the sides of the picture or after getting bigger. I therefore think (my opinion) that we could use both a time limit and an "area collision" detection: if a new blob appears in an area where we would expect a known blob but doesn't match it, then the old blob is no longer relevant.
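For the time-limit part, a small sketch of what expiring stale blobs could look like. The TrackedBlob fields and the maxAge threshold are my assumptions, not from the paper:

```cpp
#include <algorithm>
#include <vector>

struct TrackedBlob {
    int id;
    int lastSeenFrame;   // frame index when this blob last matched a detection
};

// Drop blobs that have not been seen for more than maxAge frames.  maxAge is
// a tuning parameter: large enough to survive occlusion by a truck or pole,
// small enough that long-gone signs do not linger in the memory.
void expireBlobs(std::vector<TrackedBlob>& memory, int currentFrame, int maxAge) {
    memory.erase(std::remove_if(memory.begin(), memory.end(),
                     [&](const TrackedBlob& b) {
                         return currentFrame - b.lastSeenFrame > maxAge;
                     }),
                 memory.end());
}
```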
Good luck with your project!
I am currently trying to create a hack for Borderlands 2. One of the features I am trying to create is a Kill All feature which will iterate through each entity saved to my vector entity struct and set their health to 0. This is easy for me to do on other games but I am having major trouble on this game.
I found a pointer that shows me the true health value of the entity I am looking at, but I noticed that in certain regions of the map the addresses change vastly. For example, I may find 7-8 entities near me whose addresses start in the same range (e.g. 0x79C8000+).
But after I leave for another area, the entities I find there are at vastly different memory locations (e.g. 0x01CD0000+).
I understand that such a large map requires this: if I move too far away from a region, its entities are unloaded from the world, the pointer value is set to 0, and it no longer correctly points to the entity's health when I move back and they are loaded in again.
The thing that bothers me is that there has to be an array storing every entity that is CURRENTLY loaded in memory, even plain object entities that are not NPCs. I'm almost certain there is one, because other people have been able to find it.
Any ideas??
I am working on a user behavior project. From user interaction I have collected some data. There is a nice sequence which smoothly increases and decreases over time, but there are small discrepancies, which are very bad. Please refer to the graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I would like to smooth out this sequence. The technique should be able to eliminate the erroneous values, i.e. points that break an otherwise monotonically increasing or decreasing run.
If it cannot eliminate them, it should be able to shift them so that the series is not affected by the errors.
What I have tried and failed:
I tried thresholding the difference between consecutive values. In some special cases it works, but for a sequence like the one presented here, the distances between numbers are not such that I can cut out the errors.
I tried applying a counter: a change is accepted only if it exceeds some X; otherwise the point is mapped to the previous point. Here I have great trouble deciding on the value of X. Because this is based on user interaction, I don't really control it; if the interaction happens to plot as a zigzag pattern, I end up with a 'no user movement detected at all' situation.
Please share the techniques that you are aware of.
PS: The data made available in this example is a particular case. There is no typical pattern in which the numbers are going to occur, but we expect some range to be continuous in all the examples. The solution I am seeking is generic.
I do not know how much effort you want to put into this problem, but if you want theoretical guarantees, topological persistence seems well adapted to it, imho.
Basically, with that method you can filter local maxima/minima by fixing a scale, and there are theoretical proofs saying that if your sampling is close to the underlying function, then persistence extracts the correct number of maxima.
You can look at these slides (mainly pages 7-9) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a water level starting from the maximum height and decreasing, you get some peaks.
Every peak has a time when it is born, which is when it emerges, and a time when it dies, which is when it merges with a higher peak. A persistence diagram then pictures a point for every peak, whose x/y coordinates are its times of birth and death (by convention the first peak never dies and is not shown).
If a peak is a global maximum, it will lie further from the diagonal in the persistence diagram than a local-maximum peak. To remove local maxima, you remove the peaks close to the diagonal. There are four local maxima in your example, as you can see in the persistence diagram of your data (thanks for providing the data, btw), and two global ones (the first peak is not pictured in a persistence diagram):
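For the record, here is a rough self-contained sketch of 1D peak persistence; it is my own generic implementation of the idea, not code from the slides. Lower a water level from the highest sample; every local maximum emerges as an island, and when two islands meet, the one born lower dies. Persistence = birth - death, and you keep only peaks whose persistence exceeds your scale:

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Peak {
    double birth, death;   // water levels at which the peak emerges / merges away
    std::size_t index;     // position of the local maximum in the input
};

// Returns all finite birth/death pairs (zero-persistence pairs skipped),
// plus the global maximum last with death = -infinity (it never dies).
std::vector<Peak> peakPersistence(const std::vector<double>& y) {
    std::size_t n = y.size();
    if (n == 0) return {};
    std::vector<std::size_t> order(n), parent(n), peakIdx(n);
    std::vector<double> birth(n);
    std::vector<char> alive(n, 0);
    std::iota(order.begin(), order.end(), 0);
    // Process samples from highest to lowest: the descending water level.
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return y[a] > y[b]; });

    auto find = [&](std::size_t x) {            // union-find with path halving
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    };

    std::vector<Peak> peaks;
    auto merge = [&](std::size_t a, std::size_t b, double level) {
        std::size_t ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (birth[ra] < birth[rb]) std::swap(ra, rb);
        if (birth[rb] > level)                  // the lower-born island dies here
            peaks.push_back({birth[rb], level, peakIdx[rb]});
        parent[rb] = ra;
    };

    for (std::size_t i : order) {
        alive[i] = 1; parent[i] = i; birth[i] = y[i]; peakIdx[i] = i;
        if (i > 0 && alive[i - 1]) merge(i - 1, i, y[i]);
        if (i + 1 < n && alive[i + 1]) merge(i + 1, i, y[i]);
    }
    std::size_t g = find(order.front());        // the surviving global maximum
    peaks.push_back({birth[g], -std::numeric_limits<double>::infinity(), peakIdx[g]});
    return peaks;
}
```

Filtering is then just dropping the pairs with `birth - death` below your chosen scale.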
If you add noise to your data like this:
You will still get a very decent persistence diagram that will let you filter the local maxima as you want:
Please ask if you want more details or references.
Since you cannot decide on a cut-off frequency, nor even on the filter you want to use, I would implement several and let the user set the parameters.
The first thing that came to my mind is a running average, and you can see there are already many things to tune that give different outputs.
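As an illustration, a minimal centered running average in C++. The half-width is exactly the kind of parameter you would expose to the user; edge samples use a shrunken window so the output has the same length as the input:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Centered moving average with window 2*halfWidth+1.  Near the edges the
// window shrinks to whatever samples exist, so out.size() == y.size().
std::vector<double> movingAverage(const std::vector<double>& y, std::size_t halfWidth) {
    std::vector<double> out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        std::size_t lo = i >= halfWidth ? i - halfWidth : 0;
        std::size_t hi = std::min(y.size() - 1, i + halfWidth);
        double sum = 0;
        for (std::size_t j = lo; j <= hi; ++j) sum += y[j];
        out[i] = sum / double(hi - lo + 1);
    }
    return out;
}
```

A larger halfWidth smooths more but also flattens the genuine turning points, which is why letting the user tune it makes sense here.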
The quicktime documentation recommends the following approach to finding a keyframe:
Finding a Key Frame
Finding a key frame for a specified time in a movie is slightly more
complicated than finding a sample for a specified time. The media
handler must use the sync sample atom and the time-to-sample atom
together in order to find a key frame.
The media handler performs the following steps:
Examines the time-to-sample atom to determine the sample number that contains the data for the specified time.
Scans the sync sample atom to find the key frame that precedes the sample number chosen in step 1.
Scans the sample-to-chunk atom to discover which chunk contains the key frame.
Extracts the offset to the chunk from the chunk offset atom.
Finds the offset within the chunk and the sample’s size by using the sample size atom.
source: https://developer.apple.com/library/mac/documentation/QuickTime/qtff/QTFFChap2/qtff2.html
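As a rough illustration of those five steps, here is a sketch over already-parsed tables. The struct layouts are my own in-memory forms of the atoms, not QuickTime API types; all sample and chunk numbers are 1-based as in the spec:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct TimeToSample  { uint32_t count, duration; };            // stts entries
struct SampleToChunk { uint32_t firstChunk, samplesPerChunk; } // stsc runs
;

// Step 1: 1-based number of the sample containing media time t.
uint32_t sampleForTime(const std::vector<TimeToSample>& stts, uint64_t t) {
    uint64_t elapsed = 0;
    uint32_t sample = 1;
    for (const TimeToSample& e : stts) {
        uint64_t span = uint64_t(e.count) * e.duration;
        if (t < elapsed + span)
            return sample + uint32_t((t - elapsed) / e.duration);
        elapsed += span;
        sample += e.count;
    }
    return sample - 1;   // time past the end: clamp to the last sample
}

// Step 2: last key frame at or before `sample` (stss holds sorted sample numbers).
uint32_t keyframeBefore(const std::vector<uint32_t>& stss, uint32_t sample) {
    uint32_t key = stss.front();
    for (uint32_t s : stss) { if (s <= sample) key = s; else break; }
    return key;
}

// Steps 3-5: walk the stsc runs to find the chunk holding `sample`, take that
// chunk's file offset from stco, then add the sizes of the preceding samples
// in the same chunk (from stsz).
uint64_t sampleFileOffset(const std::vector<SampleToChunk>& stsc,
                          const std::vector<uint64_t>& stco,
                          const std::vector<uint32_t>& stsz,
                          uint32_t sample) {
    uint32_t firstSampleOfChunk = 1;
    for (std::size_t i = 0; i < stsc.size(); ++i) {
        uint32_t runEnd = (i + 1 < stsc.size()) ? stsc[i + 1].firstChunk
                                                : uint32_t(stco.size()) + 1;
        for (uint32_t c = stsc[i].firstChunk; c < runEnd; ++c) {
            if (sample < firstSampleOfChunk + stsc[i].samplesPerChunk) {
                uint64_t off = stco[c - 1];
                for (uint32_t s = firstSampleOfChunk; s < sample; ++s)
                    off += stsz[s - 1];
                return off;
            }
            firstSampleOfChunk += stsc[i].samplesPerChunk;
        }
    }
    return 0;   // malformed tables: sample not covered by any chunk
}
```

Note that each trak carries its own set of these tables, which is exactly why the video and audio lookups yield different offsets.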
This is quite confusing, since multiple tracks ("trak" atoms) will yield different offsets. For example, the keyframe-sample-chunk-offset value for the video trak will be one value, and the one for the audio trak another.
How does one translate the instructions above into a location in the file (or mdat atom)?
That's not restricted to key frames. You can't in general guarantee that samples for different tracks are close to each other in the file. You hope that audio and video will be interleaved so you can play back a movie without excessive seeking but that's up to the software that created the file. Each track has its own sample table and chunk atoms that tell you where the samples are in the file and they could be anywhere. (They could even be in a different file, though reference movies are deprecated nowadays so you can probably ignore them.)
I want to know about utilities like WinHex, i.e. disk editors. They access the hard disk and represent the data of a whole hard disk of about 2 TB in hexadecimal.
How do they achieve this in a single scroll area, and also provide undo functionality?
Which widget should be used to display such a huge amount of data?
I want to make this application in Qt.
How do they achieve this in a single scroll area
It is not a "single scroll area" containing the entire disk. It is a scrollbar and dynamically generated content for whatever disk content you are showing at the time.
Simply calculate the disk position from the scroll location (unless your screen is tens of thousands of pixels tall, though, you will not be able to place the scrollbar cursor EXACTLY on the disk sector you want).
also provide the undo functionality in that....
Undo functionality, I expect (I haven't looked at the code), is done by holding "address changed, old value" pairs in some sort of container - pretty much the same way you'd keep undo information for any other large dataset.
which widget should be used to display such a huge amount of data?
One which shows text and allows you to intercept the redraw and provide your own data on each redraw operation. I'm afraid I don't know Qt very well, so I can't advise on the details.
Obviously, one factor you haven't covered is how to open the whole disk in read-write mode when it is already mounted. I'm not sure the OS allows this, but if it does, I expect a disk filter driver is involved, with "sideways" interfaces that allow updates behind the scenes of the filesystem.
Edit: In answer to the question in the comment:
There are two options. Either write to disk whenever the data is changed, in which case the code needs to remember all the original values and restore them when the user performs an undo. The alternative, which is approximately the same effort, is to store all the edits ("change value at 1000 to 05") and then, when asked to display some content, apply any edits within the displayed range before the actual display operation.
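A small sketch of that second option: an overlay of unsaved edits plus an undo stack, patched onto the raw bytes at display time. HexEditBuffer is a made-up name for illustration, not taken from any real editor:

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct HexEditBuffer {
    std::map<uint64_t, uint8_t> edits;                 // offset -> edited value
    std::vector<std::pair<uint64_t, int>> undoStack;   // offset, prior overlay value (-1 = none)

    void setByte(uint64_t off, uint8_t v) {
        auto it = edits.find(off);
        undoStack.push_back({off, it == edits.end() ? -1 : int(it->second)});
        edits[off] = v;
    }

    void undo() {
        if (undoStack.empty()) return;
        uint64_t off = undoStack.back().first;
        int prev = undoStack.back().second;
        undoStack.pop_back();
        if (prev < 0) edits.erase(off);    // edit never existed: back to disk value
        else edits[off] = uint8_t(prev);   // restore the previous overlay value
    }

    // Patch the overlay onto bytes read from disk for the displayed window
    // [start, start + raw.size()); only edits inside the window are touched.
    std::vector<uint8_t> view(std::vector<uint8_t> raw, uint64_t start) const {
        for (auto it = edits.lower_bound(start);
             it != edits.end() && it->first < start + raw.size(); ++it)
            raw[it->first - start] = it->second;
        return raw;
    }
};
```

Because the overlay is an ordered map, rendering a window costs one lower_bound plus the edits actually inside that window, regardless of disk size.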
Obviously, if someone decides to play "monkey on keyboard" for many many hours (weeks, months) to fill the ENTIRE disk with rather random content, then that would be a problem to "remember" all that without running out of memory, so you probably need a "I've run out of memory to store undo-information, do you want to save what you have done so far?" type option.
One could also consider a "same value stored in a large section" type of compression: for a "fill from A to B with value X" operation, store simply "from A to B we have filled with X" rather than, potentially, many megabytes of "A = X, A+1 = X, A+2 = X ... B-1 = X, B = X".
I'm working on an iOS music app (written in C++) and my model looks more or less like this:
--Song
----Track
----Track
------Pattern
------Pattern
--------Note
--------Note
--------Note
So basically a Song has multiple Tracks, a Track can have multiple Patterns and a Pattern has multiple Notes. Each one of those things is represented by a class and except for the Song object, they're all stored inside vectors.
Each Note has a "frame" parameter so that I can calculate when a note should be played. For example, if I have 44100 samples / second and the frame for a particular note is 132300 I know that I need that Note at the start of the third second.
My question is how I should represent those notes for the best performance. Right now I'm thinking of storing the notes in a vector data member of each Pattern, then looping over all the Tracks of the Song, then over the Patterns, and then over the Notes to see which ones have a frame data member greater than 132300 and smaller than 176400 (the start of the 4th second).
As you can tell, that's a lot of loops and a song could be as long as 10 minutes. So I'm wondering if this will be fast enough to calculate all the frames and send them to the buffer on time.
One thing you should remember is that improving performance normally increases memory consumption. That is relevant (and justified) in this case too, because I believe you want to store the same data twice, in different ways.
First of all, you should have this basic structure for a song:
map<Track, vector<Pattern>> tracks;
It maps each Track to a vector of Patterns. Map is fine, because you don't care about the order of tracks.
Traversing through Tracks and Patterns should be fast, as their amounts will not be high (I assume). The main performance concern is to loop through thousands of notes. Here's how I suggest to solve it:
First of all, for each Pattern object you should have a vector<Note> as your main data storage. You will write all the changes on the Pattern's contents to this vector<Note> first.
vector<Note> notes;
And for performance considerations, you can have a second way of storing notes:
map<int, vector<Note>> measures;
This one will map each measure (by its number) in a Pattern to the vector of Notes contained in this measure. Every time data changes in the main notes storage, you will apply the same changes to data in measures. You could also do it only once every time before the playback, or even while playback, in a separate thread.
Of course, you could only store notes in measures, without having to sync two sources of data. But it may be not so convenient to work with when you have to apply mass operations on bunches of notes.
During the playback, before the next measure starts, the following algorithm would happen (roughly):
In every track, find all patterns, for which pattern->startTime <= [current playback second] <= pattern->endTime.
For each pattern, calculate the current measure number and get the vector<Note> for the corresponding measure from the measures map.
Now, until the next measure (second?) starts, you only have to loop through current measure's notes.
Just keep those vectors sorted.
During playback, you can just keep a pointer (index) into each vector for the last note played. To find new notes, you only have to check the following note in each vector; no looping through the notes is required.
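A sketch of such a cursor over a frame-sorted vector of notes (the Note fields here are assumptions based on the question, not a real API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Note { uint64_t frame; int pitch; };

// Per-vector playback cursor.  Because the notes are sorted by frame,
// collecting the notes for the next audio window only advances an index;
// it never rescans the whole vector.
struct Cursor { std::size_t i = 0; };

std::vector<Note> notesInWindow(const std::vector<Note>& notes, Cursor& c,
                                uint64_t winStart, uint64_t winEnd) {
    std::vector<Note> out;
    while (c.i < notes.size() && notes[c.i].frame < winEnd) {
        if (notes[c.i].frame >= winStart) out.push_back(notes[c.i]);
        ++c.i;   // cursor stays put between calls, so each note is visited once
    }
    return out;
}
```

Across an entire song this makes playback O(total notes) rather than O(windows x notes).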
Keep your vectors sorted, and try things out - that is more important than any answer you can receive here.
For all of your questions, you should seek answers with tests and prototypes; then you will know whether you even have a problem. And while trying things out, you will see things that you wouldn't normally see with the theory alone.
and my model looks more or less like this:
Several critically important concepts are missing from your model:
Tempo.
Dynamics.
Pedal
Instrument
Time signature.
(Optional) Tonality.
Effect (Reverberation/chorus, pitch wheel).
Stereo positioning.
Lyrics.
Chord maps.
Composer information/Title.
Each Note has a "frame" parameter so that I can calculate when a note should be played.
Several critically important concepts are missing from your model:
Articulation.
Aftertouch.
Note duration.
I'd advise taking a look at LilyPond. It is typesetting software, but it is also one of the most precise ways to represent music in a human-readable text format.
My question is how I should represent those notes for best performance?
Put them all into a std::map<Timestamp, Note> and find the segment you want to play using lower_bound/upper_bound. Alternatively, you could binary-search them in a flat std::vector, as long as the data is sorted.
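A sketch of that lookup. I've used a std::multimap rather than a plain map, since several notes can start on the same frame (a chord) - that part is my assumption:

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Note { int pitch; };
using Frame = uint64_t;

// Notes keyed by start frame; multimap so that chords (several notes
// starting on the same frame) are representable.
using NoteIndex = std::multimap<Frame, Note>;

// All notes starting in [from, to): two O(log n) lookups, then a walk over
// just the notes in the window - no scan of the whole song.
std::vector<Note> notesBetween(const NoteIndex& idx, Frame from, Frame to) {
    std::vector<Note> out;
    for (auto it = idx.lower_bound(from), end = idx.lower_bound(to); it != end; ++it)
        out.push_back(it->second);
    return out;
}
```

For the example in the question, the third second of a 44100 Hz song would be notesBetween(idx, 132300, 176400).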
Unless you want to make a "beeper", writing a music application is much more difficult than you may think. I'd strongly recommend trying another project.