Time series segmentation - data-mining

I have arrays of time series, averaging about 1000 values per array. I need to independently identify time series segments in each array.
My current approach is to compute the mean gap between consecutive items and start a new segment whenever the elapsed time between two items exceeds that mean. I couldn't find much information on standard ways to accomplish this, and I'm sure there are more appropriate methods.
This is the code that I'm currently using.
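def time_cluster(input)
  sorted = input.sort  # sort a copy so the caller's array is left untouched
  return (sorted.empty? ? [] : [sorted]) if sorted.size < 2
  # Gaps between consecutive items.
  differences = sorted.each_cons(2).map { |a, b| b - a }
  # Array has no built-in #mean, so compute it as a float.
  mean = differences.inject(:+).to_f / differences.size
  clusters = []
  j = 0
  sorted.each_index do |i|
    # Start a new cluster whenever the gap to the previous item exceeds the mean.
    j += 1 if i > 0 && differences[i - 1] > mean
    (clusters[j] ||= []) << sorted[i]
  end
  clusters
end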
A couple of samples from this code
time_cluster([1, 2, 3, 4, 7, 9, 250, 254, 258, 270, 292, 340, 345, 349, 371, 375, 382, 405, 407, 409, 520, 527])
Outputs
1 2 3 4 7 9, sparsity 1.3
250 254 258 270 292, sparsity 8.4
340 345 349 371 375 382 405 407 409, sparsity 7
520 527, sparsity 3
Another array
time_cluster([1, 2, 3, 4 , 5, 6, 7, 8, 9, 10, 1000, 1020, 1040, 1060, 1080, 1200])
Outputs
1 2 3 4 5 6 7 8 9 10, sparsity 0.9
1000 1020 1040 1060 1080, sparsity 16
1200

Use K-Means. http://ai4r.rubyforge.org/machineLearning.html
gem install ai4r
Singular Value Decomposition may also interest you.
http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
If you can't do it in Ruby, here is a great example in Python.
Unsupervised clustering with unknown number of clusters
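For readers who want to see what k-means does on one-dimensional timestamps, here is a minimal Python sketch of Lloyd's algorithm. It is not the ai4r API: the function name kmeans_1d and the evenly spaced starting centroids are assumptions made for illustration. Note that k has to be chosen in advance, which the gap-based approach in the question does not require.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns k clusters as sorted lists."""
    points = sorted(values)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(values)")
    # Deterministic start (an assumption): k evenly spaced points of the sorted data.
    if k == 1:
        centroids = [points[0]]
    else:
        centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (ties go to the lower index).
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return clusters
```

Because k-means minimizes distance to a centroid instead of looking at gaps, it can split one long dense run into two clusters, so its output may differ from the gap-based segmentation above.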

Darknet: Loss decreases but iou stay very low when training 1 class

I'm training Darknet Yolo with 1 class (Have 9000 training examples!), but I have this sample of the output:
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 150 Avg (IOU: 0.000000), count: 1, class_loss = 0.003734, iou_loss = 0.000000, total_loss = 0.003734
The iou remains constant at 0.07, and the class loss is very low.
(next mAP calculation at 1000 iterations)
250: 0.004589, 0.011130 avg loss, 0.000004 rate, 7.071465 seconds, 16000 images, 2.931760 hours left
What is the problem that causes this constant small iou?
Details
The most relevant parts of the yolov4-custom.cfg file:
batch=64
subdivisions=16
width=512
height=512
channels=1
momentum=0.949
decay=0.0005
max_batches = 2000
steps=1600,1800
...
filters=18
activation=linear
[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=1
The obj.data file:
classes = 1
train = /content/darknet/build/darknet/x64/data/train.txt
names = /content/darknet/build/darknet/x64/data/obj.names
backup = /content/darknet/build/darknet/x64/backup/
The obj.names file:
Object
First, you need to change the cfg file. If you have 3 or fewer classes, set max_batches = 6000 and steps=4800,5400. Then set classes in all three [yolo] layers and filters in the layer just before each of them. A low loss is what you should expect to see: it means training is working, and the output you posted suggests your model is training very well.
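The numbers in this answer follow a rule of thumb that can be written down as a small Python helper. This is only a sketch: the function name yolo_training_values is hypothetical, and the optional train_images floor is my reading of the Darknet README guidance (an assumption, it is not stated in the answer above).

```python
def yolo_training_values(classes, train_images=0):
    """Derive cfg values from the class count (rule-of-thumb sketch)."""
    # At least 6000 iterations, 2000 per class, and (assumption) not fewer
    # than the number of training images.
    max_batches = max(6000, classes * 2000, train_images)
    # Learning-rate steps at 80% and 90% of max_batches.
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    # Each [yolo] head predicts 3 anchors x (4 box coords + 1 objectness + classes).
    filters = (classes + 5) * 3
    return {"max_batches": max_batches, "steps": steps, "filters": filters}
```

For one class this gives max_batches = 6000, steps = 4800,5400 and filters = 18, which matches the filters=18 already present in the question's cfg.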

C++ Reverse a smaller range in a vector

What would be the best way to do the following?
I would like to reverse a smaller range in a vector and haven't found any better solution than this: I extract the smaller range to a newly created vector, reverse it and then add the newly created vector at the former position in the original vector.
Another way to explain what I want to do is something like this:
Original vector: 1 2 3 4 5 6 7 8 9 10 11.
Wanted result:   1 2 3 4 5 6 7 10 9 8 11.
1. Copy 10, 9, 8 in that order into a new vector with three elements, or copy elements 8, 9, 10 into a new vector and reverse it. The original vector now consists of eight elements because the elements 8, 9, 10 were erased in the procedure.
2. The new vector with the 3 elements 10, 9, 8 is then copied/appended into the original vector at position 8 as a vector, or element by element at positions 8, 9, 10 respectively.
I am sure there are better solutions than the method mentioned above.
You could in fact write an in-place swap
that takes the first and the last index of the range,
swaps these two elements,
decreases the last index and increases the first index,
and repeats until first_index >= last_index.
Now, that sounds like less copying to me, but as Stroustrup himself once said:
I don't really understand your data structure, but I'm pretty sure that on real hardware, std::vector will kick the shit out of it.
I.e. accessing memory linearly is almost always faster, so the cost of copying a few numbers over to a new vector really isn't that bad, compared to having to jump back and forth, possibly thrashing your CPU cache if the jumps are larger than a cache line size.
Hence, I think for all practical reasons, your implementation is optimal, unless you run out of RAM.
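The in-place swap described at the start of this answer can be sketched in a few lines. The sketch is in Python to keep it short; the function name reverse_range and the inclusive index convention are assumptions. In C++ the same effect is available from std::reverse in <algorithm> applied to an iterator sub-range, for example std::reverse(v.begin() + 7, v.begin() + 10) for the three elements at indices 7 to 9.

```python
def reverse_range(items, first, last):
    """Reverse items[first..last] (both indices inclusive) in place."""
    while first < last:
        # Swap the two ends, then move both indices toward the middle.
        items[first], items[last] = items[last], items[first]
        first += 1
        last -= 1
    return items
```

With the vector from the question, reverse_range([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 7, 9) turns the tail 8 9 10 11 into 10 9 8 11 without a temporary vector.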
I am sorry I was not clear enough. What I was asking for was something better than this:
cout<<"vpc contains:"<
//Create a sub-vector - new_vpc.
vector<PathCoordinates>::const_iterator begin=vpc.begin();
typedef PathCoordinates type;
int iFirst=problemsStartAt;//first index to copy
int iLast=problemsEndAt-1;//last index -1, 11th stays
int iLen=iLast-iFirst;//10-8=2
vector<PathCoordinates> new_vpc;
//Pre-allocate the space needed to write the data directly.
new_vpc.resize(iLen);
memcpy(&new_vpc[0],&vpc[iFirst],iLen*sizeof(PathCoordinates));
cout<<"new_vpc.size():"<<new_vpc.size()<<endl;
for(int i=0;i<new_vpc.size();i++)
{
    cout<<"new_vpc[i]:"<<new_vpc[i].strt_col<<", "<<new_vpc[i].strt_row<<", "<<new_vpc[i].end_col<<", "<<new_vpc[i].end_row<<endl;
}
reverse(new_vpc.begin(),new_vpc.end());
for(int i=0;i<new_vpc.size();i++)
{
    cout<<"new_vpc[i]:"<<new_vpc[i].strt_col<<", "<<new_vpc[i].strt_row<<", "<<new_vpc[i].end_col<<", "<<new_vpc[i].end_row<<endl;
}
//Add sub-vector - new_vpc to main vector - vpc.
copy_n(new_vpc.begin(),new_vpc.size(),&vpc[problemsStartAt]);
//Output
for(int i=0;i<vpc.size();i++)
{
    cout<<"vpc[i]:"<<vpc[i].strt_col<<", "<<vpc[i].strt_row<<", "<<vpc[i].end_col<<", "<<vpc[i].end_row<<endl;
}
/*
Output:
Inside backTrack()8,11
vpc contains:11
vpc contains:11
new_vpc.size():2
new_vpc[i]:265, 185, 100, 105
new_vpc[i]:240, 185, 121, 125
new_vpc[i]:240, 185, 121, 125
new_vpc[i]:265, 185, 100, 105
vpc[i]:440, 288, 460, 303
vpc[i]:440, 263, 460, 225
vpc[i]:440, 238, 498, 210
vpc[i]:388, 185, 459, 155
vpc[i]:363, 185, 823, 171
vpc[i]:338, 185, 823, 425
vpc[i]:308, 185, 308, 144
vpc[i]:290, 185, 65, 193
vpc[i]:240, 185, 121, 125
vpc[i]:265, 185, 100, 105
vpc[i]:228, 700, 80, 750
*/

Numbers between a and b without their permutations

I've written a similar question, which was closed. This time I would like to ask not for code but for an efficiency tip. I haven't written any code yet, but if I can't find a good hint here I'll go and code it the straightforward way. My question:
Suppose you have a function listNums that take a as lower bound and b as upper bound.
For example a=120 and b=400
I want to print numbers between these numbers with one rule. 120's permutations are 102,210,201 etc. Since I've got 120 I would like to skip printing 201 or 210.
Reason: The upper limit can go up to 1020 and reducing the permutations would help the running time.
Again just asking for efficiency tips.
I am not sure how you are handling 0s (eg: after outputting 1 do you skip 10, 100 etc since technically 1=01=001..).
The trick is to select a number such that all its digits are in increasing order (from left to right).
You can do it recursively. At every recursion, add a digit and make sure it is equal to or higher than the one you most recently added.
EDIT: If the generated number is less than the lower limit, permute its digits so that it becomes greater than or equal to the lower limit. If A1A2A3...Ak is your number and it is below the limit, check in turn whether any of A2A1A3...Ak, A3A1A2...Ak, ..., AkA1A2...Ak-1 is within the limits. If the need arises, repeat this step, keeping Ak as the first digit and rearranging A1A2...Ak-1.
Eg: Assume we are selecting 3 digits and lower limit is 99. If the combination is 012, then the lowest permutation that is higher than 99 is 102.
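The idea in this answer (pick each multiset of digits once, then find an arrangement that lies inside the limits) can be illustrated with a brute-force Python sketch. The function name representatives is hypothetical, and two things are assumptions: numbers with different digit counts are treated as distinct (so 1 and 10 are both kept), and the smallest in-range arrangement is the one printed. It tries every permutation, so it only demonstrates the rule and is not the efficient method for very large bounds.

```python
from itertools import combinations_with_replacement, permutations

def representatives(a, b):
    """Return one number in [a, b] per multiset of digits, in ascending order."""
    found = []
    for length in range(len(str(a)), len(str(b)) + 1):
        # Each tuple is a digit multiset in non-decreasing order, e.g. ('0','1','2').
        for digits in combinations_with_replacement("0123456789", length):
            # Arrangements without a leading zero that fall inside the limits.
            candidates = [
                int("".join(p))
                for p in set(permutations(digits))
                if (p[0] != "0" or length == 1) and a <= int("".join(p)) <= b
            ]
            if candidates:
                found.append(min(candidates))
    return sorted(found)
```

For the bounds from the question, representatives(120, 400) contains 120 but neither 201 nor 210, because all three share the digit multiset 0, 1, 2.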
When the lower bound is 0, an answer is given by the set of numbers with non-decreasing digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26, 27, 28, 29, 33, 34, 35, 36, 37, 38, 39, 44, 45, 46, 47, 48, 49, 55, 56, 57, 58, 59, 66, 67, 68, 69, 77, 78, 79, 88, 89, 99, 111, 112...) that fall in the requested range.
This sequence is easily formed by incrementing an integer and, whenever there is a carry, replicating the digit instead of carrying. Example: 73 is followed by 73+1 = 74 (no carry); 79 is followed by 79+1 = 80 (carry), so 88 instead; 22356999 is followed by 22356999+1 = 22357000, hence 22357777.
# Python code
A = 0  # CAUTION: this version only works for A == 0!
B = 1000
N = A
while N < B:
    # A carry leaves zeroes at the end; find the first one
    S = str(N)
    P = S.find('0')
    if P > 0:
        # Replicate the last nonzero digit over the zeroes
        S = S[:P] + (len(S) - P) * S[P-1]
        N = int(S)
        if N >= B:
            break
    # Next candidate
    print(N)
    N += 1
Dealing with a nonzero lower bound is a lot more tricky.

How to pull information from an external source into a game

I am trying to find a way to import stat data into a game in progress via spreadsheets. Here's what I am working with:
Right now, for example, to name the spells, set their stats, etc., and be able to call them by number, I have something like this in the actual code:
void spell(int & eMoney, int eSpell[10])
{
using namespace std;
char spellname[10][25] = {"Minor Heal", "Fire Shard", "Lightening Shard", "Ice Shard", "Magic Barrier", "Essence Of Life",
"Earth Shard", "Wind Shard", "Insigma", "Weaken"};
int spellcost[10] = {50, 80, 80, 80, 100, 100, 80, 80, 120, 80};
This is all fine and dandy, and it works, but it is an issue now and will be a bigger one later. I want to be able to use a spreadsheet, such as a CSV file, so I can have one sheet for just spells, one for just swords, one for just clubs, and so on. I plan to have a very large selection, so it would be ideal to edit a single file in columns and rows and have the game pull the information from that external file when it's needed. I can't figure out how to go about this, and I am open to any ideas.
Here is an example of how I call on a spell's info now:
case 2:
    do
    {
        cout << "Which spell would you like to cast?\n\n";
        for(x=0;x<10;x++)
            cout << x+1 << ". " << spellname[x] << ": " << eSpell[x] << " left" << endl;
        cout << "11. Leave\n\n>> ";
        cin >> decision;
        system("cls");
    }
    // Ask again on out-of-range input or an exhausted spell (11 = Leave is always valid).
    while((decision<1)||(decision>11)||((decision!=11)&&(eSpell[decision-1]==0)));
    switch(decision)
and here is an example of the spread sheet I have in mind basically? Starting at A1:
Type      sName             mDmg  sPrice
Spell 1   Minor Heal        10    100
Spell 2   Fire Shard        12    100
Spell 3   Lightening Shard  12    200
Spell 4   Ice Shard         12    150
Spell 5   Magic Barrier     10    130
Spell 6   Essence Of Life   15    10
Spell 7   Earth Shard       12    120
Spell 8   Wind Shard        12    230
Spell 9   Insigma           12    90
Spell 10  Weaken            12    100
Another Example:
Current Code:
char monsters[16][25] = {"Wolf", "Bear", "Bandit", "Traveler", "Gargoyle", "Knight", "Warlock", "Mammoth", "Cyclops", "Unicorn", "Dragon", "Your Mother", "Demon", "Jesus", "Satan", "God"};
//monster strengths
int monsterdamagemax[16] = {32, 42, 53, 53, 65, 65, 75, 75, 85, 85, 90, 90, 95, 95, 110, 110};
int monsterdamagemin[16] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
int monsterdefensemax[16] = {2, 7, 13, 13, 20, 20, 25, 25, 35, 35, 40, 40, 45, 45, 55, 55};
int monsterdefensemin[16] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
int monsterhealth[16] = {32, 52, 73, 73, 95, 95, 118, 118, 142, 142, 167, 167, 193, 193, 220, 220};
int monsterspeed[16] = {7, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15};
int monstergold[16] = {20, 30, 41, 41, 53, 53, 66, 66, 80, 80, 95, 95, 110, 110, 125, 125};
Ideally, I want to be able to get all that from a CSV file like:
mID  mName        mDmgMax  mDmgMin  mDefMax  mDefMin  mHp  mSpeed  mGold
1    Wolf         32       0        2        0        32   7       20
2    Bear         42       0        7        0        52   8       30
3    Bandit       53       0        13       0        73   9       41
4    Traveler     53       0        13       0        73   9       41
5    Gargoyle     65       0        20       0        95   10      53
6    Knight       65       0        20       0        95   10      53
7    Warlock      75       0        25       0        118  11      66
8    Mammoth      75       0        25       0        118  11      66
9    Cyclops      85       0        35       0        142  12      80
10   Unicorn      85       0        35       0        142  12      80
11   Dragon       90       0        40       0        167  13      95
12   Your Mother  90       0        40       0        167  13      95
13   Demon        95       0        45       0        193  14      110
14   Jesus        95       0        45       0        193  14      110
15   Satan        110      0        55       0        220  15      125
16   God          110      0        55       0        220  15      125
How about writing a small command based application that creates records for you, and in your "main" program that is game, you just have to read these records.
A sample structure -
struct monster
{
int mID;
char mName[25]; //from your code
int mDmgMax;
//and these as well mDmgMin mDefMax mDefMin mHp mSpeed mGold
};
in this "helper" program read each data item (like the mName) in a record one by one, and insert in this structure. Write the structure to monsters.dat file
std::ofstream fout;
fout.open("monsters.dat", std::ios::app | std::ios::binary);
fout.write( (char*) &monsterInstance, sizeof(monsterInstance) );
fout.close();
This will simply append records. (I have skipped error checking and reading data.)
For greater ease, this program should be able to show current monsters, add monster, delete monster (by entering mID).
Reading such records in your main program should be an easy task.
If you're going to have a lot of table-based data to keep around, you might look into using SQLite. It has some interesting costs and benefits.
On the down side (maybe), it's SQL. It can be a bit more complex and depending on your searching algorithm, could be slower. It also can't be edited by hand, you need something to open the database (there are free tools available).
On the up side, you get all the sorting and filtering power of a database (anything you'll need, like spell='fireball' AND damage < 5), and SQLite is fast (easily enough to store game data in, and very possibly faster than your own code). You can store all your data in a single file for easy deployment or modding, with unique tables for each type (weapons, spells, characters, etc), and no server involved (SQLite is a single DLL).
Relational databases excel at working with consistently-formed tables of data, which is exactly what you have in a game environment (each object type has a few fields, not much variation, maybe some blanks, with various data types). SQLite, despite being the smallest database, can handle thousands of rows with excellent time, so you won't have to worry about your game data becoming unwieldy (which happens very quickly with pure text table files, like NWN(2)'s 2DA format).
There is a learning curve to it, but you do gain some simplicity in the future (adding a new object type is a new table and queries, not a whole lot of code) and a very stable data format and load/save library. Depending on your needs, it may be worth a shot.
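To make the table-per-type idea concrete, here is a small sketch using Python's built-in sqlite3 module (the game itself is C++, so this only illustrates the schema and a filtered query). The table name spells, its column names, and the helper names load_spells and affordable_spells are assumptions; the rows are taken from the spreadsheet in the question.

```python
import sqlite3

def load_spells(conn):
    """Create a spells table and fill it with a few rows from the question."""
    conn.execute(
        "CREATE TABLE spells (id INTEGER PRIMARY KEY, name TEXT, damage INTEGER, price INTEGER)"
    )
    conn.executemany(
        "INSERT INTO spells VALUES (?, ?, ?, ?)",
        [
            (1, "Minor Heal", 10, 100),
            (2, "Fire Shard", 12, 100),
            (3, "Lightening Shard", 12, 200),
            (4, "Ice Shard", 12, 150),
            (6, "Essence Of Life", 15, 10),
        ],
    )

def affordable_spells(conn, max_price):
    """Names of all spells costing at most max_price, in id order."""
    rows = conn.execute(
        "SELECT name FROM spells WHERE price <= ? ORDER BY id", (max_price,)
    )
    return [name for (name,) in rows]
```

Adding swords or clubs would mean another CREATE TABLE and another query, with no change to the loading code's overall shape.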
As pointed out in the question comments, you should use <fstream> if you really want to deal with CSV files. With that approach, getline should be enough for what you need.
This thread on C++.com and this question should give you some pointers on how to handle CSV.
I use Boost to parse the CSV files I work with. Here's a simple example.
I agree with peachykeen though, SQLite may suit you better, but maybe this will help you get started.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>
#include <boost/token_functions.hpp>

typedef std::vector<std::string> csvLine;
typedef std::vector<csvLine> csvLines;
typedef boost::tokenizer<boost::escaped_list_separator<char> > csvTokenizer;

// Read a CSV file into a vector of rows, each row being a vector of fields.
csvLines ReadCSVFile(const std::string& fileName)
{
    csvLines retVec;
    std::ifstream inFile(fileName.c_str());
    if(inFile)
    {
        std::string fileLine;
        while(std::getline(inFile, fileLine))
        {
            // Split the line on commas, honoring quotes and escapes.
            csvTokenizer lineTokens(fileLine);
            retVec.push_back(csvLine(lineTokens.begin(), lineTokens.end()));
        }
        inFile.close();
    }
    return retVec;
}

int main(int argc, char** argv)
{
    if(argc < 2)
    {
        std::cerr << "usage: " << argv[0] << " <file.csv>" << std::endl;
        return 1;
    }
    csvLines lines(ReadCSVFile(argv[1]));
    for(csvLines::iterator lineIt = lines.begin(); lineIt != lines.end(); ++lineIt)
    {
        for(csvLine::iterator tokenIt = (*lineIt).begin(); tokenIt != (*lineIt).end(); ++tokenIt)
        {
            std::cout << *tokenIt << " ";
        }
        std::cout << std::endl;
    }
    return 0;
}

Represent sequence of tetrahedral numbers in Haskell

I've been wanting to learn some Haskell for a while now, and I know it and similar languages have really good support for various kinds of infinite lists. So, how could I represent the sequence of tetrahedral numbers in Haskell, preferably with an explanation of what's going on?
0 0 0
1 1 1
2 3 4
3 6 10
4 10 20
5 15 35
6 21 56
7 28 84
8 36 120
In case it's not clear what's going on there, the second column is a running total of the first column, and the third column is a running total of the second column. I'd prefer that the Haskell code retain something of the "running total" approach, since that's the concept I was wondering how to express.
You're correct, Haskell is really nice for doing things like this:
first_col = [0..]
second_col = scanl1 (+) first_col
third_col = scanl1 (+) second_col
first_col is an infinite list of integers, starting at 0
scanl1 (+) calculates a lazy running sum: Prelude docs
We can verify that the above code is doing the right thing:
Prelude> take 10 first_col
[0,1,2,3,4,5,6,7,8,9]
Prelude> take 10 second_col
[0,1,3,6,10,15,21,28,36,45]
Prelude> take 10 third_col
[0,1,4,10,20,35,56,84,120,165]
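As a cross-check on these outputs, the running totals have well-known closed forms. Writing S_n for the entries of second_col and T_n for the entries of third_col (these symbols are introduced here only for illustration):

```latex
S_n = \sum_{k=0}^{n} k = \frac{n(n+1)}{2},
\qquad
T_n = \sum_{k=0}^{n} S_k = \frac{n(n+1)(n+2)}{6} = \binom{n+2}{3}
```

For example, T_8 = 8 * 9 * 10 / 6 = 120 and T_9 = 9 * 10 * 11 / 6 = 165, which agree with the last two values printed for third_col.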
Adding to perimosocordiae's great answer, languages like Haskell are so slick they allow you to make an infinite list of infinite lists.
First let's define the operator that produces each successive row:
op :: [Integer] -> [Integer]
op = scanl1 (+)
As explained by perimosocordiae, this is just a lazy running sum.
We also need a base case:
tnBase :: [Integer]
tnBase = [0..]
So how do we get an infinite list of infinite lists of tetrahedral numbers? We iterate this operation on the base case, then the output produced by the base case, then that output...
tn = iterate op tnBase
iterate is in the Prelude. Such functions can be found using Hoogle, searching either by name (if you have a good guess) or by type signature (you generally know the signature of what you need). Source code is usually linked from the Haddock documentation.
Presentation
(in case you're not comfortable with map, take, drop, and head)
This is all well and good, but rather useless if you don't know how to get past the first infinite list to see the second, third, etc. There are plenty of options. To get just one particular list, you can drop the first few:
getNthTN n = head (drop n tn)
Getting the first few results of each list is probably more what you're looking for though:
printFirstFew n m = print $ take m (map (take n) tn)
Here map (take n) tn will take the first n values from each list of tetrahedral numbers while take m will limit our results to the first m lists.
And lastly, I like the awesome groom package for quick interactive playing with data:
> groom $ take 10 (map (take 10) tn)
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 3, 6, 10, 15, 21, 28, 36, 45],
[0, 1, 4, 10, 20, 35, 56, 84, 120, 165],
[0, 1, 5, 15, 35, 70, 126, 210, 330, 495],
[0, 1, 6, 21, 56, 126, 252, 462, 792, 1287],
[0, 1, 7, 28, 84, 210, 462, 924, 1716, 3003],
[0, 1, 8, 36, 120, 330, 792, 1716, 3432, 6435],
[0, 1, 9, 45, 165, 495, 1287, 3003, 6435, 12870],
[0, 1, 10, 55, 220, 715, 2002, 5005, 11440, 24310],
[0, 1, 11, 66, 286, 1001, 3003, 8008, 19448, 43758]]