High-performance way to compile regular expressions in AS3?

I have a pretty simple question. What is the best (highest-performance/lowest memory-usage, etc.) way to compile a regular expression in AS3?
For instance, is this:
private var expression:RegExp = new RegExp(".*a$");
private function modify():void {
/* uses "expression" to chop up string */
}
Faster than this:
private var expression:RegExp = /.*a$/;
private function modify():void {
/* uses "expression" to chop up string */
}
Also, is there any real need to make the expression an instance variable if I'm only going to be using it once? For example, which of the following blocks of code would, in theory, perform faster:
private var myRegEx:RegExp = /\n/;
private function modify1():void {
myString.split(/\n/);
}
private function modify2():void {
myString.split(myRegEx);
}
Will modify1() run at the same execution speed as modify2()? I mean, does AS3 compile a new RegExp instance in modify1(), since it's not tied down to an instance variable?
Any help would be most appreciated :)

Your test is not very good. Here's why:
1. getTimer measures wall-clock time, but what actually matters is CPU time. If at some moment, for some reason, the scheduler decides not to run the Flash Player, you get less CPU time within the same time frame. This is why results vary. It is the best tool you have, but it is not much help if you're trying to track deviations of a few percent.
2. The deviation is really small: about 8%. Part of it stems from the effect described in point 1. When I ran the tests, the results did in fact vary. 8% can come from anywhere. It may simply depend on your machine, OS, minor player version or whatever. The result is just not significant enough to rely on. Also, an 8% speedup is not really worth consideration unless you find a horrible bottleneck in your string processing that can be fixed by this or that trick with RegExps.
3. Most importantly: the difference you measure has nothing to do with regular expressions, only with everything else in the test.
Let me explain this in detail.
Try public function test7():void{}. On my machine it takes about 30%-40% of the time of the other tests. Let's have some numbers:
Running Tests
-------------------------------
Testing method: test1, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01716ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 901ms
-------------------------------
Testing method: test2, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01706ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 892ms
-------------------------------
Testing method: test3, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01868ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 969ms
-------------------------------
Testing method: test4, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01846ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 966ms
-------------------------------
Testing method: test5, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01696ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 890ms
-------------------------------
Testing method: test6, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01696ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 893ms
-------------------------------
Testing method: test7, 50000 iterations...
Test Complete.
Average Iteration Time: 0.00572ms
Longest Iteration Time: 1ms
Shortest Iteration Time: 0ms
Total Test Time: 306ms
-------------------------------
But why? The following few things are expensive:
getTimer(): calls to global functions (and static methods of other classes) are slow.
(tester[methodName] as Function).apply(); is expensive: a dynamic property access requiring a closure creation, then a cast to an anonymous function, and then invocation through apply. I couldn't think of a slower way to call a function.
var tester:RegExpTester = new RegExpTester(); is expensive too, since instantiation requires allocation and initialization.
The following code will run significantly better. The overhead measured by test7 is reduced by a factor of 20 on my machine:
private function test(methodName:String, iterations:int = 100):void {
output("Testing method: " + methodName + ", " + iterations + " iterations...");
var start:Number = getTimer();
var tester:RegExpTester = new RegExpTester();
var f:Function = tester[methodName];
for (var i:uint = 0; i < iterations; i++) f();//this call to f still is slower than the direct method call would be
var wholeTime:Number = getTimer() - start;
output("Test Complete.");
output("\tAverage Iteration Time: " + (wholeTime / iterations) + "ms");
output("\tTotal Test Time: " + wholeTime + "ms");
output("-------------------------------");
}
Again, some numbers:
Running Tests
-------------------------------
Testing method: test1, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01094ms
Total Test Time: 547ms
-------------------------------
Testing method: test2, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01094ms
Total Test Time: 547ms
-------------------------------
Testing method: test3, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01296ms
Total Test Time: 648ms
-------------------------------
Testing method: test4, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01288ms
Total Test Time: 644ms
-------------------------------
Testing method: test5, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01086ms
Total Test Time: 543ms
-------------------------------
Testing method: test6, 50000 iterations...
Test Complete.
Average Iteration Time: 0.01086ms
Total Test Time: 543ms
-------------------------------
Testing method: test7, 50000 iterations...
Test Complete.
Average Iteration Time: 0.00028ms
Total Test Time: 14ms
-------------------------------
So now the overhead is reduced to less than 3%, which makes it insignificant (although in fact it can be reduced a lot more). However, the deviation is now 16%. That's twice as much, and it's starting to look a little clearer. It is still insignificant, IMHO, but it does point to the two slowest methods: test3 and test4.
Why would that be? Simple: both methods create a new RegExp object on every call (test3 using the constructor, test4 using a literal). This accounts for the time difference we can measure. The difference is bigger now, since before, each iteration created up to 3 regular expressions (the two instance variables are initialized every time you instantiate a RegExpTester). But the difference that is left now is that of creating 50000 RegExp instances. Everything else is about equally fast.
If there's a conclusion to be drawn in answer to your question: there is no difference between literals and constructed RegExps. So I am afraid the answer is: "It doesn't really matter, as long as you keep general performance optimization rules in mind." Hope that helps.

For the given scenario, I wrote a test class which gives me all the info I need on which type of regular expression to use:
package {
import flash.utils.getTimer;
import flash.text.TextFormat;
import flash.text.TextField;
import flash.display.Sprite;
public class RegExpTest extends Sprite {
private var textfield:TextField;
public function RegExpTest() {
this.textfield = new TextField();
this.textfield.x = this.textfield.y = 10;
this.textfield.width = stage.stageWidth - 20;
this.textfield.height = stage.stageHeight - 20;
this.textfield.defaultTextFormat = new TextFormat("Courier New");
this.addChild(textfield);
this.runtests();
}
private function runtests():void {
output("Running Tests");
output("-------------------------------");
test("test1", 50000);
test("test2", 50000);
test("test3", 50000);
test("test4", 50000);
test("test5", 50000);
test("test6", 50000);
}
private function test(methodName:String, iterations:int = 100):void {
output("Testing method: " + methodName + ", " + iterations + " iterations...");
var wholeTimeStart:Number = getTimer();
var iterationTimes:Array = [];
for (var i:uint = 0; i < iterations; i++) {
var iterationTimeStart:Number = getTimer();
var tester:RegExpTester = new RegExpTester();
// run method.
(tester[methodName] as Function).apply();
var iterationTimeEnd:Number = getTimer();
iterationTimes.push(iterationTimeEnd - iterationTimeStart);
}
var wholeTimeEnd:Number = getTimer();
var wholeTime:Number = wholeTimeEnd - wholeTimeStart;
var average:Number = 0;
var longest:Number = 0;
var shortest:Number = int.MAX_VALUE;
for each (var iteration:int in iterationTimes) {
average += iteration;
if (iteration > longest)
longest = iteration;
if (iteration < shortest)
shortest = iteration;
}
average /= iterationTimes.length;
output("Test Complete.");
output("\tAverage Iteration Time: " + average + "ms");
output("\tLongest Iteration Time: " + longest + "ms");
output("\tShortest Iteration Time: " + shortest + "ms");
output("\tTotal Test Time: " + wholeTime + "ms");
output("-------------------------------");
}
private function output(message:String):void {
this.textfield.appendText(message + "\n");
}
}
}
class RegExpTester {
private static const expression4:RegExp = /.*a$/;
private static const expression3:RegExp = new RegExp(".*a$");
private var value:String = "There is a wonderful man which is quite smelly.";
private var expression1:RegExp = new RegExp(".*a$");
private var expression2:RegExp = /.*a$/;
public function RegExpTester() {
}
public function test1():void {
var result:Array = value.split(expression1);
}
public function test2():void {
var result:Array = value.split(expression2);
}
public function test3():void {
var result:Array = value.split(new RegExp(".*a$"));
}
public function test4():void {
var result:Array = value.split(/.*a$/);
}
public function test5():void {
var result:Array = value.split(expression3);
}
public function test6():void {
var result:Array = value.split(expression4);
}
}
The results retrieved by running this example are as follows:
Running Tests
-------------------------------
Testing method: test1, 50000 iterations...
Test Complete.
Average Iteration Time: 0.0272ms
Longest Iteration Time: 23ms
Shortest Iteration Time: 0ms
Total Test Time: 1431ms
-------------------------------
Testing method: test2, 50000 iterations...
Test Complete.
Average Iteration Time: 0.02588ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 1367ms
-------------------------------
Testing method: test3, 50000 iterations...
Test Complete.
Average Iteration Time: 0.0288ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 1498ms
-------------------------------
Testing method: test4, 50000 iterations...
Test Complete.
Average Iteration Time: 0.0291ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 1495ms
-------------------------------
Testing method: test5, 50000 iterations...
Test Complete.
Average Iteration Time: 0.02638ms
Longest Iteration Time: 5ms
Shortest Iteration Time: 0ms
Total Test Time: 1381ms
-------------------------------
Testing method: test6, 50000 iterations...
Test Complete.
Average Iteration Time: 0.02666ms
Longest Iteration Time: 10ms
Shortest Iteration Time: 0ms
Total Test Time: 1382ms
-------------------------------
Interesting to say the least. It seems that our spread really isn't too big and that the compiler is probably doing something behind the scenes to statically compile regular expressions. Food for thought.

Adafruit RTClib TimeSpan calculation fails on ESP32

I'm currently trying to measure remaining time until a given timestamp using Adafruit's RTClib library. My RTC module is a DS3231.
here is my TimeSpan object in code:
TimeSpan remaining = next.depart - rtc.now();
However when i try to print the remaining minutes i get no data.
Here is my code that prints the output:
Serial.println(String("current time: " + rtc.now().timestamp()));
Serial.println(String("targetTime: " + next.depart.timestamp()));
Serial.println(String("remaining time: " + remaining.minutes()));
And the output is right except for the remaining time in minutes:
current time: 2020-12-06T05:38:55
target time: 2020-12-06T05:42:30
aining time:
Notice that the last line is cut off in the serial output and the remaining minutes aren't displayed. current time and target time are both correct in the readout.
I can't perform any operations with the remaining time either:
if(remaining.minutes() >= 10)
In this case the condition is never met.
Am i missing something?

Your line of code:
Serial.println(String("remaining time: " + remaining.minutes()));
is not doing what you think it's doing.
remaining.minutes() returns type int8_t, not type String. You're adding it to a C character pointer to the C string "remaining time: ". If the value is greater than the length of that string, then the resulting pointer is invalid and you're lucky your program doesn't crash.
For instance, if remaining.minutes() were 3 then your output would be:
aining time:
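The pointer arithmetic described above can be reproduced in plain C++ without any Arduino types. This is only a minimal sketch: the helper names literalPlusInt and literalThenNumber are made up for illustration, and std::string stands in for Arduino's String.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: mimics what `"remaining time: " + n` evaluates to.
// Adding an integer to a string literal moves the pointer forward by n
// characters; it does not append the number as text.
std::string literalPlusInt(const char* literal, int n) {
    // Stepping past the terminating '\0' would be undefined behaviour,
    // so this sketch refuses offsets outside the literal.
    assert(n >= 0 && static_cast<std::size_t>(n) <= std::strlen(literal));
    const char* shifted = literal + n;  // pointer arithmetic, not concatenation
    return std::string(shifted);
}

// Hypothetical helper: what was intended. The number is converted to text
// first and only then concatenated.
std::string literalThenNumber(const char* literal, int n) {
    return std::string(literal) + std::to_string(n);
}
```

With an offset of 3, literalPlusInt("remaining time: ", 3) skips the first three characters, which matches the truncated line in the serial output.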
Instead your code should look more like this:
Serial.println("remaining time: " + String(remaining.minutes()));
or better:
Serial.print("remaining time: ");
Serial.println(remaining.minutes());
The second form has the benefit of avoiding unnecessary String object instantiations and memory allocations.
The reason your two lines:
Serial.println(String("current time: " + rtc.now().timestamp()));
Serial.println(String("targetTime: " + next.depart.timestamp()));
work is that the timestamp() method returns a String, so adding a C character string to it results in string concatenation rather than adding an integer to a character pointer.
In these two cases your enclosing call to String() is superfluous and should be avoided:
Serial.println("current time: " + rtc.now().timestamp());
Serial.println("targetTime: " + next.depart.timestamp());
You're already computing String values; there's no need to make new String objects from them.
To answer your other question, find out what value remaining.minutes() actually has. It's likely an issue with the way remaining is computed, and that is a matter for a different question.

CPLEX - Error in accessing solution C++

I have a problem in accessing the solution of a LP problem.
This is the output of CPLEX after calling cplex.solve();
CPXPARAM_MIP_Strategy_CallbackReducedLP 0
Found incumbent of value 0.000000 after 0.00 sec. (0.70 ticks)
Tried aggregator 1 time.
MIP Presolve eliminated 570 rows and 3 columns.
MIP Presolve modified 88 coefficients.
Reduced MIP has 390 rows, 29291 columns, and 76482 nonzeros.
Reduced MIP has 29291 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.06 sec. (49.60 ticks)
Tried aggregator 1 time.
Reduced MIP has 390 rows, 29291 columns, and 76482 nonzeros.
Reduced MIP has 29291 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.04 sec. (31.47 ticks)
Probing time = 0.02 sec. (1.36 ticks)
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 8 threads.
Root relaxation solution time = 0.03 sec. (17.59 ticks)
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
* 0+ 0 0.0000 -395.1814 ---
* 0+ 0 -291.2283 -395.1814 35.69%
* 0 0 integral 0 -372.2283 -372.2283 201 0.00%
Elapsed time = 0.21 sec. (131.64 ticks, tree = 0.00 MB, solutions = 3)
Root node processing (before b&c):
Real time = 0.21 sec. (133.18 ticks)
Parallel b&c, 8 threads:
Real time = 0.00 sec. (0.00 ticks)
Sync time (average) = 0.00 sec.
Wait time (average) = 0.00 sec.
------------
Total (root+branch&cut) = 0.21 sec. (133.18 ticks)
However when I call cplex.getValues(values, variables); the program gives a SIGABRT signal throwing the following exception:
libc++abi.dylib: terminating with uncaught exception of type IloAlgorithm::NotExtractedException
This is my code. What am I doing wrong?
std::vector<links_t> links(pointsA.size()*pointsB.size());
std::unordered_map<int, std::vector<std::size_t> > point2DToLinks;
for(std::size_t i=0; i<pointsA.size(); ++i){
for(std::size_t j=0; j<pointsB.size(); ++j){
std::size_t index = (i*pointsA.size()) + j;
links[index].from = i;
links[index].to = j;
links[index].value = cv::norm(pointsA[i] - pointsB[j]);
point2DToLinks[pointsA[i].point2D[0]->id].push_back(index);
point2DToLinks[pointsA[i].point2D[1]->id].push_back(index);
point2DToLinks[pointsA[i].point2D[2]->id].push_back(index);
point2DToLinks[pointsB[j].point2D[0]->id].push_back(index);
point2DToLinks[pointsB[j].point2D[1]->id].push_back(index);
point2DToLinks[pointsB[j].point2D[2]->id].push_back(index);
}
}
std::size_t size = links.size() + point2DToLinks.size();
IloEnv environment;
IloNumArray coefficients(environment, size);
for(std::size_t i=0; i<links.size(); ++i) coefficients[i] = links[i].value;
for(std::size_t i=links.size(); i<size; ++i) coefficients[i] = -lambda;
IloNumVarArray variables(environment, size, 0, 1, IloNumVar::Bool);
IloObjective objective(environment, 0.0, IloObjective::Minimize);
objective.setLinearCoefs(variables, coefficients);
IloRangeArray constrains = IloRangeArray(environment);
std::size_t counter = 0;
for(auto point=point2DToLinks.begin(); point!=point2DToLinks.end(); point++){
IloExpr expression(environment);
const std::vector<std::size_t> & inLinks = point->second;
for(std::size_t j=0; j<inLinks.size(); j++) expression += variables[inLinks[j]];
expression -= variables[links.size() + counter];
constrains.add(IloRange(environment, 0, expression));
expression.end();
++counter;
}
IloModel model(environment);
model.add(objective);
model.add(constrains);
IloCplex cplex(model);
cplex.solve();
if(cplex.getStatus() != IloAlgorithm::Optimal){
fprintf(stderr, "error: cplex terminate with an error.\n");
abort();
}
IloNumArray values(environment, size);
cplex.getValues(values, variables);
for(std::size_t i=0; i<links.size(); ++i)
if(values[i] > 0) pairs.push_back(links[i]);
environment.end();

This is an error that happens if you ask CPLEX for the value of a variable that CPLEX does not have in its model. When you build the model, it is not enough to just declare and define the variable for it to be included in the model. It also has to be part of one of the constraints or the objective in the model. Any variable that you declare/define that is NOT included in one of the constraints or the objective will therefore not be in the set of variables that gets extracted into the inner workings of CPLEX. There are two obvious things that you can do to resolve this.
First, you can try to get the variable values inside a loop over the variables and test whether each is actually in the CPLEX model. I think it is something like cplex.isExtracted(var). Do something simple like print a message when you come across a variable that is not extracted, telling you which variable is causing the problem.
Secondly you can export the model from CPLEX as an LP format file and check it manually. This is a very useful way to see what is actually in your model rather than what you think is in your model.

How to test a sleep function in golang

I have written my own sleep function and want to test it. Following is my code:
func TestSleep(t *testing.T) {
start := time.Now()
mySleepFunction(65)
end := time.Now()
if (end - start) != 65 {
t.Error("Incorrect sleep function")
}
}
This is not working. I am trying to get start time and end time and then compare it with expected time. The expected time will be in seconds. I tried end.Sub(start) but this returns me something like 1m30.0909, instead I need 90 as a result. It would be great if someone could help me.
Thanks :)

Elapsed time:
The easiest way to get the elapsed time since a specific time (start) is the time.Since() function. It returns a time.Duration, whose Duration.Seconds() method gives the duration in seconds as a float64.
So in your case the elapsed seconds:
sec := time.Since(start).Seconds()
Now on to testing...
When testing sleep-like functions you should take into consideration that after the sleep it is not guaranteed that the code will continue to execute immediately. For example quoting from the doc of time.Sleep():
Sleep pauses the current goroutine for at least the duration d.
So when writing test code for sleep-like functions, I would test like this:
the elapsed time should be at least the specified duration,
and it may exceed it only by some error margin.
So for example test it like this:
func TestSleep(t *testing.T) {
const secSleep = 65.0
start := time.Now()
mySleepFunction(int(secSleep))
sec := time.Since(start).Seconds()
if sec < secSleep || sec > secSleep*1.05 {
t.Error("Incorrect sleep function")
}
}

If you use the Time.Unix() method (as in time.Now().Unix()), which gives you the number of seconds since the epoch (January 1, 1970 UTC), then your test should work.
func TestSleep(t *testing.T) {
start := time.Now().Unix()
mySleepFunction(65)
end := time.Now().Unix()
if (end - start) != 65{
t.Error("Incorrect sleep function")
}
}
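Currently, your code subtracts two time.Time values, which the - operator does not support. Your test wants simple integer values that represent seconds. Note that Unix() truncates to whole seconds, so an exact comparison can be off by one when the sleep ends just after a second boundary.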

Filter strange C++ multimap values

I have this multimap in my code:
multimap<long, Note> noteList;
// notes are added with this method. measureNumber is minimum `1` and doesn't go very high
void Track::addNote(Note &note) {
long key = note.measureNumber * 1000000 + note.startTime;
this->noteList.insert(make_pair(key, note));
}
I'm encountering problems when I try to read the notes from the last measure. In this case the song has only 8 measures and it's measure number 8 that causes problems. If I go up to 16 measures it's measure 16 that causes the problem and so on.
// (when adding notes I use as key the measureNumber * 1000000. This searches for notes within the same measure)
for(noteIT = trackIT->noteList.lower_bound(this->curMsr * 1000000); noteIT->first < (this->curMsr + 1) * 1000000; noteIT++){
if(this->curMsr == 8){
cout << "_______________________________________________________" << endl;
cout << "ID:" << noteIT->first << endl;
noteIT->second.toString();
int blah = 0;
}
// code left out here that processes the notes
}
I have only added one note to the 8th measure and yet this is the result I'm getting in console:
_______________________________________________________
ID:8000001
note toString()
Duration: 8
Start Time: 1
Frequency: 880
_______________________________________________________
ID:1
note toString()
Duration: 112103488
Start Time: 44
Frequency: 0
_______________________________________________________
ID:8000001
note toString()
Duration: 8
Start Time: 1
Frequency: 880
_______________________________________________________
ID:1
note toString()
Duration: 112103488
Start Time: 44
Frequency: 0
This keeps repeating. The first result is a correct note which I've added myself but I have no idea where the note with ID: 1 is coming from.
Any ideas how to avoid this? This loop gets stuck repeating the same two results and I can't get out of it. Even if there are several notes within measure 8 (meaning several values within the multimap that start with 8xxxxxx), it only repeats the first note and the non-existent one.

You aren't checking for the end of your loop correctly. Specifically, there is no guarantee that noteIT does not equal trackIT->noteList.end(), and dereferencing the end iterator is undefined behavior. Try this instead:
for (noteIT = trackIT->noteList.lower_bound(this->curMsr * 1000000);
noteIT != trackIT->noteList.end() &&
noteIT->first < (this->curMsr + 1) * 1000000;
++noteIT)
{
From the look of it, it might be better to use a call to upper_bound as the limit of your loop. That would handle the end case automatically.
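A minimal sketch of the upper_bound idea, assuming the key layout from the question (measureNumber * 1000000 + startTime). The Note struct, the constant kKeysPerMeasure and the function keysInMeasure are stand-ins invented for this example, not the asker's actual code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for the question's Note class.
struct Note {
    long measureNumber;
    long startTime;
};

// Assumed key layout from the question: measureNumber * 1000000 + startTime.
const long kKeysPerMeasure = 1000000;

// Collects the keys of all notes that fall inside one measure.
std::vector<long> keysInMeasure(const std::multimap<long, Note>& notes, long measure) {
    assert(measure >= 1);
    // First note whose key is at or after the start of the measure.
    auto first = notes.lower_bound(measure * kKeysPerMeasure);
    // First note whose key is beyond the last key of the measure. For the
    // last measure this is simply end(), so end() is never dereferenced.
    auto last = notes.upper_bound((measure + 1) * kKeysPerMeasure - 1);
    std::vector<long> keys;
    for (auto it = first; it != last; ++it) {
        keys.push_back(it->first);
    }
    return keys;
}
```

Because both bounds are iterators into the same container, the loop condition is a plain iterator comparison and no separate end() check is needed.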

Timing a function in microseconds

Hey guys I'm trying to time some search functions I wrote in microseconds, and it needs to take long enough to get it to show 2 significant digits. I wrote this code to time my search function but it seems to go too fast. I always end up getting 0 microseconds unless I run the search 5 times then I get 1,000,000 microseconds. I'm wondering if I did my math wrong to get the time in micro seconds, or if there's some kind of formatting function I can use to force it to display two sig figs?
clock_t start = clock();
index = sequentialSearch.Sequential(TO_SEARCH);
index = sequentialSearch.Sequential(TO_SEARCH);
clock_t stop = clock();
cout << "number found at index " << index << endl;
int time = (stop - start)/CLOCKS_PER_SEC;
time = time * SEC_TO_MICRO;
cout << "time to search = " << time<< endl;

You are using integer division on this line:
int time = (stop - start)/CLOCKS_PER_SEC;
I suggest using a double or float type, and you'll likely need to cast the components of the division.
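To make the truncation concrete, here is a small sketch that compares the integer computation from the question with a floating-point one. Both function names are made up for this example, and the clock rate is passed in as a parameter so the arithmetic can be checked independently of the platform's CLOCKS_PER_SEC.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: the computation from the question. The integer
// division happens first, so any tick count below one full second
// becomes 0 before the multiplication.
long truncatedMicroseconds(long ticks, long clocksPerSec) {
    assert(clocksPerSec > 0);
    return ticks / clocksPerSec * 1000000L;
}

// Hypothetical helper: the same computation in floating point, so the
// fractional part of a second is kept.
double ticksToMicroseconds(std::clock_t ticks, double clocksPerSec = CLOCKS_PER_SEC) {
    assert(clocksPerSec > 0);
    return static_cast<double>(ticks) / clocksPerSec * 1e6;
}
```

In the original snippet this would amount to something like ticksToMicroseconds(stop - start). The resolution is still limited by how often clock() ticks on the given platform.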

Use QueryPerformanceCounter and QueryPerformanceFrequency, assuming you're on a Windows platform.
See the Microsoft KB article "How To Use QueryPerformanceCounter to Time Code".
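For readers who are not on Windows, a portable alternative is std::chrono::steady_clock from the C++11 standard library. This is a different technique from QueryPerformanceCounter, shown here only as a sketch. The function name averageMicroseconds is made up, and repeating the work is an assumption about how to get a duration long enough to measure.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` `repeats` times and returns the average
// time per run in microseconds. Repeating the work makes the total long
// enough to be measured with a useful number of significant digits.
template <typename Work>
double averageMicroseconds(Work work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    // A floating-point duration keeps fractions of a microsecond.
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / repeats;
}
```

A call might look like averageMicroseconds([&] { index = sequentialSearch.Sequential(TO_SEARCH); }, 1000). Keep in mind that an optimizing compiler may remove work whose result is never used.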
