How to read regex2dfa output - regex

I am messing around with this regex2dfa library -> https://github.com/kpdyer/regex2dfa, using the command ./regex2dfa -r "(abc+)+"
This returns
0 2 97 97
1 3 99 99
2 1 98 98
3 2 97 97
3 3 99 99
3
Looking at this
https://lambda.uta.edu/cse5317/spring01/notes/node8.html
and using the DFA generated here for the regex (abc+)+
http://hackingoff.com/compilers/regular-expression-to-nfa-dfa
I can't seem to figure out how to go from the diagram, to the transition table(?) that the regex2dfa tool is outputting.
What am I missing?

Related

KDB moving percentile using Swin function

I am trying to create a list of the 99th and 1st percentiles. Rather than a single percentile for today. I wanted percentiles for 500 days each using the prior 500 days. The functions I was using for this are the following
swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
percentile:{[x;y] y (100 xrank y:asc y) bin x}
swin[percentile[99;];500;List].
The issue I come across is that the 99th percentile calculates perfectly, but the 1st percentile makes the entire list = 0. a bit lost as to why it would do that. suggestions appreciated!
What's causing the zeros is two-fold:
What behaviour do you want for the earliest 500 days when there isn't 500 days of history to work with? On day 1 there's only 1 datapoint, on day 2 only 2 etc. Only on the 500th day is there 500 days of actual data to work with. By default that swin function fills the gaps with some seed value
You're using zero as that seed value, aka w#0
For example a 5 day lookback on each date looks something like:
q)swin[::;5;1 2 3 4 5]
0 0 0 0 1
0 0 0 1 2
0 0 1 2 3
0 1 2 3 4
1 2 3 4 5
You have zeros until you have data, so naturally the 1st percentile will pick up the zeros for the first roughly 500 dates.
So then you can decide to seed with a different value, or else possibly exclude zeros from your percentile function:
q)List:1000?1000
q)percentile:{[x;y] y (100 xrank y:asc y except 0) bin x}
q)swin[percentile[1;];500;List]
908 360 360 257 257 257 90 90 90 90 90 90 90 90...
If zeros are a legitimate value in your list and can't be excluded then maybe seed the swin with some other value that you know won't be in the list (negatives? infinity? null?) and then exclude that seed from the percentile function.
EDIT: A final alternative is to use a different sliding window function which doesn't fill gaps with a seed value, e.g.
q)swin2:{[f;w;s] f each(),/:{neg[x]sublist y,z}[w]\[s]}
q)swin2[::;5;1 2 3 4 5]
,1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
q)percentile:{[x;y] y (100 xrank y:asc y) bin x}
q)swin2[percentile[99;];500;List]
908 908 908 908 908 908 908 908 908 908 908 959 959..
q)swin2[percentile[1;];500;List]
908 360 360 257 257 257 90 90 90 90 90 90 90 90 90..

Working with files I/O for beginners [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
Hi all I am working on a school beginners project using files I/O in C++,
This program consist of two parts:
1) reading and processing a student data file, and writing the results to a student report file
2) modifying part 1 to calculate some statistics and writing them to another file.
For this assignment, you will be reading one input file and writing out two other files.
Your program will be run using the referenced student data file.
Part 1 Detail
Read in the student data file. This 50 record file consists of a (8-digit numeric) student id, 8 assignment's points, midterm points, final points and lab exercise points. You must again follow the syllabus specifications for the determination of the letter grade, this time, processing 50 student grades. Extra credit points are not applicable for this assignment. You will write the input student data and the results of the processing into a student report file that looks like the output shown below. In addition to the input student data, the report should contain the "total" of the assignment grades, the total and percent of all points achieved, and the letter grade. You may assume that the input data file does not contain any errant data.
The file looks like the one below:
The file that we need to read from is hyperlinked here
The student report output file should look like this:
The Student Report Output File
Student --- Asignment Grades -- Ass Mid Fin LEx Total Pct Gr
-------- -- -- -- -- -- -- -- -- --- --- --- --- ----- --- --
56049257 16 16 20 16 12 15 12 20 115 58 123 59 355 89 B+
97201934 19 15 13 19 16 12 13 18 113 72 101 55 341 85 B
93589574 13 16 19 19 18 12 6 14 111 58 108 50 327 82 B
85404010 17 19 19 19 19 10 17 19 129 70 102 58 359 90 A-
99608681 11 15 19 19 17 10 16 19 116 42 117 57 332 83 B
84918110 11 20 18 17 12 8 12 19 109 46 122 31 308 77 C
89307179 16 16 19 18 14 17 15 19 120 56 117 52 345 86 B
09250373 15 15 18 18 11 18 17 19 120 44 106 51 321 80 B-
91909583 12 14 16 19 20 11 20 16 117 66 92 50 325 81 B-
...
Part 2 Detail
Write a summary report file that contains the average total points and average percent for all students. Also, display the number of A's, B's, C's, D's and F's for the students. Your summary output file should look something like this:
The average total points = ???
The average percent total = ??
The number of A's = ??
The number of B's = ??
The number of C's = ??
The number of D's = ??
The number of F's = ??
Additional requirements
All files must be checked for a successful open. They should also be closed when you are finished with them.
Make sure you write the student id with a leading 0, if appropriate (i.e. the 8th id).
Add headings to your output report file. They should be aligned and correctly identify the column data.
Do not use global variables, except for constants, in your solution.
For part 1 How do I duplicate the file and format it to add the headings above it and the grades at the end of each file into the new duplicated file??
Any help in this matter would be appreciated
thanks in advance.
Engineering is all about converting a large complex problem into many smaller, easy to solve, problems.
Here is how I would start.
1.) Open input file.
2.) Read one line from input file.
3.) Break the input string from one line into values.
4.) Close input file.
5.) Open output file.
6.) Write results to output file.
References:
1.)File I/O
2.)std::string
3.)File I/O C
Now you're pretty much there. Take it one step at a time.

Pandas quantile failing with NaN's present

I've encountered an interesting situation while calculating the inter-quartile range. Assuming we have a dataframe such as:
import pandas as pd
index=pd.date_range('2014 01 01',periods=10,freq='D')
data=pd.np.random.randint(0,100,(10,5))
data = pd.DataFrame(index=index,data=data)
data
Out[90]:
0 1 2 3 4
2014-01-01 33 31 82 3 26
2014-01-02 46 59 0 34 48
2014-01-03 71 2 56 67 54
2014-01-04 90 18 71 12 2
2014-01-05 71 53 5 56 65
2014-01-06 42 78 34 54 40
2014-01-07 80 5 76 12 90
2014-01-08 60 90 84 55 78
2014-01-09 33 11 66 90 8
2014-01-10 40 8 35 36 98
# test for q1 values (this works)
data.quantile(0.25)
Out[111]:
0 40.50
1 8.75
2 34.25
3 17.50
4 29.50
# break it by inserting row of nans
data.iloc[-1] = pd.np.NaN
data.quantile(0.25)
Out[115]:
0 42
1 11
2 34
3 12
4 26
The first quartile can be calculated by taking the median of values in the dataframe that fall below the overall median, so we can see what data.quantile(0.25) should have yielded. e.g.
med = data.median()
q1 = data[data<med].median()
q1
Out[119]:
0 37.5
1 8.0
2 19.5
3 12.0
4 17.0
It seems that quantile is failing to provide an appropriate representation of q1 etc. since it is not doing a good job of handling the NaN values (i.e. it works without NaNs, but not with NaNs).
I thought this may not be a "NaN" issue, rather it might be quantile failing to handle even-numbered data sets (i.e. where the median must be calculated as the mean of the two central numbers). However, after testing with dataframes with both even and odd-numbers of rows I saw that quantile handled these situations properly. The problem seems to arise only when NaN values are present in the dataframe.
I would like to use quntile to calculate the rolling q1/q3 values in my dataframe, however, this will not work with NaN's present. Can anyone provide a solution to this issue?
Internally, quantile uses numpy.percentile over the non-null values. When you change the last row of data to NaNs you're essentially left with an array array([ 33., 46., 71., 90., 71., 42., 80., 60., 33.]) in the first column
Calculating np.percentile(array([ 33., 46., 71., 90., 71., 42., 80., 60., 33.]) gives 42.
From the docstring:
Given a vector V of length N, the qth percentile of V is the qth ranked
value in a sorted copy of V. A weighted average of the two nearest
neighbors is used if the normalized ranking does not match q exactly.
The same as the median if q=50, the same as the minimum if q=0
and the same as the maximum if q=100.

Merge dataset without common variable (By)?

Currently I have two datasets with similar variable lists. Each dataset has a procedure variable. I want to compare the frequency of the procedure variable between datasets. I created a flag in both datasets to id the source dataset, and was going to merge but don't have a common identifier. How do I merge a dataset without deleting any observations? This isn't just a simple Merge without a By function, right?
Currently have:
Data.a Data.b
pproc proc1_numb
70 9
71 15
77 24
80 80
81 42
83 71
86 66
87 125
121 159
125 242
Want Output:
pproc freq
9 1
15 1
24 1
42 1
66 1
70 1
71 2
77 1
80 2
81 1
83 1
86 1
87 1
121 1
125 2
159 1
242 1
If I understand your question properly, you should just concatenate the two datasets into one and rename the variable. Then you can use PROC MEANS to get the frequencies. Something like this:
data all;
set a
b(rename=(proc1_numb=pproc));
run;
proc means nway data=all noprint;
class pproc;
output out=want(drop=_type_ rename=(_freq_=freq));
run;

Merge Sort Array of Integers

Can anyone help me with this pratice question: 33 31 11 47 2 20 24 12 2 43. I am trying to figure out what the contents of the two output lists would be after the first pass of the Merge Sort.
The answer is supposedly:
List 1: 33 11 47 12
List 2: 31 2 20 24 2 43
Not making any sense to me sense I was under the impression that the first pass was where it divided it into two lists at the middle....
33 31 11 47 2 20 24 12
At first the list divides into individual elements such that when a single ton list is formed each element is compared with the one next to it. So after first pass we have
31 33 11 47 2 20 12 24
After that
11 31 33 47 2 12 20 24
and then
2 11 12 20 24 3133 37