Bash: expand a list of coordinates (sed?) - regex

I have a list of simple coordinates (longitude, latitude pairs) like
110 30
-120 0
130 -30
0 30
and try to expand it to this:
110 30 110\272E 30\272N 110 30 LON0
-120 0 120\272W 0\272 -120 0 LON0
130 -30 130\272E 30\272S 130 -30 LON0
0 30 0\272 30\272N 0 30 LON0
Examining the first line:
110 30 110\272E 30\272N 110 30 LON0
110 30 The first two values just stay the same
110\272E the third value is basically the first value with an added (octal \272) degree symbol and an E for positive values or a W for negative values
30\272N similar to the third value, this is the latitude with an added degree symbol and a N for positive and a S for negative values.
110 30 is just a repetition of the first two values
LON0 is a fixed string for later replacement.
Things tried so far:
I played around with sed, but was unable to achieve anything remotely useful. I wasn't able to manipulate the matched values depending on them being negative or positive.
Any help is greatly appreciated.
All the best,
Chris
EDIT: @jaypal suggested adding the different cases that can occur. The original showed only one case with minor deviations in value.
EDIT2: Had to adjust the example data due to me not updating all values in the sample data. My apologies.

Can you use awk? It will be very easy:
$ cat file
110 30
-120 0
130 -30
0 30
awk '
function abs(x) {
x = x > 0 ? x : x * -1
return x
}
{
print abs($1),abs($2), ($1>0?abs($1)"\272E":$1==0?$1"\272":abs($1)"\272W"), ($2>0?abs($2)"\272N":$2==0?$2"\272":abs($2)"\272S"), abs($1), abs($2), "LON0"
}' file
110 30 110ºE 30ºN 110 30 LON0
120 0 120ºW 0º 120 0 LON0
130 30 130ºE 30ºS 130 30 LON0
0 30 0º 30ºN 0 30 LON0
If you want to print \272 instead of º, just add another backslash to prevent it from being interpreted as an escape sequence. So modify the above script and use \\272 wherever you see \272.
We print the fields as you want them in your output. The following two expressions (shown here without the zero case):
($1>0?$1"\272E":$1"\272W")
($2>0?$2"\272N":$2"\272S")
are ternary operators that check the sign of the values. If the first is positive use E, else W. If the second is positive use N, else S.
Update:
awk '
function abs(x) {
x = x > 0 ? x : x * -1
return x
}
{
print $1,$2,($1>0?$1"\\272E":$1==0?$1"\\272":abs($1)"\\272W"),($2>0?$2"\\272N":$2==0?$2"\\272":abs($2)"\\272S"),$1,$2, "LON0"
}' file
110 30 110\272E 30\272N 110 30 LON0
-120 0 120\272W 0\272 -120 0 LON0
130 -30 130\272E 30\272S 130 -30 LON0
0 30 0\272 30\272N 0 30 LON0
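For comparison, the same sign logic can be sketched outside awk. This is only an illustration in Python, assuming integer input values; the helper names hemisphere and expand are made up for this sketch and do not come from the answer above.

```python
def hemisphere(value: int, positive: str, negative: str) -> str:
    """Absolute value, a literal backslash-272, then the hemisphere letter (none for 0)."""
    letter = positive if value > 0 else negative if value < 0 else ""
    return f"{abs(value)}\\272{letter}"


def expand(line: str) -> str:
    """Expand one 'lon lat' line into the seven-field output format."""
    # Assumption: both fields are integers, as in the sample data.
    lon, lat = (int(field) for field in line.split())
    return f"{lon} {lat} {hemisphere(lon, 'E', 'W')} {hemisphere(lat, 'N', 'S')} {lon} {lat} LON0"


if __name__ == "__main__":
    for sample in ["110 30", "-120 0", "130 -30", "0 30"]:
        print(expand(sample))
```

The zero case falls out of the empty letter, which mirrors the nested ternary in the awk version.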

Related

Transform Ordered Values to Paired

I'm looking to transform a set of ordered values into a new dataset containing all ordered combinations.
For example, if I have a dataset that looks like this:
Code Rank Value Pctile
1250 1 25 0
1250 2 32 0.25
1250 3 37 0.5
1250 4 51 0.75
1250 5 59 1
I'd like to transform it to something like this, with values for rank 1 and 2 in a single row, values for 2 and 3 in the next, and so forth:
Code Min_value Min_pctile Max_value Max_pctile
1250 25 0 32 0.25
1250 32 0.25 37 0.5
1250 37 0.5 51 0.75
1250 51 0.75 59 1
It's simple enough to do with a handful of values, but when the number of "Code" families is large (as is mine), I'm looking for a more efficient approach. I imagine there's a straightforward way to do this with a data step, but it escapes me.
Looks like you just want to use the lag() function.
data want ;
set have ;
by code rank ;
min_value = lag(value) ;
min_pctile = lag(pctile) ;
rename value=max_value pctile=max_pctile ;
if not first.code ;
run;
Results
Obs  Code  Rank  max_value  max_pctile  min_value  min_pctile
  1  1250     2         32        0.25         25        0.00
  2  1250     3         37        0.50         32        0.25
  3  1250     4         51        0.75         37        0.50
  4  1250     5         59        1.00         51        0.75
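To make the pairing logic explicit, here is a rough sketch of the same idea outside SAS, written in Python. It assumes the rows are plain (code, rank, value, pctile) tuples; pair_consecutive is a hypothetical helper name, not a SAS or library function.

```python
from itertools import groupby
from operator import itemgetter


def pair_consecutive(rows):
    """Pair each row with its predecessor inside the same code family.

    rows: iterable of (code, rank, value, pctile) tuples.
    Returns (code, min_value, min_pctile, max_value, max_pctile) tuples.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))  # same ordering as BY code rank
    paired = []
    for code, group in groupby(ordered, key=itemgetter(0)):
        members = list(group)
        # zip drops the first row of each family, like "if not first.code"
        for lower, upper in zip(members, members[1:]):
            paired.append((code, lower[2], lower[3], upper[2], upper[3]))
    return paired
```

Pairing inside each group means a value from one code family can never leak into the first row of the next one.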

Density of fractions between 2 given numbers

I'm trying to do some analysis over a simple Fraction class and I want some data to compare that type with doubles.
The problem
Right now I'm looking for a good way to get the density of Fractions between 2 numbers. A Fraction is basically 2 integers (e.g. pair<long, long>), and the density between s and t is the number of representable values in that range. It needs to be exact, or a very good approximation, computed in O(1) or at least very fast.
To make it a bit simpler, let's say I want all the numbers (not fractions) a/b between s and t, where 0 <= s <= a/b < t <= M, and 0 <= a,b <= M (b > 0, a and b are integers)
Example
If my fractions used a data type which only counts up to 6 (M = 6), and I wanted the density between 0 and 1, the answer would be 12. Those numbers are:
0, 1/6, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 5/6.
What I thought already
A very naive approach would be to cycle through all the possible fractions and count those which can't be simplified. Something like:
long fractionsIn(double s, double t){
    long density = 0;
    long M = LONG_MAX;
    for(int d = 1; d < floor(M/t); d++){
        for(int n = ceil(d*s); n < M; n++){
            if( gcd(n,d) == 1 )
                density++;
        }
    }
    return density;
}
But gcd() is very slow, so it doesn't work. I also tried doing some math, but I couldn't get to anything good.
Solution
Thanks to @m69's answer, I made this code for Fraction = pair<Long,Long>:
#include <climits> // LONG_MAX
#include <cmath>   // floor, nextafter

// This should give the density of fractions between first and last, or less.
double fractionsIn(unsigned long long first, unsigned long long last){
    double pi = 3.141592653589793238462643383279502884;
    double max = LONG_MAX; // I can't use LONG_MAX directly
    double zeroToOne = max/pi * max/pi * 3; // approx. number of terms in the Farey sequence of order LONG_MAX
    double res = 0;
    if(first == 0){
        res = zeroToOne;
        first++;
    }
    for(double i = first; i < last; i++){
        res += zeroToOne/(i * (i+1)); // share of the interval i ~ i+1; note the parentheses around i+1
        if(i == i+1)
            i = nextafter(i+1, last); // if this happens, I might not count some fractions, but I have no other choice
    }
    return floor(res);
}
The main change is nextafter, which is important with big numbers (1e17)
The result
As I explained at the beginning, I was trying to compare Fractions with doubles. Here is the result for Fraction = pair<Long,Long> (and here how I got the density of doubles):
Density between: 0,1                 | 1,2              | 1e6,1e6+1   | 1e14,1e14+1 | 1e15-1,1e15 | 1e17-10,1e17 | 1e19-10000,1e19 | 1e19-1000,1e19
Doubles:         4607182418800017408 | 4503599627370496 | 8589934592  | 64          | 8           | 1            | 5               | 0
Fraction:        2.58584e+37         | 1.29292e+37      | 2.58584e+25 | 2.58584e+09 | 2.58584e+07 | 2585         | 1               | 0
Density between 0 and 1
If the integers with which you express the fractions are in the range 0~M, then the density of fractions between the values 0 (inclusive) and 1 (exclusive) is:
M: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0~(1): 1 2 4 6 10 12 18 22 28 32 42 46 58 64 72 80 96 102 120 128 140 150 172 180 200 212 230 242 270 278 308 ...
This is sequence A002088 on OEIS. If you scroll down to the formula section, you'll find information about how to approximate it, e.g.:
Φ(n) = (3 ÷ π²) × n² + O[n × (ln n)^(2/3) × (ln ln n)^(4/3)]
(Unfortunately, no more detail is given about the constants involved in the O[x] part. See discussion about the quality of the approximation below.)
Distribution across range
The interval from 0 to 1 contains half of the total number of unique fractions that can be expressed with numbers up to M; e.g. this is the distribution when M = 15 (i.e. 4-bit integers):
   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
    72  36  12   6   4   2   2   2   1   1   1   1   1   1   1   1
for a total of 144 unique fractions. If you look at the sequence for different values of M, you'll see that the steps in this sequence converge:
       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
  1:     1   1
  2:     2   1   1
  3:     4   2   1   1
  4:     6   3   1   1   1
  5:    10   5   2   1   1   1
  6:    12   6   2   1   1   1   1
  7:    18   9   3   2   1   1   1   1
  8:    22  11   4   2   1   1   1   1   1
  9:    28  14   5   2   2   1   1   1   1   1
 10:    32  16   5   3   2   1   1   1   1   1   1
 11:    42  21   7   4   2   2   1   1   1   1   1   1
 12:    46  23   8   4   2   2   1   1   1   1   1   1   1
 13:    58  29  10   5   3   2   2   1   1   1   1   1   1   1
 14:    64  32  11   5   4   2   2   1   1   1   1   1   1   1   1
 15:    72  36  12   6   4   2   2   2   1   1   1   1   1   1   1   1
Not only is the density between 0 and 1 half of the total number of fractions, but the density between 1 and 2 is a quarter, and the density between 2 and 3 is close to a twelfth, and so on.
As the value of M increases, the distribution of fractions across the ranges 0-1, 1-2, 2-3 ... converges to:
1/2, 1/4, 1/12, 1/24, 1/40, 1/60, 1/84, 1/112, 1/144, 1/180, 1/220, 1/264 ...
This sequence can be calculated by starting with 1/2 and then:
0-1: 1/2 x 1/1 = 1/2
1-2: 1/2 x 1/2 = 1/4
2-3: 1/4 x 1/3 = 1/12
3-4: 1/12 x 2/4 = 1/24
4-5: 1/24 x 3/5 = 1/40
5-6: 1/40 x 4/6 = 1/60
6-7: 1/60 x 5/7 = 1/84
7-8: 1/84 x 6/8 = 1/112
8-9: 1/112 x 7/9 = 1/144 ...
You can of course calculate any of these values directly, without needing the steps in between:
0-1: 1/2
6-7: 1/2 x 1/6 x 1/7 = 1/84
(Also note that the second half of the distribution sequence consists of 1's; these are all the integers divided by 1.)
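The limiting shares listed above can be written down exactly with rational arithmetic. The following is a small sketch in Python; interval_share is a hypothetical name introduced here, and the closed form 1/(2·i·(i+1)) is simply the direct calculation shown above generalised to any i ≥ 1.

```python
from fractions import Fraction


def interval_share(i: int) -> Fraction:
    """Limiting share of all fractions that fall in the interval i ~ i+1."""
    if i < 0:
        raise ValueError("interval index must be non-negative")
    if i == 0:
        return Fraction(1, 2)
    return Fraction(1, 2 * i * (i + 1))


if __name__ == "__main__":
    # Prints 1/2, 1/4, 1/12, 1/24, 1/40, ... for the first few intervals
    print(", ".join(str(interval_share(i)) for i in range(9)))
```

The step-by-step rule (multiply the previous share by (i-1)/(i+1)) and the direct formula agree for i ≥ 2, and the shares from interval 1 onwards telescope to 1/2, which is consistent with the interval 0 ~ 1 holding the other half.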
Approximating the density in given interval
Using the formulas provided on the OEIS page, you can calculate or approximate the density in the interval 0-1, and multiplied by 2 this is the total number of unique values that can be expressed as fractions.
Given two values s and t, you can then calculate and sum the densities in the intervals s ~ s+1, s+1 ~ s+2, ... t-1 ~ t, or use an interpolation to get a faster but less precise approximate value.
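The summation over unit intervals described here could be sketched as follows. This is a simplified Python illustration, not the JavaScript implementation further down: approx_density is a hypothetical name, total stands for the total number of unique fractions (twice the 0 ~ 1 density), and s and t are assumed to be integers.

```python
def approx_density(total: int, s: int, t: int) -> int:
    """Approximate the number of fractions in s <= x < t.

    total: total number of unique fractions for the given M.
    Each unit interval i ~ i+1 gets the limiting share 1/2 (i = 0)
    or 1/(2*i*(i+1)) (i >= 1), rounded to a whole number.
    """
    count = 0
    for i in range(s, t):
        share = 0.5 if i == 0 else 1 / (2 * i * (i + 1))
        count += round(total * share)
    return count


if __name__ == "__main__":
    # M = 15 has 144 unique fractions in total (see the distribution above)
    print(approx_density(144, 0, 3))  # 72 + 36 + 12
```

For M = 15 the first three intervals come out as 72, 36 and 12, matching the distribution table, because the shares happen to be exact there; for other intervals and larger M the result is only an estimate.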
Example
Let's assume that we're using 10-bit integers, capable of expressing values from 0 to 1023. Using this table linked from the OEIS page, we find that the density between 0~1 is 318452, and the total number of fractions is 636904.
If we wanted to find the density in the interval s~t = 100~105:
100~101: 1/2 x 1/100 x 1/101 = 1/20200 ; 636904/20200 = 31.53
101~102: 1/2 x 1/101 x 1/102 = 1/20604 ; 636904/20604 = 30.91
102~103: 1/2 x 1/102 x 1/103 = 1/21012 ; 636904/21012 = 30.31
103~104: 1/2 x 1/103 x 1/104 = 1/21424 ; 636904/21424 = 29.73
104~105: 1/2 x 1/104 x 1/105 = 1/21840 ; 636904/21840 = 29.16
Rounding these values gives the sum:
32 + 31 + 30 + 30 + 29 = 152
A brute force algorithm gives this result:
32 + 32 + 30 + 28 + 28 = 150
So we're off by 1.33% for this low value of M and small interval with just 5 values. If we had used linear interpolation between the first and last value:
100~101: 31.53
104~105: 29.16
average: 30.345
total: 151.725 -> 152
we'd have arrived at the same value. For larger intervals, the sum of all the densities will probably be closer to the real value, because rounding errors will cancel each other out, but the results of linear interpolation will probably become less accurate. For ever larger values of M, the calculated densities should converge with the actual values.
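For checking approximations like these against exact counts, a brute-force counter is easy to write. The sketch below is in Python and is only one possible way to do it (not necessarily the brute-force code used for the figures above); count_fractions is a hypothetical name and s, t are assumed to be integers.

```python
from math import gcd


def count_fractions(M: int, s: int, t: int) -> int:
    """Count distinct values a/b with s <= a/b < t, 0 <= a <= M, 1 <= b <= M."""
    count = 0
    for b in range(1, M + 1):
        # Numerators a with s*b <= a < t*b, capped at M.
        # The range is empty as soon as s*b exceeds M.
        for a in range(s * b, min(t * b, M + 1)):
            # Only fractions in lowest terms are counted, so each value
            # is counted exactly once (0 is counted as 0/1).
            if gcd(a, b) == 1:
                count += 1
    return count


if __name__ == "__main__":
    print(count_fractions(6, 0, 1))         # the M = 6 example from the question
    print(count_fractions(1023, 100, 105))  # the 100 ~ 105 example
```

This is far too slow for M near LONG_MAX, but for small M it gives reference values to compare the approximation with.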
Quality of approximation of Φ(n)
Using this simplified formula:
Φ(n) = (3 ÷ π²) × n²
the results are almost always smaller than the actual values, but they are within 1% for n ≥ 182, within 0.1% for n ≥ 1880 and within 0.01% for n ≥ 19494. I would suggest hard-coding the lower range (the first 50,000 values can be found here), and then using the simplified formula from the point where the approximation is good enough.
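If you would rather generate the exact values than download them, a totient sieve does the job for moderate n. The following Python sketch (the function names are made up for this illustration) computes Φ(n) exactly and compares it with the simplified formula:

```python
import math


def totient_sums(limit: int) -> list:
    """Return [Phi(0), ..., Phi(limit)], where Phi(n) = phi(1) + ... + phi(n)."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:  # still untouched, so p is prime
            for multiple in range(p, limit + 1, p):
                phi[multiple] -= phi[multiple] // p
    sums = [0] * (limit + 1)
    for n in range(1, limit + 1):
        sums[n] = sums[n - 1] + phi[n]
    return sums


def simplified_phi(n: int) -> float:
    """Leading term of the asymptotic formula: (3 / pi^2) * n^2."""
    return 3 / math.pi ** 2 * n * n


if __name__ == "__main__":
    exact = totient_sums(2000)
    for n in (10, 100, 182, 1880):
        approx = simplified_phi(n)
        print(n, exact[n], round(approx, 1), f"{(approx - exact[n]) / exact[n]:+.3%}")
```

The sieve needs memory proportional to the limit, so it only helps for the hard-coded lower range, not for M near LONG_MAX.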
Here's a simple code example with the first 182 values of Φ(n) hard-coded. The approximation of the distribution sequence seems to add an error of a similar magnitude as the approximation of Φ(n), so it should be possible to get a decent approximation. The code simply iterates over every integer in the interval s~t and sums the fractions. To speed up the code and still get a good result, you should probably calculate the fractions at several points in the interval, and then use some sort of non-linear interpolation.
function fractions01(M) {
var phi = [0,1,2,4,6,10,12,18,22,28,32,42,46,58,64,72,80,96,102,120,128,140,150,172,180,200,212,230,242,270,278,308,
324,344,360,384,396,432,450,474,490,530,542,584,604,628,650,696,712,754,774,806,830,882,900,940,964,1000,
1028,1086,1102,1162,1192,1228,1260,1308,1328,1394,1426,1470,1494,1564,1588,1660,1696,1736,1772,1832,1856,
1934,1966,2020,2060,2142,2166,2230,2272,2328,2368,2456,2480,2552,2596,2656,2702,2774,2806,2902,2944,3004,
3044,3144,3176,3278,3326,3374,3426,3532,3568,3676,3716,3788,3836,3948,3984,4072,4128,4200,4258,4354,4386,
4496,4556,4636,4696,4796,4832,4958,5022,5106,5154,5284,5324,5432,5498,5570,5634,5770,5814,5952,6000,6092,
6162,6282,6330,6442,6514,6598,6670,6818,6858,7008,7080,7176,7236,7356,7404,7560,7638,7742,7806,7938,7992,
8154,8234,8314,8396,8562,8610,8766,8830,8938,9022,9194,9250,9370,9450,9566,9654,9832,9880,10060];
if (M < 182) return phi[M];
return Math.round(M * M * 0.30396355092701331433 + M / 4); // experimental; see below
}
function fractions(M, s, t) {
var half = fractions01(M);
var frac = (s == 0) ? half : 0;
for (var i = (s == 0) ? 1 : s; i < t && i <= M; i++) {
if (2 * i < M) {
var f = Math.round(half / (i * (i + 1)));
frac += (f < 2) ? 2 : f;
}
else ++frac;
}
return frac;
}
var M = 1023, s = 100, t = 105;
document.write(fractions(M, s, t));
Comparing the approximation of Φ(n) with the list of the 50,000 first values suggests that adding M÷4 is a workable substitute for the second part of the formula; I have not tested this for larger values of n, so use with caution.
[Figure omitted. Blue: simplified formula. Red: improved simplified formula.]
Quality of approximation of distribution
Comparing the results for M=1023 with those of a brute-force algorithm, the errors are small in real terms, never more than -7 or +6, and above the interval 205~206 they are limited to -1 ~ +1. However, a large part of the range (57~1024) has fewer than 100 fractions per integer, and in the interval 171~1024 there are only 10 fractions or fewer per integer. This means that small errors and rounding errors of -1 or +1 can have a large impact on the result, e.g.:
interval: 241 ~ 250
fractions/integer: 6
approximation: 5
total: 50 (instead of 60)
To improve the results for intervals with few fractions per integer, I would suggest combining the method described above with a separate approach for the last part of the range:
Alternative method for last part of range
As already mentioned, and implemented in the code example, the second half of the range, M÷2 ~ M, has 1 fraction per integer. Also, the interval M÷3 ~ M÷2 has 2; the interval M÷4 ~ M÷3 has 4. This is of course the Φ(n) sequence again:
M/2 ~ M : 1
M/3 ~ M/2: 2
M/4 ~ M/3: 4
M/5 ~ M/4: 6
M/6 ~ M/5: 10
M/7 ~ M/6: 12
M/8 ~ M/7: 18
M/9 ~ M/8: 22
M/10 ~ M/9: 28
M/11 ~ M/10: 32
M/12 ~ M/11: 42
M/13 ~ M/12: 46
M/14 ~ M/13: 58
M/15 ~ M/14: 64
M/16 ~ M/15: 72
M/17 ~ M/16: 80
M/18 ~ M/17: 96
M/19 ~ M/18: 102 ...
Between these intervals, one integer can have a different number of fractions, depending on the exact value of M, e.g.:
interval     fractions
202 ~ 203    10
203 ~ 204    10
204 ~ 205     9
205 ~ 206     6
206 ~ 207     6
The interval 204 ~ 205 lies on the edge between intervals, because M ÷ 5 = 204.6; it has 6 + 3 = 9 fractions because M modulo 5 is 3. If M had been 1022 or 1024 instead of 1023, it would have 8 or 10 fractions. (This example is straightforward because 5 is a prime; see below.)
Again, I would suggest using the hard-coded values for Φ(n) to calculate the number of fractions for the last part of the range. If you use the first 17 values as listed above, this covers the part of the range with fewer than 100 fractions per integer, so that would reduce the impact of rounding errors below 1%. The first 56 values would give you 0.1%, the first 182 values 0.01%.
Together with the values of Φ(n), you could hard-code the number of fractions of the edge intervals for each modulo value, e.g.:
modulo:   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
M/ 2      1   2
M/ 3      2   3   4
M/ 4      4   5   5   6
M/ 5      6   7   8   9  10
M/ 6     10  11  11  11  11  12
M/ 7     12  13  14  15  16  17  18
M/ 8     18  19  19  20  20  21  21  22
M/ 9     22  23  24  24  25  26  26  27  28
M/10     28  29  29  30  30  30  30  31  31  32
M/11     32  33  34  35  36  37  38  39  40  41  42
M/12     42  43  43  43  43  44  44  45  45  45  45  46
M/13     46  47  48  49  50  51  52  53  54  55  56  57  58
M/14     58  59  59  60  60  61  61  61  61  62  62  63  63  64
M/15     64  65  66  66  67  67  67  68  69  69  69  70  70  71  72
M/16     72  73  73  74  74  75  75  76  76  77  77  78  78  79  79  80
M/17     80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
M/18     96  97  97  97  97  98  98  99  99  99  99 100 100 101 101 101 101 102
This is exactly the same as: (Sum of phi(k)) where m <= k <= M where phi(k) is the Euler Totient Function and with phi(0) = 1 (as defined by the problem). There is no known closed form for this sum. However there are many optimizations known as mentioned in the wiki link. This is known as the Totient Summatory Function in Wolfram. The same website also links to the series: A002088 and provides a few asymptotic approximations.
The reasoning is this: consider the number of values of the form {1/M, 2/M, ...., (M-1)/M, M/M}. All those fractions that will be reducible to a smaller value will not be counted in phi(M) because they are not relatively prime. They will appear in the summation of another totient.
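For example, for M = 6 the sum phi(1) + phi(2) + ... + phi(6) = 1 + 1 + 2 + 2 + 4 + 2 = 12, which matches the 12 values listed in the question; the value 0 is counted in place of 1/1, which lies outside the range.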

Solid fill of an ellipse in python dxf

I'd like to draw a filled ellipse with Python. This would be easy if I could use PIL or some other libraries. The problem is I need the ellipse in the .dxf file format. Therefore I used the dxfwrite package. This allows me to draw an ellipse, but I couldn't find a way to fill it with a solid color. The following code does draw an ellipse outline, but does not fill it.
import dxfwrite
from dxfwrite import DXFEngine as dxf
name = 'ellipse.dxf'
dwg = dxf.drawing(name)
dwg.add(dxf.ellipse((0,0), 5., 10., segments=200))
dwg.save()
Does anybody of you guys know a solution?
The HATCH entity is not supported by dxfwrite. If you use ezdxf, this is the solution:
import ezdxf
dwg = ezdxf.new('AC1015') # hatch requires the DXF R2000 (AC1015) format or later
msp = dwg.modelspace() # adding entities to the model space
# important: major axis >= minor axis (ratio <= 1.) else AutoCAD crashes
msp.add_ellipse((0, 0), major_axis=(0, 10), ratio=0.5)
hatch = msp.add_hatch(color=2)
with hatch.edit_boundary() as boundary: # edit boundary path (context manager)
    edge_path = boundary.add_edge_path()
    # an edge path can contain line, arc, ellipse or spline elements
    edge_path.add_ellipse((0, 0), major_axis_vector=(0, 10), minor_axis_length=0.5)
    # upcoming ezdxf 0.7.7:
    # renamed major_axis_vector to major_axis
    # renamed minor_axis_length to ratio
dwg.saveas("solid_hatch_ellipse.dxf")
You could fill an ellipse by using a solid hatch object:
For the above example, here is a snippet from the DXF file that contains the ellipse and the hatch:
AcDbEntity
8
0
100
AcDbEllipse
10
2472.192919
20
1311.37942
30
0.0
11
171.0698134145308
21
-27.61597470964863
31
0.0
210
0.0
220
0.0
230
1.0
40
0.2928953354556341
41
0.0
42
6.283185307179586
0
HATCH
5
5A
330
2
100
AcDbEntity
8
0
100
AcDbHatch
10
0.0
20
0.0
30
0.0
210
0.0
220
0.0
230
1.0
2
SOLID
70
1
71
1
91
1
92
5
93
1
72
3
10
2472.192919357234
20
1311.379420138197
11
171.0698134145308
21
-27.61597470964863
40
0.2928953354556341
50
0.0
51
360.0
73
1
97
1
330
59
75
1
76
1
47
0.794178
98
1
10
2428.34191358924
20
1317.777876434349
450
0
451
0
460
0.0
461
0.0
452
0
462
1.0
453
2
463
0.0
63
5
421
255
463
1.0
63
2
421
16776960
470
LINEAR
1001
GradientColor1ACI
1070
5
1001
GradientColor2ACI
1070
2
1001
ACAD
1010
0.0
1020
0.0
1030
0.0
There are a lot of DXF codes involved. This is the information Autodesk provides:
Hatch group codes
Group code | Description
100 | Subclass marker (AcDbHatch)
10 | Elevation point (in OCS). DXF: X value = 0; APP: 3D point (X and Y always equal 0, Z represents the elevation)
20, 30 | DXF: Y and Z values of elevation point (in OCS). Y value = 0, Z represents the elevation
210 | Extrusion direction (optional; default = 0, 0, 1). DXF: X value; APP: 3D vector
220, 230 | DXF: Y and Z values of extrusion direction
2 | Hatch pattern name
70 | Solid fill flag (solid fill = 1; pattern fill = 0); for MPolygon, the version of MPolygon
63 | For MPolygon, pattern fill color as the ACI
71 | Associativity flag (associative = 1; non-associative = 0); for MPolygon, solid-fill flag (has solid fill = 1; lacks solid fill = 0)
91 | Number of boundary paths (loops)
varies | Boundary path data. Repeats number of times specified by code 91. See Boundary Path Data
75 | Hatch style: 0 = Hatch “odd parity” area (Normal style); 1 = Hatch outermost area only (Outer style); 2 = Hatch through entire area (Ignore style)
76 | Hatch pattern type: 0 = User-defined; 1 = Predefined; 2 = Custom
52 | Hatch pattern angle (pattern fill only)
41 | Hatch pattern scale or spacing (pattern fill only)
73 | For MPolygon, boundary annotation flag (boundary is an annotated boundary = 1; boundary is not an annotated boundary = 0)
77 | Hatch pattern double flag (pattern fill only): 0 = not double; 1 = double
78 | Number of pattern definition lines
varies | Pattern line data. Repeats number of times specified by code 78. See Pattern Data
47 | Pixel size used to determine the density to perform various intersection and ray casting operations in hatch pattern computation for associative hatches and hatches created with the Flood method of hatching
98 | Number of seed points
11 | For MPolygon, offset vector
99 | For MPolygon, number of degenerate boundary paths (loops), where a degenerate boundary path is a border that is ignored by the hatch
10 | Seed point (in OCS). DXF: X value; APP: 2D point (multiple entries)
20 | DXF: Y value of seed point (in OCS); (multiple entries)
450 | Indicates solid hatch or gradient; if solid hatch, the values for the remaining codes are ignored but must be present. Optional; if code 450 is in the file, then the following codes must be in the file: 451, 452, 453, 460, 461, 462, and 470. If code 450 is not in the file, then the following codes must not be in the file: 451, 452, 453, 460, 461, 462, and 470. 0 = Solid hatch; 1 = Gradient
451 | Zero is reserved for future use
452 | Records how colors were defined and is used only by dialog code: 0 = Two-color gradient; 1 = Single-color gradient
453 | Number of colors: 0 = Solid hatch; 2 = Gradient
460 | Rotation angle in radians for gradients (default = 0, 0)
461 | Gradient definition; corresponds to the Centered option on the Gradient Tab of the Boundary Hatch and Fill dialog box. Each gradient has two definitions, shifted and unshifted. A Shift value describes the blend of the two definitions that should be used. A value of 0.0 means only the unshifted version should be used, and a value of 1.0 means that only the shifted version should be used.
462 | Color tint value used by dialog code (default = 0, 0; range is 0.0 to 1.0). The color tint value is a gradient color and controls the degree of tint in the dialog when the Hatch group code 452 is set to 1.
463 | Reserved for future use: 0 = First value; 1 = Second value
470 | String (default = LINEAR)
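To see how these group codes pair up with the values in the DXF snippet, here is a small standard-library Python sketch. It assumes the text starts on a group-code line and strictly alternates code line, value line; read_tags and hatch_is_solid are hypothetical helpers for illustration and are not part of dxfwrite or ezdxf.

```python
def read_tags(text: str) -> list:
    """Split DXF text into (group_code, value) pairs.

    Assumption: the text starts on a group-code line and alternates
    code line / value line, as in the snippet above.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 2:
        raise ValueError("expected an even number of lines (code/value pairs)")
    return [(int(code), value) for code, value in zip(lines[0::2], lines[1::2])]


def hatch_is_solid(tags) -> bool:
    """True if the pattern name (code 2) is SOLID and the solid fill flag (code 70) is 1."""
    names = [value for code, value in tags if code == 2]
    flags = [value for code, value in tags if code == 70]
    return bool(names) and names[0] == "SOLID" and bool(flags) and int(flags[0]) == 1


if __name__ == "__main__":
    sample = "100\nAcDbHatch\n10\n0.0\n20\n0.0\n30\n0.0\n2\nSOLID\n70\n1\n71\n1\n91\n1"
    tags = read_tags(sample)
    print(tags[0], hatch_is_solid(tags))
```

This only inspects the pairs; it does not validate the boundary path data that follows code 91.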
I hope this may be of some use to you. I apologize if I misunderstood your issue.

Indicator for Top3 and ranking across rows

What I am trying to do is the following: I want to find out whether an observation (A) is in the top 3 across the others.
For example,
    A   B   C   D   E   F   G   H   TOP3-A
1  20  30  40  50  60  70  80  90   N
2  80  90  70  80   0   0   0   0   Y
3  70   0   0  80  90   0   0   0   Y
4  60  70  80  90   0   0   0   0   N
I am thinking transpose + rank + transpose + if <4 then Y else N; however, it seems too cumbersome and, to be honest, as a newbie I do not know how to code all these steps correctly...
Your method would work, but there's a much simpler way of doing it.
You could use an array, which reads across rows, however I'm using an even easier way of reading across rows.
The OF operator can be used in conjunction with a summary function to calculate values across rather than down. The LARGEST function returns the nth largest value from a range, so you can compare field A to the 3rd largest value in the row.
I've given you the answer to produce Y, N plus an alternative that produces 1, 0 which is even simpler.
data have;
input A B C D E F G H;
datalines;
20 30 40 50 60 70 80 90
80 90 70 80 0 0 0 0
70 0 0 80 90 0 0 0
60 70 80 90 0 0 0 0
;
run;
data want;
set have;
if A >= largest(3, of A--H) then top3_A = 'Y'; /* A--H references all columns between A and H */
else top3_A = 'N';
/* or */
top3_A2 = (A >= largest(3, of A--H)); /* returns 1 for true, 0 for false */
run;
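To show what the comparison against the 3rd largest value does, here is the same test sketched in Python. It is only an illustration of the logic, with is_top3 a made-up helper name; duplicates are counted individually, which is also how the example rows behave.

```python
def is_top3(row, position=0):
    """True if row[position] is at least the 3rd largest value in the row."""
    third_largest = sorted(row, reverse=True)[2]  # duplicates count separately
    return row[position] >= third_largest


if __name__ == "__main__":
    have = [
        [20, 30, 40, 50, 60, 70, 80, 90],
        [80, 90, 70, 80, 0, 0, 0, 0],
        [70, 0, 0, 80, 90, 0, 0, 0],
        [60, 70, 80, 90, 0, 0, 0, 0],
    ]
    for row in have:
        print(row[0], "Y" if is_top3(row) else "N")
```

Using >= rather than > is what lets the second row pass: A ties with D at 80, and 80 is itself the 3rd largest value there.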

Assigning Variables from CSV files (or another format) in C++

Hello Stack Overflow world :3 My name is Chris, I have a slight issue.. So I am going to present the issue in this format..
Part 1
I will present the materials & code snippets I am currently working with that IS working..
Part 2
I will explain in my best ability my desired new way of achieving my goal.
Part 3
So you guys think I am not having you do all the work, I will go ahead and present my attempts at said goal, as well as possibly ways research has dug up that I did not fully understand.
Part 1
mobDB.csv Example:
ID Sprite kName iName LV HP SP EXP JEXP Range1 ATK1 ATK2 DEF MDEF STR AGI VIT INT DEX LUK Range2 Range3 Scale Race Element Mode Speed aDelay aMotion dMotion MEXP ExpPer MVP1id MVP1per MVP2id MVP2per MVP3id MVP3per Drop1id Drop1per Drop2id Drop2per Drop3id Drop3per Drop4id Drop4per Drop5id Drop5per Drop6id Drop6per Drop7id Drop7per Drop8id Drop8per Drop9id Drop9per DropCardid DropCardper
1001 SCORPION Scorpion Scorpion 24 1109 0 287 176 1 80 135 30 0 1 24 24 5 52 5 10 12 0 4 23 12693 200 1564 864 576 0 0 0 0 0 0 0 0 990 70 904 5500 757 57 943 210 7041 100 508 200 625 20 0 0 0 0 4068 1
1002 PORING Poring Poring 1 50 0 2 1 1 7 10 0 5 1 1 1 0 6 30 10 12 1 3 21 131 400 1872 672 480 0 0 0 0 0 0 0 0 909 7000 1202 100 938 400 512 1000 713 1500 512 150 619 20 0 0 0 0 4001 1
1004 HORNET Hornet Hornet 8 169 0 19 15 1 22 27 5 5 6 20 8 10 17 5 10 12 0 4 24 4489 150 1292 792 216 0 0 0 0 0 0 0 0 992 80 939 9000 909 3500 1208 15 511 350 518 150 0 0 0 0 0 0 4019 1
1005 FARMILIAR Familiar Familiar 8 155 0 28 15 1 20 28 0 0 1 12 8 5 28 0 10 12 0 2 27 14469 150 1276 576 384 0 0 0 0 0 0 0 0 913 5500 1105 20 2209 15 601 50 514 100 507 700 645 50 0 0 0 0 4020 1
1007 FABRE Fabre Fabre 2 63 0 3 2 1 8 11 0 0 1 2 4 0 7 5 10 12 0 4 22 385 400 1672 672 480 0 0 0 0 0 0 0 0 914 6500 949 500 1502 80 721 5 511 700 705 1000 1501 200 0 0 0 0 4002 1
1008 PUPA Pupa Pupa 2 427 0 2 4 0 1 2 0 20 1 1 1 0 1 20 10 12 0 4 22 256 1000 1001 1 1 0 0 0 0 0 0 0 0 1010 80 915 5500 938 600 2102 2 935 1000 938 600 1002 200 0 0 0 0 4003 1
1009 CONDOR Condor Condor 5 92 0 6 5 1 11 14 0 0 1 13 5 0 13 10 10 12 1 2 24 4233 150 1148 648 480 0 0 0 0 0 0 0 0 917 9000 1702 150 715 80 1750 5500 517 400 916 2000 582 600 0 0 0 0 4015 1
1010 WILOW Willow Willow 4 95 0 5 4 1 9 12 5 15 1 4 8 30 9 10 10 12 1 3 22 129 200 1672 672 432 0 0 0 0 0 0 0 0 902 9000 1019 100 907 1500 516 700 1068 3500 1067 2000 1066 1000 0 0 0 0 4010 1
1011 CHONCHON Chonchon Chonchon 4 67 0 5 4 1 10 13 10 0 1 10 4 5 12 2 10 12 0 4 24 385 200 1076 576 480 0 0 0 0 0 0 0 0 998 50 935 6500 909 1500 1205 55 601 100 742 5 1002 150 0 0 0 0 4009 1
So this is an example of the Spreadsheet I have.. This is what I wish to be using in my ideal goal. Not what I am using right now.. It was done in MS Excel 2010, using Columns A-BF and Row 1-993
Currently my format for working code, I am using manually implemented Arrays.. For example for the iName I have:
char iName[16][25] = {"Scorpion", "Poring", "Hornet", "Familiar", "null", "null", "null", "null", "null", "null", "null", "null", "null", "null", "null", "null"};
Defined in a header file (bSystem.h) now to apply, lets say their health variable? I have to have another array in the same Header with corresponding order, like so:
int HP[16] = {1109, 50, 169, 155, 95, 95, 118, 118, 142, 142, 167, 167, 193, 193, 220, 220};
The issue is, there is a large amount of data to hard code into the various files I need for Monsters, Items, Spells, Skills, etc.. On the original small scale, to get certain systems made, it was fine.. I have been using various Voids in header files to transfer data from file to file when it's called.. But when I am dealing with 1,000+ Monsters and having to use all these variables.. Manually putting them in is kinda.. Ridiculous? Lol...
Part 2
Now my ideal system for this, is to be able to use the .CSV Files to load the data.. I have hit a decent amount of various issues in this task.. Such as, converting the data pulled from Names to a Char array, actually pulling the data from the CSV file and assigning specific sections to certain arrays... The main idea I have in mind, that I can not seem to get to is this;
I would like to be able to find a way to just read these various variables from the CSV file... So when I call upon the variables like:
cout << name << "(" << health << " health) VS. " << iName[enemy] << "(" << HP[enemy] << " health)";
where [enemy] is, it would be the ID.. the enemy encounter is in another header (lSystem.h) where it basically goes like;
case 0:
enemy = 0;
Where 0 would be the first data in the Arrays involving Monsters.. I hate that it has to be order specific.. I would want to be able to say enemy = 1002; so when the combat systems start it can just pull the variables it needs from the enemy with the ID 1002..
I always hit a few different issues, I can't get it to pull the data from the file to the program.. When I can, I can only get it to store int values to int arrays, I have issues getting it to convert the strings to char arrays.. Then the next issue I am presented with is recalling it and the actual saving part... Which is where part 3 comes in :3
Part 3
I have attempted a few different things so far and have done research on how to achieve this.. What I have come across so far is..
I can write a function to read the data from, let's say, mobDB, record it into arrays, then output it to a .dat? So when I need to recall variables I can do so from the .dat instead of a modifiable CSV.. I was presented with the same issues as far as reading and converting..
I can go the SQL route, but I have had a ton of issues understanding how to pull the data from the SQL? I have a PowerEdge 2003 Server box in my house which I store data on, it does have NavicatSQL Premium set up, so I guess my main 2 questions about the SQL route is, is it possible to hook right into the SQLServer and as I update the Data Base, when the client runs it would just pull the variables and data from the DB? Or would I be stuck compiling SQL files... When it is an online game, I know I will have to use something to transfer from Server to Client, which is why I am trying to set up this early in dev so I have more to build off of, I am sure I can use SQL servers for that? If anyone has a good grasp on how this works I would very much like to take the SQL route..
Attempts I have made include using Boost to parse the data from the CSV instead of the standard libs.. The same issues were presented.. I did read up on converting a string to a char.. But the issue lay in that once I pulled the data, I couldn't convert it?..
I've also tried the ADO C++ route.. Dead end there..
All in all I have spent the last week or so on this.. I would very much like to set up the SQL server to actually update the variables... but I am open to any working ideas that presents ease of editing and implementing large amounts of data..
I appreciate any and all help.. If anyone does attempt to help get a working code for this, if it's not too much trouble to add comments to parts you feel you should explain? I don't want someone to just give me a quick fix.. I actually want to learn and understand what I am using. Thank you all very much :)
-Chris
Let's see if I understand your problem correctly: You are writing a game and currently all the stats for your game actors are hardcoded. You already have an Excel spreadsheet with this data and you just want to use this instead of the hardcoded header files, so that you can tweak the stats without waiting for a long recompilation. You are currently storing the stats in your code in a column-store fashion, i.e. one array per attribute. The CSV file stores stuff in a row-wise fashion. Correct so far?
Now my understanding of your problem becomes a little blurry. But let's try. If I understand you correctly, you want to completely remove the arrays from your code and directly access the CSV file when you need the stats for some creature? If yes, then this is already the problem. File I/O is incredibly slow, you need to keep this data in main memory. Just keep the arrays, but instead of manually assigning the values in the headers, you have a load function that reads the CSV file when you start the game and loads its contents into the array. You can keep the rest of your code unchanged.
Example:
void load (std::ifstream &csv)
{
    readFirstLineAndCheckThatItIsCorrect (csv);
    int id;
    std::string spriteName;
    // Test the extraction itself instead of eof(): eof() only becomes true
    // after a read has already failed, which would process one bogus row.
    while (csv >> id >> spriteName)
    {
        csv >> kName[id] >> iName[id] >> LV[id] >> HP[id] >> SP[id] >> ...
        Sprite[id] = getSpriteForName (spriteName);
    }
}
Using a database system is completely out of scope here. All you need to do is load some data into some arrays. If you want to be able to change the stats without restarting the program, add some hotkey for reloading the CSV file.
If you plan to write an online game, then you still have a long way ahead of you. Even then, SQL is a very bad idea for exchanging data between server and clients because a) it just introduces way too much overhead and b) it is an open invitation for cheaters and hackers because if clients have direct access to your database, you can no longer validate their inputs. See http://forums.somethingawful.com/showthread.php?noseen=0&pagenumber=258&threadid=2803713 for an actual example.
If you really want this to be an online game, you need to design your own communication protocol. But maybe you should read some books about that first, because it really is a complex issue. For instance, you need to hide the latency from the user by guessing on the client side what the server and the other players will most likely do next, and gracefully correct your guesses if they were wrong, all without the player noticing (Dead Reckoning).
Still, good luck on your game and I hope to play it some day. :-)
IMO, the simplest thing to do would be to first create a struct that holds all the data for a monster. Here's a reduced version because I don't feel like typing all those variables.
struct Mob
{
    std::string SPRITE, kName, iName;
    int ID, LV, HP, SP, EXP;
};
The loading code for your particular format is then fairly simple:
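#include <fstream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parses one whitespace-separated line; m is only modified on success.
bool ParseMob(const std::string & str, Mob & m)
{
    std::stringstream iss(str);
    Mob tmp;
    if (iss >> tmp.ID >> tmp.SPRITE >> tmp.kName >> tmp.iName
            >> tmp.LV >> tmp.HP >> tmp.SP >> tmp.EXP)
    {
        m = tmp;
        return true;
    }
    return false;
}
std::vector<Mob> LoadMobs()
{
    std::vector<Mob> mobs;
    Mob tmp;
    std::ifstream fin("mobDB.csv");
    for (std::string line; std::getline(fin, line); )
    {
        // The header line fails to parse (ID is not a number) and is skipped.
        if (ParseMob(line,tmp))
            mobs.emplace_back(std::move(tmp));
    }
    return mobs;
}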
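Since the question asks for lookups like enemy = 1002 rather than array positions, the loaded records could be kept in a map keyed by ID. The sketch below shows the idea in Python for brevity (in C++ the analogous container would be std::unordered_map<int, Mob>). It assumes whitespace-separated columns as in the sample rows, names without spaces, and only the first eight columns; parse_mob and load_mobs are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Mob:
    id: int
    sprite: str
    k_name: str
    i_name: str
    lv: int
    hp: int
    sp: int
    exp: int


def parse_mob(line: str) -> Optional[Mob]:
    """Parse one whitespace-separated row; return None for headers or bad rows."""
    fields = line.split()
    try:
        return Mob(int(fields[0]), fields[1], fields[2], fields[3],
                   *(int(value) for value in fields[4:8]))
    except (IndexError, ValueError, TypeError):
        return None  # header line, blank line, or too few columns


def load_mobs(lines: Iterable[str]) -> Dict[int, Mob]:
    """Build an ID -> Mob lookup so callers can write mobs[1002]."""
    mobs = {}
    for line in lines:
        mob = parse_mob(line)
        if mob is not None:
            mobs[mob.id] = mob
    return mobs
```

With a real file this would be called as load_mobs(open("mobDB.csv")), and the combat code could then use mobs[1002].hp without caring about the row order.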