Extract data from a txt file using regexp in MATLAB

I need to extract some info from a txt file which looks like this using regexp:
##FileName = disp_20120803_064635_1
#Plane1
x1 = 10008 x2= -9991 x3= -9991
y1 = 137 y2 = 10 y3 = 158
z1= 844 z2= 779 z3 = 700
#Plane2
x1 = -16 x2= 193 x3= 320
y1 = -4472 y2 = -556 y3 = 5143
z1= 3215 z2= -1309 z3 = 370
#Plane3
x1 = -8145 x2= 5387 x3= 8070
y1 = -4808 y2 = 7643 y3 = 3051
z1= 4212 z2= 4120 z3 = -4176
##end
I want to extract the file name by the following code:
buffer = fileread('test.txt') ;
pattern = '##FileName\s=\s+(\w+?\d+)';
tokens = regexp(buffer, pattern, 'tokens');
fileName = [tokens{:}]
But the result is just disp_20120803, which is not the complete file name.
How can I capture the whole name?

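The lazy \w+? followed by \d+ stops as soon as it has matched one run of digits, and since _ is not a digit the match ends at disp_20120803. Because \w already covers letters, digits and the underscore, a greedy \w+ captures the whole name. Use this pattern instead:
pattern = '##FileName\s=\s+(\w+)';
Edit:
I don't know MATLAB syntax, but you can use the following regex to capture the variable names and their values:
pattern = '([xyz][123])\s*=\s*(-?\d+)'
The variable name is in group 1 and its value is in group 2.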
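As a rough illustration of how the two patterns behave on the sample file, here is a minimal sketch in Python rather than MATLAB (the regex syntax used here should carry over largely unchanged, though that is an assumption). The function name parse_planes and the #Plane splitting pattern are hypothetical additions, not part of the answer.

```python
import re

# Patterns taken from the answer above.
FILE_NAME = re.compile(r"##FileName\s=\s+(\w+)")
COORD = re.compile(r"([xyz][123])\s*=\s*(-?\d+)")
# Hypothetical helper pattern: splits the text at each "#PlaneN" header line.
PLANE = re.compile(r"^#Plane(\d+)", re.M)


def parse_planes(text):
    """Return (file_name, {plane_number: {variable_name: int_value}})."""
    match = FILE_NAME.search(text)
    file_name = match.group(1) if match else None
    # With one capturing group, split() yields
    # [preamble, id1, body1, id2, body2, ...].
    parts = PLANE.split(text)
    planes = {}
    for plane_id, body in zip(parts[1::2], parts[2::2]):
        planes[int(plane_id)] = {k: int(v) for k, v in COORD.findall(body)}
    return file_name, planes


if __name__ == "__main__":
    sample = (
        "##FileName = disp_20120803_064635_1\n"
        "#Plane1\n"
        "x1 = 10008 x2= -9991 x3= -9991\n"
        "y1 = 137 y2 = 10 y3 = 158\n"
        "z1= 844 z2= 779 z3 = 700\n"
        "##end\n"
    )
    print(parse_planes(sample))
```

Grouping the values by plane keeps x1 of Plane1 from being overwritten by x1 of Plane2, which a flat name-to-value mapping would do.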

If statement and value of an input variable - Pine Script - Tradingview

I'm having an issue using the value of an input variable in an if statement. Here's a piece of my code:
//@version=3
study(title="v5.0", shorttitle="v5.0", overlay=true)
PP_display = input(1, minval=0, maxval=1)
if (PP_display = 1)
xHigh = security(ticker,"D", high[0])
xLow = security(ticker,"D", low[0])
xClose = security(ticker,"D", close[0])
vPP = (xHigh+xLow+xClose) / 3
vR1 = vPP+(vPP-xLow)
vS1 = vPP-(xHigh - vPP)
vR2 = vPP + (xHigh - xLow)
vS2 = vPP - (xHigh - xLow)
vR3 = xHigh + 2 * (vPP - xLow)
vS3 = xLow - 2 * (xHigh - vPP)
plot(vPP, color=change(vPP) ? na : black, title="vPP", style = linebr, linewidth = width, transp=0)
end if
As a result, I'm getting this error: "syntax error at input 'PP_display'".
I can't find why...
Thanks for your help
To compare the PP_display variable with an integer, use the == (equal to) operator. A single = is used to declare variables.
There is no end if in Pine Script syntax.
You can't call the plot function in a local scope, only in the global scope.
Declaring a variable with the security() function in a local scope produces a compilation error: Can't call 'security' inside: 'if', 'for'
The solution is to move all your calculations, security calls and the plot function to the global scope.
If your intention is to hide the plot with the PP_display input, you can use the ternary conditional operator ? : directly in the series argument of the plot function.
//@version=3
study(title="v5.0", shorttitle="v5.0", overlay=true)
PP_display = input(1, minval=0, maxval=1)
xHigh = security(ticker,"D", high[0])
xLow = security(ticker,"D", low[0])
xClose = security(ticker,"D", close[0])
vPP = (xHigh+xLow+xClose) / 3
vR1 = vPP+(vPP-xLow)
vS1 = vPP-(xHigh - vPP)
vR2 = vPP + (xHigh - xLow)
vS2 = vPP - (xHigh - xLow)
vR3 = xHigh + 2 * (vPP - xLow)
vS3 = xLow - 2 * (xHigh - vPP)
plot(PP_display == 1 ? vPP : na, color=change(vPP) ? na : black, title="vPP", style = linebr, linewidth = 2, transp=0)

How do I break apart items in a line of text when there may not be space between some terms?

In MATLAB, I have a block of text that I need to split apart. This is an example of such text:
ROW SHORT-NAME TYPE y1 y2 yRef eq_lhs eq_rhs eq_ref errorCon tolerance isConverged
1 CmpFan.S_Qhx.integ_TmatI +6.3631e+002 +0.0000e+000 +6.3631e+002 TgasPath Tmat TgasPath +1.0000e+000 +1.0000e-004 FALSE DY1I1
2 CmpL.S_Qhx.integ_Tmat I +8.0865e+002 +0.0000e+000 +8.0865e+002 TgasPath Tmat TgasPath +1.0000e+000 +1.0000e-004 FALSE DY2I1
3 CmpH.S_Qhx.integ_Tmat I +1.2874e+003 +0.0000e+000 +1.2874e+003 TgasPath Tmat TgasPath +1.0000e+000 +1.0000e-004 FALSE DY3I1
4 BrnPri.S_Qhx.integ_TmatI +2.8494e+003 +0.0000e+000 +2.8494e+003 TgasPath Tmat TgasPath +1.0000e+000 +1.0000e-004 FALSE DY4I1
5 TrbH.S_Qhx.integ_Tmat I +3.3983e+003 +0.0000e+000 +3.3983e+003 TgasPath Tmat TgasPath +1.0000e+000 +1.0000e-004 FALSE DY5I1
6 TrbL.S_Qhx.integ_Tmat I +2.6320e+003 +0.0000e+000 +2.6320e+003 TgasPath Tmat TgasPath +1.0000e+000 +1.0000e-004 FALSE DY6I1
7 BrnAug.S_Qhx.integ_TmatI +1.6385e+003 +0.0000e+000 +1.6385e+003 TgasPath Tmat TgasPath +1.0000e+000 +1.0000e-004 FALSE DY7I1
8 dep_FanCustomerBleed D +0.0000e+000 +0.0000e+000 +1.0000e-001 CmpFan.CbldAPTMS.WbldFanBleed 0.1 +0.0000e+000 +1.0000e-002 TRUE DY8I1
9 dep_LPCCustomerBleed D +0.0000e+000 +0.0000e+000 +1.0000e-001 CmpL.CbldAPTMS.WbldLPCbleed 0.1 +0.0000e+000 +1.0000e-002 TRUE DY9I1
10 dep_HPCCustomerBleed D +3.0000e+000 +3.0000e+000 +1.0000e-001 CmpH.CbldAPTMS.WbldHPCbleed 0.1 +0.0000e+000 +1.0000e-002 TRUE DY10I1
11 dep_HPCCustomerBleedMidD +0.0000e+000 +0.0000e+000 +1.0000e-001 CmpH.CbldAPTMSmid.WbldHPCmidBleed 0.1 +0.0000e+000 +1.0000e-002 TRUE DY11I1
12 dep_HPXhigh D +2.0000e+002 +2.0000e+002 +2.0000e+002 ShH.HPX HPXhigh HPXhigh +0.0000e+000 +1.0000e-004 TRUE DY12I1
13 dep_HPXlow D +5.0000e+002 +5.0000e+002 +5.0000e+002 ShL.HPX HPXlow HPXlow +0.0000e+000 +1.0000e-004 TRUE DY13I1
14 FlowControl.dep_Tt D +8.6941e+002 +9.2300e+002 +9.2300e+002 Fl_I.Tt Fl_O.Tt FlowControl.Fl_O.Tt-5.8056e-002 +1.0000e-004 FALSE DY14I1
15 FlowControl.dep_Pt D +7.0096e+001 +8.5000e+001 +8.5000e+001 Fl_I.Pt Fl_O.Pt FlowControl.Fl_O.Pt-1.7534e-001 +1.0000e-004 FALSE DY15I1
16 FlowControl.dep_W D +8.0000e-002 +5.4000e-001 +5.4000e-001 Fl_I.W Fl_O.W FlowControl.Fl_O.W-8.5185e-001 +1.0000e-004 FALSE DY16I1
17 MoveCCA.dep_Tt D +1.7141e+003 +1.7310e+003 +1.7310e+003 Fl_I.Tt Fl_O.Tt MoveCCA.Fl_O.Tt-9.7494e-003 +1.0000e-004 FALSE DY17I1
18 MoveCCA.dep_Pt D +7.0096e+002 +6.9900e+002 +6.9900e+002 Fl_I.Pt Fl_O.Pt MoveCCA.Fl_O.Pt+2.8001e-003 +1.0000e-004 FALSE DY18I1
19 MoveCCA.dep_W D +3.4000e+001 +2.2000e+001 +2.2000e+001 Fl_I.W Fl_O.W MoveCCA.Fl_O.W +5.4545e-001 +1.0000e-004 FALSE DY19I1
20 dep_CCAflow D +3.4000e+001 +3.4000e+001 +3.4000e+001 CmpH.B_CCA.WbldCCAflow CCAflow +0.0000e+000 +1.0000e-004 TRUE DY20I1
21 ShH.integrate_Nmech I -2.6321e+003 +0.0000e+000 +2.4194e+004 trqNet 0.0000 trqIn -1.0879e-001 +1.0000e-004 FALSE DY21I1
22 ShL.integrate_Nmech I -5.1547e+003 +0.0000e+000 +3.0562e+004 trqNet 0.0000 trqIn -1.6866e-001 +1.0000e-004 FALSE DY22I1
23 DESIGN_OPR D +5.0176e+001 +5.0000e+001 +5.0000e+001 Overall_PR D_OPR 50.0 +3.5200e-003 +1.0000e-004 FALSE DY23I1
24 DESIGN_T41 D +3.7465e+003 +3.5500e+003 +3.8000e+003 TrbH.Fl_I.Tt D_T41 3800.0 +5.1708e-002 +1.0000e-004 FALSE DY24I1
25 DESIGN_CombinedFanPR D +5.1200e+000 +4.9000e+000 +5.0000e+001 CmpFan.PR*CmpL.PRD_FANPR 50.0 +4.4000e-003 +1.0000e-004 FALSE DY25I1
26 DESIGN_ThirdStreamFlowD +9.9099e-002 +9.2500e-002 +1.3000e-001 ThirdStreamFlowD_ThirdStreamFlow0.13 +5.0762e-002 +1.0000e-004 FALSE DY26I1
27 DESIGN_RMIX D +8.2279e-001 +1.0500e+000 +1.0500e+000 Mixer.RMIX D_RMIX 1.05 -2.1639e-001 +1.0000e-004 FALSE DY27I1
28 DESIGN_Wc D +4.2158e+002 +4.2500e+002 +4.0000e+002 CmpFan.Fl_I.Wc D_WAC 400.0 -8.5534e-003 +1.0000e-004 FALSE DY28I1
Each line has the same type of information in it, but unfortunately the way it is produced, there is not necessarily space between terms. When this happens, it becomes difficult/impossible to know where to split terms. I would be OK with losing some of the string information columns in the middle, but I still need to be able to get the numbers somehow.
For rows like 13 where things are nicely spaced, something like the following works nicely (where one line is stored in the variable "txt"):
>>asCells = textscan(txt,'%d %s %c %f %f %f %s %s %s %f %f %s %s');
>> depTxt = asCells{2}{1}
depTxt =
'dep_HPXlow'
>> type = asCells{3}
type =
'D'
>> y1 = asCells{4}
y1 =
500
>> y2 = asCells{5}
y2 =
500
>> yRef = asCells{6}
yRef =
500
>> lhsTxt = asCells{7}{1}
lhsTxt =
'ShL.HPX'
>> rhsTxt = asCells{8}{1}
rhsTxt =
'HPXlow'
>> depTxt = asCells{9}{1}
depTxt =
'HPXlow'
>> err = asCells{10}
err =
0
>> tol = asCells{11}
tol =
0.0001
>> if strncmp('TRUE',asCells{12}{1},4), conv = 1, else, conv = 0, end
conv =
1
For something like row 11 that doesn't work at all since the first string and the character run together, throwing off the format string. Similarly, there is no way that it could know that the "CmpH.CbldAPTMSmid.WbldHPCmidBleed" piece should be broken up into "CmpH.CbldAPTMSmid.Wbld" and "HPCmidBleed". I'd be OK losing the eq_lhs, eq_rhs, and eq_ref info if there was a way to still get the numbered items and convergence flag later on, but that's where I'm struggling.
I can grab the first string (which I do need to keep) like this:
asCells = textscan(txt,'%d %s',1);
>> depTxt = asCells{2}{1}
depTxt =
'dep_HPCCustomerBleedMidD'
But I am not sure how to conditionally strip off the last character based on whether it ran into the TYPE column or not.
I noticed that all the actual numbers have a leading plus or minus and are in scientific notation (the numbers in the eq_ref column are really strings in this case). So I tried to use regexp to grab the numeric values like this:
>> asCells=regexp(txt,'[+-]\d+\.?\d*([eE][+-]?\d+)?','match','forceCellOutput');
>> y1 = str2double(asCells{1}{1})
y1 =
0
>> y2 = str2double(asCells{1}{2})
y2 =
0
>> yRef = str2double(asCells{1}{3})
yRef =
0.1
>> err = str2double(asCells{1}{4})
err =
0
>> tol = str2double(asCells{1}{5})
tol =
0.01
That seems to work OK, but I have no idea how to combine that with grabbing that string up front (especially with the need to conditionally strip off the I or D TYPE character). I'm also not sure how to get the convergence flag when it's not consistent which term it would be in the row based on the spacing. Can I regex search for the string TRUE or FALSE on each line? I think I'm close, but I'm struggling as to how to put all the pieces together.
Here's a solution that gets you everything except eq_lhs, eq_rhs, and eq_ref. It takes two regex passes because I couldn't capture TYPE from inside a lookahead expression in the first pass (maybe it's possible, but I don't quite know how).
% load data from txt file
fptr = fopen('myData.txt');
s = fread(fptr, inf, 'uint8=>char');
fclose(fptr);
s = s';
% expressions w/named tokens
exprVarType = '(I|D)(?=\s+[+-])';
exprLineStart = '(?<row>\d+)\s+(?<shortName>[^\s]+(?=(\s*I|\s*D|(\s+(I|D)))))+[^+-]+';
exprSciNot1 = '(?<y1>[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?)';
exprSciNot2 = '(?<y2>[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?)';
exprSciNot3 = '(?<yref>[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?)';
exprSciNot4 = '(?<errorCon>[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?)';
exprSciNot5 = '(?<tolerance>[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?)';
% concatenate regexs
myExpr = strcat(exprLineStart, exprSciNot1, '\s+', exprSciNot2,...
'\s+', exprSciNot3, '[^+-]+', exprSciNot4, '\s+', exprSciNot5, '\s+',...
'(\w+)', '\s+', '([^\r\n]+)');
% first pass: collect all variables except
% 3. Type
% 7. eq_lhs
% 8. eq_rhs
% 9. eq_ref
myData = regexp(s, myExpr, 'names');
% second pass: collect variable type
% couldn't capture this on the first pass because it's part of a lookahead
% expression
varType = regexp(s, exprVarType, 'match');
% assign varType to the myData struct
[myData.varType] = deal(varType{:});
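The same idea (anchor on the signed scientific-notation numbers and search for TRUE or FALSE separately) can be sketched line by line. This is a minimal sketch in Python rather than MATLAB, and not the answer's exact expressions. The names parse_row, SCI, HEAD and FLAG are hypothetical. It assumes that every real number carries a sign and an exponent, and that the TYPE character is always I or D followed by whitespace.

```python
import re

# Signed scientific-notation number such as +6.3631e+002 (assumed format).
SCI = r"[+-]\d+\.\d+[eE][+-]\d+"
# Row number, short name (lazy), optional gap, TYPE character, then the first
# number. The lookahead forces the I/D to be the one right before the numbers,
# so a trailing I or D that ran into the name is still split off.
HEAD = re.compile(r"^\s*(\d+)\s+(\S+?)\s*([ID])\s+(?=" + SCI + ")")
FLAG = re.compile(r"\b(TRUE|FALSE)\b")


def parse_row(line):
    """Parse one data row; eq_lhs, eq_rhs and eq_ref are deliberately dropped."""
    head = HEAD.match(line)
    numbers = [float(tok) for tok in re.findall(SCI, line)]
    flag = FLAG.search(line)
    if head is None or flag is None or len(numbers) != 5:
        raise ValueError("unexpected row layout: " + line)
    y1, y2, y_ref, error_con, tolerance = numbers
    return {
        "row": int(head.group(1)),
        "short_name": head.group(2),
        "type": head.group(3),
        "y1": y1,
        "y2": y2,
        "y_ref": y_ref,
        "error_con": error_con,
        "tolerance": tolerance,
        "is_converged": flag.group(1) == "TRUE",
    }


if __name__ == "__main__":
    row = ("11 dep_HPCCustomerBleedMidD +0.0000e+000 +0.0000e+000 +1.0000e-001 "
           "CmpH.CbldAPTMSmid.WbldHPCmidBleed 0.1 +0.0000e+000 +1.0000e-002 TRUE DY11I1")
    print(parse_row(row))
```

Because the unsigned eq_ref values (0.1, 50.0, 0.0000) never match SCI, each row yields exactly five numbers, even when eq_ref runs into errorCon as in rows 14 to 18.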

Subsetting using a Bool-Vector in Rcpp-Function (problems of a Rcpp Beginner...)

Problem description (think of a membership with different prices for adults and kids):
I have two data sets: one contains an age and a code for each customer. A second data frame "decodes" the codes to numeric values depending on whether someone is a kid or an adult. I now want to match the codes in both data sets and get a vector that contains the numeric value for each customer in the data set.
I can make this work with standard R functionality, but since my original data contains several million observations I would like to speed up the computation using the Rcpp package.
Unfortunately I have not succeeded; in particular I don't know how to perform the subsetting based on a logical vector as I would in R. I am quite new to Rcpp and have no experience with C++, so I may be missing some very basic point.
I attached a minimum working example for R and appreciate any kind of help or explanation!
library(Rcpp)
raw_data = data.frame(
age = c(10, 14, 99, 67, 87, 54, 12, 44, 22, 8),
iCode = c("code1", "code2", "code3", "code1", "code4", "code3", "code2", "code5", "code5", "code3"))
decoder = data.frame(
code = c("code1","code2","code3","code4","code5"),
kid = c(0,0,0,0,100),
adult = c(100,200,300,400,500))
#-------- R approach (works, but takes ages for my original data set)
calc_value = function(data, decoder){
  y = nrow(data)
  for (i in 1:nrow(data)){
    position_in_decoder = (data$iCode[i] == decoder$code)
    if (data$age[i] > 18){
      y[i] = decoder$adult[position_in_decoder]
    }else{
      y[i] = decoder$kid[position_in_decoder]
    }
  }
  return(y)
}
y = calc_value(raw_data, decoder)
#--------- RCPP approach (I cannot make this one work) :(
cppFunction(
'NumericVector calc_Rcpp(DataFrame df, DataFrame decoder) {
  NumericVector age = df["age"];
  CharacterVector iCode = df["iCode"];
  CharacterVector code = decoder["code"];
  NumericVector adult = decoder["adult"];
  NumericVector kid = decoder["kid"];
  const int n = age.size();
  LogicalVector position;
  NumericVector y(n);
  for (int i=0; i < n; ++i) {
    position = (iCode[i] == code);
    if (age[i] > 18 ) y[i] = adult[position];
    else y[i] = kid[position];
  }
  return y;
}')
There is no need to go for C++ here. Just use R properly:
raw_data = data.frame(
age = c(10, 14, 99, 67, 87, 54, 12, 44, 22, 8),
iCode = c("code1", "code2", "code3", "code1", "code4", "code3", "code2", "code5", "code5", "code3"))
decoder = data.frame(
code = c("code1","code2","code3","code4","code5"),
kid = c(0,0,0,0,100),
adult = c(100,200,300,400,500))
foo <- merge(raw_data, decoder, by.x = "iCode", by.y = "code")
foo$res <- ifelse(foo$age > 18, foo$adult, foo$kid)
foo
#> iCode age kid adult res
#> 1 code1 10 0 100 0
#> 2 code1 67 0 100 100
#> 3 code2 14 0 200 0
#> 4 code2 12 0 200 0
#> 5 code3 54 0 300 300
#> 6 code3 99 0 300 300
#> 7 code3 8 0 300 0
#> 8 code4 87 0 400 400
#> 9 code5 44 100 500 500
#> 10 code5 22 100 500 500
That should also work for large data sets.
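The speed-up comes from replacing the per-row scan of the decoder with a keyed lookup followed by one conditional. As a language-neutral illustration of that idea, here is a minimal sketch in plain Python (not R, and not what merge does internally). The function name calc_value is reused from the question, and the (kid, adult) tuple layout of the decoder is an assumption made for the example.

```python
def calc_value(ages, codes, decoder):
    """Look up each customer's price by code, choosing the kid or adult column.

    decoder maps code -> (kid_price, adult_price); this layout is hypothetical.
    """
    result = []
    for age, code in zip(ages, codes):
        kid_price, adult_price = decoder[code]  # constant-time keyed lookup
        result.append(adult_price if age > 18 else kid_price)
    return result


if __name__ == "__main__":
    ages = [10, 14, 99, 67, 87, 54, 12, 44, 22, 8]
    codes = ["code1", "code2", "code3", "code1", "code4",
             "code3", "code2", "code5", "code5", "code3"]
    decoder = {
        "code1": (0, 100),
        "code2": (0, 200),
        "code3": (0, 300),
        "code4": (0, 400),
        "code5": (100, 500),
    }
    print(calc_value(ages, codes, decoder))
```

Unlike the merge output above, which is sorted by iCode, this keeps the results in the original row order.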

How to define path for input file and out file

How can I define separate input and output paths (in different locations) while keeping the input and output file names the same?
I want to read the input files from an input directory and save the results in an output directory, with each output file named after its input file.
My code currently works, but it handles only one file at a time; I want to run the same code on multiple files.
Source:
IMPLICIT DOUBLE PRECISION (A-H,O-Z)
c
c iaa: residue name
c iat: atom name
CHARACTER iaa*4,iat*4,chain*1
DIMENSION iaa(199),iat(199),chain(199)
c
c xpr,ypr,zpr: orthogonal coordinates
c irn: residue number
DIMENSION xpr(199),ypr(199),zpr(199),irn(199),taa(5)
c
c set pi value and rad to deg conversion factor
pi = 0.4D+01*DATAN(0.1D+01)
rad2deg = 0.18D+03/pi
deg2rad = pi/0.18D+03
c
fprec = 2.0*(sin(36.0*deg2rad)+sin(72.0*deg2rad))
c
c open the input file
OPEN (1,FILE="1e3i_A.txt")
c
c read the xyzs, atom and res name and res #
i = 1
396 READ (1,500,END=398) iat(i),iaa(i),chain(i),irn(i),xpr(i),
1 ypr(i),zpr(i)
c WRITE(*,500) iat(i),iaa(i),chain(i),irn(i),xpr(i),ypr(i),zpr(i)
i = i+1
c
c ensure arrays do not overflow
IF (i .GT. 199) THEN
WRITE(*,618)
STOP
ENDIF
c
c loop back and read the next atom entry
GO TO 396
c
c set the number of atoms, nat; close input file
398 nat = i-1
IF (nat .NE. 40) THEN
WRITE(*,632) nat
STOP
ENDIF
CLOSE(1)
c
c open output file
OPEN (3,FILE="output.dat")
c
c loop over all atoms; first get B of A-B-C-D
c
c irt = count of ring torsion angles
irt = 0
DO 716 i1 = 1,nat,4
j1 = i1
j2 = i1+1
j3 = i1+2
j4 = i1+3
c
c calculate the relevant vectors: j1-j2, j2-j3, j3-j4
d11 = xpr(j2) - xpr(j1)
d12 = ypr(j2) - ypr(j1)
d13 = zpr(j2) - zpr(j1)
d21 = xpr(j3) - xpr(j2)
d22 = ypr(j3) - ypr(j2)
d23 = zpr(j3) - zpr(j2)
d31 = xpr(j4) - xpr(j3)
d32 = ypr(j4) - ypr(j3)
d33 = zpr(j4) - zpr(j3)
c
c normals to j1-j2-j3 AND j2-j3-j4
c cross product of j1-j2 and j2-j3 AND of j2-j3 and j3-j4
p1 = d12*d23 - d13*d22
p2 = d13*d21 - d11*d23
p3 = d11*d22 - d12*d21
q1 = d22*d33 - d23*d32
q2 = d23*d31 - d21*d33
q3 = d21*d32 - d22*d31
c
c calculate cos-of-TA: angle between vectos p and q
ta = (p1*q1+p2*q2+p3*q3)/(dsqrt(p1*p1+p2*p2+p3*p3)*
1 dsqrt(q1*q1+q2*q2+q3*q3))
IF (DABS(ta) .GT. 1.0) ta = 1.0
c
c calculate magnitude of TA and convert to degrees
c ta = DACOS(ta)*rad2deg
ta = DACOS(ta)
c
c find sign of TA: find cross product of p and q
x1 = p2*q3 - p3*q2
y1 = p3*q1 - p1*q3
z1 = p1*q2 - p2*q1
c
c if not a null vector, find if parallel to j2-j3 or antiparallel
IF (DABS(x1)+DABS(y1)+DABS(z1) .NE. 0.0) THEN
xx = x1*d21 + y1*d22 + z1*d23
IF (xx .LT. 0.0) ta = -ta
ENDIF
irt = irt+1
taa(irt) = ta
IF (irt .EQ. 5) THEN
fnum = taa(5)+taa(2)-taa(4)-taa(1)
fden = fprec*taa(3)
tanp = fnum/fden
phase = DATAN(tanp)*rad2deg
IF (taa(3) .LT. 0) phase = phase+180.0
IF (phase .LT. 0) phase = phase+360.0
WRITE(*,624) phase
irt = 0
ENDIF
c
c output the value
c WRITE(3,606) iat(j1),irn(j1),iat(j2),irn(j2),iat(j3),
c 2 irn(j3),iat(j4),irn(j4),ta
716 CONTINUE
c
c close output file and exit
CLOSE(3)
c WRITE(*,620)
c
STOP
500 FORMAT(13X,2A4,A1,I4,4X,3F8.3)
502 FORMAT(A20)
606 FORMAT(1X,A4,I4,'... ',A4,I4,'... ',A4,I4,'... ',A4,I4,F12.3)
618 FORMAT(/' ERROR: number of atoms > 199'/)
620 FORMAT(/' Bye...Bye....'/)
622 FORMAT(/' Input: number of atoms ',I4,
3 ' is not a multiple of 4'/)
624 FORMAT(F12.3)
632 FORMAT(/' No. of atoms is',I4,' but expected only 40'/)
END

Parsing text file in matlab

I have this txt file:
BLOCK_START_DATASET
dlcdata L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Parameterfiles\Bladed4.2\DLC-Files\DLCDataFile.txt
simulationdata L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Parameterfiles\Bladed4.2\DLC-Files\BladedFile.txt
outputfolder Pfadangabe\runs_test
windfolder L:\loads2\WEC\1002_50-2\_calc\50-2_D135_HH95_RB-AB66-0O_GL2005_towerdesign_Bladed_v4-2_revA01\_wind
referenzfile_servesea L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Dataset_to_start\Referencefiles\Bladed4.2\DLC\dlc1-1_04a1.$PJ
referenzfile_generalsea L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Dataset_to_start\Referencefiles\Bladed4.2\DLC\dlc6-1_000_a_50a_022.$PJ
externalcontrollerdll L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Dataset_to_start\external_Controller\DisCon_V3_2_22.dll
externalcontrollerparameter L:\loads\confidential\000_Loads_Analysis_Environment\Tools\releases\01_Preprocessor\Version_3.0\Dataset_to_start\external_Controller\ext_Ctrl_Data_V3_2_22.txt
BLOCK_END_DATASET
% ------------------------------------
BLOCK_START_WAVE
% a6*x^6 + a5*x^5 + a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0
factor_hs 0.008105;0.029055;0.153752
factor_tz -0.029956;1.050777;2.731063
factor_tp -0.118161;1.809956;3.452903
spectrum_gamma 3.3
BLOCK_END_WAVE
% ------------------------------------
BLOCK_START_EXTREMEWAVE
height_hs1 7.9
period_hs1 11.8
height_hs50 10.8
period_hs50 13.8
height_hred1 10.43
period_hred1 9.9
height_hred50 14.26
period_hred50 11.60
height_hmax1 14.8
period_hmax1 9.9
height_hmax50 20.1
period_hmax50 11.60
BLOCK_END_EXTREMEWAVE
% ------------------------------------
BLOCK_START_TIDE
normal 0.85
yr1 1.7
yr50 2.4
BLOCK_END_TIDE
% ------------------------------------
BLOCK_START_CURRENT
velocity_normal 1.09
velocity_yr1 1.09
velocity_yr50 1.38
BLOCK_END_CURRENT
% ------------------------------------
BLOCK_START_EXTREMEWIND
velocity_v1 29.7
velocity_v50 44.8
velocity_vred1 32.67
velocity_vred50 49.28
velocity_ve1 37.9
velocity_ve50 57
velocity_Vref 50
BLOCK_END_EXTREMEWIND
% ------------------------------------
Currently I'm parsing it this way:
clc, clear all, close all
%Find all row headers
fid = fopen('test_struct.txt','r');
row_headers = textscan(fid,'%s %*[^\n]','CommentStyle','%','CollectOutput',1);
row_headers = row_headers{1};
fclose(fid);
%Find all attributes
fid1 = fopen('test_struct.txt','r');
attributes = textscan(fid1,'%*s %s','CommentStyle','%','CollectOutput',1);
attributes = attributes{1};
fclose(fid1);
%Collect row headers and attributes in a single cell
parameters = [row_headers,attributes];
%Find all the blocks
startIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_START_', 'match')));
endIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_END_', 'match')));
assert(all(size(startIdx) == size(endIdx)))
%Extract fields between BLOCK_START_ and BLOCK_END_
extract_fields = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,1));
struct_fields = arrayfun(extract_fields, 1:numel(startIdx), 'UniformOutput', false);
%Extract attributes between BLOCK_START_ and BLOCK_END_
extract_attributes = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,2));
struct_attributes = arrayfun(extract_attributes, 1:numel(startIdx), 'UniformOutput', false);
%Get structure names stored after each BLOCK_START_
structures_name = @(n) strrep(parameters{startIdx(n)},'BLOCK_START_','');
structure_names = genvarname(arrayfun(structures_name,1:numel(startIdx),'UniformOutput',false));
%Generate structures
for i=1:numel(structure_names)
    eval([structure_names{i} '=cell2struct(struct_attributes{i},struct_fields{i},1);'])
end
It works, but not the way I want. The overall idea is to read the file into one structure (one field per BLOCK_START / BLOCK_END block). Furthermore, I would like the numbers to be read as double rather than char, and delimiters such as whitespace, "," or ";" to be treated as array separators (e.g. 3;4;5 becomes [3;4;5]).
To clarify better, I will take the block
BLOCK_START_WAVE
% a6*x^6 + a5*x^5 + a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0
factor_hs 0.008105;0.029055;0.153752
factor_tz -0.029956;1.050777;2.731063
factor_tp -0.118161;1.809956;3.452903
spectrum_gamma 3.3
BLOCK_END_WAVE
The structure will be called WAVE with
WAVE.factor_hs = [0.008105;0.029055;0.153752]
WAVE.factor_tz = [-0.029956;1.050777;2.731063]
WAVE.factor_tp = [-0.118161;1.809956;3.452903]
WAVE.spectrum.gamma = 3.3
Any suggestion will be strongly appreciated.
Best regards.
The answers to your earlier question on this topic are a good starting point. To extract everything into a cell array, you can do:
%# Read data from input file
fd = fopen('test_struct.txt', 'rt');
C = textscan(fd, '%s', 'Delimiter', '\r\n', 'CommentStyle', '%');
fclose(fd);
%# Extract indices of start and end lines of each block
start_idx = find(~cellfun(@isempty, regexp(C{1}, 'BLOCK_START', 'match')));
end_idx = find(~cellfun(@isempty, regexp(C{1}, 'BLOCK_END', 'match')));
assert(all(size(start_idx) == size(end_idx)))
%# Extract blocks into a cell array
extract_block = @(n)({C{1}{start_idx(n):end_idx(n) - 1}});
cell_blocks = arrayfun(extract_block, 1:numel(start_idx), 'Uniform', false);
Now, to translate that into corresponding structs, do this:
%# Iterate over each block and convert it into a struct
for i = 1:length(cell_blocks)
    %# Extract the block
    C = strtrim(cell_blocks{i});
    C(cellfun(@(x)isempty(x), C)) = []; %# Ignore empty lines
    %# Parse the names and values
    params = cellfun(@(s)textscan(s, '%s%s'), {C{2:end}}, 'Uniform', false);
    name = strrep(C{1}, 'BLOCK_START_', ''); %# Struct name
    fields = cellfun(@(x)x{1}{:}, params, 'Uniform', false);
    values = cellfun(@(x)x{2}{:}, params, 'Uniform', false);
    %# Create a struct (fields and values are 1-by-N cell arrays)
    eval([name, ' = cell2struct(values, fields, 2);'])
end
Well, I've never used MATLAB, but you could use the following regex to find a block:
/BLOCK_START_(\w+).*?BLOCK_END_\1/s
Then for each block, find all the attributes:
/^(?!BLOCK_END_)(\w+)\s+((?:-?\d+\.?\d*)(?:;(?:-?\d+\.?\d*))*)/m
Then, based on the presence of semicolons in the second submatch, you can assign it as either a single-value or a multiple-value variable. I'm not sure how to translate that into MATLAB, but I hope this helps!
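To make the two-step regex approach concrete, here is a minimal sketch in Python rather than MATLAB. It uses the two expressions from this answer, with one extra capturing group added around the block body. The function name parse_blocks and the nested-dictionary result are hypothetical choices for the example. As in the original attribute regex, only numeric values are captured, so the path entries of the DATASET block are skipped.

```python
import re

# Block regex from the answer, with an added group capturing the block body.
BLOCK = re.compile(r"BLOCK_START_(\w+)(.*?)BLOCK_END_\1", re.S)
# Attribute regex from the answer: a name, then one or more ';'-separated numbers.
ATTR = re.compile(
    r"^(?!BLOCK_END_)(\w+)\s+((?:-?\d+\.?\d*)(?:;(?:-?\d+\.?\d*))*)", re.M
)


def parse_blocks(text):
    """Return {block_name: {field_name: float or list of floats}}."""
    blocks = {}
    for name, body in BLOCK.findall(text):
        fields = {}
        for key, raw in ATTR.findall(body):
            values = [float(v) for v in raw.split(";")]
            # A semicolon in the submatch means a multi-value field.
            fields[key] = values if len(values) > 1 else values[0]
        blocks[name] = fields
    return blocks


if __name__ == "__main__":
    sample = (
        "BLOCK_START_WAVE\n"
        "% a6*x^6 + a5*x^5 + a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0\n"
        "factor_hs 0.008105;0.029055;0.153752\n"
        "spectrum_gamma 3.3\n"
        "BLOCK_END_WAVE\n"
    )
    print(parse_blocks(sample))
```

Comment lines are ignored without special handling because they start with %, which the leading (\w+) cannot match.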