Write results between different text while adapting spaces in Fortran

I am trying, in a small program, to write output in which numerical values appear between pieces of text.
For the moment, I do:
! Print results
write(*,*)
write(*,*) ' Time step = ',dt
write(*,*)
write(*,1001) epsilon,step
write(*,*)
write(*,*) ' Problem size = ',size_x*size_y
write(*,*)
write(*,1002) elapsed_time
write(*,*)
write(*,*) ' Computed solution in seq.dat file '
write(*,*)
! Formats available to display the computed values on the grid
1001 format(' Convergence = ',f11.9,' after ',i9,' steps ')
1002 format(' Wall Clock = ',f15.6)
which produces at execution:
Time step = 0.000003755783907217
Convergence = 0.100000000 after 8882 steps
Problem size = 24576
Wall Clock = 5.213814
Computed solution in Seq.dat
My issue is with the line "Wall Clock = 5.213814": I would like only one space just after "Wall Clock =" before the value "5.213814". I think the multiple spaces I currently get come from the "f15.6" in 1002 format(' Wall Clock = ',f15.6).
Here's what I want to get (with another value for steps) :
Time step = 0.000003755783907217
Convergence = 0.100000000 after 20910988821 steps
Problem size = 24576
Wall Clock = 5.213814
Computed solution in Seq.dat
I used "f15.6" because "Wall Clock" can be a large number; the same goes for the epsilon and step variables.
In general, I don't know how to get exactly one space between the words and the values written between them, as I would when using printf in C to print values and words on the same line.
I know there's a simple solution but I can't find it.
UPDATE 1 :
I tried the solution indicated in the first answer.
Here's what I have done :
write(*,1001) epsilon,step
write(*,1002) elapsed_time
1001 format(' Convergence = ',f0.9,' after ',i9,' steps ')
1002 format(' Wall Clock = ',f0.6)
and I get :
Convergence = .100000000 after 8882 steps
Problem size = 24576
Wall Clock = 2.492813
As you can see, the "Convergence" value is .100000000 instead of 0.100000000 (the leading zero has disappeared).
And what about integer values: can I write "i0" to use as few characters as possible?
Thanks

Modern Fortran compilers understand a field width of 0 to mean "as few characters as possible":
program write_format
use iso_fortran_env, only: real64
implicit none
print 1001, 5.213814
print 1001, 12345678.901234_real64
1001 format("Wall Clock = ", f0.6)
end program write_format
Output:
Wall Clock = 5.213814
Wall Clock = 12345678.901234
Cheers
It's usually frowned upon to update a question after it has been answered in order to ask additional questions, but since they're closely related, I think it's okay.
Firstly, yes, format I0 means as few digits as necessary, and probably is what you want.
The second part is trickier, it seems to boil down to 'at least that many digits, but more if needed' -- and I don't think there's a format specifier for that (but I might be wrong).
I'd probably cheat and use something like this:
if (epsilon < 10.) then
write(*, 1002) epsilon
else
write(*, 1003) epsilon
end if
1002 format("Convergence = ", f11.9)
1003 format("Convergence = ", f0.9)
But then again, I also found this answer quite intuitive: How to pad FORTRAN floating point output with leading zeros?
Adapted for you, it would mean splitting the floating point number into an integer and the rest, and putting it back together again:
write(*, 1002) int(epsilon), epsilon-int(epsilon)
1002 format("Convergence = ", I0, F0.9)

This is a bit cumbersome, but one way to get the minimum width and preserve the leading zero is to use an internal write like this:
character*30 val
write(val,'(f11.9)')0.1d0
write(*,'(3a,i0,a)')'converge = ',trim(adjustl(val)),' after ',32432,' steps'
converge = 0.100000000 after 32432 steps

How can I generate a square wave plot of a pulse train of multiple signals from the data in a csv file (in Linux)?

For instance, given the data in a text file:
10:37:18.459 1
10:37:18.659 0
10:37:19.559 1
How could this be displayed as an image that looked like a square wave that correctly represented the high time and low time? I am trying both gnuplot and scipy. The result should ultimately include more than one sensor, and all plots would have to be displayed above one another so as to show a time delta.
The code in the following link (link to waveforms) creates a square wave from the formulas listed. How can the lower waveform (pwm) be driven by numbers like those above if they were in a file (to show a high state for 200 ms, then a low state for 100 ms, and finally a high state)?
If I understood your question correctly, you want to plot a step function based on time data. To avoid further guessing, please specify in more detail.
In gnuplot there is the plotting style with steps. Check help steps.
Code:
### display waveform as steps
reset session
$Data <<EOD
10:37:18.459 1
10:37:18.659 0
10:37:19.559 1
10:37:19.789 0
10:37:20.123 1
10:37:20.456 0
10:37:20.789 1
EOD
set yrange [-0.05:1.2]
myTimeFmt = "%H:%M:%S" # input time format
set format x "%M:%.1S" time # output time format on x axis
plot $Data u (timecolumn(1,myTimeFmt)):2 w steps lc rgb "red" lw 2 ti "my square wave"
### end of code
Result:
The answer I ended up with was:
# Fragment from a method of a class; self.__outfile is the path of the CSV file.
import datetime
import os
import numpy as np
import matplotlib.pyplot as plt

file_info = os.stat(self.__outfile)
if file_info.st_size:
    # columns: timestamp, then one HIGH/LOW column per sensor
    x, y, z, a = np.genfromtxt(self.__outfile, delimiter=',', unpack=True)
    fig = plt.figure(self.__outfile)
    ax = fig.add_subplot(111)
    fig.canvas.draw()
    test_array = [(datetime.datetime.utcfromtimestamp(e2).strftime('%d_%H:%M:%S.%f')).rstrip('0') for e2 in x]
    plt.xticks(x, test_array)
    # offset each sensor vertically so the traces are stacked
    l1, = plt.plot(x, y, drawstyle='steps-post')
    l2, = plt.plot(x, a-2, drawstyle='steps-post')
    l3, = plt.plot(x, z-4, drawstyle='steps-post')
    ax.grid()
    ax.set_xlabel('Time (s)')
    ax.set_ylabel('HIGH/LOW')
    ax.set_ylim((-6.5, 1.5))
    ax.set_title('Sensor Sequence')
    fig.autofmt_xdate()
    ax.legend([l1, l2, l3], ['sprinkler', 'lights', 'alarm'], loc='lower left')
    plt.show()
I had an input file that contained convertDateToFloat values, and it was passed in to this function. The name (__outfile) is perhaps misleading, but it was the output of the previous function.
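As a minimal sketch of the underlying idea (not the code above), assuming each input line looks like `HH:MM:SS.mmm state`, the timestamps can be converted to seconds and expanded into the corner points of a square wave with the standard library alone. The names parse_line and to_steps are made up for this illustration.

```python
from datetime import datetime

def parse_line(line):
    """Return (seconds since midnight, state) for a line like '10:37:18.459 1'."""
    stamp, state = line.split()
    t = datetime.strptime(stamp, "%H:%M:%S.%f")
    seconds = t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6
    return seconds, int(state)

def to_steps(samples):
    """Expand (time, state) samples into corner points.

    Each state is held until the next sample, which is what a
    'steps-post' style plot draws.
    """
    xs, ys = [], []
    for (t0, s0), (t1, _) in zip(samples, samples[1:]):
        xs += [t0, t1]
        ys += [s0, s0]
    # The last sample only marks the final transition.
    xs.append(samples[-1][0])
    ys.append(samples[-1][1])
    return xs, ys

lines = ["10:37:18.459 1", "10:37:18.659 0", "10:37:19.559 1"]
samples = [parse_line(l) for l in lines]
xs, ys = to_steps(samples)
```

Passing xs and ys to an ordinary line plot should give the same picture as a steps-post plot of the raw samples; several sensors can be stacked by subtracting a different offset from each ys list.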

How to iterate a python list and compare items in a string or another list

Following my earlier question, I have tried to write code that returns a sentence from a text if it contains a search term from a given list, as follows.
import re
from nltk import tokenize
from nltk.tokenize import sent_tokenize
def foo():
    List1 = ['risk','cancer','ocp','hormone','OCP',]
    txt = "Risk factors for breast cancer have been well characterized. Breast cancer is 100 times more frequent in women than in men.\
Factors associated with an increased exposure to estrogen have also been elucidated including early menarche, late menopause, later age\
at first pregnancy, or nulliparity. The use of hormone replacement therapy has been confirmed as a risk factor, although mostly limited to \
the combined use of estrogen and progesterone, as demonstrated in the WHI (2). Analysis showed that the risk of breast cancer among women using \
estrogen and progesterone was increased by 24% compared to placebo. A separate arm of the WHI randomized women with a prior hysterectomy to \
conjugated equine estrogen (CEE) versus placebo, and in that study, the use of CEE was not associated with an increased risk of breast cancer (3).\
Unlike hormone replacement therapy, there is no evidence that oral contraceptive (OCP) use increases risk. A large population-based case-control study \
examining the risk of breast cancer among women who previously used or were currently using OCPs included over 9,000 women aged 35 to 64 \
(half of whom had breast cancer) (4). The reported relative risk was 1.0 (95% CI, 0.8 to 1.3) among women currently using OCPs and 0.9 \
(95% CI, 0.8 to 1.0) among prior users. In addition, neither race nor family history was associated with a greater risk of breast cancer among OCP users."
    words = txt
    corpus = " ".join(words).lower()
    sentences1 = sent_tokenize(corpus)
    a = [" ".join([sentences1[i-1],j]) for i,j in enumerate(sentences1) if [item in List1] in word_tokenize(j)]
    for i in a:
        print i,'\n','\n'
foo()
The problem is that Python IDLE does not print anything. What could I have done wrong? It runs the code and all I get is this:
>
>
Your question isn't very clear to me, so please correct me if I'm getting this wrong. Are you trying to match the list of keywords (in List1) against the text (in txt)? That is,
For each keyword in List1
Do a match against every sentence in txt.
Print the sentence if it matches?
Instead of writing a complicated regular expression to solve your problem, I have broken it down into 2 parts.
First I break the text into a list of sentences. Then I write a simple regular expression and run it over every sentence. The trouble with this approach is that it is not very efficient, but it solves your problem.
Hope this small chunk of code can help guide you to the real solution.
import re

def foo():
    List1 = ['risk','cancer','ocp','hormone','OCP',]
    txt = "blah blah blah - truncated"
    matches = []
    # split the text into sentences at each full stop
    sentences = re.split(r'\.', txt)
    # only the first keyword is matched here
    keyword = List1[0]
    pattern = re.compile(keyword)  # keep the compiled pattern and reuse it
    for sentence in sentences:
        if pattern.search(sentence):
            matches.append(sentence)
    print("Sentence matching the word (" + keyword + "):")
    for match in matches:
        print(match)
--------- Generate random number -----
from random import randint
List1 = ['risk','cancer','ocp','hormone','OCP',]
print(randint(0, len(List1) - 1)) # gives u random index - use index to access List1
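The single-keyword loop above can be extended to every keyword at once. The following is only a sketch under a few assumptions: sentences are split naively after ".", "!" or "?" followed by whitespace (so no nltk is needed), keywords are matched as whole words, and case is ignored. The function name matching_sentences and the shortened sample text are made up for this illustration.

```python
import re

def matching_sentences(text, keywords):
    """Return the sentences that contain any keyword as a whole word."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # One alternation pattern for all keywords, escaped and case-insensitive.
    alternation = '|'.join(re.escape(k) for k in keywords)
    pattern = re.compile(r'\b(?:' + alternation + r')\b', re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

text = ("Risk factors for breast cancer have been well characterized. "
        "Early menarche has also been elucidated. "
        "There is no evidence that OCP use increases risk.")
hits = matching_sentences(text, ['risk', 'cancer', 'ocp', 'hormone'])
for sentence in hits:
    print(sentence)
```

Because of the word boundaries, a plural such as "OCPs" would not match "ocp"; dropping the `\b` anchors would turn this into a plain substring search.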

PDL matrix confusion

I have a simple but large data file. It's output from a neural network simulation. The first column is a time step, 1..200. The second is the target word (for the current simulation, 1..212). Then there are 212 columns, one for each word. That is, each row has the activation values of each word node at a particular time step given a particular target (input) word.
I need to do simple operations, such as converting each activation to a response strength (exp(constant x activation)) and then dividing each response strength by the row sum of response strength. Doing this in R is very slow (20 minutes), and doing it with conventional looping in perl is faster but still slow (7 minutes) given that later simulations will involve thousands of words.
It seems like PDL should be able to do this much more quickly. I've been reading the PDL documentation, but I'm really at a loss for how to do the second step. The first one seems as easy as selecting just the activation columns and putting them in $act and then:
$rp = exp($act * $k);
But I can't figure out how then to divide each value by its row sum. Any advice would be appreciated.
Thanks!
It looks like you need to make a copy of the matrix, then read from the first one and write to the second.
NOTE: a C-style loop using $c++ instead of the foreach loops might be more efficient.
$x = sequence(3,3)*2+1;
[ 1 3 5]
[ 7 9 11]
[13 15 17]
$y = $x->copy; # a plain $y = $x would make both names refer to the same data
for $c(0..2) { for $d(0..2) { $y($c,$d) .= $y($c,$d) / sum($x(,$d)) }}
p $y;
[0.11111111 0.33333333 0.55555556]
[0.25925926 0.33333333 0.40740741]
[0.28888889 0.33333333 0.37777778]
As is often the case in PDL, a good answer to this involves slicing and indices.
$k = 0.7; # made-up value
$data = zeroes 214,200;
$data((0)) .= sequence(200) + 1; # column 0=1..200
$data((1)) .= indx(zeroes(200)->random*212) + 1; # column 1 randomly 1..212
$data(2:-1)->inplace->random; # rest of columns random values for this demo
$indices = ($data(1)+1)->append($data((0))->sequence->transpose); # indices are [column 1 value,row index]
$act = $data->indexND($indices); # vector of the activation values
$rp = exp($act * $k);
$rp /= $data(2:-1)->sumover; # divide by sum of each row's non-index values
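To make the arithmetic asked for in the question explicit, here is a small sketch in plain Python rather than PDL: each activation a becomes a response strength exp(k * a), and each strength is then divided by the sum of the strengths in its own row. The function name response_probabilities, the sample rows, and the value of k are assumptions for illustration only; on the real data a vectorised PDL or NumPy expression would be the faster route.

```python
import math

def response_probabilities(rows, k):
    """For each row, return exp(k*a) divided by the row sum of exp(k*a)."""
    result = []
    for activations in rows:
        strengths = [math.exp(k * a) for a in activations]
        total = sum(strengths)
        result.append([s / total for s in strengths])
    return result

# Two made-up rows of activations (time step and target columns already removed).
rows = [[0.1, 0.5, 0.2], [0.0, 0.0, 0.0]]
probs = response_probabilities(rows, k=0.7)
for row in probs:
    print(row)
```

Each output row sums to 1, and a row of equal activations comes out as equal probabilities, which is a quick sanity check for any faster implementation.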

Speedy test on R data frame to see if row values in one column are inside another column in the data frame

I have a data frame of marketing data with 22k records and 6 columns, 2 of which are of interest.
Variable
FO.variable
Here's a link with the dput output of a sample of the dataframe: http://dpaste.com/2SJ6DPX
Please let me know if there's a better way of sharing this data.
All I want to do is create an additional binary keep column which should be:
1 if FO.variable is inside Variable
0 if FO.Variable is not inside Variable
Seems like a simple thing...in Excel I would just add another column with an "if" formula and then paste the formula down. I've spent the past hours trying to get this in R and failing.
Here's what I've tried:
Using grepl for pattern matching. I've used grepl before but this time I'm trying to pass a column instead of a string. My early attempts failed because I tried to force grepl and ifelse resulting in grepl using the first value in the column instead of the entire thing.
My next attempt was to use transform and grep, based on another post on SO. I didn't think this would give me my exact answer, but I figured it would get me close enough to figure it out from there...the code ran for a while, then errored with an invalid subscript.
transform(dd, Keep = FO.variable[sapply(variable, grep, FO.variable)])
My next attempt was to use str_detect, but I don't think this is the right approach because I want the row level value and I think 'any' will literally use any value in the vector?
kk <- sapply(dd$variable, function(x) any(sapply(dd$FO.variable, str_detect, string = x)))
EDIT: Just tried a for loop. I would prefer a vectorized approach but I'm pretty desperate at this point. I haven't used for-loops before as I've avoided them and stuck to other solutions. It doesn't seem to be working quite right not sure if I screwed up the syntax:
for(i in 1:nrow(dd)){
if(dd[i,4] %in% dd[i,2])
dd$test[i] <- 1
}
As I mentioned, my ideal output is an additional column with 1 or 0 if FO.variable was inside variable. For example, the first three records in the sample data would be 1 and the 4th record would be zero since "Direct/Unknown" is not within "Organic Search, System Email".
A bonus would be if a solution could run fast. The apply options were taking a long, long time perhaps because they were looping over every iteration across both columns?
This turned out to be not nearly as simple as I would have thought. Or maybe it is and I'm just a dunce. Either way, I appreciate any help on how best to approach this.
I read the data
df = dget("http://dpaste.com/2SJ6DPX.txt")
then split the 'variable' column into its parts and figured out the lengths of each entry
v = strsplit(as.character(df$variable), ",", fixed=TRUE)
len = lengths(v) ## sapply(v, length) in R-3.1.3
Then I unlisted v and created an index that maps each element of the unlisted v to the row it came from
uv = unlist(v)
idx = rep(seq_along(v), len)
Finally, I found the indexes for which uv was equal to its corresponding entry in FO.variable
test = (uv == as.character(df$FO.variable)[idx])
df$Keep = FALSE
df$Keep[ idx[test] ] = TRUE
Or combined (it seems more useful to return the logical vector than the modified data.frame, which one could obtain with dd$Keep = f0(dd))
f0 = function(dd) {
v = strsplit(as.character(dd$variable), ",", fixed=TRUE)
len = lengths(v)
uv = unlist(v)
idx = rep(seq_along(v), len)
keep = logical(nrow(dd))
keep[ idx[uv == as.character(dd$FO.variable)[idx]] ] = TRUE
keep
}
(This could be made faster using the fact that the columns are factors, but maybe that's not intentional?) Compared with (the admittedly simpler and easier to understand)
f1 = function(dd)
mapply(grepl, dd$FO.variable, dd$variable, fixed=TRUE)
f1a = function(dd)
mapply(grepl, as.character(dd$FO.variable),
as.character(dd$variable), fixed=TRUE)
f2 = function(dd)
apply(dd, 1, function(x) grepl(x[4], x[2], fixed=TRUE))
with
> library(microbenchmark)
> identical(f0(df), f1(df))
[1] TRUE
> identical(f0(df), unname(f2(df)))
[1] TRUE
> microbenchmark(f0(df), f1(df), f1a(df), f2(df))
Unit: microseconds
    expr     min       lq      mean   median       uq     max neval
  f0(df)  57.559  64.6940  70.26804  69.4455  74.1035  98.322   100
  f1(df) 573.302 603.4635 625.32744 624.8670 637.1810 766.183   100
 f1a(df) 138.527 148.5280 156.47055 153.7455 160.3925 246.115   100
  f2(df) 494.447 518.7110 543.41201 539.1655 561.4490 677.704   100
Two subtle but important additions during the development of the timings were to use fixed=TRUE in the regular expression, and to coerce the factors to character.
I would go with a simple mapply in your case; as you correctly said, row-wise operations will be very slow. Also (as suggested by Martin), setting fixed = TRUE and converting to character beforehand will significantly improve performance.
transform(dd, Keep = mapply(grepl,
as.character(FO.variable),
as.character(variable),
fixed = TRUE))
#    VisitorIDTrue                        variable value      FO.variable FO.value  Keep
# 22      44888657 Direct / Unknown,Organic Search     1 Direct / Unknown        1  TRUE
#  2      44888657   Direct / Unknown,System Email     1 Direct / Unknown        1  TRUE
#  6      44888657             Direct / Unknown,TV     1 Direct / Unknown        1  TRUE
# 10      44888657     Organic Search,System Email     1 Direct / Unknown        1 FALSE
# 18      44888657               Organic Search,TV     1 Direct / Unknown        1 FALSE
# 14      44888657                 System Email,TV     1 Direct / Unknown        1 FALSE
# 24      44888657 Direct / Unknown,Organic Search     1   Organic Search        1  TRUE
#  4      44888657   Direct / Unknown,System Email     1   Organic Search        1 FALSE
...
Here is a data.table approach that I think is very similar in spirit to Martin's:
require(data.table)
dt <- data.table(df)
dt[,`:=`(
fch = as.character(FO.variable),
rn = 1:.N
)]
dt[,keep:=FALSE]
dtvars <- dt[,strsplit(as.character(variable),',',fixed=TRUE),by=rn]
setkey(dt,rn,fch)
dt[dtvars,keep:=TRUE]
dt[,c("fch","rn"):=NULL]
The idea is to
identify all pairs of rn & variable (saved in dtvars) and
see which of these pairs match the rn & FO.variable pairs (in the original table, dt).
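The split-and-compare idea shared by these answers can be spelled out in a few lines. This is only a sketch in Python, not a replacement for the R code: it assumes the entries of variable are separated by commas and that stray whitespace around an entry should be ignored. The function name keep_flags and the two sample rows (taken from the output shown earlier) are for illustration.

```python
def keep_flags(variables, fo_variables):
    """Return 1 where fo is one of the comma-separated parts of var, else 0."""
    flags = []
    for var, fo in zip(variables, fo_variables):
        # Assumption: whitespace around each entry is not significant.
        parts = [p.strip() for p in var.split(",")]
        flags.append(1 if fo.strip() in parts else 0)
    return flags

variables = ["Direct / Unknown,Organic Search", "Organic Search,System Email"]
fo_variables = ["Direct / Unknown", "Direct / Unknown"]
flags = keep_flags(variables, fo_variables)
print(flags)
```

Note that this compares whole entries, like f0, whereas grepl with fixed=TRUE tests for a substring; the two agree as long as no entry is contained in another entry's name.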

Fortran95 -- Reading from a formatted text file

I need to read some values from a table. These are the first five rows, to give you some idea of what it should look like:
1 + 3 98 96 1
2 + 337 2799 2463 1
3 + 2801 3733 933 1
4 + 3734 5020 1287 1
5 + 5234 5530 297 1
My interest is in the first four columns of each row. I need to read these into arrays. I used the following code:
program ----
implicit none
integer, parameter :: totbases = 4639675, totgenes = 4395
integer :: codtot, ks
integer, dimension(totgenes) :: ngene, lend, rend
character :: genome*4639675, sign*4
open(1,file='e_coli_g_info')
open(2,file='e_coli_g_str')
do ks = 1, totgenes
read(1,100) ngene(ks),sign(ks:ks),lend(ks), rend(ks)
end do
100 format(1x,i4,8x,a1, 2(5x,i7), 22x)
do ks = 1, 100
write(*,*) ngene(ks), sign(ks:ks),lend(ks), rend(ks)
end do
end program
The loop at the end of the program is to print the first hundred entries to test that they are being read correctly. The problem is that I am getting this garbage (the fourth column is the problem):
1 + 3 757934891
2 + 337 724249387
3 + 2801 757803819
4 + 3734 757803819
5 + 5234 757935405
Clearly, the fourth column is way off. In fact, I cannot find these values anywhere in the file that I am reading from. I am using the gfortran compiler for Ubuntu 12.04. I would greatly appreciate if somebody would point me in the right direction. I'm sure it's likely that I'm missing something very obvious because I'm new at Fortran.
Fortran formats are (traditionally, there's some newer stuff that I won't go into here) fixed format, that is, they are best suited for file formats with fixed columns. I.e. column N always starts at character position M, no ifs or buts. If your file format is more "free format"-like, that is, columns are separated by whitespace, it's often easier and more robust to read data using list formatting. That is, try to do your read loop as
do ks = 1, totgenes
read(1, *) ngene(ks), sign(ks:ks), lend(ks), rend(ks)
end do
Also, as general advice, when opening your own files, start from unit 10 and go upwards from there. Fortran implementations typically preconnect some of the low-numbered units to standard input, output, and error (a common choice is units 5, 6, and 0, respectively). You probably don't want to redirect those.
PS 2: I haven't tried your code, but it seems that you have a bounds overflow in the sign variable. It's declared with length 4, but then you assign to index ks, which goes all the way up to totgenes. As you're using gfortran on Ubuntu 12.04 (that is, gfortran 4.6), compile with the options "-O1 -Wall -g -fcheck=all" when developing.
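For comparison, the whitespace-separated reading that list-directed input performs looks like this in Python. It is only a sketch of the idea, using the first rows from the question as sample data; the function name read_genes is made up, and each array grows with the data, so there is no fixed-length sign variable to overflow.

```python
def read_genes(lines):
    """Parse whitespace-separated rows and keep the first four columns."""
    ngene, sign, lend, rend = [], [], [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        ngene.append(int(fields[0]))
        sign.append(fields[1])
        lend.append(int(fields[2]))
        rend.append(int(fields[3]))
    return ngene, sign, lend, rend

sample = ["1 + 3 98 96 1", "2 + 337 2799 2463 1", "3 + 2801 3733 933 1"]
ngene, sign, lend, rend = read_genes(sample)
print(list(zip(ngene, sign, lend, rend)))
```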