Reading lines from a pdb file - fortran

I want to read only those lines that contain "ATOM" as the first word and write them to a file using Fortran.
I have tried to write some code, but I was unable to read only the specific lines containing the word "ATOM". I hope someone can help me in this regard.

You need something like this, assuming the PDB file is already open on unit 20 and the output file on unit 10 (PDB records are 80 columns wide: 4 for the record name, 76 for the rest):
      character*4 word
      character*76 line
   30 read(20,1000,end=40) word,line
 1000 format(A4,A76)
      if (word .eq. 'ATOM') then
         write(10,1000) word,line
      end if
      go to 30
   40 continue

Related

reading in a file from fortran

I am reading in data from a file with lines that look like this: "01/06/2009,Tom Sanders,,264,220,73,260". I want to skip to the first comma and then start with the name. However, when I read it in I am only getting the dates. I have used a format(T9,'(A)'), but it comes out in columns and is not what I want. How should I approach this problem?

How do I read data from a file with description and blank lines with Fortran 77?

I am new to Fortran 77. I need to read the data from a given text file into two arrays, but there are some lines that either are blank or contain descriptive information on the data set before the lines containing the data I need to read. How do I skip those lines?
Also, is there a way my code can count the number of data lines in that file, or do I have to count them by hand in order to build my do-loops for reading the data?
I have tried to find examples online and in Schaum's Programming with Fortran 77, but couldn't find anything too specific on that.
Part of the file I need to read data from follows below. I need to build an array with the entries under each column.
Data from fig. 3 in Klapdor et al., MPLA_17(2002)2409
E(keV) counts_in_bin
2031.5 5.4
2032.5 0
2033.5 0
I am assuming this question is very basic, but I've been fighting with this for a while now, so I thought I would ask.
If you know where the lines are that you don't need/want to read, you can advance the IO with a call to read with no input items.
You can use:
read(input-unit,*)
to read a line from your input file, discard its contents and advance IO to the next line.
It has been a long time since I have looked at F77 code, but in general, if your read statement inside a DO loop can detect an empty line, or even a record that contains only blanks, then you could write logic to trap that condition and skip to the next iteration. I just don't recall whether read can deal with that situation intelligently on its own.
Alternatively, if you are using a UNIX shell and coreutils, you can use sed to remove empty lines (/^$/) or blank lines (/^ *$/) to preprocess the file before you feed it to your F77 program.
Something like
$ sed -e '/^$/d;/^ *$/d' infile > outfile
It should look something like this:-
C     Initialise
      integer i
      character*80 t1,t2,t3
      real*8 x,y
      open(unit=1,file='qdata.txt')
C     Read headers
      read(1,100) t1
  100 format(A80)
      write(6,*) t1
      read(1,100) t2
      write(6,*) t2
      read(1,100) t3
      write(6,*) t3
      write(6,*)
C     Read data
      do 10 i=1,10
         read(1,*,end=99) x,y
         write(6,*) x,y
   10 continue
   99 continue
      end
So I've used a classic formatted read to read in the header lines, then free-format to read the numbers. The free-format read with the asterisk skips white space including blank lines so it does what you want, and when there is no more data it will go to statement 99 and finish.
The output looks like this:-
Data from fig. 3 in Klapdor et al., MPLA_17(2002)2409
E(keV) counts_in_bin
2031.5000000000000 5.4000000000000004
2032.5000000000000 0.0000000000000000
2033.5000000000000 0.0000000000000000

Strings in Matlab

I have a file of tweets that I have read into MATLAB using dataread and they're stored in a cell. I wanted to find the average number of characters in the tweets. How would I go about doing that? Here is the code I have so far:
fid=fopen('tweets.txt');
lines = dataread('file', 'tweets.txt', '%s', 'delimiter', '\n');
I was thinking I could use something along the lines of cellfun but I'm unsure how to format it. Any help would be greatly appreciated.
Try cellfun(@numel, lines); it returns the length of each line. The average number of characters is then mean(cellfun(@numel, lines)).
By the way, fid=fopen('tweets.txt'); is unnecessary if you use dataread this way; simply delete that line.

Efficiently read the last row of a csv file

Is there an efficient C or C++ way to read the last row of a CSV file? The naive approach involves reading in the entire file and then going to the end. Is there a quicker way this can be done (particularly if the CSV files are large)?
What you can do is guess the line length, then jump two or three lines' worth of bytes before the end of the file and read the remaining lines. The last line you read is the last one, as long as you read at least one line before it (otherwise, you start again with a bigger offset).
I posted some sample code for doing a similar thing (reading last N lines) in this answer (in PHP, but serves as an illustration)
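For illustration, a rough C++ sketch of that guess-and-seek idea might look like the following; the 4096-byte initial guess and the file name file.csv are just placeholder assumptions:
#include <fstream>
#include <iostream>
#include <string>

// Return the last line of a file by seeking near the end, reading the
// remaining bytes line by line, and keeping the last one read.
std::string lastLine(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return "";

    in.seekg(0, std::ios::end);
    std::streamoff size = in.tellg();

    // Initial guess: the last line fits in the final 4096 bytes of the file.
    std::streamoff guess = 4096;
    for (;;) {
        std::streamoff start = (size > guess) ? size - guess : 0;
        in.clear();
        in.seekg(start);

        std::string line, last;
        int count = 0;
        while (std::getline(in, line)) {
            last = line;
            ++count;
        }

        // If we started at the beginning of the file, or read at least one
        // (possibly partial) line before the final one, the final line is
        // complete; otherwise start again with a bigger offset.
        if (start == 0 || count >= 2)
            return last;
        guess *= 2;
    }
}

int main()
{
    std::cout << lastLine("file.csv") << '\n';
}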
For implementations in a variety of languages, see
C++ : c++ fastest way to read only last line of text file?
Python : Efficiently finding the last line in a text file
Perl : How can I read lines from the end of file in Perl?
C# : Get last 10 lines of very large text file > 10GB c#
PHP : how to read only 5 last line of the txt file
Java: Read last n lines of a HUGE file
Ruby: Reading the last n lines of a file in Ruby?
Objective-C : How to read data from NSFileHandle line by line?
You can try working backwards. Read some size block of bytes from the end of the file, and look for the newline. If there is no newline in that block, then read the previous block, and so on.
Note that if the size of a row is large relative to the size of the file, this may result in worse performance, because most file caching schemes assume someone reads forward in the file.
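A rough C++ sketch of that backwards block scan might look like this; the 4096-byte block size and the file name are placeholder assumptions, and a single trailing newline is ignored:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Read fixed-size blocks backwards from the end of the file until a
// newline is found, then return everything after it.
std::string lastLine(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return "";

    in.seekg(0, std::ios::end);
    std::streamoff size = in.tellg();
    if (size <= 0)
        return "";

    std::streamoff end = size;
    in.seekg(end - 1);
    if (in.get() == '\n')       // ignore a single trailing newline
        --end;

    const std::streamoff block = 4096;
    std::string tail;           // suffix of the file collected so far
    std::streamoff pos = end;
    while (pos > 0) {
        std::streamoff start = (pos > block) ? pos - block : 0;
        std::vector<char> buf(static_cast<std::size_t>(pos - start));
        in.seekg(start);
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        tail.insert(0, buf.data(), buf.size());     // prepend this block

        std::size_t nl = tail.rfind('\n');
        if (nl != std::string::npos)
            return tail.substr(nl + 1);              // found the previous newline
        pos = start;                                 // no newline yet: read the previous block
    }
    return tail;    // the file contains only one line
}

int main()
{
    std::cout << lastLine("file.csv") << '\n';
}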
You can use Perl module File::ReadBackwards.
Your problem falls into the same domain as searching for a string within a file. As you rightly point out, it's not always a great idea to read the entire file into memory and then search for your string. But you can always do the next best thing: memory-map your file, then use your string-searching functions to search backwards from the end of the mapping for the last newline.
It's an extremely efficient mechanism with minimal memory footprint and optimum disk I/O.
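For illustration, a minimal C++ sketch of that approach using POSIX mmap might look like this; error handling is pared down and the file name in main is just a placeholder:
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iostream>
#include <string>

// Memory-map the file, then walk backwards from the end to the previous '\n'.
std::string lastLineMmap(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return "";

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return "";
    }
    std::size_t size = static_cast<std::size_t>(st.st_size);

    void* mapped = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped == MAP_FAILED) {
        close(fd);
        return "";
    }
    const char* data = static_cast<const char*>(mapped);

    std::size_t end = size;
    if (data[end - 1] == '\n')      // ignore a trailing newline
        --end;

    std::size_t begin = end;
    while (begin > 0 && data[begin - 1] != '\n')
        --begin;                    // walk back to the previous newline

    std::string line(data + begin, end - begin);

    munmap(mapped, size);
    close(fd);
    return line;
}

int main()
{
    std::cout << lastLineMmap("file.csv") << '\n';
}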
Read with what and on what? On a Unix system, if you want the last line, it is as simple as
tail -n1 file.csv
If you want a quick and dirty way to do this from within your C++ app, you can do something like
system("tail -n1 file.csv")
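If you also want the result back inside your program rather than just printed to the terminal, a rough sketch using POSIX popen to capture the output of tail might look like this (the file name is again a placeholder):
#include <cstdio>
#include <iostream>
#include <string>

int main()
{
    // Run tail and capture its output instead of letting it go to stdout.
    FILE* p = popen("tail -n1 file.csv", "r");
    if (!p)
        return 1;

    std::string last;
    char buf[4096];
    while (std::fgets(buf, sizeof buf, p))
        last += buf;

    pclose(p);
    std::cout << "last row: " << last;
    return 0;
}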

Deleting comments from a file, using Regex

I want to write a program that deletes all the comments (starting with "//" until the end of the line) from a file.
I want to do it using regular expressions.
I tried this:
let mutable text = File.ReadAllText("C:\\a.txt")
let regexComment = new Regex("//.*\\r\\n$")
text <- regexComment.Replace(text, "")
File.WriteAllText("C:\\a.txt",text)
But it doesn't work...
Can you please explain why, and suggest something that does work (preferably using a regex)?
Thanks :)
Rather than loading the whole file into memory and running a regex on it, a faster approach that will handle any size file without memory issues might look like this:
open System
open System.IO
open System.Text.RegularExpressions
// regex: beginning of line, followed by optional whitespace,
// followed by comment chars.
let reComment = Regex(@"^\s*//", RegexOptions.Compiled)
let stripComments infile outfile =
    File.ReadLines infile
    |> Seq.filter (reComment.IsMatch >> not)
    |> fun lines -> File.WriteAllLines(outfile, lines)
stripComments "input.txt" "output.txt"
The output file must be different from the input file, because we're writing to the output while we're still reading from the input. We use the regex to identify comment lines (with optional leading whitespace), and Seq.filter to make sure the comment lines don't get sent to the output file.
Because we never hold the entire input or output file in memory, this function will work on any size file, and it's likely faster than the "read entire file, regex everything, write entire file" approach.
Danger Ahead
This code will not strip out comments that appear after some code on the same line. However, a regular expression is not the right tool for that job, unless someone can come up with a regular expression that can tell the following two lines of code apart and avoid breaking the first one when you strip everything that matches the regex from the file:
let request = WebRequest.Create("http://foo.com")
let request = WebRequest.Create(inputUrl) // this used to be hard-coded
let regexComment = new Regex(@"//.*$", RegexOptions.Multiline)
Never mind, I figured it out. It should have been:
let regexComment = new Regex("//.*\\r\\n")
Your regex string seems to be wrong. "\\/\\/.*\\r\\n" worked for me.