Erlang - Open file read line delete line and save - concurrency

I need to open a file, read a line, do something with it, delete that line, save the file, then open the same file again and continue with the rest.
So far I have this code:
-module(setup_data).
-export([for_each_line_in_file/1]).

for_each_line_in_file(Name) ->
    {ok, Device} = file:open(Name, [read]),
    for_each_line(Device).

for_each_line(Device) ->
    case io:get_line(Device, "") of
        eof -> file:close(Device);
        Line ->
            do_something(Line),
            for_each_line(Device)
    end.
So I want something like this:
-module(setup_data).
-export([for_each_line_in_file/1]).

for_each_line_in_file(Name) ->
    for_each_line(Name).

for_each_line(Name) ->
    {ok, Device} = file:open(Name, [read]),
    case io:get_line(Device, "") of
        eof -> file:close(Device);
        Line ->
            io:format("LINE : ...... ~p~n", [Line]),
            %% DELETE THAT CURRENT LINE AND SAVE?
            file:close(Device),
            for_each_line(Name)
    end.

To elaborate on Armon's comment:
A post on the Server Fault Q&A site lists how long certain operations take: a disk seek takes about 10 ms, while reading 1 MB of sequential data takes about 30 ms. If you reopen the file for every line, a 300-line file costs about 300 × 10 ms = 3 s in seeks alone, roughly 100 times slower than reading 1 MB of data, and a 3000-line file would be roughly 1000 times slower.
In this situation, read the entire file into memory as a binary, do what you need to do with the lines, and finally save the file again. You could distribute the work between processes, but I wouldn't bother, because accessing the disk is probably the slowest operation in your code.
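The read-everything-then-save approach can be sketched as follows. It is written in Python rather than Erlang purely for illustration, and the name process_all_lines, the handle_line callback, and the rule "keep whatever was not processed if a handler fails" are assumptions, not part of the question.

```python
def process_all_lines(path, handle_line):
    """Read the whole file once, handle each line in memory, then rewrite the file once.

    Lines that were handled are dropped. If handle_line raises, the failing
    line and everything after it are kept (an assumed policy for this sketch).
    Returns the number of lines that were handled and removed.
    """
    # One sequential read instead of one open/seek per line.
    with open(path, "rb") as f:
        data = f.read()

    lines = data.splitlines(keepends=True)
    remaining = []
    for index, line in enumerate(lines):
        try:
            handle_line(line)
        except Exception:
            # Stop here and keep the unprocessed tail of the file.
            remaining = lines[index:]
            break

    # One write at the end instead of one save per deleted line.
    with open(path, "wb") as f:
        f.writelines(remaining)
    return len(lines) - len(remaining)
```

Calling process_all_lines("data.txt", print) would print every line and leave an empty file behind; the file name here is only a placeholder.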

Related

Correction using error file in Fortran 90

I have two text files. file_1 holds a large amount of formatted data in 100 columns. file_2 lists the data-entry mistakes: it has 20 columns, of which the first 15 have the same format as the data and the last 5 contain text such as "ACCNT", "ADDSS", etc.
file_1 format is as follows
15382987019547317994113................(100 columns)
file_2 format is as follows
153829870195473ACCNT
What I have tried is:
program rem_err
  implicit none
  character * 20 record
  character * 15 match, mat_dat
  character *5 element,replace
  character *80 data1
  integer::i
  open (9,status='old',file='file_1.txt')
  open (10,status='old',file='file_2.txt')
  open (11,status='new',file='out.txt')
2 read (10,1,end=15)record
  match=record(1:15)
  element=record(16:20)
  do 4 i=1,100000
    read(9,5,end=15) mat_dat,replace,data1
    if(mat_dat.eq.match)then
      if (element.eq.'ACCNT')replace='*****'
      if (element.eq.'ADDSS')replace=' '
      write(11,5)mat_dat,replace,data1
    else
      write(11,5)mat_dat,replace,data1
    endif
4 continue
  goto 2
1 format(a20)
5 format(a15,a5,a80)
15 stop
end program rem_err
How to correct file_1 using information from file_2?
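One possible shape for this, sketched in Python rather than Fortran just to show the logic: load every record of file_2 into a lookup table first, then make a single pass over file_1. The column positions (1-15 key, 16-20 field to fix) and the two replacement rules are taken from the attempt above; the function names are made up for this sketch.

```python
# Replacement text for columns 16-20, keyed by the code found in file_2.
# These two rules mirror the attempt above; other codes are left untouched.
REPLACEMENTS = {"ACCNT": "*****", "ADDSS": "     "}


def load_corrections(error_lines):
    """Map the 15-character key of each error record to its 5-character code."""
    corrections = {}
    for record in error_lines:
        record = record.rstrip("\n")
        corrections[record[:15]] = record[15:20]
    return corrections


def correct_record(line, corrections):
    """Return one data record with columns 16-20 replaced if its key is listed."""
    body = line.rstrip("\n")
    key = body[:15]
    code = corrections.get(key)
    if code in REPLACEMENTS:
        body = key + REPLACEMENTS[code] + body[20:]
    return body
```

With the lookup table built once, every line of file_1 only has to be read and written a single time, regardless of how many records file_2 contains.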

Reading records from a file in FORTRAN66 using stdin adding extra unwanted junk

I'm trying to read a file in the format specified below using FORTRAN 66.
1000
MS 1 - Join Grps Group Project 5 5
Four Programs Programming 15 9
Quiz 1 Quizzes 10 7
FORTRAN Programming 25 18
Quiz 2 Quizzes 10 9
HW 1 - Looplang Homework 20 15
I execute and read the file like so:
program < grades.txt
The first line is the total number of points that can be earned in a class
The rest of the lines are assignments in a class
Each line is formatted as such: Assignment name(20 chars) category (20 chars) possible points(14 chars) earned points(14 chars)
For some reason, when the code runs and reads the file, starting at the first assignment record, I get error 5006, and cannot find an explanation of the error code. The output of the program while debugging looks like this:
$ file < grades.txt
MS 1 - Join Grps Group Project 5 6417876
NOT EOF
EOF 5006
NAME CATEGORY POSSIBLE EARNED
My goal is to be able to read each line and put each column into its appropriate array, then reference those arrays later on to print a report for each category, with each assignment, points possible, points earned, and total percentage for the category, then loop, etc.
I do not understand where the "6417876" in the output is coming from. It is definitely not part of the file that's being piped into stdin while the program reads.
The code for the program is as follows:
CHARACTER*20 ASSIGNMENTT(100)
CHARACTER*20 CATEGORY(100)
INTEGER POSSIBLE(100)
INTEGER EARNED(100)
INTEGER TOTALPTS
INTEGER REASON
INTEGER I, N
READ(5,50)TOTALPTS
50 FORMAT(I4)
c Read the arrays in
I=1
100 READ(5,110,IOSTAT=REASON)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
110 FORMAT(2A20x,2I14x)
WRITE(*,110)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
I=I+1
IF (REASON < 0) GOTO 120
WRITE(*,*)"NOT EOF"
IF (I<100 .AND. REASON == 0) GOTO 100
WRITE(*,*)"EOF", REASON
c Get the number of items (For some reason stdin adds an extra item that's not in the file, so I subtract 2 instead of 1
120 N=I-2
c Display the Names and Ages
WRITE(*,200)
200 FORMAT("NAME",T20,"CATEGORY",T40,"POSSIBLE",T54,"EARNED",T68)
DO 300 I=1,N
210 FORMAT(A20,A20,I14,I14)
300 WRITE(*,210)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
END
What could be causing the read issues I'm facing?
The line that reads the file contents was too long (fixed-form Fortran ignores anything past column 72), so I shortened the names of the variables to make it fit and the problem was solved.
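A quick way to spot this kind of problem is to scan the source for statements that run past column 72. The helper below is a small Python sketch, not a Fortran tool; the name overlong_lines is made up, and it assumes the source is indented with spaces rather than tabs.

```python
def overlong_lines(source_text, limit=72):
    """Return (line number, length) for every non-comment line longer than limit.

    In fixed-form Fortran, characters after column 72 are not part of the
    statement, so any line reported here is being silently truncated.
    """
    hits = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        # A 'C', 'c' or '*' in column 1 marks a comment line; length is harmless there.
        if line[:1] in ("C", "c", "*"):
            continue
        length = len(line.rstrip())
        if length > limit:
            hits.append((number, length))
    return hits
```

Run against the program above (with its usual six columns of label and indentation), this would flag the READ statement with label 100.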

How do I convert s.st_dev to /sys/block/<name>

I want to determine whether a file is on an HDD or an SSD.
I found out that I could check the type of drive using the /sys/block info:
prompt$ cat /sys/block/sdc/queue/rotational
1
This has 1 if it is rotational or unknown. It is 0 when the disk is an SSD.
Now I have a file and want to know whether it is on an HDD or an SSD. I can stat() the file to get the device number:
struct stat s;
stat(filename, &s);
// what do I do with s.st_dev now?
I'd like to convert s.st_dev to a drive name as I have in my /sys/block directory, in C.
What functions do I have to use to get that info? Or is it available in some /proc file?
First, we need to find which partition the file is on.
You can use the following command for that:
df -P <file name> | tail -1 | cut -d ' ' -f 1
This will give you output something like this: /dev/sda3
Now you can use the following command, with the matching drive name, to determine whether it is an HDD or an SSD:
cat /sys/block/sda/queue/rotational
You can use popen in your program to get the output of these system commands.
Okay, I really found it!
So my first solution, reading the partitions, wouldn't work. It would give me sdc1 instead of sdc. I also found /proc/mounts, which includes some info about what's mounted where, but it would still not help me convert the value to sdc.
Instead, I found another solution, which is to look at the block devices and more specifically this softlink:
/sys/dev/block/<major>:<minor>
The <major> and <minor> numbers can be extracted using the functions of the same name in C (I use C++, but the basic functions are all in C):
#include <sys/types.h>
#include <sys/sysmacros.h>   // major() and minor() live here on recent glibc
#include <string>
...
std::string dev_path("/sys/dev/block/");
dev_path += std::to_string(major(s.st_dev));
dev_path += ":";
dev_path += std::to_string(minor(s.st_dev));
That path is a soft link and I want to get the real path of the destination:
char device_path[PATH_MAX + 1];
if(realpath(dev_path.c_str(), device_path) == nullptr)
{
    return true;    // the link cannot be resolved: assume an HDD
}
From that real path, I then break up the path in segments and search for a directory with a sub-directory named queue and a file named rotational.
advgetopt::string_list_t segments;
advgetopt::split_string(device_path, segments, { "/" });
while(segments.size() > 3)
{
    // try <current directory>/queue/rotational
    std::string path("/"
            + boost::algorithm::join(segments, "/")
            + "/queue/rotational");
    std::ifstream in;
    in.open(path);
    if(in.is_open())
    {
        char line[32];
        in.getline(line, sizeof(line));
        return std::atoi(line) != 0;
    }
    // not found here, go up one directory and try again
    segments.pop_back();
}
The in.getline() is what reads the .../queue/rotational file. If the value is not 0 then I consider that this is an HDD. If something fails, I also consider that the drive is an HDD drive. The only way my function returns false is if the rotational file exists and is set to 0.
My function can be found here. The line number may change over time, search for tool::is_hdd.
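For comparison, the same lookup can be sketched in a few lines of Python. This is only an illustration of the steps described above, not the linked function: the names find_rotational_file and is_hdd and the sys_root parameter (there so the code can be pointed at a fake tree when testing) are assumptions of this sketch, and it walks all the way up to the root instead of stopping at three segments.

```python
import os


def find_rotational_file(device_dir):
    """Walk up from device_dir until a queue/rotational file is found, else None."""
    path = device_dir
    while path not in ("/", ""):
        candidate = os.path.join(path, "queue", "rotational")
        if os.path.isfile(candidate):
            return candidate
        path = os.path.dirname(path)
    return None


def is_hdd(filename, sys_root="/sys"):
    """Return False only if the drive holding filename reports rotational == 0."""
    st = os.stat(filename)
    # /sys/dev/block/<major>:<minor> is a soft link to the real device directory.
    link = os.path.join(
        sys_root, "dev", "block",
        "%d:%d" % (os.major(st.st_dev), os.minor(st.st_dev)))
    device_dir = os.path.realpath(link)

    candidate = find_rotational_file(device_dir)
    if candidate is None:
        return True     # nothing found: assume an HDD, as in the C++ version
    try:
        with open(candidate) as f:
            return f.read().strip() != "0"
    except OSError:
        return True     # unreadable: assume an HDD
```

For a file on a partition such as sda1, the link resolves to the partition's directory, and the walk finds queue/rotational one level up, in the directory of the whole drive.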
Old "Solution"
The file /proc/partitions includes the major & minor device numbers, a size, and a name. So I just have to parse that one and return the name I need. Voilà.
$ cat /proc/partitions
major minor  #blocks  name

   8       16 1953514584 sdb
   8       17     248832 sdb1
   8       18          1 sdb2
   8       21 1953263616 sdb5
   8        0 1953514584 sda
   8        1     248832 sda1
   8        2          1 sda2
   8        5 1953263616 sda5
  11        0    1048575 sr0
   8       32  976764928 sdc
   8       33  976763904 sdc1
 252        0       4096 dm-0
 252        1 1936375808 dm-1
 252        2 1936375808 dm-2
 252        3 1936375808 dm-3
 252        4   16744448 dm-4
As you can see in this example, the first two lines are the column names and an empty line. The name column is what I was looking for.
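Parsing that listing is short. Here is a sketch in Python, assuming the four-column layout shown above; parse_partitions is a made-up name, and it takes the text as an argument so it is not tied to reading /proc/partitions directly.

```python
def parse_partitions(text):
    """Map (major, minor) to the device name for each row of /proc/partitions text."""
    names = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and the empty line: data rows have four
        # fields and start with a number.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, _blocks, name = fields
        names[(int(major), int(minor))] = name
    return names
```

Looking up (8, 33) in the result for the listing above gives sdc1, which is the partition name rather than the drive name and is why this approach was abandoned.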

How to search elements of a list in a file

Please check my code for searching for list elements in a file.
f=open("a.txt","r")
p=open("b.txt","r")
disk=[]
for line in p:
    line = line.strip()
    disk.append(line)
for line in f.readlines():
    for word in disk[0]:
        if word in line:
            print line
The list is below:
>>> disk
['5000cca025884d5', '5000cca025a1ee6']
I want to search for these list elements in the file below, but I am not getting the output for index 0.
c0t5000CCA025A1EE6Cd0 <preSUN30G-A2B0-279.40GB> /scsi_vhci/disk#g5000cca025a1ee6c
1. c0t5000CCA025A28FECd0 <preSUN30G-A2B0-279.40GB> i/disk#g5000cca025a28fec
2. c0t5000CCA0258BA1DCd0 <HsdfdsSUN30G-A2B0 cyl 46873 alt 2 hd 20 sec 625> i/disk#g5000cca0258ba1dc
3. c0t5000CCA025884D5Cd0 <UN300G cyl 46873 alt 2 hd 20 sec 625> solaris i/disk#g5000cca025884d5c
4. c0t5000CCA02592705Cd0 <UN300G cyl 46873 alt 2 hd 20 sec 625> solaris i/disk#g5000cca02592705c
The only error that presents itself in your code is this:
for word in disk[0]:
As I mentioned in the comments, what this does is grab the first string in the disk list and start iterating over the individual characters. This will lead to most of the lines in a.txt getting printed multiple times.
Another possible problem would be getting the two files backwards. I did this accidentally when I was trying to duplicate your problem. When the files are backwards, nothing gets printed, because none of the lines in a.txt are in b.txt (in fact, most of them are much longer).
Here is a project on repl.it that shows the program working.
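For reference, a minimal corrected version of the loop might look like the sketch below. It uses Python 3 syntax, unlike the question, and wraps the search in a function called matching_lines, which is a name made up here; the point is simply to iterate over disk itself instead of disk[0].

```python
def matching_lines(lines, disk):
    """Return each line that contains at least one of the strings in disk."""
    result = []
    for line in lines:
        # Iterate over the list of search strings, not over the
        # characters of its first element.
        if any(word in line for word in disk):
            result.append(line)
    return result
```

Using any() also means a line is reported once even if several list elements occur in it.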

When do I put the variable into the parentheses of a .open() and when do I put it in front of the .open()

1 from sys import argv
2 from os.path import exists
3
4 script, from_file, to_file = argv
5
6 print "Copying from %s to %s" % (from_file, to_file)
7
8 # we could do these two on one line too, how?
9 input = open(from_file)
10 indata = input.read()
11
12 print "The input file is %d bytes long" % len(indata)
13
14 print "Does the output file exist? %r" % exists(to_file)
15 print "Ready, hit RETURN to continue, CTRL-C to abort."
16 raw_input()
17
18 output = open(to_file, 'w')
19 output.write(indata)
20
21 print "Alright, all done."
22
23 output.close()
24 input.close()
I'm not sure what the rule is for something like line 19, where there is a variable before the period and also one within the parentheses. I'm a beginner and would like to clarify this because I tried to write some code and was confused about this point...
This means you're calling a method of an object. Let's look at line 18:
output = open(to_file, 'w')
This returns a file object and assigns it to the variable output. You can now call methods of that file object (just as line 10 calls input.read() to read the contents of the file opened for reading). Since output was opened for writing, you can use output.write(...) to write data to the file:
output.write(indata)
The above line means: write the contents of indata to the file object output. You're correct that there are two variables involved in this operation.
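Here is the same distinction in a small, self-contained sketch. It uses Python 3 syntax, unlike the script above, and the function name copy_text is made up for this example.

```python
def copy_text(from_file, to_file):
    """Copy the contents of one text file to another; return the number of characters."""
    # open(...) is a plain function: everything it needs goes inside the parentheses.
    source = open(from_file)
    # read() is a method of the file object, so the object goes in front of the dot.
    indata = source.read()
    source.close()

    output = open(to_file, "w")
    # Two variables are involved: the object the method belongs to (output)
    # and the argument handed to the method (indata).
    output.write(indata)
    output.close()
    return len(indata)
```

A rough rule of thumb: the thing before the dot is the object being acted on, and the things in the parentheses are the extra values the action needs.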