Replacing part of a string using sed

I have this big file (1,000,000+ lines) that includes some memory data. For a certain use I need to convert g to mb, for example:
DateAndTime#15/03/15 07:57:07
**********************
top - 07:57:27 up 2 days, 15:28, 18 users, load average: 4.65, 3.15, 2.11
Tasks: 774 total, 2 running, 771 sleeping, 0 stopped, 1 zombie
%Cpu(s): 12.8 us, 2.5 sy, 0.0 ni, 83.5 id, 1.2 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 16327128 total, 16119192 used, 207936 free, 177868 buffers
KiB Swap: 36060156 total, 78552 used, 35981604 free. 6570548 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26636 fpd 20 0 9987.4m 6.307g 47728 S 0.0 40.5 192:07.10 AppExe
29019 fpd 20 0 1752832 785848 45652 S 77.0 4.8 17:32.74 python
to:
26636 fpd 20 0 9987.4m 6307m 47728 S 0.0 40.5 192:07.10 AppExe
The problem is that the file has an awkward structure to work with: the x.xxxG value needs to be found first and only then replaced, which takes a very long time (via readline), and the rest of the file should stay the same.

The following works on Linux and OSX/BSD systems:
sed -E 's/(^| )([0-9])\.([0-9]{3})g( |$)/\1\2\3m\4/g' infile > outfile
It makes certain assumptions:
any field of the form d.dddg (where d is a decimal digit) should be replaced (possibly even multiple occurrences on a single line - remove the g after the last / to replace at most one per line)
fields are space-delimited
If, by contrast, actual calculations need to be performed, awk is your friend.
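For example, a rough awk sketch that does the same conversion arithmetically (multiplying by 1000 to match the 6.307g -> 6307m example above; use 1024 instead if the g values are GiB). Note that reassigning a field makes awk rebuild the line with single spaces, so the original column alignment is not preserved:
awk '{ for (i = 1; i <= NF; i++)
         if ($i ~ /^[0-9]+\.[0-9]+g$/) {
           sub(/g$/, "", $i)
           $i = sprintf("%.0fm", $i * 1000)
         }
       print }' infile > outfile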

Counting gradient using a 2-column array from an external .dat file

I have a .dat file with 2 columns and between 14,000 and 36,000 rows, saved like below:
0.00 0.00
2.00 1.00
2.03 1.01
2.05 1.07
.
.
.
79.03 23.01
The 1st column is extension, the 2nd is strain. When I want to compute the gradient to determine Hooke's law from the plot, I use the code below.
CCCCCC
      Program gradient
      REAL S(40000),E(40000),GRAD(40000,1)
      open(unit=300, file='Probka1A.dat', status='OLD')
      open(unit=321, file='result.out', status='unknown')
      write(321,400)
 400  format('alfa')
 260  DO 200 i=1, 40000
      read(300,30) S(i),E(i)
 30   format(2F7.2)
      GRAD(i,1)=(S(i)-S(i-1))/(E(i)-E(i-1))
      write(321,777) GRAD(i,1)
 777  Format(F7.2)
 200  Continue
      END
But after I executed it I got the following runtime error:
PGFIO-F-231/formatted read/unit=300/error on data conversion.
File name = Probka1A.dat formatted, sequential access record = 1
In source file gradient1.f, at line number 9
What can I do to compute the gradient this way, or some other way, in Fortran 77?
You are reading from the file without checking for the end of the file. Also, the 2F7.2 format expects the numbers to sit in fixed 7-column fields, which they don't in your file - that is what triggers the data-conversion error; list-directed input (the * in the READ below) avoids it. Your code should look like this:
 260  DO 200 i=1, 40000
C        list-directed read; jump out of the loop at end of file (or on a bad record)
         read(300,*,ERR=999,END=999) S(i),E(i)
         if (i .gt. 1) then
            GRAD(i-1,1)=(S(i)-S(i-1))/(E(i)-E(i-1))
            write(321,777) GRAD(i-1,1)
         end if
 777     Format(F7.2)
 200  Continue
C     (label 999 is used here instead of 400, which is already taken by the FORMAT above)
 999  continue

Half-Life Determination

Here is the problem I am working on:
You are to develop a menu-driven program that will allow the analyses of data from the file Patient_Data.txt using the following equations:
Half-Life Equations
Ct = C0 * e^(-kt)
t½ = ln(2)/k
where:
Ct is the concentration in ug/L at time t
C0 is the initial concentration in ug/L
t is the time in hrs
k is the time constant (1/hrs)
t½ is the half-life in hrs
The user of the program must be able to obtain the average half-life (to 2 decimal places) along with the number of measurements used to calculate the average for any of the 5 patients for which data has been collected.
The program must also be able to display the 2 patient numbers and averages of the patients that have the highest half-life average values.
A menu must be used to select the different options with an additional option for Exit. The program must run until exit is selected by the user.
The program must be designed using functions.
A function called analyzeData must take as input the patient number and must return both the average half-life and the number of measurements in the average for the input patient number.
A separate function called halfLife is to be used for calculating the t½ (half-life) based on C0 (initial concentration), Ct (concentration at time t) and t (time) that are in the data file.
A third function called highest2halfLifes must also be used to determine the two patients with the longest average half-life from the five different patients. All four values (patient1, halfLife1, patient2, halfLife2) must be returned to the main function.
The following data file Patient_Data.txt lists, on each line, the patient number followed by the values for C0, Ct, and t, respectively:
1 325 160 2.0
1 600 100 6.2
2 325 220 1.0
3 600 200 4.4
4 325 100 3.0
4 325 88 3.2
2 600 200 3.3
2 325 100 3.3
4 600 210 3.4
5 325 105 3.5
1 600 110 6.0
3 325 100 3.1
2 600 120 5.5
2 600 125 5.5
5 120 60 2.2
2 325 100 3.4
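No answer is included in this excerpt, but the core calculation follows directly from the two equations above: k = ln(C0/Ct)/t, so t½ = t·ln(2)/ln(C0/Ct). A minimal sketch of the halfLife function (shown in Python purely as an illustration, since the assignment doesn't name the language in this excerpt):
import math

def halfLife(c0, ct, t):
    # k follows from Ct = C0 * e^(-k*t); the half-life is then ln(2)/k
    k = math.log(c0 / ct) / t
    return math.log(2.0) / k

# e.g. the first data row (patient 1: C0=325, Ct=160, t=2.0) gives about 1.96 hrs
print(round(halfLife(325, 160, 2.0), 2))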

rsync running differently from QProcess compared to bash command line

I am experimenting with launching rsync from QProcess and although it runs, it behaves differently when run from QProcess compared to running the exact same command from the command line.
Here is the command and stdout when run from QProcess
/usr/bin/rsync -atv --stats --progress --port=873 --compress-level=9 --recursive --delete --exclude="/etc/*.conf" --exclude="A*" rsync://myhost.com/haast/tmp/mysync/* /tmp/mysync/
receiving incremental file list
created directory /tmp/mysync
A
0 100% 0.00kB/s 0:00:00 (xfer#1, to-check=6/7)
B
0 100% 0.00kB/s 0:00:00 (xfer#2, to-check=5/7)
test.conf
0 100% 0.00kB/s 0:00:00 (xfer#3, to-check=4/7)
subdir/
subdir/A2
0 100% 0.00kB/s 0:00:00 (xfer#4, to-check=2/7)
subdir/C
0 100% 0.00kB/s 0:00:00 (xfer#5, to-check=1/7)
subdir/D
0 100% 0.00kB/s 0:00:00 (xfer#6, to-check=0/7)
Number of files: 7
Number of files transferred: 6
Total file size: 0 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 105
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 174
Total bytes received: 367
sent 174 bytes received 367 bytes 360.67 bytes/sec
total size is 0 speedup is 0.00
Notice that although I excluded 'A*', it still copied them! Now running the exact same command from the command line:
/usr/bin/rsync -atv --stats --progress --port=873 --compress-level=9 --recursive --delete --exclude="/etc/*.conf" --exclude="A*" rsync://myhost.com/haast/tmp/mysync/* /tmp/mysync/
receiving incremental file list
created directory /tmp/mysync
B
0 100% 0.00kB/s 0:00:00 (xfer#1, to-check=4/5)
test.conf
0 100% 0.00kB/s 0:00:00 (xfer#2, to-check=3/5)
subdir/
subdir/C
0 100% 0.00kB/s 0:00:00 (xfer#3, to-check=1/5)
subdir/D
0 100% 0.00kB/s 0:00:00 (xfer#4, to-check=0/5)
Number of files: 5
Number of files transferred: 4
Total file size: 0 bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 83
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 132
Total bytes received: 273
sent 132 bytes received 273 bytes 270.00 bytes/sec
total size is 0 speedup is 0.00
Notice that now the 'A*' exclude is respected! Can someone explain why they are performing differently?
I noticed that if I removed the quotes surrounding the excludes, the QProcess run performs correctly.
In your command-line execution, the bash interpreter performs its substitutions first and removes the quotes, so they are not passed on to rsync's argument list.
The following script shows how the bash substitution works:
[tmp]$ cat printargs.sh
#!/bin/bash
echo $*
[tmp]$ ./printargs.sh --exclude="A*"
--exclude=A*
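The underlying difference is that QProcess does not run the command through a shell, so there is no quote-removal (or wildcard-expansion) pass before rsync sees its arguments. As a hedged sketch, one way to launch it from Qt so that rsync receives the same argument list as in the bash case is to pass each argument as a separate QStringList element with no embedded quotes (the program path, options and paths below are the ones from the question):
#include <QProcess>
#include <QStringList>

void runRsync()
{
    // Each QStringList element is handed to rsync verbatim -- there is no shell
    // in between, so no quotes are needed (or wanted) around the exclude patterns.
    QStringList args;
    args << "-atv" << "--stats" << "--progress" << "--port=873"
         << "--compress-level=9" << "--recursive" << "--delete"
         << "--exclude=/etc/*.conf" << "--exclude=A*"
         << "rsync://myhost.com/haast/tmp/mysync/*" << "/tmp/mysync/";

    QProcess rsync;
    rsync.start("/usr/bin/rsync", args);
    rsync.waitForFinished(-1);   // block until the transfer finishes
}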

Writing both characters and digits in an array

I have Fortran code that reads a txt file containing separate lines of characters and digits and then writes them into a 1D array with 20 elements.
This code is not compatible with the Fortran 77 compiler Force 2.0.9. My question is how to apply the aforementioned procedure with a Fortran 77 compiler, i.e. define a 1D array and then write the txt file, line by line, into the elements of the array?
Thank you in advance.
The txt file follows:
Case 1:
10 0 1 2 0
1.104 1.008 0.6 5.0
25 125.0 175.0 0.7 1000.0
0.60
1 5
Advanced Case
15 53 0 10 0 1 0 0 1 0 0 0 0
0 0 0 0
0 0 1500.0 0 0 .03
0 0.001 0
0.1 0 0.125 0.08 0.46
0.1 5.0 0.04
@Jason:
I am a beginner and still learning Fortran. I guess Force 2 uses g77.
The following is the corresponding part of the original code. The Force 2 editor returns an empty txt file as a result.
      DIMENSION CARD(20)
      CHARACTER*64 FILENAME
      DATA XHEND / 4HEND /
      OPEN(UNIT=3,FILE='CON')
      OPEN(UNIT=4,FILE='CON')
      OPEN(UNIT=7,STATUS='SCRATCH')
      WRITE(3,9000) 'PLEASE ENTER THE INPUT FILE NAME : '
 9000 FORMAT (A)
      READ(4,9000) FILENAME
      OPEN(UNIT=5,FILE=FILENAME,STATUS='OLD')
      WRITE(3,9000) 'PLEASE ENTER THE OUTPUT FILE NAME : '
      READ(4,9000) FILENAME
      OPEN(UNIT=6,FILE=FILENAME,STATUS='NEW')
      FILENAME = '...'
      IR = 7
      IW = 6
      IP = 15
    5 REWIND IR
      I = 0
    2 READ (5,7204,END=10000) CARD
      IF (I .EQ. 0 ) WRITE (IW,7000)
 7000 FORMAT (1H1 / 10X,15HINPUT DECK ECHO / 10X,15(1H-))
      I= I + 1
      WRITE (IW,9204) I,CARD
      IF (CARD(1) .EQ. XHEND ) GO TO 7020
      WRITE (IR,7204) CARD
 7204 FORMAT (20A4)
 9204 FORMAT (1X,I4,2X,20A4)
      GO TO 2
 7020 REWIND IR
It looks like CARD is being used to hold 20 4-character strings. I don't see a declaration as a character variable, only as an array, so perhaps, in extremely old FORTRAN style, a non-character variable is being used to hold characters. You are using a 20A4 format, so the values have to be positioned in the file precisely as 20 groups of 4 characters; you have to add blanks so that they are aligned into groups of 4 columns.
If you want to read numbers it would be much easier to read them into a numeric type and use list-directed IO:
      real values(20)
      read (5, *) values
Then you wouldn't have to worry about the precise positioning of the values in the file.
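For instance, a minimal sketch of that approach (the unit number and file name are only illustrative, and the line is assumed to contain nothing but numbers):
      REAL VALUES(5)
      OPEN(UNIT=10, FILE='case1.txt', STATUS='OLD')
C     list-directed input: a line such as "10 0 1 2 0" from the example file
C     fills VALUES(1)..VALUES(5) without aligning anything in fixed columns
      READ(10,*) VALUES
      CLOSE(10)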
This is really archaic FORTRAN ... even pre-FORTRAN-77 in style. I can't remember the last time that I saw Hollerith (H) formats! Where are you learning this from?
Edit: While I like Fortran for many programming tasks, I wouldn't use FORTRAN 66! Computers are supposed to make things easier ... there is no reason to have to count characters. Instead of
7000 FORMAT (1H1 / 10X,15HINPUT DECK ECHO / 10X,15(1H-))
You can use
7000 FORMAT ( / 10X, "INPUT DECK ECHO" / 10X, 15("-") )
I can think of only two reasons to use a Hollerith code: not bothering to change legacy source code (it is remarkable that a current Fortran compiler can process a feature that was obsolete 30 years ago! Fortran source code never dies!), or studying the history of computing languages. The name honors a great computing pioneer, whose invention accomplished the 1890 US Census in one year, when the 1880 Census took eight years: http://en.wikipedia.org/wiki/Herman_Hollerith
I much doubt that you will see the "1" in the first column performing "carriage control" today. I had to look up that "1" was the code for page eject. You are much more likely to see it in your output. See Are Fortran control characters (carriage control) still implemented in compilers?

Speed up database inserts from ORM

I have a Django view which creates 500-5000 new database INSERTS in a loop. Problem is, it is really slow! I'm getting about 100 inserts per minute on Postgres 8.3. We used to use MySQL on lesser hardware (smaller EC2 instance) and never had these types of speed issues.
Details:
Postgres 8.3 on Ubuntu Server 9.04.
Server is a "large" Amazon EC2 with database on EBS (ext3) - 11GB/20GB.
Here is some of my postgresql.conf -- let me know if you need more
shared_buffers = 4000MB
effective_cache_size = 7128MB
My python:
for k in kw:
    k = k.lower()
    p = ProfileKeyword(profile=self)
    logging.debug(k)
    p.keyword, created = Keyword.objects.get_or_create(keyword=k, defaults={'keyword':k,})
    if not created and ProfileKeyword.objects.filter(profile=self, keyword=p.keyword).count():
        # checking created is just a small optimization to save some database hits on new keywords
        pass  # duplicate entry
    else:
        p.save()
Some output from top:
top - 16:56:22 up 21 days, 20:55, 4 users, load average: 0.99, 1.01, 0.94
Tasks: 68 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.8%us, 0.2%sy, 0.0%ni, 90.5%id, 0.7%wa, 0.0%hi, 0.0%si, 2.8%st
Mem: 15736360k total, 12527788k used, 3208572k free, 332188k buffers
Swap: 0k total, 0k used, 0k free, 11322048k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14767 postgres 25 0 4164m 117m 114m S 22 0.8 2:52.00 postgres
1 root 20 0 4024 700 592 S 0 0.0 0:01.09 init
2 root RT 0 0 0 0 S 0 0.0 0:11.76 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 root 10 -5 0 0 0 S 0 0.0 0:00.08 events/0
6 root 11 -5 0 0 0 S 0 0.0 0:00.00 khelper
7 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
9 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenwatch
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 xenbus
18 root RT -5 0 0 0 S 0 0.0 0:11.84 migration/1
19 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/1
Let me know if any other details would be helpful.
One common reason for slow bulk operations like this is each insert happening in its own transaction. If you can get all of them to happen in a single transaction, it could go much faster.
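For instance, a hedged sketch using Django's transaction API (transaction.atomic here; older Django releases spell the same idea commit_on_success):
from django.db import transaction

with transaction.atomic():   # one transaction wrapped around all of the inserts
    for k in kw:
        k = k.lower()
        ...                  # the existing get_or_create / filter / save logic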
Firstly, ORM operations are always going to be slower than pure SQL. I once wrote an update to a large database in ORM code and set it running, but quit it after several hours when it had completed only a tiny fraction. After rewriting it in SQL the whole thing ran in less than a minute.
Secondly, bear in mind that your code here is doing up to four separate database operations for every row in your data set - the get in get_or_create, possibly also the create, the count on the filter, and finally the save. That's a lot of database access.
Bearing in mind that a maximum of 5000 objects is not huge, you should be able to read the whole dataset into memory at the start. Then you can do a single filter to get all the existing Keyword objects in one go, saving a huge number of queries in the Keyword get_or_create and also avoiding the need to instantiate duplicate ProfileKeywords in the first place.
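A rough sketch of that approach (the model and field names are taken from the question's code; bulk_create is assumed to be available, which it isn't on very old Django versions - in that case, plain save() calls inside a single transaction still cut the query count dramatically):
keywords = {k.lower() for k in kw}

# one query for the Keyword rows that already exist
existing = {obj.keyword: obj for obj in Keyword.objects.filter(keyword__in=keywords)}

# create the missing keywords, then re-fetch so every keyword maps to a saved object
Keyword.objects.bulk_create([Keyword(keyword=k) for k in keywords if k not in existing])
existing = {obj.keyword: obj for obj in Keyword.objects.filter(keyword__in=keywords)}

# one query for the keywords this profile is already linked to
linked = set(ProfileKeyword.objects.filter(profile=self)
                                   .values_list('keyword__keyword', flat=True))

# finally, insert only the ProfileKeyword rows that are genuinely new
ProfileKeyword.objects.bulk_create([ProfileKeyword(profile=self, keyword=existing[k])
                                    for k in keywords if k not in linked])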