I have a problem how to read the number from line n and column m from a text file.
Below is the file. The number I want to read is marked bold.
G07 1437 1437 1437 1437 PRN / # OF
OBS G08 1437 1437 1437 1437
PRN / # OF OBS G10 1437 1437 1437 1437
PRN / # OF OBS G13 1437 1437 1437 1437
PRN / # OF OBS G15 1437 1437 1437 1437
PRN / # OF OBS G24 809 804 774 774
PRN / # OF OBS G25 816 748 694 694
PRN / # OF OBS G28 1437 1437 1437 1437
PRN / # OF OBS CARRIER PHASE MEASUREMENTS: PHASE SHIFTS REMOVED
COMMENT
END OF HEADER 09 6 10 11 50 14.0000000 0 7G07G08G10G13G15G25G28
21640633.14117 211001.84417 145686.75058 21640629.92858
20270038.40917 47299.20717 29373.29759 20270035.88059
20921122.23717 35179.40617 20913.66459 20921119.72759
23375852.17815 245299.13715 159374.10256 23375851.73756
24332262.34516 -47567.80516 -24598.05157 24332261.03057
23397473.99216 238631.73016 166805.02757 23397473.15457
21826760.73217 -130774.53117 -98585.76258 21826757.69358 09 6 10 11 50 15.0000000 0 8G07G08G10G13G15G24G25G28
21641196.234 6 213960.266 6 147992.01248 21641193.20548
20270164.110 7 47960.691 7 29888.74248 20270161.61348
20921216.127 7 35674.703 7 21299.61749 20921213.51249
23376533.019 6 248872.664 6 162158.66846 23376531.77946
24331762.736 6 -50196.488 6 -26646.36747 24331760.98147
24599401.84316
23398117.377 6 242012.082 6 169439.07047 23398116.40347
21826403.256 7 -132653.473 7 -100049.87148 21826400.70948 09 6 10 11 50 16.0000000 0 8G07G08G10G13G15G24G25G28
21641758.805 7 216918.813 7 150297.37548 21641756.07848
20270290.115 7 48622.465 7 30404.41049 20270287.40449
20921310.905 7 36170.543 7 21685.98849 20921307.82749
23377213.438 6 252446.262 6 164943.27746 23377211.83946
24331261.918 6 -52825.297 6 -28694.78547 24331260.53947
24599639.441 6
23398760.704 6 245392.375 6 172073.07047 23398759.57647
21826045.702 7 -134532.152 7 -101513.77748 21826043.03148 09 6 10 11 50 17.0000000 0 8G07G08G10G13G15G24G25G28
21642322.125 7 219877.535 7 152602.87548 21642319.08048
20270416.177 7 49284.598 7 30920.36349 20270413.70449
20921405.154 7 36666.953 7 22072.80148 20921402.13148
23377893.321 6 256020.004 6 167728.00846 23377891.84946
24330762.414 6 -55454.160 6 -30743.25047 24330760.25447
24599877.399 6
23399403.485 6 248772.668 6 174707.06647 23399402.65647
21825688.338 7 -136410.488 7 -102977.41048 21825685.89248 09 6 10 11 50 18.0000000 0 8G07G08G10G13G15G24G25G28
21642885.530 7 222836.375 7 154908.46548 21642882.14548
20270542.254 7 49947.043 7 31436.54749 20270539.58049
20921499.789 7 37163.945 7 22460.07048 20921496.78948
23378573.557 6 259593.785 6 170512.77746 23378571.94746
24330261.660 6 -58083.102 6 -32791.77747 24330259.94247
24600114.981 6
23400047.052 6 252152.898 6 177341.01247 23400046.00147
21825331.225 7 -138288.496 7 -104440.78948 21825328.12748 09 6 10 11 50 19.0000000 0 8G07G08G10G13G15G24G25G28
21643448.790 7 225795.313 7 157214.13348 21643445.39848
20270668.553 7 50609.801 7 31952.98049 20270666.09149
20921594.496 7 37661.504 7 22847.77348 20921591.58548
23379252.757 5 263167.641 5 173297.60946 23379252.06246
24329761.100 6 -60712.121 6 -34840.36747 24329759.15547
24600353.301 6
23400690.655 6 255533.066 6 179974.90647 23400689.68747
21824974.083 7 -140166.188 7 -105903.92248 21824970.73848
Here is some sample code that shows techniques to skip lines and numbers to reach the item that you want. You might have to switch to formatted reads if the column alignment is more important the the count of numbers.
program test
integer, parameter :: DBL_K = selected_real_kind (14)
character (len=132) :: skip
integer :: i, j
real (DBL_K) :: junk (100), good (100)
open (file="test.txt", unit=25, form="formatted", access="sequential", status="old", action="read")
do i=1, 11
read (25, '(A)' ) skip
end do
read (25, *) (junk (j), j=1,3), good (1)
write (*, *) good (1)
end program test
Related
I have to compare a columns with all other columns in the dataframe. The column that i have to compare with others is located in position 4 so i write df.iloc[x,4] to take column values. Then i have to consider these values, multiply them with the values in the next column (for example df.iloc[x,5]), create a new column in the dataframe and save results. Then i have to repeat this procedure to the end the existing column (the original dataframe has 43 column, so the end it is the df.iloc[x,43] )
How can i do this in python?
If it is possibile can you do some examples? I try to put my code in the post but i 'm not good with my new phone.
I think you can use eq - compare filtered DataFrame with column E in position 4:
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,8,9],
'G':[1,3,5],
'H':[5,3,6],
'I':[7,4,3]})
print (df)
A B C D E F G H I
0 1 4 7 1 5 7 1 5 7
1 2 5 8 3 3 8 3 3 4
2 3 6 9 5 6 9 5 6 3
print (df.iloc[:,5:].eq(df.iloc[:,4], axis=0))
F G H I
0 False False True False
1 False True True False
2 False False True False
If need multiple by column in position 4 use mul:
print (df.iloc[:,5:].mul(df.iloc[:,4], axis=0))
F G H I
0 35 5 25 35
1 24 9 9 12
2 54 30 36 18
Or if need multiple by shifted columns:
print (df.iloc[:,4:].mul(df.iloc[:,5:], axis=0, fill_value=1))
E F G H I
0 5.0 49 1 25 49
1 3.0 64 9 9 16
2 6.0 81 25 36 9
I have the following dataset:
DATA survey;
INPUT id sex $ age inc r1 r2 r3 ;
DATALINES;
1 F 35 17 7 2 2
17 M 50 14 5 5 3
33 F 45 6 7 2 7
49 M 24 14 7 5 7
65 F 52 9 4 7 7
81 M 44 11 7 7 7
2 F 34 17 6 5 3
18 M 40 14 7 5 2
34 F 47 6 6 5 6
50 M 35 17 5 7 5
;
Now I would like to create to files based on whether the records are Female (F)or NOT. Therefore I do this:
date female other;
set survey;
if sex = "F" then output USA;
else output other;
run;
PROC PRINT; RUN;
This however does not give me two sets with data depending on the F and M value. Any idea on what I am doing wrong here?
When you look in the log window, do you see any error messages?
If your code is
if sex = "F" then output USA;
you should see an error, because the DATA statement does not include a dataset named USA. If you change USA to FEMALE it should work.
Learning to read log messages is an essential skill in SAS.
assume I have a dataframe looks like below.
df = pd.DataFrame({
'name' : ['1st', '2nd', '3rd'],
'john_01' : [1, 2, 3],
'mary_02' : [4,5,6],
'peter_03' : [7, 8, 9],
'roger_04' : [10,11, 12],
'ken_05' : [13, 14, 15],
})
df2 = df.set_index('name')
john_01 ken_05 mary_02 peter_03 roger_04
name
1st 1 13 4 7 10
2nd 2 14 5 8 11
3rd 3 15 6 9 12
Modify_List_col = ['mary_02','peter_03']
Modify_List_row = ['2nd'] # use tolist() to get this list from additional files
I only want to modify those cells in List_col and List_row. So I will get something like below, those cells are replaced by 'X'.
john_01 ken_05 mary_02 peter_03 roger_04
name
1st 1 13 4 7 10
2nd 2 14 X X 11
3rd 3 15 6 9 12
Does anyone know how to get the results in one line using pandas please?
You can use the loc method:
In[25]: df = pd.DataFrame(pd.np.arange(25).reshape(5,5)).set_index(0)
In[26]: df
Out[26]:
1 2 3 4
0
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
In[27]: df.loc[[10,15],[2,3,4]] = "x"
In[28]: df
Out[28]:
1 2 3 4
0
0 1 2 3 4
5 6 7 8 9
10 11 x x x
15 16 x x x
20 21 22 23 24
To do that, just set the column 0 as index, then select the portion of the dataframe with loc and assign the value "x".
It works in the same way for your last dataset:
In[51]: Modify_List_col = ['mary_02', 'peter_03']
Modify_List_row = ['2nd']
df.loc[Modify_List_row, Modify_List_col] = "X"
In[52]: df
Out[52]:
john_01 ken_05 mary_02 peter_03 roger_04
name
1st 1 13 4 7 10
2nd 2 14 X X 11
3rd 3 15 6 9 12
I hope this can help you.
I have a file that look at ratings that teacher X gives to teacher Y and the date it occurs
clear
rating_id RatingTeacher RatedTeacher Rating Date
1 15 12 1 "1/1/2010"
2 12 11 2 "1/2/2010"
3 14 11 3 "1/2/2010"
4 14 13 2 "1/5/2010"
5 19 11 4 "1/6/2010"
5 11 13 1 "1/7/2010"
end
I want to look in the history to see how many times the RatingTeacher had been rated at the time they make the rating and the cumulative score. The result would look like this.
rating_id RatingTeacher RatedTeacher Rating Date TimesRated CumulativeRating
1 15 12 1 "1/1/2010" 0 0
2 12 11 2 "1/2/2010" 1 1
3 14 11 3 "1/2/2010" 0 0
4 14 13 2 "1/5/2010" 0 0
5 19 11 4 "1/6/2010" 0 0
5 11 13 1 "1/7/2010" 3 9
end
I have been merging the dataset with itself to get this to work, and it is fine. I was wondering if there was a more efficient way to do this within the file
In your input data, I guess that the last rating_id should be 6 and that dates are MDY. Statalist members are asked to use dataex (SSC) to set up data examples. This isn't Statalist but there is no reason for lower standards to apply. See the Statalist FAQ
I rarely see even programmers be precise about what they mean by "efficient", whether it means fewer lines of code, less use of memory, more speed, something else or is just some all-purpose term of praise. This code loops over observations, which can certainly be slow for large datasets. More in this paper
We can't compare with your merge solution because you don't give the code.
clear
input rating_id RatingTeacher RatedTeacher Rating str8 SDate
1 15 12 1 "1/1/2010"
2 12 11 2 "1/2/2010"
3 14 11 3 "1/2/2010"
4 14 13 2 "1/5/2010"
5 19 11 4 "1/6/2010"
6 11 13 1 "1/7/2010"
end
gen Date = daily(SDate, "MDY")
sort Date
gen Wanted = .
quietly forval i = 1/`=_N' {
count if Date < Date[`i'] & RatedT == RatingT[`i']
replace Wanted = r(N) in `i'
}
list, sep(0)
+---------------------------------------------------------------------+
| rating~d Rating~r RatedT~r Rating SDate Date Wanted |
|---------------------------------------------------------------------|
1. | 1 15 12 1 1/1/2010 18263 0 |
2. | 2 12 11 2 1/2/2010 18264 1 |
3. | 3 14 11 3 1/2/2010 18264 0 |
4. | 4 14 13 2 1/5/2010 18267 0 |
5. | 5 19 11 4 1/6/2010 18268 0 |
6. | 6 11 13 1 1/7/2010 18269 3 |
+---------------------------------------------------------------------+
The building block is that the rater and ratee are a pair. You can use egen's group() to give a unique ID to each rater ratee pair.
egen pair = group(rater ratee)
bysort pair (date): timesRated = _n
I have a csv file that shows parts on order. The columns include days late, qty and commodity.
I need to group the data by days late and commodity with a sum of the qty. However the days late needs to be grouped into ranges.
>56
>35 and <= 56
>14 and <= 35
>0 and <=14
I was hoping I could use a dict some how. Something like this
{'Red':'>56,'Amber':'>35 and <= 56','Yellow':'>14 and <= 35','White':'>0 and <=14'}
I am looking for a result like this
Red Amber Yellow White
STRSUB 56 60 74 40
BOTDWG 20 67 87 34
I am new to pandas so I don't know if this is possible at all. Could anyone provide some advice.
Thanks
Suppose you start with this data:
df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
# Days Late ID quantity
# 0 60 STRSUB 56
# 1 60 BOTDWG 20
# 2 50 STRSUB 60
# 3 50 BOTDWG 67
# 4 20 STRSUB 74
# 5 20 BOTDWG 87
# 6 10 STRSUB 40
# 7 10 BOTDWG 34
Then you can find the status category using pd.cut. Note that by default, pd.cut splits the Series df['Days Late'] into categories which are half-open intervals, (-1, 14], (14, 35], (35, 56], (56, 365]:
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
print(df)
# ID quantity status
# 0 STRSUB 56 Red
# 1 BOTDWG 20 Red
# 2 STRSUB 60 Amber
# 3 BOTDWG 67 Amber
# 4 STRSUB 74 Yellow
# 5 BOTDWG 87 Yellow
# 6 STRSUB 40 White
# 7 BOTDWG 34 White
Now use pivot to get the DataFrame in the desired form:
df = df.pivot(index='ID', columns='status', values='quantity')
and use reindex to obtain the desired order for the rows and columns:
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
Thus,
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
df = df.pivot(index='ID', columns='status', values='quantity')
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
print(df)
yields
Red Amber Yellow White
ID
STRSUB 56 60 74 40
BOTDWG 20 67 87 34
You can create a column in your DataFrame based on your Days Late column by using the map or apply functions as follows. Let's first create some sample data.
df = pandas.DataFrame({ 'ID': 'foo,bar,foo,bar,foo,bar,foo,foo'.split(','),
'Days Late': numpy.random.randn(8)*20+30})
Days Late ID
0 30.746244 foo
1 16.234267 bar
2 14.771567 foo
3 33.211626 bar
4 3.497118 foo
5 52.482879 bar
6 11.695231 foo
7 47.350269 foo
Create a helper function to transform the data of the Days Late column and add a column called Code.
def days_late_xform(dl):
if dl > 56: return 'Red'
elif 35 < dl <= 56: return 'Amber'
elif 14 < dl <= 35: return 'Yellow'
elif 0 < dl <= 14: return 'White'
else: return 'None'
df["Code"] = df['Days Late'].map(days_late_xform)
Days Late ID Code
0 30.746244 foo Yellow
1 16.234267 bar Yellow
2 14.771567 foo Yellow
3 33.211626 bar Yellow
4 3.497118 foo White
5 52.482879 bar Amber
6 11.695231 foo White
7 47.350269 foo Amber
Lastly, you can use groupby to aggregate by the ID and Code columns, and get the counts of the groups as follows:
g = df.groupby(["ID","Code"]).size()
print g
ID Code
bar Amber 1
Yellow 2
foo Amber 1
White 2
Yellow 2
df2 = g.unstack()
print df2
Code Amber White Yellow
ID
bar 1 NaN 2
foo 1 2 2
I know this is coming a bit late, but I had the same problem as you and wanted to share the function np.digitize. It sounds like exactly what you want.
a = np.random.randint(0, 100, 50)
grps = np.arange(0, 100, 10)
grps2 = [1, 20, 25, 40]
print a
[35 76 83 62 57 50 24 0 14 40 21 3 45 30 79 32 29 80 90 38 2 77 50 73 51
71 29 53 76 16 93 46 14 32 44 77 24 95 48 23 26 49 32 15 2 33 17 88 26 17]
print np.digitize(a, grps)
[ 4 8 9 7 6 6 3 1 2 5 3 1 5 4 8 4 3 9 10 4 1 8 6 8 6
8 3 6 8 2 10 5 2 4 5 8 3 10 5 3 3 5 4 2 1 4 2 9 3 2]
print np.digitize(a, grps2)
[3 4 4 4 4 4 2 0 1 4 2 1 4 3 4 3 3 4 4 3 1 4 4 4 4 4 3 4 4 1 4 4 1 3 4 4 2
4 4 2 3 4 3 1 1 3 1 4 3 1]