I have been trying to extract the atom numbers corresponding to the atom name "OW" from a file, using an if condition in Fortran. But when I use the if condition, no values are written to the output file. Could anybody help me figure out where I am going wrong?
implicit none
character(len=100) :: head, grofile
character(len=5)   :: res_nm, at_name
integer :: n, i, ierror, at_num

write(*,*) 'enter the name of gro file'
read(*,*) grofile
open(unit=10, file=grofile, status='old', action='read')
openif : if (ierror == 0) then
   ! open was ok. Read values.
   read(10,*) head
   read(10,*) n
   do i = 1, n
      read(10,200) at_name, at_num
      if (at_name == 'OW') then
         write(44,*) at_num
200      format (10x,a5,i5)
      endif
   enddo
endif openif
end program name
and the input file that I am using is
CNT in water
44316
1LIG C 1 2.814 2.448 2.231 -0.2002 0.0645 -0.2005
1LIG C 2 2.783 2.584 2.233 0.4146 0.2083 -0.1403
1LIG C 3 2.769 2.658 2.350 -0.4678 -0.0886 -0.0500
1LIG C 4 2.687 2.772 2.348 -0.7671 -0.3032 -0.0624
1LIG C 5 2.619 2.795 2.228 -0.2327 -0.2483 -0.3593
1LIG C 6 2.486 2.837 2.238 -0.0621 0.2349 -0.0781
................
1LIG H 1006 2.613 1.972 12.082 -1.2767 0.0570 0.2045
1LIG H 1007 2.804 2.173 12.099 -0.4228 1.8734 1.9762
1LIG H 1008 2.862 2.377 12.097 -0.7176 -2.2587 1.0804
2water OW 1009 2.221 1.281 6.853 -0.6831 -0.3395 0.1402
2water HW1 1010 2.191 1.215 6.789 -1.2195 0.6304 -0.6225
2water HW2 1011 2.143 1.333 6.871 -0.5687 -0.7024 1.7263
2water MW 1012 2.206 1.279 6.847 -0.7389 -0.2594 0.2489
3water OW 1013 2.826 4.482 12.736 -0.2852 0.1750 0.1277
3water HW1 1014 2.735 4.490 12.707 -0.3265 -0.3844 0.1046
3water HW2 1015 2.860 4.406 12.689 0.4937 0.9762 -0.6120
3water MW 1016 2.818 4.473 12.726 -0.1879 0.2065 0.0267
4water OW 1017 3.510 2.042 10.165 0.1154 -0.0258 -0.0813
4water HW1 1018 3.530 2.105 10.095 3.0124 -0.2562 0.4945
4water HW2 1019 3.434 1.993 10.132 -0.4188 1.8748 -1.7521
...............
4.90369 4.90369 14.25892
Also, I am not getting any error from the code, yet there is no output.
The commands that I am using are:
gfortran br_br_gofr_smooth_dlp4.f90 -o read -I /usr/local/include/ -lgmxfort -g -fcheck=all -fbounds-check
./read
You have two errors:
You are using ierror uninitialised, as noted in the comments
at_name is a length-5 character variable. You are comparing it with a 2-character literal. In such a comparison the shorter string is padded with trailing blanks, so the test is true only if the leftmost 2 characters of at_name are 'OW' and the rest are blank. Unfortunately, as your code is written, the fixed-format read places the atom name in the rightmost characters of at_name, leaving leading blanks, so the test fails.
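To see why point 2 bites, here is the blank-padding behaviour mimicked in Python (illustration only; the sample line follows the usual gro fixed-column layout, and the slice [10:15] corresponds to the (10x,a5) edit):

```python
# A gro coordinate line: 5-char residue number, 5-char residue name,
# then the atom name right-justified in a 5-character field.
line = "    2water   OW 1009   2.221   1.281   6.853"

at_name = line[10:15]            # what the (10x,a5) edit reads
print(repr(at_name))             # '   OW' -- leading blanks
print(at_name == "OW   ")        # False: Fortran pads 'OW' on the right
print(at_name.lstrip() == "OW")  # True: the effect of ADJUSTL
```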
The code below shows a way of fixing the above, and does what I think you want. Especially for point 2 there are other ways.
ijb@ijb-Latitude-5410:~/work/stack$ cat pdb.f90
implicit none
character(len=100) :: head, grofile
character(len=5)   :: res_nm, at_name
integer :: n, i, ierror, at_num

write(*,*) 'enter the name of gro file'
read(*,*) grofile
open(unit=10, file=grofile, status='old', action='read', iostat=ierror)
openif : if (ierror == 0) then
   ! open was ok. Read values.
   read(10,*) head
   read(10,*) n
   do i = 1, n
      read(10,200) at_name, at_num
      if (Adjustl(at_name) == 'OW') then
         write(44,*) at_num
200      format (10x,a5,i5)
      endif
   enddo
endif openif
end program
ijb@ijb-Latitude-5410:~/work/stack$ gfortran -std=f2008 -Wall -Wextra -fcheck=all -g -O pdb.f90
pdb.f90:3:27:
3 | character(len=5):: res_nm,at_name
| 1
Warning: Unused variable ‘res_nm’ declared at (1) [-Wunused-variable]
ijb@ijb-Latitude-5410:~/work/stack$ cat stuff
CNT in water
20
1LIG C 1 2.814 2.448 2.231 -0.2002 0.0645 -0.2005
1LIG C 2 2.783 2.584 2.233 0.4146 0.2083 -0.1403
1LIG C 3 2.769 2.658 2.350 -0.4678 -0.0886 -0.0500
1LIG C 4 2.687 2.772 2.348 -0.7671 -0.3032 -0.0624
1LIG C 5 2.619 2.795 2.228 -0.2327 -0.2483 -0.3593
1LIG C 6 2.486 2.837 2.238 -0.0621 0.2349 -0.0781
1LIG H 1006 2.613 1.972 12.082 -1.2767 0.0570 0.2045
1LIG H 1007 2.804 2.173 12.099 -0.4228 1.8734 1.9762
1LIG H 1008 2.862 2.377 12.097 -0.7176 -2.2587 1.0804
2water OW 1009 2.221 1.281 6.853 -0.6831 -0.3395 0.1402
2water HW1 1010 2.191 1.215 6.789 -1.2195 0.6304 -0.6225
2water HW2 1011 2.143 1.333 6.871 -0.5687 -0.7024 1.7263
2water MW 1012 2.206 1.279 6.847 -0.7389 -0.2594 0.2489
3water OW 1013 2.826 4.482 12.736 -0.2852 0.1750 0.1277
3water HW1 1014 2.735 4.490 12.707 -0.3265 -0.3844 0.1046
3water HW2 1015 2.860 4.406 12.689 0.4937 0.9762 -0.6120
3water MW 1016 2.818 4.473 12.726 -0.1879 0.2065 0.0267
4water OW 1017 3.510 2.042 10.165 0.1154 -0.0258 -0.0813
4water HW1 1018 3.530 2.105 10.095 3.0124 -0.2562 0.4945
4water HW2 1019 3.434 1.993 10.132 -0.4188 1.8748 -1.7521
4.90369 4.90369 14.25892
ijb@ijb-Latitude-5410:~/work/stack$ ls -lrt | tail
-rw-rw-r-- 1 ijb ijb 1106 Jun 11 05:43 pi_orig.f90
-rw-rw-r-- 1 ijb ijb 958 Jun 11 05:56 pi_ijb.f90~
-rw-rw-r-- 1 ijb ijb 1805 Jun 11 06:07 pi_ijb.f90
-rw-rw-r-- 1 ijb ijb 1106 Jun 11 06:29 pi2.f90~
-rw-rw-r-- 1 ijb ijb 1305 Jun 11 06:34 pi2.f90
-rw-rw-r-- 1 ijb ijb 537 Jun 14 08:39 pdb.f90~
-rw-rw-r-- 1 ijb ijb 1462 Jun 14 08:40 stuff~
-rw-rw-r-- 1 ijb ijb 1425 Jun 14 08:41 stuff
-rw-rw-r-- 1 ijb ijb 560 Jun 14 08:43 pdb.f90
-rwxrwxr-x 1 ijb ijb 20520 Jun 14 08:49 a.out
ijb@ijb-Latitude-5410:~/work/stack$ ./a.out
enter the name of gro file
stuff
ijb@ijb-Latitude-5410:~/work/stack$ ls -lrt | tail
-rw-rw-r-- 1 ijb ijb 958 Jun 11 05:56 pi_ijb.f90~
-rw-rw-r-- 1 ijb ijb 1805 Jun 11 06:07 pi_ijb.f90
-rw-rw-r-- 1 ijb ijb 1106 Jun 11 06:29 pi2.f90~
-rw-rw-r-- 1 ijb ijb 1305 Jun 11 06:34 pi2.f90
-rw-rw-r-- 1 ijb ijb 537 Jun 14 08:39 pdb.f90~
-rw-rw-r-- 1 ijb ijb 1462 Jun 14 08:40 stuff~
-rw-rw-r-- 1 ijb ijb 1425 Jun 14 08:41 stuff
-rw-rw-r-- 1 ijb ijb 560 Jun 14 08:43 pdb.f90
-rwxrwxr-x 1 ijb ijb 20520 Jun 14 08:49 a.out
-rw-rw-r-- 1 ijb ijb 39 Jun 14 08:49 fort.44
ijb@ijb-Latitude-5410:~/work/stack$ cat fort.44
1009
1013
1017
ijb@ijb-Latitude-5410:~/work/stack$
I am using SAS E.G. 7.1
I have the following code:
data time_dim_monthly;
do i = 0 to 200;
index_no = i;
year_date = year(intnx('month','01JAN2008'd,i));
month_date = month(intnx('month','01JAN2008'd,i));
SOM = put(intnx('month', '01JAN2008'd, i, 'b'),date11.) ;
EOM = put(intnx('month', '01JAN2008'd, i, 'e'),date11.) ;
days_in_month = INTCK('day',intnx('month', '01JAN2008'd, i, 'b'),
intnx('month', '01JAN2008'd, i, 'e'));
output;
end;
run;
followed by
proc sql;
create table calendar as
select year_date, month_date, index_no, put(today(),date11.) as todays_dt, som, eom
from time_dim_monthly
where put(today(),date11.) between som and eom
/*or datepart((INTNX('month',today(),-1)) between som and eom)*/
order by index_no
;
quit;
The output looks like this:
year_date month_date index_no todays_dt SOM EOM
2008 10 9 31-MAY-2017 01-OCT-2008 31-OCT-2008
2009 10 21 31-MAY-2017 01-OCT-2009 31-OCT-2009
2010 10 33 31-MAY-2017 01-OCT-2010 31-OCT-2010
2011 10 45 31-MAY-2017 01-OCT-2011 31-OCT-2011
2012 10 57 31-MAY-2017 01-OCT-2012 31-OCT-2012
2013 10 69 31-MAY-2017 01-OCT-2013 31-OCT-2013
2014 10 81 31-MAY-2017 01-OCT-2014 31-OCT-2014
2015 10 93 31-MAY-2017 01-OCT-2015 31-OCT-2015
2016 10 105 31-MAY-2017 01-OCT-2016 31-OCT-2016
2017 5 112 31-MAY-2017 01-MAY-2017 31-MAY-2017
2017 10 117 31-MAY-2017 01-OCT-2017 31-OCT-2017
2018 5 124 31-MAY-2017 01-MAY-2018 31-MAY-2018
2018 10 129 31-MAY-2017 01-OCT-2018 31-OCT-2018
2019 5 136 31-MAY-2017 01-MAY-2019 31-MAY-2019
2019 10 141 31-MAY-2017 01-OCT-2019 31-OCT-2019
2020 5 148 31-MAY-2017 01-MAY-2020 31-MAY-2020
2020 10 153 31-MAY-2017 01-OCT-2020 31-OCT-2020
2021 5 160 31-MAY-2017 01-MAY-2021 31-MAY-2021
2021 10 165 31-MAY-2017 01-OCT-2021 31-OCT-2021
2022 5 172 31-MAY-2017 01-MAY-2022 31-MAY-2022
2022 10 177 31-MAY-2017 01-OCT-2022 31-OCT-2022
2023 5 184 31-MAY-2017 01-MAY-2023 31-MAY-2023
2023 10 189 31-MAY-2017 01-OCT-2023 31-OCT-2023
2024 5 196 31-MAY-2017 01-MAY-2024 31-MAY-2024
whereas I'd expected it to give me only one line:
2017 5 112 31-MAY-2017 01-MAY-2017 31-MAY-2017
Would appreciate help in understanding why this is happening.
Thank you
This is your mistake:
SOM = put(intnx('month', '01JAN2008'd, i, 'b'),date11.) ;
EOM = put(intnx('month', '01JAN2008'd, i, 'e'),date11.) ;
where put(today(),date11.) between som and eom
put creates a character variable. You shouldn't really use between with character variables unless you really know what you're doing (it will compare in alphabetical order).
Use numeric variables. Get rid of the put. Instead use a format statement to make the variables look nice, but still be numeric.
SOM = intnx('month', '01JAN2008'd, i, 'b') ;
EOM = intnx('month', '01JAN2008'd, i, 'e') ;
format som eom date11.;
and later:
where today() between som and eom
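The alphabetical-order pitfall is easy to reproduce outside SAS; a quick illustration of why every October row matched (Python used purely to demonstrate string comparison):

```python
from datetime import date

# DATE11. renders dates as strings such as "31-MAY-2017".
today_s = "31-MAY-2017"

# Character BETWEEN compares alphabetically: '3' >= '0' passes the
# lower bound and 'M' < 'O' passes the upper bound, so an
# October 2008 row wrongly matches a "today" in May 2017.
print("01-OCT-2008" <= today_s <= "31-OCT-2008")  # True

# Comparing actual date values gives the intended answer.
print(date(2008, 10, 1) <= date(2017, 5, 31) <= date(2008, 10, 31))  # False
```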
I have a dataset and I would like to create a rolling conditional statement row by row (not sure of the exact term for this in SAS). I know how to do it in Excel but not how it can be done in SAS. The following is the dataset and what I would like to achieve.
Data set
----A---- | --Date-- | Amount |
11111 Jan 2015 1
11111 Feb 2015 1
11111 Mar 2015 2
11111 Apr 2015 2
11111 May 2015 2
11111 Jun 2015 1
11112 Jan 2015 2
11112 Feb 2015 1
11112 Mar 2015 1
11112 Apr 2015 4
11112 May 2015 3
11112 Jun 2015 1
I would like to add 2 columns, named 'X' and 'Frequency', which would show, for each combination of column 'A' and 'Date', whether the Amount has gone up or down and by how much. See the sample output below.
----A---- | --Date-- | Amount | --X-- | Frequency |
11111 Jan 2015 1 0 0
11111 Feb 2015 1 0 0
11111 Mar 2015 2 Add 1
11111 Apr 2015 2 0 0
11111 May 2015 2 0 0
11111 Jun 2015 1 Drop 1
11112 Jan 2015 2 0 0
11112 Feb 2015 1 Drop 1
11112 Mar 2015 1 0 0
11112 Apr 2015 4 Add 3
11112 May 2015 3 Drop 1
11112 Jun 2015 1 Drop 2
Example using Lag1():
Data A;
input date monyy7. Y $;
datalines;
Jan2015 1
Feb2015 1
Mar2015 2
Apr2015 2
May2015 2
Jun2015 1
Jan2015 2
Feb2015 1
Mar2015 1
Apr2015 4
May2015 3
Jun2015 1
;
data B;
set A;
lag_y=lag1(Y);
if lag_y = . then X ='missing';
if Y = lag_y then X='zero';
if Y > lag_y and lag_y ^= . then x='add';
if Y < lag_y then x= 'drop';
freq= abs(Y-lag_y);
run;
Output:
Obs date Y lag_y X freq
1 20089 1 missing
2 20120 1 1 zero 0
3 20148 2 1 add 1
4 20179 2 2 zero 0
5 20209 2 2 zero 0
6 20240 1 2 drop 1
7 20089 2 1 add 1
8 20120 1 2 drop 1
9 20148 1 1 zero 0
10 20179 4 1 add 3
11 20209 3 4 drop 1
12 20240 1 3 drop 2
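For reference, the desired output can also be reproduced with a grouped diff in pandas (an illustration only, not SAS; note that a grouped diff resets at each value of A, whereas the plain lag1() above carries the lag across the id boundary, as Obs 7 shows):

```python
import pandas as pd

df = pd.DataFrame({
    "A":      [11111] * 6 + [11112] * 6,
    "Amount": [1, 1, 2, 2, 2, 1,  2, 1, 1, 4, 3, 1],
})

# Difference to the previous row within each A group (NaN at group start).
diff = df.groupby("A")["Amount"].diff()

df["X"] = diff.map(lambda d: "0" if pd.isna(d) or d == 0
                   else ("Add" if d > 0 else "Drop"))
df["Frequency"] = diff.abs().fillna(0).astype(int)
print(df.to_string(index=False))
```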
I fetch my documents from MongoDB as below:
{
"amount": 1200,
"date_closed": "2012-07-02 17:00:00"
},
{
"amount": 0,
"date_closed": "2012-08-03 16:00:00"
},
{
"amount": 0,
"date_closed": "2012-08-04 20:00:00"
},
{
"amount": 0,
"date_closed": "2012-08-04 22:00:00"
}
I get a timestamp like 1343287040 from the user (a parameter called user_time), which corresponds to datetime.datetime(2012, 7, 26, 11, 47, 20).
This is my solution to fill gaps:
Now I create a date string in the format YYYY-mm-dd 00:00:00 with the code below:
hourly_date = str(datetime.datetime.fromtimestamp(user_time).year) + '-' + str(datetime.datetime.fromtimestamp(user_time).month) + '-' + str(datetime.datetime.fromtimestamp(user_time).day) + ' 00:00:00'
user_time is the start date. Now I generate hourly records from user_time until today. The code below generates an hourly date range in the format I want:
date_range = pandas.date_range(start=hourly_date, end=datetime.datetime.today(), freq='H')
date_range = date_range.values.astype('<M8[h]').astype(str)
hourly_date = []
for i_hourly in date_range:
tmp_date = pandas.to_datetime(str(i_hourly)).strftime('%Y-%m-%d %H:00:00')
hourly_date.append(tmp_date)
After creating a template date range in hour from user_time until today, I compare it with my date_closed field which is returned from MongoDB:
records_len = len(records)
for i_hourly in hourly_date:
i = 0
for record in records:
i += 1
if i_hourly in record['date_closed']:
break # break from innermost loop
elif records_len == i and i_hourly not in record['date_closed']:
records.append({"amount": 0, "date_closed": i_hourly})
records contains many entries, let's say from 2012 until today. The problem I want to solve is to check whether a given date and hour is missing from the returned documents. If it is missing, I need to add it to records to fill the gap; otherwise I break out of the innermost loop.
This code takes about 57 seconds! This is a huge amount of time. Is there a better and more efficient way to fill the date gaps by hour?
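As a point of comparison, replacing the inner scan with a membership test against a set of the hours already present makes the fill linear in the number of hours; a minimal sketch (the helper name fill_gaps is mine):

```python
from datetime import datetime, timedelta

def fill_gaps(records, start, end):
    """Append an {'amount': 0} record for every whole hour in
    [start, end] whose stamp is absent from records."""
    seen = {r["date_closed"] for r in records}   # O(1) lookups
    filled = list(records)
    t = start.replace(minute=0, second=0, microsecond=0)
    while t <= end:
        stamp = t.strftime("%Y-%m-%d %H:00:00")
        if stamp not in seen:
            filled.append({"amount": 0, "date_closed": stamp})
        t += timedelta(hours=1)
    return filled

records = [{"amount": 1200, "date_closed": "2012-07-02 17:00:00"}]
out = fill_gaps(records, datetime(2012, 7, 2, 16), datetime(2012, 7, 2, 19))
print(len(out))  # 4: 17:00 already existed, 16/18/19:00 were added
```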
EDIT:
amount date_closed
0 21800 2015-07-21 10:00:00
1 5450 2015-07-05 04:00:00
2 571160 2015-06-22 12:00:00
3 65400 2015-06-15 12:00:00
4 10900 2015-06-15 09:00:00
5 109000 2015-06-14 07:00:00
6 109000 2015-06-14 04:00:00
7 1193550 2015-06-11 06:00:00
8 10900 2015-06-11 05:00:00
9 21800 2015-06-09 10:00:00
10 10900 2015-05-31 05:00:00
11 0 2015-05-30 09:00:00
12 114450 2015-05-19 13:00:00
13 261600 2015-05-19 08:00:00
14 108000 2015-05-11 08:00:00
15 2180 2015-05-11 07:00:00
16 344870 2015-05-05 13:00:00
17 70850 2015-05-05 12:00:00
18 5450 2015-05-05 05:00:00
19 109000 2015-05-03 12:00:00
20 327000 2015-05-03 11:00:00
21 310650 2015-04-30 05:00:00
22 38150 2015-04-28 13:00:00
23 26160 2015-04-27 07:00:00
24 109000 2015-04-22 12:00:00
25 97200 2015-03-09 08:00:00
26 21800 2015-07-11 05:00:00
27 26160 2015-05-20 05:00:00
28 37800 2015-03-03 07:00:00
29 130800 2015-06-29 06:00:00
.. ... ...
161 2180 2015-05-25 09:00:00
162 26160 2015-05-09 11:00:00
163 108000 2015-03-03 11:00:00
164 3337200 2014-09-13 05:00:00
165 5249880 2014-09-10 05:00:00
166 712800 2014-08-10 09:00:00
167 151200 2015-02-23 06:00:00
168 48600 2014-08-10 11:00:00
169 6540 2015-04-19 10:00:00
170 172800 2014-09-01 09:00:00
171 1370520 2014-10-15 09:00:00
172 421200 2014-07-26 09:00:00
173 86400 2015-03-01 12:00:00
174 118800 2015-02-21 12:00:00
175 97200 2014-09-17 07:00:00
176 54500 2015-04-23 07:00:00
177 1185840 2014-09-09 06:00:00
178 119016 2015-02-18 09:00:00
179 32400 2014-11-05 08:00:00
180 345600 2014-08-09 10:00:00
181 151200 2015-02-18 12:00:00
182 168480 2014-10-09 06:00:00
183 5668920 2014-10-04 21:00:00
184 669600 2014-08-06 12:00:00
185 194400 2014-08-02 07:00:00
186 313920 2015-06-23 08:00:00
187 6540 2015-05-04 09:00:00
188 669600 2014-07-23 10:00:00
189 64800 2015-01-22 06:00:00
190 669600 2014-08-25 04:00:00
[191 rows x 2 columns]
It shows that I have just the 191 records returned from Mongo! I wanted to see the hourly generated list, which is around 121000 records, of which 191 would be filled in by the above code.
The problem, I suppose, is that these two lists are not merged together.
You can first make the date_closed column the index and then .reindex against hourly_date_rng to populate the missing records.
Here is an example.
import json
import pandas as pd
json_data = [
{
"amount": 0,
"date_closed": "2012-08-04 16:00:00"
},
{
"amount": 0,
"date_closed": "2012-08-04 20:00:00"
},
{
"amount": 0,
"date_closed": "2012-08-04 22:00:00"
}
]
df = pd.read_json(json.dumps(json_data), orient='records')
df
amount date_closed
0 0 2012-08-03 16:00:00
1 0 2012-08-04 20:00:00
2 0 2012-08-04 22:00:00
The hourly_date_rng looks like this
hourly_date_rng = pd.date_range(start='2012-08-04 12:00:00', end='2012-08-04 23:00:00', freq='H')
hourly_date_rng.name = 'date_closed'
hourly_date_rng
DatetimeIndex(['2012-08-04 12:00:00', '2012-08-04 13:00:00',
'2012-08-04 14:00:00', '2012-08-04 15:00:00',
'2012-08-04 16:00:00', '2012-08-04 17:00:00',
'2012-08-04 18:00:00', '2012-08-04 19:00:00',
'2012-08-04 20:00:00', '2012-08-04 21:00:00',
'2012-08-04 22:00:00', '2012-08-04 23:00:00'],
dtype='datetime64[ns]', name='date_closed', freq='H', tz=None)
To align the index and fill the gaps
# make the column datetime object instead of string
df['date_closed'] = pd.to_datetime(df['date_closed'])
# align the index using .reindex
df.set_index('date_closed').reindex(hourly_date_rng).fillna(0).reset_index()
date_closed amount
0 2012-08-04 12:00:00 0
1 2012-08-04 13:00:00 0
2 2012-08-04 14:00:00 0
3 2012-08-04 15:00:00 0
4 2012-08-04 16:00:00 0
5 2012-08-04 17:00:00 0
6 2012-08-04 18:00:00 0
7 2012-08-04 19:00:00 0
8 2012-08-04 20:00:00 0
9 2012-08-04 21:00:00 0
10 2012-08-04 22:00:00 0
11 2012-08-04 23:00:00 0
Edit:
To convert the result back to JSON:
result = df.set_index('date_closed').reindex(hourly_date_rng).fillna(0).reset_index()
# maybe convert date_closed column to string first
result['date_closed'] = pd.DatetimeIndex(result['date_closed']).to_native_types()
# to json function
json_result = result.to_json(orient='records')
# print out the data with pretty print
from pprint import pprint
pprint(json.loads(json_result))
[{'amount': 0.0, 'date_closed': '2012-08-04 12:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 13:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 14:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 15:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 16:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 17:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 18:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 19:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 20:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 21:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 22:00:00'},
{'amount': 0.0, 'date_closed': '2012-08-04 23:00:00'}]
I have this data:
id test test_date value
1 A 02/06/2014 12:26 11
1 B 02/06/2014 12:26 23
1 C 02/06/2014 13:17 43
1 D 02/06/2014 13:17 65
1 E 02/06/2014 13:17 34
1 F 02/06/2014 13:17 64
1 A 05/06/2014 15:14 234
1 B 05/06/2014 15:14 646
1 C 05/06/2014 16:50 44
1 E 05/06/2014 16:50 55
2 E 05/06/2014 16:50 443
2 F 05/06/2014 16:50 22
2 G 05/06/2014 16:59 445
2 B 05/06/2014 20:03 66
2 C 05/06/2014 20:03 77
2 D 05/06/2014 20:03 88
2 E 05/06/2014 20:03 44
2 F 05/06/2014 20:19 33
2 G 05/06/2014 20:19 22
I would like to transform this data into wide format like this:
id date A B C D E F G
1 02/06/2014 12:26 11 23 43 65 34 64 .
1 05/06/2014 15:14 234 646 44 . 55 . .
2 05/06/2014 16:50 . . . . 443 22 445
2 05/06/2014 20:03 . 66 77 88 44 33 22
I am using the reshape command in Stata, but it is not producing the required results:
reshape wide test_date value, i(id) j(test) string
Any idea how to do this?
UPDATE:
You're right that we need this missvar. I tried to create it programmatically, but failed. Let's say that within 2 hours of the test date the batch is considered the same. We have only 7 tests (A,B,C,D,E,F,G). First I try to find the time difference:
bysort id: gen diff_bd = (test_date[_n] - test_date[_n-1])/(1000*60*60)
bysort id: generate missvar = _n if diff_bd <= 2
@jfeigenbaum has given part of the answer.
The problem I see is that you are missing a variable that identifies relevant sub-groups. These sub-groups seem to be bounded by test taking values A - G. But I may be wrong.
I've included this variable in the example data set, and named it missvar. I forced this variable into the data set believing it identifies groups that, although implicit in your original post, are important for your analysis.
clear
set more off
*----- example data -----
input ///
id str1 test str30 test_date value missvar
1 A "02/06/2014 12:26" 11 1
1 B "02/06/2014 12:26" 23 1
1 C "02/06/2014 13:17" 43 1
1 D "02/06/2014 13:17" 65 1
1 E "02/06/2014 13:17" 34 1
1 F "02/06/2014 13:17" 64 1
1 A "05/06/2014 15:14" 234 2
1 B "05/06/2014 15:14" 646 2
1 C "05/06/2014 16:50" 44 2
1 E "05/06/2014 16:50" 55 2
2 E "05/06/2014 16:50" 443 1
2 F "05/06/2014 16:50" 22 1
2 G "05/06/2014 16:59" 445 1
2 B "05/06/2014 20:03" 66 2
2 C "05/06/2014 20:03" 77 2
2 D "05/06/2014 20:03" 88 2
2 E "05/06/2014 20:03" 44 2
2 F "05/06/2014 20:19" 33 2
2 G "05/06/2014 20:19" 22 2
end
gen double tdate = clock( test_date, "DM20Yhm")
format %tc tdate
drop test_date
list, sepby(id)
*----- what you want ? -----
reshape wide value, i(id missvar tdate) j(test) string
collapse (min) tdate value?, by(id missvar)
rename value* *
list
There should be some way of identifying the groups programmatically. Relying on the original sort order of the data is one way, but it may not be the safest. It may be the only way, but only you know that.
Edit
Regarding your comment and the "missing" variable, one way to create it is:
// two hours is 7200000 milliseconds
bysort id (tdate): gen batch = sum(tdate - tdate[_n-1] > 7200000)
For your example data, this creates a batch variable identical to my missvar. You can also use time-series operators.
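If it helps to sanity-check the threshold, the same cumulative-sum trick reads like this in pandas (illustration only, outside Stata):

```python
import pandas as pd

times = pd.to_datetime([
    "2014-06-02 12:26", "2014-06-02 13:17",   # 51 min apart -> same batch
    "2014-06-05 15:14", "2014-06-05 16:50",   # new batch, 1h36m within it
])
s = pd.Series(times)

# Start a new batch whenever the gap to the previous row exceeds two
# hours, mirroring sum(tdate - tdate[_n-1] > 7200000) in Stata.
batch = (s.diff() > pd.Timedelta(hours=2)).cumsum() + 1
print(batch.tolist())  # [1, 1, 2, 2]
```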
Let me emphasize the need for you to be careful when presenting your example data. It must be representative of the real data or you might get code that doesn't suit it; that includes the possibility that you don't notice the mismatch because Stata gives no error.
For example, if you have the same test, applied to the same id within the two-hour limit, then you'll lose information with this code (in the collapse). (This is not a problem in your example data.)
Edit 2
In response to another question found in the comments:
Suppose a new observation for person 1, such that he receives a repeated test within the two-hour limit, but at a different time:
1 A "02/06/2014 12:26" 11 1 // old observation
1 B "02/06/2014 12:26" 23 1
1 A "02/06/2014 12:35" 99 1 // new observation
1 C "02/06/2014 13:17" 43 1
1 D "02/06/2014 13:17" 65 1
1 E "02/06/2014 13:17" 34 1
1 F "02/06/2014 13:17" 64 1
1 A "05/06/2014 15:14" 234 2
1 B "05/06/2014 15:14" 646 2
1 C "05/06/2014 16:50" 44 2
1 E "05/06/2014 16:50" 55 2
Test A is applied at 12:26 and at 12:35. The reshape will have no problem with this, but the collapse will discard information because it takes the minimum values within each id missvar group; notice that for the variable valueA the new information (the 99) will be lost (the same happens with all the other variables, but you were explicit about wanting to discard those). After the reshape but before the collapse you get:
. list, sepby(id)
+--------------------------------------------------------------------------------------------------+
| id missvar tdate valueA valueB valueC valueD valueE valueF valueG |
|--------------------------------------------------------------------------------------------------|
1. | 1 1 02jun2014 12:26:00 11 23 . . . . . |
2. | 1 1 02jun2014 12:35:00 99 . . . . . . |
3. | 1 1 02jun2014 13:17:00 . . 43 65 34 64 . |
4. | 1 2 05jun2014 15:14:00 234 646 . . . . . |
5. | 1 2 05jun2014 16:50:00 . . 44 . 55 . . |
|--------------------------------------------------------------------------------------------------|
6. | 2 1 05jun2014 16:50:00 . . . . 443 22 . |
7. | 2 1 05jun2014 16:59:00 . . . . . . 445 |
8. | 2 2 05jun2014 20:03:00 . 66 77 88 44 . . |
9. | 2 2 05jun2014 20:19:00 . . . . . 33 22 |
+--------------------------------------------------------------------------------------------------+
Running the complete code confirms what we just said:
. list, sepby(id)
+--------------------------------------------------------------------------+
| id missvar tdate A B C D E F G |
|--------------------------------------------------------------------------|
1. | 1 1 02jun2014 12:26:00 11 23 43 65 34 64 . |
2. | 1 2 05jun2014 15:14:00 234 646 44 . 55 . . |
|--------------------------------------------------------------------------|
3. | 2 1 05jun2014 16:50:00 . . . . 443 22 445 |
4. | 2 2 05jun2014 20:03:00 . 66 77 88 44 33 22 |
+--------------------------------------------------------------------------+
Suppose now a new observation for person 1, such that he receives a repeated test within the two-hour limit, but at the same time:
1 A "02/06/2014 12:26" 11 1 // old observation
1 B "02/06/2014 12:26" 23 1
1 A "02/06/2014 12:26" 99 1 // new observation
1 C "02/06/2014 13:17" 43 1
1 D "02/06/2014 13:17" 65 1
1 E "02/06/2014 13:17" 34 1
1 F "02/06/2014 13:17" 64 1
1 A "05/06/2014 15:14" 234 2
1 B "05/06/2014 15:14" 646 2
1 C "05/06/2014 16:50" 44 2
1 E "05/06/2014 16:50" 55 2
Then the reshape won't work. Stata complains:
values of variable test not unique within id missvar tdate
and with reason. The error is clear in signalling the problem. (If not clear, go back to help reshape and work out some exercises.) The request makes no sense given the functioning of the command.
Finally, note it's relatively easy to check if something will work or not: just try it! All that was necessary in this case was to modify a bit the example data. Go back to help files and manuals, if necessary.
The command is slightly misspecified. You want to reshape value. Look at the output you want and notice the observations are uniquely identified by id and test_date. Therefore, they should be in the i option.
reshape wide value, i(id test_date) j(test) string
This yields something close to what you want; you just need to rename a few variables to get exactly that output. Specifically:
rename test_date date
renpfix value
I'm writing a program which gets the list of the files in an FTP server.
So it starts like that :
inet_parse:address(IP),
inets:start(),
{ok, Pid} = inets:start(ftpc, [{host, IP}]),
ftp:user(Pid, "anonymous", "lol@lol.com"),
ftp:pwd(Pid),
Content = ftp:ls(Pid),
The Content variable is a tuple, and looks something like this:
7> Content = ftp:ls(Pid).
{ok,"-rw-r--r-- 1 21 21 0 Oct 14 21:47 bar\r\ndrwxr-xr-x 2 21 21 4096 Oct 14 21:47 baz\r\n-rw-r--r-- 1 21 21 0 Oct 14 21:47 foo\r\nlrwxrwxrwx 1 1000 10 21 Sep 10 14:22 musique -> /home/foo/musique\r\n"}
I know I can convert this tuple to a list, but I want to know if I can get each element (they are separated by \r\n) so I can access them individually (in order to save them in a database, for example).
Thanks again.
1> {ok,S} = {ok,"-rw-r--r-- 1 21 21 0 Oct 14 21:47 bar\r\ndrwxr-xr-x 2 21 21 4096 Oct 14 21:47 baz\r\n-rw-r--r-- 1 21 21 0 Oct 14 21:47 foo\r\nlrwxrwxrwx 1 1000 10 21 Sep 10 14:22 musique -> /home/foo/musique\r\n"}.
{ok,"-rw-r--r-- 1 21 21 0 Oct 14 21:47 bar\r\ndrwxr-xr-x 2 21 21 4096 Oct 14 21:47 baz\r\n-rw-r--r-- 1 21 21 0 Oct 14 21:47 foo\r\nlrwxrwxrwx 1 1000 10 21 Sep 10 14:22 musique -> /home/foo/musique\r\n"}
2> Split = string:tokens(S,"\r\n").
["-rw-r--r-- 1 21 21 0 Oct 14 21:47 bar",
"drwxr-xr-x 2 21 21 4096 Oct 14 21:47 baz",
"-rw-r--r-- 1 21 21 0 Oct 14 21:47 foo",
"lrwxrwxrwx 1 1000 10 21 Sep 10 14:22 musique -> /home/foo/musique"]
3>
There is a function string:tokens/2; however, it takes its second argument as a list of separator characters, so each character of that string is a separator. Calling
string:tokens(Content, "\r\n")
will work for your case, but in general it does not do exactly what you need. Here is an example of a function which takes a (multi-character) string as the separator:
tokens(Str, Separator) ->
tokens(Str, Separator, []).
tokens(Str, Separator, Acc) ->
case string:str(Str, Separator) of
0 ->
lists:reverse([Str | Acc]);
N ->
Token = string:substr(Str, 1, N-1),
Str1 = string:substr(Str, N + string:len(Separator)),
tokens(Str1, Separator, [Token | Acc])
end.
{ok, Content} = ftp:ls(Pid),
string:tokens(Content, "\r\n").
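The semantic difference (separator as a set of characters vs. as one multi-character token) can be mimicked like this, should the distinction matter for other data (Python used only to illustrate):

```python
import re

content = "file_a\r\nfile_b\r\nfile_c\r\n"

# Splitting on the exact substring "\r\n" (one multi-character token);
# the trailing separator leaves an empty final element.
print(content.split("\r\n"))   # ['file_a', 'file_b', 'file_c', '']

# Erlang's string:tokens/2 treats "\r\n" as a *set* of separator
# characters and drops empty tokens; roughly:
print([t for t in re.split(r"[\r\n]+", content) if t])
# ['file_a', 'file_b', 'file_c']
```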