How to split an array of strings (string[])? - D

I have an array of strings:
string[] foo
The array contains the following data:
2014 01 02 234 124
2014 01 03 640 710
2014 01 04 234 921
I need a new array of strings, dates, that contains only the date (yyyy-MM-dd). How can I do this?

Here's a functional approach:
import std.algorithm : map; import std.array : array, join, split;
string[] dates = foo.map!(line => line.split()[0..3].join("-")).array();

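If you know that your dates are always in yyyy MM dd format, then just slice the string.
translate is in std.string:
import std.algorithm : map; import std.array : array; import std.string : translate;
dchar[dchar] translateTable = [' ' : '-'];
// line[0..10] is e.g. "2014 01 02"; translate() replaces each space with '-'
string[] dates = foo.map!(line => translate(line[0..10], translateTable)).array;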

Related

Extracting the same groups from different regex patterns for different date formats

Based on a data frame like
import pandas as pd
string_1 = 'for urgent evaluation/treatment till first visit with Dr. Toney Winkler IN EIGHT WEEKS on 24 Jan 2001.'
string_2 = '03/25/93 Total time of visit (in minutes):'
string_3 = 'April 11, 1990 CPT Code: 90791: No medical services'
df = pd.Series([string_1,string_2,string_3])
each of the following statements successfully extracts the date from exactly one row:
print(df.str.extract(r'((?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2,4}))').dropna())
0 month day year
1 03/25/93 03 25 93
print(df.str.extract(r'(?P<month>(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)[a-z\.]*)[\s\.\-\,](?P<day>\d{2})[\-\,\s]*(?P<year>\d{4})').dropna())
month day year
2 April 11 1990
print(df.str.extract(r'((?P<day>\d{2})\s(?P<month>(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)[a-z\.]*)[\s\.\-\,]*(?P<year>\d{4}))').dropna())
0 day month year
0 24 Jan 2001 24 Jan 2001
How can the statements be combined to create the following data frame, keeping the original indices?
day month year
0 24 Jan 2001
1 25 03 93
2 11 April 1990
You can use the PyPI regex module (install it with pip install regex) and join the patterns with | inside a branch reset group:
import regex
import pandas as pd

string_1 = 'for urgent evaluation/treatment till first visit with Dr. Toney Winkler IN EIGHT WEEKS on 24 Jan 2001.'
string_2 = '03/25/93 Total time of visit (in minutes):'
string_3 = 'April 11, 1990 CPT Code: 90791: No medical services'
df = pd.Series([string_1, string_2, string_3])

pat1 = r'(?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2,4})'
pat2 = r'(?P<month>(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)[a-z.]*)[\s.,-](?P<day>\d{2})[-,\s]*(?P<year>\d{4})'
pat3 = r'(?P<day>\d{2})\s(?P<month>(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)[a-z.]*)[\s.,-]*(?P<year>\d{4})'
# (?|...|...) is a branch reset group: every alternative reuses the same groups
rx = regex.compile(r"(?|{}|{}|{})".format(pat1, pat2, pat3))

empty_val = pd.Series(["", "", ""], index=['month', 'day', 'year'])

def extract_regex(seq):
    m = rx.search(seq)
    if m:
        # Fetch the groups by name so the column order never depends on groupdict() ordering
        return pd.Series([m.group('month'), m.group('day'), m.group('year')],
                         index=['month', 'day', 'year'])
    else:
        return empty_val

df2 = df.apply(extract_regex)
Output:
>>> df2
month day year
0 Jan 24 2001
1 03 25 93
2 April 11 1990
Alternatively, with the standard re module you can try each pattern in turn and keep the first match:
import re
import pandas as pd

string_1 = 'for urgent evaluation/treatment till first visit with Dr. Toney Winkler IN EIGHT WEEKS on 24 Jan 2001.'
string_2 = '03/25/93 Total time of visit (in minutes):'
string_3 = 'April 11, 1990 CPT Code: 90791: No medical services'
df = pd.DataFrame([string_1, string_2, string_3])

patterns = [r'(?P<day>\d{1,2}) (?P<month>(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)) (?P<year>\d{4})',
            r'(?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2,4})',
            r'(?P<month>(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)[a-z\.]*) (?P<day>\d{2}), (?P<year>\d{4})']

def extract_date(s):
    result = None, None, None
    for p in patterns:
        m = re.search(p, s)
        if m:
            # First matching pattern wins; groups are read by name
            result = m.group('year'), m.group('month'), m.group('day')
            break
    return result

df['year'], df['month'], df['day'] = zip(*df[0].apply(extract_date))
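If installing the third-party regex module is not an option, here is a minimal sketch (a variant of my own, not taken from either answer) that still uses a single pandas str.extract call. Python's standard re module rejects duplicate group names, so each branch gets suffixed names (day1, day2, day3 and so on, chosen for illustration) and the matching columns are then merged with combine_first, which keeps the original index.

```python
import pandas as pd

s = pd.Series([
    'for urgent evaluation/treatment till first visit with Dr. Toney Winkler IN EIGHT WEEKS on 24 Jan 2001.',
    '03/25/93 Total time of visit (in minutes):',
    'April 11, 1990 CPT Code: 90791: No medical services',
])

MONTHS = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept?|Oct|Nov|Dec)[a-z.]*'

# re does not allow the same group name twice, so every branch gets its own
# suffixed names; the three column families are merged below.
pattern = '|'.join([
    rf'(?P<day1>\d{{1,2}})\s(?P<month1>{MONTHS})[\s.,-]*(?P<year1>\d{{4}})',
    r'(?P<month2>\d{1,2})[/-](?P<day2>\d{1,2})[/-](?P<year2>\d{2,4})',
    rf'(?P<month3>{MONTHS})[\s.,-](?P<day3>\d{{1,2}})[-,\s]*(?P<year3>\d{{4}})',
])

# One row per input string; groups from branches that did not match are NaN
raw = s.str.extract(pattern)

# Take the first non-missing value across the three branches for each field
result = pd.DataFrame({
    col: raw[f'{col}1'].combine_first(raw[f'{col}2']).combine_first(raw[f'{col}3'])
    for col in ['day', 'month', 'year']
})
print(result)
```

Because str.extract returns the first match per row, branch order only matters when one string contains several dates.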

How to rename a column name to a new value in a dataframe if the column names are dynamic

I have a CSV file whose column names change with the month and year but contain keywords such as 'sales' or 'product'. Is there a way to rename these columns to fixed names with pandas rename, by searching for the keyword?
Sample column names: 2019 May sales Tv, 2018 April sales Fridge
For example, this is as far as I got:
df_nw = df.rename(df.filter(like='Sales').columns.values
Current data:
column1 column2 2019AprilSalesTV 2018ActualSalesTV
X BBBB 7766 60
Y CCCC 10 20
Z LLLLL 60 65
K TTTTT 10 67
New Data:
column1 column2 Sales ActualSales
X BBBB 7766 60
Y CCCC 10 20
Z LLLLL 60 65
K TTTTT 10 67
You can do:
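import re
# Drop a leading year and a trailing 'TV', then strip any prefix before 'Sales'
# unless that prefix ends in 'Actual'
clean_colname = lambda x: re.sub(r'(^\w+(?<!Actual))(Sales)', r'\2',
                                 re.sub(r'^\d+|TV$', r'', x))
df.rename(clean_colname, axis=1)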
column2 Sales ActualSales
column1
X BBBB 7766 60
Y CCCC 10 20
Z LLLLL 60 65
K TTTTT 10 67
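The question literally asks for renaming by searching for a keyword. A minimal sketch of that, assuming a hypothetical KEYWORDS table that maps a lowercase keyword to the fixed name you want; spaces and case are ignored so that names like 2019 May sales Tv match as well.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['X', 'Y', 'Z', 'K'],
    'column2': ['BBBB', 'CCCC', 'LLLLL', 'TTTTT'],
    '2019AprilSalesTV': [7766, 10, 60, 10],
    '2018ActualSalesTV': [60, 20, 65, 67],
})

# Hypothetical keyword -> fixed-name table. More specific keywords come first,
# otherwise 'ActualSales' columns would be caught by the plain 'sales' rule.
KEYWORDS = [('actualsales', 'ActualSales'), ('sales', 'Sales')]


def fixed_name(col):
    """Return the fixed name for the first keyword found in col, else col."""
    key = col.replace(' ', '').lower()  # ignore spaces and case
    for keyword, new_name in KEYWORDS:
        if keyword in key:
            return new_name
    return col  # columns without a keyword keep their name


df_nw = df.rename(columns=fixed_name)
print(df_nw)
```

If two columns map to the same fixed name (2019 May sales Tv and 2018 April sales Fridge would both become Sales), pandas keeps both under the duplicated label, so you may want more specific keywords in that case.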

Convert UTC to local time

I have a fairly large dataset with UTC timestamps. I need to convert them to local (US Central) time. I tried my google-fu, to no avail.
The dataframe is below:
STID UTCTIME TRES VRIR RETY REWT WEDN DELP WDIR DERT RTAX GAIN DEVD
0 ARFW 2012-01-01T00:00 28.47 65 -999 -999 41 41 289 12 20 0 0
1 ARFW 2012-01-01T00:30 28.55 62 -999 -999 32 33 359 23 31 0 0
2 ARFW 2012-01-01T01:00 28.59 60 -999 -999 29 30 345 19 26 0 0
3 ARFW 2012-01-01T01:30 28.63 60 -999 -999 24 25 339 20 27 0 0
4 ARFW 2012-01-01T02:00 28.66 58 -999 -999 22 25 335 24 30 0 0
#Define time as UTC
data_df['UTCTIME'] = pd.to_datetime(data_df['UTCTIME'], utc= True)
data_df.dtypes
STID object
UTCTIME datetime64[ns]
TRES float64
.
.
.
GAIN float64
DEVD int64
dtype: object
Here's the code I'm trying to use:
import pytz, datetime
utc = pytz.utc
fmt = '%Y-%m-%d %H:%M'
CSTM= pytz.timezone('US/Central')
local = pytz.timezone('US/Central')
dt = datetime.datetime.strptime(data_df['UTCTIME'], fmt)
CSTM_dt = CSTM.localize(dt)
and the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-f10301993777> in <module>()
4 CSTM = pytz.timezone('US/Central')
5 local = pytz.timezone('US/Central')
----> 6 dt = datetime.datetime.strptime(data_df['UTCTIME'], fmt)
7 CSTM = CSTM.localize(dt)
TypeError: must be string, not Series
Also, there are duplicate entries in UTCTIME. I don't really understand indexing, and I suspect it could be part of the problem. I am not sure what is missing here.
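In your strptime line you are not passing an actual date string from your dataframe. strptime() accepts a single string: passing the whole column raises the TypeError above, and passing the literal string "UTCTIME" (as in the snippet below) fails as well.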
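from datetime import datetime
from dateutil import tz

from_zone = tz.gettz('UTC')
to_zone = tz.tzlocal()  # the machine's local zone; tz.gettz('US/Central') targets Central time explicitly
utc = datetime.strptime('UTCTIME', '%Y-%m-%dT%H:%M') # <====== STRING
utc = utc.replace(tzinfo = from_zone)
central = utc.astimezone(to_zone)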
If you want to use that on your dataframe, you need to either loop over the UTCTIME column or create a helper function that does the conversion and use the DataFrame.column.apply(helperfunc) method.
To test your code on its own, replace the 'UTCTIME' string with an actual date string, or use a variable holding one.
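A sketch of both routes, assuming the UTCTIME format shown in the question; the sample rows (including a duplicated timestamp) and the column names CENTRAL_APPLY and CENTRAL are made up for illustration. The helper uses dateutil, as the answer does; the vectorised version uses pandas' own to_datetime(..., utc=True) followed by Series.dt.tz_convert, which converts the whole column in one call and is usually much faster on a large dataset.

```python
from datetime import datetime

import pandas as pd
from dateutil import tz

# Hypothetical sample in the question's UTCTIME format, with a duplicate timestamp
data_df = pd.DataFrame({
    'STID': ['ARFW', 'ARFW', 'ARFW'],
    'UTCTIME': ['2012-01-01T00:00', '2012-01-01T00:30', '2012-01-01T00:30'],
})

CENTRAL = tz.gettz('America/Chicago')  # canonical IANA name behind 'US/Central'


def utc_to_central(s):
    """Helper for Series.apply: parse one UTC string and return it in Central time."""
    utc = datetime.strptime(s, '%Y-%m-%dT%H:%M').replace(tzinfo=tz.tzutc())
    return utc.astimezone(CENTRAL)


# 1) Row by row with a helper function and apply, as the answer describes
data_df['CENTRAL_APPLY'] = data_df['UTCTIME'].apply(utc_to_central)

# 2) Vectorised: parse as timezone-aware UTC, then convert the whole column
utc_times = pd.to_datetime(data_df['UTCTIME'], utc=True)
data_df['CENTRAL'] = utc_times.dt.tz_convert('America/Chicago')
print(data_df)
```

Duplicate timestamps are not a problem for either route, because each value is converted independently of the index. America/Chicago also switches between CST and CDT automatically.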

SAS transpose long data that has multiple variables and values per group id?

I have data that is set up like this:
Pers Year Month Variable Value
AAA 2001 01 Var1 100
AAA 2001 01 Var2 200
AAA 2001 06 Var1 110
AAA 2001 06 Var2 210
AAA 2002 01 Var1 120
AAA 2002 01 Var2 .
BBB 2001 01 Var1 100
BBB 2001 01 Var2 200
BBB 2001 06 Var1 110
BBB 2001 06 Var2 210
BBB 2002 01 Var2 220
I would like data that looks like this:
Pers Year Month Var1 Var2
AAA 2001 01 100 200
AAA 2001 06 110 210
AAA 2002 01 120 .
BBB 2001 01 100 200
BBB 2001 06 110 210
BBB 2002 01 . 220
How can I do this in SAS, preferably with proc transpose or sql?
Note that in the input data above, person BBB has no observation for 2002-01 Var1, yet the output should still contain a missing value (".") in the last line.
proc transpose is the obvious solution (with a BY statement the data must be sorted by pers, year and month, as it is here):
proc transpose data=yourdata out=yourdatat1(drop=_name_);
by pers year month;
id variable;
var value;
run;
With proc sql, you can use CASE WHEN logic to summarize the data:
proc sql;
create table yourdatat2 as
select
pers,
year,
month,
sum(case when variable = 'Var1' then value else . end) as Var1,
sum(case when variable = 'Var2' then value else . end) as Var2
from
yourdata
group by
pers,
year,
month
;
quit;

Create date variable from time (Using SAS 9.3)

Using SAS 9.3
I have files with two variables (Time and pulse), one file for each person.
I know the date on which each person started measuring.
Now I want to create a date variable that advances to the next day at midnight. How can I do this?
Example from one of the text files:
23:58:02 106
23:58:07 105
23:58:12 103
23:58:17 98
23:58:22 100
23:58:27 97
23:58:32 99
23:58:37 100
23:58:42 99
23:58:47 104
23:58:52 95
23:58:57 96
23:59:02 98
23:59:07 96
23:59:12 104
23:59:17 109
23:59:22 105
23:59:27 111
23:59:32 111
23:59:37 104
23:59:42 110
23:59:47 100
23:59:52 106
23:59:57 114
00:00:02 123
00:00:07 130
00:00:12 130
00:00:17 125
00:00:22 119
00:00:27 116
00:00:32 122
00:00:37 116
00:00:42 119
00:00:47 117
00:00:52 114
00:00:57 114
00:01:02 110
00:01:07 103
00:01:12 98
00:01:17 98
00:01:22 102
00:01:27 97
00:01:32 99
00:01:37 93
00:01:42 97
00:01:47 103
00:01:52 96
00:01:57 93
00:02:02 93
00:02:07 95
00:02:12 106
00:02:17 99
00:02:22 102
00:02:27 96
00:02:32 93
00:02:37 97
00:02:42 102
00:02:47 101
00:02:52 95
00:02:57 92
00:03:02 100
00:03:07 95
00:03:12 102
00:03:17 102
00:03:22 109
00:03:27 109
00:03:32 107
00:03:37 111
00:03:42 112
00:03:47 113
00:03:52 115
Regex:
\d{2}:\d{2}:\d{2} \d*
See here for an example and play around with regex:
https://regex101.com/r/xF1fQ5/1
EDIT: also have a look at the SAS regex tip sheet: http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
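A small Python illustration of that pattern, assuming you want the two fields separately: capture groups are added around the time and the pulse, and \d* is tightened to \d+ so that a line must actually carry a pulse value. This only parses the lines; attaching the date is a separate step.

```python
import re

# The answer's pattern with capture groups added (an assumption), matched per line
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) (\d+)$", re.MULTILINE)

sample = """23:59:52 106
23:59:57 114
00:00:02 123"""

# Each match yields (time string, pulse as int)
readings = [(m.group(1), int(m.group(2))) for m in LINE_RE.finditer(sample)]
print(readings)
```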
Something like this:
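import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;
import org.joda.time.LocalDateTime;

// FileData (time string + measurement) and NData (date + measurement) are placeholder types
Date lastDate = startDate;
List<NData> listData = new ArrayList<NData>();
for (FileData fdat : listFileData) {
    Date nDate = this.getDate(lastDate, fdat.getTime());
    NData ndata = new NData(nDate, fdat.getMeasuring());
    listData.add(ndata);
    lastDate = nDate;
}
// ...

private Date getDate(Date ld, String time) {
    Calendar cal = Calendar.getInstance();
    cal.setTime(ld);
    int year = cal.get(Calendar.YEAR);
    int month = cal.get(Calendar.MONTH) + 1; // Calendar months are 0-based, Joda's are 1-based
    int day = cal.get(Calendar.DAY_OF_MONTH);
    // time looks like "23:58:02"
    String[] hms = time.split(":");
    int hourOfDay = Integer.parseInt(hms[0]);
    int minuteOfHour = Integer.parseInt(hms[1]);
    int secondOfMinute = Integer.parseInt(hms[2]);
    LocalDateTime lastDate = new LocalDateTime(ld);
    LocalDateTime newDate = new LocalDateTime(year, month, day, hourOfDay, minuteOfHour, secondOfMinute);
    // A clock time earlier than the previous reading means midnight has passed
    if (newDate.isBefore(lastDate)) {
        newDate = newDate.plusDays(1);
    }
    return newDate.toDate();
}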
It's hard to provide a complete answer without sample code, but the SAS lag() function might be enough to do what you need. Your data step would include lines like the following, assuming your time variable is called time and your date variable is called date:
retain date; /* date needs an initial value: the person's start date */
if time < lag(time) then date = date + 1;
This assumes you never have any 24-hour gaps (but it appears you'd have to assume that anyway).
This answer also assumes that the time field is already in a SAS time format.
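The lag() rule (a reading whose clock time is earlier than the previous one starts a new day) is easy to check outside SAS. Here is a minimal Python sketch of the same logic; the function name attach_dates and the start date are invented for illustration, and it carries the same assumptions as the SAS version: readings in chronological order and no gaps of 24 hours or more.

```python
from datetime import date, datetime, timedelta


def attach_dates(times, start_date):
    """Pair each 'HH:MM:SS' reading with a full datetime.

    The date advances by one day whenever a reading's clock time is earlier
    than the previous reading's, mirroring `if time < lag(time)`.
    """
    current_date = start_date
    previous = None
    stamps = []
    for s in times:
        t = datetime.strptime(s, "%H:%M:%S").time()
        if previous is not None and t < previous:
            current_date += timedelta(days=1)  # midnight was crossed
        stamps.append(datetime.combine(current_date, t))
        previous = t
    return stamps


# Readings around midnight from the question; the start date is hypothetical
readings = ["23:59:52", "23:59:57", "00:00:02", "00:00:07"]
stamps = attach_dates(readings, date(2014, 1, 2))
for stamp in stamps:
    print(stamp)
```

With these inputs the first two readings fall on 2014-01-02 and the last two on 2014-01-03.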