I would like some help on string functions - c++

I can only use string objects and string functions for this exercise. When I tried to output " over " << loser, it didn't output the losing team's name first. I also want the scores to be arranged as winnerscore << " to " << loserscore.
Basically:
cout << winner << " over " << loser << " " << winnerscore << " to " << loserscore << endl;
Code:
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cctype>
using namespace std;
int main(){
    ifstream fin;
    string winner, winnerscore, loser, loserscore, hey, file;
    size_t pos, blank, blank2;
    fin.open("C:\\Users\\leewi\\Desktop\\Computer Programs & Projects\\C++\\BentleyCIS22B\\Ex5\\Ex5.txt");
    if (!fin)
    {
        cout << "Can't Open File." << endl;
        exit(0);
    }
    while(!fin.eof()){
        getline(fin, hey);
        pos = hey.find(' ');
        winner = hey.substr(0, pos);
        if(isalpha(hey[pos+1])){
            blank = hey.find(' ');
            winner += hey.substr(pos, blank);
        }
        else if(isdigit(hey[pos+1]))
        {
            blank = hey.find(',');
            winnerscore = hey.substr(pos, blank);
        }
        if(isalpha(hey[pos+1])){
            blank = hey.find(' ');
            loser += hey.substr(pos, hey.length());
        }
        else if(isdigit(hey[pos+1]))
        {
            loserscore = hey.substr(pos, hey.length());
        }
        cout << winner << " over " << loser << " " << winnerscore << " to " << loserscore << endl;
    }
    fin.close();
    return 0;
}
Output I Got:
Cincinnati over 27, Buffalo to 27, Buffalo 24
Detroit over 31, Cleve to 31, Cleveland 17
Kansas City over City 24, Oakland 7 31, Cleve to 31, Cleveland 17
Carolina over City 24, Oakland 7 35, Minnes to 35, Minnesota 10
Pittsburgh over City 24, Oakland 7 19, NY Jets to 19, NY Jets 6
Philadelphia over City 24, Oakland 7 31, Tampa Bay to 31, Tampa Bay 20
Green Bay over City 24, Oakland 7 Bay 19, Baltimore 17 31, Tampa Bay to 31, Tampa Bay 20
St. Lo over City 24, Oakland 7 Bay 19, Baltimore 17 Louis 38, Houston 13 31, Tampa Bay to 31, Tampa Bay 20
Denver over City 24, Oakland 7 Bay 19, Baltimore 17 Louis 38, Houston 13 35, Jack to 35, Jacksonville 19
Seattle over City 24, Oakland 7 Bay 19, Baltimore 17 Louis 38, Houston 13 20, Tenne to 20, Tennessee 13
New En over City 24, Oakland 7 Bay 19, Baltimore 17 Louis 38, Houston 13 England 30, New Orleans 27 20, Tenne to 20, Tennessee 13
San Fr over City 24, Oakland 7 Bay 19, Baltimore 17 Louis 38, Houston 13 England 30, New Orleans 27 Francisco 32, Arizona 20 20, Tenne to 20, Tennessee 13
Dallas over City 24, Oakland 7 Bay 19, Baltimore 17 Louis 38, Houston 13 England 30, New Orleans 27 Francisco 32, Arizona 20 31, Wash to 31, Washington 16
over City 24, Oakland 7 Bay 19, Baltimore 17 Louis 38, Houston 13 England 30, New Orleans 27 Francisco 32, Arizona 20 31, Wash to 31, Washington 16
Output I Want:
Cincinnati over Buffalo 27 to 24
Detroit over Cleveland 31 to 17
Kansas City over Oakland 24 to 7
Carolina over Minnesota 35 to 10
Pittsburgh over NY Jets 19 to 6
Philadelphia over Tampa Bay 31 to 20
Green Bay over Baltimore 19 to 17
St. Louis over Houston 38 to 13
Denver over Jacksonville 35 to 19
Seattle over Tennessee 20 to 13
New England over New Orleans 30 to 27
San Francisco over Arizona 32 to 20
Dallas over Washington 31 to 16

The second if in the loop tests the same condition as the first if. Did you forget something before the second if, maybe hey = hey.substr(pos); pos = hey.find(' ');?
In the body of that second if, blank is calculated but never used.
It should probably be loser =, not loser +=. With +=, loser keeps growing from one input line to the next, which is what your output shows.
@barmar gave you useful advice.

Related

Replace ID repeated for next value available in the column ID

How can the data frame below be modified?
df <- data.frame (ID = c(1, 2, 2, 3), Name = c("Luke", "Pete", "Marie", "Frank"), Age = c(25, 34, 66, 45))
ID Name Age
1 Luke 25
2 Pete 34
2 Marie 66
3 Frank 45
I want to remove each duplicated ID and replace it with the next available ID:
ID Name Age
1 Luke 25
2 Pete 34
4 Marie 66
3 Frank 45
Thanks for help

How can I delete objects without complete data by using stata

I have a large panel dataset that looks as follows.
input id age high weight str6 daily_drink
1 10 110 35 water
1 10 110 35 coffee
1 11 120 38 water
1 11 120 38 coffee
1 12 130 50 water
1 12 130 50 coffee
2 11 118 31 water
2 11 118 31 coffee
2 11 118 31 milk
2 12 123 38 water
2 12 123 38 coffee
2 12 123 38 milk
3 10 98 55 water
3 11 116 36 water
3 12 129 39 water
4 12 125 40 water
end
I would like to use Stata to keep only the ids that have observations at all three ages 10, 11, and 12. The result should look like this.
id age high weight daily_drink
1 10 110 35 water
1 10 110 35 coffee
1 11 120 38 water
1 11 120 38 coffee
1 12 130 50 water
1 12 130 50 coffee
3 10 98 55 water
3 11 116 36 water
3 12 129 39 water
However, no row has missing values, so I cannot simply drop the rows with missing data. Is there any way to do it? Any suggestion will help. Thanks in advance.
You can use bysort and egen for this. Something along the lines of
bysort id: egen has10 = total(age==10)
bysort id: egen has11 = total(age==11)
bysort id: egen has12 = total(age==12)
keep if (has10 != 0) & (has11 != 0) & (has12 != 0)
should work (untested). See help egen for more info. Install gtools if you have very large data (ssc install gtools) and then replace egen by gegen.
A solution that works if 10, 11, 12 are the only age values possible:
bysort id (age) : gen nvals = sum(age != age[_n-1])
by id : replace nvals = nvals[_N]
keep if nvals == 3
Consider also
bysort id (age) : gen OK1 = age[1] == 10 & age[_N] == 12
by id : egen OK2 = max(age == 11)
keep if OK1 & OK2

Dates and Between statement

I am using SAS E.G. 7.1
I have the following code:
data time_dim_monthly;
    do i = 0 to 200;
        index_no = i;
        year_date = year(intnx('month','01JAN2008'd,i));
        month_date = month(intnx('month','01JAN2008'd,i));
        SOM = put(intnx('month', '01JAN2008'd, i, 'b'),date11.) ;
        EOM = put(intnx('month', '01JAN2008'd, i, 'e'),date11.) ;
        days_in_month = INTCK('day',intnx('month', '01JAN2008'd, i, 'b'),
                              intnx('month', '01JAN2008'd, i, 'e'));
        output;
    end;
run;
followed by
proc sql;
create table calendar as
select year_date, month_date, index_no, put(today(),date11.) as todays_dt, som, eom
from time_dim_monthly
where put(today(),date11.) between som and eom
/*or datepart((INTNX('month',today(),-1)) between som and eom)*/
order by index_no
;
quit;
The output looks like this:
year_date month_date index_no todays_dt SOM EOM
2008 10 9 31-MAY-2017 01-OCT-2008 31-OCT-2008
2009 10 21 31-MAY-2017 01-OCT-2009 31-OCT-2009
2010 10 33 31-MAY-2017 01-OCT-2010 31-OCT-2010
2011 10 45 31-MAY-2017 01-OCT-2011 31-OCT-2011
2012 10 57 31-MAY-2017 01-OCT-2012 31-OCT-2012
2013 10 69 31-MAY-2017 01-OCT-2013 31-OCT-2013
2014 10 81 31-MAY-2017 01-OCT-2014 31-OCT-2014
2015 10 93 31-MAY-2017 01-OCT-2015 31-OCT-2015
2016 10 105 31-MAY-2017 01-OCT-2016 31-OCT-2016
2017 5 112 31-MAY-2017 01-MAY-2017 31-MAY-2017
2017 10 117 31-MAY-2017 01-OCT-2017 31-OCT-2017
2018 5 124 31-MAY-2017 01-MAY-2018 31-MAY-2018
2018 10 129 31-MAY-2017 01-OCT-2018 31-OCT-2018
2019 5 136 31-MAY-2017 01-MAY-2019 31-MAY-2019
2019 10 141 31-MAY-2017 01-OCT-2019 31-OCT-2019
2020 5 148 31-MAY-2017 01-MAY-2020 31-MAY-2020
2020 10 153 31-MAY-2017 01-OCT-2020 31-OCT-2020
2021 5 160 31-MAY-2017 01-MAY-2021 31-MAY-2021
2021 10 165 31-MAY-2017 01-OCT-2021 31-OCT-2021
2022 5 172 31-MAY-2017 01-MAY-2022 31-MAY-2022
2022 10 177 31-MAY-2017 01-OCT-2022 31-OCT-2022
2023 5 184 31-MAY-2017 01-MAY-2023 31-MAY-2023
2023 10 189 31-MAY-2017 01-OCT-2023 31-OCT-2023
2024 5 196 31-MAY-2017 01-MAY-2024 31-MAY-2024
While I'd expected that it would only give me one line:
2017 5 112 31-MAY-2017 01-MAY-2017 31-MAY-2017
Would appreciate help in understanding why this is happening.
Thank you
This is your mistake:
SOM = put(intnx('month', '01JAN2008'd, i, 'b'),date11.) ;
EOM = put(intnx('month', '01JAN2008'd, i, 'e'),date11.) ;
where put(today(),date11.) between som and eom
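put creates a character variable. You shouldn't use between with character variables unless you really know what you're doing: the values are compared character by character, in alphabetical order, not as dates. For example, '31-MAY-2017' falls between '01-OCT-2008' and '31-OCT-2008' because '3' sorts after '0' and 'M' sorts before 'O'.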
Use numeric variables. Get rid of the put. Instead use a format statement to make the variables look nice, but still be numeric.
SOM = intnx('month', '01JAN2008'd, i, 'b') ;
EOM = intnx('month', '01JAN2008'd, i, 'e') ;
format som eom date11.;
and later:
where today() between som and eom
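The effect is not specific to SAS. As a rough sketch in Python (not SAS), the snippet below compares the same three dates once as text and once as real date values. The helper fmt is a made-up stand-in that only imitates the look of date11. output; it is not a SAS function.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def fmt(d):
    # Hypothetical helper: imitates the appearance of date11., e.g. 31-MAY-2017
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"

today = date(2017, 5, 31)
som, eom = date(2008, 10, 1), date(2008, 10, 31)

# Text comparison: "31-MAY-2017" sorts between "01-OCT-2008" and "31-OCT-2008"
as_text = fmt(som) <= fmt(today) <= fmt(eom)

# Date comparison: 31 May 2017 does not lie inside October 2008
as_dates = som <= today <= eom

print(as_text, as_dates)  # True False
```

This is why every October row (and every May row) passes the character-based between, while only the May 2017 row would pass a numeric comparison.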

awk remove characters after number

I'm writing an AWK script to clean up a data stream so it's usable for analysis. Right now, I have the following issue.
I have a data stream that looks like this:
56, 2
64, 3
72, 0
80, -1-
88, -3--
96, 1
04, -2-
12, -7----
20, -1-
28, 7
36, 1
44, -3--
52, 3
60, 0
68, 0
76, -3--
84, -5---
92, 1
00, 4
08, 3
16, -2-
24, -3--
32, 1
40, 3
And I want to remove any dash that occurs after a number character, keep the minus in front of the numbers, so it would look like this:
56, 2
64, 3
72, 0
80, -1
88, -3
96, 1
04, -2
12, -7
20, -1
28, 7
36, 1
44, -3
52, 3
60, 0
68, 0
76, -3
84, -5
92, 1
00, 4
08, 3
16, -2
24, -3
32, 1
40, 3
I know how to do this with sed (sed 's/-*$//'), but how could this be done with only awk so I can use it in my script?
Cheers
One way, simply using sub():
awk '{ sub(/-+$/, "", $NF); print }' infile
It yields:
56, 2
64, 3
72, 0
80, -1
88, -3
96, 1
04, -2
12, -7
20, -1
28, 7
36, 1
44, -3
52, 3
60, 0
68, 0
76, -3
84, -5
92, 1
00, 4
08, 3
16, -2
24, -3
32, 1
40, 3
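Using awk, with the trailing dashes as the field separator (printing $1 avoids the trailing space that rebuilding the record with $1=$1 would add):
awk -F '-+$' '{print $1}' file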
Using sed:
sed -i.bak 's/-*$//' file
Another possible solution:
awk -F "-+$" '{str=""; for(i=1; i<=NF; i++){str=str""$i} print str}' file
But, I think that sed is a better solution in this case.
Regards,
Idriss

Grouping data by value ranges

I have a csv file that shows parts on order. The columns include days late, qty and commodity.
I need to group the data by days late and commodity with a sum of the qty. However the days late needs to be grouped into ranges.
>56
>35 and <= 56
>14 and <= 35
>0 and <=14
I was hoping I could use a dict somehow. Something like this:
{'Red':'>56','Amber':'>35 and <= 56','Yellow':'>14 and <= 35','White':'>0 and <=14'}
I am looking for a result like this
Red Amber Yellow White
STRSUB 56 60 74 40
BOTDWG 20 67 87 34
I am new to pandas so I don't know if this is possible at all. Could anyone provide some advice?
Thanks
Suppose you start with this data:
df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
# Days Late ID quantity
# 0 60 STRSUB 56
# 1 60 BOTDWG 20
# 2 50 STRSUB 60
# 3 50 BOTDWG 67
# 4 20 STRSUB 74
# 5 20 BOTDWG 87
# 6 10 STRSUB 40
# 7 10 BOTDWG 34
Then you can find the status category using pd.cut. Note that by default, pd.cut splits the Series df['Days Late'] into categories which are half-open intervals, (-1, 14], (14, 35], (35, 56], (56, 365]:
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
print(df)
# ID quantity status
# 0 STRSUB 56 Red
# 1 BOTDWG 20 Red
# 2 STRSUB 60 Amber
# 3 BOTDWG 67 Amber
# 4 STRSUB 74 Yellow
# 5 BOTDWG 87 Yellow
# 6 STRSUB 40 White
# 7 BOTDWG 34 White
Now use pivot to get the DataFrame in the desired form:
df = df.pivot(index='ID', columns='status', values='quantity')
and use reindex to obtain the desired order for the rows and columns:
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
Thus,
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': ('STRSUB BOTDWG'.split())*4,
'Days Late': [60, 60, 50, 50, 20, 20, 10, 10],
'quantity': [56, 20, 60, 67, 74, 87, 40, 34]})
df['status'] = pd.cut(df['Days Late'], bins=[-1, 14, 35, 56, 365], labels=False)
labels = np.array('White Yellow Amber Red'.split())
df['status'] = labels[df['status']]
del df['Days Late']
df = df.pivot(index='ID', columns='status', values='quantity')
df = df.reindex(columns=labels[::-1], index=df.index[::-1])
print(df)
yields
        Red  Amber  Yellow  White
ID
STRSUB   56     60      74     40
BOTDWG   20     67      87     34
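Note that pivot only works when each (ID, status) pair occurs once. If the real file has several rows per pair and the quantities should be summed, as the question asks, one possible variant is sketched below. It passes the labels directly to pd.cut and aggregates with pivot_table. The sample rows are made up, and the open-ended top bin (float('inf')) is an assumption in place of the 365 used above.

```python
import pandas as pd

# Made-up sample data; STRSUB has two rows that both fall in the Red range
df = pd.DataFrame({
    'ID': ['STRSUB', 'STRSUB', 'BOTDWG', 'BOTDWG', 'STRSUB', 'BOTDWG'],
    'Days Late': [60, 70, 60, 50, 10, 20],
    'quantity': [56, 4, 20, 67, 40, 87],
})

labels = ['White', 'Yellow', 'Amber', 'Red']
# Right-closed bins: (0, 14], (14, 35], (35, 56], (56, inf]
status = pd.cut(df['Days Late'], bins=[0, 14, 35, 56, float('inf')], labels=labels)
df['status'] = status.astype(str)  # plain strings keep the pivot simple

# Sum the quantities per ID and status; missing combinations become 0
table = df.pivot_table(index='ID', columns='status', values='quantity',
                       aggfunc='sum', fill_value=0)
table = table.reindex(columns=labels[::-1], fill_value=0)
print(table)
```

Here the two Red rows for STRSUB are added together instead of raising a duplicate-entry error.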
You can create a column in your DataFrame based on your Days Late column by using the map or apply functions as follows. Let's first create some sample data.
import numpy
import pandas
df = pandas.DataFrame({ 'ID': 'foo,bar,foo,bar,foo,bar,foo,foo'.split(','),
                        'Days Late': numpy.random.randn(8)*20+30})
Days Late ID
0 30.746244 foo
1 16.234267 bar
2 14.771567 foo
3 33.211626 bar
4 3.497118 foo
5 52.482879 bar
6 11.695231 foo
7 47.350269 foo
Create a helper function to transform the data of the Days Late column and add a column called Code.
def days_late_xform(dl):
    if dl > 56: return 'Red'
    elif 35 < dl <= 56: return 'Amber'
    elif 14 < dl <= 35: return 'Yellow'
    elif 0 < dl <= 14: return 'White'
    else: return 'None'
df["Code"] = df['Days Late'].map(days_late_xform)
Days Late ID Code
0 30.746244 foo Yellow
1 16.234267 bar Yellow
2 14.771567 foo Yellow
3 33.211626 bar Yellow
4 3.497118 foo White
5 52.482879 bar Amber
6 11.695231 foo White
7 47.350269 foo Amber
Lastly, you can use groupby to aggregate by the ID and Code columns, and get the counts of the groups as follows:
g = df.groupby(["ID","Code"]).size()
print(g)
ID Code
bar Amber 1
Yellow 2
foo Amber 1
White 2
Yellow 2
df2 = g.unstack()
print(df2)
Code Amber White Yellow
ID
bar 1 NaN 2
foo 1 2 2
I know this is coming a bit late, but I had the same problem as you and wanted to share the function np.digitize. It sounds like exactly what you want.
a = np.random.randint(0, 100, 50)
grps = np.arange(0, 100, 10)
grps2 = [1, 20, 25, 40]
print(a)
[35 76 83 62 57 50 24 0 14 40 21 3 45 30 79 32 29 80 90 38 2 77 50 73 51
71 29 53 76 16 93 46 14 32 44 77 24 95 48 23 26 49 32 15 2 33 17 88 26 17]
print(np.digitize(a, grps))
[ 4 8 9 7 6 6 3 1 2 5 3 1 5 4 8 4 3 9 10 4 1 8 6 8 6
8 3 6 8 2 10 5 2 4 5 8 3 10 5 3 3 5 4 2 1 4 2 9 3 2]
print(np.digitize(a, grps2))
[3 4 4 4 4 4 2 0 1 4 2 1 4 3 4 3 3 4 4 3 1 4 4 4 4 4 3 4 4 1 4 4 1 3 4 4 2
4 4 2 3 4 3 1 1 3 1 4 3 1]
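Applied to the ranges in the question, a minimal sketch might look like the following. It assumes every value is greater than 0, and it uses right=True so that each bin is closed on the right, matching conditions such as ">14 and <= 35". The sample values are made up.

```python
import numpy as np

days_late = np.array([10, 14, 15, 35, 36, 56, 57, 100])  # made-up values
edges = [14, 35, 56]  # upper edges of White, Yellow and Amber
labels = np.array(['White', 'Yellow', 'Amber', 'Red'])

# With right=True, index i means edges[i-1] < x <= edges[i];
# values above the last edge get index len(edges), i.e. 'Red'
idx = np.digitize(days_late, edges, right=True)
status = labels[idx]
print(status.tolist())
```

The resulting status array can be assigned to a new DataFrame column and then used for grouping.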