POA "weird" outcome (IMHO) - pvlib

I have gathered satellite data (every 5 minutes, from "Solcast") for GHI, DNI and DHI and I use pvlib to get the POA value.
The pvlib function I use:
import pandas as pd
from pvlib import irradiance

def get_irradiance(site_location, date, tilt, surface_azimuth, ghi, dni, dhi):
    # One day of 5-minute timestamps in the site's time zone
    times = pd.date_range(date, freq='5min', periods=12*24, tz=site_location.tz)
    solar_position = site_location.get_solarposition(times=times)
    # Transpose GHI/DNI/DHI to the plane of the array
    POA_irradiance = irradiance.get_total_irradiance(
        surface_tilt=tilt,
        surface_azimuth=surface_azimuth,
        ghi=ghi,
        dni=dni,
        dhi=dhi,
        solar_zenith=solar_position['apparent_zenith'],
        solar_azimuth=solar_position['azimuth'])
    return pd.DataFrame({'GHI': ghi,
                         'DNI': dni,
                         'DHI': dhi,
                         'POA': POA_irradiance['poa_global']})
When I compare GHI and POA values for 12 June 2022 and 13 June 2022, I see that the POA value for 12 June is significantly below the GHI. The location is in The Netherlands; I use a tilt of 12.5 degrees and an azimuth of 180 degrees. Here is the outcome (hourly values, from 6:00 to 20:00):
12 June 2022
GHI DNI DHI POA
6 86.750000 312.750000 40.500000 40.277034
7 224.583333 543.000000 69.750000 71.130218
8 366.833333 598.833333 113.833333 178.974322
9 406.083333 182.000000 304.000000 348.272844
10 532.166667 266.750000 346.666667 445.422584
11 725.666667 640.416667 226.500000 509.360716
12 688.500000 329.416667 409.583333 561.630762
13 701.333333 299.750000 439.333333 570.415438
14 725.416667 391.666667 387.750000 532.529676
15 753.916667 629.166667 244.333333 407.665794
16 656.750000 599.750000 215.333333 293.832376
17 381.833333 36.416667 359.416667 356.317883
18 411.750000 569.166667 144.750000 144.254438
19 269.750000 495.916667 102.500000 102.084439
20 134.583333 426.416667 51.583333 51.370738
And
13 June 2022
GHI DNI DHI POA
6 5.666667 0.000000 5.666667 5.616296
7 113.500000 7.750000 111.416667 111.948831
8 259.500000 106.833333 208.416667 256.410392
9 509.166667 637.750000 150.583333 514.516389
10 599.333333 518.666667 240.583333 619.050821
11 745.250000 704.500000 195.583333 788.773772
12 757.250000 549.666667 292.000000 798.739403
13 742.000000 464.583333 335.000000 778.857394
14 818.250000 667.750000 243.000000 869.972769
15 800.750000 776.833333 166.916667 852.559043
16 699.000000 733.666667 167.166667 730.484502
17 582.666667 729.166667 131.916667 593.802853
18 449.166667 756.583333 83.500000 434.958210
19 290.083333 652.666667 68.666667 254.048655
20 139.833333 466.916667 48.333333 97.272684
What can be an explanation of the significantly low POA compared to the GHI values on 12 June?
I see this on other days too: some days have a POA much closer to the GHI than others. Maybe this is "normal behaviour" and I am not accounting for weather influences that may be important...
I use the POA for a PR (Performance Ratio) calculation, but I do not get results I can trust.
Hope someone can shine a light on these values.
Kind regards,
Oscar
The Netherlands.

I'm really sorry. Although the weather is unpredictable in the Netherlands, I made a very big booboo by using the dd-mm-yyyy format instead of mm-dd-yyyy. It is something I overlooked for a long time... (I had never used mm-dd-yyyy, but that's a lame excuse...)
Really sorry, I hope you did not think about it for too long.
Thank you anyway for reacting!
I've good values now!
Oscar (shame..)
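For anyone who runs into the same thing: a string such as 12-06-2022 is ambiguous, while 13-06-2022 is not (13 cannot be a month), which would presumably explain why only some days looked wrong. A minimal sketch using only the standard library, with a made-up input string, shows how an explicit format removes the guess:

```python
from datetime import datetime

raw = "12-06-2022"  # hypothetical input string, for illustration only

# The same text read with two different explicit formats
as_day_first = datetime.strptime(raw, "%d-%m-%Y")    # day-month-year
as_month_first = datetime.strptime(raw, "%m-%d-%Y")  # month-day-year

print(as_day_first.date())    # 2022-06-12
print(as_month_first.date())  # 2022-12-06
```

Passing an already parsed date (or an unambiguous ISO string such as 2022-06-12) on to the rest of the code avoids relying on how a library guesses the order.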

Related

Regex Pattern for Dates in String

Need help debugging Regex
I have a string column in a pandas data frame that contains dates formatted as follows. There is only one such date in each string.
Semicolons are only used to delimit the dates here and are not present in the actual strings.
04/20/2009; 04/20/09; 4/20/09; 4/3/09; 011/14/83;
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
Feb 2009; Sep 2009; Oct 2010
6/2008; 12/2009
2009; 2010
My job is to extract these using regex. Here is the pattern I came up with.
my_pattern = r"((?:(\d{0,2}\d)|(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*?)?[, -./]{0,2}(?:(\d{1,2})[dhnst]{0,2}|(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*?)[, -./]{1,2}(\d{2,4}))|(\d{4})"
sample_series.str.extract(my_pattern, expand=False)
So far, it works for every date except the format "Jan 27, 1983": there it matches the month name and the day, but the year isn't matched. I am relatively new to regex and I think my pattern design is quite bad too. I need help figuring out what's wrong with my regular expression and how I could debug or improve it. Thanks.
Here is the sample data to make the problem reproducible.
sample_list = ['.Got back to U.S. Jan 27, 1983.\n',
'.On 21 Oct 1983 patient was discharged from Scroder Hospital after EIGHT DAY ADMISSION\n',
'4-13-89 Communication with referring physician?: Not Done\n',
'7intake for follow up treatment at Anson General Hospital on 10 Feb 1983 # 12 AM\n',
'. Pt diagnosed in Apr 1976 after he presented with 2 month history of headaches and gait instability. MRI demonstrated 4 cm L cereballar mass in the paravermian region. He was admitted to PRM and underwent resection complicated by post-op delirium. Post-op sequelas include left palatal myoclonus and ataxia on the left upper and lower extremities which has progressively improved. Pt has not had any evidence of tumor recurrence.\n',
'1-14-81 Communication with referring physician?: Done\n',
'. Went to Emerson, in Newfane Alaska. Started in 2002 at CNM. Generally likes job, does not have time to do what she needs to do. Feels she is working more than should be.\n',
'09/14/2000 CPT Code: 90792: With medical services\n',
'. Sep 2015- Transferred to Memorial Hospital from above. Discharged to MH Partial Hospital on Zoloft, Trazadone and Neurontin but unclear if she followed up.\n',
'Born and raised in Fowlerville, IN. Parents divorced when she was young, states that it was a "bad" divorce. Received her college degree from Allegheny College in 2003. Past verbal, emotional, physical, sexual abuse: No\n']
sample_series = pd.Series(sample_list)
From your data:
>>> import pandas as pd
>>> sample_list = ['.Got back to U.S. Jan 27, 1983.\n',
'.On 21 Oct 1983 patient was discharged from Scroder Hospital after EIGHT DAY ADMISSION\n',
'4-13-89 Communication with referring physician?: Not Done\n',
'7intake for follow up treatment at Anson General Hospital on 10 Feb 1983 # 12 AM\n',
'. Pt diagnosed in Apr 1976 after he presented with 2 month history of headaches and gait instability. MRI demonstrated 4 cm L cereballar mass in the paravermian region. He was admitted to PRM and underwent resection complicated by post-op delirium. Post-op sequelas include left palatal myoclonus and ataxia on the left upper and lower extremities which has progressively improved. Pt has not had any evidence of tumor recurrence.\n',
'1-14-81 Communication with referring physician?: Done\n',
'. Went to Emerson, in Newfane Alaska. Started in 2002 at CNM. Generally likes job, does not have time to do what she needs to do. Feels she is working more than should be.\n',
'09/14/2000 CPT Code: 90792: With medical services\n',
'. Sep 2015- Transferred to Memorial Hospital from above. Discharged to MH Partial Hospital on Zoloft, Trazadone and Neurontin but unclear if she followed up.\n',
'Born and raised in Fowlerville, IN. Parents divorced when she was young, states that it was a "bad" divorce. Received her college degree from Allegheny College in 2003. Past verbal, emotional, physical, sexual abuse: No\n']
>>> sample_series = pd.Series(sample_list)
>>> df = sample_series.to_frame()
>>> df
0
0 .Got back to U.S. Jan 27, 1983.\n
1 .On 21 Oct 1983 patient was discharged from Sc...
2 4-13-89 Communication with referring physician...
3 7intake for follow up treatment at Anson Gener...
4 . Pt diagnosed in Apr 1976 after he presented...
5 1-14-81 Communication with referring physician...
6 . Went to Emerson, in Newfane Alaska. Started ...
7 09/14/2000 CPT Code: 90792: With medical servi...
8 . Sep 2015- Transferred to Memorial Hospital f...
9 Born and raised in Fowlerville, IN. Parents d...
We can use a tool called datefinder to find the dates in each row:
>>> import datefinder
>>> def find_date(df):
...     return [match for match in datefinder.find_dates(df[0])]
>>> df["Vals"] = df.apply(find_date, axis=1)
>>> df
0 Vals
0 .Got back to U.S. Jan 27, 1983.\n [1983-01-27 00:00:00]
1 .On 21 Oct 1983 patient was discharged from Sc... [1983-10-21 00:00:00]
2 4-13-89 Communication with referring physician... [1989-04-13 00:00:00]
3 7intake for follow up treatment at Anson Gener... []
4 . Pt diagnosed in Apr 1976 after he presented... [1976-04-30 00:00:00, 2021-09-02 00:00:00, 202...
5 1-14-81 Communication with referring physician... [1981-01-14 00:00:00]
6 . Went to Emerson, in Newfane Alaska. Started ... [2002-09-30 00:00:00]
7 09/14/2000 CPT Code: 90792: With medical servi... [2000-09-14 00:00:00]
8 . Sep 2015- Transferred to Memorial Hospital f... [2015-09-30 00:00:00]
9 Born and raised in Fowlerville, IN. Parents d... [2003-09-30 00:00:00]
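If a plain regex is preferred, one likely reason the original pattern loses the year in "Jan 27, 1983" is that the leftmost match appears to start at the ". " after "U.S", so "Jan" fills the middle slot and "27" is taken as the year. A sketch of an alternative, assuming it is acceptable to list the formats as separate alternatives from most to least specific (the names MONTH, DATE_RE and first_date are made up here, and the pattern is not claimed to cover every format in the question):

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"

# Alternatives are tried in this order at each starting position.
DATE_RE = re.compile("|".join([
    r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b",                           # 04/20/2009, 4-13-89
    r"\b" + MONTH + r"[ -]\d{1,2}(?:st|nd|rd|th)?[, -]+\d{2,4}\b",  # Jan 27, 1983
    r"\b\d{1,2} " + MONTH + r",? \d{4}\b",                          # 21 Oct 1983
    r"\b" + MONTH + r",? \d{4}\b",                                  # Apr 1976
    r"\b\d{1,2}/\d{4}\b",                                           # 6/2008
    r"\b(?:19|20)\d{2}\b",                                          # 2002
]))

def first_date(text):
    """Return the first date-like substring, or None if nothing matches."""
    match = DATE_RE.search(text)
    return match.group(0) if match else None

print(first_date(".Got back to U.S. Jan 27, 1983.\n"))
print(first_date("7intake for follow up treatment on 10 Feb 1983 # 12 AM\n"))
```

On the series from the question this could be applied with sample_series.map(first_date). Anchoring each alternative with \b keeps a match from starting on stray punctuation.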

Find the range of year in pandas especially with hyphen formats?

Given the data below, I want to print the list of teams that made their debut between 1934 and 1948. Since the Debut column has dtype object, I am not able to get the column data in integer form.
Team Debut
0 Real Madrid 1929
1 Barcelona 1929
2 Atletico Madrid 1929
3 Valencia 1931-32
4 Athletic Bilbao 1929
5 Sevilla 1934-35
6 Espanyol 1929
7 Real Sociedad 1929
8 Zaragoza 1939-40
9 Real Betis 1932-33
10 Deportivo La Coruna 1941-42
11 Celta Vigo 1939-40
12 Valladolid 1948-49
Can somebody please help to give an idea how to achieve it?
Thanks in advance
You can use str.extract to extract the first part of the date and check whether it's in the required range:
mask = df['Debut'].str.extract(r'(\d+)')[0].astype(int).between(1934, 1948)
df[mask]
Team Debut
5 5 Sevilla 1934-35
8 8 Zaragoza 1939-40
10 10 Deportivo La Coruna 1941-42
11 11 Celta Vigo 1939-40
12 12 Valladolid 1948-49
If only the first year of the range counts, you could use between after converting to a numeric value:
year = pd.to_numeric(df.Debut.str.split('-').str[0])
teams = df.Team[year.between(1934, 1948)]
print(teams)
Output
5 Sevilla
8 Zaragoza
10 Deportivo La Coruna
11 Celta Vigo
12 Valladolid
Name: Team, dtype: object

If index duplicated then add column value to sum

The pandas DF has datetime index with price and volume at that price.
Last Volume
Date_Time
20160907 070000 1.1249 17
20160907 070001 1.1248 12
20160907 070001 1.1249 15
20160907 070002 1.1248 13
20160907 070002 1.1249 20
I want to create a column that keeps a running total (sum) of the volume through the sequence whenever the price repeats. The column I am trying to create would look like this:
Last Volume VolumeCount
1.1249 17 17
1.1248 12 12
1.1249 15 32
1.1248 13 25
1.1249 20 52
I have been working on different functions and loops, and I can't seem to create a column that isn't just the total sum of the group. I would really appreciate any help or suggestions. Thank you.
Try:
DF['VolumeCount'] = DF.groupby('Last')['Volume'].cumsum()
I hope this helps.
You want to accumulate the volume over contiguous runs of the same Last.
Consider the df:
Last Volume
Date_Time
20160907-70000 1.1249 17
20160907-70001 1.1248 12
20160907-70001 1.1248 15
20160907-70002 1.1248 13
20160907-70002 1.1249 20
Then
df.Volume.groupby((df.Last != df.Last.shift()).cumsum()).cumsum()
Date_Time
20160907-70000 17
20160907-70001 12
20160907-70001 27
20160907-70002 40
20160907-70002 20
Name: Volume, dtype: int64
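To make explicit what the contiguous-run grouping does, here is a small plain-Python sketch of the same idea, using the values from the example above (the function name running_volume_by_run is made up for illustration). The total restarts every time Last changes from one row to the next:

```python
from itertools import accumulate, groupby

def running_volume_by_run(last, volume):
    """Running total of volume that restarts whenever the price changes."""
    totals = []
    # groupby only merges *adjacent* equal keys, i.e. contiguous runs
    for _, run in groupby(zip(last, volume), key=lambda pair: pair[0]):
        totals.extend(accumulate(v for _, v in run))
    return totals

last = [1.1249, 1.1248, 1.1248, 1.1248, 1.1249]
volume = [17, 12, 15, 13, 20]
print(running_volume_by_run(last, volume))  # [17, 12, 27, 40, 20]
```

This differs from grouping by price alone, which would keep adding to the earlier 1.1249 total when that price shows up again later.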

Attempting to determine astronomy values in a parameter file

Recently I inherited quite a few old QuickBasic programs which perform various astronomical calculations. I'm attempting to understand these programs and rewrite some of them in Python. I do not have a deep background in astronomy.
A number of the programs take a parameter file as input, YEAR.DAT. Below are 5 years of these files (each column represents one file). I need help in figuring out the various data values.
YEAR.DAT
year 2001 2008 2009 2010 2011
delta t 66 65 66 66 67
tilt 23.43909 23.43818 23.43805 23.43799 23.43786
dow 1 2 4 5 6
gst 6.71430 6.66860 6.71839 6.702509 6.68659
x1 105.690 330.340 310.959 291.631 272.303
bs 84 90 88 87 86
fs 301 300 298 304 303
x2 357.765 356.959 357.689 357.433 357.177
x3 354.289 193.159 335.720 105.105 234.489
jd 2451910.5 2454466.5 2454832.5 2455197.5 2455562.5
I believe that all the values which are time dependent are for 0:00 hours on Jan. 1 of the year given.
Here are the values I think I've figured out:
tilt is the obliquity of the ecliptic
dow is the day of the week, where Monday is day 1
bs is the number of the day of the year when British Summer Time (BST) begins
fs is the number of the day of the year when BST ends
jd is the Julian day number (of 0:00 hours Jan. 1)
Values I'm unsure about:
delta t is some sort of time delta, but I don't know what
gst seems to be Greenwich Mean Sidereal Time, but for what moment?
x1, x2, and x3 I'm clueless about
Here are my questions:
What might delta t be?
Is gst in fact Greenwich Mean Sidereal Time? For what moment?
What are x1, x2, and x3? (This is a low-priority question.)
How can delta t, gst, and perhaps other values be determined for 2018, 2019, ...?
Any help will be greatly appreciated.
Roger House
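The guesses about jd, dow and gst can be checked numerically. The sketch below is not taken from the QuickBasic programs: it assumes a widely used low-precision approximation for Greenwich Mean Sidereal Time (the form published by the US Naval Observatory), evaluated at 0h UT on 1 January, and the function names are made up. Under that assumption it reproduces the jd, dow and gst rows of the table to about four decimal places, which suggests gst is GMST in hours at that instant. delta t is not computed here; the values look like ΔT (Terrestrial Time minus Universal Time, in seconds), which is an observed quantity that would have to be looked up or predicted rather than derived.

```python
from datetime import date

def jd_at_0h(year):
    """Julian Date at 0h UT on 1 January of the given year."""
    # Proleptic Gregorian ordinal 1 (0001-01-01) corresponds to JD 1721425.5
    return date(year, 1, 1).toordinal() + 1721424.5

def day_of_week(year):
    """Day of week of 1 January, with Monday = 1."""
    return date(year, 1, 1).isoweekday()

def gmst_hours_at_0h(year):
    """Approximate GMST in hours at 0h UT on 1 January (assumed formula)."""
    d0 = jd_at_0h(year) - 2451545.0  # days since J2000.0
    t = d0 / 36525.0                 # Julian centuries since J2000.0
    return (6.697374558 + 0.06570982441908 * d0 + 0.000026 * t * t) % 24.0

for year in (2001, 2008, 2009, 2010, 2011, 2018, 2019):
    print(year, jd_at_0h(year), day_of_week(year), round(gmst_hours_at_0h(year), 5))
```

The same functions can be called for any later year, as the loop does for 2018 and 2019.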

Discounting losses in SAS

I'm writing my master's thesis on the costs of occupational injuries. As part of the thesis I have estimated the expected wage loss for each person for every year for four years after the injury. I would like to discount the estimated losses to a specific base year (2009) in SAS.
For the year 2009 the discounted loss is simply equal to the estimated loss. For 2010 and later, the discounted loss can be calculated with the netpv function:
IF year=2009 then discount_loss=wage;
IF year=2010 then discount_loss=netpv(0.1,1,0,wage);
IF year=2011 then discount_loss=netpv(0.1,1,0,0,wage);
And so forth. But starting from 2014 I would like to use the estimated wage loss for 2014 as the expected loss for every following year, so if, for instance, the estimated loss is $100, that would be the yearly loss until retirement. Since the persons are not all the same age, there are too many cases to hard-code, so I'm looking for a better way. There are approximately 200,000 persons in my data set, with different estimated losses for each year.
The format of the (fictional) data looks like this:
id age year age_retirement wage_loss rate discount_loss
1 35 2009 65 -100 0.1 -100
1 36 2010 65 -100 0.1 -90,91
1 37 2011 65 -100 0.1 -82,64
1 38 2012 65 -100 0.1 -75,13
1 39 2013 65 -100 0.1 -68,30
1 40 2014 65 -100 0.1
The column discount_loss is the net present value of the loss in 2009, calculated as above.
I would like the loss in 2014 to represent the sum of the losses for the rest of the person's time on the labor market (until age_retirement). That would be -$100 per year, discounted to 2009, starting from 2014 and running until 2014+(65-40).
Thanks!
Use the FINANCE function for PV, Present Value.
In your situation above, you're looking for the value of 100 for 25 years of payments (65-40)=25. I'll leave the calculation of the number of years up to you.
FINANCE('PV', rate, nper, payment, <fv>, <type>);
In your case, Future Value is 0 and the type=1 as you assume payment at the beginning of the year.
The formula below calculates the present value of a series of payments of 100 over 25 years at a 10% interest rate, paid at the beginning of each period.
value=FINANCE('PV', 0.1, 25, -100, 0, 1);
Value = 998.47440201
Reference is here:
https://support.sas.com/documentation/cdl/en/lefunctionsref/67960/HTML/default/viewer.htm#p1cnn1jwdmhce0n1obxmu4iq26ge.htm
If you are looking for speed, why not first calculate an array that contains the PV of $1 for i years, where i goes from 1 to n? Then just select the element you need and multiply. This could all be done in a data step.
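The arithmetic behind both suggestions can be sketched outside SAS. The Python below is only an illustration of the calculation, not SAS code: the function names are made up, and it assumes 25 yearly payments made at the beginning of each year, as in the FINANCE example above. It builds the list of present values of 1 and then discounts the 2014 value a further five years back to 2009.

```python
def pv_factors(rate, n_years):
    """Present value of 1 paid k years from now, for k = 0 .. n_years - 1."""
    return [(1.0 + rate) ** -k for k in range(n_years)]

def pv_constant_loss(loss, rate, n_years, years_to_base=0):
    """PV of a constant yearly loss paid at the start of each year,
    discounted a further years_to_base years to the base year."""
    value_at_start = loss * sum(pv_factors(rate, n_years))
    return value_at_start / (1.0 + rate) ** years_to_base

# Loss of -100 per year from age 40 to 65 (25 payments), 10% rate.
# The sign follows the loss here, so the result is negative.
value_2014 = pv_constant_loss(-100, 0.1, 25)                   # valued in 2014
value_2009 = pv_constant_loss(-100, 0.1, 25, years_to_base=5)  # valued in 2009

print(round(value_2014, 4))  # -998.4744
print(round(value_2009, 2))
```

The magnitude of the 2014 value agrees with the FINANCE('PV', ...) result quoted above. In a data step the number of payments would come from age_retirement minus age for each person.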