Fetching TMYs from the NSRDB with a 5-minute interval size - pvlib

I am using pvlib (https://pvlib-python.readthedocs.io/en/v0.9.0/generated/pvlib.iotools.get_psm3.html) to retrieve NSRDB PSM3 time-series data from the PSM3 API. I was able to retrieve data at a 30-minute interval, but changing the interval to 5 minutes doesn't work. Any help?
my_metadata, my_data = pvlib.iotools.get_psm3(
    latitude=xxx, longitude=xxx,
    interval=5,
    api_key=My_key, email='xxx',  # <-- any email works here fine
    # names='tmy'
    names='xxx')

Perhaps you're using an older version of pvlib? I would try updating to the latest version with pip install -U pvlib or similar. You can check your installed version with print(pvlib.__version__).
This snippet successfully retrieves 5-minute data for me with pvlib 0.9.1:
import pvlib
lat, lon = 40, -80
key = '<redacted>'
email = '<redacted>'
df, metadata = pvlib.iotools.get_psm3(lat, lon, key, email, names='2020',
interval=5, map_variables=True)
It's worth pointing out that TMYs are always hourly; the 5-minute data is only available for single (calendar) years like the 2020 request above (see the sketch after the docstring excerpt). The get_psm3 docstring mentions this:
interval : int, {60, 5, 15, 30}
interval size in minutes, must be 5, 15, 30 or 60. Only used for
single-year requests (i.e., it is ignored for tmy/tgy/tdy requests).
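For completeness, here is a hedged sketch of the TMY case, reusing lat, lon, key and email from the snippet above: the request succeeds, but the data comes back hourly and any interval argument is ignored.

# Hedged sketch: TMY requests always return hourly data, so `interval`
# is ignored here (it only applies to single-year requests).
tmy_df, tmy_meta = pvlib.iotools.get_psm3(lat, lon, key, email, names='tmy',
                                          map_variables=True)
print(tmy_df.index[:3])  # hourly timestamps regardless of interval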

Related

Django Window annotation combined with distinct clause

I have a Django model stored in a Postgres DB consisting of counts recorded at irregular intervals:
WidgetCount
- Time
- Count
I'm trying to use a window function with Lag to give me a previous row's values as an annotation. My problem is when I try to combine it with some distinct date truncation the window function uses the source rows rather than the distinctly grouped ones.
For example if I have the following rows:
time count
2020-01-20 05:00 15
2020-01-20 06:00 20
2020-01-20 09:00 30
2020-01-21 06:00 35
2020-01-21 07:00 40
2020-01-22 04:00 50
2020-01-22 06:00 54
2020-01-22 09:00 58
And I want to return a queryset showing the first reading per day, I can use:
from django.db.models.functions import Trunc
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"))
Which gives me:
date        count
2020-01-20  15
2020-01-21  35
2020-01-22  50
I would like to add an annotation which gives me yesterday's value (so I can show the change per day).
date        count  yesterday_count
2020-01-20  15
2020-01-21  35     15
2020-01-22  50     35
If I do:
from django.db.models.functions import Trunc, Lag
from django.db.models import Window
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count")))
The second row returned gives me 30 for yesterday_count - i.e., it's showing me the previous source row before the distinct clause is applied.
If I add a partition clause like this:
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count"), partition_by=F("date")))
Then yesterday_count is None for all rows.
I can do this calculation in Python if I need to but it's driving me a bit mad and I'd like to find out if what I'm trying to do is possible.
Thanks!
I think the main problem is that you're mixing an operation that, used in an annotation, produces a grouped queryset (such as Sum) with an operation that simply creates a new field for each record in the given queryset (such as yesterday_count=Window(expression=Lag("count"))).
So ordering really matters here. When you try:
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count")))
the resulting queryset is simply WidgetCount.objects.distinct("date") with the annotations applied; no grouping is performed.
I would suggest decoupling your operations so it becomes easier to understand what is happening. Also notice that you're iterating over the Python objects, so you don't need to make any new queries.
Note: I'm using the Sum operation as an example because I get an unexpected error with the FirstValue operator, so I'm posting with Sum to demonstrate the idea, which remains the same. For first values, just change acc_count=Sum("count") to first_count=FirstValue("count").
from django.db.models import Sum, Window
from django.db.models.functions import Lag, Trunc

# Row is the answerer's local test model (a stand-in for WidgetCount)
for truncDate_groups in Row.objects.annotate(trunc_date=Trunc('time', 'day')).values("trunc_date")\
        .annotate(acc_count=Sum("count")).values("acc_count", "trunc_date")\
        .order_by('trunc_date')\
        .annotate(y_count=Window(Lag("acc_count")))\
        .values("trunc_date", "acc_count", "y_count"):
    print(truncDate_groups)
OUTPUT:
{'trunc_date': datetime.datetime(2020, 1, 20, 0, 0, tzinfo=<UTC>), 'acc_count': 65, 'y_count': None}
{'trunc_date': datetime.datetime(2020, 1, 21, 0, 0, tzinfo=<UTC>), 'acc_count': 75, 'y_count': 162}
{'trunc_date': datetime.datetime(2020, 1, 22, 0, 0, tzinfo=<UTC>), 'acc_count': 162, 'y_count': 65}
It turns out the FirstValue operator requires a window function, so you can't nest FirstValue and then calculate Lag over it; in this scenario I'm not exactly sure you can do it in one query. The question becomes how to access the FirstValue column without nesting windows.
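One possible workaround, as an untested sketch of my own (not part of the original answer): annotate every row with the first count of its day using a FirstValue window partitioned by the truncated date, then do the day-over-day "yesterday" step in Python instead of nesting a second window function.

import itertools

from django.db.models import F, Window
from django.db.models.functions import FirstValue, Trunc

# First count of each day via a window partitioned by the truncated date.
rows = (WidgetCount.objects
        .annotate(day=Trunc('time', 'day'))
        .annotate(first_count=Window(
            expression=FirstValue('count'),
            partition_by=F('day'),
            order_by=F('time').asc(),
        ))
        .order_by('day', 'time'))

previous = None
for day, group in itertools.groupby(rows, key=lambda r: r.day):
    first = next(group).first_count   # same value for every row in the day
    print(day.date(), first, previous)
    previous = first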
I haven't tested it out locally but I think you want to GROUP BY instead of using DISTINCT here.
WidgetCount.objects.values(
    date=Trunc('time', 'day'),
).order_by('date').annotate(
    date_count=Sum('count'),  # will trigger a GROUP BY date
).annotate(
    yesterday_count=Window(Lag('date_count')),
)
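An untested, self-contained sketch of that same idea, with the imports it needs and an explicit ordering on the window (the order_by inside Window and the final loop are my additions; note this sums each day's counts, like the snippet above, rather than taking the day's first reading):

from django.db.models import F, Sum, Window
from django.db.models.functions import Lag, Trunc

daily = (
    WidgetCount.objects
    .values(date=Trunc('time', 'day'))        # GROUP BY the truncated date
    .annotate(date_count=Sum('count'))        # daily aggregate
    .annotate(yesterday_count=Window(
        expression=Lag('date_count'),
        order_by=F('date').asc(),             # make Lag's ordering explicit
    ))
    .order_by('date')
)
for row in daily:
    print(row['date'], row['date_count'], row['yesterday_count'])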

Sampling rate for data returned with boto3 get_metric_statistics()

The documentation is here...
http://boto3.readthedocs.io/en/latest/reference/services/cloudwatch.html#CloudWatch.Client.get_metric_statistics
Here is our call
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',  # reported every 5 minutes
    Dimensions=[
        {
            'Name': 'AutoScalingGroupName',
            'Value': 'Celery-AutoScalingGroup'
        },
    ],
    StartTime=now - datetime.timedelta(minutes=12),
    EndTime=now,
    Period=60,  # I can't figure out what exactly changing this is doing
    Statistics=['Average', 'SampleCount', 'Sum', 'Minimum', 'Maximum'],
)
Here is our response
>>> response['Datapoints']
[ {u'SampleCount': 5.0, u'Timestamp': datetime.datetime(2017, 8, 25, 12, 46, tzinfo=tzutc()), u'Average': 0.05, u'Maximum': 0.17, u'Minimum': 0.0, u'Sum': 0.25, u'Unit': 'Percent'},
{u'SampleCount': 5.0, u'Timestamp': datetime.datetime(2017, 8, 25, 12, 51, tzinfo=tzutc()), u'Average': 0.034, u'Maximum': 0.08, u'Minimum': 0.0, u'Sum': 0.17, u'Unit': 'Percent'}
]
Here is my question
Look at first dictionary in the returned list. SampleCount of 5 makes sense, I guess, because our Period is 60 (seconds) and CloudWatch supplies 'CPUUtilization' metric every 5 minutes.
But if I change Period, to say 3 minutes (180), I am still getting a SampleCount of 5 (I'd expect 1 or 2).
This is a problem because I want the Average, but I think it is averaging 5 datapoints, only 2 of which are valid (the beginning and end, which correspond to the Minimum and Maximum; that is, the CloudWatch metric at some time t and the next reporting of that metric at time t+5 minutes).
It is averaging these with 3 intermediate 0-value datapoints, so that the Average is (Minimum+Maximum+0+0+0)/5.
I can just get the Minimum and Maximum, add them, and divide by 2 for a better reading - but I was hoping somebody could explain just exactly what that Period parameter is doing.
As I said, changing the Period to 180 (or even 360) didn't change the SampleCount, but when I changed it to 600, my SampleCount was suddenly 10.0 for one datapoint (which does make sense).
Data can be published to CloudWatch in two different ways:
You can publish your observations one by one and let CloudWatch do the aggregation.
You can aggregate the data yourself and publish the statistic set (SampleCount, Sum, Minimum, Maximum).
If data is published using method 1, you would get the behaviour you were expecting. But if the data is published using method 2, you are limited by the granularity of the published data.
If EC2 is aggregating the data for 5 minutes and then publishing a statistic set, there is no point in requesting data at the 3-minute level. However, if you request data with a period that is a multiple of the period the data was published with (e.g. 10 min), the statistics can be calculated, which is what CloudWatch does.
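To illustrate, here is a hedged sketch (my own, not from the answer; it assumes basic five-minute monitoring, so EC2 publishes one statistic set every 300 seconds): requesting a Period that is a multiple of 300 lines up with the published data.

import datetime

import boto3

cloudwatch = boto3.client('cloudwatch')
now = datetime.datetime.utcnow()

response = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'AutoScalingGroupName',
                 'Value': 'Celery-AutoScalingGroup'}],
    StartTime=now - datetime.timedelta(minutes=30),
    EndTime=now,
    Period=300,  # a multiple of the 5-minute publishing interval
    Statistics=['Average', 'SampleCount'],
)
for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['SampleCount'], point['Average'])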

Yahoo Finance API .get_historical() not working python

So I recently downloaded the yahoo_finance API, version 1.4.0. I got it a few days ago, and .get_historical() was working fine. Now, however, it doesn't. Here's what it's doing:
import yahoo_finance as yf
apple=yf.Share('AAPL')
apple_price=apple.get_price()
print apple.get_historical('2016-02-15', '2016-04-29')
The error I get is: YQLResponseMalformedError: Response malformed. Is there a bug in the API or am I forgetting something?
Unfortunately, the Yahoo stock price API, which a lot of modules are based on, doesn't work anymore.
Alternatively, you could use Google's API
https://www.google.com/finance/getprices?q=1101&x=TPE&i=86400&p=3d&f=d,c,h,l,o,v
q=1101 is the stock quote
x=TPE is the exchange (List of Exchanges here: https://www.google.com/googlefinance/disclaimer/ )
i=86400 interval in seconds (86400 sec = 1 day)
p=3d data since how long ago
f= fields of data (d=date, c=close, h=high, l=low, o=open, v=volume)
Data would look like this:
EXCHANGE%3DTPE
MARKET_OPEN_MINUTE=540
MARKET_CLOSE_MINUTE=810
INTERVAL=86400
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=480
a1496295000,24.4,24.75,24.35,24.75,11782000
1,24.5,24.5,24.3,24.4,10747000
a1496295000 is the Unix timestamp of the first row of data
the leading 1 in the second row is the interval offset from the first row (an offset of 1 interval, i.e. 1 day here)
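A rough, untested sketch of parsing that response in Python (hedged: it assumes the endpoint still responds in the format shown above, where an 'a'-prefixed field is an absolute Unix timestamp and a bare leading number is an interval offset from it):

import datetime

import requests

url = ('https://www.google.com/finance/getprices'
       '?q=1101&x=TPE&i=86400&p=3d&f=d,c,h,l,o,v')
interval = 86400

base_ts = None
for line in requests.get(url).text.splitlines():
    if line.startswith('a'):            # absolute Unix timestamp row
        fields = line[1:].split(',')
        base_ts = int(fields[0])
        offset = 0
    elif line[:1].isdigit() and base_ts is not None:
        fields = line.split(',')        # offset row: N intervals after base_ts
        offset = int(fields[0])
    else:
        continue                        # header lines such as COLUMNS=...
    when = datetime.datetime.utcfromtimestamp(base_ts + offset * interval)
    close, high, low, open_, volume = fields[1:6]
    print(when.date(), close, high, low, open_, volume)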

Bokeh glyph coordinates with x_axis_type 'datetime'

I am attempting to add a simple text string (glyph) to a Bokeh plot which uses x_axis_type='datetime'.
My code (stripped to its essentials) is as follows:
from datetime import date
from bokeh.models import Text
from bokeh.plotting import figure, show

p = figure(plot_width=900, plot_height=380, x_axis_type='datetime')
dt = date(2003, 3, 15)
p.line(xvals, yvals)
txt = Text(
    # x=some_formatting_function(dt),
    x=1057005600000,
    y=0.1,
    text=["happy day!"],
    text_align="left",
    text_baseline="middle",
    text_font_size="11pt",
    text_font_style="italic",
)
p.add_glyph(txt)
show(p)
The x-axis range/values (i.e. dates) run from 2002 to 2006 and I'd like to add the text in, say, 2003. The x value I've shown in the code above (i.e. 1057005600000, which I worked out by trial and error) drops the glyph in the right place.
But I can't work out how to use a datetime.date directly...
Is there a Bokeh function (or a property of datetime.date) that will give me the value the Bokeh plot is expecting?
Many thanks.
N.B. I've tried using x = bokeh.properties.Date(dt) but this gives me:
ValueError: expected an element of either String,
Dict(String, Either(String, Float)) or Float, got <bokeh.properties.Date object
When the x_axis_type attr is set to 'datetime', Bokeh will plot things along the x-axis according to milliseconds since epoch. The easiest solution is to use datetime.datetime (not .date), cast your dt object to seconds since epoch with the timestamp() method, and multiply by 1000 to get the millisecond value (the ~1.057e12 number you're using) for your x-coordinate.
>>> from datetime import datetime
>>> dt = datetime.now()
>>> dt
datetime.datetime(2015, 6, 17, 10, 41, 34, 617709)
>>> dt.timestamp()
1434555694.617709
See the following SO question/answer for the python2 answer to my problem:
How can I convert a datetime object to milliseconds since epoch (unix time) in Python?
Thank you @Luke Canavan for pointing me in the right direction(!)
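As a small, hedged helper sketch of my own (assuming Python 3): converting a plain datetime.date into the millisecond value a datetime x-axis expects could look like this.

from datetime import date, datetime

def to_axis_ms(d):
    """Milliseconds since the Unix epoch for a plain date (hypothetical helper)."""
    # note: timestamp() interprets a naive datetime in local time
    return datetime(d.year, d.month, d.day).timestamp() * 1000

print(to_axis_ms(date(2003, 3, 15)))  # roughly 1.0477e12, usable as x=...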

How to create a DS to store the accumulated result of another DS with rrdtool

I want to create an rrd file with two data sources included. One stores the original value of the data; name it 'dc'. The other stores the accumulated result of 'dc'; name it 'total'. The expected formula is current(total) = previous(total) + current(dc). For example, if I update the data sequence (2, 3, 5, 4, 9) to the rrd file, I want 'dc' to be (2, 3, 5, 4, 9) and 'total' to be (2, 5, 10, 14, 23).
I tried to create the rrd file with the command line below, but the command fails and says that PREV is not supported with DS type COMPUTE.
rrdtool create test.rrd --start 920804700 --step 300 \
DS:dc:GAUGE:600:0:U \
DS:total:COMPUTE:PREV,dc,ADDNAN \
RRA:AVERAGE:0.5:1:1200 \
RRA:MIN:0.5:12:2400 \
RRA:MAX:0.5:12:2400 \
RRA:AVERAGE:0.5:12:2400
Is there an alternative way to define the DS 'total' (DS:total:COMPUTE:PREV,dc,ADDNAN)?
rrdtool does not store 'original' values ... rather, it re-samples the signal you provide via the update command at the rate you defined when you set up the database ... in your case 1/300 Hz.
That said, a 'total' DS does not make much sense ...
What you can do with a single DS, though, is build the average value over a time range and multiply the result by the number of seconds in that range, and thus arrive at the 'total'. For example, an average of 4.2 over a 3600-second window corresponds to a total of 4.2 × 3600 = 15120.
Sorry, a bit late, but this may be helpful for someone else.
It's better to use the 'mrtg-traffic-sum' tool: with an rrd using a GAUGE DS and LAST RRAs, it lets me collect monthly traffic volumes and quota limits.
E.g., here is a basic traffic summary with no traffic quota:
root@server:~# /usr/bin/mrtg-traffic-sum --range=current --units=MB /etc/mrtg/R4.cfg
Subject: Traffic total for '/etc/mrtg/R4.cfg' (1.9) 2022/02
Start: Tue Feb 1 01:00:00 2022
End: Tue Mar 1 00:59:59 2022
Interface In+Out in MB
------------------------------------------------------------------------------
eth0 0
eth1 14026
eth2 5441
eth3 0
eth4 15374
switch0.5 12024
switch0.19 151
switch0.49 1
switch0.51 0
switch0.92 2116
root@server:~#
From this you can then write a script that generates a new rrd storing these values, and presto, you have a traffic volume / quota graph.
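As a loose, untested sketch of what such a glue script might look like (the file paths, the per-interface RRD layout, and the assumption of a single pre-created DS per RRD are hypothetical placeholders, not anything the answer specifies):

import re
import subprocess

# Parse the mrtg-traffic-sum output and push each interface's monthly total
# into an already-created RRD via `rrdtool update`.
output = subprocess.run(
    ['/usr/bin/mrtg-traffic-sum', '--range=current', '--units=MB',
     '/etc/mrtg/R4.cfg'],
    capture_output=True, text=True, check=True,
).stdout

for line in output.splitlines():
    m = re.match(r'^(\S+)\s+(\d+)$', line.strip())
    if not m:
        continue  # skip headers and separator lines
    iface, megabytes = m.group(1), int(m.group(2))
    # assumes one pre-created RRD per interface with a single GAUGE DS
    subprocess.run(['rrdtool', 'update', f'/var/lib/traffic/{iface}.rrd',
                    f'N:{megabytes}'], check=True)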
Example fixed traffic volume chart using GAUGE
This thread reminded me to fix this collector, which had stopped, and I just got around to posting ;)