I have a dataset with UTC timestamps and lat/long coordinates.
I want to compute the solar position for each row of this dataset, but I'm having trouble handling the timezone.
So far, I've managed to make the UTC data timezone-aware with:
# library for timezone computations
from timezonefinder import TimezoneFinder
from pytz import timezone
import pytz
# scientific python add-ons
import numpy as np
import pandas as pd
tf = TimezoneFinder()
litteralTimeZone = tf.timezone_at(lng=longitude, lat=latitude)
print(litteralTimeZone)
tz = pytz.timezone(litteralTimeZone)
# Adjust date Time, currently in CSV like: 20070101:0000
Data['time(LOC)'] = pd.DatetimeIndex(
pd.to_datetime(Data['time(UTC)'], format='%Y%m%d:%H%M')
).tz_localize(tz, ambiguous=True, nonexistent='shift_forward')
Data = Data.set_index('time(LOC)')
Now, when I pass the data to the get_solarposition function with
pvlib.solarposition.get_solarposition(
data.index, metadata['latitude'],metadata['longitude'])
The solar positions are computed on the UTC portion of the data, ignoring the localized part of it.
Any thoughts?
Thanks for using pvlib!
I believe your issue is that you have UTC timestamps, but you are mixing them with the local timezone. UTC is a timezone. Therefore, you should first localize the naive timestamps with 'UTC'.
# make time-zone aware timestamps from string format in UTC
>>> Data['time(TZ-UTC)'] = pd.DatetimeIndex(
... pd.to_datetime(Data['time(UTC)'], format='%Y%m%d:%H%M')).tz_localize('UTC')
Then you can use these directly in pvlib.solarposition.get_solarposition.
# mimic OP data
>>> Data = pd.DataFrame(
... {'time(UTC)': ['20200420:2030', '20200420:2130', '20200420:2230']})
>>> Data
# time(UTC)
# 0 20200420:2030
# 1 20200420:2130
# 2 20200420:2230
# apply the UTC timezone to the naive timestamps after parsing the string format
>>> Data['time(TZ-UTC)'] = pd.DatetimeIndex(
... pd.to_datetime(Data['time(UTC)'], format='%Y%m%d:%H%M')).tz_localize('UTC')
>>> Data
# time(UTC) time(TZ-UTC)
# 0 20200420:2030 2020-04-20 20:30:00+00:00
# 1 20200420:2130 2020-04-20 21:30:00+00:00
# 2 20200420:2230 2020-04-20 22:30:00+00:00
# now call pvlib.solarposition.get_solarposition with the TZ-aware timestamps
>>> from pvlib import solarposition
>>> lat, lon = 39.74, -105.24
>>> solarposition.get_solarposition(Data['time(TZ-UTC)'], latitude=lat, longitude=lon)
# apparent_zenith zenith apparent_elevation elevation azimuth equation_of_time
# time(TZ-UTC)
# 2020-04-20 20:30:00+00:00 34.242212 34.253671 55.757788 55.746329 221.860950 1.249402
# 2020-04-20 21:30:00+00:00 43.246151 43.261978 46.753849 46.738022 240.532481 1.257766
# 2020-04-20 22:30:00+00:00 53.872320 53.895328 36.127680 36.104672 254.103959 1.266117
You don't need to convert them to the local timezone. If desired, use pd.DatetimeIndex.tz_convert to convert them from UTC to the local (e.g. Golden, CO) timezone. Note: it may be more convenient to use a fixed offset like Etc/GMT+7 (which means UTC-7; the sign is inverted in the Etc names), because daylight saving time may cause pandas to raise an ambiguous time error.
>>> Data['time(LOC)'] = pd.DatetimeIndex(Data['time(TZ-UTC)']).tz_convert('Etc/GMT+7')
>>> Data = Data.set_index('time(LOC)')
>>> Data
# time(UTC) time(TZ-UTC)
# time(LOC)
# 2020-04-20 13:30:00-07:00 20200420:2030 2020-04-20 20:30:00+00:00
# 2020-04-20 14:30:00-07:00 20200420:2130 2020-04-20 21:30:00+00:00
# 2020-04-20 15:30:00-07:00 20200420:2230 2020-04-20 22:30:00+00:00
The solar position results should be exactly the same with either local (e.g. Golden, CO) time or UTC time:
>>> solarposition.get_solarposition(Data.index, latitude=lat, longitude=lon)
# apparent_zenith zenith apparent_elevation elevation azimuth equation_of_time
# time(LOC)
# 2020-04-20 13:30:00-07:00 34.242212 34.253671 55.757788 55.746329 221.860950 1.249402
# 2020-04-20 14:30:00-07:00 43.246151 43.261978 46.753849 46.738022 240.532481 1.257766
# 2020-04-20 15:30:00-07:00 53.872320 53.895328 36.127680 36.104672 254.103959 1.266117
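To see why the original approach shifted the results, here is a minimal sketch using only the standard library (not pandas or pvlib). It assumes a fixed UTC-7 offset for the local zone and contrasts converting a UTC timestamp with wrongly stamping the UTC wall-clock value as local time:

```python
from datetime import datetime, timedelta, timezone

raw = "20200420:2030"  # UTC wall-clock string, same format as the CSV
naive = datetime.strptime(raw, "%Y%m%d:%H%M")

# Correct: declare that the naive value is UTC, then convert for display.
utc_time = naive.replace(tzinfo=timezone.utc)
local_tz = timezone(timedelta(hours=-7))  # assumption: fixed UTC-7 offset
local_time = utc_time.astimezone(local_tz)

# Incorrect: stamp the UTC wall-clock value as if it were already local time.
mislabeled = naive.replace(tzinfo=local_tz)

print(local_time.isoformat())  # 2020-04-20T13:30:00-07:00
print(local_time == utc_time)  # True: same instant, different label
print(mislabeled - utc_time)   # 7:00:00: a different instant altogether
```

Converting keeps the instant and only changes the label, which is why the solar position is unchanged. Mislabeling moves the instant by the full offset, so the sun would be computed for the wrong time.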
Does this help? Happy to answer more questions! Cheers!
I am working on a PV system installed in Amsterdam. The PVSystem code is as follows. I am getting good results with the inverter and the module specified in the code, which are obtained with retrieve_sam.
import pvlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pvlib.temperature import TEMPERATURE_MODEL_PARAMETERS
from pandas.plotting import register_matplotlib_converters
from pvlib.modelchain import ModelChain
# Define location for the Netherlands
location = pvlib.location.Location(latitude=52.53, longitude=5.15, tz='UTC', altitude=50, name='amsterdam')
#import the database
module_database = pvlib.pvsystem.retrieve_sam(name='SandiaMod')
inverter_database = pvlib.pvsystem.retrieve_sam(name='cecinverter')
module = module_database.Canadian_Solar_CS5P_220M___2009_
# module = module_database.DMEGC_Solar_320_M6_120BB_ (I want to add this module)
inverter = inverter_database.ABB__PVI_3_0_OUTD_S_US__208V_
temperature_model_parameters = pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass']
modules_per_string = 10
inverter_per_string = 1
# Define a PV system characteristics
surface_tilt = 12.5
surface_azimuth = 180
system = pvlib.pvsystem.PVSystem(surface_tilt=surface_tilt, surface_azimuth=surface_azimuth, albedo=0.25,
module=module, module_parameters=module,
temperature_model_parameters=temperature_model_parameters,
modules_per_string=modules_per_string, inverter_per_string=inverter_per_string,
inverter=inverter, inverter_parameters=inverter, racking_model='open_rack')
# Define a weather file
def importPSMData():
    df = pd.read_csv('/Users/laxmikantradkar/Desktop/PVLIB/solcast_data1.csv', delimiter=';')
    # Rename the columns for input to PVLIB
    df.rename(columns={'Dhi': 'dhi', 'Dni': 'dni', 'Ghi': 'ghi', 'AirTemp': 'temp_air', 'WindSpeed10m': 'wind_speed',
                       }, inplace=True)
    df.rename(columns={'Year': 'year', 'Month': 'month', 'Day': 'day', 'Hour': 'hour',
                       'Minute': 'minute'}, inplace=True)
    df['dt'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])
    df.set_index(df['dt'], inplace=True)
    # Rename data parameters to run to datetime
    # df.rename(columns={'PeriodEnd': 'period_end'}, inplace=True)
    # Drop unnecessary columns (the positional axis argument of drop() was removed in pandas 2.0)
    df = df.drop(columns=['PeriodStart', 'Period', 'Azimuth', 'CloudOpacity', 'DewpointTemp', 'Ebh',
                          'PrecipitableWater', 'SnowDepth', 'SurfacePressure', 'WindDirection10m', 'Zenith'])
    return df
mc = ModelChain(system=system, location=location)
weatherData = importPSMData()
mc.run_model(weather=weatherData)
ac_energy = mc.ac
# ac_energy.to_csv('/Users/laxmikantradkar/Desktop/ac_energy_netherlands.csv')
plt.plot(ac_energy)
plt.show()
Now I want to change the module and inverter which is not present in the library. Could anyone please tell me how to do this?
Is it possible to access the library and manually add the row/column of inverter and module? If yes, where is the library located?
Is it ../Desktop/PVLIB/venv/lib/python3.8/site-packages/pvlib/data/sam-library-sandia-modules-2015-6-30.csv?
When I try to change the module/inverter parameters from the above path, I receive the error 'DataFrame' object has no attribute 'Module name'.
I started working on PVLIB_python 2 days ago, so I am new to the language. I really appreciate your help. Feel free to correct me at any point.
"I started working on PVLIB_python 2 days ago, so I am new to the language. I really appreciate your help. Feel free to correct me at any point."
Welcome to the community! If you haven't already, I encourage you to dig through the pvlib-python documentation and keep learning Python basics by playing with the examples there. I also encourage you to check out the pandas tutorials and any other highly rated pandas learning material you can find, to get yourself up and running with data science in Python.
"When I try to change the module/inverter parameters from the above path, I receive the error 'DataFrame' object has no attribute 'Module name'."
This is because you're asking for a column in the DataFrame table that's not there. No worries, you can make your own module.
"Now I want to change the module and inverter which is not present in the library. Could anyone please tell me how to do this? Is it possible to access the library and manually add the row/column of inverter and module? If yes, where is the library located?"
It isn't necessary to change the library. You can construct a module yourself, since each module is just a Series from the pandas library. Here's an example showing how you can copy an existing module, change a couple of parameters and so create your own module.
my_new_module = module.copy() # create your own copy of the module
print("Before:", my_new_module, sep="\n") # show module before
my_new_module["Notes"] = "This is how to change a field in the module. Do this for every field in the module."
my_new_module.name = "DMEGC_Solar_320_M6_120BB_" # rename the Series appropriately
print("\nAfter:", my_new_module, sep="\n") # show module after
Then you can just insert "my_new_module" into PVSystem:
system = pvlib.pvsystem.PVSystem(
    surface_tilt=surface_tilt,
    surface_azimuth=surface_azimuth,
    albedo=0.25,
    module=my_new_module,  # HERE'S THE NEW MODULE!
    module_parameters=my_new_module,  # the parameters the model actually uses
    temperature_model_parameters=temperature_model_parameters,
    modules_per_string=modules_per_string,
    inverter_per_string=inverter_per_string,
    inverter=inverter,
    inverter_parameters=inverter,
    racking_model='open_rack')
The hard part here is having the right coefficients that you can trust. You may have an easier time using module_database = pvlib.pvsystem.retrieve_sam(name='CECMod') and replacing those parameters since they can be substituted more easily with data from the module spec sheet.
This should work identically for inverters as well.
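As a rough illustration of the same idea with a plain dictionary (for a Series you could first call module.to_dict()), here is a minimal sketch. The helper name customize_parameters and every number in it are hypothetical placeholders, not datasheet values. It rejects keys that are not in the base parameter set, which catches typos such as 'Module name' early:

```python
def customize_parameters(base, **overrides):
    """Return a copy of `base` with `overrides` applied, rejecting unknown keys."""
    unknown = sorted(set(overrides) - set(base))
    if unknown:
        raise KeyError("not in the base parameter set: {}".format(unknown))
    updated = dict(base)       # copy, so the library entry is left untouched
    updated.update(overrides)
    return updated

# Placeholder values for illustration only; not taken from any datasheet.
base_module = {"Cells_in_Series": 96, "Isco": 5.0, "Voco": 59.0, "Notes": ""}

my_module = customize_parameters(
    base_module, Isco=9.9, Voco=40.5, Notes="hypothetical replacement module")

print(my_module["Isco"])    # 9.9
print(base_module["Isco"])  # 5.0 (the original is unchanged)
```

The resulting dictionary would then be passed where the answer above passes my_new_module. Whether the results are meaningful still depends on having coefficients you can trust.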
I'm trying to create a time series from a netCDF file (accessed via a THREDDS server) with Python. The code I use seems correct, but the values of the variable I am reading are 'masked'. I'm new to Python and I'm not familiar with the formats. Any idea how I can read the data?
This is the code I use:
import netCDF4
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
from datetime import datetime, timedelta #
dayFile = datetime.now() - timedelta(days=1)
dayFile = dayFile.strftime("%Y%m%d")
url='http://nomads.ncep.noaa.gov:9090/dods/nam/nam%s/nam1hr_00z' %(dayFile)
# NetCDF4-Python can open OPeNDAP dataset just like a local NetCDF file
nc = netCDF4.Dataset(url)
varsInFile = nc.variables.keys()
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)
first = netCDF4.num2date(time_var[0],time_var.units)
last = netCDF4.num2date(time_var[-1],time_var.units)
print(first.strftime('%Y-%b-%d %H:%M'))
print(last.strftime('%Y-%b-%d %H:%M'))
# determine what longitude convention is being used
print(lon.min(), lon.max())
# Specify desired station time series location
# note we add 360 because of the lon convention in this dataset
#lati = 36.605; loni = -121.85899 + 360. # west of Pacific Grove, CA
lati = 41.4; loni = -100.8 +360.0 # Georges Bank
# Function to find index to nearest point
def near(array,value):
    idx=(abs(array-value)).argmin()
    return idx
# Find nearest point to desired location (no interpolation)
ix = near(lon, loni)
iy = near(lat, lati)
print(ix, iy)
# Extract desired times.
# 1. Select -+some days around the current time:
start = netCDF4.num2date(time_var[0],time_var.units)
stop = netCDF4.num2date(time_var[-1],time_var.units)
time_var = nc.variables['time']
datetime = netCDF4.num2date(time_var[:],time_var.units)
istart = netCDF4.date2index(start,time_var,select='nearest')
istop = netCDF4.date2index(stop,time_var,select='nearest')
print(istart, istop)
# Get all time records of variable [vname] at indices [iy,ix]
vname = 'dswrfsfc'
var = nc.variables[vname]
hs = var[istart:istop,iy,ix]
tim = dtime[istart:istop]
# Create Pandas time series object
ts = pd.Series(hs,index=tim,name=vname)
The var data are not read as I expected, apparently because data is masked:
>>> hs
masked_array(data = [-- -- -- ..., -- -- --],
mask = [ True True True ..., True True True],
fill_value = 9.999e+20)
The variable name and the time index are correct, as is the rest of the script. The only thing that doesn't work is the variable data retrieved. This is the time series I get:
>>> ts
2016-10-25 00:00:00.000000 NaN
2016-10-25 01:00:00.000000 NaN
2016-10-25 02:00:00.000006 NaN
2016-10-25 03:00:00.000000 NaN
2016-10-25 04:00:00.000000 NaN
... ... ... ... ...
2016-10-26 10:00:00.000000 NaN
2016-10-26 11:00:00.000006 NaN
Name: dswrfsfc, dtype: float32
Any help will be appreciated!
Hmm, this code looks familiar. ;-)
You are getting NaNs because the NAM model you are trying to access now uses longitude in the range [-180, 180] instead of the range [0, 360]. So if you request loni = -100.8 instead of loni = -100.8 +360.0, I believe your code will return non-NaN values.
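If you don't know in advance which convention a dataset uses, one option is to normalize the requested longitude before the nearest-point lookup. This is a minimal sketch in plain Python; the function names and the small grid are made up for illustration and are not part of netCDF4:

```python
def to_signed_longitude(lon_deg):
    """Map a longitude in degrees to the range [-180, 180)."""
    return (lon_deg + 180.0) % 360.0 - 180.0

def to_unsigned_longitude(lon_deg):
    """Map a longitude in degrees to the range [0, 360)."""
    return lon_deg % 360.0

def nearest_index(values, target):
    """Index of the entry in `values` closest to `target` (no interpolation)."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))

# Hypothetical grid that uses the [-180, 180] convention.
grid_lon = [-152.0, -126.5, -101.0, -75.5, -50.0]
requested = -100.8 + 360.0  # value written for a [0, 360] grid

wrong = nearest_index(grid_lon, requested)                       # 4: falls on the grid edge
right = nearest_index(grid_lon, to_signed_longitude(requested))  # 2: the intended cell
print(wrong, right)
```

A lookup with the wrong convention still returns an index, just a meaningless one at the edge of the grid, so printing lon.min() and lon.max() first remains a useful check.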
It's worth noting, however, that the task of extracting time series from multidimensional gridded data is now much easier with xarray, because you can simply select a dataset closest to a lon,lat point and then plot any variable. The data only gets loaded when you need it, not when you extract the dataset object. So basically you now only need:
import xarray as xr
ds = xr.open_dataset(url) # NetCDF or OPeNDAP URL
lati = 41.4; loni = -100.8 # Georges Bank
# Extract a dataset closest to specified point
dsloc = ds.sel(lon=loni, lat=lati, method='nearest')
# select a variable to plot
dsloc['dswrfsfc'].plot()
Full notebook here: http://nbviewer.jupyter.org/gist/rsignell-usgs/d55b37c6253f27c53ef0731b610b81b4
I checked your approach with xarray. It works great for extracting solar radiation data! I can add that the first point is not defined (NaN) because the model starts calculating there, so there is no accumulated radiation data yet (needed to calculate hourly global radiation). That is why it is masked.
Something everyone overlooked is that the output is not correct. It does look OK (sunshine at noon, zero at midnight when it is dark), but the day length is not correct! I checked it for latitude 52 north and longitude 5.6 east (November), and the day length is at least 2 hours too long! (The NOAA Panoply viewer for NetCDF files gives similar results.)
Basically, I want to not plot extremes in my graph. I thought doing this based on the slope of the graph would be a good idea, but for some reason I keep getting the error that the dates on my x-axis do not exist (DataFrame has no attribute Datumtijd). (Edit: Removed file location as question has been answered)
from pylab import *
import matplotlib.pyplot as plt
import matplotlib.dates as pld
%matplotlib inline
import pandas as pd
from pandas import DataFrame
pbn135 = pd.read_csv('3873_135.csv', parse_dates=[0], index_col = 0, dayfirst = True, delimiter = ';', usecols = ['Datumtijd','DisplayWaarde'])
pbn135.plot()
for i in range(len(pbn135)):
    slope = (pbn135.DisplayWaarde[i+1]-pbn135.DisplayWaarde[i])/(pbn135.Datumtijd[i+1]-pbn135.Datumtijd[i])
The attribute error occurs because Datumtijd is the index of the DataFrame (index_col = 0), not a column, so the dates have to be read from pbn135.index. On top of that, dividing a number by the difference of two datetimes does not work directly; converting each datetime to a number first does. This is usually done by calculating the total seconds from a reference date (e.g. 1 Jan 2015).
This is done by importing datetime from datetime. Then, with a reference date datetime(2015,1,1), the seconds are calculated with total_seconds().
However, this creates a slope per second rather than per interval of your datetime. If anyone knows how to fix that without manually entering a division, please let us know.
from datetime import datetime
for i in range(len(pbn135) - 1): # stop one short so that i+1 stays in range
    slope = (pbn135.pbn73[i+1]-pbn135.pbn73[i])/((pbn135.index[i+1]-datetime(2015,1,1)).total_seconds()-(pbn135.index[i]-datetime(2015,1,1)).total_seconds())
    print(slope)
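One possible way around the manual division is to divide the elapsed time by a timedelta of the unit you want, since dividing one timedelta by another gives a plain float. This is a minimal sketch with the standard library only; the slopes helper and the sample data are made up for illustration:

```python
from datetime import datetime, timedelta

def slopes(times, values, unit=timedelta(seconds=1)):
    """Slope between consecutive samples, expressed per `unit` of time."""
    result = []
    for i in range(len(times) - 1):
        elapsed = (times[i + 1] - times[i]) / unit  # timedelta / timedelta -> float
        result.append((values[i + 1] - values[i]) / elapsed)
    return result

# Hypothetical, irregularly spaced measurements.
times = [datetime(2015, 1, 1, 0, 0),
         datetime(2015, 1, 1, 0, 30),
         datetime(2015, 1, 1, 1, 30)]
values = [10.0, 11.0, 10.0]

per_hour = slopes(times, values, unit=timedelta(hours=1))
print(per_hour)  # [2.0, -1.0]
```

No reference date is needed here, because only differences between consecutive timestamps enter the calculation.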
In my Django project, I have a form (forms.py) which implements pytz to get current timezone like this:
tz = timezone.get_current_timezone()
and I have passed this value to a form field as an initial value like this:
timezone = forms.CharField(label='Time Zone', initial=tznow)
which gives the field a default value of the current timezone; in my case, it happens to be Asia/Calcutta.
Now I want to find the UTC offset for the given timezone, which for Asia/Calcutta is +5:30.
I tried the tzinfo() method as well, but I couldn't get the expected result. Can somebody guide me through this?
The UTC offset is given as a timedelta by the utcoffset method of any implementation of tzinfo such as pytz. For example:
import pytz
import datetime
tz = pytz.timezone('Asia/Calcutta')
dt = datetime.datetime.utcnow()
offset_seconds = tz.utcoffset(dt).total_seconds()
offset_hours = offset_seconds / 3600.0
print("{:+d}:{:02d}".format(int(offset_hours), int((offset_hours % 1) * 60)))
# +5:30
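The float-based formatting above works for this case, but it can lose the sign for offsets between -1 and 0 hours. As a sketch of an alternative, the helper below (format_utc_offset is a made-up name, not part of pytz) formats any timedelta with integer arithmetic only:

```python
from datetime import timedelta

def format_utc_offset(offset):
    """Format a timedelta UTC offset as a signed H:MM string."""
    total_seconds = int(offset.total_seconds())
    sign = "-" if total_seconds < 0 else "+"
    hours, remainder = divmod(abs(total_seconds), 3600)
    minutes = remainder // 60
    return "{}{}:{:02d}".format(sign, hours, minutes)

print(format_utc_offset(timedelta(hours=5, minutes=30)))    # +5:30
print(format_utc_offset(timedelta(hours=-3, minutes=-30)))  # -3:30
print(format_utc_offset(timedelta(minutes=-30)))            # -0:30
```

It accepts the timedelta returned by utcoffset() directly, so it could replace the last three lines of the example above.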
A single timezone such as Asia/Calcutta may have different utc offsets at different dates. You can enumerate the utc offsets known so far using pytz's _tzinfos in this case:
>>> offsets = {off for off, dst, abbr in pytz.timezone('Asia/Calcutta')._tzinfos}
>>> for utc_offset in offsets:
... print(utc_offset)
...
5:30:00
6:30:00
5:53:00
To get the current utc offset for a given timezone:
#!/usr/bin/env python
from datetime import datetime
import pytz # $ pip install pytz
utc_offset = datetime.now(pytz.timezone('Asia/Calcutta')).utcoffset()
print(utc_offset)
# -> 5:30:00
In case you just want the normalized hour offset:
from datetime import datetime
import pytz
def curr_calcutta_offset():
    tz_calcutta = pytz.timezone('Asia/Calcutta')
    offset = tz_calcutta.utcoffset(datetime.utcnow())
    offset_seconds = (offset.days * 86400) + offset.seconds
    offset_hours = offset_seconds / 3600.0 # float division, also on Python 2
    return offset_hours
curr_calcutta_offset()
# 5.5
I am importing data from a JSON file and it has the date in the following format 1/7/11 9:15
What would be the best variable type/format to define in order to accept this date as it is? If not what would be the most efficient way to accomplish this task?
Thanks.
"What would be the best variable type/format to define in order to accept this date as it is?"
The DateTimeField.
"If not what would be the most efficient way to accomplish this task?"
You should use the datetime.strptime method from Python's builtin datetime library:
>>> from datetime import datetime
>>> import json
>>> json_datetime = '"1/7/11 9:15"' # still encoded as JSON (note the inner double quotes)
>>> py_datetime = json.loads(json_datetime) # now decoded to a Python string
>>> datetime.strptime(py_datetime, "%m/%d/%y %H:%M") # coerced into a datetime object
datetime.datetime(2011, 1, 7, 9, 15)
# Now you can save this object to a DateTimeField in a Django model.
If you take a look at https://docs.djangoproject.com/en/dev/ref/models/fields/#datetimefield, it says that Django uses the Python datetime library, which is documented at http://docs.python.org/2/library/datetime.html.
Here is a working example (with many debug prints and step-by-step instructions):
from datetime import datetime
json_datetime = "1/7/11 9:15"
json_date, json_time = json_datetime.split(" ")
print(json_date)
print(json_time)
day, month, year = map(int, json_date.split("/")) # maps each string in the list resulting from split to an int
year = 2000 + year # be careful here! 2 digits for a year may cause trouble!!! (could be 1911 as well)
hours, minutes = map(int, json_time.split(":"))
print(day)
print(month)
print(year)
my_datetime = datetime(year, month, day, hours, minutes)
print(my_datetime)
# Generate a json date:
new_json_style = "{0}/{1}/{2} {3}:{4}".format(my_datetime.day, my_datetime.month, my_datetime.year, my_datetime.hour, my_datetime.minute)
print(new_json_style)
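The same parsing can also be sketched with strptime, which makes the two ambiguities explicit: whether the day or the month comes first, and how the two-digit year is expanded. The to_source_style helper is a made-up name, and which field order is correct depends on the data source, so both readings are shown:

```python
from datetime import datetime

raw = "1/7/11 9:15"

# The same string read with the two possible field orders.
as_month_first = datetime.strptime(raw, "%m/%d/%y %H:%M")  # 7 January 2011
as_day_first = datetime.strptime(raw, "%d/%m/%y %H:%M")    # 1 July 2011

def to_source_style(dt):
    """Format month-first, without zero padding, with a two-digit year."""
    return "{}/{}/{:02d} {}:{:02d}".format(
        dt.month, dt.day, dt.year % 100, dt.hour, dt.minute)

print(as_month_first)                   # 2011-01-07 09:15:00
print(as_day_first)                     # 2011-07-01 09:15:00
print(to_source_style(as_month_first))  # 1/7/11 9:15
```

With %y, Python maps the values 69 to 99 to 1969 to 1999 and 0 to 68 to 2000 to 2068, so a date such as 1/7/69 would be read as 1969 rather than 2069.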