ValueError while importing data into postgres table using psycopg2 - django

I have a tuple as below
data = ({'weather station name': 'Substation', 'wind': '6 km/h', 'barometer': '1010.3hPa', 'humidity': '42%', 'temperature': '34.5 C', 'place_id': '001D0A00B36E', 'date': '2016-05-10 09:48:58'})
I am trying to push the values from the above tuple to the postgres table using the code below:
try:
    con = psycopg2.connect("dbname='WeatherForecast' user='postgres' host='localhost' password='****'")
    cur = con.cursor()
    cur.executemany("""INSERT INTO weather_data(temperature,humidity,wind,barometer,updated_on,place_id) VALUES (%(temperature)f, %(humidity)f, %(wind)f, %(barometer)f, %(date)s, %(place_id)d)""", final_weather_data)
    ver = cur.fetchone()
    print(ver)
except psycopg2.DatabaseError as e:
    print('Error {}'.format(e))
    sys.exit(1)
finally:
    if con:
        con.close()
The datatype of each field in the DB is as follows:
id serial NOT NULL,
temperature double precision NOT NULL,
humidity double precision NOT NULL,
wind double precision NOT NULL,
barometer double precision NOT NULL,
updated_on timestamp with time zone NOT NULL,
place_id integer NOT NULL,
When I run the code to push the data into the postgres table using psycopg2, it raises the error "ValueError: unsupported format character 'f'".
I suspect the issue is in the formatting. I am using Python 3.4.

Have a look at the documentation:
The variables placeholder must always be a %s, even if a different placeholder (such as a %d for integers or %f for floats) may look more appropriate:
>>> cur.execute("INSERT INTO numbers VALUES (%d)", (42,)) # WRONG
>>> cur.execute("INSERT INTO numbers VALUES (%s)", (42,)) # correct
while your SQL query contains all types of placeholders:
"""INSERT INTO weather_data(temperature,humidity,wind,barometer,updated_on,place_id)
VALUES (%(temperature)f, %(humidity)f, %(wind)f, %(barometer)f, %(date)s, %(place_id)d)"""
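A minimal sketch of the corrected statement, with every placeholder switched to %s (this assumes final_weather_data is a sequence of dicts like the one shown above; note that values such as '42%' or '6 km/h' would still need to be parsed into plain numbers before they can go into the double precision columns):
cur.executemany(
    """INSERT INTO weather_data(temperature, humidity, wind, barometer, updated_on, place_id)
       VALUES (%(temperature)s, %(humidity)s, %(wind)s, %(barometer)s, %(date)s, %(place_id)s)""",
    final_weather_data)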

Related

Serialize pandas dataframe containing NaN fields before sending as a response

I have a dataframe that has NaN fields in it, and I want to send this dataframe as a response. Because it has NaN fields I get this error:
ValueError: Out of range float values are not JSON compliant
I don't want to drop the fields or fill them with a placeholder character, and the default response structure is ideal for my application.
Here is my views.py
...
forecast = forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
forecast['actual_value'] = df['y']  # <- NaN fields are added here
forecast.rename(
    columns={
        'ds': 'date',
        'yhat': 'predictions',
        'yhat_lower': 'lower_bound',
        'yhat_upper': 'higher_bound'
    }, inplace=True
)
context = {
    'detail': forecast
}
return Response(context)
Dataframe,
date predictions lower_bound higher_bound actual_value
0 2022-07-23 06:31:41.362011 3.832143 -3.256209 10.358063 1.0
1 2022-07-23 06:31:50.437211 4.169004 -2.903518 10.566005 7.0
2 2022-07-28 14:20:05.000000 12.085815 5.267806 18.270929 20.0
...
16 2022-08-09 15:07:23.000000 105.655997 99.017424 112.419991 NaN
17 2022-08-10 15:07:23.000000 115.347283 108.526287 122.152684 NaN
Hoping to find a way to send dataframe as a response.
You could use the fillna and replace methods to get rid of those NaN values. Adding something like this should work, since None values are JSON compliant:
forecast = forecast.fillna(np.nan).replace([np.nan], [None])
Using replace alone can be enough, but applying fillna first prevents errors if you also have NaT values, for example.
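A self-contained sketch of what that conversion does (the column name here is invented for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({'actual_value': [1.0, np.nan]})
df = df.fillna(np.nan).replace([np.nan], [None])
print(df['actual_value'].tolist())  # [1.0, None] -- and None serializes as JSON null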

How to load Python Dataframe to cloudera Impala?

I am getting an error while loading dataframe data into an Impala table.
DB = conn.cursor()
for row in fourth_set:
    SQL = ('''Insert into Boots_retailer(sale_date, product, Assessment, weekno, store_Number, volume, turnover, turnover_missing, Inv_Cubic, XGB, KNN)
              values(?,?,?,?,?,?,?,?,?,?,?)''')
    Values = (row['Sale_date'], row['product'], row['Assessment'], row['weekno'], row['store_number'],
              row['volume'], row['turnover'], row['turnover_missing'], row['Inv_Cubic'], row['XGB'], row['KNN'])
    har = DB.execute(SQL, Values)
connection.commit()
Error is on line Values = row['Sale_date'], ...:
TypeError: string indices must be integers, not str
You're getting the TypeError because iterating over a DataFrame yields its column labels (strings), not its rows. Replace your loop header with
for _, row in fourth_set.iterrows():
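Put together, a hedged sketch of the corrected loop (assuming fourth_set is a pandas DataFrame with those columns and conn is the open connection):
for _, row in fourth_set.iterrows():
    Values = (row['Sale_date'], row['product'], row['Assessment'], row['weekno'],
              row['store_number'], row['volume'], row['turnover'],
              row['turnover_missing'], row['Inv_Cubic'], row['XGB'], row['KNN'])
    DB.execute(SQL, Values)
conn.commit()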

Reading Time Series from netCDF with python

I'm trying to create a time series from a netCDF file (accessed via a THREDDS server) with Python. The code I use seems correct, but the values of the variable I am reading come back 'masked'. I'm new to Python and I'm not familiar with the formats. Any idea how I can read the data?
This is the code I use:
import netCDF4
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
dayFile = datetime.now() - timedelta(days=1)
dayFile = dayFile.strftime("%Y%m%d")
url='http://nomads.ncep.noaa.gov:9090/dods/nam/nam%s/nam1hr_00z' %(dayFile)
# NetCDF4-Python can open OPeNDAP dataset just like a local NetCDF file
nc = netCDF4.Dataset(url)
varsInFile = nc.variables.keys()
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)
first = netCDF4.num2date(time_var[0],time_var.units)
last = netCDF4.num2date(time_var[-1],time_var.units)
print first.strftime('%Y-%b-%d %H:%M')
print last.strftime('%Y-%b-%d %H:%M')
# determine what longitude convention is being used
print lon.min(),lon.max()
# Specify desired station time series location
# note we add 360 because of the lon convention in this dataset
#lati = 36.605; loni = -121.85899 + 360. # west of Pacific Grove, CA
lati = 41.4; loni = -100.8 +360.0 # Georges Bank
# Function to find index to nearest point
def near(array, value):
    idx = (abs(array - value)).argmin()
    return idx
# Find nearest point to desired location (no interpolation)
ix = near(lon, loni)
iy = near(lat, lati)
print ix,iy
# Extract desired times.
# 1. Select -+some days around the current time:
start = netCDF4.num2date(time_var[0],time_var.units)
stop = netCDF4.num2date(time_var[-1],time_var.units)
time_var = nc.variables['time']
datetime = netCDF4.num2date(time_var[:],time_var.units)
istart = netCDF4.date2index(start,time_var,select='nearest')
istop = netCDF4.date2index(stop,time_var,select='nearest')
print istart,istop
# Get all time records of variable [vname] at indices [iy,ix]
vname = 'dswrfsfc'
var = nc.variables[vname]
hs = var[istart:istop,iy,ix]
tim = dtime[istart:istop]
# Create Pandas time series object
ts = pd.Series(hs,index=tim,name=vname)
The var data are not read as I expected, apparently because data is masked:
>>> hs
masked_array(data = [-- -- -- ..., -- -- --],
mask = [ True True True ..., True True True],
fill_value = 9.999e+20)
The var name and the time series are correct, as is the rest of the script. The only thing that doesn't work is the var data retrieved. This is the time series I get:
>>> ts
2016-10-25 00:00:00.000000 NaN
2016-10-25 01:00:00.000000 NaN
2016-10-25 02:00:00.000006 NaN
2016-10-25 03:00:00.000000 NaN
2016-10-25 04:00:00.000000 NaN
... ... ... ... ...
2016-10-26 10:00:00.000000 NaN
2016-10-26 11:00:00.000006 NaN
Name: dswrfsfc, dtype: float32
Any help will be appreciated!
Hmm, this code looks familiar. ;-)
You are getting NaNs because the NAM model you are trying to access now uses longitude in the range [-180, 180] instead of the range [0, 360]. So if you request loni = -100.8 instead of loni = -100.8 +360.0, I believe your code will return non-NaN values.
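In the original script that is a one-line change (hedged, since the dataset's longitude convention could change again):
lati = 41.4; loni = -100.8  # no +360.0 shift; NAM longitudes are now in [-180, 180]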
It's worth noting, however, that the task of extracting time series from multidimensional gridded data is now much easier with xarray, because you can simply select a dataset closest to a lon,lat point and then plot any variable. The data only gets loaded when you need it, not when you extract the dataset object. So basically you now only need:
import xarray as xr
ds = xr.open_dataset(url) # NetCDF or OPeNDAP URL
lati = 41.4; loni = -100.8 # Georges Bank
# Extract a dataset closest to specified point
dsloc = ds.sel(lon=loni, lat=lati, method='nearest')
# select a variable to plot
dsloc['dswrfsfc'].plot()
Full notebook here: http://nbviewer.jupyter.org/gist/rsignell-usgs/d55b37c6253f27c53ef0731b610b81b4
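If you then want the same pandas time series as in the original script, one more line should do it (DataArray.to_series() indexes the values by the time coordinate):
ts = dsloc['dswrfsfc'].to_series()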
I checked your approach with xarray. It works great for extracting solar radiation data! I can add that the first point is not defined (NaN) because the model starts calculating there, so there is no accumulated radiation data yet (needed to calculate hourly global radiation). That is why it is masked.
Something everyone overlooked is that the output is not correct. It looks OK (at noon = sunshine, at midnight = 0, dark), but the day length is not correct! I checked it for 52 latitude north and 5.6 longitude east (November), and the day length is at least 2 hours too long! (The NOAA Panoply viewer for netCDF files gives similar results.)

Unable to insert single row in oracle using cx_Oracle in python

Code snippet:
statement = 'insert into my_Table ("DATE_TS","first_name","Last_name","pet_name","Salary") values (:2, :3, :4, :5, :6)'
cur = conn.cursor()
try:
    cur.execute(statement, ('2016-25-07', 'te', 'ee', 'cd', 21))
    conn.commit()
except Exception as e:
    print e
finally:
    print "Closing Connection"
    conn.close()
I tried this insert with multiple combinations for the date (double quotes, single quotes, without quotes, etc.), but every time an error pops up. Kindly guide me; I have searched for almost 6 hours and am getting nowhere with this.
Try:
statement = 'insert into my_Table ("DATE_TS","first_name","Last_name","pet_name","Salary") values (to_date(:2, \'YYYY-DD-MM\'), :3, :4,:5,:6)'
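Alternatively, a sketch that sidesteps date-format strings entirely: keep the original statement (without to_date) and bind a Python datetime, which cx_Oracle maps to an Oracle DATE. This assumes '2016-25-07' was meant as 25 July 2016:
import datetime
cur.execute(statement, (datetime.datetime(2016, 7, 25), 'te', 'ee', 'cd', 21))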

Updating entire column with data from tuple in PostgreSQL(psycopg2)

First of all, there is my script:
import psycopg2
import sys

data = ((160000,),
        (40000,),
        (75000,),
        )

def main():
    try:
        connection = psycopg2.connect("""host='localhost' dbname='postgres'
                                         user='postgres'""")
        cursor = connection.cursor()
        query = "UPDATE Planes SET Price=%s"
        cursor.executemany(query, data)
        connection.commit()
    except psycopg2.Error, e:
        if connection:
            connection.rollback()
        print 'Error:{0}'.format(e)
    finally:
        if connection:
            connection.close()

if __name__ == '__main__':
    main()
This code runs, of course, but not in the way I want. It updates the entire 'Price' column, which is good, but it uses only the last value of 'data' (75000):
(1, 'Airbus', 75000, 'Public')
(2, 'Helicopter', 75000, 'Private')
(3, 'Falcon', 75000, 'Military')
My desire output would look like:
(1, 'Airbus', 160000, 'Public')
(2, 'Helicopter', 40000, 'Private')
(3, 'Falcon', 75000, 'Military')
Now, how can I fix it?
Without setting up your database on my machine to debug, I can't be sure, but it appears that the query is the issue. When you execute
UPDATE Planes SET Price=%s
I would think it is updating the entire column with each value being iterated from your data tuple, so only the last one sticks. Instead, you might need to convert the tuple to a sequence of dictionaries
({'name':'Airbus', 'price':160000}, {'name':'Helicopter', 'price':40000}...)
and change the query to
"""UPDATE Planes SET Price=%(price)s WHERE Name=%(name)s""".
See the very bottom of this article for a similar formulation. To check that this is indeed the issue, you could execute the query just once with a single value (cursor.execute(query, data[0])) and I bet you will get the full Price column filled with that value.
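Put together, a hedged sketch of the fix (the plane names are taken from the rows shown above, and the Name column is assumed to identify each row uniquely):
data = ({'name': 'Airbus', 'price': 160000},
        {'name': 'Helicopter', 'price': 40000},
        {'name': 'Falcon', 'price': 75000})
query = """UPDATE Planes SET Price=%(price)s WHERE Name=%(name)s"""
cursor.executemany(query, data)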