I want to parse a NetCDF file using netCDF4 and Python.
My code is:
>>> from netCDF4 import Dataset
>>> dataset = Dataset('data.nc')
>>> print dataset.variables
OrderedDict()
Why is an empty OrderedDict() returned?
Actually the NetCDF format is new to me; here is part of the file:
group: PRODUCT {
  dimensions:
    scanline = 289 ;
    ground_pixel = 215 ;
    corner = 4 ;
    time = 1 ;
    layer = 50 ;
  variables:
    int scanline(scanline) ;
      scanline:units = "1" ;
So I want to access the variables and have tried everything I could think of, but it all failed.
One of my attempts was:
>>> print dataset.variables.keys()
[]
But it only returned an empty list.
So, any idea how to access these variables?
Thanks in advance,
Hala
I found the answer in the netCDF4-python documentation: http://unidata.github.io/netcdf4-python/#netCDF4.Dataset.renameGroup
The answer is:
print dataset["PRODUCT"].variables['ground_pixel'][0]
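The variables live inside the PRODUCT group rather than at the root of the dataset, which is why dataset.variables at the top level is empty. A minimal sketch of listing and reading them (assuming the same data.nc as above):
from netCDF4 import Dataset

dataset = Dataset('data.nc')           # open the file read-only
product = dataset.groups['PRODUCT']    # descend into the PRODUCT group

# list the variable names defined inside the group
print product.variables.keys()

# read a whole variable as an array
scanline = product.variables['scanline'][:]
print scanline

dataset.close()
Indexing the dataset as dataset['PRODUCT'], as in the line above, is equivalent to dataset.groups['PRODUCT'].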
Have a nice day
How can I filter 12 random objects from a model in Django?
I tried to do this, but it does not work; it just returned one object.
max = product.objects.aggregate(id=Max('id'))
max_p = int(max['id'])
l = []
for s in range(1, 13):
    l.append(random.randint(1, max_p))
for i in l:
    great_proposal = product.objects.filter(id=i)
Update: it worked with this code!
products = product.objects.all().order_by('-id')[:50]
great_proposal1 = random.sample(list(products), 12)
Try this:
product.objects.order_by('?')[:12]
The '?' will "sort" randomly and [:12] will take only the first 12 objects.
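As a usage sketch, this could sit directly in a view (the view name and template are made up for illustration; only the product model comes from the question):
from django.shortcuts import render

def great_proposals(request):
    proposals = product.objects.order_by('?')[:12]  # 12 random rows
    return render(request, 'proposals.html', {'proposals': proposals})
Note that order_by('?') pushes the shuffling down to the database and can be slow on large tables.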
I'm pretty sure your code is correct, but maybe you did not realize that you're using great_proposal as the variable that holds the output: it is overwritten on every loop iteration instead of being collected into a list, and therefore you only end up with one result.
Try:
result_array = []
for i in l:
    result_array.append(product.objects.filter(id=i))
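A single-query alternative under the same assumptions (l being the list of random ids from the question): the __in lookup fetches everything at once, although duplicates in l or ids of deleted rows will simply yield fewer than 12 objects.
# one query instead of twelve; returns a QuerySet
great_proposals = product.objects.filter(id__in=l)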
I am new to Python and I am still learning how it all works; it has been just a week since I started.
I am trying to write a program which does this:
Reads 4 columns from a file (see the input file below)
Gets the date, day and count from the file
Constructs a dictionary to represent date, day and count.
Basically I want to represent the data in something like the structure below, and I am stuck on the syntax.
{
  "xyz": {
    "Sunday": {
      "20180101": 72326,
      "20180108": 71120
    },
    "Monday": {
      "20171225": 51954,
      "20180102": 51954
    }
  }
}
INPUT FILE:
DateDay value count floatex
20171225Monday | 270613| 51954|11.41|
20171226Tuesday | 133579| 46126|12.01|
20171227Wednesday| 630613| 71954|11.41|
20171228Thursday | 253779| 96126|12.01|
20171229Friday | 688613| 71054|11.41|
20171230Saturday | 633779| 66126|12.01|
20180101Sunday | 633779| 72326|12.01|
20180102Monday | 630613| 91954|11.41|
20180103Tuesday | 538779| 73326|12.01|
20180104Wednesday| 630613| 61954|11.41|
20180105Thursday | 393379| 75146|12.01|
20180106Friday | 130613| 51954|11.41|
20180107Saturday | 2643329| 70126|12.01|
20180108Sunday | 863979| 71120|12.01|
This is what I have so far, but it is far from what I want; in fact it is throwing an error now, but that is not my question. Basically I am trying to understand how to create the nested dictionary based on the input data.
def buildInputDataDictionary(file, ind):
    dateCount = {}
    dateDay = {}
    #dictData = {}
    # dateCount[dictData] = {}
    with open(file) as f:
        for line in f:
            items = line.split("|")
            date = items[0].strip()[0:8]  # strip spaces and substring to get only the date
            count = items[2].strip()
            day = items[0].strip()[8:]
            dateCount[date] = count
            dateDay[date] = day
            dictData = {}
            dictData[date] = {}
            dictData[ind][date] = count
    return dateCount, dateDay, dictData

dc, dd, di = buildInputDataDictionary(autoInqRhf, "xyz")
print dd
print dc
print di
In your current script you reset the dictionary on every pass through the for loop; you only need to initialise it once, outside the loop. When adding nested data you also have to make sure that all keys above the one you want to set already exist. You can do it like this:
# initiate dictionary with your identifier as first object
dictData = {ind: dict()}
with open(file) as f:
    for line in f:
        # extract your data (haven't tested your code)
        items = line.split("|")
        day = items[0].strip()[8:]    # the day name follows the 8-digit date
        date = items[0].strip()[0:8]
        count = items[2].strip()
        # add days
        if day not in dictData[ind]:
            dictData[ind][day] = dict()
        dictData[ind][day][date] = count
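An equivalent, slightly more compact variant of the same idea uses dict.setdefault to create the missing inner dictionary on demand (same assumptions as above: file is the path and ind is the outer key, e.g. "xyz"):
dictData = {ind: {}}
with open(file) as f:
    for line in f:
        items = line.split("|")
        stamp = items[0].strip()          # e.g. "20171225Monday"
        date, day = stamp[:8], stamp[8:]
        count = items[2].strip()
        # setdefault returns the existing inner dict or inserts an empty one
        dictData[ind].setdefault(day, {})[date] = count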
I need to figure out how to read in the data from the file 'berlin52.tsp'.
This is the format it uses:
NAME: berlin52
TYPE: TSP
COMMENT: 52 locations in Berlin (Groetschel)
DIMENSION : 52
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 565.0 575.0
2 25.0 185.0
3 345.0 750.0
4 945.0 685.0
5 845.0 655.0
6 880.0 660.0
7 25.0 230.0
8 525.0 1000.0
9 580.0 1175.0
10 650.0 1130.0
And this is my current code
# Open input file
infile = open('berlin52.tsp', 'r')

# Read instance header
Name = infile.readline().strip().split()[1]  # NAME
FileType = infile.readline().strip().split()[1]  # TYPE
Comment = infile.readline().strip().split()[1]  # COMMENT
Dimension = infile.readline().strip().split()[1]  # DIMENSION
EdgeWeightType = infile.readline().strip().split()[1]  # EDGE_WEIGHT_TYPE
infile.readline()

# Read node list
nodelist = []
N = int(intDimension)
for i in range(0, int(intDimension)):
    x, y = infile.readline().strip().split()[1:]
    nodelist.append([int(x), int(y)])

# Close input file
infile.close()
The code should read in the file and produce a list of the nodes numbered "1, 2, 3, ..." while the x and y values are stored so that distances can be calculated later. It can collect the headers, at least. The problem arises when creating the list of nodes.
This is the error I get though
ValueError: invalid literal for int() with base 10: '565.0'
What am I doing wrong here?
This is a file in TSPLIB format. To load it in Python, take a look at the Python package tsplib95, available through PyPI or on GitHub.
Documentation is available at https://tsplib95.readthedocs.io/
You can convert the TSPLIB file to a networkx graph and retrieve the necessary information from there.
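A rough sketch of that approach (the exact call names depend on the tsplib95 version; load, node_coords and get_graph are assumed to be available, as in recent releases):
import tsplib95

# parse the TSPLIB file
problem = tsplib95.load('berlin52.tsp')

# node coordinates as a dict, e.g. {1: [565.0, 575.0], 2: [25.0, 185.0], ...}
print(problem.node_coords)

# or work with it as a networkx graph, as suggested above
G = problem.get_graph()
print(G.nodes())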
You are feeding the string "565.0" into nodelist.append([int(x), int(y)]).
It is telling you it doesn't like that because that string is not an integer. The .0 at the end makes it a float.
So if you change that to nodelist.append([float(x), float(y)]), as just one possible solution, then you'll see that your problem goes away.
Alternatively, you can try removing or separating the '.0' from your string input.
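Putting that fix together with the loop from the question, a corrected version of the node-reading part might look like this (assuming the dimension has already been parsed into an integer N):
# Read node list: each data line is "<index> <x> <y>"
nodelist = []
for i in range(N):
    x, y = infile.readline().strip().split()[1:]
    nodelist.append([float(x), float(y)])  # the coordinates are floats, not ints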
There are two problems with the code above. I have run it and found the following issues.
The first is in this line:
Dimension = infile.readline().strip().split()[1]
The index should be 2 instead of 1:
Dimension = infile.readline().strip().split()[2]
because the header line is "DIMENSION : 52", so split() gives ['DIMENSION', ':', '52']: index 1 is ':' while index 2 is '52'. Both are of string type.
The second problem is with the line
N = int(intDimension)
It should be
N = int(Dimension)
And lastly, in the line
for i in range(0, int(intDimension)):
just simply use
for i in range(0, N):
Now everything should be alright, I think.
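A more robust way to read the headers, which sidesteps the index counting entirely, is to split on the colon instead of on whitespace (a sketch, assuming every header line has the form KEY: value or KEY : value as in the sample above):
infile = open('berlin52.tsp', 'r')

Name = infile.readline().split(':')[1].strip()            # NAME
FileType = infile.readline().split(':')[1].strip()        # TYPE
Comment = infile.readline().split(':')[1].strip()         # COMMENT
Dimension = int(infile.readline().split(':')[1].strip())  # DIMENSION
EdgeWeightType = infile.readline().split(':')[1].strip()  # EDGE_WEIGHT_TYPE
infile.readline()                                         # NODE_COORD_SECTION

N = Dimension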
The problem is in
nodelist.append([int(x), int(y)])
The function int() can't convert x (the string "565.0") to an int because of the ".".
Add
x = x[:len(x)-2]
y = y[:len(y)-2]
before the append to remove the trailing ".0" (x[:-2] is an equivalent, shorter spelling).
I'm pulling data from Impala using impyla and converting it to a dataframe using as_pandas. I'm using pandas 0.18.0 and Python 2.7.9.
I'm trying to calculate the sum of each column in the dataframe and to select the columns whose sum is greater than a threshold.
self.data = self.data.loc[:,self.data.sum(axis=0) > 15]
But when I run this I get the error below:
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
Then I tried the following:
print 'length : ', len(self.data.sum(axis=0)), ' all columns : ', len(self.data.columns)
and got different lengths, i.e.
length : 78 all columns : 83
I also get the warning below:
C:\Python27\lib\decimal.py:1150: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
To achieve my goal I tried another way:
for column in self.data.columns:
    sum = self.data[column].sum()
    if sum < 15:
        self.data = self.data.drop(column, 1)
Now I get other errors, like those below:
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
C:\Python27\lib\decimal.py:1150: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
Then I tried to get the data type of each column:
print 'dtypes : ', self.data.dtypes
The result shows that every column is one of int64, object and float64.
Then I thought of converting the columns that are object to numeric types:
self.data.convert_objects(convert_numeric=True)
Still I'm getting the same errors. Please help me solve this.
Note: none of the columns contains strings (i.e. characters), missing values or empty cells; I have checked this using self.data.to_csv.
As I'm new to pandas and Python, please don't mind if this is a silly question. I just want to learn.
Please review the simple code below; it should help you understand the reason for the error.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.random([3,3]))
df.iloc[0,0] = np.nan
print df
print df.sum(axis=0) > 1.5
print df.loc[:, df.sum(axis=0) > 1.5]
df.iloc[0,0] = 'string'
print df
print df.sum(axis=0) > 1.5
print df.loc[:, df.sum(axis=0) > 1.5]
0 1 2
0 NaN 0.336250 0.801349
1 0.930947 0.803907 0.139484
2 0.826946 0.229269 0.367627
0 True
1 False
2 False
dtype: bool
0
0 NaN
1 0.930947
2 0.826946
0 1 2
0 string 0.336250 0.801349
1 0.930947 0.803907 0.139484
2 0.826946 0.229269 0.367627
1 False
2 False
dtype: bool
Traceback (most recent call last):
...
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
In short, you need additional preprocessing of your data. You can find the problematic columns with
df.select_dtypes(include=['object'])
If they hold convertible number strings you can convert them with df.astype(); otherwise you should purge them.
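Since the question says the data comes from Impala, where numeric columns often arrive as Decimal objects, one way to preprocess (a sketch, not tested against impyla output) is to push every object column through pd.to_numeric, coercing anything unconvertible to NaN:
import pandas as pd

# convert object columns (e.g. Decimal values from impyla) to plain floats
for col in self.data.select_dtypes(include=['object']).columns:
    self.data[col] = pd.to_numeric(self.data[col], errors='coerce')

# now the column sums are all numeric and the boolean mask aligns
self.data = self.data.loc[:, self.data.sum(axis=0) > 15]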
I have 4 lists such as:
m1_jan = [2.3,3.2]
m1_feb = [3.2,2.3]
m2_jan = [1.2,1.7]
m2_feb = [4.5,6.7]
and I want to get the minimum value of each list. I tried the following:
mon = ['jan','feb']
for i in xrange(1,3,1):
    for j in mon:
        l = 'm' + str(i) + '_' + j
        print l, min(l)
I get the list names right, but I am not getting the correct minimum values; instead I get the following:
m1_jan 1
m1_feb 1
m2_jan 2
m2_feb 2
Any suggestions on how to get the minimum value of each list?
If we change:
print l, min(l)
to:
print globals()[l], min(globals()[l])
The output will be as requested:
[2.3, 3.2] 2.3
[3.2, 2.3] 2.3
[1.2, 1.7] 1.2
[4.5, 6.7] 4.5
Explanation:
The lists you're looking for are stored in the dictionary returned by globals(), so globals()[l] looks up the list whose name is the string l. (In your code, min(l) was taking the minimum character of the name string itself, which is why you saw 1 and 2.)
That said, it is better practice to store these lists in your own dictionary and access them through it, instead of relying on globals().
You could just use a dictionary:
d = {}
d['m1_jan'] = m1_jan
d['m1_feb'] = m1_feb
d['m2_jan'] = m2_jan
d['m2_feb'] = m2_feb
for name, values in d.items():
    print("{} {}".format(name, min(values)))
Output
m1_feb 2.3
m2_feb 4.5
m2_jan 1.2
m1_jan 2.3