I need to run a query like this:
historic_data.objects.raw("select * from company_historic_data")
This returns a RawQuerySet. I have to convert the values from this into a DataFrame. The usual .values() method does not work with a raw query. Can someone suggest a solution?
Try the code below:
import pandas as pd
res = model.objects.raw('select * from some_table;')
df = pd.DataFrame([item.__dict__ for item in res])
Note that there is a _state column in the returned DataFrame.
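If you don't want that internal Django bookkeeping column in the result, you can drop it after building the DataFrame; a minimal sketch following the code above:
import pandas as pd

res = model.objects.raw('select * from some_table;')
df = pd.DataFrame([item.__dict__ for item in res])
# __dict__ includes Django's internal _state attribute; drop it
df = df.drop(columns=['_state'])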
How can I perform this SQL query
SELECT *
FROM table_name
WHERE column_name != value
in the QuerySet of Django?
I tried this, but it wasn't the correct way.
To execute raw SQL queries, just use the raw method of the Model, like this:
posts = Post.objects.raw("SELECT * FROM table_name WHERE column_name != %s;", [value])
for post in posts:
    # do stuff with the post object
But I don't think you need a raw query (unless you want to get rid of ORM overhead to fetch records more quickly); you can just use the ORM like this:
posts = Post.objects.all().exclude(column_name=value)
You can do it using Q:
from django.db.models import Q
posts = Post.objects.filter(~Q(column_name=value))
I think you want to do this query using the Django ORM (if I am not wrong). You can do this using a Q expression in Django (https://docs.djangoproject.com/en/4.0/topics/db/queries/#s-complex-lookups-with-q-objects). For != you can use the ~ sign. In your case, the query will look like this:
Post.objects.filter(~Q(<column_name>=<value>))
Another way is to use the exclude method (https://docs.djangoproject.com/en/4.0/ref/models/querysets/#s-exclude):
Post.objects.exclude(<column_name>=<value>)
Both queries generate the same raw query in your case:
SELECT * FROM <table_name> WHERE NOT (<column_name> = <value>)
If you want to do a raw query, then you can use the raw method (https://docs.djangoproject.com/en/4.0/topics/db/sql/):
posts = Post.objects.raw("SELECT * FROM table_name WHERE column_name != %s;", [value])
If you want to execute a custom raw query directly, then use a cursor from django.db's connection (https://docs.djangoproject.com/en/4.0/topics/db/sql#s-executing-custom-sql-directly).
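A minimal sketch of that approach, assuming the same table_name, column_name, and value as above:
from django.db import connection

with connection.cursor() as cursor:
    # Parameters are passed separately so the driver handles escaping
    cursor.execute("SELECT * FROM table_name WHERE column_name != %s", [value])
    rows = cursor.fetchall()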
I am very new to Django, but facing quite a daunting task already.
I need to create multiple forms like this on the webpage, where the user would provide input (only floating-point numbers allowed), and then convert these inputs to a pandas DataFrame to do data analysis. I would highly appreciate it if you could advise how I should go about doing this.
Form needed:
This is a very broad question, and I am assuming you are familiar with pandas and Python. There might be a more efficient way, but this is how I would do it. It should not be that difficult: have the user submit the form, then import pandas in your view and create an initial DataFrame. Then you can get the form data using something like this:
if form.is_valid():
    field1 = form.cleaned_data['field1']
    field2 = form.cleaned_data['field2']
    field3 = form.cleaned_data['field3']
    field4 = form.cleaned_data['field4']
You can then create a new DataFrame like so:
df2 = pd.DataFrame([[field1, field2], [field3, field4]], columns=list('AB'))
then append the second DataFrame to the first like so (note that append returns a new DataFrame rather than modifying df in place):
df = df.append(df2)
Keep iterating over the data in this fashion until you have added all of it. After it has all been appended, you can do your analysis and whatever else you like. Note that you can append more than two-by-two chunks of data; that's just an example.
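One caveat: DataFrame.append was deprecated and then removed in pandas 2.0, so on recent pandas versions the equivalent step would use pd.concat; a minimal sketch with the df and df2 from above:
import pandas as pd

# pd.concat replaces the removed DataFrame.append
df = pd.concat([df, df2], ignore_index=True)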
Pandas append docs:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html
Django forms docs:
https://docs.djangoproject.com/en/2.0/topics/forms/
The docs are your friend.
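As for restricting input to floating-point numbers, Django's forms.FloatField handles that validation for you; a minimal sketch, with hypothetical field names matching the view code above:
from django import forms

class DataForm(forms.Form):
    # FloatField rejects any non-numeric input during validation
    field1 = forms.FloatField()
    field2 = forms.FloatField()
    field3 = forms.FloatField()
    field4 = forms.FloatField()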
This is my query:
SELECT * FROM `music` WHERE lower(music.name) = "hello"
How can I send this query with Django?
I tried this, but it didn't add lower() to the query:
>>> Music.objects.filter(name__iexact="hello")
(0.144) SELECT `music`.`id`, `music`.`name`, `music`.`artist`, `music`.`image`, `music`.`duration`, `music`.`release_date`, `music`.`is_persian` FROM `music` WHERE `music`.`name` LIKE 'hello' LIMIT 21; args=('hello',)
<QuerySet []>
You can use the Lower database function, as below.
>>> from django.db.models.functions import Lower
>>> lower_name_music = Music.objects.annotate(lower_name=Lower('name'))
>>> lower_name_music.filter(lower_name__iexact="hello")
The first statement imports the database function. The second statement adds a calculated column named lower_name, using the Lower function on the name column; at this point the database has not yet been queried. The third statement filters using the calculated column; as this statement prints out a result, a query is actually executed against the database.
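If you need this often, Django also lets you register Lower as a lookup on the field class, so the annotate step can be skipped; a sketch of that approach (after registration, the lower lookup becomes available on any CharField):
from django.db.models import CharField
from django.db.models.functions import Lower

# Register Lower as a reusable transform on all CharFields
CharField.register_lookup(Lower)

Music.objects.filter(name__lower="hello")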
I am retrieving the data from Neo4j using the Bolt driver in Python. The returned result should be stored as an RDD (or at least in a CSV). I am able to see the returned results, but I am unable to store them as an RDD, a DataFrame, or at least a CSV.
Here is how I am seeing the result:
session = driver.session()
result = session.run('MATCH (n) RETURN n.hobby,id(n)')
session.close()
How can I store this data in an RDD or a CSV file?
I deleted the old post and reposted the same question, but I haven't received any pointers. So I am posting my approach, in the hope that it may help others.
from pandas import DataFrame
from pyspark.sql import SQLContext

'''
Storing the returned result into an RDD
'''
session = driver.session()
result = session.run('MATCH (n:Hobby) RETURN n.hobby AS hobby, id(n) AS id LIMIT 10')
session.close()
'''
Pulling the keys
'''
keys = result.peek().keys()
'''
Reading all the property values and storing them in a list
'''
values = list()
for record in result:
    rec = list()
    for key in keys:
        rec.append(record[key])
    values.append(rec)
'''
Converting the list of values into a pandas DataFrame
'''
df = DataFrame(values, columns=keys)
print(df)
'''
Converting the pandas DataFrame to a Spark DataFrame
'''
sqlCtx = SQLContext(sc)  # assumes an existing SparkContext named sc
spark_df = sqlCtx.createDataFrame(df)
spark_df.show()
'''
Converting the pandas DataFrame to a Spark RDD (via Spark DataFrames)
'''
rdd = spark_df.rdd.map(tuple)
print(rdd.take(10))
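If a plain CSV file is all you need, the intermediate pandas DataFrame built above can be written out directly, without going through Spark at all; a minimal sketch (the filename is hypothetical):
# Write the pandas DataFrame from the steps above to disk
df.to_csv('hobby_data.csv', index=False)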
Any suggestions to improve the efficiency are highly appreciated.
Instead of going from Python to Spark, why not use the Neo4j Spark connector? I think this would save Python from being a bottleneck if you were moving a lot of data. You can put your Cypher query inside of the Spark session and save the result as an RDD.
There has been talk on the Neo4j Slack group about a PySpark implementation, which will hopefully be available later this fall. I know the ability to query Neo4j from PySpark and SparkR would be very useful.
I have written a UDF in PySpark like below:
df1 = df.where(point_inside_polygon(latitude, longitude, polygonArr))
df1 and df are Spark DataFrames.
The function is given below:
import math
from shapely.geometry import Point, MultiPoint

def point_inside_polygon(x, y, poly):
    latt = float(x)
    long = float(y)
    # Only test containment when both coordinates are valid numbers
    if not (math.isnan(latt) or math.isnan(long)):
        point = Point(latt, long)
        polygon = MultiPoint(poly).convex_hull
        return polygon.contains(point)
    return False
But when I tried checking the data type of latitude and longitude, each is of class Column (the data type is Column).
Is there a way to iterate through each tuple and use its values, instead of getting the Column type?
I don't want to use a for loop, because I have a huge record set and that defeats the purpose of using Spark.
Is there a way to pass the column values as floats, or to convert them inside the function?
Wrap it using udf:
from pyspark.sql.types import BooleanType
from pyspark.sql.functions import udf
point_inside_polygon_ = udf(point_inside_polygon, BooleanType())
df1 = df.where(point_inside_polygon_(latitude, longitude, polygonArr))
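Note that every argument passed to a Spark UDF call must be a Column, so a constant like polygonArr is easier to bind via a closure than to pass in the call itself; a minimal sketch, assuming df has latitude and longitude columns and polygonArr is a plain Python list of (x, y) tuples:
from pyspark.sql.functions import col, udf
from pyspark.sql.types import BooleanType

def make_point_checker(polygon_arr):
    # Bind the constant polygon via a closure so only the
    # per-row values travel through the UDF machinery
    def check(x, y):
        return point_inside_polygon(x, y, polygon_arr)
    return udf(check, BooleanType())

point_checker = make_point_checker(polygonArr)
df1 = df.where(point_checker(col('latitude'), col('longitude')))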