Extracting the hour from a datetime64[ns] variable - python-2.7

I have a column that I've converted to datetime using pandas so it's now in the format datetime64.
LOCAL_TIME_ON
2014-06-21 15:32:09
2014-06-07 20:17:13
I want to extract the hour into a new column. The only thing I've found that works is below; however, it raises a SettingWithCopyWarning. Does anyone have a cleaner way to do this?
TRIP_INFO_TIME['TIME_HOUR'] = pd.DatetimeIndex(TRIP_INFO_TIME['LOCAL_TIME_ON']).hour.astype(int)
C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.1.4\helpers\pydev\pydevconsole.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Try the .dt accessor instead of building a DatetimeIndex:
TRIP_INFO_TIME['TIME_HOUR'] = TRIP_INFO_TIME['LOCAL_TIME_ON'].dt.hour
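The warning usually means TRIP_INFO_TIME was itself created by slicing another DataFrame; writing a new column into such a slice is what pandas is complaining about. A minimal sketch (sample values from the question; the explicit .copy() is one way to avoid the warning):

import pandas as pd

trips = pd.DataFrame({'LOCAL_TIME_ON': ['2014-06-21 15:32:09', '2014-06-07 20:17:13']})
TRIP_INFO_TIME = trips.copy()  # an explicit copy avoids SettingWithCopyWarning
TRIP_INFO_TIME['LOCAL_TIME_ON'] = pd.to_datetime(TRIP_INFO_TIME['LOCAL_TIME_ON'])
TRIP_INFO_TIME['TIME_HOUR'] = TRIP_INFO_TIME['LOCAL_TIME_ON'].dt.hour  # 15, 20

Note that .dt.hour already returns integers, so the astype(int) from the question is not needed.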

Related

How do I conditionally remove text from a string in a column in a Scala dataframe?

I'm currently exploring Azure Databricks for a POC (Scala and Databricks are both completely new to me). I'm using this (Cars - Corgis) sample dataset to show off the manipulation capabilities of Databricks.
My problem is that I have a dataframe column called 'model' that contains data like '2009 Audi A3' and '2005 Mercedes E550'. What I would like to be able to do is alter that column so that, instead of the aforementioned, it reads 'Audi A3' or 'Mercedes E550'. I have a separate model year column, so I'm trying to reduce the size of the columns where possible.
From what I have seen, replaceAllIn works on plain strings, so it doesn't seem to apply to a Scala dataframe.
This is my code so far:
//Use the dataframe from the previous cell and trim the model year from the model column so for example it reads as 'Audi A3' instead of '2009 Audi A3'
import scala.util.matching.Regex
val modelPrefixPatternMatch = "[0-9 ]".r
val newModel = modelPrefixPatternMatch.replaceAllIn((specificColumnsDf.select("model")),"")
However, when I run this code, I get the following error message:
command-1778339999318469:5: error: overloaded method value replaceAllIn with alternatives:
(target: CharSequence,replacer: scala.util.matching.Regex.Match => String)String <and>
(target: CharSequence,replacement: String)String
cannot be applied to (org.apache.spark.sql.DataFrame, String)
val newModel = modelPrefixPatternMatch.replaceAllIn((specificColumnsDf.select("model")),"")
I have also tried the Spark SQL route but didn't have any luck there either.
Thanks!
In Spark you would normally add additional columns using withColumn and then select only the columns you want. In this simple example, I use the regexp_replace function to trim out the years, something like this:
%scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column
df
.withColumn("cleanColumn", regexp_replace($"`Identification.Model Year`", "20[0-2][0-9] ","") )
.select($"`Identification.Model Year`", $"cleanColumn").distinct
.show(false)
We could probably make the regular expression tighter, e.g. tie it to the start of the string or widen it to cover years like 1980 and 1990 - this is just an example.
If the year is always at the start then you could just use substring and start at position 5. The regex approach at least protects against the year being missing for some records.
HTH

How to convert list into DataFrame in Python (Binance Futures API)

Using the Binance Futures API, I am trying to get my cryptocurrency positions in a proper form.
Using the code
from binance_f import RequestClient
request_client = RequestClient(api_key=my_key, secret_key=my_secret_key)
result = request_client.get_position()
I get the following result
[{"symbol":"BTCUSDT","positionAmt":"0.000","entryPrice":"0.00000","markPrice":"5455.13008723","unRealizedProfit":"0.00000000","liquidationPrice":"0","leverage":"20","maxNotionalValue":"5000000","marginType":"cross","isolatedMargin":"0.00000000","isAutoAddMargin":"false"}]
The type command indicates it is a list; however, adding print(result) at the end of the code yields:
[<binance_f.model.position.Position object at 0x1135cb670>]
This is baffling, because it does not look like a list (in fact, debugging indicates an object of type Position). Using PrintMix.print_data(result) yields:
data number 0 :
entryPrice:0.0
isAutoAddMargin:True
isolatedMargin:0.0
json_parse:<function Position.json_parse at 0x1165af820>
leverage:20.0
liquidationPrice:0.0
marginType:cross
markPrice:5442.28502271
maxNotionalValue:5000000.0
positionAmt:0.0
symbol:BTCUSDT
unrealizedProfit:0.0
Now it looks like JSON... but it is a list. I am confused - any ideas on how I can convert result to a proper DataFrame, so that the columns are symbol, positionAmt, entryPrice, etc.?
Thanks!
Your main question is the one you wrote in the header, and you should not be confused: what you have is a list of Position objects. You can see the structure of Position in the GitHub repository of this library.
Anyway, to answer the question, use the following:

import pandas as pd

df = pd.DataFrame([t.__dict__ for t in result])
For more options and information, please read the great answers on this question.
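If you also want the columns cleaned up, a possible follow-up (a sketch only: the attribute names are taken from the PrintMix output above, and dropping json_parse is an assumption about what ends up in __dict__ - adjust to the attributes you actually see):

import pandas as pd

df = pd.DataFrame([t.__dict__ for t in result])
df = df.drop(columns=['json_parse'], errors='ignore')  # drop non-data attributes, if present
numeric_cols = ['positionAmt', 'entryPrice', 'markPrice', 'unrealizedProfit',
                'liquidationPrice', 'leverage', 'maxNotionalValue', 'isolatedMargin']
# Harmless if the values are already floats; converts strings like "0.000" otherwise
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')
print(df[['symbol', 'positionAmt', 'entryPrice', 'markPrice']])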
Good Luck!
You can also use that and then unpack the rows into plain lists:

df = pd.DataFrame([t.__dict__ for t in result])
klines = df.values.tolist()
# Note: indexing fields 1-4 as open/high/low/close assumes kline-style rows;
# for Position objects the column order follows __dict__, so adjust the indices.
# (Also, 'open' shadows the built-in of the same name; fine in a short script.)
open = [float(entry[1]) for entry in klines]
high = [float(entry[2]) for entry in klines]
low = [float(entry[3]) for entry in klines]
close = [float(entry[4]) for entry in klines]

Convert SList to Dataframe

I am reading data from a binary .out file using the Python module SWMMToolbox. The command to read the infiltration time series for RG1 from file.out is as follows:
x = !swmmtoolbox extract 'file.out' subcatchment,RG1,Infiltration_loss
See link for details about swmmtoolbox.
The data type of 'x' is an 'IPython.utils.text.SList'.
Each element of the data is a string containing a datetime followed by a comma and a value.
I would like to import this SList into pandas, but am having trouble. I want the datetime string as one column and the value after the comma as another. However, when I use
df = pd.DataFrame(data=x)
I get a single-column DataFrame with each whole line in one cell, rather than two columns.
I also tried to use
df = pd.DataFrame.from_records(x)
but that does not give me the two columns I want either.
I tried to use pd.read_csv, but I couldn't get it to work since 'x' is a variable and not a file.
Any suggestions are much appreciated.
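Since an SList is just a list of strings, pd.read_csv can still work if you join the lines and wrap them in StringIO. A sketch only: the column names are made up, and it assumes each line is 'datetime,value' with no header row (drop header=None and names if swmmtoolbox emits a header):

import pandas as pd
from io import StringIO  # on Python 2, use: from StringIO import StringIO

csv_text = u'\n'.join(x)  # x is the SList from the !swmmtoolbox call
df = pd.read_csv(StringIO(csv_text), header=None,
                 names=['datetime', 'infiltration_loss'],  # hypothetical names
                 parse_dates=['datetime'])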

How to alter the format of data returned from a CSV using Python

I am looking for guidance on the format of a result returned from a CSV file. The code I have to date partially achieves my objective, but despite significant effort researching this and many other sites/forums, I cannot resolve the final step. I also posed this question on gis.stackexchange but was redirected to this forum with the comment "Questions relating to general Information Technology, with no clear GIS component, are off-topic here, but can be researched/asked at Stack Overflow".
My working piece of Python code that reads selected data from a CSV and returns it in dict format is below. (Yes, I know it returns a dict because of the call my code makes - and that is the crux of the problem.)
import arcpy, csv

Att_Dict = {}
with open("C:/Data/Code/Python/Library/Peter/123.csv") as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['Status'] == 'Keep':
            Att_Dict.update({row['book_id']: row['book_ref']})
print Att_Dict
Att_Dict = {'7643': '7625', '9644': '2289', '4406': '4443', '7588': '9681', '2252': '7947'}
For the next part of my code to run, I need the result above in the following format instead (this is part of a very lengthy script, but the only showstopper is the returned format, so there is little value in posting the other 200 or so lines):
Att_Dict = [[7643, 7625], [9644, 2289], [4406, 4443], [7588, 9681], [2252, 7947]]
Although I have experimented endlessly and can achieve this by reverting to csv.reader rather than csv.DictReader, I then lose the ability to 'weed out' rows where the 'Status' column has the value 'Keep', and that is a requirement for the task at hand.
My sledgehammer approach to date has been to use search and replace within IDLE to amend the returned set, but I'm sure it can be done programmatically rather than manually. Similar but not exact: https://docs.python.org/2/library/index.html, plus my starting-out question at Returning values from multiple CSV columns to Python dictionary? and Using Python's csv.dictreader to search for specific key to then print its value, plus a multitude of CSV-based questions at geonet.esri.
(Using Win 7, ArcGIS 10.2, Python 2.7.5)
Try this:
Att_Dict = {'7643': '7625', '9644': '2289', '4406': '4443', '7588': '9681', '2252': '7947'}
Att_List = []
for key, value in Att_Dict.items():
    Att_List.append([int(key), int(value)])
print Att_List
Out: [[7643, 7625], [9644, 2289], [4406, 4443], [7588, 9681], [2252, 7947]]
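A follow-up sketch that skips the intermediate dict entirely and builds the list of int pairs while reading (same file path and filtering as the question's own code; Python 2 print syntax to match):

import csv

Att_List = []
with open("C:/Data/Code/Python/Library/Peter/123.csv") as f:
    for row in csv.DictReader(f):
        if row['Status'] == 'Keep':
            Att_List.append([int(row['book_id']), int(row['book_ref'])])
print Att_List  # [[7643, 7625], [9644, 2289], ...]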

django query to calculate time

The following are the timestamps present in the timestamp column of the table userdata. My question is: how do I write a query such that I get, for each month (e.g. 2010-03), the month name and the total time used in that month?
userdata.months.filter().order_by('-timestamp')
'2011-03-07 16:03:01'
'2011-03-07 16:07:04'
'2011-03-06 11:03:01'
'2011-03-08 16:03:01'
'2011-03-04 09:03:01'
'2011-05-16 16:03:01'
'2011-05-18 16:03:01'
'2011-07-16 12:03:01'
'2011-07-17 12:03:01'
'2011-07-17 15:03:01'
Something like:

from django.db.models import Sum

my_month = 2  # or whatever month you need
userdata.months.filter(timestamp__month=my_month).aggregate(Sum('timestamp__time'))
I'm not sure about that timestamp__time; you may want to search a bit, but I think it's correct.
Otherwise, you can use a raw query (userdata.months.raw('query here')).
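If you are on Django 1.10 or newer, TruncMonth can group all months in one query. A sketch only: the question does not say how "time used" is stored, so the duration field below is a hypothetical DurationField on the model:

from django.db.models import Sum
from django.db.models.functions import TruncMonth  # Django 1.10+

totals = (userdata.months
          .annotate(month=TruncMonth('timestamp'))
          .values('month')
          .annotate(total=Sum('duration'))  # 'duration' is hypothetical
          .order_by('month'))
# -> e.g. [{'month': datetime(2011, 3, 1), 'total': timedelta(...)}, ...]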