I have a dictionary whose keys are times in %H:%M:%S format, each mapped to a value.
dict = {
'06:00:01': '0x95', '06:10:01': '0x97',
'06:20:01': '0x98', '06:30:01': '0x99',
'06:40:01': '0x101', '06:50:01': '0x102',
'07:00:01': '0x104', '07:10:01': '0x105',
'07:20:01': '0x106', '07:30:01': '0x107',
'07:40:01': '0x109', '07:50:01': '0x110',
'08:00:01': '0x111', '08:10:01': '0x112',
'08:20:01': '0x113', '08:30:01': '0x114',
'08:40:01': '0x115', '08:50:01': '0x116',
'09:00:01': '0x117', '09:10:01': '0x118',
'09:20:01': '0x119', '09:30:01': '0x119',
'09:40:01': '0x120', '09:50:01': '0x121',
'10:00:01': '0x122', '10:10:01': '0x122',
'10:20:01': '0x123', '10:30:01': '0x124',
'10:40:01': '0x124', '10:50:01': '0x125',
'11:00:01': '0x125', '11:10:01': '0x126',
'11:20:01': '0x126', '11:30:01': '0x126',
'11:40:01': '0x127', '11:50:01': '0x127',
'12:00:01': '0x127', '12:10:01': '0x128',
'12:20:01': '0x128'
}
I am trying to come up with logic in Python that returns a dictionary value based on the current system time. If the current system time falls between two keys of the dictionary, it should return the value of the lower key.
This solution assumes that you are not using the am/pm format:
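from datetime import datetime
cur_time = datetime.now().time()
print(cur_time)
# Sort the keys and parse them into datetime objects.
keys = sorted(date_dict.keys())
times = [datetime.strptime(i, "%H:%M:%S") for i in keys]
out = False  # stays False if the current time is outside the dictionary range
for idx, t in enumerate(times):
    if cur_time >= t.time():
        # Clamp the index so that the last key does not run past the end of the list.
        t1 = times[min(idx+1, len(times)-1)].time()
        if cur_time <= t1:
            out = date_dict[str(t.time())]
print(out)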
I renamed the dictionary from dict to date_dict; it is otherwise exactly the dictionary from your question. dict is a built-in name, and assigning to it shadows the built-in, so it should be avoided.
cur_time is the current time, printed for convenience. keys is a sorted list of the dictionary keys, which is then turned into a list times of datetime objects. This can be done in one line but seems more readable this way.
The for loop uses enumerate to access both the datetime objects and their indices idx. If the current time is later than or equal to the datetime object's time, the code checks whether it is also earlier than or equal to the next object in the list. If that is the case, the current time falls in that range and the value of the lower key (t.time()) is assigned to out. If it is not in the dictionary range at all, out keeps its default value, False.
The expression times[min(idx+1, len(times)-1)] keeps the index from going out of range when the loop reaches the last key, '12:20:01'. As a result, times later than '12:20:01' leave out at False.
You can easily test this program by using timedelta and generating various times, here different by hours:
from datetime import timedelta
cur_time = (datetime.now() + timedelta(hours=8)).time()
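As an alternative to scanning every key, the same lookup can be sketched with a binary search over the sorted keys. This is only a sketch under a few assumptions: the function name lookup and the tiny sample dictionary are made up for illustration, and it relies on the keys being zero-padded 24-hour strings so that string order matches time order. Unlike the loop above, it returns the last value for times later than the last key, and None for times earlier than the first key.

```python
from bisect import bisect_right
from datetime import datetime


def lookup(date_dict, now=None):
    """Return the value of the latest key <= now, or None if now precedes every key.

    `now` is a zero-padded "%H:%M:%S" string; it defaults to the system time.
    """
    if now is None:
        now = datetime.now().strftime("%H:%M:%S")
    keys = sorted(date_dict)          # zero-padded times sort correctly as strings
    pos = bisect_right(keys, now)     # number of keys that are <= now
    return date_dict[keys[pos - 1]] if pos else None


# Hypothetical sample data, a small slice of the dictionary from the question.
sample = {'06:00:01': '0x95', '06:10:01': '0x97', '06:20:01': '0x98'}
```

Passing an explicit time string, for example lookup(sample, '06:15:00'), makes the function easy to test without changing the system clock.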
I am new to Python, and my coding experience so far is with MATLAB.
I am trying to understand more about lists and dictionaries, as I am using a library for DOEs that takes a dictionary as an argument.
My trouble so far is that this dictionary takes a form such as:
DOE={'Elastic Modulus':[10,20,30], 'Density':[1,2,3], 'Thickness':[2,3,5]}
But I need this dictionary to be user-defined, for example:
Have an input to define how many variables are needed (3 in this example: 'Elastic Modulus', 'Density' and 'Thickness').
As the variables are defined, it should be able to store values in the dictionary in a for loop.
Is this possible using dictionaries?
Or is it better to use a list and convert it to a dictionary later?
Thank you in advance
One can add keys and the corresponding values to a dict one at a time like so:
my_dict = {}
num_entries = int(input("How many entries "))
for _ in range(num_entries):
    key = input("Enter the key: ")
    value = input("Enter the value: ")
    my_dict[key] = value
The loop repeats the key and value prompts once for each entry you want to add. If you are on Python 2, use raw_input rather than input. [Edit: Showing how to do the loop, since I noticed that was part of your question]
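To get a dictionary shaped like the DOE example in the question, where each key maps to a list of numbers, each value can be entered as a comma-separated string and split. The following is a minimal sketch; the function name build_doe, the read parameter and the prompt texts are assumptions made for illustration, not part of the DOE library.

```python
def build_doe(read=input):
    """Build a {variable name: [levels]} dictionary from user answers.

    `read` is any function that takes a prompt and returns a string;
    it defaults to the built-in input (Python 3).
    """
    doe = {}
    count = int(read("How many variables? "))
    for _ in range(count):
        name = read("Variable name: ").strip()
        raw = read("Comma-separated values: ")
        # float() tolerates surrounding spaces, so "2, 3, 5" parses fine.
        doe[name] = [float(v) for v in raw.split(",")]
    return doe
```

Calling build_doe() with no arguments prompts interactively. Taking the reader function as a parameter is only a convenience so that the function can be exercised with canned answers.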
I need to speed up (dramatically) the search in a "huge" one-dimensional list of unsigned values. The list has 389,114 elements, and before I add an item I need to check that it doesn't already exist.
I do this check 15 million times...
Of course, it takes too much time.
The fastest way I found was:
if this_item in my_list:
    i = my_list.index(this_item)
else:
    my_list.append(this_item)
    i = len(my_list)
...
I am building a dataset from time series logs.
One column of these (huge) logs is a text message, which is very redundant.
To dramatically speed up the process, I transform this text into an unsigned integer with Adler32() and get a unique numeric value, which is great.
Then I store the messages in a PostgreSQL database, with this value as the index.
For each line of my log files (15 million altogether), I need to update my database of unique messages (389,114 unique messages).
This means that for each line, I need to check whether the message ID belongs to my in-memory list.
I tried "... in list", the same with dictionaries, numpy arrays, transforming the list into a string and using string.search(), and an SQL query against the database with a good index...
Nothing was better than "if item in list" when the list is loaded into memory (very fast).
if this_item in my_list:
    i = my_list.index(this_item)
else:
    my_list.append(this_item)
    i = len(my_list)
For 15 million iterations with some processing and NO search in the list:
- It takes 8 minutes to generate 2 tables of 15 million lines (features and targets)
- When I enable the code above to check whether a message ID already exists, it takes 1 hour 35 min...
How could I optimize this?
Thank you for your help
If your code is, roughly, this:
my_list = []
for this_item in collection:
    if this_item in my_list:
        i = my_list.index(this_item)
    else:
        my_list.append(this_item)
        i = len(my_list)
    ...
Then it will run in O(n^2) time since the in operator for lists is O(n).
You can achieve linear time if you use a dictionary (which is implemented with a hash table) instead:
my_list = []
table = {}
for this_item in collection:
    i = table.get(this_item)
    if i is None:
        i = len(my_list)
        my_list.append(this_item)
        table[this_item] = i
    ...
Of course, if you don't care about processing the items in the original order, you can just do:
for i, this_item in enumerate(set(collection)):
    ...
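The dictionary-based version can be condensed further with dict.setdefault, which returns the stored index if the item is already known and otherwise stores and returns the new one. This is a sketch of the same idea, not a drop-in for the asker's pipeline; the function name index_items and the sample values are made up for illustration.

```python
def index_items(collection):
    """Map each distinct item to the position at which it was first seen.

    Returns (table, indices): table maps item -> index, and indices holds
    the index of every element of collection, in order.
    """
    table = {}
    indices = []
    for item in collection:
        # len(table) is evaluated before the insert, so a new item
        # receives the next free index; a known item keeps its old one.
        indices.append(table.setdefault(item, len(table)))
    return table, indices


table, indices = index_items([7, 3, 7, 9, 3])
```

Each lookup is a hash-table operation with average constant cost, so the whole pass stays linear in the number of log lines.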
I have a query sequence that I blasted online using NCBIWWW.qblast. In my XML BLAST result file I obtained a list of hits (i.e. gi|) for the query sequence. Each hit (gi|) has multiple HSPs. I made a dictionary my_dict1 with each gi| as a key and appended the bit scores as its value, so there are multiple values for each key.
my_dict1 = {
    'gi|1002819492|': [437.702, 384.47, 380.86, 380.86, 362.83],
    'gi|675820360|': [2617.97, 2614.37, 122.112],
    'gi|953764029|': [414.258, 318.66, 122.112, 86.158],
    'gi|675820410|': [450.653, 388.08, 386.27] }
Then I looked for max value in each key using:
for key, value in my_dict1.items():
    max_value = max(value)
And made a second dictionary my_dict2:
my_dict2 = {
    'gi|1002819492|': 437.702,
    'gi|675820360|': 2617.97,
    'gi|953764029|': 414.258,
    'gi|675820410|': 450.653 }
I want to compare both dictionaries so that I can extract the HSP with the highest bit score. I am also including other parameters such as query coverage and identity percentage (not shown here). The end goal is to get the best gi| with the highest bit score, coverage and identity percentage.
I tried many things to compare both dictionaries, like this:
First code :
matches = []
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score
else:
    matches = matches[hit_id], bit_score
Second code:
if hit_id not in matches.keys():
    matches[hit_id] = bit_score
else:
    matches = matches[hit_id], bit_score
Third code:
intersection = set(set(my_dict1.items()) & set(my_dict2.items()))
However, I always end up with two types of errors:
1 ) TypeError: list indices must be integers, not unicode
2 ) ... float not iterable...
Please I need some help and guidance. Thank you very much in advance for your time. Best regards.
It's not clear what you're trying to do. What is hit_id? What is bit_score? It looks like your second dict is always going to have the same keys as your first if you're creating it by pulling the max value for each key of the first dict.
You say you're trying to compare them, but don't really state what you're actually trying to do. Find those with values under a certain max? Find those with the highest max?
Your first code doesn't work because, I assume, you're trying to use a dict key as an index into matches, which you define as a list. That's probably where your first error is coming from, though you haven't given the lines where the error actually occurs.
See in-code comments below:
# First off, this needs to be a dict.
matches = {}
# This will never happen if you've created these dicts as you stated.
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score  # Not clear what bit_score is?
else:
    # Also not sure what you're trying to do here. This will assign a tuple
    # to matches with whatever the value of matches[hit_id] is and bit_score.
    matches = matches[hit_id], bit_score
Regardless, we really need more information and the full code to figure out your actual goal and what's going wrong.
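If the goal turns out to be the highest bit score per hit plus the single best hit overall, which is only a guess at the intent, it could be sketched as follows. The names best_per_hit and best_hit are made up for illustration, and the data is copied from the question with the keys written as strings.

```python
# Bit scores per hit, as in the question (keys quoted so they are valid strings).
my_dict1 = {
    'gi|1002819492|': [437.702, 384.47, 380.86, 380.86, 362.83],
    'gi|675820360|': [2617.97, 2614.37, 122.112],
    'gi|953764029|': [414.258, 318.66, 122.112, 86.158],
    'gi|675820410|': [450.653, 388.08, 386.27],
}

# Highest bit score for each hit (this reproduces my_dict2 from the question).
best_per_hit = {hit: max(scores) for hit, scores in my_dict1.items()}

# The hit whose best score is the largest overall.
best_hit = max(best_per_hit, key=best_per_hit.get)
```

Because the second dictionary is derived from the first, no comparison between the two is needed to find these values.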
I have a CSV file containing years of data, and I need to calculate the difference between the max date and the min date. My problem is that I cannot work out how to determine the max value of the dates.
I am doing this to convert my dates into datetime objects:
Temps = datetime.strptime(W['datum'][i]+' '+W['timestamp'][i],'%Y-%m-%d %H:%M:%S')
Printing this line gives me exactly the result I want, but when I try to extract the max value of these dates using this line of code:
start = max(Temps)
I get this error: 'datetime.strptime' object is not iterable
Where am I mistaken?
The expression
datetime.strptime(W['datum'][i]+' '+W['timestamp'][i],'%Y-%m-%d %H:%M:%S')
produces a single value (a scalar). When you assign it to Temps, this variable becomes a scalar, not a list. It contains only one value.
Then, when you evaluate max(Temps), max expects an argument that holds multiple values but, unfortunately, it finds only what Temps was assigned most recently.
This is a single value, which is not 'iterable'.
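One way around this is to collect every parsed date in a list and only then take the min and max. The sketch below assumes W behaves like a mapping of column name to a list of strings; the sample rows are invented for illustration, and the lowercase names temps and span are not from the question.

```python
from datetime import datetime

# Invented sample rows standing in for the CSV columns.
W = {
    "datum": ["2015-01-03", "2016-07-21", "2014-11-30"],
    "timestamp": ["08:15:00", "23:59:59", "00:00:01"],
}

FMT = "%Y-%m-%d %H:%M:%S"

# Parse every row, so temps is a list of datetimes rather than one scalar.
temps = [
    datetime.strptime(d + " " + t, FMT)
    for d, t in zip(W["datum"], W["timestamp"])
]

start = min(temps)
end = max(temps)
span = end - start   # a timedelta; span.days gives the whole days between them
```

With a list in hand, max and min both have something to iterate over, and the subtraction yields a timedelta.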
I want to find the difference (in days) between two columns in a dataframe (more specifically in the graphlab SFrame datastructure).
I have tried to write a couple of functions to do this but I cannot seem to create a function that is fast enough. Speed is my issue right now as I have ~80 million rows to process.
I have tried two different functions but both are too slow:
The t2_colname_str and t1_colname_str arguments are the names of the columns I want to use, and both columns contain datetime.datetime objects.
For Loop
def diff_days(sframe_obj,t2_colname_str,t1_colname_str):
    import graphlab as gl
    import datetime as datetime
    # creating the new column name to be used later
    new_colname = str(t2_colname_str[:-9] + "_DiffDays_" + t1_colname_str[:-9])
    diff_days_list = []
    for i in range(len(sframe_obj[t2_colname_str])):
        t2 = sframe_obj[t2_colname_str][i]
        t1 = sframe_obj[t1_colname_str][i]
        try:
            diff = t2 - t1
            diff_days = diff.days
            diff_days_list.append(diff_days)
        except TypeError:
            diff_days_list.append(None)
    sframe_obj[new_colname] = gl.SArray(diff_days_list)
List Comprehension
I know this is not the intended purpose of list comprehensions, but I just tried it to see if it was faster.
def diff_days(sframe_obj,t2_colname_str,t1_colname_str):
    import graphlab as gl
    import datetime as datetime
    # creating the new column name to be used later
    new_colname = str(t2_colname_str[:-9] + "_DiffDays_" + t1_colname_str[:-9])
    # None is stored wherever either of the two dates is missing
    diff_days_list = [
        (sframe_obj[t2_colname_str][i] - sframe_obj[t1_colname_str][i]).days
        if sframe_obj[t2_colname_str][i] is not None and sframe_obj[t1_colname_str][i] is not None
        else None
        for i in range(len(sframe_obj[t2_colname_str]))]
    sframe_obj[new_colname] = gl.SArray(diff_days_list)
Additional Notes
I have been using GraphLab-Create by Dato and their SFrame data-structure mainly because it parallelizes all the computation which makes my analysis super-fast and it has a great library for machine learning applications. It's a great product if you haven't checked it out already.
GraphLab User Guide can be found here: https://dato.com/learn/userguide/index.html
I'm glad you found an approach that works for you. However, SArrays allow vector operations, so you don't need to loop through every element of the column. SArrays will iterate, but they're REALLY slow at that.
Unfortunately, SArrays don't support vector operations on datetime types because they don't support a "timedelta" type. You can do this though:
diff = sframe_obj[t2_colname].astype(int) - sframe_obj[t1_colname].astype(int)
That converts the columns to UNIX timestamps and then does a vectorized difference, which should be plenty fast, or at least faster than a conversion to NumPy. The result is in seconds, so divide by 86400 to get days.
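The arithmetic behind the timestamp approach can be checked in plain Python, without GraphLab. This is only a stand-in for the column operation, not SFrame code: the helper names to_unix and diff_days_scalar are invented here, and the datetimes are treated as naive UTC values.

```python
import calendar
from datetime import datetime

SECONDS_PER_DAY = 86400


def to_unix(dt):
    """Seconds since the epoch for a naive datetime interpreted as UTC."""
    return calendar.timegm(dt.timetuple())


def diff_days_scalar(t2, t1):
    """Whole days between two datetimes, or None if either one is missing."""
    if t1 is None or t2 is None:
        return None
    # Floor division matches timedelta.days, including for negative spans.
    return (to_unix(t2) - to_unix(t1)) // SECONDS_PER_DAY
```

Applied to whole columns, the same subtraction and division by 86400 would be done once per column rather than once per row, which is where the speed-up comes from.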