what is the .from(20).fetch(10) in the code below? (Play Framework 1.x)

In this line of code:
List<User> users = User.find("byEmailLike", "alexander@%").from(20).fetch(10);
what does the .from(20).fetch(10) do?

Its usage is intuitive: fetch at most 10 records, starting at offset 20 (i.e. skip the first 20).
It's also described in the docs; you should take a look there:
List<Post> posts = Post.all().from(50).fetch(100); // 100 posts max, starting at offset 50

Related

Yahoo Finance API .get_historical() not working python

So I recently downloaded the yahoo_finance API, version 1.4.0. I got it a few days ago, and .get_historical() was working fine. Now, however, it doesn't. Here's what it's doing:
import yahoo_finance as yf
apple=yf.Share('AAPL')
apple_price=apple.get_price()
print apple.get_historical('2016-02-15', '2016-04-29')
The error I get is: YQLResponseMalformedError: Response malformed. Is there a bug in the API, or am I forgetting something?
Unfortunately, the Yahoo stock price API, which a lot of modules are based on, doesn't work anymore.
Alternatively, you could use Google's API:
https://www.google.com/finance/getprices?q=1101&x=TPE&i=86400&p=3d&f=d,c,h,l,o,v
q=1101 is the ticker symbol
x=TPE is the exchange (list of exchanges here: https://www.google.com/googlefinance/disclaimer/ )
i=86400 is the interval in seconds (86400 sec = 1 day)
p=3d is how far back the data goes (here, 3 days)
f= is the fields to return (d=date, c=close, h=high, l=low, o=open, v=volume)
Data would look like this:
EXCHANGE%3DTPE
MARKET_OPEN_MINUTE=540
MARKET_CLOSE_MINUTE=810
INTERVAL=86400
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=480
a1496295000,24.4,24.75,24.35,24.75,11782000
1,24.5,24.5,24.3,24.4,10747000
a1496295000 is the Unix timestamp of the first row of data.
In the second row, the leading 1 is the interval offset from the first row (an offset of 1 day here).
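If it helps, here is a rough Python sketch of fetching and decoding that format (assuming the endpoint still responds; the URL and column order are the ones shown above):

# Rough sketch only: fetch the getprices response and decode the timestamp column.
# Assumes the endpoint above is reachable and COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME.
import urllib2
import datetime

url = ('https://www.google.com/finance/getprices'
       '?q=1101&x=TPE&i=86400&p=3d&f=d,c,h,l,o,v')
lines = urllib2.urlopen(url).read().splitlines()

interval = 86400
base_ts = None
for line in lines:
    if line.startswith('INTERVAL='):
        interval = int(line.split('=')[1])
    elif line and (line[0] == 'a' or line[0].isdigit()):
        cols = line.split(',')
        if cols[0].startswith('a'):   # 'a' prefix = absolute Unix timestamp
            base_ts = int(cols[0][1:])
            ts = base_ts
        else:                         # plain number = offset in intervals from base_ts
            ts = base_ts + int(cols[0]) * interval
        date = datetime.datetime.utcfromtimestamp(ts)
        print date, cols[1:]          # [close, high, low, open, volume]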

cloudant index: count number of unique users per time period

A very similar post was made about this issue here. In Cloudant, I have documents that record when users access an application; they look like the following:
{"username":"one","timestamp":"2015-10-07T15:04:46Z"}---| same day
{"username":"one","timestamp":"2015-10-07T19:22:00Z"}---^
{"username":"one","timestamp":"2015-10-25T04:22:00Z"}
{"username":"two","timestamp":"2015-10-07T19:22:00Z"}
What I want is to count the number of unique users for a given time period. For example:
2015-10-07 = {"count": 2} two different users accessed on 2015-10-07
2015-10-25 = {"count": 1} one different user accessed on 2015-10-25
2015 = {"count": 2} two different users accessed in 2015
This all becomes tricky because, for example, on 2015-10-07 the user "one" has two access records, but they should only add 1 to the total count of unique users.
I've tried:
function(doc) {
  var time = new Date(Date.parse(doc['timestamp']));
  // note: getUTCMonth() is 0-based and getUTCDay() returns the day of the week,
  // not the day of the month
  emit([time.getUTCFullYear(), time.getUTCMonth(), time.getUTCDay(), doc.username], 1);
}
This suffers from several issues, which are highlighted by Jesus Alva, who commented in the post I linked to above.
Thanks!
There's probably a better way of doing this, but off the top of my head ...
You could try emitting an index for each level of granularity:
function(doc) {
  var time = new Date(Date.parse(doc['timestamp']));
  var year = time.getUTCFullYear();
  var month = time.getUTCMonth() + 1;
  var day = time.getUTCDate();

  // day granularity
  emit([year, month, day, doc.username], null);

  // year granularity
  emit([year, doc.username], null);
}
// reduce function - `_count`
Day query (2015-10-07):
inclusive_end=true&
start_key=[2015, 10, 7, "\u0000"]&
end_key=[2015, 10, 7, "\uefff"]&
reduce=true&
group=true
Day query result - your application code would count the number of rows:
{"rows":[
{"key":[2015,10,7,"one"],"value":2},
{"key":[2015,10,7,"two"],"value":1}
]}
Year query:
inclusive_end=true&
start_key=[2015, "\u0000"]&
end_key=[2015, "\uefff"]&
reduce=true&
group=true
Query result - your application code would count the number of rows:
{"rows":[
{"key":[2015,"one"],"value":3},
{"key":[2015,"two"],"value":1}
]}
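If it helps, a rough sketch of what that application-side counting might look like from Python (the account, database, design document, and view names below are placeholders, not from the original post):

# Rough sketch only: query the grouped view and count the rows client-side.
# The account/database/design-doc/view names are placeholders.
import json
import requests

VIEW_URL = ('https://ACCOUNT.cloudant.com/access_log'
            '/_design/stats/_view/by_time_and_user')
AUTH = ('API_KEY', 'API_PASSWORD')

def unique_users(start_key, end_key):
    params = {
        'reduce': 'true',
        'group': 'true',
        'inclusive_end': 'true',
        'startkey': json.dumps(start_key),
        'endkey': json.dumps(end_key),
    }
    resp = requests.get(VIEW_URL, params=params, auth=AUTH)
    resp.raise_for_status()
    return len(resp.json()['rows'])   # one row per (period, username) group

print unique_users([2015, 10, 7, u"\u0000"], [2015, 10, 7, u"\uefff"])  # users on 2015-10-07
print unique_users([2015, u"\u0000"], [2015, u"\uefff"])                # users in 2015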

haystack order_by not in proper order

I am using Django 1.8, Haystack 2.4, and Solr 4.10. Somehow order_by is not working as expected. Please have a look at the code below:
>>> sqs = SearchQuerySet()
>>> sqs = sqs.using('entry').filter(status=0)
>>> for b in sqs.filter(content="see").order_by('title'): print b.title
501 Must-See Movies
Look See, Look at Me!
Last Chance to See
1,000 Places to See Before You Die
Pretend You Don't See Her
Learning to See Creatively : Design, Color and Composition in Photography
Behavior Solutions for the Inclusive Classroom : See a Behavior? Look It Up!
See No Evil
Last Chance to See
See It and Sink It : Mastering Putting Through Peak Visual Performance
See No Evil : The True Story of a Ground Soldier in the CIA's War on Terrorism
Voice for Now : Changing the Way We See Ourselves As Women
See Jane Win : The Rimm Report on How 1,000 Girls Became Successful Women
Kaplan Medical USMLE Medical Ethics : The 100 Cases You Are Most Likely to See on the Exam
I See You
You'll See It When You Believe It : The Way to Your Personal Transformation
Body Code : Diet and Fitness Programme: Master Your Metabolism and See the Weight Fall Off
Descending order:
>>> sqs = SearchQuerySet()
>>> sqs = sqs.using('entry').filter(status=0)
>>> for b in sqs.filter(content="see").order_by('-title'): print b.title
You'll See It When You Believe It : The Way to Your Personal Transformation
Body Code : Diet and Fitness Programme: Master Your Metabolism and See the Weight Fall Off
Kaplan Medical USMLE Medical Ethics : The 100 Cases You Are Most Likely to See on the Exam
I See You
Voice for Now : Changing the Way We See Ourselves As Women
See Jane Win : The Rimm Report on How 1,000 Girls Became Successful Women
See No Evil : The True Story of a Ground Soldier in the CIA's War on Terrorism
See It and Sink It : Mastering Putting Through Peak Visual Performance
Last Chance to See
See No Evil
Behavior Solutions for the Inclusive Classroom : See a Behavior? Look It Up!
Learning to See Creatively : Design, Color and Composition in Photography
Pretend You Don't See Her
1,000 Places to See Before You Die
Last Chance to See
Look See, Look at Me!
501 Must-See Movies
Why is the ordering not working like A --> Z and Z --> A?
Recently I had the same issue with Haystack order_by on title. I used a Python lambda function to sort the object list.
Ascending order by title:
sqs = sqs.using('entry').filter(status=0)
sorted_list = sorted([s.object for s in sqs], key=lambda x: x.title, reverse=False)
Descending order:
sqs = sqs.using('entry').filter(status=0)
rev_sorted_list = sorted([s.object for s in sqs], key=lambda x: x.title, reverse=True)
sqs.order_by works very well with integer fields.
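For what it's worth, the usual root cause of this behaviour is that order_by is sorting on a tokenized text field, so Solr orders by individual tokens rather than the whole title. A common workaround is a dedicated sortable copy of the title; the sketch below is only an illustration (the index class, model, and field names are assumptions, and the extra field must be mapped to a non-tokenized string type in the Solr schema):

# search_indexes.py -- sketch only; Entry and the field names are assumptions
from haystack import indexes
from myapp.models import Entry

class EntryIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    # Copy of the title used only for sorting; map it to a non-tokenized
    # "string" type in the Solr schema so it sorts as a whole value.
    title_sortable = indexes.CharField(model_attr='title', indexed=True)

    def get_model(self):
        return Entry

# then: sqs.using('entry').filter(status=0).order_by('title_sortable')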

Missing Tweets from Twitter API (using Tweepy)?

I have been collecting tweets for the past week in order to gather the past-7-days tweets related to "lung cancer". Yesterday, I figured I needed to start collecting more fields, so I added some fields and started re-collecting the same period of tweets related to "lung cancer" from last week. The problem is that the first time, I collected ~2000 tweets related to lung cancer on 18 Sept 2014. But last night it only gave ~300 tweets, and when I looked at the times of the tweets in this new set, they only covered roughly 23:29 to 23:59 on 18 Sept 2014. A large chunk of data is obviously missing. I don't think it's something in my code (below); I have tested various ways, including deleting most of the fields to be collected, and the data is still cut off prematurely.
Is this a known issue with the Twitter API (when collecting the last 7 days' data)? If so, it would be pretty horrible for anyone trying to do serious research. Or is it something in my code that caused this? (Note: it runs perfectly fine for other previous/subsequent dates.)
import tweepy
import time
import csv

ckey = ""
csecret = ""
atoken = ""
asecret = ""

OAUTH_KEYS = {'consumer_key': ckey, 'consumer_secret': csecret,
              'access_token_key': atoken, 'access_token_secret': asecret}
auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
api = tweepy.API(auth)

# Stream the first "xxx" tweets related to "car", then filter out the ones without geo-enabled
# Reference of the search (q) operator: https://dev.twitter.com/rest/public/search

# Common parameters: changeable only here
startSince = '2014-09-18'
endUntil = '2014-09-20'
suffix = '_18SEP2014.csv'

############################
### Lung cancer starts #####
searchTerms2 = '"lung cancer" OR "lung cancers" OR "lungcancer" OR "lungcancers" OR \
"lung tumor" OR "lungtumor" OR "lung tumors" OR "lungtumors" OR "lung neoplasm"'

# Items from 0 to 500,000 (which *should* cover all tweets)
# Increase by 4,000 for each cycle (because 5000-6000 is over the Twitter rate limit)
# Then wait for 20 min before the next request (because the Twitter rate-limit window is 15 min)
counter2 = 0
for tweet in tweepy.Cursor(api.search, q=searchTerms2,
                           since=startSince, until=endUntil).items(999999999):  # changeable here
    try:
        '''
        print "Name:", tweet.author.name.encode('utf8')
        print "Screen-name:", tweet.author.screen_name.encode('utf8')
        print "Tweet created:", tweet.created_at'''
        placeHolder = []
        placeHolder.append(tweet.author.name.encode('utf8'))
        placeHolder.append(tweet.author.screen_name.encode('utf8'))
        placeHolder.append(tweet.created_at)
        prefix = 'TweetData_lungCancer'
        wholeFileName = prefix + suffix
        with open(wholeFileName, "ab") as f:  # changeable here
            writeFile = csv.writer(f)
            writeFile.writerow(placeHolder)
        counter2 += 1
        if counter2 == 4000:
            time.sleep(60*20)  # wait for 20 min every time 4,000 tweets are extracted
            counter2 = 0
            continue
    except tweepy.TweepError:
        time.sleep(60*20)
        continue
    except IOError:
        time.sleep(60*2.5)
        continue
    except StopIteration:
        break
Update:
I have since tried running the same Python script on a different computer (faster and more powerful than my home laptop), and that run produced the expected number of tweets. I don't know why this is happening, as my home laptop works fine for many other programs, but I think we can rest the case and rule out potential issues with the script or the Twitter API.
If you want to collect more data, I would highly recommend the streaming API that Tweepy has to offer. It has a much higher rate limit; in fact, I was able to collect 500,000 tweets in just one day.
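As an illustration of the streaming approach, a minimal sketch with Tweepy's streaming classes (the listener class name and track terms are just examples; the keys are the ones from your script):

# Minimal streaming sketch; the listener name and track terms are illustrative.
import tweepy

class LungCancerListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)          # or write the fields you need to CSV

    def on_error(self, status_code):
        if status_code == 420:      # being rate limited
            return False            # disconnect the stream

auth = tweepy.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
stream = tweepy.Stream(auth=auth, listener=LungCancerListener())
stream.filter(track=['lung cancer', 'lung tumor'])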
Also, your rate-limit handling is not very robust: you don't know for sure that Twitter will allow you to fetch 4,000 tweets. From experience, I have found that the more often you hit the rate limit, the fewer tweets you are allowed and the longer you have to wait.
I would recommend using:
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
so that your application will not exceed the rate limit. Alternatively, you can check how much of your quota you have used with:
print (api.rate_limit_status())
and then you can just sleep the thread like you have done.
Also, your end date is incorrect: until is exclusive, so the end date should be '2014-09-21', one day after the last day you want tweets from.
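Putting those suggestions together, the search setup might look something like this (a sketch only, reusing the keys and search terms from your question):

import tweepy

auth = tweepy.OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
# Let Tweepy sleep automatically whenever the search rate limit is hit.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

print(api.rate_limit_status())  # inspect how much of the quota is left

# until is exclusive, so '2014-09-21' includes tweets from 2014-09-20.
for tweet in tweepy.Cursor(api.search, q=searchTerms2,
                           since='2014-09-18', until='2014-09-21').items():
    print tweet.created_at, tweet.author.screen_name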

Xively read data in Python

I have written a Python 2.7 script to retrieve all my historical data from Xively.
Originally I wrote it in C#, and it works perfectly.
I am limiting the request to 6 hour blocks, to retrieve all stored data.
My version in Python is as follows:
requestString = 'http://api.xively.com/v2/feeds/41189/datastreams/0001.csv?key=YcfzZVxtXxxxxxxxxxxORnVu_dMQ&start=' + requestDate + '&duration=6hours&interval=0&per_page=1000'
response = urllib2.urlopen(requestString).read()
The request date is in the correct format; I compared the full C# requestString and the Python one.
Using the above request, I only get 101 lines of data, which equates to a few minutes of results.
My suspicion is that it is the .read() function; it returns about 34k characters, which is far less than the C# version. I tried adding 100000 as an argument to the read() function, but there was no change in the result.
Here is another solution, also written in Python 2.7.
In my case, I got the data in 30-minute blocks, because many sensors sent values every minute and the Xively API limits a request to half an hour of data at that sending frequency.
This is the general module:
for day in datespan(start_datetime, end_datetime, deltatime):  # step from start_datetime to end_datetime by deltatime
    while True:  # retry until the data is retrieved correctly
        try:
            response = urllib2.urlopen(
                'https://api.xively.com/v2/feeds/' + str(feed) + '.csv?key=' + apikey_xively +
                '&start=' + day.strftime("%Y-%m-%dT%H:%M:%SZ") +
                '&interval=' + str(interval) + '&duration=' + duration)  # get data
            break
        except:
            time.sleep(0.3)  # wait briefly, then try again
    cr = csv.reader(response)  # return data in columns
    print '.'
    for row in cr:
        if row[0] in id:  # choose desired data
            f.write(row[0] + "," + row[1] + "," + row[2] + "\n")  # write "id,timestamp,value"
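The datespan helper used above isn't shown in this excerpt; a minimal generator along these lines (an assumption about its behaviour, the real one is in the script linked below) would work:

import datetime

def datespan(start_datetime, end_datetime, deltatime):
    # Yield start_datetime, start_datetime + deltatime, ... while before end_datetime.
    current = start_datetime
    while current < end_datetime:
        yield current
        current += deltatime

# e.g. one request per 30-minute block:
# datespan(datetime.datetime(2015, 1, 1), datetime.datetime(2015, 1, 2),
#          datetime.timedelta(minutes=30))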
You can find the full script here: https://github.com/CarlosRufo/scripts/blob/master/python/retrievalDataXively.py
Hope this helps; I'm delighted to answer any questions :)