I am working with Tweepy (a Python client for the Twitter REST API) and I'm trying to find tweets that match several keywords and do not include a URL.
The search results are not what we expected. It looks like the query has errors and was stopped. We also observed that results were returned one at a time rather than (as previously) in batches of 100.
Could you please tell me why this search does not work properly?
We wanted to get all tweets mentioning 'Amazon' without any URL links in the text.
We used the search shown below. The results still contained tweets with URLs or without the 'Amazon' keyword.
Could you please let us know what we are doing wrong?
import tweepy

auth = tweepy.AppAuthHandler(consumer_key, consumer_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
searchQuery = 'Amazon OR AMAZON OR amazon filter:-links'  # Keyword
new_tweets = api.search(q=searchQuery, count=100,
                        result_type="recent",
                        max_id=sinceId,
                        lang="en")
The minus sign should be put before "filter", not before "links", like this:
searchQuery = 'Amazon OR AMAZON OR amazon -filter:links'
Also, I doubt that the count = 100 option is a valid one, since it is not listed on the API documentation (which may not be very up-to-date, though). Try to replace that with rpp = 100 to get tweets in bulk packs.
I am not sure why some of the tweets you find do not contain the "Amazon" keyword, but a possibility is that "Amazon" is contained within the username of the poster. I do not know if you can filter that directly in the query, or even if you would want to filter it, since it would mean you would reject tweets from the official Amazon accounts. I would suggest that, for each tweet the query returns, you check it to make sure it does contain "Amazon".
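The per-tweet check suggested above could look roughly like this. It is only a sketch that works on plain tweet text: keep_tweet and filter_tweets are hypothetical helper names, and the URL pattern is a rough assumption rather than Twitter's own link detection.

```python
import re

# Rough URL pattern (an assumption for illustration, not Twitter's link detection).
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def keep_tweet(text, keyword="amazon"):
    """Return True if the text mentions the keyword and contains no URL."""
    has_keyword = keyword.lower() in text.lower()
    has_url = URL_PATTERN.search(text) is not None
    return has_keyword and not has_url


def filter_tweets(texts, keyword="amazon"):
    """Keep only the texts that pass keep_tweet."""
    return [t for t in texts if keep_tweet(t, keyword)]


sample_texts = [
    "I love Amazon Prime",
    "Amazon deal here http://t.co/abc",
    "Nothing relevant in this one",
]
kept = filter_tweets(sample_texts)
print(kept)
```

With real results you would pass in the text attribute of each returned status instead of the sample strings.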
Ok I have been stuck on this for a few weeks now. I'm using the Front Email API for a business use case and have created an iterative function to (attempt to) get multiple pages of query results.
A quick overview of the API (endpoint) I'm using for context:
The "events" endpoint returns a list of records based on the parameters given in the query (like before/after/between certain times, types of events, etc.)
If the query results in more than 100 records, a "next" pagination link is generated for the next page(s) of results. There is no "page=n" parameter in the query URL, you only get the "next page" link from the response of the previous query (which is fairly unique)
A side note, the initial base_url for the first query, and the base_url of the "next page" link are two different urls (i.e. the initial call is https://api2.frontapp.com and the second is https://companynamehere-inc.api.frontapp.com), so this is taken into consideration in my querying function.
My environment looks like this:
As you can see, I query the initial URL using the external Func_Query_Front_API function, then begin the iteration: while the next page link is not null, keep feeding the next link returned from the previous call back into the function to get the next page of results. I deconstruct the links given to the function into a base, a relative path and body/params so that I can use this logic in both Desktop and Online Service (don't ask).
It's difficult to describe the results I get, because sometimes in the preview window, it just clocks and clocks and doesn't return any results with the API not being queried at all (confirmed from Postman and the rate limit remaining number in the response headers). When I do get results back, it's a massive list of records (way more than what I'm querying for/expecting to receive) that contains lots of duplicates. It's almost like the initial (or the second) query URL is being looped over and over again, but not the "next" page's links? Like it's somehow stuck in a semi-infinite loop while the "next" link is not null from the initial response only, so it just repeats the same query over and over again re-using the same "next" page link.
Now, unfortunately I cannot provide my bearer token for this API as it's private company info returned by the API, but I'm wondering if the issue is with my syntax in the code that someone can spot and steer me in the right direction. The querying function itself works when invoked on its own, and everything looks like it SHOULD work, but it's just not doing what I think the code is saying it should do.
I understand it's not much to go on, but I'm out of options in trying to debug this thing. Thanks!
UPDATE
I think what MIGHT help here is working Python code that shows what I'm trying to reproduce in Power BI, so I've provided it below (again, the bearer token is not included, but the syntax should make things a bit clearer). The code closely resembles what I've already made in Power BI, so hopefully this helps a bit.
import requests
from time import sleep

########################################
# GLOBAL VARIABLES
########################################
_event_types_filter = "assign&q[types]=archive&q[types]=comment&q[types]=inbound&q[types]=outbound&q[types]=forward&q[types]=tag&q[types]=out_reply"
_after = 1667506000
_page_limit = 100
_bearer_token = "Bearer XXXXXXXXXXXXXXXXXXXXXXXXXX"
_init_base_url = "https://api2.frontapp.com"
_relative_path = "events"
_init_body = "limit=" + str(_page_limit) + "&q[after]=" + str(_after) + "&q[types]=" + _event_types_filter
_headers = {'authorization': _bearer_token}
_init_query_url = _init_base_url + "/" + _relative_path + "?" + _init_body

########################################
# FUNCTION - Query the API
########################################
def Func_Query_Front_API(input_url):
    #print(input_url)
    # Deconstruct the url into its separate parts
    splitted_input_url = input_url.split("?")
    input_url_base = splitted_input_url[0].replace("/events", "")
    input_url_body_params = splitted_input_url[1]
    # Query the API
    response = requests.request("GET",
                                input_url_base + "/" + _relative_path + "?" + input_url_body_params,
                                headers=_headers)
    # Get the "next" link from the response
    next_url = response.json()["_pagination"]["next"]
    # Return the response and the "next" link
    return response, next_url

########################################
# MAIN PROGRAM START
########################################
# List to add the response data's to
Source = []
# Make the initial request and add the results to the list
init_response, next_url = Func_Query_Front_API(_init_query_url)
Source.append(init_response)
# While the "next" link(s) are not None, query the current
# "next" link for the next page(s) of the query
while next_url is not None:
    response, next_url = Func_Query_Front_API(next_url)
    Source.append(response)
    sleep(1)
print(Source)
print("Done!")
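Since the symptom described above is the same "next" link apparently being requested over and over, one way to make that failure visible is to refuse to fetch any URL twice. The sketch below only illustrates that idea: fetch_all_pages and fetch_page are hypothetical names, and the fake page table stands in for the real API so the loop can be run without a token.

```python
def fetch_all_pages(first_url, fetch_page, max_pages=1000):
    """Follow "next" links until there are none, refusing to fetch a URL twice.

    fetch_page is a hypothetical callable: url -> (records, next_url or None).
    """
    records = []
    seen = set()
    url = first_url
    while url is not None:
        if url in seen:
            # The same link came back again: stop instead of looping forever.
            raise RuntimeError("pagination loop detected at " + url)
        if len(seen) >= max_pages:
            raise RuntimeError("gave up after %d pages" % max_pages)
        seen.add(url)
        page_records, url = fetch_page(url)
        records.extend(page_records)
    return records


# Fake three-page API, used only to exercise the loop.
FAKE_PAGES = {
    "p1": ([1, 2], "p2"),
    "p2": ([3, 4], "p3"),
    "p3": ([5], None),
}

all_records = fetch_all_pages("p1", FAKE_PAGES.__getitem__)
print(all_records)
```

If the real query function is passed in as fetch_page and the error fires, the "next" value is not advancing between calls, which narrows down where to look.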
I am experimenting using elasticsearch in a dummy project in django. I am attempting to make a search page using django-elasticsearch-dsl. The user may provide a title, summary and a score to search for. The search should match all the information given by the user, but if the user does not provide any info about something, this should be skipped.
I am running the following code to search for all the values.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()
s = Search().using(client).query("match", title=title_value)\
    .query("match", summary=summary_value)\
    .filter('range', score={'gt': scorefrom_value, 'lte': scoreto_value})
When I have a value for all the fields, the search works correctly. But if, for example, I do not provide a value for summary_value, I expect the search to carry on with the remaining values, and instead it returns nothing.
Is there some value that the fields should have by default in case the user does not provide a value? Or how should I approach this?
UPDATE 1
I tried the following, but it returns the same results every time, no matter what input I give.
s = Search(using=client)
if title:
    s.query("match", title=title)
if summary:
    s.query("match", summary=summary)
response = s.execute()
UPDATE 2
I can print the query using to_dict().
If I write it like this, s is empty:
s = Search(using=client)
s.query("match", title=title)
If I write it like this:
s = Search(using=client).query("match", title=title)
then it works properly, but adding s.query("match", summary=summary) afterwards still does nothing.
You need to assign back into s:
if title:
    s = s.query("match", title=title)
if summary:
    s = s.query("match", summary=summary)
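The reason the assignment matters is that each call returns a modified copy and leaves the original object as it was. The toy class below only illustrates that behaviour: TinySearch is a hypothetical stand-in, not the real Search class, and its to_dict() output is not meant to match the library's.

```python
class TinySearch:
    """Hypothetical stand-in that mimics copy-on-modify chaining."""

    def __init__(self, clauses=()):
        self._clauses = tuple(clauses)

    def query(self, kind, **fields):
        # Return a NEW object; self is left untouched.
        return TinySearch(self._clauses + ({kind: fields},))

    def to_dict(self):
        if not self._clauses:
            return {}
        return {"query": {"bool": {"must": list(self._clauses)}}}


s = TinySearch()
s.query("match", title="django")      # result is thrown away
print(s.to_dict())                    # prints {}

s = s.query("match", title="django")  # result is kept
print(s.to_dict())
```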
I can see in the Search example that django-elasticsearch-dsl lets you apply aggregations after a search so...
How about "staging" your search? I can think of the following:
#first, declare the Search object
s = Search(using=client, index="my-index")
#if parameter1 exists
if parameter1:
    s = s.filter("term", field1=parameter1)
#if parameter2 exists
if parameter2:
    s = s.query("match", field=parameter2)
Do the same for all your parameters (with the needed method for each) so only the ones that exist will appear in your query. At the end just run
response = s.execute()
and everything should work as you want :D
I would recommend you to use the Python ES Client. It lets you manage multiple things related to your cluster: set mappings, health checks, do queries, etc.
In its method .search(), the body parameter is where you send your query as you normally would run it ({"query"...}). Check the Usage example.
Now, for your particular case, you can have a template of your query stored in a variable. You first start with, let's say, an "empty query" only with filter, just like:
query = {
    "query": {
        "bool": {
            "filter": [
            ]
        }
    }
}
From here, you now can build your query from the parameters you have.
This is:
#This would look a little messy, but it's useful ;)
#if parameter1 is not None or empty
#(change the if statement for your particular case)
if parameter1:
    query["query"]["bool"]["filter"].append({"term": {"field1": parameter1}})
Do the same for all your parameters (for strings, use "term", for ranges use "range" as usual) and send the query in the .search()'s body parameter and it should work as you want.
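Putting the pieces above together, a query body built only from the parameters that were actually supplied might look like the sketch below. build_query is a hypothetical helper, the field names title, summary and score are taken from the question, and match is used for the text fields here where the answer uses term.

```python
def build_query(title=None, summary=None, score_from=None, score_to=None):
    """Build a bool query body that only includes the clauses that were supplied."""
    must = []
    if title:
        must.append({"match": {"title": title}})
    if summary:
        must.append({"match": {"summary": summary}})

    # Collect the range bounds that were given; 0 is a valid bound, so test for None.
    score_range = {}
    if score_from is not None:
        score_range["gt"] = score_from
    if score_to is not None:
        score_range["lte"] = score_to

    bool_query = {}
    if must:
        bool_query["must"] = must
    if score_range:
        bool_query["filter"] = [{"range": {"score": score_range}}]
    return {"query": {"bool": bool_query}}


body = build_query(title="django", score_from=2, score_to=5)
print(body)
```

The resulting dictionary is what would be passed as the body parameter of .search().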
Hope this is helpful! :D
I want to get the details of a restaurant on Zomato. I have its link as the input (https://www.zomato.com/mumbai/fantasy-the-cake-shop-kalyan?utm_source=api_basic_user&utm_medium=api&utm_campaign=v2.1). Browsing the documentation of the Zomato APIs, I didn't find a way to get it.
I tried searching for the restaurant using the search API, but it returns many results.
Any help will be appreciated.
It's a two-step process:
1. Find out the restaurant's city id (in your case, Mumbai's city id) through the /cities API. It's a simple query search.
2. Use the city id from the above API call in the /search API, for example https://developers.zomato.com/api/v2.1/search?entity_type=city&entity_id=3&q=fantasy%20the%20cake%20shop%20kalyan
This would give all the basic information about a restaurant.
View the page's source and search for window.RES_ID
I had the same issue as you described. Zomato's API approach is odd, to say the least. It's almost impossible to GET any information about a restaurant if you don't know the res_id in advance, and the res_id can't be parsed from the page since Zomato will deny access.
This worked for me:
1. Obtain a user-key from Zomato API Credentials (https://developers.zomato.com/api).
2. Search for the restaurant URL via the API (https://developers.zomato.com/api/v2.1/search?entity_id=84&entity_type=city&q=RESTAURANT_URL&category=7%2C9). The more specific you are, the better the results. This URL is restricted to the city of Prague (ID = 84) and to the categories Daily menus (ID = 7) and Lunch (ID = 9). If you can specify the city, category or cuisine, it helps, but it shouldn't be necessary. Don't forget to set the request method to GET.
3. Loop or filter through the JSON results and search for the wanted URL. You might need to use the valueOf() method to compare the URLs. Be careful: you might need to add "?utm_source=api_basic_user&utm_medium=api&utm_campaign=v2.1" at the end of your wanted URL so that it has the same format. Check that through the Zomato API Documentation page.
var restaurant_id;
var url_wanted = restaurant_url + '?utm_source=api_basic_user&utm_medium=api&utm_campaign=v2.1';
for (var i = 0; i < body.restaurants.length; i++) {
    var url_in_json = body.restaurants[i].restaurant.url;
    // Stop at the first result whose URL matches the wanted one
    if (url_wanted.valueOf() == url_in_json.valueOf()) {
        restaurant_id = body.restaurants[i].restaurant.id;
        break;
    }
}
console.log('Voila! We have res_id: ' + restaurant_id);
There you have it. It could be easier though.
Hope it helps!
Once you have the URL of the restaurant's page, you can simply look in the page source for a JavaScript object attribute named "window.RES_ID" and then use it in the API call.
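If you already have the page source as a string, pulling the id out could be sketched like this. The exact form of the assignment in the page is an assumption (something like window.RES_ID = 12345;), extract_res_id is a hypothetical helper, and fetching the page is left out because, as noted in another answer, Zomato may deny automated access.

```python
import re

# Assumed form of the assignment in the page source: window.RES_ID = 12345;
RES_ID_PATTERN = re.compile(r"window\.RES_ID\s*=\s*[\"']?(\d+)")


def extract_res_id(page_source):
    """Return the numeric restaurant id found in the page source, or None."""
    match = RES_ID_PATTERN.search(page_source)
    return int(match.group(1)) if match else None


# Made-up snippet used only to exercise the function.
sample_html = "<script>window.RES_ID = 12345;</script>"
res_id = extract_res_id(sample_html)
print(res_id)
```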
It seems that Boto is the official Amazon API module for Python, and this one is for Tornado, so here are my questions:
Does it offer pagination? Amazon returns 10 products per page and I only want to request the first page. If so, how (sample code)?
How do I then parse the product results? I've used python-amazon-simple-product-api, but sadly it doesn't offer pagination, so all the offers keep iterating.
Generally, pagination is performed by the client requesting the API. To do this in boto, you'll need to split up your list of systems yourself. So, for instance, say you make a call to AWS via boto using the get_all_instances function; you'll need to store the results somehow and then keep track of which servers have been displayed and which have not. To my knowledge, boto does not have the LIMIT functionality most devs are used to from MySQL. Personally, I scan all my instances and stash them in mongo like so:
from bson.objectid import ObjectId

# conn, systems_coll, db and account are assumed to be defined elsewhere
for r in conn.get_all_instances():  # loop through all reservations
    # join this reservation's group names into a comma separated list
    groups = ','.join(g.name for g in r.groups)
    for x in r.instances:  # loop through all instances within the reservation
        name = x.tags.get('Name', '')  # get instance name from the 'Name' tag
        new_record = {"tagname": name, "ip_address": x.private_ip_address,
                      "external_ip_nat": x.ip_address, "type": x.instance_type,
                      "state": x.state, "base_image": x.image_id, "placement": x.placement,
                      "public_ec2_dns": x.public_dns_name,
                      "launch_time": x.launch_time, "parent": ObjectId(account['_id'])}
        new_record['groups'] = groups
        systems_coll.update({'_id': x.id}, {"$set": new_record}, upsert=True)
        error = db.error()
        if error is not None:
            print("err:%s:" % str(error))
You could also wrap these in try/except blocks. Up to you. Once you get them out of boto, it should be trivial to do the splitting into pages.
-- Jess
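Once the records are stored locally, the splitting into pages described in the answer above can be as simple as slicing. A minimal sketch, assuming the records are already in a list; get_page is a hypothetical helper and the page size of 10 comes from the question.

```python
def get_page(items, page, per_page=10):
    """Return the given 1-based page of items, or an empty list past the end."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * per_page
    return items[start:start + per_page]


# 25 made-up records standing in for whatever was stored.
stored = ["instance-%02d" % n for n in range(1, 26)]
first_page = get_page(stored, 1)
last_page = get_page(stored, 3)
print(len(first_page), len(last_page))  # prints 10 5
```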
When using Amazon's web service to get any product's information, is there a direct way to get the Average Customer Rating (1-5 stars)? Here are the parameters I'm using:
Service=AWSECommerceService
Version=2011-08-01
Operation=ItemSearch
SearchIndex=Books
Title=A Game of Thrones
ResponseGroup=Large
I would expect it to have a customer rating of 4.5 and total reviews of 2177. But instead I get the following in the response.
<CustomerReviews><IFrameURL>http://www.amazon.com/reviews/iframe?...</IFrameURL></CustomerReviews>
Is there a way to get the overall customer rating, besides for reading the <IFrameURL/> value, making another HTTP request for that page of reviews, and then screen scraping the HTML? That approach is fragile since Amazon could easily change the reviews page structure which would bust my application.
You can scrape from here. Just replace the asin with what you need.
http://www.amazon.com/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_?contextId=dpx&asin=B000P0ZSHK
As far as I know, Amazon changed its API so it is no longer possible to get the review rank information. If you check this link, the note says:
As of November 8, 2010, only the iframe URL is returned in the request content.
However, testing with the params you used to get the iframe, it seems that now even the iframe doesn't work anymore. Also, in the latest API Reference, the "reviews" part of the chapter "Motivating Customers to Buy" is completely missing.
However, since I'm also very interested in whether it is still possible to get the review rank information somehow (maybe not even using the Amazon API but a competitor's API), I'll set up a bounty if anybody can provide something helpful on that. The bounty will be set on this topic in two days.
You can grab the iframe review url and then use css to position it so only the star rating shows. It's not ideal since you're not getting raw data, but it's an easy way to add the rating to your page.
Sample of this in action - http://spamtech.co.uk/positioning-content-inside-an-iframe/
Here is a VBS script that would scrape the rating. Paste the code below to a text file, rename it to Test.vbs and double click to run on Windows.
sAsin = InputBox("What is your ASIN?", "Amazon Standard Identification Number (ASIN)", "B000P0ZSHK")
If sAsin <> "" Then
    sHtml = SendData("http://www.amazon.com/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_?contextId=dpx&asin=" & sAsin)
    sRating = ExtractHtml(sHtml, "<span class=""a-size-base a-color-secondary"">(.*?)<\/span>")
    sReviews = ExtractHtml(sHtml, "<a class=""a-size-small a-link-emphasis"".*?>.*?See all(.*?)<\/a>")
    MsgBox sRating & vbCrLf & sReviews
End If

Function ExtractHtml(sHtml, sPattern)
    Set oRegExp = New RegExp
    oRegExp.Pattern = sPattern
    oRegExp.IgnoreCase = True
    Set oMatch = oRegExp.Execute(sHtml)
    If oMatch.Count = 1 Then
        ExtractHtml = Trim(oMatch.Item(0).SubMatches(0))
    End If
End Function

Function SendData(sUrl)
    Dim oHttp 'As XMLHTTP30
    Set oHttp = CreateObject("Msxml2.XMLHTTP")
    oHttp.open "GET", sUrl, False
    oHttp.send
    SendData = Replace(oHttp.responseText, vbLf, "")
End Function
Amazon has completely removed support for accessing rating/review information from their API. The docs mention a Response Element in the form of customer rating, but that doesn't work either.
Google Shopping uses Viewpoints and other sources for some of its reviews.
This is not possible from PAPI. You either need to scrape it yourself, or you can use free or cheaper third-party alternatives.
We use the amazon-price API from RapidAPI for this; it supports fetching price, rating and review count for up to 1000 products in a single request.