I have a code snippet that scrapes the content of a webpage. The content on the webpage is loaded by AJAX. I'm scraping the data in a loop, and every time it ends with one of the following errors:
1. Address already in use - bind(2) for 127.0.0.1:35216
2. could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)
3. Net::ReadTimeout
Code:
client = Selenium::WebDriver::Remote::Http::Default.new
browser = nil
browser = Watir::Browser.new :phantomjs, :http_client => client
browser.window.maximize
browser.goto "some URL"
final_url = URI.parse(browser.url)
# Sleep for 35 seconds, expecting the data to be rendered by AJAX
sleep(35)
unless pagecheck.css('li.some-class').empty?
sleep(25)
end
Consider the following query:
useQuery(POSTS, {
variables: {
offset: currPage * 20
}
})
where currPage is a React local state variable. It is updated when the user paginates.
Intended outcome:
When currPage=1, new data is fetched; when currPage=2, new data is fetched, and so on. When the user paginates back to the previous page (page 1), the query with currPage=1 has already been fetched, so it should just be read from the cache.
Actual outcome:
When the user paginates from page 1 to page 2, new data for page 2 is fetched. However, when the user paginates back to page 1, the cache is not read, and the data for page 2 is still displayed.
Versions
Apollo client version: @apollo/client: ^3.0.0-beta.14
This bug is also reported on the Apollo Client GitHub: https://github.com/apollographql/apollo-client/issues/5659
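The intended outcome described above amounts to a cache keyed by the query variables. The following is only a language-neutral illustration of that expectation, written in Python; it is not how Apollo Client is implemented, and the names `VariableKeyedCache`, `query`, and `network_calls` are hypothetical.

```python
class VariableKeyedCache:
    """Toy cache that stores one result per distinct set of query variables."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that performs the real request
        self._store = {}         # maps a variables key to the fetched result
        self.network_calls = 0   # counts how often fetch was actually called

    def query(self, **variables):
        # Sort the items so the key does not depend on argument order.
        key = tuple(sorted(variables.items()))
        if key not in self._store:
            self.network_calls += 1
            self._store[key] = self._fetch(**variables)
        return self._store[key]
```

With such a cache, asking for offset 20, then offset 40, then offset 20 again would call `fetch` twice and return the page-1 data on the third call.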
I am trying to import the product titles and review ratings from a listing into a Google Spreadsheet. I tried the IMPORTXML function with an XPath query, but that does not work. So I tried the code shown below, and it worked. I am able to get the listing data, but sometimes it gives me an error instead of displaying the data.
Error:
Request failed for https://www.amazon.co.uk returned code 503. Truncated server response: For information about migrating to ... (use muteHttpExceptions option to examine full response). (line 2).
When I re-run the code, or add/remove https:// from the URL, it works again, but when I refresh the sheet it sometimes fails and displays the error.
Question:
Is there any way to get rid of the error?
The star rating is stored in a span with a data-hook attribute, and I am unable to retrieve it. Is there any way to retrieve the star rating as well?
This is the function that I have created to get the product title and other data:
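function productTitle(url) {
  // Fetch the raw HTML of the product page.
  var content = UrlFetchApp.fetch(url).getContentText();
  // Capture the text inside <span id="productTitle" ...>...</span>.
  var match = content.match(/<span id="productTitle"[^>]*>([^<]*)<\/span>/);
  return match && match[1] ? match[1] : 'Title not found';
}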
You are receiving HTTP status code 503 (Service Unavailable), which means the service you are trying to reach is temporarily unable to handle the request, typically because it is under maintenance or overloaded.
This is on Amazon's side. You should use the Amazon API instead of the public endpoints for processing this data.
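If occasional 503 responses still have to be tolerated, a common approach is to retry with an increasing delay. Below is a minimal, language-neutral sketch in Python, assuming the request is wrapped in a zero-argument callable; the names `fetch_with_retry`, `retries`, and `base_delay` are illustrative. In Apps Script the same loop could be built around UrlFetchApp.fetch with the muteHttpExceptions option mentioned in the error message, but that variant is not shown here.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() and retry with exponential backoff while it raises HTTP 503."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Give up on any other status, or once the retries are used up.
            if err.code != 503 or attempt == retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))


def fetch_text(url, timeout=10):
    """Example use: download a page as text, retrying on 503."""
    def do_request():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    return fetch_with_retry(do_request)
```

Retrying only reduces how often the error surfaces; it does not stop the server from rejecting automated requests.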
I want to make an HTTP request so that I get the minimum amount of data from the server. For example, if the user's device is a mobile, the server will send less data.
I was doing this in Python:
req = urllib2.Request(__url, headers={'User-Agent' : "Magic Browser"})
html = urllib2.urlopen(req).read()
But it still takes some time to download all this.
If it helps, this is the domain from which I want to download pages: https://in.bookmyshow.com
Is there any other way to download a page quickly, with minimum data? Is it even possible?
You can use the requests library to fetch data or upload files. For example, to get cookies:
import requests
r = requests.get('https://in.bookmyshow.com')
print(r.cookies.get_dict())
Or to upload a file:
import requests
# The tuple is (filename, file object, content type).
file = {'file': ('filename.txt', open('filename.txt', 'rb'), 'multipart/form-data')}
data = {
"ButtonValueNameInHtml": "Submit",
}
r = requests.post('https://in.bookmyshow.com', files=file, data=data)
Replace in.bookmyshow.com with your own URL.
You can do many things with requests.
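For the original goal of transferring less data, one option is to ask for a gzip-compressed response and to present a mobile User-Agent. This is a minimal sketch using only the Python 3 standard library. The User-Agent string is an assumed example value, and whether the server actually returns a lighter page or a compressed body depends entirely on the site.

```python
import gzip
import urllib.request

# Assumed example value; any realistic mobile User-Agent string could be used.
MOBILE_UA = "Mozilla/5.0 (Linux; Android 10; Mobile)"


def build_request(url):
    """Build a request that advertises a mobile client and accepts gzip."""
    headers = {"User-Agent": MOBILE_UA, "Accept-Encoding": "gzip"}
    return urllib.request.Request(url, headers=headers)


def decode_body(raw, content_encoding):
    """Decompress the body if the server says it is gzip-encoded."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    return raw


def fetch(url, timeout=10):
    """Download a page and return its (decompressed) bytes."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Calling `fetch('https://in.bookmyshow.com')` would return the page bytes. Compression usually shrinks HTML noticeably, while the size of the page itself stays under the server's control.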
I am trying to extract data from an API, but in the documentation I was only given one API key, which I assume is the private key.
If this is the case, how do I make a GET call in Python to pull out data for, say, inventory, based on the documentation below (I cannot post the entire document), given that no URL is provided?
Public Inventory API
1.0
[ Base url: https://partner-gateway.staging.mjplatform.com/v1] https://partners.mjfreeway.com/docs/inventory
API data related to inventory management
catalog
GET /catalog
Listing of Sellable Products
This request provides a detailed listing of all sellable products, also referred to throughout the system as "item masters", for the active facility. The listing can be filtered by some simple parameters.
Parameters (name, type, location: description)
category_id (integer, query): The ‘id’ of a single category to which you want to limit results
subcategory_id (integer, query): The ‘id’ of a single subcategory to which you want to limit results
strain_id (integer, query): The ‘id’ of a single strain to which you want to limit results
item_number (string, query): The item number of a particular item master, i.e. BKSA00000003
uom_type (string, query): The method of measurement for the item. Valid options are discrete, weight, and volume
available_online (boolean, query): A boolean indicator of whether the item can be sold online
This is my code so far:
import requests
# api-endpoint
URL = "https://www.mjplatform.com/catalog"
# location given here
key = "123abc"
# defining a params dict for the parameters to be sent to the API
PARAMS = {URL:key}
# sending get request and saving the response as response object
r = requests.get(url = URL, params = PARAMS)
# extracting data in json format
data = r.json()
When I run the above I get the following message:
ValueError: No JSON object could be decoded
I am not sure what I am doing wrong, since I am getting a response status of 200.
Edit:
Running print(r.text) gave this output:
<!DOCTYPE html><html lang="en"><head><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta charset="utf-8"/><meta name="viewport" content="width=device-width,initial-scale=1"><title>MJ Platform</title><link href="/main.b21e9284629fc8bfb7bc9b4158ad44b9.css" rel="stylesheet"></head><body><div id="defaultLoadingMessage"><div style="height:40px"></div><div class="col-md-4 col-md-offset-4"><div><h1 style="text-align:center">Loading MJ Platform</h1><div class="text-muted" style="text-align:center;width:600px;margin:auto;color:#aaa">If you continue to see this message for more than a few seconds, your browser is most likely out of date or incompatible. We support Chrome and Firefox. Other browsers may work but not provide an optimal experience. <strong>Safari and MS IE are specifically not supported.</strong></div></div></div></div><div id="app"></div><script type="text/javascript" src="/main.cccbe56cf819e9f8a6e3.js"></script></body></html>
How can the browser be out of date if I am pulling the information into a Python Anaconda window?
I was given other API information not included in the documentation.
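The HTML in the edit above is the loading page of the MJ Platform web application, which suggests the request went to the website rather than to the API gateway named in the documentation excerpt. Here is a minimal sketch, assuming the base URL from the excerpt applies and that the key is sent in a request header. The header name `X-Api-Key` is a placeholder, not taken from the documentation, so substitute whatever the provider specifies. The sketch uses only the Python 3 standard library.

```python
import json
import urllib.parse
import urllib.request

# Base URL as quoted in the documentation excerpt (staging gateway).
BASE_URL = "https://partner-gateway.staging.mjplatform.com/v1"


def build_catalog_request(api_key, **filters):
    """Build a GET /catalog request; filters become query-string parameters."""
    query = urllib.parse.urlencode(filters)
    url = BASE_URL + "/catalog" + ("?" + query if query else "")
    headers = {
        "X-Api-Key": api_key,  # ASSUMPTION: hypothetical header name
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def get_catalog(api_key, **filters):
    """Send the request and decode the JSON body."""
    request = build_catalog_request(api_key, **filters)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `get_catalog("123abc", uom_type="weight")` would request `/catalog?uom_type=weight`. The filters are passed as query parameters because the excerpt lists each one as (query). The key goes in a header rather than in the params dict, although the actual authentication scheme has to come from the provider.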
How can I download a movie from a link (the download normally starts with a click)? This is the HTML code for the download file on the web page. I want to do this in Python, as a client that downloads the movie multiple times without saving it (just to simulate traffic on the web page).
If you have the URL:
import requests
url="http://....."
response = requests.get(url)
You can print the response or parse it:
response.headers is a dictionary-like object holding the response headers.
response.content is the body of the response, as bytes.
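For a large file that should be downloaded repeatedly without being saved, it may be preferable to read the body in chunks and discard them, so the whole movie never sits in memory. This is a minimal sketch with the Python 3 standard library; the names `drain` and `download_and_discard` are illustrative, and the URL has to be supplied by the caller.

```python
import io
import urllib.request


def drain(stream, chunk_size=64 * 1024):
    """Read a file-like object to the end, discard the data, return the byte count."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total


def download_and_discard(url, times=1, timeout=30):
    """Download url the given number of times; return the size of each download."""
    sizes = []
    for _ in range(times):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            sizes.append(drain(resp))
    return sizes


if __name__ == "__main__":
    # Offline demonstration with an in-memory stream instead of a real URL.
    print(drain(io.BytesIO(b"x" * 100_000), chunk_size=4096))
```

Each call to `download_and_discard(url, times=5)` would transfer the full file five times, which generates the traffic without writing anything to disk.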