Google Compute Engine aiohttp get requests recaptcha - google-cloud-platform

I am trying to send a GET request to Newegg from Google Compute Engine (GCE) using aiohttp. When I do, I get back the "Are you a human?" webpage. However, when I run the exact same code on my local machine, I can retrieve the page just fine. Does anyone know:
Why do I only get the reCAPTCHA page on GCE, but not on my local machine?
Is there any way to avoid or get around this reCAPTCHA page on GCE?
My code:
import asyncio
from bs4 import BeautifulSoup
import aiohttp

async def myDriver():
    await httpReq()

async def httpReq():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://www.newegg.com/") as page:
            responseCode = page.status
            print(responseCode)
            pageContent = await page.text()
            content = BeautifulSoup(pageContent, 'lxml')
            print(content.prettify())

asyncio.run(myDriver())
page reached:
200
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
Are you a human?
</title>
.
.
.
grecaptcha.ready(function()
Notes:
"Debian GNU/Linux 10 (buster)"
python 3.7.3
aiohttp 3.6.3
I've tried similar code with the normal requests library on GCE, and everything works fine, so this is only an issue with aiohttp on GCE.
I must use aiohttp and not the normal requests library for my project
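The question doesn't establish the cause. One hypothesis worth ruling out, since requests works on the same machine, is that the site treats clients differently based on request headers: by default aiohttp sends a User-Agent of the form `Python/3.x aiohttp/3.x`, while requests sends `python-requests/2.x`. Below is a minimal sketch, assuming that hypothesis, that sends a browser-like User-Agent with every request made by an aiohttp session. The header value and the `fetch_page` name are illustrative, and nothing here guarantees the reCAPTCHA page goes away (the GCE IP range itself may be what the site flags).

```python
import aiohttp

# Assumption: an example browser-like User-Agent. Whether this gets past
# the reCAPTCHA page on GCE is not confirmed by the question.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
    ),
}


async def fetch_page(url: str) -> str:
    """Fetch `url` and return the body text (hypothetical helper)."""
    # Headers given to ClientSession are sent with every request it makes,
    # replacing aiohttp's default "Python/x.y aiohttp/x.y.z" User-Agent.
    async with aiohttp.ClientSession(headers=BROWSER_HEADERS) as session:
        async with session.get(url) as page:
            print(page.status)
            return await page.text()
```

Call it with `asyncio.run(fetch_page("https://www.newegg.com/"))` and pass the returned text to BeautifulSoup as in the original code. If the reCAPTCHA page still appears, comparing the full set of headers each library sends is a reasonable next step.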

Related

Restful web services with tomcat 8

I'm trying to make a simple example of RESTful web services, and it doesn't work for me. I'm using NetBeans with Tomcat 8.5.20 and Java EE 7 Web, and I have just two classes (Class 1 and Class 2 below).
My index.html contains nothing beyond the usual boilerplate, and my context.xml contains only the Context element shown below.
The web service works fine on my computer, but when I deploy it to a server running Apache Tomcat 6.0.45, I get a 404 error. I don't know why, because it's the same path I use when I test it on my computer.
Thanks for the help, and sorry for my bad English :v
NetBeans version: 8.2
Class 1:
import rest.Consumo;
import co.edu.udea.exception.OrgSistemasSecurityException;
import javax.ws.rs.GET;
import javax.ws.rs.HeaderParam;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
@Path("/semestres")
public class HistoriaAcademicaResource {

    @GET
    @Path("/hola")
    @Produces(MediaType.TEXT_PLAIN)
    public String hola() {
        return "Don't give up";
    }
}
Class 2:
import javax.ws.rs.core.Application;
@javax.ws.rs.ApplicationPath("prueba")
public class ApplicationConfig extends Application {
}
index.html:
<html>
  <head>
    <title>WORK, DAMN IT</title>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
  </head>
  <body>
    <div>WORK, PLEASE</div>
  </body>
</html>
Context.xml:
<?xml version="1.0" encoding="UTF-8"?>
<Context path="/pruebaREST"/>

Doing auto redirect using CherryPy

I've just discovered CherryPy. I am going through the tutorial, so far so good. While doing it I wanted to create a "BUSY DOING WORK" splash screen, essentially I have a python function that for example updates an sqlite table with 10000 records. What I want to do is get CherryPy to display a busy.html page while the database is being updated, when the database operation completes I want to redirect the user back to the main.html page.
So far I have only come across
def update_db(self):
    # Code to update the database goes here
    return "busy.html"  # <---- show user that work is being done
    # database has completed
    raise cherrypy.HTTPRedirect("main.html")
But return simply exits the function. Is there any way of presenting the user with a temporary splash screen while the database is being updated, and then sending the user to another page once the update is complete?
I suppose an alternative is to have a message flash across the top of the existing page, But I don't know if CherryPy has a flash message feature much like Flask.
IMHO, you can achieve this with streaming generators, as described in the (then latest, v3.8) CherryPy documentation. However, you should take into account the following caveat from the docs:
Streaming generators are sexy, but they play havoc with HTTP. CherryPy allows you to stream output for specific situations: pages which take many minutes to produce, or pages which need a portion of their content immediately output to the client. Because of the issues outlined above, it is usually better to flatten (buffer) content rather than stream content. Do otherwise only when the benefits of streaming outweigh the risks.
Generators also have some limitations, as the documentation notes:
you cannot manually modify the status or headers within your page handler if that handler method is a streaming generator, because the method will not be iterated over until after the headers have been written to the client. This includes raising exceptions like HTTPError, NotFound, InternalRedirect and HTTPRedirect. To use a streaming generator while modifying headers, you would have to return a generator that is separate from (or embedded in) your page handler.
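The reason for this limitation is ordinary Python behaviour: calling a generator function runs none of its body, so nothing inside it, including a `raise`, executes until the server starts iterating, which for a streamed response happens after the status line and headers have been sent. A small framework-free demonstration (the `events` list and the `handler` name are only for illustration; the "headers sent" step merely simulates what a streaming server does):

```python
events = []


def handler():
    """A page handler written as a streaming generator."""
    events.append("body started")
    yield "<p>first chunk</p>"
    events.append("after first chunk")
    yield "<p>second chunk</p>"


body = handler()                # creates the generator; no handler code runs yet
events.append("headers sent")   # a streaming server writes headers at this point
chunks = list(body)             # the handler body runs only now, during iteration

print(events)
# ['headers sent', 'body started', 'after first chunk']
```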
Because the headers have already been written to the client when streaming, raising a redirection exception cannot send the user to a different page after your long-running task. If I were you, I would yield this
<meta http-equiv="refresh" content="0;URL='/anotherpage'" />
or this at the final yield
<script>window.location.href="/anotherpage";</script>
I coded an example for you. I hope it gives you an idea.
# encoding: utf-8
import time

import cherrypy


class Root(object):
    @cherrypy.expose
    def index(self):
        content = '''<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script>
$(document).ready(function(){
$("body").append("<p>jQuery Ready</p>");
setTimeout(function(){
$("body").html("Redirecting Please Wait...")
}, 2500);
});
</script>
</head>
<body>
<p>Page Content1</p>
<p>Page Content2</p>
<p>Page Content3</p>
<p>Page Content4</p>
<p>Page Content5</p>
<p>Page Content Final</p>
</body>
</html>
'''
        # Stream the page one line per second to simulate slow work
        for line in content.split("\n"):
            yield line
            time.sleep(1)
        else:
            # Headers are already sent, so redirect from inside the page
            yield '''<meta http-equiv="refresh" content="5;URL='/anotherpage'" />'''
    # Tell CherryPy to stream the generator's output instead of buffering it
    index._cp_config = {'response.stream': True}

    @cherrypy.expose
    def anotherpage(self):
        return "Another Page"


cherrypy.quickstart(Root(), "/")
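The core of this approach can be pulled out of CherryPy so it runs on its own: a generator that first yields a busy message, then yields progress while the slow work runs, and finally yields a meta refresh that sends the browser elsewhere. `busy_then_redirect`, `update_rows` and the batch size are hypothetical stand-ins for the database update in the question, not part of CherryPy.

```python
import time
from typing import Callable, Iterable, Iterator


def busy_then_redirect(
    batches: Iterable[Callable[[], None]], target: str = "/anotherpage"
) -> Iterator[str]:
    """Stream a busy message, run each batch of work, then redirect."""
    yield "<!DOCTYPE html><html><body><p>BUSY DOING WORK...</p>"
    for done, batch in enumerate(batches, start=1):
        batch()  # the slow part, e.g. updating one chunk of rows
        yield f"<p>Finished batch {done}</p>"
    # In a streamed response the status line and headers are sent before this
    # generator runs, so an HTTP redirect is impossible; let the browser do it.
    yield f'''<meta http-equiv="refresh" content="0;URL='{target}'" /></body></html>'''


def update_rows(start: int, stop: int) -> None:
    """Hypothetical stand-in for updating rows start..stop-1 in SQLite."""
    time.sleep(0.01)


# 10,000 rows in batches of 1,000, matching the size mentioned in the question.
batches = [lambda s=s: update_rows(s, s + 1000) for s in range(0, 10000, 1000)]
```

In CherryPy, an exposed handler could `return busy_then_redirect(batches)` with `'response.stream': True` in its `_cp_config`; returning a separate generator like this is the pattern the quoted documentation suggests.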

how to embed standalone bokeh graphs into django templates

I want to display graphs from the Bokeh library in my Django web application, but I don't want to use the bokeh-server executable because it doesn't seem like the right approach. Is that possible? If so, how?
Using the Embedding Bokeh Plots documentation example as suggested by Fabio Pliger, one can do this in Django:
in the views.py file, we put:
from django.shortcuts import render
from bokeh.plotting import figure
from bokeh.resources import CDN
from bokeh.embed import components

def simple_chart(request):
    plot = figure()
    plot.circle([1,2], [3,4])
    script, div = components(plot, CDN)
    return render(request, "simple_chart.html", {"the_script": script, "the_div": div})
in the urls.py file we can put :
from myapp.views import simple_chart
...
...
...
url(r'^simple_chart/$', simple_chart, name="simple_chart"),
...
...
in the template file simple_chart.html we'll have :
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Experiment with Bokeh</title>
<script src="http://cdn.bokeh.org/bokeh/release/bokeh-0.8.1.min.js"></script>
<link rel="stylesheet" href="http://cdn.bokeh.org/bokeh/release/bokeh-0.8.1.min.css">
</head>
<body>
{{ the_div|safe }}
{{ the_script|safe }}
</body>
</html>
And it works.
You don't need to use bokeh-server to embed bokeh plots. It just means you'll not be using (and probably don't need) the extra features it provides.
In fact, you can embed Bokeh plots in many ways: by generating standalone HTML; by generating standalone Bokeh components that you then embed in your Django app when rendering templates; or with the method we call "autoloading", which makes Bokeh return a tag that replaces itself with a Bokeh plot. You'll find more details in the documentation.
Another good source of inspiration is the embed examples you can find in the repository.
It is also possible to have it work with AJAX requests. Let's say we have a page loaded and would like to show a plot on button click without reloading the whole page. From Django view we return Bokeh script and div in JSON:
from django.http import JsonResponse
from bokeh.plotting import figure
from bokeh.resources import CDN
from bokeh.embed import components
def simple_chart(request):
    plot = figure()
    plot.circle([1,2], [3,4])
    script, div = components(plot, CDN)
    return JsonResponse({"script": script, "div": div})
When we get the AJAX response in JS (jQuery in this example), the div is first inserted into the existing page and then the script is appended:
$("button").click(function(){
    $.ajax({
        url: "/simple_chart",
        dataType: "json",  // jQuery parses the JsonResponse, so no JSON.parse is needed
        success: function(result){
            $('#bokeh_graph').html(result.div);
            $("head").append(result.script);
        }
    });
});
{{the_script|safe}} must be put inside the head tag.
Here's a Flask app that uses jQuery to interact with a Bokeh plot. Check out the templates/ directory for JavaScript you can reuse. Also search for bokeh-demos on GitHub.

Website Treating me as mobile when scraping from html in python

I am attempting to scrape data off of a website using a combination of urllib2 and beautifulsoup. At the moment, here is my code:
import urllib2
from bs4 import BeautifulSoup

site2 = 'http://football.fantasysports.yahoo.com/archive/nfl/2008/619811/draftresults'
players = []
teams = []
response = urllib2.urlopen(site2)
html = response.read()
soup = BeautifulSoup(html)
playername = soup.find_all('a', class_="name")
teamname = soup.find_all('td', class_="last")
My problem is, that when I view the source code in Chrome, these tags are readily available and working, but when I try and run the program, the tags are no longer there.
One hint may be that the first line of the source code reads like such:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
While if I print my soup or html object the first line is <!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">.
It appears that the site serves me its mobile version when I scrape it with urllib2. If that is not what this means, or if you know how to make urllib2 open the URL the way a browser (preferably Chrome) would, please let me know! Please also be quite specific about how I can solve the problem, as I am a novice coder and admittedly my depth of knowledge is shallow at best!
Thanks everyone!
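Before changing anything, it can help to confirm programmatically which variant the server returned. A minimal check based on the two doctypes quoted in the question; the function name and the 300-character window are arbitrary choices, and a site can serve a mobile page without this doctype, so treat it as a heuristic only.

```python
def looks_like_mobile_page(html: str) -> bool:
    """Heuristic: does the document start with a WAP/XHTML Mobile doctype?"""
    # Only the start of the document matters for the doctype check.
    start = html.lstrip()[:300].upper()
    return start.startswith("<!DOCTYPE") and (
        "XHTML MOBILE" in start or "WAPFORUM" in start
    )


desktop = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')
mobile = ('<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" '
          '"http://www.wapforum.org/DTD/xhtml-mobile10.dtd">')
print(looks_like_mobile_page(desktop), looks_like_mobile_page(mobile))
# False True
```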
The website tries to figure out which browser the request comes from using the 'User-agent' header. According to the urllib2 docs, the default user agent is Python-urllib/2.6. You could try setting it to a browser's user agent using an OpenerDirector. Again, from the docs:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')
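urllib2 exists only in Python 2; in Python 3 the same classes live in `urllib.request`. A sketch of two equivalent ways to send a browser-like User-Agent there, using the URL from the question; "Mozilla/5.0" is the same minimal value as above and may not be enough for every site to serve its desktop page.

```python
import urllib.request

URL = "http://football.fantasysports.yahoo.com/archive/nfl/2008/619811/draftresults"

# Option 1: headers on a single Request object.
req = urllib.request.Request(URL, headers={"User-Agent": "Mozilla/5.0"})

# Option 2: an opener whose default headers apply to every request it opens,
# mirroring the urllib2 OpenerDirector example.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# Neither object touches the network until it is opened, e.g.
#   html = urllib.request.urlopen(req).read()
#   html = opener.open(URL).read()
```

Note that `Request` stores header names via `str.capitalize()`, so `req.get_header("User-agent")` (lowercase "a") is the lookup that finds it.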

Trouble with urllib calls in Python. Getting server error

I am trying to download an XML file from the Eurostat website but I am having trouble using urllib in Python to do it. Somehow when I use my regular Chrome browser it's able to make the HTTP request and the website will generate an XML file, but when I try to do the same thing in python I get a server error. This is the code I am using:
import urllib
from xml.etree import ElementTree as ET
response = urllib.urlopen("http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/lfsq_egais/Q.T.Y_GE15.EMP..NL")
result = response.read()
print result
I have tried using urllib.urlretrieve too and that didn't work either. Any reason why this might be happening? The HTML I get back is as follows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Draft//EN">
<HTML>
<HEAD>
<TITLE>Error 500--Internal Server Error</TITLE>
<META NAME="GENERATOR" CONTENT="WebLogic Server">
</HEAD>
<BODY bgcolor="white">
<FONT FACE=Helvetica><BR CLEAR=all>
<TABLE border=0 cellspacing=5><TR><TD><BR CLEAR=all>
<FONT FACE="Helvetica" COLOR="black" SIZE="3"><H2>Error 500--Internal Server Error</H2>
</FONT></TD></TR>
</TABLE>
<TABLE border=0 width=100% cellpadding=10><TR><TD VALIGN=top WIDTH=100% BGCOLOR=white><FONT FACE="Courier New"><FONT FACE="Helvetica" SIZE="3"><H3>From RFC 2068 <i>Hypertext Transfer Protocol -- HTTP/1.1</i>:</H3>
</FONT><FONT FACE="Helvetica" SIZE="3"><H4>10.5.1 500 Internal Server Error</H4>
</FONT><P><FONT FACE="Courier New">The server encountered an unexpected condition which prevented it from fulfilling the request.</FONT></P>
</FONT></TD></TR>
</TABLE>
</BODY>
</HTML>
This question is a few months old now, but better late than never:
The Eurostat REST API you are using is supposed to respond with XML content, and it apparently fails with a 500 error when the request doesn't state that it accepts XML; urllib doesn't send such a header by default. The solution is to add an Accept: application/xml header to the request.
This will do the trick in Python 2.7 (using urllib2 by the way):
import urllib2
req = urllib2.Request("http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/"
"lfsq_egais/Q.T.Y_GE15.EMP..NL")
req.add_header("Accept", "application/xml")
response = urllib2.urlopen(req)
print response.read()
See urllib2 docs for more info and examples.
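In Python 3, urllib2's `Request` and `urlopen` live in `urllib.request`. A minimal equivalent, assuming the endpoint still behaves as described (the URL is copied from the question and may no longer be served); `fetch_xml` is a hypothetical helper that also parses the result with the ElementTree module the question already imports.

```python
import urllib.request
import xml.etree.ElementTree as ET

URL = ("http://ec.europa.eu/eurostat/SDMX/diss-web/rest/data/"
       "lfsq_egais/Q.T.Y_GE15.EMP..NL")

# Ask explicitly for XML, as in the urllib2 answer.
req = urllib.request.Request(URL, headers={"Accept": "application/xml"})


def fetch_xml(request: urllib.request.Request) -> ET.Element:
    """Open the request and parse the response body as XML."""
    with urllib.request.urlopen(request) as response:
        return ET.fromstring(response.read())
```

Usage: `root = fetch_xml(req)`, then walk `root` with the usual ElementTree methods such as `root.iter()`.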