The relevant line in my code is: a(href = data[i][propt]) data[i][propt]. Here data[i][propt] stores the URL, for instance www.google.com. Written this way, though, what is shown on the webpage is a clickable "data[i][propt]" that directs to www.google.com, whereas what I want is a clickable "www.google.com" that directs to www.google.com. Is this possible?
Try:
a(href = data[i][propt]) #{data[i][propt]}
or:
a(href = data[i][propt])= data[i][propt]
Either form evaluates data[i][propt] and renders the result as the link text, so the visible link becomes the URL itself.
We have some regex code that converts URLs to clickable links. It is working, but we are running into issues where, if a user submits an entry and forgets the space after a period, the regex treats the run-together words as a link as well.
example: End of a sentence.This is a new sentence
It would create a hyperlink for sentence.This
Is there any way to validate the match against a proper top-level domain such as .com, .ca, etc.?
Here is the code:
$url = '#(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])#';
$output = preg_replace($url, '<a href="$0">$0</a>', trim($val[0]));
Thanks,
Aaron
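One way to see the problem, and a possible fix, is to anchor the match to a whitelist of top-level domains so that "sentence.This" is rejected. A minimal sketch in Python (the whitelist and the sample text are assumptions; the same pattern can be ported back to preg_replace):

import re

text = "End of a sentence.This is a new sentence. Visit example.com/page today."

# require the host to end in a whitelisted TLD; "sentence.This" no longer matches
url = re.compile(r'\b(?:https?://)?(?:[-\w]+\.)+(?:com|ca|net|org|edu)\b(?:/\S*)?', re.I)

print(url.findall(text))  # ['example.com/page']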
I have two search inputs, the first on the home page and another on the search results page itself. What I am trying to do is receive the query from the home page and redirect it to the search results page, like this:
If I search for: html-5
the redirect should go to: 127.0.0.1/html-5/find/?q=html-5
I have tried, but unfortunately I have not found the right way to do it; please suggest the correct approach.
I use these URL patterns:
url(r'^(?P<key>.*)/find/', FacetedSearchView.as_view(), name='haystack_search'),
url(r'^search/', category_query_view, name='category_query'),
then in category_query_view:
def category_query_view(request):
    category = request.GET.get('q')
    print('hihi', category)
    return HttpResponseRedirect(reverse('haystack_search', kwargs={'key': category}))
It is redirecting me to
127.0.0.1/html-5/find/
but I don't know how to append
?q=html-5
after it.
Oh, I found the right way; it's pretty simple:
def category_query_view(request):
    category = request.GET.get('q')
    print('hihi', category)
    url = '{category}/find/?q={category}'.format(category=category)
    return HttpResponseRedirect('/' + url)
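For reference, the same redirect can be built without hand-formatting the path, letting reverse() resolve the named pattern and urlencode() escape the query value. A sketch, assuming Django 1.10+ (for django.urls) and Python 3:

from urllib.parse import urlencode

from django.http import HttpResponseRedirect
from django.urls import reverse

def category_query_view(request):
    category = request.GET.get('q')
    # resolve /<category>/find/ from the named pattern, then append ?q=<category>
    path = reverse('haystack_search', kwargs={'key': category})
    return HttpResponseRedirect('{}?{}'.format(path, urlencode({'q': category})))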
I'm new to Python, and extremely impressed by the number of libraries at my disposal. I already have a function that uses Beautiful Soup to extract URLs from a site, but not all of them are relevant: I only want webpages (no media) on the same website (the domain or a subdomain, but no other domains). I'm trying to manually program around examples I run into, but I feel like I'm reinventing the wheel - surely this is a common problem in internet applications.
Here's an example list of URLs that I might retrieve from a website, say http://example.com, with markings for whether or not I want them and why. Hopefully this illustrates the issue.
Good:
example.com/page - it links to another page on the same domain
example.com/page.html - has a filetype ending, but it's an HTML page
subdomain.example.com/page.html - it's on the same site, though on a subdomain
/about/us - it's a relative link, so it doesn't have the domain in it, but it's implied
Bad:
otherexample.com/page - bad, the domain doesn't match
example.com/image.jpg - bad, it's an image, not a page
/ - bad - sometimes there's just a slash in the "a" tag, but that's a reference to the page I'm already on
#anchor - this is also a relative link, but it's on the same page, so there's no need for it
I've been writing cases in if statements for each of these...but there has to be a better way!
Edit: Here's my current code, which returns nothing:
ignore_values = {"", "/"}

def desired_links(href):
    # ignore if href is not set
    if not href:
        return False
    # ignore if it is just a link to the same page
    if href.startswith("#"):
        return False
    # skip ignored values
    if href in ignore_values:
        return False

def explorePage(pageURL):
    # Get web page
    opener = urllib2.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
    response = opener.open(pageURL)
    html = response.read()
    # Parse web page for links
    soup = BeautifulSoup(html, 'html.parser')
    links = [a["href"] for a in soup.find_all("a", href=desired_links)]
    for link in links:
        print(link)
    return

def main():
    explorePage("http://xkcd.com")
BeautifulSoup is quite flexible in helping you to create and apply the rules to attribute values. You can create a filtering function and use it as a value for the href argument to find_all().
For example, something for you to start with:
ignore_values = {"", "/"}

def desired_links(href):
    # ignore if href is not set
    if not href:
        return False
    # ignore if it is just a link to the same page
    if href.startswith("#"):
        return False
    # skip ignored values
    if href in ignore_values:
        return False

    # TODO: more rules
    # you would probably need "urlparse" package for a proper url analysis

    return True
Usage:
links = [a["href"] for a in soup.find_all("a", href=desired_links)]
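As a starting point for the "urlparse" step mentioned in the comment, here is a minimal sketch of a same-site check that could be added to desired_links() before the final return True (example.com stands in for the real domain and is an assumption):

from urllib.parse import urlparse  # the "urlparse" module on Python 2

def same_site(href, domain="example.com"):
    host = urlparse(href).netloc
    # relative links ("/about/us") have an empty netloc and are kept;
    # absolute links must be on the domain itself or one of its subdomains
    return not host or host == domain or host.endswith("." + domain)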
You should take a look at Scrapy and its Link Extractors.
I'm trying to display the profile pic of the logged-in user, but for some reason the URL gets changed. This is the function I've put together:
function testAPI() {
    FB.api('/me', 'GET', {"fields": "picture{url},name"}, function(response) {
        var url = response.picture.data.url;
        console.log(url);
        $("#status").html('<p>' + response.name + '</p><figure id="profilePicture"><img href="' + response.picture.data.url + '"/></figure>');
        console.log(document.getElementById('status').innerHTML);
    });
}
the first console log returns:
"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xpf1/v/t1.0-1/p50x50/12036592_10153768750487873_8195289030629933305_n.jpg?oh=5488c4b312a1b077b68243a7998d217e&oe=56BA5DE5&gda=1454718467_89331767e8555ed196c5559340774fbb"
the second returns the correct inner HTML, and the user's name displays, but it is adding amp; after the & symbols in the URL, so the profile pic isn't displaying. No idea how to fix this. Any help would be great, thanks.
You have incorrect <img/> tag syntax: you should be using src instead of href. (The &amp; you see in the logged innerHTML is a red herring - that is just how the browser serializes & inside attribute values, and it is decoded again when the request is made.)
Change:
$("#status").html('<p>'+response.name+'</p><figure id="profilePicture"><img href="'+response.picture.data.url+'"/></figure>');
To:
$("#status").html('<p>'+response.name+'</p><figure id="profilePicture"><img src="'+response.picture.data.url+'"/></figure>');
I am using Beautiful Soup to try and scrape a page.
I am trying to follow this tutorial.
I am trying to get the contents of the following page after submitting a Stock Ticker Symbol:
http://www.cboe.com/delayedquote/quotetable.aspx
The tutorial is for a page with a "GET" method; my page uses "POST". I wonder if that is part of the problem?
I want to use the first text box, under where it says:
“Enter a Stock or Index symbol below for delayed quotes.”
Relevant code:
user_agent = 'Mozilla/5 (Solaris 10) Gecko'
headers = { 'User-Agent' : user_agent }
values = {'ctl00$ctl00$AllContent$ContentMain$ucQuoteTableCtl$txtSymbol' : 'IBM' }
data = urllib.urlencode(values)
request = urllib2.Request("http://www.cboe.com/delayedquote/quotetable.aspx", data, headers)
response = urllib2.urlopen(request)
The call does not fail, but I do not get the set of options and prices that I see when I run the page interactively. I get a bunch of garbled HTML.
Thanks in advance!
Ok - I think I figured out the problem (and found another). I decided to switch to 'mechanize' from 'urllib2'. Unfortunately, I kept having problems getting the data. Finally, I realized that there are two 'submit' buttons, so I tried passing the name parameter when submitting the form. That did the trick as far as getting the correct response.
However, the next problem was that I could not get BeautifulSoup to parse the HTML and find the necessary tags. A brief Google search revealed others having similar problems. So, I gave up on BeautifulSoup and just did a basic regex on the HTML. Not as elegant as BeautifulSoup, but effective.
Ok - enough speechifying. Here's what I came up with:
import mechanize
import re
br = mechanize.Browser()
url = 'http://www.cboe.com/delayedquote/quotetable.aspx'
br.open(url)
br.select_form(name='aspnetForm')
br['ctl00$ctl00$AllContent$ContentMain$ucQuoteTableCtl$txtSymbol'] = 'IBM'
# here's the key step that was causing the trouble - pass the name parameter
# for the button when calling submit
response = br.submit(name="ctl00$ctl00$AllContent$ContentMain$ucQuoteTableCtl$btnSubmit")
data = response.read()
match = re.search(r'Bid</font><span> \s*([0-9]{1,4}\.[0-9]{2})', data, re.M | re.I)
if match:
    print match.group(1)
else:
    print "There was a problem retrieving the quote"