In PHP I can use this code:
$url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins=$location1& destinations=$location2&mode=bicycling&language=en-EN&sensor=false";
$data = #file_get_contents($url);
$obj = json_decode($data);
$arr = (array)$obj;
to get an array of values that gives me the distance between two locations.
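For example, the distance for the first origin/destination pair can then be read out like this (a minimal sketch; it assumes the request succeeded and the usual Distance Matrix response shape of rows → elements → distance):
$data = @file_get_contents($url);
$arr  = json_decode($data, true); // true: decode objects straight into arrays
if ($arr !== null && $arr['status'] === 'OK') {
    $element = $arr['rows'][0]['elements'][0];
    echo $element['distance']['text'];  // e.g. "4.5 km"
    echo $element['distance']['value']; // distance in metres
}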
Is this kind of web interaction possible with VSTO? I have Googled high and low and nothing I search for gives me any results to work with.
Ergo, I think I am missing something.
In a CGI script running under Perl taint mode, I have not been able to get past the following error.
tail /etc/httpd/logs/error_log
/usr/local/share/perl5/Net/DNS/Dig.pm line 906 (#1)
(F) You tried to do something that the tainting mechanism didn't like.
The tainting mechanism is turned on when you're running setuid or
setgid, or when you specify -T to turn it on explicitly. The
tainting mechanism labels all data that's derived directly or indirectly
from the user, who is considered to be unworthy of your trust. If any
such data is used in a "dangerous" operation, you get this error. See
perlsec for more information.
[Mon Jan 6 16:24:21 2014] dig.cgi: Insecure dependency in eval while running with -T switch at /usr/local/share/perl5/Net/DNS/Dig.pm line 906.
Code:
#!/usr/bin/perl -wT
use warnings;
use strict;
use IO::Socket::INET;
use Net::DNS::Dig;
use CGI;
$ENV{"PATH"} = ""; # Latest attempted fix
my $q = CGI->new;
my $domain = $q->param('domain');
if ( $domain =~ /(^\w+)\.(\w+\.?\w+\.?\w+)$/ ) {
    $domain = "$1\.$2";
}
else {
    warn("TAINTED DATA SENT BY $ENV{'REMOTE_ADDR'}: $domain: $!");
    $domain = ""; # successful match did not occur
}
my $dig = new Net::DNS::Dig(
    Timeout   => 15,    # default
    Class     => 'IN',  # default
    PeerAddr  => $domain,
    PeerPort  => 53,    # default
    Proto     => 'UDP', # default
    Recursion => 1,     # default
);
my @result = $dig->for( $domain, 'NS' )->to_text->rdata();
@result = sort @result;
print @result;
I normally use Data::Validate::Domain to do checking for a “valid” domain name, but could not deploy it in a way in which the tainted variable error would not occur.
I read that in order to untaint a variable you have to pass it through a regex with capture groups and then join the capture groups to sanitize it. So I deployed $domain =~ /(^\w+)\.(\w+\.?\w+\.?\w+)$/. As shown here, it is not the best regex for untainting a domain name and covering all possible domains, but it meets my needs. Unfortunately my script is still producing taint failures and I cannot figure out why.
Regexp-Common does not provide a domain regex, and modules don't seem to help with untainting the variable, so I am at a loss now.
How do I get this thing to pass taint checking?
$domain is not tainted
I verified that your $domain is not tainted. This is the only variable you use that could be tainted, in my opinion.
perl -T <(cat <<'EOF'
use Scalar::Util qw(tainted);

sub p_t($) {
    if (tainted $_[0]) {
        print "Tainted\n";
    } else {
        print "Not tainted\n";
    }
}

my $domain = shift;
p_t($domain);

if ($domain =~ /(^\w+)\.(\w+\.?\w+\.?\w+)$/) {
    $domain = "$1\.$2";
} else {
    warn("$domain\n");
    $domain = "";
}

p_t($domain);
EOF
) abc.def
It prints
Tainted
Not tainted
What Net::DNS::Dig does
See Net::DNS::Dig line 906. It is the beginning of the to_text method.
sub to_text {
    my $self = shift;
    my $d = Data::Dumper->new([$self],['tobj']);
    $d->Purity(1)->Deepcopy(1)->Indent(1);
    my $tobj;
    eval $d->Dump; # line 906
…
From the definition of new I know that $self is just a hashref containing values from the new parameters plus several others filled in by the constructor. The eval'd code produced by $d->Dump sets $tobj to a deep copy of $self (Deepcopy(1)), with correctly set self-references (Purity(1)) and basic pretty-printing (Indent(1)).
Where is the problem, how to debug
From what I found out about &Net::DNS::Dig::to_text, it is clear that the problem is at least one tainted item inside $self. So you have a straightforward way to debug your problem further: after constructing the $dig object in your script, check which of its items is tainted. You can dump the whole structure to stdout using print Data::Dumper::Dumper($dig);, which is roughly the same as the eval'd code, and check suspicious items with &Scalar::Util::tainted.
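A minimal sketch of that check, assuming the $dig object from the question's script has already been constructed (it only looks at the top-level scalar entries of the hash):
use Data::Dumper;
use Scalar::Util qw(tainted);

# Dump the whole object to see exactly what the constructor stored.
print Dumper($dig);

# Report which top-level scalar fields carry the taint flag.
for my $key (sort keys %$dig) {
    my $value = $dig->{$key};
    next if ref $value;    # skip nested structures in this quick check
    printf "%-12s %s\n", $key, tainted($value) ? "TAINTED" : "clean";
}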
I have no idea how far this is from making Net::DNS::Dig work in taint mode. I do not use it; I was just curious and wanted to find out where the problem is. As you managed to solve your problem otherwise, I leave it at this stage, allowing others to continue debugging the issue.
As a resolution to this question, if anyone comes across it in the future: it was indeed the module I was using that caused the taint checks to fail, teaching me an important lesson about trusting modules in a CGI environment. I switched to Net::DNS as I figured it would not encounter this issue, and sure enough it does not. My code is provided below for reference in case anyone wants to accomplish the same thing I set out to do, which is to locate the nameservers defined for a domain within its own zone file.
#!/usr/bin/perl -wT
use warnings;
use strict;
use IO::Socket::INET;
use Net::DNS;
use CGI;
$ENV{"PATH"} = ""; // Latest attempted fix
my $q = CGI->new;
my $domain = $q->param('domain');
my @result;
if ( $domain =~ /(^\w+)\.(\w+\.?\w+\.?\w+)$/ ) {
    $domain = "$1\.$2";
}
else {
    warn("TAINTED DATA SENT BY $ENV{'REMOTE_ADDR'}: $domain: $!");
    $domain = ""; # successful match did not occur
}
my $ip = inet_ntoa(inet_aton($domain));
my $res = Net::DNS::Resolver->new(
    nameservers => [($ip)],
);
my $query = $res->query($domain, "NS");
if ($query) {
    foreach my $rr (grep { $_->type eq 'NS' } $query->answer) {
        push(@result, $rr->nsdname);
    }
}
else {
    warn "query failed: ", $res->errorstring, "\n";
}
@result = sort @result;
print @result;
Thanks for the comments assisting me in this matter, and to SO for teaching me more than any other resource I have come across.
I am unable to implement pagination with Facebook OpenGraph. I have exhausted every option I have found.
My hope is to query for 500 listens repeatedly until there are none left. However, I am only able to receive a response from my first query. Below is my current code, but I have also tried setting the parameters to different amounts rather than having the fields from [paging][next] dictate them.
$q_param['limit'] = 500;
$music_data = array();
$next_exists = true;
while ($next_exists) {
    $music = $facebook->api('/me/music.listens', 'GET', $q_param);
    $music_data = array_merge($music_data, $music['data']);
    if ($music["paging"]["next"] == null || $music["paging"]["next"] == "") {
        $next_exists = false;
    }
    else {
        $url = $music["paging"]["next"];
        parse_str(parse_url($url, PHP_URL_QUERY), $array);
        foreach ($array as $key => $value) {
            $q_param[$key] = $value;
        }
    }
}
a - Can you please share what you get after the first call?
b - Also, could you share the whole file?
I think your script is timing out. Try adding the following at the top of your file:
set_time_limit(0);
Can you check the Apache log files?
sudo tail -f /var/log/apache2/error.log
In my free time, I've been trying to improve my Perl abilities by working on a script that uses LWP::Simple to poll one specific website's product pages to check the prices of products (I'm somewhat of a Perl noob). This script also keeps a very simple backlog of the last price seen for each item (since the prices change frequently).
I was wondering if there was any way I could further automate the script so that I don't have to explicitly add the page's URL to the initial hash (i.e. keep an array of key terms and run a search query on Amazon to find the page or price). Is there any way I could do this that doesn't involve me just copying Amazon's search URL and parsing in my keywords? (I'm aware that processing HTML with regex is generally bad form; I just used it since I only need one small piece of data.)
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my %oldPrice;
my %nameURL = (
"Archer Season 1" => "http://www.amazon.com/Archer-Season-H-Jon-Benjamin/dp/B00475B0G2/ref=sr_1_1?ie=UTF8&qid=1297282236&sr=8-1",
"Code Complete" => "http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670/ref=sr_1_1?ie=UTF8&qid=1296841986&sr=8-1",
"Intermediate Perl" => "http://www.amazon.com/Intermediate-Perl-Randal-L-Schwartz/dp/0596102062/ref=sr_1_1?s=books&ie=UTF8&qid=1297283720&sr=1-1",
"Inglorious Basterds (2-Disc)" => "http://www.amazon.com/Inglourious-Basterds-Two-Disc-Special-Brad/dp/B002T9H2LK/ref=sr_1_3?ie=UTF8&qid=1297283816&sr=8-3"
);
if (-e "backlog.txt"){
open (LOG, "backlog.txt");
while(){
chomp;
my #temp = split(/:\s/);
$oldPrice{$temp[0]} = $temp[1];
}
close(LOG);
}
print "\nChecking Daily Amazon Prices:\n";
open(LOG, ">backlog.txt");
foreach my $key (sort keys %nameURL){
    my $content = get $nameURL{$key} or die;
    $content =~ m{\s*\$(\d+\.\d+)} || die;
    if (exists $oldPrice{$key} && $oldPrice{$key} != $1){
        print "$key: \$$1 (Was $oldPrice{$key})\n";
    }
    else{
        print "\n$key: $1\n";
    }
    print LOG "$key: $1\n";
}
close(LOG);
Yes, the design can be improved. It's probably best to delete everything and start over with an existing full-featured web scraping application or framework, but since you want to learn:
The name-to-URL map is configuration data. Retrieve it from outside of the program.
Store the historic data in a database.
Learn XPath and use it to extract data from HTML; it's easy if you already grok CSS selectors (see the sketch below).
Other stackers, if you want to amend my post with the rationale for each piece of advice, go ahead and edit it.
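For the XPath point, here is a minimal sketch with HTML::TreeBuilder::XPath (the URL is one from the question's hash; the id in the XPath expression is only an assumption about Amazon's markup and would need to be checked against the real page):
use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder::XPath;

my $url  = "http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670";
my $html = get($url) or die "could not fetch $url";
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Pull the text of the element whose id looks like a price block (assumed id).
my $price = $tree->findvalue('//*[@id="priceblock_ourprice"]');
print length $price ? "$price\n" : "price not found\n";

$tree->delete;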
I made a simple script to demonstrate Amazon search automation. The all-departments search URL is used with the escaped search term appended. The rest of the code is simple parsing with HTML::TreeBuilder. The structure of the HTML in question can easily be examined with the dump method (see the commented-out line).
use strict; use warnings;
use LWP::Simple;
use URI::Escape;
use HTML::TreeBuilder;
use Try::Tiny;
my $look_for = "Archer Season 1";
my $contents
= get "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="
. uri_escape($look_for);
my $html = HTML::TreeBuilder->new_from_content($contents);
for my $item ($html->look_down(id => qr/result_\d+/)) {
    # $item->dump; # find out structure of HTML
    my $title = try { $item->look_down(class => 'productTitle')->as_trimmed_text };
    my $price = try { $item->look_down(class => 'newPrice')->find('span')->as_text };
    print "$title\n$price\n\n";
}
$html->delete;
Imagine this URL:
http://www.youtube.com/watch?v=6n8PGnc_cV4&feature=rec-LGOUT-real_rn-2r-13-HM
What is the cleanest and best regexp to do the following:
1.) I want to strip off everything after the video URL, so that only http://www.youtube.com/watch?v=6n8PGnc_cV4 remains.
2.) I want to convert this URL into http://www.youtube.com/v/6n8PGnc_cV4
Since I'm not much of a regexp-ert, I need your help:
$content = preg_replace('http://.*?\?v=[^&]*', '', $content);
return $content;
Edit: check this out! I want to create a really simple WordPress plugin that just recognizes every normal YouTube URL in my $content and replaces it with the embed code:
<?php
function videoplayer($content) {
    $embedcode = '<object class="video" width="308" height="100"><embed src="' . . '" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="308" height="100" wmode="opaque"></embed></object>';
    //filter normal youtube url like http://www.youtube.com/watch?v=6n8PGnc_cV4&feature=rec-LGOUT-real_rn-2r-13-HM
    //convert it to http://www.youtube.com/v/6n8PGnc_cV4
    //use embedcode and pass along the new youtube url
    $content = preg_replace('', '', $content);
    //return embedcode
    return $content;
}
add_filter('the_content', 'videoplayer');
?>
I use this search criteria in my script:
/((http|ftp)\:\/\/)?([w]{3}\.)?(youtube\.)([a-z]{2,4})(\/watch\?v=)([a-zA-Z0-9_-]+)(\&feature=)?([a-zA-Z0-9_-]+)?/
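Applied to the plugin skeleton from the question, a hedged sketch using a simplified variant of that pattern (the capture group keeps only the video id, and the embed markup is the one the asker already has):
function videoplayer($content) {
    // Match a watch?v=... URL, capture the video id,
    // and swallow any trailing &feature=... style parameters.
    $pattern = '~https?://(?:www\.)?youtube\.com/watch\?v=([a-zA-Z0-9_-]+)(?:&[^\s"<]*)?~';

    $replacement = '<object class="video" width="308" height="100">'
        . '<embed src="http://www.youtube.com/v/$1"'
        . ' type="application/x-shockwave-flash" allowscriptaccess="always"'
        . ' allowfullscreen="true" width="308" height="100" wmode="opaque">'
        . '</embed></object>';

    return preg_replace($pattern, $replacement, $content);
}
add_filter('the_content', 'videoplayer');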
You could just split it on the first ampersand.
$content = explode('&', $content);
$content = $content[0];
Edit: Simplest regexp: /http:\/\/www\.youtube\.com\/watch\?v=.*/
YouTube links are all the same. To get the video id from them, first you slice off the extra parameters from the end and then keep only the last 11 characters. See it in action:
$url = "http://www.youtube.com/watch?v=1rnfE4eo1bY&feature=...";
$url = $url.left(42); // "http://www.youtube.com/watch?v=1rnfE4eo1bY"
$url = $url.right(11); // "1rnfE4eo1bY"
$result = "http://www.youtube.com/v/" + $url; // "http://www.youtube.com/v/1rnfE4eo1bY"
You can normalize all your YouTube links (by removing useless parameters) with a Greasemonkey script: http://userscripts.org/scripts/show/86758. Greasemonkey scripts are natively supported as add-ons in Google Chrome.
And as a bonus, here is a one (okay, actually two) liner:
$url = "http://www.youtube.com/watch?v=1rnfE4eo1bY&feature=...";
$result = "http://www.youtube.com/v/" + $url.left(42).right(11);
--3ICE
$url = "http://www.youtube.com/v/6n8PGnc_cV4";
$start = strpos($url,"v=");
echo 'http://www.youtube.com/v/'.substr($url,$start+2);
I need to geocode, i.e. translate street addresses to latitude/longitude, for ~8,000 street addresses. I am using both the Yahoo and Google geocoding engines at http://www.gpsvisualizer.com/geocoder/, and found that for a large number of addresses those engines (one of them or both) either could not perform geocoding (i.e. return latitude=0, longitude=0) or returned the wrong coordinates (including cases where Yahoo and Google give different results).
What is the best way to handle this problem? Which engine is (usually) more accurate? I would appreciate any thoughts, suggestions, ideas from people who had previous experience with this kind of task.
When doing a large number of requests to Google geocoding service you need to throttle the requests as responses start failing. To give you a better idea here is a snippet from a Drupal (PHP) module that I wrote.
function gmap_api_geocode($address) {
  $delay = 0;
  $api_key = keys_get_key('gmap_api');
  while( TRUE ) {
    $response = drupal_http_request('http://maps.google.com/maps/geo?q='. drupal_urlencode($address) .'&output=csv&sensor=false&oe=utf8&key='. $api_key);
    switch( $response->code ) {
      case 200: //OK
        $data = explode(',', $response->data);
        return array(
          'latitude' => $data[2],
          'longitude' => $data[3],
        );
      // Adopted from http://code.google.com/apis/maps/articles/phpsqlgeocode.html
      case 620: //Too many requests
        $delay += 100000;
        break;
      default:
        return FALSE;
    }
    usleep($delay);
  }
}
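For reference, a hedged usage sketch for a batch of addresses (the address list is made up; the function above already handles the throttling internally):
// Hypothetical batch run; failed lookups are collected so they can be retried later.
$addresses = array(
  '1600 Amphitheatre Parkway, Mountain View, CA',
  '1 Infinite Loop, Cupertino, CA',
);

$located = array();
$failed  = array();

foreach ($addresses as $address) {
  $coords = gmap_api_geocode($address);
  if ($coords === FALSE) {
    $failed[] = $address;
    continue;
  }
  $located[$address] = $coords; // array('latitude' => ..., 'longitude' => ...)
}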
You can use https://github.com/darkphnx/fetegeo/ for offline batch geocoding.