On the website I am developing, I'm using a function that gets information about events from a Facebook group.
I noticed that, using the Graph API, the text contained in the "description" field of the returned array contains strange character sequences like "\u00b" or "\n" for new lines.
How can I display the content formatted correctly?
Thanks in advance,
Piero
It is called unicode_decode (http://webarto.com/83/php-unicode_decode-5.3). It will come out in PHP 6; until then:
function unicode_decode($string) {
    $string = preg_replace_callback('#\\\\u([0-9a-f]{4})#ism',
        create_function('$matches', 'return mb_convert_encoding(pack("H*", $matches[1]), "UTF-8", "UCS-2BE");'),
        $string);
    return $string;
}
Input:  Raska Top\u010dagi\u0107
Output: Raska Topčagić
If you are not using PHP, sorry...
Better solution:
echo json_decode('"Raska Top\u010dagi\u0107"'); # Raska Topčagić
Related
I am a beginner at Perl programming and I want to know the solution for this problem. I have the information below in a text file called token.txt. I want to extract only the dynamically generated access_token value and store that value in a MySQL database. As mentioned, the access_token is auto-generated every time, so how do I store this access_token value each time? Can anyone help me with the Perl code? Thanks in advance.
{
"access_token" : "JgV8Ln1lRGE8JTz4olEQW0rJJHUYsq2LO8Ny9o6m",
"token_type" : "abcdef",
"expires_in" : 123456
}
This is JSON-formatted text, so I would suggest reading the file into a string and decoding it, e.g.:
parse.pl
use strict;
use warnings;
use v5.10;
use File::Slurp;
use JSON;

my $token = decode_json( read_file('token.txt') );
say $token->{'access_token'};
Test it like this:
perl parse.pl
Output:
JgV8Ln1lRGE8JTz4olEQW0rJJHUYsq2LO8Ny9o6m
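Since the question also asks about storing the value in MySQL, here is a minimal DBI sketch; the database name, credentials, and tokens table are all hypothetical, so adjust them to your setup:
use strict;
use warnings;
use File::Slurp;
use JSON;
use DBI;

my $token = decode_json( read_file('token.txt') );

# Hypothetical database, credentials, and table layout -- adjust to your setup.
my $dbh = DBI->connect('DBI:mysql:database=mydb;host=localhost',
                       'user', 'password', { RaiseError => 1 });

# Insert the freshly extracted token; a tokens(access_token) table is assumed.
$dbh->do('INSERT INTO tokens (access_token) VALUES (?)',
         undef, $token->{'access_token'});

$dbh->disconnect;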
Alternatively, as a one-liner (the token ends up in $tok):
perl -ne 's/"access_token"\s*:\s*"([^"]+)"/$tok=$1;print $1/e' token.txt
So I am currently getting user input in the form of a URL, parsing the page, and then printing the other pages that the website links to. The package that I am using is:
LWP::Simple
I fetch the link from user input on the command line, using $ARGV[0], and store it in a variable.
Then I make another variable and call get on the variable where I stored the website URL.
Then I make an array variable that holds the results of applying the regex
/\shref="?([^\s>"]+)/gi;
to the variable containing the fetched page. Finally, I do a foreach loop over the array to print out the results.
However, while it does print the links, it also ends up printing standalone special characters such as / and # when there is nothing after them.
So if there is something like /blabalbla it prints that, but standalone special characters such as /, \, or # get printed as well. Is there any way I can modify the regex so that these special characters are not printed when they aren't followed by a string? I'm new to Perl and not very good at regexes.
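If you want to stick with the regex approach, here is a sketch of one option, assuming the fetched page is in a variable called $content (a name introduced just for illustration): require at least one word character inside the captured value, so bare /, \ or # no longer match.
# Require at least one word character in the captured href value,
# so standalone "/", "\" or "#" are skipped.
my @links = $content =~ /\shref="?([^\s>"]*\w[^\s>"]*)/gi;

# Alternatively, keep the original regex and filter the results afterwards:
# @links = grep { /\w/ } @links;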
I can't help you with your specific problem without further information, but in the meantime I suggest that you look at HTML::LinkExtor, which was written for this purpose.
Here's some example code and its output. It lists only <a> elements that have an href attribute.
use strict;
use warnings;
use 5.010;
use LWP;
use HTML::LinkExtor;

my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://www.bbc.co.uk/');

my $extor = HTML::LinkExtor->new(undef, $resp->base);
$extor->parse($resp->decoded_content);

for my $link ($extor->links) {
    my ($tag, %attr) = @$link;
    next unless $tag eq 'a' and $attr{href};
    say $attr{href};
}
Output:
http://m.bbc.co.uk
http://www.bbc.co.uk/
http://www.bbc.co.uk/#h4discoveryzone
http://www.bbc.co.uk/accessibility/
https://ssl.bbc.co.uk/id/status
http://www.bbc.co.uk/news/
http://www.bbc.com/news/
http://www.bbc.co.uk/sport/
http://www.bbc.co.uk/weather/
http://shop.bbc.com/
http://www.bbc.com/earth/
http://www.bbc.com/travel/
http://www.bbc.com/capital/
http://www.bbc.co.uk/iplayer/
http://www.bbc.com/culture/
http://www.bbc.com/autos/
http://www.bbc.com/future/
http://www.bbc.co.uk/tv/
http://www.bbc.co.uk/radio/
http://www.bbc.co.uk/cbbc/
http://www.bbc.co.uk/cbeebies/
http://www.bbc.co.uk/arts/
http://www.bbc.co.uk/ww1/
http://www.bbc.co.uk/food/
http://www.bbc.co.uk/history/
http://www.bbc.co.uk/learning/
http://www.bbc.co.uk/music/
http://www.bbc.co.uk/science/
http://www.bbc.co.uk/nature/
http://www.bbc.com/earth/
http://www.bbc.co.uk/local/
http://www.bbc.co.uk/travel/
http://www.bbc.co.uk/a-z/
http://www.bbc.co.uk/#orb-footer
http://search.bbc.co.uk/search
http://www.bbc.co.uk/privacy/cookies/managing/cookie-settings.html
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F
http://www.bbc.co.uk/#
http://www.bbc.co.uk/#
http://www.bbc.co.uk/weather/2643743?day=0
http://www.bbc.co.uk/weather/2643743?day=0
http://www.bbc.co.uk/weather/2643743?day=1
http://www.bbc.co.uk/weather/2643743?day=1
http://www.bbc.co.uk/weather/2643743?day=2
http://www.bbc.co.uk/weather/2643743?day=2
http://www.bbc.co.uk/locator/default/desktop/en-GB?ptrt=%2F
http://www.bbc.co.uk/weather/2643743
http://www.bbc.co.uk/news/science-environment-30311816
http://www.bbc.co.uk/news/science-environment-30311822
http://www.bbc.co.uk/news/science-environment-30311818
http://www.bbc.co.uk/news/magazine-30282261
http://www.bbc.co.uk/news/science-environment-30311816
http://www.bbc.co.uk/news/uk-politics-30291460
http://www.bbc.co.uk/news/
http://www.bbc.co.uk/news/uk-england-kent-30319549
http://www.bbc.co.uk/news/world-europe-30306106
http://www.bbc.co.uk/news/world-europe-30306992
http://www.bbc.co.uk/news/uk-30306145
http://www.bbc.co.uk/news/local/
http://www.bbc.co.uk/news/england/london/
http://www.bbc.co.uk/news/uk-england-london-30308694
http://www.bbc.co.uk/news/uk-england-london-30315650
http://www.bbc.co.uk/news/uk-england-london-30321504
http://www.bbc.co.uk/sport/live/football/29959148
http://www.bbc.co.uk/sport/0/
http://www.bbc.co.uk/sport/live/snooker/29618359
http://www.bbc.co.uk/sport/football/30204433
http://www.bbc.co.uk/sport/cricket/30308980
http://www.bbc.co.uk/sport/football/30204434
http://www.bbc.co.uk/sport/0/football/
http://www.bbc.co.uk/sport/football/30204459
http://www.bbc.co.uk/sport/football/30204511
http://www.bbc.co.uk/sport/football/28647040
http://www.bbc.co.uk/?dzf=sport
http://www.bbc.co.uk/?dzf=entertainment
http://www.bbc.co.uk/?dzf=bbcnow
http://www.bbc.co.uk/?dzf=entertainment
http://www.bbc.co.uk/?dzf=news
http://www.bbc.co.uk/?dzf=lifestyle
http://www.bbc.co.uk/?dzf=knowledge
http://www.bbc.co.uk/?dzf=sport
http://www.bbc.co.uk/news/
http://www.bbc.com/news/
http://www.bbc.co.uk/sport/
http://www.bbc.co.uk/weather/
http://shop.bbc.com/
http://www.bbc.com/earth/
http://www.bbc.com/travel/
http://www.bbc.com/capital/
http://www.bbc.co.uk/iplayer/
http://www.bbc.com/culture/
http://www.bbc.com/autos/
http://www.bbc.com/future/
http://www.bbc.co.uk/tv/
http://www.bbc.co.uk/radio/
http://www.bbc.co.uk/cbbc/
http://www.bbc.co.uk/cbeebies/
http://www.bbc.co.uk/arts/
http://www.bbc.co.uk/ww1/
http://www.bbc.co.uk/food/
http://www.bbc.co.uk/history/
http://www.bbc.co.uk/learning/
http://www.bbc.co.uk/music/
http://www.bbc.co.uk/science/
http://www.bbc.co.uk/nature/
http://www.bbc.com/earth/
http://www.bbc.co.uk/local/
http://www.bbc.co.uk/travel/
http://www.bbc.co.uk/a-z/
http://www.bbc.co.uk/
http://www.bbc.co.uk/terms/
http://www.bbc.co.uk/aboutthebbc/
http://www.bbc.co.uk/privacy/
http://www.bbc.co.uk/privacy/cookies/about
http://www.bbc.co.uk/accessibility/
http://www.bbc.co.uk/guidance/
http://www.bbc.co.uk/contact/
http://www.bbc.co.uk/bbctrust/
http://www.bbc.co.uk/complaints/
http://www.bbc.co.uk/help/web/links/
I use Perl Net::Telnet to connect to my router and change some options, but I get this error:
pattern match timed-out
Everything is correct (user, pass, pattern, etc.); I am going crazy trying to find the source of this error. My code is:
use Net::Telnet;
$telnet = new Net::Telnet ( Timeout=>10, Errmode=>'die');
$telnet->open('192.168.1.1');
$telnet->waitfor('/login[: ]$/i');
$telnet->print('admin');
$telnet->waitfor('/password[: ]$/i');
$telnet->print('admin');
$telnet->waitfor('/\$ $/i' );
$telnet->print('list');
$output = $telnet->waitfor('/\$ $/i');
print $output;
What should I do now? Is there any alternative way?
Thank you
Maybe try logging in using the example at the top of the Net::Telnet page?
use Net::Telnet ();
$t = new Net::Telnet (Timeout => 10, Errmode=>'die');
$t->open($host);
$t->login($username, $passwd);
#lines = $t->cmd("who");
print #lines;
That seems to work for me, while your code snippet times out at the first waitfor when trying to log in.
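If login() also times out, it usually helps to see exactly what the router sends, so you can adjust the patterns. A small sketch using Net::Telnet's dump log (the file name is just an example):
use Net::Telnet ();

# Log the raw conversation to a file so the real login/password/prompt
# strings can be inspected.
my $t = new Net::Telnet (Timeout  => 10,
                         Errmode  => 'die',
                         Dump_Log => 'telnet_dump.log');
$t->open('192.168.1.1');
# After the script dies or times out, check telnet_dump.log and adjust
# the waitfor()/prompt patterns to match what the router actually sends.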
First, I apologize if you feel this is a duplicate. I looked around and found some very similar questions, but I either got lost or they weren't quite what I needed, so I couldn't come up with a proper implementation.
QUESTION:
So I have a txt file that contains entries made by another script (I can change the format in which these entries are generated if you can suggest a better one):
SR4 Pool2
11/5/2012 13:45
----------
Beginning Wifi_Main().
SR4 Pool2
11/8/2012 8:45
----------
This message is a
multiline message.
SR4 Pool4
11/5/2012 14:45
----------
Beginning Wifi_Main().
SR5 Pool2
11/5/2012 13:48
----------
Beginning Wifi_Main().
And I made a Perl script to parse the file:
#!C:\xampp-portable\perl\bin\perl.exe
use strict;
use warnings;
#use Dumper;
use CGI 'param','header';
use Template;
#use Config::Simple;
#Config::Simple->import_from('config.ini', \%cfg);
my $cgh = CGI->new;
my $logs = {};
my $key;
print "Content-type: text/html\n\n";
open LOG, "logs/Pool2.txt" or die $!;
while ( my $line = <LOG> ) {
chomp($line);
}
print $logs;
close LOG;
My goal is to have a hash in the end that looks like this:
$logs = {
SR4 => {
Pool2 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
{
time => '11/8/2012 8:45',
msg => 'This message is a multiline message.',
},
},
Pool4 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
},
},
SR5 => {
Pool2 => {
{
time => '11/5/2012 13:45',
msg => 'Beginning Wifi_NDIS_Main().',
},
},
},
};
What would be the best way of going about this? Should I change the formatting of the generated logs to make it easier on myself? If you need any more info, just ask. Thank you in advance. :)
The format makes no sense. You used a hash at the third level, but you didn't specify keys for the values. I'm assuming it should be an array.
my %logs;

{
    local $/ = "";  # "Paragraph mode"
    while (<>) {
        my @lines = split /\n/;
        my ($x, $y) = split ' ', $lines[0];
        my $time   = $lines[1];
        my $msg    = join ' ', @lines[3..$#lines];
        push @{ $logs{$x}{$y} }, {
            time => $time,
            msg  => $msg,
        };
    }
}
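To check that the structure comes out as intended, you could dump it after the loop (assuming the log file is passed on the command line):
use Data::Dumper;
print Dumper(\%logs);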
Should I change the formatting of the generated logs
Your time stamps appear to be ambiguous. In most time zones, an hour of the year is repeated.
If you can possibly output it as XML, reading it in would be embarrassingly easy with XML::Simple.
Although Karthik T's idea of using XML makes sense, and I would also consider it, I'm not sure it is the best route. The first problem is putting the data into XML format in the first place.
The second is that the XML might not be so easily parsed. Sure, the XML::Simple module will read the whole thing in one swoop, but you then have to walk the resulting data structure yourself.
If you can set the output however you want, make it a format that's easy to parse. I like using prefixed data identifiers. In the following example, each piece of data has its own identifier, and the ER: line tells me when I hit the end of a record:
DT: 11/5/2012 13:35
SR: SR4
PL: Pool2
MG: Beginning Wifi_Main().
ER:
DT: 11/8/2012 8:45
SR: SR4
PL: Pool2
MG: This message is a
MG: multiline message.
ER:
Parsing this output is straightforward:
my %hash;
while ( my $line = <DATA> ) {
    chomp $line;
    if ( not $line eq "ER:" ) {
        my ($key, $value) = split ( ": ", $line );
        $hash{$key} .= "$value ";    # Note trailing space!
    }
    else {
        clean_up_hash ( \%hash );    # Remove trailing space on all values
        create_entry ( \%log, \%hash );
        %hash = ();
    }
}
I like using classes whenever I start getting complex data structures, and I would probably create a Local::Log class and subclasses to store each layer of the log. However, it's not an absolute necessity and wasn't part of your question. Still, I would use a create_entry subroutine just to keep the logic of figuring out where in your log the entry belongs out of the main loop.
NOTE: I append a space after each piece of data. I did this to make the code simpler since some of your messages may take more than one line. There are other ways to handle this, but I was trying to keep the loop as clean as possible and with as few if statements as possible.
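For completeness, a minimal sketch of what clean_up_hash and create_entry might look like, using the field identifiers above and the SR/Pool layout requested in the question (this is just one possible implementation):
sub clean_up_hash {
    my ($hash) = @_;
    s/\s+$// for values %$hash;   # strip the trailing space added in the loop
}

sub create_entry {
    my ($log, $hash) = @_;
    # File the entry under its SR and Pool, keeping the time and message.
    push @{ $log->{ $hash->{SR} }{ $hash->{PL} } }, {
        time => $hash->{DT},
        msg  => $hash->{MG},
    };
}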
In my free time, I've been trying to improve my Perl abilities by working on a script that uses LWP::Simple to poll one specific website's product pages and check the prices of products (I'm somewhat of a Perl noob). The script also keeps a very simple backlog of the last price seen for each item (since the prices change frequently).
I was wondering whether there is any way I could further automate the script so that I don't have to explicitly add each page's URL to the initial hash (i.e. keep an array of key terms and run a search query on Amazon to find the page or price?). Is there any way I could do this that doesn't involve just copying Amazon's search URL and inserting my keywords? (I'm aware that processing HTML with regexes is generally bad form; I only used it since I need just one small piece of data.)
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my %oldPrice;
my %nameURL = (
"Archer Season 1" => "http://www.amazon.com/Archer-Season-H-Jon-Benjamin/dp/B00475B0G2/ref=sr_1_1?ie=UTF8&qid=1297282236&sr=8-1",
"Code Complete" => "http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670/ref=sr_1_1?ie=UTF8&qid=1296841986&sr=8-1",
"Intermediate Perl" => "http://www.amazon.com/Intermediate-Perl-Randal-L-Schwartz/dp/0596102062/ref=sr_1_1?s=books&ie=UTF8&qid=1297283720&sr=1-1",
"Inglorious Basterds (2-Disc)" => "http://www.amazon.com/Inglourious-Basterds-Two-Disc-Special-Brad/dp/B002T9H2LK/ref=sr_1_3?ie=UTF8&qid=1297283816&sr=8-3"
);
if (-e "backlog.txt"){
open (LOG, "backlog.txt");
while(<LOG>){
chomp;
my @temp = split(/:\s/);
$oldPrice{$temp[0]} = $temp[1];
}
close(LOG);
}
print "\nChecking Daily Amazon Prices:\n";
open(LOG, ">backlog.txt");
foreach my $key (sort keys %nameURL){
my $content = get $nameURL{$key} or die;
$content =~ m{\s*\$(\d+.\d+)} || die;
if (exists $oldPrice{$key} && $oldPrice{$key} != $1){
print "$key: \$$1 (Was $oldPrice{$key})\n";
}
else{
print "\n$key: $1\n";
}
print LOG "$key: $1\n";
}
close(LOG);
Yes, the design can be improved. It's probably best to delete everything and start over with an existing full-featured web scraping application or framework, but since you want to learn:
The name-to-URL map is configuration data. Retrieve it from outside of the program (a sketch follows below).
Store the historic data in a database.
Learn XPath and use it to extract data from HTML, it's easy if you already grok CSS selectors.
Other stackers, if you want to amend my post with the rationale for each piece of advice, go ahead and edit it.
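For the first point, a minimal sketch of keeping the name-to-URL map outside the program, assuming a hypothetical products.txt with one tab-separated "name<TAB>url" pair per line:
use strict;
use warnings;

my %nameURL;
open my $fh, '<', 'products.txt' or die "Can't open products.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;                # skip blank lines
    my ($name, $url) = split /\t/, $line, 2; # name and URL separated by a tab
    $nameURL{$name} = $url;
}
close $fh;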
I made a simple script to demonstrate Amazon search automation. The all-departments search URL is built with the escaped search term. The rest of the code is simple parsing with HTML::TreeBuilder. The structure of the HTML in question can easily be examined with the dump method (see the commented-out line).
use strict;
use warnings;
use LWP::Simple;
use URI::Escape;
use HTML::TreeBuilder;
use Try::Tiny;

my $look_for = "Archer Season 1";
my $contents
    = get "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="
    . uri_escape($look_for);
my $html = HTML::TreeBuilder->new_from_content($contents);

for my $item ($html->look_down(id => qr/result_\d+/)) {
    # $item->dump; # find out structure of HTML
    my $title = try { $item->look_down(class => 'productTitle')->as_trimmed_text };
    my $price = try { $item->look_down(class => 'newPrice')->find('span')->as_text };
    print "$title\n$price\n\n";
}
$html->delete;