How to check the spamminess of a link/URL - web-services

I know that most spam involves one or more links, so I am wondering if there is any web service that can check the spam weight/spamminess of a URL, similar to how Akismet can check the spamminess of text content.
P.S. I searched on Google and couldn't find anything satisfactory :)

There are a number of different URI DNS-based Blackhole List (DNSBL) services available to the public for low-volume lookups. Two of the most well-known are SURBL and URIBL. PhishTank (run by OpenDNS) is also worth a look, as many of the URLs it lists are categorized and classified.
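For illustration, here is a minimal sketch of how a URI DNSBL lookup works, assuming SURBL's multi.surbl.org zone and using the URL's raw host; a listed domain resolves to a 127.0.0.x address, an unlisted one fails to resolve:

    import socket
    from urllib.parse import urlparse

    def is_listed(url, zone="multi.surbl.org"):
        """Return True if the URL's host appears in the given URI DNSBL zone."""
        host = urlparse(url).hostname or url
        # NOTE: a production check should reduce the host to its registrable
        # domain (e.g. via the public-suffix list); the raw host is used here.
        try:
            socket.gethostbyname("%s.%s" % (host, zone))  # listed names resolve to 127.0.0.x
            return True
        except socket.gaierror:                           # NXDOMAIN means not listed
            return False

    print(is_listed("http://example.com/some-offer"))

URIBL and most other URI DNSBLs follow the same query pattern; only the zone name changes.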

Related

Retrieve/use G Suite Default Routing rules programmatically

I am only looking for read-only access.
I'd like to develop either a small web app, or maybe a script embedded in Google Sheets, that lets my users look up which Google Admin default routing rules they are involved in.
To do that, I'll need an API to go through the rules and tabulate the information in the way I need it.
Can I do that with the Admin SDK, which is soon to be deprecated? Is there a replacement product that can do what I want?
More details:
I currently use default routing for a few purposes. I have about 15 rules, and each one changes the route of a simple Match Rule by adding extra recipients. Some of these are to catch emails sent to ex-employees.
Others are to handle certain general email addresses like sales@example.com. Rather than using a sales group, we have a sales user account. And rather than putting forwarding rules in that user's settings, we use Default routing.
I had a similar problem where I needed the routing rules. My case was a bit different, since I just wanted one-time access to see what was going on, not necessarily something for users. I could not find anything else that even let me retrieve the rules (other than opening each one up individually). I ended up finding that I could just scrape the HTML of the routing rules page into a CSV and filter for lines containing an '@' character. The rules contain a bunch of t/f values that presumably can be matched back to their function; I didn't need all that and didn't spend the time to figure it out. This probably doesn't help the original poster's case, but perhaps my finding can help the next person looking for a way to do this.
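As a rough sketch of that filtering step, assuming the scraped rules were saved to a file named routing_rules.csv (the file name and layout are hypothetical):

    # Keep only the scraped routing-rule lines that contain an email address.
    with open("routing_rules.csv", encoding="utf-8") as src:
        rules = [line.rstrip("\n") for line in src if "@" in line]

    for rule in rules:
        print(rule)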

What is the best tool to use for real-time web statistics?

I operate a number of content websites that have several million user sessions and need a reliable way to monitor some real-time metrics on particular pieces of content (key metrics being: pageviews/unique pageviews over time, unique users, referrers).
The use case here is for the stats to be visible to authors/staff on the site, as well as to act as source data for real-time content popularity algorithms.
We already use Google Analytics, but this does not update quickly enough (4-24 hours depending on traffic volume). Google Analytics does offer a real-time reporting API, but this is currently in closed beta (I have requested access several times, but no joy yet).
New Relic appears to offer a few analytics products, but they are quite expensive ($149/500k pageviews - we have several times this).
Other answers I found on StackOverflow suggest building your own, but this was 3-5 years ago. Any ideas?
Heard some good things about Woopra; they offer 1.2m page views for the same price as New Relic.
https://www.woopra.com/pricing/
If that's too expensive, then the alternative is live-loading your logs and using an Elasticsearch service to read them to get the data you want, but you will need access to your logs while they are being written to.
A service like Loggly might suit you, which would enable you to "live tail" your logs (view them while they are being written), but again there is a cost to that.
Failing that, you could do something yourself, or get someone on Freelancer to knock something up for you enabling logs to be read and displayed in a format you recognise.
https://www.portent.com/blog/analytics/how-to-read-a-web-site-log-file.htm
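As a rough sketch of the "live tail" idea, assuming you can read the access log on the server as it is written (the path is just an example):

    import time

    def follow(path):
        """Yield lines appended to a log file as they arrive, tail -f style."""
        with open(path, encoding="utf-8", errors="replace") as log:
            log.seek(0, 2)              # jump to the end of the file
            while True:
                line = log.readline()
                if not line:
                    time.sleep(0.5)     # nothing new yet; wait and retry
                    continue
                yield line.rstrip("\n")

    for entry in follow("/var/log/nginx/access.log"):
        print(entry)                    # feed this into your own counters or Elasticsearch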
If the metrics you need to track are limited to the ones you have listed (page views, unique users, referrers), you might consider collecting your web server logs and using a log analyzer.
There are several free tools available on the Internet to get real-time statistics out of those logs.
Take a look at www.elastic.co, for example.
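If you want to see how far a plain log analyzer gets you before committing to a product, here is a minimal sketch that tallies those three metrics from an access log in the common "combined" format (the log path, and using the IP address as a stand-in for unique users, are assumptions about your setup):

    import re
    from collections import Counter

    # Apache/nginx "combined" format:
    # ip - user [time] "METHOD path HTTP/x" status bytes "referrer" "user-agent"
    LINE = re.compile(r'(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+) [^"]*" \d+ \S+ "([^"]*)"')

    pageviews = Counter()
    referrers = Counter()
    visitors = set()

    with open("access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            m = LINE.match(line)
            if not m:
                continue
            ip, path, referrer = m.groups()
            pageviews[path] += 1
            referrers[referrer] += 1
            visitors.add(ip)            # crude proxy for "unique users"

    print("unique users:", len(visitors))
    print("top pages:", pageviews.most_common(5))
    print("top referrers:", referrers.most_common(5))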
Hope this helps!
Google Analytics offers real-time data viewing now, if that's what you want?
https://support.google.com/analytics/answer/1638635?hl=en
I believe their API has now been released, as we are now looking at incorporating this!
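For what it's worth, if the Real Time Reporting API is enabled for your project, a query with the Python client library could look roughly like this (the view ID and key file are placeholders, and access to this API may still require signing up):

    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
    creds = service_account.Credentials.from_service_account_file("key.json", scopes=SCOPES)

    analytics = build("analytics", "v3", credentials=creds)
    result = analytics.data().realtime().get(
        ids="ga:12345678",              # placeholder view (profile) ID
        metrics="rt:activeUsers",
        dimensions="rt:pagePath",
    ).execute()

    for row in result.get("rows", []):
        print(row)                      # [pagePath, activeUsers]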
If you have access to your web server logs, then you can set up Elasticsearch as the search engine, along with Logstash as the log parser and Kibana as the front-end tool for analyzing the data.
For more information, please go through the Elasticsearch site (www.elastic.co).
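As a rough sketch of the indexing side, assuming an Elasticsearch 7+ node on the default localhost:9200 (the index name and document fields are just examples; in practice Logstash would do this step for you):

    import json
    import urllib.request

    def index_hit(doc, index="weblogs", host="http://localhost:9200"):
        """Index one parsed log entry via Elasticsearch's HTTP API."""
        req = urllib.request.Request(
            "%s/%s/_doc" % (host, index),
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    index_hit({"path": "/articles/42", "ip": "203.0.113.9", "referrer": "https://news.example"})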

Google Analytics Referral Exclude Regex Partial Domain Name

I am attempting to filter out some of the nasty analytics referral traffic. It doesn't touch my site, so htaccess is out.
I have to specifically go into Google to create a filter. I have a few set up already, but am looking to try something new that will hopefully make my exclusion list a bit easier to manage.
I want to block any referral traffic coming from a domain that has seo, traffic, monitize, etc. in it. This would stop about 90% of the referral traffic and would keep excluding new spam sites as they appear.
What I currently use is this:
(seomonitizer|trafficseo|seotraffic|trafficmonitizer)\.(com|org|net|рф|eu|co)
It removes each site one by one, but when a new site hits, I have to add it to the list.
I'm not sure what the regex capabilities and limitations of the Analytics filters are, but possibly this may be the foundation; I'm just not sure what goes into the middle.
((?=())\.(?=()))
Thanks
Unfortunately, you will have to check and add each one of them to your list as they appear in your account. To answer your question, I use something like the following example:
.*((darodar|priceg|buttons\-for(\-your)?\-website|makemoneyonline|blackhatworth|hulfingtonpost|o\-o\-6\-o\-o|(social|(simple|free|floating)\-share)\-buttons)\.com|econom\.co|ilovevitaly(\.co(m)?)|(ilovevitaly(\.ru))|(humanorightswatch|guardlink)\.org).*
I like to use .co(m)? instead of .com for example
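If you want to match by keyword rather than listing each spam domain, as the question asks, something along these lines might work as a starting point (the keyword list is just an example, and Analytics filter fields have a character limit, so treat it as a sketch):

    import re

    # Exclude any referrer whose domain contains one of the spam keywords.
    SPAM_REFERRER = re.compile(
        r".*(seo|traffic|monit[ei]ze).*\.(com|org|net|ru|eu|co)",
        re.IGNORECASE,
    )

    for referrer in ("seo-offers.com", "best-traffic4u.net", "news.example.org"):
        print(referrer, "->", "exclude" if SPAM_REFERRER.search(referrer) else "keep")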
Remember, to avoid ghost referrals there are currently 3 methods:
1) The first (the one you are using) is to create a filter that blacklists all the bad traffic, but there is a limit on the number of characters you can use, so you might end up creating multiple similar filters to cover all the nasty analytics referral traffic. Complete lists of bad bots are available online.
2) The second is to check the box "Exclude all hits from known bots and spiders" under your Google Analytics Account > Property > View.
3) The third is to create a hostname filter; there are articles that walk through the steps.

Finding Web services by functionality

I want to find many services with similar functionality, so that when one service fails, I can switch to another.
Is there a repository where I could find web services by functionality, e.g., weather forecasting?
(I heard that UDDI seems to be deprecated, but I cannot confirm that.)
@Bogdan - The Apache jUDDI project actually put up two instances:
one for demo purposes, for which there is no guarantee your data will stay around, but it's nice to play around with. Find the link for it on the landing page of the production instance (the site only allows me to post two links)
one for (semi-)production purposes, for which you can request a user account and data is guaranteed to stay around. You can find it at: https://www.webserviceregistry.com. If you want the certificate to match the address then use https://uddi-jbossoverlord.rhcloud.com, which is a less catchy address for the same instance.
Cheers,
--Kurt
There was only one notable public UDDI registry (put in place by IBM, Microsoft and SAP), but that has long been discontinued. If you do find a public registry, it will probably be something set up just for demonstration purposes (e.g. the jUDDI demo).
@JohnSaunders is right; nobody uses UDDI in the wild. Public registries are not practical, while private ones are most of the time unnecessary, providing a complex solution to a simple problem.
You should find (and evaluate) a few stable web services and keep their endpoints in a configuration file of some sort. Call one of the services and, on failure, fall back to the others you have configured.
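A minimal sketch of that failover idea, with hypothetical endpoint URLs standing in for whatever your configuration file holds:

    import urllib.request
    import urllib.error

    # Hypothetical functionally-equivalent endpoints, in order of preference.
    ENDPOINTS = [
        "https://weather-a.example/api/forecast?city=Paris",
        "https://weather-b.example/api/forecast?city=Paris",
    ]

    def fetch_forecast(endpoints, timeout=5):
        """Try each configured endpoint in turn and return the first successful response."""
        last_error = None
        for url in endpoints:
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    return resp.read()
            except (urllib.error.URLError, OSError) as err:
                last_error = err        # remember the failure and try the next one
        raise RuntimeError("all configured services failed") from last_error

    print(fetch_forecast(ENDPOINTS))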
If on the other hand you insist on going the UDDI route, then create your own private registry and use that, because you won't find a public one (others, me included, have tried and come out empty-handed).
To directly answer your question: yes, UDDI does support this kind of query. Basically, you'd want to use the Inquiry API method find_service, passing in a categoryBag/keyedReference that matches a known pattern for weather services, with some wildcards to help you get results. In addition, you'll need the findQualifier approximateMatch to enable the wildcards. Part of the issue is that UDDI allows registrants to use the spec in many different ways, so this example assumes that there is in fact a standardized way to register and tag a service as providing weather information. That is up to the governance process of the organization hosting the UDDI node.
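A sketch of what such an inquiry could look like on the wire, sent with plain Python; the inquiry endpoint, tModelKey and keyValue are placeholders that depend entirely on how the hosting organization categorizes its services:

    import urllib.request

    # Placeholder inquiry endpoint - this depends on the UDDI node you are using.
    INQUIRY_URL = "http://uddi.example.org/inquiry"

    FIND_SERVICE = """<?xml version="1.0" encoding="UTF-8"?>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
      <soapenv:Body>
        <find_service xmlns="urn:uddi-org:api_v3">
          <findQualifiers>
            <findQualifier>approximateMatch</findQualifier>
          </findQualifiers>
          <name>%weather%</name>
          <categoryBag>
            <keyedReference tModelKey="uddi:example.org:taxonomy"
                            keyName="service-type" keyValue="weather-forecast"/>
          </categoryBag>
        </find_service>
      </soapenv:Body>
    </soapenv:Envelope>"""

    req = urllib.request.Request(
        INQUIRY_URL,
        data=FIND_SERVICE.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": ""},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))   # serviceList of matching services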
"#JohnSaunders is right, nobody uses UDDI in the wild" - talk about a blanket statement. Perhaps you both meant "few people", but to say "nobody" without being able to prove it is silly.
"(I heard that UDDI seems to be deprecated, but I cannot confirm for that)". Widely used? probably not. Deprecated? Apache, Microsoft, IBM, HP, Oracle, and WS02 still sell or give away implementations of UDDI so I wouldn't exactly call it deprecated.

How do pagerank checking services work?

There's a PHP script here which should return the PageRank for you: http://www.pagerankcode.com/download-script.html
Almost all of those services hit the same service that the Google Toolbar uses. However, people at Google have said over and over not to focus on PageRank, as it's only a small portion of ranking.
That said, you can grab someone's (open source) SEO toolbar (just search for it) and open up the JavaScript to see how they're doing it.
Most services just copy what the Google Toolbar shows. But PageRank is usually not the important thing; the important thing is to get quality backlinks with relevant anchor text.
Nick is right: Google PageRank is really not what you should be looking for, and in fact it might be going away. Instead, I would look at SEOmoz.org's metrics from their SEO toolbar. They use metrics called Page Authority (the general strength of the site out of 100, most comparable to PageRank x 10), mozRank (how popular a site is, i.e. how many links it has and how good those links are), and mozTrust (how trustworthy the site is considered; for example, if a site is in a "bad neighborhood" and is linking to/linked to by a lot of spammy sites, it will have a low mozTrust). MozRank and mozTrust are out of 10.
The script at http://www.pagerankcode.com/download-script.html does not work on most well-known hosting providers, but it will run perfectly if you install a small Apache server on your own PC (XAMPP or similar).
I think the only way is to wait until Google releases a web service API capable of returning such a rank (incredibly, there are APIs to query almost every Google service, except this PageRank function).