How do sites count other sites' visitors and "value", and how can they tell users' location? - web-services

Hi, this is actually a simple question that just came up out of curiosity.
I recently came across an online website-evaluation tool called teqpad.com, and I have a lot of questions about it:
How do they do it? E.g. page views, daily visitors, etc., without any access to the real website's analytics?
Website worth: does that figure come anywhere close to reality for any site?
I don't understand how they arrive at daily revenue either.
I like the traffic-by-country breakdown; it looks just like what Google Analytics shows. How do they get that information?
Another one is the ISP info and the Google Maps location of the server.
Has anyone here written similar scripts? If so, what is your opinion?

1. They may be tracking user browser stats the way Alexa does (more info on Wikipedia): a group of users installs a plug-in that reports which sites each user visits, much like TV ratings work in most (all?) countries. This method is obviously not very reliable and is often nowhere near the actual visitor numbers.
2. "Website worth" is usually based on bullshit pseudo-scientific calculations and is never a viable basis for evaluating the value of a web site, even though it may be possible to guesstimate the approximate ad revenue a site yields (see 3). But that is only one revenue stream; it says nothing about how expensive the site's daily maintenance is: servers, staff, content creation....
3. It should be possible to very roughly estimate daily revenue by taking the guesses on daily visitors/page views, counting how often ads are shown, and looking at what those ads usually yield per page view. If you are in the market, it is probably fairly easy to get rough numbers on what an ad view is worth on a big site.
4. and 5. It is possible to trace most IP addresses to the visitor's country and sometimes even the city. See the Geo targeting article on Wikipedia.
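The country/city breakdown and the server-location map are typically just IP lookups against a geolocation database. Below is a minimal sketch using the geoip2 Python package with a locally downloaded GeoLite2 database; the database path and the sample IP are placeholders, not anything teqpad has published about its own implementation.

```python
# Minimal sketch of IP-based geolocation, the mechanism behind
# "traffic by country" and "server location" features.
# Assumes the free GeoLite2-City database has been downloaded to the
# path below; both the path and the sample IP are placeholders.
import geoip2.database

def locate(ip_address, db_path="GeoLite2-City.mmdb"):
    with geoip2.database.Reader(db_path) as reader:
        record = reader.city(ip_address)
        return {
            "country": record.country.name,
            "city": record.city.name,
            "latitude": record.location.latitude,
            "longitude": record.location.longitude,
        }

if __name__ == "__main__":
    # Any public IP works the same way; a site's server IP can be
    # resolved from its hostname and looked up just like a visitor's.
    print(locate("8.8.8.8"))
```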

Related

Using geobytes web service to get cities list

I wanted a free web service to get a list of cities and found Geobytes. It's good. I want to know what the 50,000-request limit means: the widget makes an HTTP request on every key press, so is that how they count?
but if you expect to be performing more than 50,000 requests per day (your average unique visitors X 5), then please tell us
Anyone who has used this, please help.
I would imagine it means that going over 50,000 requests can be penalized in some way. A key press is not a request; entering a city and fetching that city's details would count as one of the 50,000 requests.
Hope this helps.
I am the author and administrator of Geobytes' AutoCompleteCity API, and there is now no practical limit on genuine use; the reference to 50,000 lookups per day has been removed from the web site. I say practical because it does have DoS-attack prevention measures, but as the API is intended to be called from the browser (as opposed to a server, for which you would use the GetCityDetails API), its DoS protection measure of "1,024 lookups per IP address, per hour" should never kick in under any circumstances that I can imagine.
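For the server-side case mentioned above, a GetCityDetails lookup is a single HTTP request per city. The sketch below is a rough, from-memory Python example; the endpoint URL, the fqcn parameter, and the geobytes* response keys are assumptions that should be checked against the current Geobytes documentation.

```python
# Rough sketch of a server-side city lookup against Geobytes' GetCityDetails
# API. The endpoint, the "fqcn" parameter, and the response keys are
# reconstructed from memory of the documentation; verify before relying on it.
import json
import urllib.parse
import urllib.request

def get_city_details(city_query):
    params = urllib.parse.urlencode({"fqcn": city_query})
    url = "http://getcitydetails.geobytes.com/GetCityDetails?" + params
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    details = get_city_details("Philadelphia, PA, United States")
    # Key names assumed from the documented "geobytes..." naming scheme.
    print(details.get("geobytescity"), details.get("geobytescountry"))
```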

server side technology for business logic

Say I want to develop a web application that will have registered users and will be registered as a Twitter app (allowing users to give it permission to view their timelines and post on their behalf). The sole function of the application will be to retweet tweets from users' timelines according to each user's settings and preferences.
I understand that the website for this app will use the common client-side technologies: HTML, CSS and JS. The server side (where the user defines what kind of tweets the application should retweet) will have to be coded in PHP/Python/Perl/..., backed by a database such as MySQL/Postgres/...
What I don't understand, and would really appreciate your help with, is where the real "business logic" will be coded. For example, what technology should I use to write the process that will sit on my server: contacting Twitter every 5 minutes, reading the timeline of every user I have, checking whether there are tweets worth retweeting (according to what the user has defined), and sending Twitter the necessary commands to retweet the chosen tweets on behalf of my users?
All of that happens offline from the user's perspective, as an ongoing, cyclic process. What technology should I use to code it?
Thanks!
I have heard about this API for PHP. It is actually the only one that I have heard of for PHP, though. I know that there are some good Python libraries out there, but I don't know about Perl.
I am actually working on a new API for C# (won't be a good fit for you, as you're clearly not using Windows Servers), and started building it while working on an enterprise web application that prompted several questions similar to your own.
Here is what you are going to have to do:
1. Before you start, you are going to have to get in touch with one of Twitter's data partners (I believe you can contact Twitter for a referral).
The reason is that you are going to need many more requests than you think.
The time interval Twitter uses for its rate-limit window is 900 seconds (15 minutes).
Under the general rate limit, if you query each user's timeline only once per window, you are limiting the number of concurrent visitors on your site to 300.
Here's where it gets tricky: if every user makes one Tweet (meaning you send the Tweet, which is not rate limited, and then refresh the timeline, which is rate limited, so that they can see the updated tweet), you have now dropped your maximum number of active users at any given time to 150.
Factor in the company's own timeline (-1 visitor), the visitors who leave their browsers open (now you need more logic, and you have to either kick them off or keep track of whose timelines you won't be refreshing), the users who make more than one tweet (-1 visitor for each Tweet), and so on.
Moral of the story: contact one of their data partners and get yourself either unlimited requests, or at least enough to accommodate your number of visitors/users (plus a bit of padding).
If you follow this advice, skip steps 2 and 3; otherwise, skip step 4.
2. (Note: steps 2 and 3 are only for rate-capped implementations.) Using your desired language, build a service that runs on the server and makes the queries to Twitter.
Based on the information you gave, I suggest using Python for this service; a rough sketch follows at the end of this answer.
The service will run at all times and keep its own clock, on which the 5-minute intervals between requests are based.
You will have to use a caching or database system to store the data.
3. (Note: steps 2 and 3 are only for rate-capped implementations.) Add the necessary code to request the data from the service you created, and perform these requests every 5 minutes.
I suggest that the clock used for making these requests to the service run a little behind the service's own clock, to account for slow data transfers and the like.
You will also have to call some methods on the service for adding/removing users from the queue.
4. (Note: step 4 is only for unlimited-request implementations.) Forget about the service and simply include the request code directly in the page the user is on.
The user's timeline will be updated based on when they visited the site or when their timeline was last refreshed (if a Tweet was made).
The only caveat to this implementation is that you will have to pay for the unlimited/larger rate limit.
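Here is the rough sketch mentioned in step 2: a polling loop built on the tweepy 3.x library that wakes up every 5 minutes, reads each user's timeline, and retweets anything matching that user's rule. The get_users_and_rules helper, the keyword-based rule, and the credential placeholders are assumptions standing in for your own database and settings logic, not part of any Twitter or tweepy API.

```python
# Minimal sketch of a server-side polling service: every 5 minutes it reads
# each user's home timeline and retweets tweets matching that user's rule.
# Written against tweepy 3.x; get_users_and_rules() is a hypothetical helper
# that would load per-user OAuth tokens and settings from your database.
import time
import tweepy

CONSUMER_KEY = "your-app-key"        # placeholder
CONSUMER_SECRET = "your-app-secret"  # placeholder

def get_users_and_rules():
    # Hypothetical: return (access_token, access_secret, keyword) per user.
    return []

def run_once():
    for access_token, access_secret, keyword in get_users_and_rules():
        auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
        auth.set_access_token(access_token, access_secret)
        api = tweepy.API(auth, wait_on_rate_limit=True)
        for tweet in api.home_timeline(count=50):
            if keyword.lower() in tweet.text.lower():
                try:
                    api.retweet(tweet.id)
                except tweepy.TweepError:
                    pass  # already retweeted, deleted, or rate limited

if __name__ == "__main__":
    while True:
        run_once()
        time.sleep(5 * 60)  # the 5-minute cycle from the question
```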

What's the easiest way to do a one-time mass geocode? (580,000 addresses)

I am working on a civics related project and I need to be able to display all the properties in the City of Philadelphia on a map, so I'll need to get the latitude & longitude for all 580,000 properties. (Only once)
Most APIs like Google/Yahoo have limits of 5,000 per day, and even BatchGeo has a similar limit.
Is there a way I can do a one-time geocoding of all these addresses?
You can find a list of free and paid geocoding services on the USC site.
Also check Microsoft's Geocode Dataflow API; it allows up to 200,000 entries / 300 MB and takes up to 14 days.
Another possibility is to combine several services at once: use four services that each allow 5,000 entries a day and you'll finish the task in about a month. A sketch of that kind of quota-limited batching follows below.
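A minimal sketch of that batching approach in Python, using the geopy library against the free Nominatim geocoder; the CSV layout, the 5,000-per-day cap, and the 1-request-per-second pacing are assumptions you would adapt to whichever provider(s) and quotas you actually use.

```python
# Rough sketch of batch-geocoding a CSV of addresses under a daily quota,
# using geopy's Nominatim geocoder. File names, the single-column CSV layout,
# and the 5,000/day cap are assumptions; swap in whichever provider(s) you use.
import csv
import time

from geopy.geocoders import Nominatim

DAILY_LIMIT = 5000  # per-service quota assumed above

def geocode_batch(in_path="addresses.csv", out_path="geocoded.csv"):
    geolocator = Nominatim(user_agent="philly-properties-demo")
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for i, (address,) in enumerate(reader):
            if i >= DAILY_LIMIT:
                break  # resume from this row tomorrow, or hand off to another service
            location = geolocator.geocode(address)
            if location:
                writer.writerow([address, location.latitude, location.longitude])
            time.sleep(1)  # Nominatim's usage policy asks for at most 1 request/second

if __name__ == "__main__":
    geocode_batch()
```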
You can use MapQuest or CloudMade.
I have created a small utility to help compare these API's.
The utility is hosted at the URL below:
http://ankit-zalani.appspot.com/GeoCode/index.jsp
Tobias, I work for an address verification (and recently, geocoding) company called SmartyStreets.
Many services have usage restrictions based on volume and license agreements which prevent users from storing the results of geocoding queries. There are some vendors, however, which don't have limits or restrictions like that.
I would recommend something like LiveAddress which will not only geocode the addresses but also perform CASS-Certified verification to make sure your addresses are correct before giving you potentially faulty coordinates. You can run 580,000 or even millions at a time in a few minutes, and we allow you to store your results.
Hope this helps. If you have any more questions about addresses, I'll personally assist.
This thread is pretty old by now, but there have been some developments in recent years that make bulk geocoding very cheap. My favorite option is to run a geocoding server on AWS (search for "geocoding on AWS"); there are many options, some free and some with low hourly rates (the total cost depends on the server you choose, of course).

Looking for United States Address Validation Web Service

I'm looking for a United States Address Validation web service, as the title says. Also:
I don't need maps
I don't need Geo coding
I do need:
Validation that an address is real
Address parsing
Google Maps / Bing Maps seemed good, but won't work for me because of these:
Prohibit use if you are not plotting the points on a map image
Low request limits (100,000/day) even for a premium account; I need more like 1,000,000/day
Do geocoding, which I don't need and which is resource-intensive, which makes them slow
Any suggestions?
Maybe USPS?
https://www.usps.com/business/address-management-products.htm
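For reference, the USPS Web Tools address verification ("Verify") API takes an XML request authenticated by a registered USERID. The sketch below is reconstructed from memory of that documentation, so the endpoint, XML field names, and response handling are assumptions to verify against the current USPS spec; note also the query limits discussed further down.

```python
# Minimal sketch of calling the USPS Web Tools "Verify" API.
# Requires a (free) USERID from USPS. The endpoint and XML layout are
# reconstructed from memory of the documentation and may need adjusting.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

USPS_USERID = "YOUR_USERID_HERE"  # placeholder

def verify_address(street, city, state, zip5=""):
    request_xml = (
        f'<AddressValidateRequest USERID="{USPS_USERID}">'
        '<Address ID="0">'
        "<Address1></Address1>"
        f"<Address2>{street}</Address2>"
        f"<City>{city}</City>"
        f"<State>{state}</State>"
        f"<Zip5>{zip5}</Zip5>"
        "<Zip4></Zip4>"
        "</Address>"
        "</AddressValidateRequest>"
    )
    url = ("https://secure.shippingapis.com/ShippingAPI.dll?"
           + urllib.parse.urlencode({"API": "Verify", "XML": request_xml}))
    with urllib.request.urlopen(url, timeout=10) as response:
        root = ET.fromstring(response.read())
    address = root.find(".//Address")
    return {child.tag: child.text for child in address} if address is not None else None

if __name__ == "__main__":
    print(verify_address("1600 Pennsylvania Ave NW", "Washington", "DC"))
```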
Use FedEx's API; they have an API to validate addresses.
Also:
https://webgis.usc.edu/Services/AddressValidation/Default.aspx
You can try Pitney Bowes “IdentifyAddress” Api available at - https://identify.pitneybowes.com/
The service analyses and compares the input addresses against known address databases around the world and outputs standardized details. It corrects addresses, adds missing postal information, and formats the result using the format preferred by the applicable postal authority. It also uses additional address databases, so it can provide enhanced detail, including address quality, type of address, transliteration (such as from Chinese Kanji to Latin characters), and whether an address is validated to the premise/house-number, street, or city level of reference information.
You will find a lot of samples and SDKs available on the site, and I found it extremely easy to integrate.
You could, in theory, run desktop software and plug into any kind of API it provides, but then you become responsible for things like uptime, data updates, and associated overhead. You may also run into issues with the software threading model--is it multi-threaded or single-threaded software? You don't want to find that out in production.
There are a handful of web services out there that can verify US-based addresses, including the USPS official web service. The USPS one is very limited in the fields that it returns. For example, if you're looking for the "delivery point" which is used to make a full barcode, the USPS API doesn't return that information. I believe the USPS web service also limits the number of queries that you can perform, although I don't remember the exact limit.
A few things that you'll want to look for in a web service include the price (obviously) as well as geo-distribution of their servers. If a company has all of their servers in one location and that data center goes offline (which can and does happen), you're left out in the cold. If they have multiple physical locations, it can help to prevent unnecessary outages. Also, you'll want to make sure that the service call returns all necessary fields as per your requirements--like delivery point code, barcode, and DPV code (which tells you how deliverable an address is).
Lastly, you'll want to determine how you feel about interacting with the company. When you call them on the phone, are they responsive and concerned about your needs? Or are you talking to a front-line person who can't answer questions and is only able to gather information about your company size and revenue, so they can evaluate how big a fish you are and decide which salesman gets to call you back? Can you talk to the engineers who wrote the web service, on the phone or via email?
There are a few choices out there and you'll have to choose the one that best fits your requirements and unique situation. Do a Google search to find a list of companies. In the interest of full disclosure, I'm the co-founder of SmartyStreets. We have an address verification web service API called LiveAddress. You're more than welcome to contact me directly with questions on my personal Twitter account or the company Twitter account.

Data mining? And how can I perform it on my website?

I'm preparing my graduation project in computer science. I built this website and it's running perfectly, but my supervisor asked me to apply data mining to it.
But I don't understand what I should do.
The website is a social network: each user has a profile and a blog, plus access to some e-books that require registration to download. The website also has a music server with songs that a registered user can download or add as favorites to his profile page, and it carries ads (I used the OpenX script). Those are most of the services where I could apply data mining; the website is www.sy-stu.com.
I need ideas, and what is the best way to present this in the interview?
You should ask your professor what his intention was in asking for data mining. Data mining algorithms can perform various tasks; you first need to define what you want to accomplish and then find suitable algorithms and check the technical possibilities.
Some ideas that came to my mind about using data mining in your project:
you could use data mining to find which songs (e-books, etc.) a user is likely to favorite, based on other people's favorite songs (find similarities; association rules would probably be a good algorithm for this). A small sketch follows this list.
you could use a clustering algorithm to group users by some parameters and suggest that they connect with other people from the same group (if you have a feature like this).
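As a small sketch of the first idea, here is an item-to-item co-occurrence recommender in plain Python (a simpler stand-in for full association-rule mining). The favorites data is made up; on the real site it would come from the favorites table in the database.

```python
# Minimal sketch of item-based recommendations from "favorites" data:
# songs frequently favorited together are suggested to users who have
# favorited one of them. The sample data is invented; on the real site
# it would be loaded from the favorites table.
from collections import defaultdict
from itertools import combinations

favorites = {
    "alice": {"song_a", "song_b", "song_c"},
    "bob": {"song_b", "song_c"},
    "carol": {"song_a", "song_c", "song_d"},
}

def build_cooccurrence(favorites_by_user):
    counts = defaultdict(int)
    for songs in favorites_by_user.values():
        for a, b in combinations(sorted(songs), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(user, favorites_by_user, top_n=3):
    counts = build_cooccurrence(favorites_by_user)
    owned = favorites_by_user[user]
    scores = defaultdict(int)
    for song in owned:
        for (a, b), c in counts.items():
            if a == song and b not in owned:
                scores[b] += c
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("bob", favorites))  # suggests song_a (then song_d) to bob
```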
Good luck!:)
Firstly, ask for clarification from your supervisor. Don't say 'What do you mean?', but ask 'Are you expecting something like this?' because it shows that you've at least thought about it.
If you can't think of anything, or your supervisor is vague, perform some simple data retrieval and analysis, e.g.
most active members
the most / least popular songs and books.
number of ads clicked, etc.
most popular website features
Just elementary analysis should suffice; you aren't doing a statistics degree. Work out the most songs downloaded in a day or per user, the average songs per user, how many users visit each day, and how many sign up and never visit (a rough sketch of such queries follows at the end of this answer).
The purpose is to demonstrate that your website is logging all activity, so that when you are asked 'how many books did the 20 most active users download in June?' you will be able to work out the answer.
The alternative is a website that just runs and you don't have any knowledge of how your users are behaving and what they are doing, which means you aren't able to focus on things that they find important.
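Here is the rough sketch of such queries, using Python's built-in sqlite3. The schema (a downloads table with user_id, song_id and downloaded_at columns) is an assumption standing in for whatever the site actually stores.

```python
# Rough sketch of elementary activity analysis with Python's sqlite3.
# The downloads(user_id, song_id, downloaded_at) schema is assumed;
# adapt the queries to the site's real database.
import sqlite3

def report(db_path="sy_stu.db"):
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()

    # Most active members by number of downloads.
    cur.execute(
        "SELECT user_id, COUNT(*) AS downloads "
        "FROM downloads GROUP BY user_id "
        "ORDER BY downloads DESC LIMIT 10"
    )
    print("Most active members:", cur.fetchall())

    # Most popular songs.
    cur.execute(
        "SELECT song_id, COUNT(*) AS times_downloaded "
        "FROM downloads GROUP BY song_id "
        "ORDER BY times_downloaded DESC LIMIT 10"
    )
    print("Most popular songs:", cur.fetchall())

    # Average downloads per user.
    cur.execute(
        "SELECT AVG(cnt) FROM "
        "(SELECT COUNT(*) AS cnt FROM downloads GROUP BY user_id)"
    )
    print("Average downloads per user:", cur.fetchone()[0])

    conn.close()

if __name__ == "__main__":
    report()
```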
I don't know exactly what kind of data you are trying to mine, but have you checked out Google Analytics? It is very easy to set up: once you register, all you need to do is include the provided JavaScript in your web pages. Google Analytics will give you plenty of statistics about access to your site and about visits. Is that what you need? The data it produces is very easy to read as well and will be suitable for you to present, I reckon.