C++ libraries for web ranking and search engines - c++

Can anybody recommend some libraries that contain web ranking algorithms such as PageRank and HITS?
Thank you

I guess you are referring to the canonical PageRank algorithm as published in the original PageRank paper. People nowadays also use "PageRank" to refer to the actual current Google algorithm for search.
If that is really the case, a PageRank implementation is not that difficult to find and use. Searching Google turns up a good number of implementations. There is one in Python, for example.
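Since the question asks about C++, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not taken from any particular library and has nothing to do with Google's production ranking. The names `Graph` and `pagerank`, the damping factor of 0.85, the fixed iteration count, and the choice to spread the rank of pages without outgoing links evenly over all pages are my own assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Adjacency list: links[i] holds the indices of the pages that page i links to.
using Graph = std::vector<std::vector<std::size_t>>;

// Power-iteration PageRank. Returns one score per page; the scores sum to 1.
std::vector<double> pagerank(const Graph& links,
                             double damping = 0.85,
                             int iterations = 50) {
    assert(damping > 0.0 && damping < 1.0);
    const std::size_t n = links.size();
    if (n == 0) return {};

    std::vector<double> rank(n, 1.0 / n);  // start from a uniform distribution
    for (int it = 0; it < iterations; ++it) {
        // Every page receives the "random jump" share first.
        std::vector<double> next(n, (1.0 - damping) / n);
        double dangling = 0.0;  // total rank held by pages with no outgoing links

        for (std::size_t i = 0; i < n; ++i) {
            if (links[i].empty()) {
                dangling += rank[i];
                continue;
            }
            // Page i splits its damped rank evenly among its targets.
            const double share = damping * rank[i] / links[i].size();
            for (std::size_t target : links[i]) {
                assert(target < n);
                next[target] += share;
            }
        }

        // Assumption of this sketch: dangling rank is spread over all pages.
        for (double& r : next) r += damping * dangling / n;
        rank = std::move(next);
    }
    return rank;
}
```

Calling `pagerank({{2}, {2}, {0}})` scores a three-page graph in which pages 0 and 1 both link to page 2 and page 2 links back to page 0. A real implementation would normally stop on a convergence threshold rather than after a fixed number of iterations.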
For the HITS algorithm there's pseudocode on Wikipedia. There's also a Perl implementation.
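For C++, the HITS update loop can be sketched roughly as follows. This is my own illustrative version of the usual hub/authority iteration with Euclidean normalization after each step, not code from a specific library. The names `HitsScores`, `normalize` and `hits` and the fixed iteration count are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Adjacency list: links[i] holds the indices of the pages that page i links to.
using Graph = std::vector<std::vector<std::size_t>>;

struct HitsScores {
    std::vector<double> hub;
    std::vector<double> authority;
};

// Scale v to unit Euclidean length; an all-zero vector is left untouched.
static void normalize(std::vector<double>& v) {
    double sum_sq = 0.0;
    for (double x : v) sum_sq += x * x;
    if (sum_sq == 0.0) return;
    const double norm = std::sqrt(sum_sq);
    for (double& x : v) x /= norm;
}

HitsScores hits(const Graph& links, int iterations = 50) {
    const std::size_t n = links.size();
    HitsScores s{std::vector<double>(n, 1.0), std::vector<double>(n, 1.0)};

    for (int it = 0; it < iterations; ++it) {
        // Authority update: sum the hub scores of the pages linking in.
        std::vector<double> auth(n, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t target : links[i]) {
                assert(target < n);
                auth[target] += s.hub[i];
            }
        }
        normalize(auth);

        // Hub update: sum the new authority scores of the pages linked to.
        std::vector<double> hub(n, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t target : links[i]) hub[i] += auth[target];
        }
        normalize(hub);

        s.authority = std::move(auth);
        s.hub = std::move(hub);
    }
    return s;
}
```

HITS is normally run on a query-specific subgraph rather than on the whole web graph, so `links` here would be that focused subgraph.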
I'd also suggest CLucene as something to start experimenting with.

Unless you work for Google, there aren't many good ways of finding out the specifics of their page ranking algorithm, which changes from time to time. Wikipedia outlines some of the basics:
http://en.wikipedia.org/wiki/PageRank
Other people write lengthy articles:
http://www.smashingmagazine.com/2007/06/05/google-pagerank-what-do-we-really-know-about-it/
If you are interested in the kinds of techniques that are involved in writing a search engine, there are several topics. For instance, there is "web crawling": how to write programs that visit web sites and grab their contents, and how to determine when to visit the sites again to see if they've changed:
http://en.wikipedia.org/wiki/Web_crawler
Once you have a bunch of data on your machine(s) to analyze and search, the subject area to study is called "Information Retrieval" (or "IR"):
http://en.wikipedia.org/wiki/Information_retrieval
It's a fairly new science, but a lot of work is done on it. Wikipedia has a list of "free search engine software":
http://en.wikipedia.org/wiki/Category:Free_search_engine_software
I'd suggest that if you're new to this then it might be best to start with figuring out how to use something like Lucene to provide a search box on a website you have. Then dig in and see how it works. It has been ported to C++ if that is important to you:
http://clucene.sourceforge.net/

Related

Writing a DSL in C++ with boost::proto

Apologies for asking such an open-ended question, but I want to emulate some synthetic assembly (not for a real processor) in C++ and I want to decouple the assembly from the implementation of the simulator it runs on.
Writing a DSL or similar seems like the obvious way and I have some experience of this, having done something like it (actually a mixture between a DSL and an interpreter) in Groovy.
boost::proto seems like the obvious choice, but I find the documentation utterly impenetrable, even though, as I say, I have some grasp of the basics.
Is there any alternative tutorial or similar out there that explains how to do this, in a way that focuses on the practicalities of writing a DSL rather than the theory of ASTs etc.? Or is there an alternative? Right now I am stuck with implementing the assembly instructions as methods of the classes that make up the simulator, which binds them very tightly together and makes the code base extremely difficult to maintain.
I second the comments suggesting that you may have a badly matched XY-problem here.
Meanwhile, the best introduction to applied Boost Proto for an embedded eDSL was on Dave Abrahams' cpp-next.com blog. Sadly, that has gone off the air.
Eric Niebler, author of Boost Proto, has offered to send people the raw dumps of those pages, on request:
The C++ community is suffering from the loss of the cpp-next.com website and all the great content that was once hosted there. In the past 2 months, I’ve gotten many questions both about the site and about the fate of my “Expressive C++” article series. In response, I will re-post my old articles on this blog. But I’m busy and it’ll take time. In the meanwhile, if you have a desperate need for a readable introduction to Boost.Proto and domain-specific embedded languages in C++, and you don’t mind reading raw markdown, email me. I’ll send you what I have.
http://ericniebler.com/2014/05/24/cpp-next-com-and-the-expressive-cxx-series-2/
In the meantime, the Wayback Machine has some of it, e.g.:
http://web.archive.org/web/20120906070131/http://cpp-next.com/archive/2011/01/expressive-c-expression-optimization/
http://web.archive.org/web/20120315111227/http://cpp-next.com/archive/2010/10/expressive-c-expression-extension-part-one/
http://web.archive.org/web/20120430163700/http://cpp-next.com/archive/2010/10/expressive-c-expression-extension-part-two/
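If the main goal is only to decouple the instructions from the simulator, one alternative that does not involve Boost.Proto at all is to represent each instruction as plain data and let the simulator interpret it. The sketch below assumes C++17 (`std::variant`, `std::visit`). The instruction set (`LoadImm`, `Add`, `Mul`), the register naming and the `Simulator` class are all hypothetical and only stand in for whatever the real synthetic assembly looks like.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical instruction set: each instruction is a plain data record
// that knows nothing about how (or where) it will be executed.
struct LoadImm { std::string dst; std::int64_t value; };
struct Add     { std::string dst, lhs, rhs; };
struct Mul     { std::string dst, lhs, rhs; };

using Instruction = std::variant<LoadImm, Add, Mul>;
using Program = std::vector<Instruction>;

// One possible back end. A tracer, pretty-printer or second simulator could
// consume the same Program without any change to the instruction types.
class Simulator {
public:
    void run(const Program& program) {
        // std::visit dispatches each instruction to the matching overload below.
        for (const Instruction& ins : program) std::visit(*this, ins);
    }

    void operator()(const LoadImm& i) { regs_[i.dst] = i.value; }
    void operator()(const Add& i) {
        const std::int64_t result = reg(i.lhs) + reg(i.rhs);
        regs_[i.dst] = result;
    }
    void operator()(const Mul& i) {
        const std::int64_t result = reg(i.lhs) * reg(i.rhs);
        regs_[i.dst] = result;
    }

    // Debug builds trap on a read before write; release builds read 0.
    std::int64_t reg(const std::string& name) const {
        const auto it = regs_.find(name);
        assert(it != regs_.end() && "register read before write");
        return it == regs_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, std::int64_t> regs_;
};
```

A program is then just a value, e.g. `Program p = {LoadImm{"r0", 2}, LoadImm{"r1", 3}, Add{"r2", "r0", "r1"}};`, which can be built by hand, by a parser, or by an embedded DSL layered on top later.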

How to match soft aggregate features (eyes, nose, mouth) using some statistical method?

I know a little bit of ML and want to implement a learning system myself, but I do not know how to go about it. Can anyone give me a demo, or suggest another method to compare faces?
Here is a related post: https://stackoverflow.com/questions/14079794/how-to-recognize-face-by-geometric-feature-such-as-eyes-nose-mouth.
One cannot reasonably answer this question based on the above information because of the sheer vastness of the subject.
For a start, you should know that these problems are usually solved using machine learning techniques such as neural networks. You said you know a bit about ML, but IMHO you might want to read more or take an online course on machine learning.
There are some good courses on Coursera.org. One that I like is Machine Learning by Andrew Ng.
These methods are described in the above-mentioned course, and there are some good assignments too, which will help you get a detailed idea of what is behind machine learning.

Looking for Projects having extensive usage of (mostly used) design patterns

I want to get my hands dirty with some projects (in C++) which have used design patterns extensively.
I have already read design pattern documentation (as well as code) from the net and from books (Gang of Four and Head First), but I am looking for a place where I can get already implemented projects (using design patterns), get my hands dirty with them, understand them, enhance them etc.
Could anybody point me to a place(s) from where I can get design experience in the best possible way? (Please note: Language C++, Complexity of the project can be intermediate to difficult)
ACE is a good example - uses many concurrency and communications patterns. There's a list of related tutorials on their website here.
If you are feeling ambitious take a look at Loki.

Advice for starting own wiki?

My friends and I were thinking of starting our own wiki. Given how widespread they have become recently, we heard it isn't that hard. We want to keep the site as simple as possible - we have some experience with web design, but not a whole lot with system administration. What are some things that we should keep in mind going forward (such as, which wikifarms may be useful, or what caveats should we keep in mind)?
I'm guessing from your question that you mean for personal, instead of business, use.
As Bayard implies, the key to success is the social side. For the technical side you'll need to have a server (or someone prepared to host it) and good wiki software. The most obvious choice here is MediaWiki which is well developed (features), well tested, well known (through Wikipedia) and completely free. Furthermore, it can easily be extended with a variety of new features (extensions).
Take your time making the choice of software because it is hard to change later. WikiMatrix may help here (to compare software).
However, the social side is also important. What is your topic? Why is it necessary? Could you accomplish the same with Google Docs (if it is just for friends) or do you want a wider involvement?
If you want a wider involvement (e.g. allow the public to contribute), then decide whether you will permit anonymous edits.
Now the most important: moderation. This means (1) you need clear rules (like who can delete pages and what the process is) and (2) someone (or, better, a group) to enforce those rules (the moderators). You will need to create the right balance for you in terms of being strict with the rules (encourages quality) and being flexible (encourages participation).
You will also need someone to take a lead - to encourage, support and manage the moderators and processes. This person is often called a wiki champion. Here's a good link explaining more about this role.
Final tips: be clear what should go onto the wiki and what not, stay close to your users (customers) by encouraging feedback and keep it fun for everyone!
Later addition: check out these Stack Overflow questions and answers:
Getting developers to use a wiki
Getting started with a personal wiki and moinmoin
Does it make sense to set up a wiki at the workplace?
What’s the best open source wiki platform?
Another addition: make sure the moderators create and maintain great "how to" pages for the wiki. Wikis are often not intuitive (especially for people used to Word). You might want to start with a "What is a wiki?" page - and then, after a brief introduction, link to a Wikipedia page all about wikis.
MindTouch has a free, open source wiki (http://www.mindtouch.com/downloads) that sounds like it would be perfect for what you're trying to do. I've used it in the past and it's super easy to get up and running and very flexible. Watch one of their demos before you make any decisions though (http://www.mindtouch.com/support_and_services/demo_videos).
The most difficult part of implementing a successful wiki tends to be social, rather than technical. Wikipatterns is a good resource which describes the challenges you're likely to encounter.

What is the most useful way to document assessment of technological choices for a business problem?

I would like to know if there are any templates for doing this in a clear and concise way that give the gist of the application and its inner workings and how it meets the business needs. I do not want to write a mythological story, so I am looking for any new ways of doing this.
Mostly this is about documenting what you actually need from the system. You can't make a good choice if you don't know what you need.
Here is a doc-style approach.
This is a decision matrix approach outline. The formatting is rough, but this is a good approach. This one has better formatting, but is not about software (it doesn't really matter).
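To make the decision matrix idea concrete: in its simplest weighted-sum form, each option gets a score per criterion, each criterion gets a weight, and the option with the highest weighted total wins. The sketch below is only an illustration of that arithmetic. The `Option` struct and the function names are my own, not something taken from the linked documents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Option {
    std::string name;
    std::vector<double> scores;  // one score per criterion, same order as the weights
};

// Weighted sum of one option's scores.
double weighted_score(const std::vector<double>& weights, const Option& option) {
    assert(weights.size() == option.scores.size());
    double total = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        total += weights[i] * option.scores[i];
    }
    return total;
}

// Index of the highest-scoring option; on a tie the earlier option wins.
std::size_t best_option(const std::vector<double>& weights,
                        const std::vector<Option>& options) {
    assert(!options.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < options.size(); ++i) {
        if (weighted_score(weights, options[i]) >
            weighted_score(weights, options[best])) {
            best = i;
        }
    }
    return best;
}
```

With weights {0.5, 0.3, 0.2} and two options scored {3, 5, 2} and {4, 2, 5}, the totals are 3.4 and 3.6, so the second option is chosen. The useful part for documentation is that the weights and scores are written down, so the reasoning behind the choice can be reviewed later.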
I'm not exactly sure if this is what you are asking for, but check out this paper. It's a sample implementation of the CMMI's "Decision and Analysis Resolution" process area. It basically documents a method for comparing alternatives, reaching a decision, and documenting that decision.
The SEI's site has the original definition of DAR (see page 181), as well as a pretty good presentation about it. You have to realize that their whole goal is to help companies define their processes, not to push a particular process. So the documents you find there tend to be pretty high level, discussing the goals that your process should achieve and the specific practices that should be covered.
Consult Eric Evans' "Domain Driven Design". At the end of the day, you're going to have to use your experience and judgment - and that of your team - to make the design decisions big and small, but Evans recommends formulating a one-page manifesto, written in business terms, to share with biz types that explains the value of your view of the domain to the business.