I'm quite confused by this paradox:
GCC ext apparently contains lots of broadly useful functionality. For example, ext/pb_ds/assoc_container.hpp lets you build an order statistic tree just by specifying particular template arguments, and ext/numeric contains the power(..) algorithm for O(lg N) exponentiation of a generic object to a non-negative integer power; this algorithm gets written from scratch all the time. There is also the rope data structure, algorithms for random sampling, and many others. Not things you would use every day, but definitely things that would be handy every other year or so.
Almost nobody seems to be using them. There is very little discussion on the web. There are some bug reports, and posts like this one suggesting either that these things are buggy and unmaintained or that there is no definitive guide on how to use them properly.
Now, trying to find the documentation, I type gcc "ext" into Google and get https://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/ as the first result. Going to Examples of Associative Containers gets me to another table of contents, but clicking on e.g. the link to basic_set.cc gives me a 404 page.
At this point I'm not even sure whether this code has received enough testing to be relied on for serious applications.
Is there any proper documentation for when and how to use #include <ext/numeric> and the like? Or at least examples and asymptotic complexity estimates?
Since it sounds like you've found a defect in the documentation, I'd suggest sending an e-mail to libstdc++@gcc.gnu.org to subscribe to the mailing list. I was able to find a mirror for the libstdc++ test suite on GitHub, which contains the examples you want. If you're looking for documentation for ext_numerics, it's at gcc.gnu.org/onlinedocs/libstdc++/manual/ext_numerics.html.
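For what it's worth, here is a minimal sketch of the two facilities mentioned in the question, assuming a reasonably recent GCC with libstdc++ (older releases spelled `null_type` differently). The helper names `make_set`, `kth_smallest`, `count_less_than` and `fast_power` are made up for illustration; only the `__gnu_pbds` and `__gnu_cxx` names come from the library.

```cpp
#include <cstddef>
#include <ext/numeric>                     // __gnu_cxx::power
#include <ext/pb_ds/assoc_container.hpp>   // __gnu_pbds::tree
#include <ext/pb_ds/tree_policy.hpp>       // tree_order_statistics_node_update
#include <functional>
#include <vector>

// A red-black tree set of ints that also maintains subtree sizes,
// which is what turns it into an order statistic tree.
using OrderedSet = __gnu_pbds::tree<
    int, __gnu_pbds::null_type, std::less<int>,
    __gnu_pbds::rb_tree_tag,
    __gnu_pbds::tree_order_statistics_node_update>;

// Hypothetical helper: build the set from a list of values.
OrderedSet make_set(const std::vector<int>& values) {
    OrderedSet s;
    for (int v : values) s.insert(v);
    return s;
}

// k-th smallest element (0-based). Assumes k < number of distinct values.
int kth_smallest(const std::vector<int>& values, std::size_t k) {
    OrderedSet s = make_set(values);
    return *s.find_by_order(k);
}

// Number of stored keys strictly smaller than `key`.
std::size_t count_less_than(const std::vector<int>& values, int key) {
    return make_set(values).order_of_key(key);
}

// Exponentiation by repeated squaring; uses multiplication by default
// and returns the multiplicative identity when exponent == 0.
long long fast_power(long long base, unsigned exponent) {
    return __gnu_cxx::power(base, exponent);
}
```

Both `find_by_order` and `order_of_key` run in logarithmic time on the tree. Since these are non-standard extensions, the code is tied to libstdc++ and will not build against other standard library implementations.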
Related
I am new to C++ and extremely surprised by the lack of accessible, common probability manipulation tools (i.e. the lack of things in Boost and the standard library). I've done a lot of scientific programming in other languages, but the standard and/or ubiquitous third-party add-ons always include a full range of probability tools. A friend billed up Boost to be the equivalent ubiquitous add-on for C++, but as I read the Boost documentation, even it seems to have a dearth of what I would consider extremely elementary built-ins.
I cannot find a built-in that takes some sort of array of discrete probabilities and produces an index chosen according to those probabilities. I can of course write my own function for this, but I just wanted to check whether I am missing a standard way to do this.
Having to write my own functions at such a low-level is a bad thing, I feel, but I am writing a new simulation module for a larger project that is all in C++. My usual go-to tactic would be to write it in Python and link the Python to the C++, but because several other people are going to have to manage this code once I finish it, and none of them know Python, I think it would be more prudent to deliver it to them all in C++.
More generally, what do people do in C++ for things like sampling from standard distributions, in particular something as basic as a multi-variate normal distribution?
Perhaps I'm misunderstanding your intention, but it seems to me what you want is simply std::discrete_distribution.
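A minimal sketch of how that might look, assuming C++11 or later. The helper `sample_counts` is a made-up name for illustration; it draws a number of indices according to the given weights and returns how often each index came up.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draw `draws` indices according to `weights` and count the hits per index.
// The weights do not need to sum to 1; std::discrete_distribution normalizes them.
std::vector<int> sample_counts(const std::vector<double>& weights,
                               int draws, unsigned seed) {
    std::mt19937 rng(seed);
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    std::vector<int> counts(weights.size(), 0);
    for (int i = 0; i < draws; ++i) {
        ++counts[static_cast<std::size_t>(dist(rng))];  // index in [0, weights.size())
    }
    return counts;
}
```

For example, `sample_counts({0.2, 0.5, 0.3}, 10000, 42)` should give counts roughly proportional to the weights; the exact numbers depend on the standard library implementation.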
(Moved from comment.)
Did you look at Boost.Math.StatisticalDistributions? Specifically, its Discrete Probability Distributions?
Boost is not a library, it's a collection of libraries, so it can sometimes be difficult to find exactly what you're looking for – but that doesn't mean it isn't there. ;-]
As mentioned, you'll want to look at boost/math/distributions and friends to meet your needs.
Here's a very good, detailed tutorial on getting these working for you in Boost. You may also want to throw your weight behind stan as well, which looks quite promising within this space.
Boost's math libraries are pretty good for working with different distributions, but if you are only interested in sampling (as in the problem you mentioned in your post), then looking at the boost Random libraries might be more germane to your task. This link shows how to simulate rolling a weighted die, for example.
You should do less C++ bashing, and more question asking - we try to be helpful and respectful on SO. Questions like yours are often tagged as inflammatory.
Boost::math seems to provide exactly what you're looking for: https://www.quantnet.com/cplusplus-statistical-distributions-boost/ - I'm not 100% sure about how well it handles multi-variate distributions though (nor am I an expert on statistics).
Get it here: http://www.boost.org/doc/libs/1_49_0/libs/math/doc/html/index.html
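On the multivariate normal part of the question: the standard library has no ready-made distribution for it, but one common approach is to factor the covariance matrix with a Cholesky decomposition and transform independent standard normal draws. Below is a rough sketch of that idea using only the standard library, assuming the covariance matrix is symmetric and positive definite (this is not checked). The names `Matrix`, `cholesky` and `sample_mvn` are made up for illustration and are not part of Boost or the standard library.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Lower-triangular L with L * L^T == cov.
// Assumes cov is symmetric positive definite; no error checking is done.
Matrix cholesky(const Matrix& cov) {
    const std::size_t n = cov.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = cov[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            L[i][j] = (i == j) ? std::sqrt(sum) : sum / L[j][j];
        }
    }
    return L;
}

// One draw x = mean + L * z, where z holds independent N(0, 1) samples.
std::vector<double> sample_mvn(const std::vector<double>& mean,
                               const Matrix& L, std::mt19937& rng) {
    std::normal_distribution<double> standard_normal(0.0, 1.0);
    const std::size_t n = mean.size();
    std::vector<double> z(n);
    for (double& v : z) v = standard_normal(rng);
    std::vector<double> x(mean);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) x[i] += L[i][j] * z[j];
    }
    return x;
}
```

Compute the factor once with `cholesky` and reuse it for every draw. For anything beyond small matrices, a linear algebra library is probably a better choice than this hand-rolled factorization.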
I am aware of a similar question for C#. I downloaded and tried NArrange and UniversalIndentGUI, but neither sorts the functions of C++ code by name. Does anyone know a non-commercial tool that does this job?
Unless you're under orders to rearrange the code to conform to an arbitrary coding standard, my advice is do not do this. I've seen people do it before, and the results are not pretty. The file will look completely different after you're done, and you'll have effectively trashed all of the edit history in source control. Any diffs between this version and any version that came before it will look like a jumbled mess. In the long run, having a clear diff history is worth more to you and your team than some measure of code cleanliness.
Can anybody point me to some libraries that contain web ranking algorithms such as PageRank and HITS?
Thank you
I guess you are referring to the canonical PageRank algorithm as published in the original PageRank paper. People nowadays use "PageRank" to refer to the actual current Google algorithm for search.
If that is really the case, the PageRank implementation is not that difficult to find and use. Searching through Google you can find a good deal of implementations. One in python, for example.
For the HITS algorithm there's pseudocode on Wikipedia. There's also a Perl implementation.
I'm also suggesting CLucene for you to start messing around.
Unless you work for Google, there aren't many good ways of finding out the specifics of their page ranking algorithm...which changes from time to time. Wikipedia outlines some of the basics:
http://en.wikipedia.org/wiki/PageRank
Other people write lengthy articles:
http://www.smashingmagazine.com/2007/06/05/google-pagerank-what-do-we-really-know-about-it/
If you are interested in the kinds of techniques that are involved in writing a search engine, there are several topics. For instance, there is "web crawling" and how to write programs that visit web sites and grab their contents...and determining when to visit the sites again to see if they've changed:
http://en.wikipedia.org/wiki/Web_crawler
Once you have a bunch of data on your machine(s) to analyze and search, the subject area to study is called "Information Retrieval" (or "IR"):
http://en.wikipedia.org/wiki/Information_retrieval
It's a fairly new science, but a lot of work is done on it. Wikipedia has a list of "free search engine software":
http://en.wikipedia.org/wiki/Category:Free_search_engine_software
I'd suggest that if you're new to this then it might be best to start with figuring out how to use something like Lucene to provide a search box on a website you have. Then dig in and see how it works. It has been ported to C++ if that is important to you:
http://clucene.sourceforge.net/
I am thinking about the following scheduling problem:
I have X people.
I have Y meeting slots with Z meeting roles available in every meeting.
For some roles, the same person may combine two of them in a single meeting, but most are one person = one role.
For each person x in X, I know a set of facts about them:
a) The last date they attended the meeting and had a specific role (historical);
b) Their availability for any meeting y in Y;
c) Their specific preference for the roles z in Z or a set of roles (no specific dates) for the group of meetings.
I'd like to build a scheduler with the following objectives in mind:
a) All meeting roles are filled.
b) Preferences are accommodated if possible;
c) Distribution of people / roles should be uniform (i.e. if one person is scheduled every meeting and other just for one meeting once in a while -- it's unacceptable; if one person is scheduled for the same role over, and over, and over again -- it's unacceptable).
Now, I have a gut feeling that the task is not easy at all :), so my specific questions are:
What language would be better suited for the task (somehow I feel Prolog can deal with it, but I am not entirely sure).
What is the proper approach to solve this task and how close can I get to my objectives in #4 above?
Any good read on the kind of problem I am looking to solve?
Thank you!
P.S. If you are curious, the use case is scheduling a roster for a set of Toastmasters meetings (example) (I am too lazy to do it by hand and I'd like the computer to help me with this task, at least partially).
A rule engine, like Drools Expert or Prolog, is good for defining the constraints (= score function). However, it's terrible at finding the best solution.
Since your problem is probably NP-complete (especially if the meetings need to be put into a timeslot and/or 1 person can't attend 2 meetings at the same time), you need to use a planning optimization algorithm on top of that, such as construction heuristics and metaheuristics. Take a look at the curriculum course example in Drools Planner (Java, open source, ASL).
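To make "construction heuristic" a bit more concrete, here is a toy greedy sketch in C++. It is not how Drools Planner works internally; the function name `greedy_roster`, the input layout and the penalty weights (10 per earlier assignment, 5 for repeating one's previous role) are all invented for illustration. It fills the roles meeting by meeting, always picking the available person with the lowest penalty, and ignores explicit preferences entirely.

```cpp
#include <cstddef>
#include <vector>

// available[p][m] is true when person p can attend meeting m.
// Returns roster[m][r] = index of the person given role r in meeting m,
// or -1 when no available person is left for that slot.
std::vector<std::vector<int>> greedy_roster(
        const std::vector<std::vector<bool>>& available,
        std::size_t meetings, std::size_t roles) {
    const std::size_t people = available.size();
    std::vector<int> load(people, 0);        // assignments so far per person
    std::vector<int> last_role(people, -1);  // most recent role per person
    std::vector<std::vector<int>> roster(meetings, std::vector<int>(roles, -1));

    for (std::size_t m = 0; m < meetings; ++m) {
        std::vector<bool> used(people, false);  // one role per person per meeting
        for (std::size_t r = 0; r < roles; ++r) {
            int best = -1;
            int best_cost = 0;
            for (std::size_t p = 0; p < people; ++p) {
                if (!available[p][m] || used[p]) continue;
                // Penalize heavily loaded people and repeated roles.
                const int cost = 10 * load[p] +
                                 (last_role[p] == static_cast<int>(r) ? 5 : 0);
                if (best < 0 || cost < best_cost) {
                    best = static_cast<int>(p);
                    best_cost = cost;
                }
            }
            if (best >= 0) {
                roster[m][r] = best;
                used[static_cast<std::size_t>(best)] = true;
                ++load[static_cast<std::size_t>(best)];
                last_role[static_cast<std::size_t>(best)] = static_cast<int>(r);
            }
        }
    }
    return roster;
}
```

A result like this is only a starting point: a metaheuristic (local search, tabu search, simulated annealing and so on) would then try swaps between assignments and keep the ones that improve the overall score.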
From my point of view, the language you are going to program in doesn't really matter that much: for simple problems the language to use is more of a personal preference instead of an exact science. If you like/want to learn Python, use that. If you "feel like" Prolog today, use that.
What will be a factor in your choice though is how you want to preserve and present your data. From your question it can be told that you need the following:
A database (or at least, a persistent resource) to store your available participants and roles, past and future meetings storing the roles for every participant, and some way to schedule availability.
Some way to present your data (command line, GUI, or website).
Some business logic that describes the way of assigning roles, criteria for the attendance and such.
You will want to use some third-party components for most of these, since your time is to be spent on the added value of your product; creating a shiny ORM or GUI toolkit is not your goal in this. So the programming language you will choose should have a proper support for these items (especially the first two). I can't say it for Prolog, but Python will have you fully covered in these areas. I think it goes beyond the scope of this question to suggest specific toolkits, so I'll leave it at that for now.
After this step, you analyze your problem, which you seem to have done quite nicely already. So, start implementing it. To be able to verify your specific use cases, it sounds like you could benefit from some Test or Behavior Driven Design, so you may want to read up on that.
For learning the language, just search StackOverflow for "[language] tutorial": there are already plenty of answers linking to very nice resources for getting started with any language you will choose.
Final advice: perseverance is the hardest part, so try to set yourself some goals or milestones, or try to involve other people in one way or another. That way you'll enlarge the possibility of following through with creating a nice piece of software.
Even though I'm a Python fan, I'd strongly suggest Prolog for this task. I'm familiar with Prolog, and this problem is definitely more nicely solved with it. But it depends on how you will use the program. The choice is yours: decide based on whether Python or Prolog is easier for you to install (if you just run it on your local PC, it doesn't matter that much, I guess), or on other requirements you have.
It's fairly simple with Prolog, if you know Prolog. Once you have learnt Prolog, you can solve it with some thinking and without much trouble, I guess (if you have really understood Prolog!).
Basically, you should start by learning Prolog, of course. I'd suggest using SWI-Prolog; it's one of the most commonly used Prolog implementations. Also, there is a nice tutorial for it: http://www.learnprolognow.org/
It seems to me, but I'm not 100% sure, that you are not familiar with Prolog yet. You need time to learn Prolog first, so it also depends on how soon you need to have your program. It's possible to get through the tutorial in less than a month, as far as I remember. Of course this heavily depends on how much time you invest per day; you can do it in less or even more time.
Prolog is based on rules. Each of your requirements can be expressed as a rule. Once you have your set of rules, you can ask which combinations (of persons and meeting roles) conform to all those rules. For the historical data of the different persons, you could use a small database.
This sounds like an optimization problem, and I agree with Geoffrey that it would be an NP-complete problem. I recently developed a scheduling algorithm for a university that does final exam scheduling. I used a genetic algorithm with domain-specific heuristics to solve that problem. My implementation performed nicely with a student count of 3000+ and a course count of 500; it took about 2 hours to find a near-optimal solution.
I agree with the people who suggest Prolog for this task; I would suggest taking a look at ECLiPSe (besides being a Prolog implementation, it is a constraint programming language which has more powerful problem-solving capabilities than plain Prolog).
ECLiPSe now has a very nice introduction, with many examples and very much to the point, available as a free PDF written by Antoni Niederlinski:
http://www.anclp.pl/
Among the examples on ECLiPSe site, I found the following which seems to be relevant: http://eclipseclp.org/examples/roster.ecl.txt.
ECLiPSe is thoroughly documented and, according to this documentation, can also be integrated with C++/Java.
Looking for some general advice...
I've been using boost for a while, and I've written several small modules and functions (e.g. see this SO question) which I think could be appropriate for inclusion in boost. I've been to the project pages to see about the submission process, but it seems like it's "be on the inside, or don't bother". I can subscribe to the developers mailing list, but I'm not sure I'm qualified to post there: I'm certainly not intimately familiar with all the various boost modules, and not nearly as well-versed in template meta-programming as the people actively participating.
Is there an avenue that I'm missing for "normal" people to send ideas for things which could be incorporated into boost? Or is boost kinda a "open in name only, unless you make it a full-time job" type of project?
I think that you shouldn't hesitate and should go to boost-devel. Most likely your code will not be accepted, but it is very likely that you will be able to collect valuable feedback and learn a lot. People there will explain why it can't be accepted in its current form, or how the given functionality could be made more generic, etc. I think that overall it will be beneficial for you.
If you are not comfortable with boost-devel, then subscribe and just follow it for some time. A few more personal comments about reading/following the list:
Find out who is who - some people are active only in very narrow fields, while others tend to have a lot to say in many different domains
Create some kind of filtering rules for new mails (the load of the list is rather heavy) - some mails are really not interesting
Observe the review process of the submitted libraries, critical comments, suggestions.
Subscribe to boost-devel, boost and boost-users - they tend to be quite interconnected. You can throw in the spirit, threads and ublas lists if you are interested in these projects.
I have been following the devel list for some time. I feel as you do: maybe I am not playing in the same league as these people, but nevertheless you do learn quite a bit from the discussions there.
During the time I have been there, I have found some kind of common pattern for submissions: first query for interest in the library, then offer the library for review. Beware of the licensing of your code: if it is not compatible with the boost license most of the people will not even give it a look. Spending time in the review of some proprietary code seems like working for free for someone else.
Also consider writing docs and publishing: make the library accessible on the internet and have others use/look at it. That will boost (no pun intended) your chances.
Why does your stuff have to go straight into boost? Why not just put it "out there" first with a license you're happy with... if there's enough interest in it from users, a "hey this should be in boost" might be the next logical step, or not.
I tried submitting my UTF-8 CPP library to Boost. Started by subscribing to their dev mail list and sent a couple of informal review requests, but felt it leads nowhere so I gave up and never submitted a formal review request.
Good luck with your submission.
Did you look at http://www.boost.org/development/submissions.html?