I posted this on Wilmott too, wasn't sure which would get more of a response.
I'm relatively new to the world of Quantlib (and C++ . . .), so perhaps this is quite obvious. I'm trying to figure out if Quantlib can price forward premium vanilla swaptions (OIS discounting, 3mL curve for estimation). All I can see in Quantlib in the Swaption files are inputs for one term structure for discounting. Does it use this also for estimation? Or is there a way to override it, such that I can enter two curves.
Any help, examples etc would be much appreciated (and would save me a lot of time staring at the same files hoping something jumps out at me...)!
Thanks a lot
It depends. If you want to price Bermudan swaptions, you're out of luck; QuantLib can only price them on a tree and there's no way to use the two curves.
If you want to price European swaptions, you can use the two curves in the Black formula, although I agree that it's not obvious to find that out by looking at the code. As you've probably seen already, you'll have to instantiate both an instrument (the Swaption class) and a corresponding engine (the BlackSwaptionEngine class). The constructor of the BlackSwaptionEngine takes a discount curve besides the other args, so you'll pass the OIS curve here. The constructor of the Swaption, on the other hand, takes the swap underlying the option as a VanillaSwap instance. In turn, the VanillaSwap constructor takes an IborIndex instance representing the floating-rate index to be paid; and finally, the IborIndex constructor takes the curve to be used to forecast its fixings, so that's the place where you can pass the 3mL curve. To summarize:
shared_ptr<IborIndex> libor(new GBPLibor(3*Months, forecastCurve));
shared_ptr<VanillaSwap> swap(new VanillaSwap(..., libor, ...));
shared_ptr<Instrument> swaption(new Swaption(swap, ...));
shared_ptr<PricingEngine> engine(new BlackSwaptionEngine(discountCurve, ...));
swaption->setPricingEngine(engine);
double price = swaption->NPV();
Also, note that the current released version (QuantLib 1.1) has a bug that makes it use the wrong curve at some point during the calculations. You'll want to use version 1.2, which is not yet released but can be checked out from the Subversion repository at https://quantlib.svn.sourceforge.net/svnroot/quantlib/branches/R01020x-branch/QuantLib.
Related
Firstly apologies if this has been answered elsewhere.
I am using QuantLib (via Excel) to build a "standard" bond pricing sheet: prices, yields, spline AND matched-maturity ASW.
I can price the bonds, and have successfully built a forecast (Euribor) and discount (EONIA) curve. I can use qlMakeVanillaSwap() to define a spot-start swap by tenor (eg "1y","2Y" etc) and it works fine. However I am struggling to define a "broken date" swap, ie one which starts T+2 and ends on a given date (and so usually has a short stub on the first payment), to match the bond maturity. All the examples I can find have integer year tenors.
I would be grateful if someone could point me to the right method (can be in python, C++ or Excel). Or do I have to go down the route of creating explicit fixed and floating rate schedules for the swaps?
The answer seems to be: Yes, I do have to create explicit fixed and floating rate schedules, using qlSchedule(), but it turns out to be not too onerous. NB. I am pricing a vanilla EUR ABB vs 6m Euribor swap.
As for pricing, it seems the qlMakeVanillaSwap() is doing a few helpful things in one call, but only IF your swap has a whole-period tenor (eg "1y"). I found the answer for what I wanted to do in the example sheet that came with the QuantLibXL download package.
The other thing that qlMakeVanillaSwap() is doing (in addition to creating the schedules) is setting the Pricing Engine (which is used to discount the cashflows). In the longer version you have to (a) set it yourself using qlInstrumentSetPricingEngine() and (b) pass the result of that call to the Trigger parameter of qlVanillaSwapFairRate(), to establish the calculation order.
can word2vec be used for guessing words with just context?
having trained the model with a large data set e.g. Google news how can I use word2vec to predict a similar word with only context e.g. with input ", who dominated chess for more than 15 years, will compete against nine top players in St Louis, Missouri." The output should be Kasparov or maybe Carlsen.
I'ven seen only the similarity apis but I can't make sense how to use them for this? is this not how word2vec was intented to use?
It is not the intended use of word2vec. The word2vec algorithm internally tries to predict exact words, using surrounding words, as a roundabout way to learn useful vectors for those surrounding words.
But even so, it's not forming exact predictions during training. It's just looking at a single narrow training example – context words and target word – and performing a very simple comparison and internal nudge to make its conformance to that one example slightly better. Over time, that self-adjusts towards useful vectors – even if the predictions remain of wildly-varying quality.
Most word2vec libraries don't offer a direct interface for showing ranked predictions, given context words. The Python gensim library, for the last few versions (as of current version 2.2.0 in July 2017), has offered a predict_output_word() method that roughly shows what the model would predict, given context-words, for some training modes. See:
https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec.predict_output_word
However, considering your fill-in-the-blank query (also called a 'cloze deletion' in related education or machine-learning contexts):
_____, who dominated chess for more than 15 years, will compete against nine top players in St Louis, Missouri
A vanilla word2vec model is unlikely to get that right. It has little sense of the relative importance of words (except when some words are more narrowly predictive of others). It has no sense of grammar/ordering, or or of the compositional-meaning of connected-phrases (like 'dominated chess' as opposed to the separate words 'dominated' and 'chess'). Even though words describing the same sorts of things are usually near each other, it doesn't know categories to be able to determine that the blank must be a 'person' and a 'chess player', and the fuzzy-similarities of word2vec don't guarantee words-of-a-class will necessarily all be nearer-each-other than other words.
There has been a bunch of work to train word/concept vectors (aka 'dense embeddings') to be better at helping at such question-answering tasks. A random example might be "Creating Causal Embeddings for Question Answering with Minimal Supervision" but queries like [word2vec question answering] or [embeddings for question answering] will find lots more. I don't know of easy out-of-the-box libraries for doing this, with or without a core of word2vec, though.
Is there a way to incorporate the uncertainties on my data set into the result of the Savitzky Golay fit? Since I am not passing this information into the function, I asume that it is simply calcuating the 'best fit' via an unweighted least-squares process. I am currently working with data that has non-uniform uncertainty, and so the fit of the data could be improved by including the errors that I have for my main dataset.
The wikipedia page for the Savitzky-Golay filter suggests how I might go about alter the process of calculating the coefficients of the fit, and I am staring at the code for scipy.signal.savgol_filter, but I cannot get my head around what I need to adjust so that this will do what I want it to.
Are there any ready-made weighted SG filters floating about? I find it hard to believe that no-one else has ever needed this tool in Python, but maybe I have missed something.
Check out this Python module: https://github.com/surhudm/savitzky_golay_with_errors
This python script improves upon the traditional Savitzky-Golay filter
by accounting for errors or covariance in the data. The inputs and
arguments are all modelled after scipy.signal.savgol_filter
Matlab function sgolayfilt supports weights. Check the documentation.
I would like to construct a spot curve from supplied bond prices. I know that the curve has to be constructed from dirty prices (i.e. the ones that include accrued interest). However, from FittedBondCurve.cpp example posted on quantlib.org, it appears that FixedRateBondHelper class is initialized with clean prices.
So, my question is: does it mean that FixedRateBondHelper takes care of computing accrued interest and converting clean price to dirty price? Or is it something that a user should do? I believe it's the former but wanted to make sure.
The helper doesn't, but the fitting algorithm does. If you look at the FittedBondDiscountCurve::FittingMethod::FittingCost::value method, you'll cringe a bit at the nested inner classes, but then you'll see that the model price is calculated by adding the discounted future cash flows and subtracting the accrued amount.
A further note: in recent releases, the bond helpers have been given the possibility to work with quoted dirty prices when bootstrapping a curve (see the last parameter of their constructors, useCleanPrice, which defaults to true but can be set to false to use dirty prices. However, the FittedBondDiscountCurve class is not yet aware of this change, and thus setting useCleanPrice to false would break the algorithm. I'll try to fix this in a future release.
What I am trying to do:
I am trying to take a list of terms and distinguish which domain they are coming from. For example "intestine" would be from the anatomical domain while the term "cancer" would be from the disease domain. I am getting these terms from different ontologies such as DOID and FMA (they can be found at bioportal.bioontology.org)
The problem:
I am having a hard time realizing the best way to implement this. Currently I am naively taking the terms from the ontologies DOID and FMA and taking difference of any term that is in the FMA list which we know is anatomical from the DOID list (which contains terms that may be anatomical such as colon carcinoma, colon being anatomical and carcinoma being disease).
Thoughts:
I was thinking that I can get root words, prefixes, and postfixes, for the different term domains and try and match it to the terms in the list. Another idea is to take more information from their ontology such as meta data or something and use this to distinguish between the terms.
Any ideas are welcome.
As a first run, you'll probably have the best luck with bigrams. As an initial hypothesis, diseases are usually noun phrases, and usually have a very English-specific structure where NP -> N N, like "liver cancer", which means roughly the same thing as "cancer of the liver." Doctors tend not to use the latter, while the former should be caught with bigrams quite well.
Use the two ontologies you have there as starting points to train some kind of bigram model. Like Rcynic suggested, you can count them up and derive probabilities. A Naive Bayes classifier would work nicely here. The features are the bigrams; classes are anatomy or disease. sklearn has Naive Bayes built in. The "naive" part means, in this case, that all your bigrams are independent of each other. This assumption is fundamentally false, but it works well in a lot of circumstances, so we pretend it's true.
This won't work perfectly. As it's your first pass, you should be prepared to probe the output to understand how it derived the answer it came upon and find cases that failed on. When you find trends of errors, tweak your model, and try again.
I wouldn't recommend WordNet here. It wasn't written by doctors, and since what you're doing relies on precise medical terminology, it's probably going to add bizarre meanings. Consider, from nltk.corpus.wordnet:
>>> livers = reader.synsets("liver")
>>> pprint([l.definition() for l in livers])
[u'large and complicated reddish-brown glandular organ located in the upper right portion of the abdominal cavity; secretes bile and functions in metabolism of protein and carbohydrate and fat; synthesizes substances involved in the clotting of the blood; synthesizes vitamin A; detoxifies poisonous substances and breaks down worn-out erythrocytes',
u'liver of an animal used as meat',
u'a person who has a special life style',
u'someone who lives in a place',
u'having a reddish-brown color']
Only one of these is really of interest to you. As a null hypothesis, there's an 80% chance WordNet will add noise, not knowledge.
The naive approach - what precision and recall is it getting you? If you setup a test case now, then you can track your progress as you apply more sophisticated methods.
I don't know what initial set you are dealing with - but one thing to try is to get your hands on annotated documents(maybe use mechanical turk). The documents need to be tagged as the domains you're looking for - anatomical or disease.
then count and divide will tell you how likely a word you encounter is to belong to a domain. With that the next step and be to tweak some weights.
Another approach (going in a whole other direction) is using WordNet. I don't know if it will be useful for exactly your purposes, but its a massive ontology - so it might help.
Python has bindings to use Wordnet via nltk.
from nltk.corpus import wordnet as wn
wn.synsets('cancer')
gives output = [Synset('cancer.n.01'), Synset('cancer.n.02'), Synset('cancer.n.03'), Synset('cancer.n.04'), Synset('cancer.n.05')]
http://wordnetweb.princeton.edu/perl/webwn
Let us know how it works out.