PCA in the MATLAB PLS Toolbox versus R - pca

I have recently started doing PCA on pyrosequencing community results (which basically tell how many species are in different samples). I have two software choices: the PLS Toolbox in MATLAB or the vegan package in R.
In R, I input my normalised (relative abundance) file and use rda from the vegan package:
pca<-rda(myfile)
biplot(pca)
I noticed that the PLS Toolbox gives you the option to choose mean centering; with that option, the PCA graphs look totally different between the two programs. Does this mean centering option make a difference?
I read on some websites that we should always mean-center for PCA. Is there any way I can do mean centering in R as well? Or does the rda function in R do the mean centering by itself?
Thank you

Mean centring is always done automatically by the rda() function in vegan; the function does not offer you any other option. The standard R functions prcomp (recommended) and princomp (not recommended) also centre by default, although prcomp has a center argument that can switch it off.
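To make concrete what "mean centring" does to the data before PCA, here is a minimal sketch in plain Python (not R, and not what vegan runs internally). The function name `center_columns` and the small abundance table are made up for illustration; rows are assumed to be samples and columns species.

```python
def center_columns(rows):
    """Subtract each column's mean so that every column averages to zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return centered, means


# Hypothetical relative-abundance table: rows are samples, columns are species.
abundance = [
    [0.50, 0.30, 0.20],
    [0.40, 0.40, 0.20],
    [0.60, 0.10, 0.30],
]

centered, means = center_columns(abundance)
for row in centered:
    print(row)
```

Without this step the first principal component tends to point from the origin towards the overall mean of the data, which is one plausible reason the two programs give very different-looking plots when centring is switched off in one of them.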

Related

How to Show C++ Results in Excel

I am trying to create C++ code that lets the user select a variety of input fields; it will then calculate many different angles and show the user the results, as well as a graph.
However, our lecturer has suggested that it may be a good idea to write the code for these calculations in C++ and then put the results into Excel.
Does anyone have any idea how to do this? I am literally looking for a way for the user to fill in the required values in the C++ program and then be AUTOMATICALLY taken to the Excel file, which shows the results in table and graph format.
If this is not possible, is there a way to display the results in table and graph format through C++?
Thanks very much in advance
Excel provides a COM interface, which you can use from your C++ application.
This can be done in the way described in this article:
http://support.microsoft.com/kb/216686
This link might also be useful:
http://www.codeproject.com/Articles/10886/How-to-use-Managed-C-to-Automate-Excel
I think the second link would be better for you, as it's more of a step-by-step guide, which should help you work out the answer.
Use COM Automation to automate Excel.
The best way to do this is to use the vole library by Matthew Wilson at
http://vole.sourceforge.net/
Take a look at the examples. I do not think there is an example for Excel, but there is one for Microsoft Word at http://www.codeproject.com/KB/COM/VOLE_word.aspx
I have used vole in the past, and it makes this a whole lot easier.

Use R to create graphs/plots/charts and GTK to display and interact with them

I would like to use R to generate some graphs/plots/charts and then use GTK to display them. One requirement is that the plot must be able to auto-update and have some interactive features, such as setting maxima/minima labels, re-scaling, allowing for normalisation, etc. The data set is potentially of the order of several thousand data points, possibly up to tens of thousands.
Are there any libraries/modules that already do that? My Google-fu was weak. I do not mind either a C++ or a Python one.
If there is no such library, how would I be able to achieve this?
Note: The system is kind of embedded -- it certainly has no Internet connection but does have an internal network. Using the web would increase the cost of the system drastically and thus it is not a good solution to my problem.
As you've put Python in your tags too, maybe matplotlib would be of some interest? Just in case.
I wondered whether 10,000 points would be an issue with these graphics devices. With this gWidgets script running under RGtk2 and Qt, it was right on the border of being fast enough to be acceptable (on my aging machine, 100,000 points was certainly way too many):
library(gWidgets)
options(guiToolkit="RGtk2")  # use the GTK+ toolkit
w <- gwindow("test")
pg <- gpanedgroup(cont=w)
fl <- glayout(cont=pg)       # first pane: controls
gg <- ggraphics(cont=pg)     # second pane: embedded graphics device
size(gg) <- c(600, 600)
fl[1,1] <- "No. points"
fl[1,2] <- no_pts <- gedit("10", cont=fl, coerce.with=as.numeric)
# Draw a scatterplot of n random points each time the button is clicked
fl[2,2] <- gbutton("click me", cont=fl, label="", handler=function(h,...) {
  n <- svalue(no_pts)
  plot(rnorm(n), rnorm(n))
})
If this speed is acceptable, one can make a GUI along the lines of playwith for your specific needs relatively easily. It might be that the cranvas package can make this faster for Qt.
Otherwise, I don't know if the rgl package of Duncan Murdoch would be useful, but it might be. Simon Urbanek gave a very nice presentation at the last useR meeting where the OpenGL graphics engine in some browsers allowed for very fast plots with over 1,000,000 points, and this was done over a websocket.
First of all, R at its core does not feature interactive plots -- this goes against the idea of controlling almost everything with the programming language itself.
There are some libraries that allow you to create more or less interactive plots, starting from the simplistic locator function that you would need to wrap into your R programs, and including the manipulate package from RStudio as well as the iplots package. There is even a GTK+ based R package called playwith.
Depending on what you actually want to achieve, maybe using gnuplot would be a better idea.
For a web-based solution (web is the future :)) that allows this kind of functionality from a server, I would take a look at the shiny package just released by the people at RStudio. It looks like what you need, without you having to do any programming. And you get the bonus that anyone with a browser can open it from anywhere. See this link:
http://blog.rstudio.org/2012/11/08/introducing-shiny/

Need a C++ library to fit curves to data points

I have a program that creates data points; some of them follow the shape of a log function and some follow straight lines. I need to fit curves to these data points so that I can extrapolate. Are there any C++ libraries that can do this for me?
Try the GNU Scientific Library: http://www.gnu.org/software/gsl/manual/html_node/Linear-regression.html
CERN's ROOT package is probably your best option. It combines plotting, GUI resources, and stable computing resources together in one great package.
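As a rough illustration of the kind of fit these libraries perform, here is a minimal sketch of ordinary least squares in plain Python (not C++, and not GSL's or ROOT's API). The function names `fit_line`, `fit_log` and `predict_log` are made up for this example. The log-shaped data is handled under the assumption that it follows y = a + b*ln(x) with x > 0, so that regressing y on ln(x) reduces it to the straight-line case.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_log(xs, ys):
    """Fit y = a + b * ln(x) by regressing y on ln(x); requires every x > 0."""
    return fit_line([math.log(x) for x in xs], ys)


def predict_log(a, b, x):
    """Evaluate the fitted log model at x, e.g. to extrapolate beyond the data."""
    return a + b * math.log(x)


# Made-up points that lie exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

Extrapolating is then a matter of evaluating the fitted model outside the sampled range, for example `predict_log(a, b, 100.0)`; how far that can be trusted depends on how well the assumed model shape matches the data.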

Simple data visualization from data to create/place circles/spheres on grid

All I want is to create circles (from data) on a grid, with different colours for specific sets of data. These might be objects I have created to be placed on the grid, or they might come from the program itself. I was using POV-Ray, but it is massively complicated and I don't have the time, unless anyone has a tutorial on how to read data from files, extract all the numbers, and use them successfully in .pov files.
There are several programs/environments (not C++) that can do this directly. One is gnuplot; another, more robust tool is R. Although with R there is a bit more of a learning curve to really get moving.
Have you considered gnuplot?
It has a simple syntax, so you can just convert your data file into an input file for gnuplot.
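A minimal sketch of that conversion, written in Python for illustration. The function name `circles_to_gnuplot`, the file name `circles.dat` and the sample circles are all made up; the generated script assumes a gnuplot version that supports the `circles` plot style and `linecolor variable`, with the fourth data column used as the colour index.

```python
def circles_to_gnuplot(circles, data_name="circles.dat"):
    """Return (data_text, script_text) for a list of (x, y, radius, color_index)."""
    data_lines = ["# x y radius color_index"]
    for x, y, r, color in circles:
        data_lines.append(f"{x} {y} {r} {color}")
    script_lines = [
        "set size ratio -1",          # same scale on both axes so circles stay round
        "set grid",
        "set style fill solid 0.5",
        f"plot '{data_name}' using 1:2:3:4 with circles linecolor variable notitle",
    ]
    return "\n".join(data_lines) + "\n", "\n".join(script_lines) + "\n"


# Made-up example: two data sets distinguished by colour index 1 and 2.
data_text, script_text = circles_to_gnuplot([(1, 2, 0.5, 1), (3, 4, 1.0, 2)])
print(data_text)
print(script_text)
```

Writing `data_text` to `circles.dat` and `script_text` to a script file, then running gnuplot on that script, should draw one filled circle per data row.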

Graph-Drawing / TSP-Route-Drawing in C++ with "known" coordinates: How? Which Library/Tool?

I'm developing some heuristics for a variation of the vehicle routing problem in C++.
After generating a solution, I want to plot it. The solution is composed of several tours, all starting and ending at a common depot.
So I have a vertex set with all the coordinates, and edges each defined by two vertex IDs. I also have the distances between all vertex pairs, of course.
It would be helpful to plot this in an extra window opened by my program, but writing the plot to a graphics file would be okay too.
What is an easy way to plot this? How would you tackle this?
First I looked at common graph-visualization packages (graphviz, tulip, networkx (python)), but I realized that all of them specialize in graph layout (for when there are no coordinates). Correct me if I'm wrong.
I don't know whether it is possible to tell these packages that I already have the coordinates, to help the layout algorithms.
The next thing I tried was the CGAL library with geomview output, but no luck so far: Ubuntu crashes geomview.
One more question: is it a better idea to use a non-layout 2D plotting library, risking a plot that isn't really pleasant to look at (is there more to do than scaling?), or to use a layout-based library (e.g. graphviz, tulip, networkx), feed it the distances between the vertices, and hope the layout algorithm preserves the distances while producing a readable plot?
If non-layout plotting is the way to do it: which library do you recommend?
If layout-based plotting is the way to do it: how can I make use of the distances/coordinates in these libraries? And which library do you recommend?
Thanks for all your input!
Sascha
EDIT: I completed a prototype implementation using the PLplot library (http://plplot.sourceforge.net/). The results are nice and should be enough for the moment. I discovered and chose this library because a related project (VRPH Software Package / Groer) used it for plotting and the source code was distributed, so the implementation was done in a short amount of time. The API is, in my opinion, a bit awkward and low-level. Maybe there are some more modern libraries out there (perhaps not C-based)? MathGL? Dislin? Maybe I will try them too.
The nice thing about drawing multiple tours in a vehicle routing problem is that "not so bad" algorithms tend to discover nice non-overlapping and divergent tours which is really good for the eye ;-)
It is not quite clear what you are trying to achieve, but if I understand your question correctly, then you could do it using OpenGL. Having vertex coordinates, it should be fairly easy.
You can use gnuplot with an input text file that contains your solution.
It is convenient to draw the points (vertices) first and then the lines (agent paths) that link them.
To keep the plot script simple, you can have a separate file for each vehicle, if the number of vehicles is known.
Check out:
http://www.cleveralgorithms.com/nature-inspired/advanced/visualizing_algorithms.html
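A minimal sketch of the gnuplot route, written in Python for illustration; the same text could just as well be written from the C++ program. The function name `tours_to_gnuplot`, the file name `tours.dat` and the tiny instance are made up. Instead of one file per vehicle, this variant puts all tours in one data file, separated by two blank lines so that gnuplot's `index` keyword can select each tour as its own data set.

```python
def tours_to_gnuplot(coords, tours, data_name="tours.dat"):
    """Return (data_text, script_text).

    coords: dict mapping vertex id -> (x, y)
    tours:  list of vertex-id lists, each starting and ending at the depot
    """
    blocks = []
    for tour in tours:
        lines = [f"{coords[v][0]} {coords[v][1]}" for v in tour]
        blocks.append("\n".join(lines))
    # Two blank lines between blocks mark separate data sets for 'index'.
    data = "\n\n\n".join(blocks) + "\n"
    last = len(tours) - 1
    script = (
        "set size ratio -1\n"  # equal axis scales keep the geometry undistorted
        f"plot for [i=0:{last}] '{data_name}' index i "
        "with linespoints title sprintf('vehicle %d', i+1)\n"
    )
    return data, script


# Made-up instance: depot 0 and two single-customer tours.
coords = {0: (0, 0), 1: (1, 0), 2: (0, 1)}
data_text, script_text = tours_to_gnuplot(coords, [[0, 1, 0], [0, 2, 0]])
print(data_text)
print(script_text)
```

Because the coordinates are plotted as given, no layout algorithm is involved; each tour is drawn as its own line with its own colour and legend entry.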