A problem in saving "long double" variables to hdf5 file - c++

I'm new to HDF5. I'm trying to save the results of a simulation to an HDF5 file. The variables are long double, so I mapped them to NATIVE_LDOUBLE. However, the saved values are completely wrong (fluctuating between very small and very large values).
When I save them with NATIVE_DOUBLE everything is fine, but I need to save long double.
My question is: how do I properly save long double variables, and moreover quadruple-precision variables?
I deeply appreciate your help. Examples are also appreciated.
Here is the code:
void createHDF5_2DProjectionFile(char* file_name,
CarGrid1D3V<long double>& ph_grid,
std::string first_dim,
std::string second_dim,
long double *x1, int size_x1,
long double *x2, int size_x2)
{
try
{
/* define the size of the datasets containing the coordinates x1
and x2
*/
PredType h5Int = PredType::NATIVE_INT;
PredType h5DoubleL = PredType::NATIVE_LDOUBLE;
PredType h5Double = PredType::NATIVE_DOUBLE;
/* Define the parameters of grid space
DS --> Data Space
*/
hsize_t x1_dims[1], x2_dims[1];
x1_dims[0] = size_x1;
x2_dims[0] = size_x2;
H5File *file_id = new H5File(file_name, H5F_ACC_TRUNC);
/* Saving string attribute
Create dataspace with H5S_SCALAR
Create string datatype of specific length of characters
Create attribute and write to it
*/
DataSpace attr_stringDS = DataSpace(H5S_SCALAR);
StrType strdatatype(PredType::C_S1, 64);
Attribute original_DistFun = file_id->createAttribute("/OriginalDistFun",
strdatatype, attr_stringDS);
original_DistFun.write(strdatatype, "1D3V");
Attribute projection = file_id->createAttribute("/Projection",
strdatatype, attr_stringDS);
projection.write(strdatatype, first_dim + " - " + second_dim);
/* Create the data spaces for grid points along each direction */
DataSpace* first_dimDS_id = new DataSpace(1, x1_dims, NULL);
DataSpace* second_dimDS_id = new DataSpace(1, x2_dims, NULL);
/* Create and fill the datasets for grid points along each direction */
DataSet *data_dim1 = new DataSet(file_id->createDataSet(first_dim,
h5DoubleL, *first_dimDS_id));
data_dim1->write(x1, h5DoubleL);
DataSet *data_dim2 = new DataSet(file_id->createDataSet(second_dim,
h5DoubleL, *second_dimDS_id));
data_dim2->write(x2, h5DoubleL);
/* Important attributes added to the file */
long double x_minmax[2], px_minmax[2],
py_minmax[2], pz_minmax[2], mom_steps[3],
ph_vols[3], spatial_steps[1];
x_minmax[0] = ph_grid.x_min_;
x_minmax[1] = ph_grid.x_max_;
px_minmax[0] = ph_grid.px_min_;
px_minmax[1] = ph_grid.px_max_;
py_minmax[0] = ph_grid.py_min_;
py_minmax[1] = ph_grid.py_max_;
pz_minmax[0] = ph_grid.pz_min_;
pz_minmax[1] = ph_grid.pz_max_;
mom_steps[0] = ph_grid.dpx_;
mom_steps[1] = ph_grid.dpy_;
mom_steps[2] = ph_grid.dpz_;
ph_vols[0] = ph_grid.dvs_;
ph_vols[1] = ph_grid.dvp_;
ph_vols[2] = ph_grid.dv_;
spatial_steps[0] = ph_grid.dx_;
ph_grid.print_characteristics();
std::cout << x_minmax[0] << " , " << x_minmax[1] << "\n";
/* define attributes configuration */
hsize_t space_1[1];
space_1[0] = 1;
hsize_t space_2[1];
space_2[0] = 2;
hsize_t space_3[1];
space_3[0] = 3;
DataSpace attr_space_1 = DataSpace(1, space_1);
DataSpace attr_space_2 = DataSpace(1, space_2);
DataSpace attr_space_3 = DataSpace(1, space_3);
Attribute x_interval = file_id->createAttribute("[x_min,x_max]",
h5DoubleL, attr_space_2);
x_interval.write(h5DoubleL, x_minmax);
Attribute px_interval = file_id->createAttribute("[px_min,px_max]",
h5DoubleL, attr_space_2);
px_interval.write(h5DoubleL, px_minmax);
Attribute py_interval = file_id->createAttribute("[py_min,py_max]",
h5DoubleL, attr_space_2);
py_interval.write(h5DoubleL, py_minmax);
Attribute pz_interval = file_id->createAttribute("[pz_min,pz_max]",
h5DoubleL, attr_space_2);
pz_interval.write(h5DoubleL, pz_minmax);
Attribute MomVolumes = file_id->createAttribute("[dpx,dpy,dpz]",
h5DoubleL, attr_space_3);
MomVolumes.write(h5DoubleL, mom_steps);
Attribute PhVolumes = file_id->createAttribute("[dv_s, dv_m, dv_t]",
h5DoubleL, attr_space_3);
PhVolumes.write(h5DoubleL, ph_vols);
Attribute SpatialVolumes = file_id->createAttribute("[dx]", PredType::NATIVE_DOUBLE,
attr_space_1);
SpatialVolumes.write(h5DoubleL, spatial_steps);
/* Free memory */
delete data_dim1;
delete data_dim2;
delete first_dimDS_id;
delete second_dimDS_id;
delete file_id;
}
catch(DataSetIException error)
{
error.printErrorStack();
}
catch(DataSpaceIException error)
{
error.printErrorStack();
}
catch(FileIException error)
{
error.printErrorStack();
}
}
Update
A great discussion with explanations is available on the HDF5 forum, where I posted the same question:
https://forum.hdfgroup.org/t/a-problem-when-saving-native-ldouble-variables/9504
Also, Steven Varga provided examples to answer this question on his GitHub by constructing a user-defined datatype (see this link).
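For anyone who lands here, this is the gist of that user-defined-datatype approach as I understand it: describe the floating-point layout yourself and use that type instead of a predefined one. The sketch below (untested) builds an IEEE binary128 type with the C++ API; the field layout (1 sign bit, 15 exponent bits with bias 16383, 112 mantissa bits, little endian) is my assumption for __float128 on x86-64, and makeBinary128Type is just a name I made up.
H5::FloatType makeBinary128Type()
{
    H5::FloatType t(H5::PredType::IEEE_F64LE); // start from a standard float type
    t.setSize(16);                             // 16 bytes per element
    t.setPrecision(128);                       // all 128 bits are significant
    t.setFields(127, 112, 15, 0, 112);         // sign pos, exponent pos/size, mantissa pos/size
    t.setEbias(16383);                         // IEEE binary128 exponent bias
    t.setOrder(H5T_ORDER_LE);
    return t;
}
Such a type would then take the place of h5DoubleL in the createDataSet()/createAttribute() and write() calls whenever the in-memory data is quadruple precision.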

Related

How to read in data from a json file until there is no more data to read in QT

Basically I am using an API to retrieve stock data as a YTD series, so it returns the closing price of the stock for every day from January until now. At first I was simply using a for loop and reading until i < json.size(), but after figuring out that .size() does not return what I need for this to work, I am stuck again. My code is below:
//Retrieves json format of data
Json::Value chartData = IEX::stocks::chartYtd(symbol_std);
//Stores x and y values
QVector<double> time(365), closePrice(365);
//Initialize first vector elements to the first values
closePrice[0] = chartData[0]["close"].asDouble();
time[0] = startYearTime;
//Finds max and min for range
float maxAvg = closePrice[0];
float minAvg = closePrice[0];
//Reads in data from json(historical data 1 day delayed)
for(int i = 1; ; i++)
{
time[i] = startYearTime + 86400*i;
closePrice[i] = (chartData[i]["close"].asDouble());
if((closePrice[i] == 0) && (time[i] != chartData.size() - 1))
{
closePrice[i] = closePrice[i-1];
}
if(closePrice[i] > maxAvg)
{
maxAvg = closePrice[i];
}
else if(closePrice[i] < minAvg)
{
minAvg = closePrice[i];
}
}
The json file looks like this
What can I do to have my code store the "close" value from the JSON file until there is no more "close" value to read in, and then stop? Thank you in advance, as I'm a new developer!
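A minimal sketch of one way to do that, assuming chartData is a JsonCpp array of daily entries as in the snippet above: iterate only over the elements that actually exist, stop as soon as an entry has no "close" member, and append to the QVectors instead of indexing past a fixed size.
QVector<double> time, closePrice;
for (Json::ArrayIndex i = 0; i < chartData.size(); ++i)
{
    if (!chartData[i].isMember("close"))      // no more close prices to read in
        break;
    time.append(startYearTime + 86400.0 * i); // one day per entry, as in the original loop
    closePrice.append(chartData[i]["close"].asDouble());
}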

OLE DB Bulk Copy Operation Always Loads True into Bit Columns

I'm using the OLE DB bulk copy operation against a SQL Server database but I'm having trouble while loading data into bit columns - they are always populated with true!
I created a simple reproduction program from Microsoft's sample code with the snippet that I adjusted below. My program includes a little SQL script to create the destination table. I had to download and install the x64 version of the SQL Server OLE DB driver to build this.
// Set up custom bindings.
oneBinding.dwPart = DBPART_VALUE | DBPART_LENGTH | DBPART_STATUS;
oneBinding.iOrdinal = 1;
oneBinding.pTypeInfo = NULL;
oneBinding.obValue = ulOffset + offsetof(COLUMNDATA, bData);
oneBinding.obLength = ulOffset + offsetof(COLUMNDATA, dwLength);
oneBinding.obStatus = ulOffset + offsetof(COLUMNDATA, dwStatus);
oneBinding.cbMaxLen = 1; // Size of varchar column.
oneBinding.pTypeInfo = NULL;
oneBinding.pObject = NULL;
oneBinding.pBindExt = NULL;
oneBinding.dwFlags = 0;
oneBinding.eParamIO = DBPARAMIO_NOTPARAM;
oneBinding.dwMemOwner = DBMEMOWNER_CLIENTOWNED;
oneBinding.bPrecision = 0;
oneBinding.bScale = 0;
oneBinding.wType = DBTYPE_BOOL;
ulOffset = oneBinding.cbMaxLen + offsetof(COLUMNDATA, bData);
ulOffset = ROUND_UP(ulOffset, COLUMN_ALIGNVAL);
if (FAILED(hr =
pIFastLoad->QueryInterface(IID_IAccessor, (void**)&pIAccessor)))
return hr;
if (FAILED(hr = pIAccessor->CreateAccessor(DBACCESSOR_ROWDATA,
1,
&oneBinding,
ulOffset,
&hAccessor,
&oneStatus)))
return hr;
// Set up memory buffer.
pData = new BYTE[40];
if (!(pData /* = new BYTE[40]*/)) {
hr = E_FAIL;
goto cleanup;
}
pcolData = (COLUMNDATA*)pData;
pcolData->dwLength = 1;
pcolData->dwStatus = 0;
for (i = 0; i < 10; i++)
{
if (i % 2 == 0)
{
pcolData->bData[0] = 0x00;
}
else
{
pcolData->bData[0] = 0xFF;
}
if (FAILED(hr = pIFastLoad->InsertRow(hAccessor, pData)))
goto cleanup;
}
It's entirely likely that I'm putting the wrong value into the buffer, or have some other constant value incorrect. I did find an article describing the safety of various data type conversions and it looks like byte to bool is safe... but how would the buffer know what kind of data I'm putting in there if it's just a byte array?
Figured this out: I had not correctly switched the demo over from loading strings to fixed-width values. For strings, the data blob holds a pointer to the value, whereas fixed-width values store the actual data inline.
So my COLUMNDATA struct now looks like this:
// How to lay out each column in memory.
struct COLUMNDATA {
DBLENGTH dwLength; // Length of data (not space allocated).
DWORD dwStatus; // Status of column.
VARIANT_BOOL bData; // Value, or if a string, a pointer to the value.
};
With the relevant length fix here:
pcolData = (COLUMNDATA*)pData;
pcolData->dwLength = sizeof(VARIANT_BOOL); // using a short.. make it two
pcolData->dwStatus = DBSTATUS_S_OK; // Indicates that the data value is to be used, not null
And the little value-setting for loop looks like this:
for (i = 0; i < 10; i++)
{
if (i % 2 == 0)
{
pcolData->bData = VARIANT_TRUE;
}
else
{
pcolData->bData = VARIANT_FALSE;
}
if (FAILED(hr = pIFastLoad->InsertRow(hAccessor, pData)))
goto cleanup;
}
I've updated the repository with the working code. I was inspired to make this change after reading the documentation for the obValue property.
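For completeness, my assumption of the matching binding change (not shown above): with a fixed-width VARIANT_BOOL the accessor binding has to describe 2 bytes rather than the 1-byte varchar length, the rest of oneBinding staying exactly as in the first snippet.
oneBinding.wType = DBTYPE_BOOL;                 // unchanged; the column is a SQL bit
oneBinding.cbMaxLen = sizeof(VARIANT_BOOL);     // 2 bytes, no longer the varchar size of 1
ulOffset = offsetof(COLUMNDATA, bData) + sizeof(VARIANT_BOOL);
ulOffset = ROUND_UP(ulOffset, COLUMN_ALIGNVAL); // keep the row aligned, as before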

Writing dataset of type H5T_ARRAY

I'm trying to write data in HDF5 using the C++ API.
I work on Windows XP 64-bit with Visual Studio 2010, using HDF5 version 1.8.9.
The target is set to x86, so I had to use the 32-bit version of HDF (to be honest, I'm very new to programming with Windows and VS and didn't configure the whole thing myself, so I'm really not sure it was the right choice).
My issue happens when trying to write a part of a dataset of type H5T_ARRAY.
The HDF5 file structure I want to achieve is a dataset of 4 dimensions (i1,i2,i3,i4), which datatype is : array of double with 2 dimensions (a1,a2).
Here is the DDL to sum it up :
HDF5 "result.h5" {
GROUP "/" {
DATASET "mydata" {
DATATYPE H5T_ARRAY { [a1][a2] H5T_IEEE_F64LE }
DATASPACE SIMPLE { ( i1,i2,i3,i4) / ( i1,i2,i3,i4 ) }
DATA { <my data> }
}
}
Due to my program structure, I write this dataset element by element, ie H5T_ARRAY by H5T_ARRAY.
I've defined a class OutputFile to manage all the HDF5 I/O. It contains these attributes :
H5::H5File *_H5fileHandle ; // HDF5 file
H5::DataSpace *_dataspaceHandle ; // Handle of the Dataspace of the datasets
int _dataspaceRank ; // Rank of the dataspace
H5::ArrayType *_datatypeHandle ; // Handle of the datatype of the datasets (= array of N dimensions)
int _datatypeRank ; // Rank of the datatype
H5::DataSet *_datasetHandle ; // Handle of the dataset
The file is open right at the beginning of the program, and all the handles (dataspace, datatype and dataset) are set then :
void OutputFile ::createFile(std::string filename,
std::vector<int> dsdims,
std::vector<int> adims,
std::vector<std::string> datasetName) {
_filename = filename ;
_H5fileHandle = new H5::H5File(_filename.c_str(), H5F_ACC_TRUNC);
// Defining the dataspace
_dataspaceRank = dsdims.size() ;
hsize_t *h5dsdims = new hsize_t[_dataspaceRank] ;
for (int iDim=0 ; iDim < _dataspaceRank ; iDim++) h5dsdims[iDim] = hsize_t(dsdims[iDim]) ;
_dataspaceHandle = new H5::DataSpace(_dataspaceRank, h5dsdims, NULL);
// Defining the datatype = array type
_datatypeRank = adims.size() ;
hsize_t *h5adims = new hsize_t[_datatypeRank] ;
for (int iDim=0 ; iDim < _datatypeRank ; iDim++) h5adims[iDim] = hsize_t(adims[iDim]) ;
_datatypeHandle = new H5::ArrayType(H5::PredType::IEEE_F64LE, _datatypeRank, h5adims);
// Creating the dataset
_datasetHandle = _H5fileHandle->createDataSet( _datasetName.c_str(),*_datatypeHandle, *_dataspaceHandle );
// Clean up
delete [] h5dsdims ;
delete [] h5adims ;
}
Then, I write the data each time I get an element ready (i.e a H5T_ARRAY) :
void OutputFile::writeMyData(double **Values, int *positionInDataSet) {
// set the element position
hsize_t position[1][4] ;
position[0][0] = hsize_t(positionInDataSet[0]);
position[0][1] = hsize_t(positionInDataSet[1]);
position[0][2] = hsize_t(positionInDataSet[2]);
position[0][3] = hsize_t(positionInDataSet[3]);
_dataspaceHandle->selectElements( H5S_SELECT_SET, 1, (const hsize_t *)position);
//Set the memory dataspace
hsize_t memdims[] = {1} ;
H5::DataSpace memspace(1, memdims, NULL);
// set the memory datatype
hsize_t memTypeRank = 2 ;
hsize_t *memTypedims = new hsize_t[memTypeRank] ;
for (int iDim=0 ; iDim < memTypeRank ; iDim++) memTypedims[iDim] = hsize_t(dataDims[iDim]) ;
H5::ArrayType memtypeHandle(H5::PredType::IEEE_F64LE, memTypeRank, memTypedims);
_datasetHandle->write(Values, memtypeHandle, memspace, *_dataspaceHandle);
_H5fileHandle->flush(H5F_SCOPE_GLOBAL) ;
}
The Values argument is allocated in the calling function, with size [a1][a2].
Unfortunately, it doesn't work properly. I get invalid data in my HDF5 file, and all elements are equal (meaning that every H5T_ARRAY contains the same values).
Example:
(0,0,0,0): [ 5.08271e-275, 5.08517e-275, -7.84591e+298, -2.53017e-098, 0, 2.18992e-303,
5.08094e-275, 0, 2.122e-314, -7.84591e+298, 5.08301e-275, 5.08652e-275,
-7.84591e+298, -2.53017e-098, 0, 2.18994e-303, 5.08116e-275, 0,
2.122e-314, -7.84591e+298, 5.08332e-275, 5.08683e-275, -7.84591e+298, -2.53017e-098,
0, 2.18995e-303, 5.08138e-275, 0, 2.122e-314, -7.84591e+298 ],
... and so on for every element.
For now, I have :
checked that the content of the "Value" array in writeMyData() is correct and contains valid data
checked that, if I only write one element, then this element, and only this one, contains invalid data in the HDF5 files (the other ones contain only zeroes)
used those additional type combinations, without success :
memType = NATIVE_DOUBLE, fileType = IEEE_64LE
memType = NATIVE_DOUBLE, fileType = NATIVE_DOUBLE
memType = IEEE_32LE, fileType = IEEE_32_LE
checked that double-value attributes are written correctly, using the type IEEE_F64LE
tried to close the file at the end of writeMyData(), and open it at the beginning, to force writing data on the disk. The results are the same.
passed &Values instead of Values in the call to DataSet::write() (the results are the same).
I'm a bit at my wits' end. I've found examples for partial I/O of a dataset and others for array datatypes, but nothing for partial writing of array-type datasets.
I guess it's a memory issue, my feeling is that I do something wrong when passing the "Values" array to DataSet::write(), but I can't pinpoint the problem.
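For reference, this is the kind of change I have been wondering about (untested sketch, needs <vector> and <algorithm>; a1 and a2 stand for the dimensions of Values): copying the rows of Values into one contiguous buffer before the call, since DataSet::write() presumably expects the a1*a2 doubles of the element to be contiguous in memory rather than reachable through an array of row pointers.
std::vector<double> contiguous(a1 * a2);                // one H5T_ARRAY element, flattened
for (int i = 0; i < a1; ++i)
    std::copy(Values[i], Values[i] + a2, contiguous.data() + i * a2);
_datasetHandle->write(contiguous.data(), memtypeHandle, memspace, *_dataspaceHandle);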
Thanks in advance for any pointers you have.

How to train in Matlab a model, save it to disk, and load in C++ program?

I am using libsvm version 3.16. I have done some training in Matlab, and created a model. Now I would like to save this model to disk and load this model in my C++ program. So far I have found the following alternatives:
This answer explains how to save a model from C++, which is based on this website. Not exactly what I need, but could be adapted. (This requires development time).
I could find the best training parameters (kernel,C) in Matlab and re-train everything in C++. (Will require doing the training in C++ each time I change a parameter. It's not scalable).
Thus, neither of these options is satisfactory.
Does anyone have an idea?
My solution was to retrain in C++ because I couldn't find a nice way to directly save the model. Here's my code. You'll need to adapt it and clean it up a bit. The biggest change you'll have to make is not hard-coding the svm_parameter values like I did. You'll also have to replace FilePath with std::string. I'm copying, pasting and making small edits here in SO, so the formatting won't be perfect:
Used like this:
auto targetsPath = FilePath("targets.txt");
auto observationsPath = FilePath("observations.txt");
auto targetsMat = MatlabMatrixFileReader::Read(targetsPath, ',');
auto observationsMat = MatlabMatrixFileReader::Read(observationsPath, ',');
auto v = MiscVector::ConvertVecOfVecToVec(targetsMat);
auto model = SupportVectorRegressionModel{ observationsMat, v };
std::vector<double> observation{ { // 32 feature observation
0.883575729725847,0.919446119013878,0.95359403450317,
0.968233630936732,0.91891307107125,0.887897763183844,
0.937588566544751,0.920582702918882,0.888864454119387,
0.890066735260163,0.87911085669864,0.903745573664995,
0.861069296586979,0.838606194934074,0.856376230548304,
0.863011311537075,0.807688936997926,0.740434984165146,
0.738498042748759,0.736410940165691,0.697228384912424,
0.608527698289016,0.632994967880269,0.66935784966765,
0.647761430696238,0.745961037635717,0.560761134660957,
0.545498063585615,0.590854855113663,0.486827902942118,
0.187128866890822,-0.0746523069562551
} };
double prediction = model.Predict(observation);
miscvector.h
static vector<double> ConvertVecOfVecToVec(const vector<vector<double>> &mat)
{
vector<double> targetsVec;
targetsVec.reserve(mat.size());
for (size_t i = 0; i < mat.size(); i++)
{
targetsVec.push_back(mat[i][0]);
}
return targetsVec;
}
libsvmtargetobjectconvertor.h
#pragma once
#include "machinelearning.h"
struct svm_node;
class LibSvmTargetObservationConvertor
{
public:
svm_node ** ConvertObservations(const vector<MlObservation> &observations, size_t numFeatures) const
{
svm_node **svmObservations = (svm_node **)malloc(sizeof(svm_node *) * observations.size());
for (size_t rowI = 0; rowI < observations.size(); rowI++)
{
svm_node *row = (svm_node *)malloc(sizeof(svm_node) * (numFeatures + 1)); // +1 for the terminator node
for (size_t colI = 0; colI < numFeatures; colI++)
{
row[colI].index = colI;
row[colI].value = observations[rowI][colI];
}
row[numFeatures].index = -1; // apparently needed
svmObservations[rowI] = row;
}
return svmObservations;
}
svm_node* ConvertMatToSvmNode(const MlObservation &observation) const
{
size_t numFeatures = observation.size();
svm_node *obsNode = (svm_node *)malloc(sizeof(svm_node) * (numFeatures + 1)); // +1 for the terminator node
for (size_t rowI = 0; rowI < numFeatures; rowI++)
{
obsNode[rowI].index = rowI;
obsNode[rowI].value = observation[rowI];
}
obsNode[numFeatures].index = -1; // apparently needed
return obsNode;
}
};
machinelearning.h
#pragma once
#include <vector>
using std::vector;
using MlObservation = vector<double>;
using MlTarget = double;
//machinelearningmodel.h
#pragma once
#include <vector>
#include "machinelearning.h"
class MachineLearningModel
{
public:
virtual ~MachineLearningModel() {}
virtual double Predict(const MlObservation &observation) const = 0;
};
matlabmatrixfilereader.h
#pragma once
#include <cstdlib>   // atof
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
using std::string;
using std::vector;
class FilePath;
// Matrix created with command:
// dlmwrite('my_matrix.txt', somematrix, 'delimiter', ',', 'precision', 15);
// In these files, each row is a matrix row. Commas separate elements on a row.
// There is no space at the end of a row. There is a blank line at the bottom of the file.
// File format:
// 0.4,0.7,0.8
// 0.9,0.3,0.5
// etc.
class MatlabMatrixFileReader
{
public:
static vector<vector<double>> Read(const FilePath &asciiFilePath, char delimiter)
{
vector<vector<double>> values;
vector<double> valueline;
std::ifstream fin(asciiFilePath.Path());
string item, line;
while (getline(fin, line))
{
std::istringstream in(line);
while (getline(in, item, delimiter))
{
valueline.push_back(atof(item.c_str()));
}
values.push_back(valueline);
valueline.clear();
}
fin.close();
return values;
}
};
supportvectorregressionmodel.h
#pragma once
#include <vector>
using std::vector;
#include "machinelearningmodel.h"
#include "svm.h" // libsvm
class FilePath;
class SupportVectorRegressionModel : public MachineLearningModel
{
public:
~SupportVectorRegressionModel()
{
svm_free_model_content(model_);
svm_destroy_param(&param_);
svm_free_and_destroy_model(&model_);
}
SupportVectorRegressionModel(const vector<MlObservation>& observations, const vector<MlTarget>& targets)
{
// assumes all observations have same number of features
size_t numFeatures = observations[0].size();
//setup targets
//auto v = ConvertVecOfVecToVec(targetsMat);
double *targetsPtr = const_cast<double *>(&targets[0]); // why aren't the targets const?
LibSvmTargetObservationConvertor conv;
svm_node **observationsPtr = conv.ConvertObservations(observations, numFeatures);
// setup observations
//svm_node **observations = BuildObservations(observationsMat, numFeatures);
// setup problem
svm_problem problem;
problem.l = targets.size();
problem.y = targetsPtr;
problem.x = observationsPtr;
// specific to our training sets
// TODO: This is hard coded.
// Bust out these values for use in constructor
param_.C = 0.4; // cost
param_.svm_type = 4; // SVR
param_.kernel_type = 2; // radial
param_.nu = 0.6; // SVR nu
// These values are the defaults used in the Matlab version
// as found in svm_model_matlab.c
param_.gamma = 1.0 / (double)numFeatures;
param_.coef0 = 0;
param_.cache_size = 100; // in MB
param_.shrinking = 1;
param_.probability = 0;
param_.degree = 3;
param_.eps = 1e-3;
param_.p = 0.1;
param_.shrinking = 1;
param_.probability = 0;
param_.nr_weight = 0;
param_.weight_label = NULL;
param_.weight = NULL;
// suppress command line output
svm_set_print_string_function([](auto c) {});
model_ = svm_train(&problem, &param_);
}
double Predict(const vector<double>& observation) const override
{
LibSvmTargetObservationConvertor conv;
svm_node *obsNode = conv.ConvertMatToSvmNode(observation);
double prediction = svm_predict(model_, obsNode);
return prediction;
}
SupportVectorRegressionModel(const FilePath & modelFile)
{
model_ = svm_load_model(modelFile.Path().c_str());
}
private:
svm_model *model_;
svm_parameter param_;
};
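One small thing I would add to the class above (a sketch; this is not in the code I posted): once svm_train() has produced model_, libsvm's own svm_save_model() can persist it to disk, so later runs can use the FilePath constructor above instead of retraining.
// Hypothetical persistence step right after training; the path name is made up here.
// svm_save_model() returns 0 on success.
svm_save_model("svr_model.libsvm", model_);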
Option 1 is actually pretty reasonable. If you save the model in libsvm's C format through matlab, then it is straightforward to work with the model in C/C++ using functions provided by libsvm. Trying to work with matlab-formatted data in C++ will probably be much more difficult.
The main function in "svm-predict.c" (located in the root directory of the libsvm package) probably has most of what you need:
if((model=svm_load_model(argv[i+1]))==0)
{
fprintf(stderr,"can't open model file %s\n",argv[i+1]);
exit(1);
}
To predict a label for example x using the model, you can run
int predict_label = svm_predict(model,x);
The trickiest part of this will be to transfer your data into the libsvm format (unless your data is in the libsvm text file format, in which case you can just use the predict function in "svm-predict.c").
A libsvm vector, x, is an array of struct svm_node that represents a sparse array of data. Each svm_node has an index and a value, and the vector must be terminated by an index that is set to -1. For instance, to encode the vector [0,1,0,5], you could do the following:
struct svm_node *x = (struct svm_node *) malloc(3*sizeof(struct svm_node));
x[0].index=2; //NOTE: libsvm indices start at 1
x[0].value=1.0;
x[1].index=4;
x[1].value=5.0;
x[2].index=-1;
For SVM types other than the classifier (C_SVC), look at the predict function in "svm-predict.c".

BulkLoading the R* tree with spatialindex library

After successfully building the R* tree with the spatialindex library by inserting records one by one 2.5 million times, I tried to create the R* tree with bulk loading. I implemented the DBStream class to iteratively feed the data to the Bulkloader. Essentially, it invokes the following method and prepares a Data object (the d variable in the code) for the Bulkloader:
void DBStream::retrieveTuple() {
if (query.next()) {
hasNextBool = true;
int gid = query.value(0).toInt();
// allocate memory for bounding box
// this streets[gid].first returns bbox[4]
double* bbox = streets[gid].first;
// filling the bounding box values
bbox[0] = query.value(1).toDouble();
bbox[1] = query.value(2).toDouble();
bbox[2] = query.value(3).toDouble();
bbox[3] = query.value(4).toDouble();
rowId++;
r = new SpatialIndex::Region();
d = new SpatialIndex::RTree::Data((size_t) 0, (byte*) 0, *r, gid);
r->m_dimension = 2;
d->m_pData = 0;
d->m_dataLength = 0;
r->m_pLow = bbox;
r->m_pHigh = bbox + 2;
d->m_id = gid;
} else {
d = 0;
hasNextBool = false;
cout << "stream is finished d:" << d << endl;
}
}
I initialize the DBStream object and invoke the bulk loading in the following way:
// creating a main memory RTree
memStorage = StorageManager::createNewMemoryStorageManager();
size_t capacity = 1000;
bool bWriteThrough = false;
fileInMem = StorageManager
::createNewRandomEvictionsBuffer(*memStorage, capacity, bWriteThrough);
double fillFactor = 0.7;
size_t indexCapacity = 100;
size_t leafCapacity = 100;
size_t dimension = 2;
RTree::RTreeVariant rv = RTree::RV_RSTAR;
DBStream dstream;
tree = RTree::createAndBulkLoadNewRTree(SpatialIndex::RTree::BLM_STR, dstream,
*fileInMem,
fillFactor, indexCapacity,
leafCapacity, dimension, rv, indexIdentifier);
cout << "BulkLoading done" << endl;
Bulk loading calls my next() and hasNext() functions, retrieves my data, sorts it, and then seg faults in the building phase. Any clues why? The error is:
RTree::BulkLoader: Building level 0
terminate called after throwing an instance of 'Tools::IllegalArgumentException'
The problem supposedly lies in the memory allocation and a few bugs in the code (somewhat related to memory allocation too). Firstly, one needs to properly assign the properties of the Data variable:
memcpy(data->m_region.m_pLow, bbox, 2 * sizeof(double));
memcpy(data->m_region.m_pHigh, bbox + 2, 2 * sizeof(double));
data->m_id = gid;
Second (and most importantly) getNext must return a new object with all the values:
RTree::Data *p = new RTree::Data(returnData->m_dataLength, returnData->m_pData,
returnData->m_region, returnData->m_id);
return p;
Deallocation of that memory is done by the RTree, so no special care needs to be taken here.
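Putting the two fixes together, a rough sketch of how the corrected getNext() might look (returnData, hasNextBool and retrieveTuple() are the names used in the snippets above; the signature follows SpatialIndex::IDataStream):
SpatialIndex::IData* DBStream::getNext()
{
    if (!hasNextBool) return 0;                         // stream exhausted
    // hand the bulk loader its own copy; it deletes the object when done
    SpatialIndex::RTree::Data* p =
        new SpatialIndex::RTree::Data(returnData->m_dataLength, returnData->m_pData,
                                      returnData->m_region, returnData->m_id);
    retrieveTuple();                                    // prepare the next tuple
    return p;
}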