I need to sum up thousands of histograms from one directory - c++

I have a directory Processed_Data with thousands of hists*****_blinded.root files. Each hists*****_blinded.root contains around 15 graphs and histograms in it. My goal is just to overlap 1 specific histogram sc***** from each file to get the final histogram finalhists_blinded.root which will represent all of those overlapped together.
I have tried the following macro:
void final()
{
TCanvas *time = new TCanvas("c1","overlap" ,600,1000);
time ->Divide(1,1);
time ->cd(1);
TH1F *h1 = new TH1F("h1","time" ,4096,0,4096);
ifstream in;
Float_t t;
Int_t nlines= 0;
in.open("Processed_Data", ios::in);
while (1) {
in >> t;
if (!in.good()) break;
h1->Fill(t);
nlines++;
}
in.close();
But I get the blank canvas at the end. The idea is to run each hists file through the code and add each one by one.
As a result, I want to see all those sc***** histograms overlapping so that the spikes in each of them will create a pattern in a finalhists_blinded.root file.

Shouldn't be that complicated, try this:
void overlap()
{
TCanvas *time = new TCanvas("c1", "overlap", 0, 0, 800, 600);
const char* histoname = "sc";
const int NFiles = 100000;
for (int fileNumber = 0; fileNumber < NFiles; fileNumber++)
{
TFile* myFile = TFile::Open(Form("Processed_Data/hists%i_blinded.root", fileNumber));
if (!myFile)
{
printf("Nope, no such file!\n");
return;
}
TH1* h1 = (TH1*)myFile->Get(histoname);
if (!h1)
{
printf("Nope, no such histogram!\n");
return;
}
h1->SetDirectory(gROOT);
h1->Draw("same");
myFile->Close();
}
}
It loops over all Processed_Data/histsXXXXX_blinded.root files (assuming their names are Processed_Data/hists0_blinded.root, Processed_Data/hists1_blinded.root, Processed_Data/hists2_blinded.root, ..., Processed_Data/hists99998_blinded.root, Processed_Data/hists99999_blinded.root), opens each of them, grabs the 1D histogram named sc, draws it on the canvas, closes the file and moves to the next file. Note that it stops at the first file or histogram it cannot find.

How to get all tags from a tiff file with libtiff?

I have a TIFF file and would like to get a list of all tags used in that file. If I understand the TIFFGetField() function correctly, it only gets the values of the tags you specify. But how do I know which tags the file uses? I would like to get all tags used in the file. Is there an easy way to get them with libtiff?
It seems to be a very manual process from my experience. I used the TIFF tag reference here https://www.awaresystems.be/imaging/tiff/tifftags.html to create a custom structure
typedef struct
{
TIFF_TAGS_BASELINE Baseline;
TIFF_TAGS_EXTENSION Extension;
TIFF_TAGS_PRIVATE Private;
} TIFF_TAGS;
With each substructure custom defined. For example,
typedef struct
{
TIFF_UINT32_T NewSubfileType; // TIFFTAG_SUBFILETYPE
TIFF_UINT16_T SubfileType; // TIFFTAG_OSUBFILETYPE
TIFF_UINT32_T ImageWidth; // TIFFTAG_IMAGEWIDTH
TIFF_UINT32_T ImageLength; // TIFFTAG_IMAGELENGTH
TIFF_UINT16_T BitsPerSample; // TIFFTAG_BITSPERSAMPLE
...
char *Copyright; // TIFFTAG_COPYRIGHT
} TIFF_TAGS_BASELINE;
Then I have custom readers:
TIFF_TAGS *read_tiff_tags(char *filename)
{
TIFF_TAGS *tags = NULL;
TIFF *tif = TIFFOpen(filename, "r");
if (tif)
{
tags = calloc(1, sizeof(TIFF_TAGS));
read_tiff_tags_baseline(tif, tags);
read_tiff_tags_extension(tif, tags);
read_tiff_tags_private(tif, tags);
TIFFClose(tif);
}
return tags;
}
With this approach you have to read each field manually. For fields that return a pointer, such as strings and arrays, check the return status before using the result. For simple fields, it's something like
// The number of columns in the image, i.e., the number of pixels per row.
TIFFGetField(tif, TIFFTAG_IMAGEWIDTH, &tags->Baseline.ImageWidth);
but for array fields you'll need something like this
// The scanner model name or number.
status = TIFFGetField(tif, TIFFTAG_MODEL, &infobuf);
if (status)
{
len = strlen(infobuf);
tags->Baseline.Model = malloc(sizeof(char) * (len + 1));
_mysprintf(tags->Baseline.Model, (int)(len + 1), "%s", infobuf);
tags->Baseline.Model[len] = 0;
}
else
{
tags->Baseline.Model = NULL;
}
// For each strip, the byte offset of that strip.
status = TIFFGetField(tif, TIFFTAG_STRIPOFFSETS, &arraybuf);
if (status)
{
tags->Baseline.NumberOfStrips = TIFFNumberOfStrips(tif);
tags->Baseline.StripOffsets = calloc(tags->Baseline.NumberOfStrips, sizeof(TIFF_UINT32_T));
for (strip = 0; strip < tags->Baseline.NumberOfStrips; strip++)
{
tags->Baseline.StripOffsets[strip] = arraybuf[strip];
}
}
else
{
tags->Baseline.StripOffsets = NULL;
}
My suggestion is to only read the fields you want/need and ignore everything else. Hope that helps.
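If the goal is only to find out which tag IDs a file contains, one alternative is to read the first image file directory (IFD) directly instead of going through libtiff. The following is a minimal sketch under stated assumptions: it handles classic TIFF only (magic number 42, not BigTIFF), looks at the first IFD only, and expects the whole file to be in memory already. The names readUInt and listTiffTags are hypothetical helpers, not libtiff functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads an unsigned integer of `size` bytes at `pos`, honoring the file's byte order.
static std::uint32_t readUInt(const std::vector<std::uint8_t>& data, std::size_t pos,
                              std::size_t size, bool littleEndian)
{
    if (pos + size > data.size()) throw std::runtime_error("truncated TIFF data");
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        // Walk from the most significant byte to the least significant one.
        const std::size_t index = littleEndian ? pos + size - 1 - i : pos + i;
        value = (value << 8) | data[index];
    }
    return value;
}

// Hypothetical helper: returns the tag IDs stored in the first IFD of a
// classic (non-BigTIFF) TIFF file held in memory.
std::vector<std::uint16_t> listTiffTags(const std::vector<std::uint8_t>& data)
{
    if (data.size() < 8) throw std::runtime_error("not a TIFF file");
    bool littleEndian;
    if (data[0] == 'I' && data[1] == 'I') littleEndian = true;        // Intel byte order
    else if (data[0] == 'M' && data[1] == 'M') littleEndian = false;  // Motorola byte order
    else throw std::runtime_error("bad byte-order mark");
    if (readUInt(data, 2, 2, littleEndian) != 42) throw std::runtime_error("bad TIFF magic");

    const std::size_t ifd = readUInt(data, 4, 4, littleEndian);       // offset of first IFD
    const std::size_t count = readUInt(data, ifd, 2, littleEndian);   // number of entries
    std::vector<std::uint16_t> tags;
    for (std::size_t i = 0; i < count; ++i) {
        // Each entry is 12 bytes: tag (2), type (2), count (4), value or offset (4).
        tags.push_back(static_cast<std::uint16_t>(
            readUInt(data, ifd + 2 + 12 * i, 2, littleEndian)));
    }
    return tags;
}
```

The returned IDs can then be compared with the TIFFTAG_* constants to decide which fields are worth reading with TIFFGetField. Files with several directories would need the same walk repeated from the next-IFD offset stored after the last entry.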

Overlap hundreds of histograms macro question

I have a directory trial which contains hundreds of histogram files and a macro. Each file is named like hists09876_blinded.root or hists12365_blinded.root. The numbering is not continuous, however: some files are missing, so the sequence can go hists10467_blinded.root, hists10468_blinded.root, hists10470_blinded.root. The ultimate goal is to get one histogram on a canvas which represents all of those combined together. The tricky thing is that each hists*****_blinded.root has around 15 1D histos in it, and I need to pull out just one from each, called sc*****.
I have 2 ideas, but I guess I should combine them together to get the final result.
First idea was to open histo by histo, but since there are some missed histos in the order, that does not work well.
void overlap()
{
TCanvas *time = new TCanvas("c1", "overlap", 0, 0, 800, 600);
const char* histoname = "sc";
const int NFiles = 256;
for (int fileNumber = 09675; fileNumber < NFiles; fileNumber++)
{
TFile* myFile = TFile::Open(Form("hists%i_blinded.root", fileNumber));
if (!myFile)
{
printf("Nope, no such file!\n");
return;
}
TH1* h1 = (TH1*)myFile->Get(histoname);
if (!h1)
{
printf("Nope, no such histogram!\n");
return;
}
h1->SetDirectory(gROOT);
h1->Draw("same");
myFile->Close();
}
}
After having read multiple posts on pretty much the same question (1, 2, and this one), I have figured out what was wrong with my answer here: I did not know that the number in the file name is padded with a leading zero if it is < 10000. Also, I failed to understand that the asterisks in the histogram name, which you refer to as sc*****, actually hide the same number as in the file name! I thought this was something completely different. So in that case I suggest you construct the file name and the histogram name you are after in the same loop:
void overlap_v2()
{
    TCanvas *time = new TCanvas("c1", "overlap", 0, 0, 800, 600);
    const int firstNumber = 9675;
    const int NFiles = 100000;
    bool firstDrawn = false; // the first histogram is drawn without "same" so the pad gets its axes
    for (int fileNumber = firstNumber; fileNumber < firstNumber+NFiles; fileNumber++)
    {
        const char* filename = Form("trial/hists%05i_blinded.root", fileNumber);
        TFile* myFile = TFile::Open(filename);
        if (!myFile)
        {
            printf("Can not find a file named \"%s\"!\n", filename);
            continue;
        }
        const char* histoname = Form("sc%05i", fileNumber);
        TH1* h1 = (TH1*)myFile->Get(histoname);
        if (!h1)
        {
            printf("Can not find a histogram named \"%s\" in the file named \"%s\"!\n", histoname, filename);
            myFile->Close(); // do not leave the file open when skipping it
            delete myFile;
            continue;
        }
        h1->SetDirectory(gROOT); // detach from the file so the histogram survives Close()
        h1->Draw(firstDrawn ? "same" : "");
        firstDrawn = true;
        myFile->Close();
        delete myFile;
    }
}
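The effect of the zero padding can be checked outside ROOT, since Form takes a printf-style format string. Below is a small sketch using plain snprintf; makeFileName and makeHistName are hypothetical helper names, and the histogram is assumed to carry the same five-digit number as its file. Note that the number itself should be written as 9675 in the source: a literal such as 09675 is read as octal and does not compile because of the digit 9.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds "trial/hists09675_blinded.root" from 9675.
// "%05d" pads with leading zeros to five digits, which plain "%d" (or "%i") does not.
std::string makeFileName(int number)
{
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "trial/hists%05d_blinded.root", number);
    return buffer;
}

// Hypothetical helper: the histogram is assumed to carry the same
// zero-padded number as its file, for example "sc09675".
std::string makeHistName(int number)
{
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "sc%05d", number);
    return buffer;
}
```

For example, makeFileName(9675) gives trial/hists09675_blinded.root, whereas the unpadded format would give trial/hists9675_blinded.root and miss the file.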
Since it is expected that some files are "missing", I suggest not to try to guess the names of the files that actually exist. Instead, use a function that lists all files in a given directory and from that list filter out those files that match the pattern of files you want to read. See for example these links for how to read the content of a directory in C++:
How can I get the list of files in a directory using C or C++?
http://www.martinbroadhurst.com/list-the-files-in-a-directory-in-c.html

How does AudioKit's AKNodeOutputPlot pull its data?

I'm very new to the AudioKit framework and I have been trying to understand a bit more about the DSP side to it. Whilst rummaging around in the source code I realised that AKNodeOutputPlot does not pull data from the node the same way others would.
In the DSP code for the AKAmplitudeTracker an RMS value is calculated for each channel and the result is briefly written to the output buffer but at the end of the for loop the node is essentially bypassed by setting the output to the original input:
void process(AUAudioFrameCount frameCount, AUAudioFrameCount bufferOffset) override {
for (int frameIndex = 0; frameIndex < frameCount; ++frameIndex) {
int frameOffset = int(frameIndex + bufferOffset);
for (int channel = 0; channel < channels; ++channel) {
float *in = (float *)inBufferListPtr->mBuffers[channel].mData + frameOffset;
float temp = *in;
float *out = (float *)outBufferListPtr->mBuffers[channel].mData + frameOffset;
if (channel == 0) {
if (started) {
sp_rms_compute(sp, leftRMS, in, out);
leftAmplitude = *out;
} else {
leftAmplitude = 0;
}
} else {
if (started) {
sp_rms_compute(sp, rightRMS, in, out);
rightAmplitude = *out;
} else {
rightAmplitude = 0;
}
}
*out = temp;
}
}
}
This makes sense since outputting the RMS value to the device speakers would sound terrible but when this node is used as the input to the AKNodeOutputPlot object RMS values are plotted.
I assumed that the leftAmplitude and rightAmplitude variables were being referenced somewhere, but even if they are zeroed out the plot works just fine. I'm interested in doing some work on the signal without affecting the output, so I'd love it if someone could help me figure out how the AKPlot is grabbing this data.
Cheers
AKNodeOutputPlot works with something called a "tap":
https://github.com/AudioKit/AudioKit/blob/master/AudioKit/Common/User%20Interface/AKNodeOutputPlot.swift
There are also a few other taps that are not necessarily just for user interface purposes:
https://github.com/AudioKit/AudioKit/tree/master/AudioKit/Common/Taps
Taps allow you to inspect the data being pulled through another node without being inserted into the signal chain itself.

SoundTouch library messes up the ending when pitch-shifting

I'm using the SoundTouch library to pitch-shift some audio files. Everything works well, except the last few hundred milliseconds of the new audio file are not like the original file. Here is the original file:
And here's what I get after pitch-shifting:
As you can see the ending is not right. It's like there was nothing there in the original file, when there certainly is.
Here's the code I'm using:
int generateFile(WavInFile *file, SoundTouch *st, string fileName, int semitones)
{
const bool speech = true;
SAMPLETYPE samples[BUFF_SIZE];
WavOutFile *out = new WavOutFile(fileName.c_str(), (int)file->getSampleRate(), (int)file->getNumBits(), (int)file->getNumChannels());
int nChannels = (int)file->getNumChannels();
assert(nChannels > 0);
int num, nSamples;
int buffSizeSamples = BUFF_SIZE / nChannels;
st->setSampleRate((int)file->getSampleRate());
st->setChannels(nChannels);
st->setPitchSemiTones(semitones);
if (!speech)
{
st->setSetting(SETTING_USE_QUICKSEEK, 0);
st->setSetting(SETTING_USE_AA_FILTER, 0);
}
else
{
st->setSetting(SETTING_USE_QUICKSEEK, 0);
st->setSetting(SETTING_SEQUENCE_MS, 40);
st->setSetting(SETTING_SEEKWINDOW_MS, 15);
st->setSetting(SETTING_OVERLAP_MS, 8);
}
while (file->eof() == 0)
{
num = file->read(samples, BUFF_SIZE);
nSamples = num / (int)file->getNumChannels();
st->putSamples(samples, nSamples);
do
{
nSamples = st->receiveSamples(samples, buffSizeSamples);
out->write(samples, nSamples * nChannels);
} while (nSamples != 0);
}
st->flush();
do
{
nSamples = st->receiveSamples(samples, buffSizeSamples);
out->write(samples, nSamples * nChannels);
} while (nSamples != 0);
delete out;
return 0;
}
And yes, I delete WavInFile *file later in my code. So my question is: why is SoundTouch doing this, and how can I fix it?
Also I cannot simply cut the wrong part of the new audio file because I'm generating hundreds of files this way so cutting every single one of them would be...

How to train in Matlab a model, save it to disk, and load in C++ program?

I am using libsvm version 3.16. I have done some training in Matlab, and created a model. Now I would like to save this model to disk and load this model in my C++ program. So far I have found the following alternatives:
1. This answer explains how to save a model from C++, which is based on this website. Not exactly what I need, but it could be adapted. (This requires development time.)
2. I could find the best training parameters (kernel, C) in Matlab and re-train everything in C++. (This requires redoing the training in C++ each time I change a parameter, so it's not scalable.)
Neither of these options is satisfactory.
Does anyone have an idea?
My solution was to retrain in C++ because I couldn't find a nice way to directly save the model. Here's my code. You'll need to adapt it and clean it up a bit. The biggest change you'll have to make is to stop hard-coding the svm_parameter values like I did. You'll also have to replace FilePath with std::string. I'm copying, pasting and making small edits here in SO, so the formatting won't be perfect:
Used like this:
auto targetsPath = FilePath("targets.txt");
auto observationsPath = FilePath("observations.txt");
auto targetsMat = MatlabMatrixFileReader::Read(targetsPath, ',');
auto observationsMat = MatlabMatrixFileReader::Read(observationsPath, ',');
auto v = MiscVector::ConvertVecOfVecToVec(targetsMat);
auto model = SupportVectorRegressionModel{ observationsMat, v };
std::vector<double> observation{ { // 32 feature observation
0.883575729725847,0.919446119013878,0.95359403450317,
0.968233630936732,0.91891307107125,0.887897763183844,
0.937588566544751,0.920582702918882,0.888864454119387,
0.890066735260163,0.87911085669864,0.903745573664995,
0.861069296586979,0.838606194934074,0.856376230548304,
0.863011311537075,0.807688936997926,0.740434984165146,
0.738498042748759,0.736410940165691,0.697228384912424,
0.608527698289016,0.632994967880269,0.66935784966765,
0.647761430696238,0.745961037635717,0.560761134660957,
0.545498063585615,0.590854855113663,0.486827902942118,
0.187128866890822,-0.0746523069562551
} };
double prediction = model.Predict(observation);
miscvector.h
static vector<double> ConvertVecOfVecToVec(const vector<vector<double>> &mat)
{
vector<double> targetsVec;
targetsVec.reserve(mat.size());
for (size_t i = 0; i < mat.size(); i++)
{
targetsVec.push_back(mat[i][0]);
}
return targetsVec;
}
libsvmtargetobjectconvertor.h
#pragma once
#include "machinelearning.h"
struct svm_node;
class LibSvmTargetObservationConvertor
{
public:
svm_node **ConvertObservations(const vector<MlObservation> &observations, size_t numFeatures) const
{
svm_node **svmObservations = (svm_node **)malloc(sizeof(svm_node *) * observations.size());
for (size_t rowI = 0; rowI < observations.size(); rowI++)
{
// one extra node per row for the terminator
svm_node *row = (svm_node *)malloc(sizeof(svm_node) * (numFeatures + 1));
for (size_t colI = 0; colI < numFeatures; colI++)
{
row[colI].index = (int)colI;
row[colI].value = observations[rowI][colI];
}
row[numFeatures].index = -1; // libsvm requires index -1 to terminate each vector
svmObservations[rowI] = row;
}
return svmObservations;
}
svm_node *ConvertMatToSvmNode(const MlObservation &observation) const
{
size_t numFeatures = observation.size();
// one extra node for the terminator
svm_node *obsNode = (svm_node *)malloc(sizeof(svm_node) * (numFeatures + 1));
for (size_t rowI = 0; rowI < numFeatures; rowI++)
{
obsNode[rowI].index = (int)rowI;
obsNode[rowI].value = observation[rowI];
}
obsNode[numFeatures].index = -1; // libsvm requires index -1 to terminate the vector
return obsNode;
}
};
machinelearning.h
#pragma once
#include <vector>
using std::vector;
using MlObservation = vector<double>;
using MlTarget = double;
machinelearningmodel.h
#pragma once
#include <vector>
#include "machinelearning.h"
class MachineLearningModel
{
public:
virtual ~MachineLearningModel() {}
virtual double Predict(const MlObservation &observation) const = 0;
};
matlabmatrixfilereader.h
#pragma once
#include <cstdlib> // atof
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
using std::vector;
class FilePath;
// Matrix created with command:
// dlmwrite('my_matrix.txt', somematrix, 'delimiter', ',', 'precision', 15);
// In these files, each row is a matrix row. Commas separate elements on a row.
// There is no space at the end of a row. There is a blank line at the bottom of the file.
// File format:
// 0.4,0.7,0.8
// 0.9,0.3,0.5
// etc.
class MatlabMatrixFileReader
{
public:
static vector<vector<double>> Read(const FilePath &asciiFilePath, char delimiter)
{
vector<vector<double>> values;
vector<double> valueline;
std::ifstream fin(asciiFilePath.Path());
std::string item, line;
while (getline(fin, line))
{
std::istringstream in(line);
while (getline(in, item, delimiter))
{
valueline.push_back(atof(item.c_str()));
}
values.push_back(valueline);
valueline.clear();
}
fin.close();
return values;
}
};
supportvectorregressionmodel.h
#pragma once
#include <vector>
using std::vector;
#include <cstdlib> // free
#include "machinelearningmodel.h"
#include "svm.h" // libsvm
#include "libsvmtargetobjectconvertor.h" // needs svm_node from svm.h
class FilePath;
class SupportVectorRegressionModel : public MachineLearningModel
{
public:
~SupportVectorRegressionModel()
{
svm_destroy_param(&param_);
svm_free_and_destroy_model(&model_); // also frees the model content
}
SupportVectorRegressionModel(const vector<MlObservation>& observations, const vector<MlTarget>& targets)
{
// assumes all observations have same number of features
size_t numFeatures = observations[0].size();
//setup targets
//auto v = ConvertVecOfVecToVec(targetsMat);
double *targetsPtr = const_cast<double *>(&targets[0]); // why aren't the targets const?
LibSvmTargetObservationConvertor conv;
svm_node **observationsPtr = conv.ConvertObservations(observations, numFeatures);
// setup observations
//svm_node **observations = BuildObservations(observationsMat, numFeatures);
// setup problem
svm_problem problem;
problem.l = targets.size();
problem.y = targetsPtr;
problem.x = observationsPtr;
// specific to our training sets
// TODO: This is hard coded.
// Bust out these values for use in constructor
param_.C = 0.4; // cost
param_.svm_type = NU_SVR; // nu-SVR (enum value 4)
param_.kernel_type = RBF; // radial basis function (enum value 2)
param_.nu = 0.6; // SVR nu
// These values are the defaults used in the Matlab version
// as found in svm_model_matlab.c
param_.gamma = 1.0 / (double)numFeatures;
param_.coef0 = 0;
param_.cache_size = 100; // in MB
param_.shrinking = 1;
param_.probability = 0;
param_.degree = 3;
param_.eps = 1e-3;
param_.p = 0.1;
param_.nr_weight = 0;
param_.weight_label = NULL;
param_.weight = NULL;
// suppress command line output
svm_set_print_string_function([](auto c) {});
model_ = svm_train(&problem, &param_);
}
double Predict(const vector<double>& observation) const override
{
LibSvmTargetObservationConvertor conv;
svm_node *obsNode = conv.ConvertMatToSvmNode(observation);
double prediction = svm_predict(model_, obsNode);
free(obsNode); // ConvertMatToSvmNode allocates with malloc
return prediction;
}
SupportVectorRegressionModel(const FilePath & modelFile)
{
model_ = svm_load_model(modelFile.Path().c_str());
}
private:
svm_model *model_ = nullptr;
svm_parameter param_{}; // zero-initialized so the destructor is safe for models loaded from file
};
Option 1 is actually pretty reasonable. If you save the model in libsvm's C format through matlab, then it is straightforward to work with the model in C/C++ using functions provided by libsvm. Trying to work with matlab-formatted data in C++ will probably be much more difficult.
The main function in "svm-predict.c" (located in the root directory of the libsvm package) probably has most of what you need:
if((model=svm_load_model(argv[i+1]))==0)
{
fprintf(stderr,"can't open model file %s\n",argv[i+1]);
exit(1);
}
To predict a label for example x using the model, you can run
double predict_label = svm_predict(model,x);
The trickiest part of this will be to transfer your data into the libsvm format (unless your data is in the libsvm text file format, in which case you can just use the predict function in "svm-predict.c").
A libsvm vector, x, is an array of struct svm_node that represents a sparse array of data. Each svm_node has an index and a value, and the vector must be terminated by an index that is set to -1. For instance, to encode the vector [0,1,0,5], you could do the following:
struct svm_node *x = (struct svm_node *) malloc(3*sizeof(struct svm_node));
x[0].index=2; //NOTE: libsvm indices start at 1
x[0].value=1.0;
x[1].index=4;
x[1].value=5.0;
x[2].index=-1;
For SVM types other than the classifier (C_SVC), look at the predict function in "svm-predict.c".
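The sparse encoding described above can be automated for dense input. Here is a minimal sketch, assuming dense feature vectors held in a std::vector<double>. SparseNode is a stand-in that mirrors the two fields of libsvm's svm_node so the example compiles without svm.h, and toSparse is a hypothetical helper name, not part of libsvm.

```cpp
#include <cstddef>
#include <vector>

// Stand-in mirroring the layout of libsvm's svm_node so the sketch compiles
// without svm.h; with libsvm available, include "svm.h" and use its struct.
struct SparseNode {
    int index;
    double value;
};

// Hypothetical helper: converts a dense feature vector to libsvm's sparse form.
// Zero entries are skipped, indices are 1-based, and index -1 ends the vector.
std::vector<SparseNode> toSparse(const std::vector<double>& dense)
{
    std::vector<SparseNode> nodes;
    nodes.reserve(dense.size() + 1);
    for (std::size_t i = 0; i < dense.size(); ++i) {
        if (dense[i] != 0.0) {
            nodes.push_back({static_cast<int>(i) + 1, dense[i]});
        }
    }
    nodes.push_back({-1, 0.0}); // terminator
    return nodes;
}
```

For the vector [0,1,0,5] this produces the same three nodes as the hand-written example: index 2 with value 1, index 4 with value 5, and the terminator. Using a std::vector also avoids the manual malloc and the matching free.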