Generate custom header file with CMake - c++

I want to generate a custom header file and was wondering if CMake has something that I can use without writing my own generator. It would have some "one time" items and some items I need to generate based on a loop.
For example, the desired output would look like:
//One time stuff
#define ABC 1
#define DEF 2
#define GHI 3
#define JKL 4
//Stuff generated in loop
//Iteration with params Item1, 1, 2
const int prefix_Item1_1 = 2;
const int prefix2_1_2 = 0;
//Iteration with params Item2, 3, 4
const int prefix_Item2_3 = 4;
const int prefix2_3_4 = 0;
//Iteration with params Item5, 6, 7
const int prefix_Item5_6 = 7;
const int prefix2_6_7 = 0;
For input, I would provide the following in some form:
Item1, 1, 2
Item2, 3, 4
Item5, 6, 7

CMake provides a few utilities for generating and writing to files at the configure stage. To start off, we can put the "One time stuff" in a template file, and use configure_file() to generate a header file from it:
# Generate header.hpp from your template file header.hpp.in
configure_file(${CMAKE_CURRENT_LIST_DIR}/header.hpp.in
${CMAKE_CURRENT_LIST_DIR}/header.hpp COPYONLY
)
The template file header.hpp.in can simply contain this:
//One time stuff
#define ABC 1
#define DEF 2
#define GHI 3
#define JKL 4
//Stuff generated in loop
Next, we can use CMake's file() and string() utilities to read an input file (CSV-formatted in this example), parse the contents, and write the rest of the header file. So, test.csv would contain the input, and we can do something like this:
# Read the entire CSV file.
file(READ ${CMAKE_CURRENT_LIST_DIR}/test.csv CSV_CONTENTS)
# Split the CSV by new-lines. Quoting keeps the contents as a single argument.
string(REPLACE "\n" ";" CSV_LIST "${CSV_CONTENTS}")
# Loop through each line in the CSV file.
foreach(CSV_ROW ${CSV_LIST})
    # Get a list of the elements in this CSV row.
    string(REPLACE "," ";" CSV_ROW_CONTENTS "${CSV_ROW}")
    # Store each element in its own variable.
    list(GET CSV_ROW_CONTENTS 0 ELEM0)
    list(GET CSV_ROW_CONTENTS 1 ELEM1)
    list(GET CSV_ROW_CONTENTS 2 ELEM2)
    # Append these lines to header.hpp, using the elements from the current CSV row.
    file(APPEND ${CMAKE_CURRENT_LIST_DIR}/header.hpp
"
//Iteration with params ${ELEM0}, ${ELEM1}, ${ELEM2}
const int prefix_${ELEM0}_${ELEM1} = ${ELEM2};
const int prefix2_${ELEM1}_${ELEM2} = 0;
"
    )
endforeach()
This will work for an arbitrary number of rows in the input CSV file. While this is one solution, something less verbose is certainly possible using regex. Note, this works best if your CSV doesn't contain spaces!
The completed header.hpp:
//One time stuff
#define ABC 1
#define DEF 2
#define GHI 3
#define JKL 4
//Stuff generated in loop
//Iteration with params Item1, 1, 2
const int prefix_Item1_1 = 2;
const int prefix2_1_2 = 0;
//Iteration with params Item2, 3, 4
const int prefix_Item2_3 = 4;
const int prefix2_3_4 = 0;
//Iteration with params Item5, 6, 7
const int prefix_Item5_6 = 7;
const int prefix2_6_7 = 0;
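To make the row-to-declaration mapping above easy to check outside of CMake, here is a minimal Python sketch of the same transformation. It is only an illustration, not a replacement for the CMake approach: the names ONE_TIME and render_header are hypothetical, and it assumes every row has exactly three comma-separated fields. Unlike the CMake loop, it strips the spaces around each field, so input such as "Item1, 1, 2" still produces valid identifiers.

```python
# Hypothetical stand-in for the contents of header.hpp.in.
ONE_TIME = """//One time stuff
#define ABC 1
#define DEF 2
#define GHI 3
#define JKL 4
//Stuff generated in loop
"""


def render_header(csv_text):
    """Build the header text from CSV rows of the form 'name, a, b'."""
    parts = [ONE_TIME]
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        # Assumes exactly three fields per row; strip surrounding spaces.
        name, first, second = [field.strip() for field in line.split(",")]
        parts.append(
            f"//Iteration with params {name}, {first}, {second}\n"
            f"const int prefix_{name}_{first} = {second};\n"
            f"const int prefix2_{first}_{second} = 0;\n"
        )
    return "".join(parts)


if __name__ == "__main__":
    print(render_header("Item1, 1, 2\nItem2, 3, 4\nItem5, 6, 7\n"))
```

In CMake itself, the equivalent of the strip step would be a string(STRIP ...) call on each element before it is used.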

Related

format specifier of percent sign with Poco Formatter

While using Poco::format,
I am trying to print the following line:
"3% and 5%"
int var1 = 3;
int var2 = 5;
std::string s;
s = format("%?d%% and %?d%%",var1,var2);
Instead of getting "3% and 5%",
s equals "3 ?d%".
What am I doing wrong?

Ideal architecture design for namespace parameters

I have 3 header files
// a1.h
namespace a
{
    enum abc:uint8
    {
        abc1 = 1,
        abc2 = 2
    };
}
// a2.h
namespace b
{
    enum abc:uint8
    {
        abc1 = 1,
        abc2 = 2,
        abc3 = 3
    };
}
// out.h
namespace out
{
    enum abc:uint8
    {
        abc1 = 1,
        abc2 = 2,
        abc3 = 3
    };
}
I want to apply some operation dosomething(a::abc, &out::abc) or dosomething(b::abc, &out::abc) on the enum where I simply map input from (a::abc or b::abc) to output (out::abc) using switch statements. The easiest solution would be to write two separate functions for different namespaces.
I am wondering if the dosomething function can be templatized given that
Enum values are same (number of enum values are same e.g all have abc1, abc2)
Enum values are different (namespace b::abc::abc1 = 3, b::abc::abc2 = 4)
New enum value introduced (eg. b::abc::abc3)
This would avoid code duplication and make the design extendable.
I have constraint that I cannot modify header files.
It is quite doable with a static_cast, you don't need a separate function at all:
a::abc A = a::abc2;
b::abc B = b::abc3;
a::abc A2 = static_cast<a::abc>(B);
cout << A << " " << A2 << " " << B;
Outputs 2 3 3 as expected.
https://www.ideone.com/pKltlP

How to query list of variables in Matlab struct matching a certain pattern?

Suppose I have the following struct in Matlab (read from a JSON file):
>>fs.
fs.dichte fs.hoehe fs.ts2
fs.temperatur fs.ts3 fs.viskositaet
fs.ts1 fs.ts4
Each one of the fs.ts* components contains another struct. In this particular case, the index of ts goes from 1 to 4, but in another case it could as well be 2 or 7. You get the idea, right? I want the program to be flexible enough to handle any possible input.
So my question comes down to: How to query the maximum index of ts?
In an ideal world, this would work:
who fs.ts*
But unfortunately, it just returns nothing. Any ideas?
(Btw, I'm using Octave and don't have Matlab available for testing; however, there should really be a solution to this which works in both environments.)
You can use fieldnames to get all field names of the struct, then use regexp to extract the ones that start with ts and extract the number. Then you can compare the numbers to find the largest.
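fields = fieldnames(fs);
% Extract the numeric suffix of every field named ts<N>; other fields yield NaN.
number = str2double(regexp(fields, '(?<=^ts)\d+$', 'once', 'match'));
% max skips NaN entries, so ind still indexes directly into fields.
[max_index, ind] = max(number);
max_field = fields{ind};
max_value = fs.(max_field);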
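For readers who load the same JSON elsewhere, the idea above (list the field names, keep the ones of the form ts<N>, take the largest N) can be sketched in Python as follows. This is only an illustration of the technique, not Octave code: the function name max_ts_field is hypothetical, and the dictionary below is a stand-in for the parsed JSON with placeholder values.

```python
import re


def max_ts_field(fields):
    """Return (index, name) of the ts<N> field with the largest N, or None."""
    best = None
    for name in fields:
        # Only names that are exactly 'ts' followed by digits qualify.
        match = re.fullmatch(r"ts(\d+)", name)
        if match:
            index = int(match.group(1))
            if best is None or index > best[0]:
                best = (index, name)
    return best


# Stand-in for the struct from the question; the values are placeholders.
fs = {"dichte": 1.0, "hoehe": 2.0, "temperatur": 3.0, "viskositaet": 4.0,
      "ts1": {}, "ts2": {}, "ts3": {}, "ts4": {}}
print(max_ts_field(fs))  # (4, 'ts4')
```

Comparing the suffixes as integers rather than as strings matters once the index reaches two digits, since "ts10" sorts before "ts2" as text.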
Not an answer to your exact question but sounds like instead of tsN fields, you should have a single ts field with a list.
Tip: every time you see a number in a variable or field name, think whether you shouldn't be using a vector/array/list instead.
This is true for all languages, but more so for Octave, since everything is an array. Even if you have three fields named ts1, ts2, and ts3 with scalar values, what you really have is three fields whose values are arrays of size 1x1.
In Octave you can have two things. Either the value of ts is a cell array, each element of the cell array a scalar struct; or is a struct array. Use a cell array of structs when each struct has different keys, use a struct array when all structs have the same keys.
Struct array
octave> fs.ts = struct ("foo", {1, 2, 3, 4}, "bar", {"a", "b", "c", "d"});
octave> fs.ts # all keys/fields in the ts struct array
ans =
1x4 struct array containing the fields:
foo
bar
octave> fs.ts.foo # all foo values in the ts struct array
ans = 1
ans = 2
ans = 3
ans = 4
octave> numel (fs.ts) # number of structs in the ts struct array
ans = 4
octave> fs.ts(1) # struct 1 in the ts struct array
ans =
scalar structure containing the fields:
foo = 1
bar = a
octave> fs.ts(1).foo # foo value of the struct 1
ans = 1
Cell array of scalar structs
However, I'm not sure if JSON supports anything like struct arrays, you will probably need to have a list of structs. In that case, you will end up with a cell array of struct scalars.
octave> fs.ts = {struct("foo", 1, "bar", "a"), struct("foo", 2, "bar", "b"), struct("foo", 3, "bar", "c"), struct("foo", 4, "bar", "d"),};
octave> fs.ts # note, you actually have multiple structs
ans =
{
[1,1] =
scalar structure containing the fields:
foo = 1
bar = a
[1,2] =
scalar structure containing the fields:
foo = 2
bar = b
[1,3] =
scalar structure containing the fields:
foo = 3
bar = c
[1,4] =
scalar structure containing the fields:
foo = 4
bar = d
}
octave-gui:28> fs.ts{1} # get struct 1
ans =
scalar structure containing the fields:
foo = 1
bar = a
octave-gui:29> fs.ts{1}.foo # value foo from struct 1
ans = 1

Extracting Specific Columns from Multiple Files & Writing to File Python

I have seven tab-delimited files. Each file has the same number of columns with the same names, but different data. Below is a sample of what any of the seven files looks like:
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change)
000001 000001 ZZ 1:1 01 01 NOTEST 0 0 0 0 1 1 no
I am trying to read all seven files, extract the third, fourth and tenth columns (gene, locus, log2(fold_change)), and write those columns to a new file, so that the file looks something like this:
gene name locus log2(fold_change) log2(fold_change) log2(fold_change) log2(fold_change) log2(fold_change) log2(fold_change) log2(fold_change)
ZZ 1:1 0 0 0 0
All the log2(fold_change) values are obtained from the tenth column of each of the seven files.
What I have so far is below. I need help constructing a more efficient, Pythonic way to accomplish the task above. Note that the code does not yet accomplish the task and still needs some work.
import csv
import glob
from collections import defaultdict
dicti = defaultdict(list)
filetag = []
def read_data(file, base):
    with open(file, 'r') as f:
        reader = csv.reader(f, delimiter='\t')
        for row in reader:
            if 'test_id' not in row[0]:
                dicti[row[2]].append((base, row))
name_of_fold = raw_input("Folder name to stored output files in: ")
for file in glob.glob("*.txt"):
    base=file[0:3]+"-log2(fold_change)"
    filetag.append(base)
    read_data(file, base)
with open ("output.txt", "w") as out:
    out.write("gene name" + "\t"+ "locus" + "\t" + "\t".join(sorted(filetag))+"\n")
    for k,v in dicti.items():
        out.write(k + "\t" + v[1][1][3] + "\t" + "".join([ int(z[0][0:3]) * "\t" + z[1][9] for z in v ])+"\n")
So, the code above runs, but it does not do what I am looking for, and here is why. The issue is the output code. I am writing a tab-delimited output file with the gene in the first column (k); v[1][1][3] is the locus of that particular gene; and the part of the output I am having a tough time coding is this:
"".join([ int(z[0][0:3]) * "\t" + z[1][9] for z in v ])
I am trying to collect the fold change from each of the seven files for that particular gene and locus and write each value to the correct column. To do that, I multiply "\t" by the file number, which should ensure that the value goes to the right column. The problem is that when the value from the next file comes along, writing continues from where the previous write left off, which I don't want; I want the offset to be counted from the beginning of the row again.
Here is what I mean, for instance:
gene name locus log2(fold change) from file 1 .... log2(fold change) from file7
ZZ 1:3 0
0
The first log2 value is placed according to its column number, for instance 2: I multiply "\t" by the column number (2) and append the fold_change value, and it is recorded without a problem. But the last value belongs in, say, the seventh column, and it will not land there, because its tabs are counted from where the previous write ended.
Here is my first approach:
import glob
import numpy as np
with open('output.txt', 'w') as out:
    fns = glob.glob('*.txt') # Here you can change the pattern of the file (e.g. 'file_experiment_*.txt')
    # Title row:
    titles = ['gene_name', 'locus'] + [str(file + 1) + '_log2(fold_change)' for file in range(len(fns))]
    out.write('\t'.join(titles) + '\n')
    # Data row:
    data = []
    for idx, fn in enumerate(fns):
        # dtype=str: the np.str alias has been removed from recent NumPy releases.
        file = np.genfromtxt(fn, skip_header=1, usecols=(2, 3, 9), dtype=str, autostrip=True)
        if idx == 0:
            data.extend([file[0], file[1]])
        data.append(file[2])
    out.write('\t'.join(data))
Content of the created file output.txt (Note: I created just three files for testing):
gene_name locus 1_log2(fold_change) 2_log2(fold_change) 3_log2(fold_change)
ZZ 1:1 0 0 0
I am using re instead of csv. The main problem with your code is the for loop which writes the output to the file. I am posting the complete code. Hope this solves the problem you have.
import collections
import glob
import re
dicti = collections.defaultdict(list)
filetag = []
def read_data(file, base):
    with open(file, 'r') as f:
        for row in f:
            r = re.compile(r'([^\s]*)\s*')
            row = r.findall(row.strip())[:-1]
            print row
            if 'test_id' not in row[0]:
                dicti[row[2]].append((base, row))
def main():
    name_of_fold = raw_input("Folder name to stored output files in: ")
    for file in glob.glob("*.txt"):
        base=file[0:3]+"-log2(fold_change)"
        filetag.append(base)
        read_data(file, base)
    with open ("output", "w") as out:
        data = ("genename" + "\t"+ "locus" + "\t" + "\t".join(sorted(filetag))+"\n")
        r = re.compile(r'([^\s]*)\s*')
        data = r.findall(data.strip())[:-1]
        out.write('{0[1]:<30}{0[2]:<30}{0[3]:<30}{0[4]:<30}{0[5]:<30} {0[6]:<30}{0[7]:<30}{0[8]:<30}'.format(data))
        out.write('\n')
        for key in dicti:
            print 'locus = ' + str(dicti[key][1])
            data = (key + "\t" + dicti[key][1][1][3] + "\t" + "".join([ len(z[0][0:3]) * "\t" + z[1][9] for z in dicti[key] ])+"\n")
            data = r.findall(data.strip())[:-1]
            out.write('{0[0]:<30}{0[1]:<30}{0[2]:<30}{0[3]:<30}{0[4]:<30}{0[5]:<30}{0[6]:<30}{0[7]:<30}{0[8]:<30}'.format(data))
            out.write('\n')
if __name__ == '__main__':
    main()
I also changed the name of the output file from output.txt to output, since the former would be picked up by the script, which reads every .txt file. Below is the output I got, which I assume is the format you wanted.
gene name locus 1.t-log2(fold_change) 2.t-log2(fold_change) 3.t-log2(fold_change) 4.t-log2(fold_change) 5.t-log2(fold_change) 6.t-log2(fold_change) 7.t-log2(fold_change)
ZZ 1:1 0 0 0 0 0 0 0
Remember to append \n to the end of each line to create a line break. This method is very memory efficient, as it just processes one row at a time.
import csv
import os
import glob
# Your folder location where the input files are saved.
name_of_folder = '...'
output_filename = 'output.txt'
input_files = glob.glob(os.path.join(name_of_folder, '*.txt'))
with open(os.path.join(name_of_folder, output_filename), 'w') as file_out:
    headers_read = False
    for input_file in input_files:
        if input_file == os.path.join(name_of_folder, output_filename):
            # If the output file is in the list of input files, ignore it.
            continue
        with open(input_file, 'r') as fin:
            reader = csv.reader(fin)
            if not headers_read:
                # Read column headers just once
                headers = next(reader)[0].split()
                headers = headers[2:4] + [headers[9]]
                file_out.write("\t".join(headers + ['\n'])) # Zero based indexing.
                headers_read = True
            else:
                _ = next(reader) # Ignore header row.
            for line in reader:
                if line: # Ignore blank lines.
                    line_out = line[0].split()
                    file_out.write("\t".join(line_out[2:4] + [line_out[9]] + ['\n']))
>>> !cat output.txt
gene locus log2(fold_change)
ZZ 1:1 0
ZZ 1:1 0
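The layout the question asks for (one row per gene and locus, one log2(fold_change) column per input file) can also be built by keying each value on the file it came from instead of padding with tabs. Below is a minimal Python 3 sketch under stated assumptions: the function names merge_fold_changes and write_table and the labels "f1" and "f2" are hypothetical, and every input is assumed to carry a header row with columns named exactly gene, locus and log2(fold_change), as in the sample. A gene missing from one file gets an empty cell in that file's column.

```python
import csv
import io


def merge_fold_changes(sources):
    """Merge log2(fold_change) from several tab-delimited tables.

    `sources` is a list of (label, file-like) pairs. Returns (header, rows)
    with one row per (gene, locus) and one value column per source.
    """
    merged = {}  # (gene, locus) -> {label: fold_change}; keeps insertion order
    labels = []
    for label, handle in sources:
        labels.append(label)
        for record in csv.DictReader(handle, delimiter="\t"):
            key = (record["gene"], record["locus"])
            merged.setdefault(key, {})[label] = record["log2(fold_change)"]
    header = ["gene", "locus"] + [lab + "-log2(fold_change)" for lab in labels]
    rows = [
        [gene, locus] + [values.get(lab, "") for lab in labels]
        for (gene, locus), values in merged.items()
    ]
    return header, rows


def write_table(header, rows, out):
    """Write the merged table as tab-delimited text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    cols = ("test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\t"
            "status\tvalue_1\tvalue_2\tlog2(fold_change)\n")
    f1 = io.StringIO(cols + "1\t1\tZZ\t1:1\t01\t01\tNOTEST\t0\t0\t0.5\n")
    f2 = io.StringIO(cols + "1\t1\tZZ\t1:1\t01\t01\tNOTEST\t0\t0\t1.5\n")
    out = io.StringIO()
    write_table(*merge_fold_changes([("f1", f1), ("f2", f2)]), out)
    print(out.getvalue(), end="")
```

With real files, each entry in sources would be a pair such as (name, open(name, newline="")), and the output file should be written outside the input folder, or excluded from the glob pattern, so that it is not read back in as input.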

Rcpp Create DataFrame with Variable Number of Columns

I am interested in using Rcpp to create a data frame with a variable number of columns. By that, I mean that the number of columns will be known only at runtime. Some of the columns will be standard, but others will be repeated n times where n is the number of features I am considering in a particular run.
I am aware that I can create a data frame as follows:
IntegerVector i1(3); i1[0]=4;i1[1]=2134;i1[2]=3453;
IntegerVector i2(3); i2[0]=4123;i2[1]=343;i2[2]=99123;
DataFrame df = DataFrame::create(Named("V1")=i1,Named("V2")=i2);
but in this case it is assumed that the number of columns is 2.
To simplify the explanation of what I need, assume that I would like pass a SEXP variable specifying the number of columns to create in the variable part. Something like:
RcppExport SEXP myFunc(SEXP n, SEXP <other stuff>)
IntegerVector i1(3); <compute i1>
IntegerVector i2(3); <compute i2>
for(int i=0;i<n;i++){compute vi}
DataFrame df = DataFrame::create(Named("Num")=i1,Named("ID")=i2,...,other columns v1 to vn);
where n is passed as an argument. The final data frame in R would look like
Num ID V1 ... Vn
1 2 5 'aasda'
...
(In reality, the column names will not be of the form "Vx", but they will be known at runtime.) In other words, I cannot use a static list of
Named()=...
since the number will change.
I have tried skipping the "Named()" part of the constructor and then naming the columns at the end, but the results are junk.
Can this be done?
If I understand your question correctly, it seems like it would be easiest to take advantage of the DataFrame constructor that takes a List as an argument (since the size of a List can be specified directly), and set the names of your columns via .attr("names") and a CharacterVector:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::DataFrame myFunc(int n, Rcpp::List lst,
                       Rcpp::CharacterVector Names = Rcpp::CharacterVector::create()) {
    Rcpp::List tmp(n + 2);
    tmp[0] = Rcpp::IntegerVector(3);
    tmp[1] = Rcpp::IntegerVector(3);
    Rcpp::CharacterVector lnames = Names.size() < lst.size() ?
        lst.attr("names") : Names;
    Rcpp::CharacterVector names(n + 2);
    names[0] = "Num";
    names[1] = "ID";
    for (std::size_t i = 0; i < n; i++) {
        // tmp[i + 2] = do_something(lst[i]);
        tmp[i + 2] = lst[i];
        if (std::string(lnames[i]).compare("") != 0) {
            names[i + 2] = lnames[i];
        } else {
            names[i + 2] = "V" + std::to_string(i);
        }
    }
    Rcpp::DataFrame result(tmp);
    result.attr("names") = names;
    return result;
}
There's a little extra going on there to allow the Names vector to be optional - e.g. if you just use a named list you can omit the third argument.
lst1 <- list(1L:3L, 1:3 + .25, letters[1:3])
##
> myFunc(length(lst1), lst1, c("V1", "V2", "V3"))
# Num ID V1 V2 V3
#1 0 0 1 1.25 a
#2 0 0 2 2.25 b
#3 0 0 3 3.25 c
lst2 <- list(
Column1 = 1L:3L,
Column2 = 1:3 + .25,
Column3 = letters[1:3],
Column4 = LETTERS[1:3])
##
> myFunc(length(lst2), lst2)
# Num ID Column1 Column2 Column3 Column4
#1 0 0 1 1.25 a A
#2 0 0 2 2.25 b B
#3 0 0 3 3.25 c C
Just be aware of the 20-length limit for this signature of the DataFrame constructor, as pointed out by @hrbrmstr.
It's an old question, but I think more people are struggling with this, like me. Starting from the other answers here, I arrived at a solution that isn't limited by the 20 column limit of the DataFrame constructor:
// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
#include <string>
#include <sstream> // std::ostringstream
using namespace Rcpp;
// [[Rcpp::export]]
List variableColumnList(int numColumns=30) {
    List retval;
    for (int i=0; i<numColumns; i++) {
        std::ostringstream colName;
        colName << "V" << i+1;
        retval.push_back( IntegerVector::create(100*i, 100*i + 1),colName.str());
    }
    return retval;
}
// [[Rcpp::export]]
DataFrame variableColumnListAsDF(int numColumns=30) {
    Function asDF("as.data.frame");
    return asDF(variableColumnList(numColumns));
}
// [[Rcpp::export]]
DataFrame variableColumnListAsTibble(int numColumns=30) {
    Function asTibble("tbl_df");
    return asTibble(variableColumnList(numColumns));
}
So build a C++ List first by pushing columns onto an empty List. (I generate the values and the column names on the fly here.) Then, either return that as an R list, or use one of two helper functions to convert them into a data.frame or tbl_df. One could do the latter from R, but I find this cleaner.