Parsing a CSV file using MS Text Driver / CDatabase

I have a CSV file that I am trying to process that looks similar to the one below. This format is already in use within legacy software, so unfortunately I can't change it. As you can see, the file is separated into two sections: in this example, one section for items and one section for parts within those items.
ID,Description,Make,Model,Serial,Parts
200,Fridge,Samsung,S4450,SX05948596,1x34.4x22
354,Dishwasher,Bobs,BB45,BFDD34848,3x34.1x55.4x2
ENDITEMS
STARTPARTS
ID,Description,Price,Created
34,Bolt,4.33,08/05/15
22,Nut,1.20,10/10/12
ENDPARTS
I am currently trying to use the Microsoft text driver and CDatabase/CRecordset to parse the file, but am running into an issue. It seems the engine gets confused with some fields about what data type to use - specifically the fields where the first section and the second section use differing types.
For example, if I call recordSet.GetFieldValue() on index 1 (Description), that is fine and it parses it as a string, as both section 1 and section 2 use strings there. If I call it on index 3, I run into issues: in the first section that index (Model) holds strings, but in the second section that index (Created) holds a date. This results in the GetFieldValue() call returning null.
I have tried calling CRecordset::GetFieldValue(index, CDBVariant, SQL_C_CHAR) to force it to read the index as a string, but I'm still getting a null returned. If possible I'd like to avoid having to chop up the file before parsing.
I'm still fairly new to C++, so there may be some glaring errors here, but here is my test code (the printType() method just prints out the type and value of the CDBVariant):
CString fileDir = "C:\\";
CString fileName = "test.CSV";
CString conString;
CString queryString;
conString.Format("DRIVER={Microsoft Text Driver (*.txt; *.csv)};DSN='';DBQ=%s;", fileDir);
queryString.Format("SELECT * FROM [%s]", fileName);
CDatabase db;
CRecordset rs;
rs.m_pDatabase = &db;
db.OpenEx(conString);
if (rs.Open(AFX_DB_USE_DEFAULT_TYPE, queryString))
{
    int count = rs.GetODBCFieldCount();
    for (int i = 0; i < count; i++)
    {
        CODBCFieldInfo fieldInfo;
        rs.GetODBCFieldInfo(i, fieldInfo);
        CDBVariant varValue;
        rs.GetFieldValue(i, varValue, SQL_C_CHAR);
        std::cout << i << " " << fieldInfo.m_strName << ":\t";
        printType(&varValue);
        std::cout << std::endl;
    }
}
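If the driver's per-column type inference cannot be overridden, one possible fallback is to bypass ODBC and split the file on its section markers in plain C++, in a single pass and without chopping up the file on disk. The following is only a minimal sketch of that idea, not the MFC approach from the question. The names splitCsvLine and readSections, and the label "ITEMS" for the unmarked first block, are assumptions made for illustration. It also assumes that fields never contain quoted commas and that no data row starts with START or END.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One parsed row: every field is kept as a plain string, so no type is guessed.
using Row = std::vector<std::string>;

// Split one line on commas. Assumes no quoted fields containing commas.
// Note: a trailing empty field (a line ending in ',') is not preserved.
Row splitCsvLine(const std::string& line) {
    Row fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Read the whole stream into named sections. The first block has no START
// marker in the sample file, so it is stored under the hypothetical name "ITEMS".
std::map<std::string, std::vector<Row>> readSections(std::istream& in) {
    std::map<std::string, std::vector<Row>> sections;
    std::string current = "ITEMS";
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line.compare(0, 5, "START") == 0) {
            current = line.substr(5);  // "STARTPARTS" -> "PARTS"
        } else if (line.compare(0, 3, "END") == 0) {
            continue;                  // end markers carry no data
        } else {
            sections[current].push_back(splitCsvLine(line));  // header or data row
        }
    }
    return sections;
}
```

Each section's first row is its header line, so a caller could open the file with std::ifstream, call readSections, and then convert individual fields (for example the Created column) with whatever rules the legacy format requires.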

Related

InfluxDB Query string field data return null - c++

I'm trying to query a line protocol field set that contains a string field, using C++.
The library is https://github.com/offa/influxdb-cxx
The influxdb version is 1.8
The UI is Chronograf, version 1.8
The expected output looks like this:
double_field=double_value, int_field=int_value, string_field=string_value, longlong_field=longlong_value
But the actual output was:
double_field=double_value, int_field=int_value, longlong_field=longlong_value
It doesn't contain any string data. I'm sure the data has been uploaded to the database, since I can see a table showing the string in the UI.
My code is as follows:
db_influxdb->write(influxdb::Point{"test"}
.addField("int_field",3)
.addField("longlong_field",1234LL)
.addField("string_field","string value")
.addField("double_field",3.859));
std::vector<influxdb::Point> TEST = db_influxdb->query("SELECT * FROM test");
for(int i=0 ; i < TEST.size() ; i++){
    cout << TEST[i].getFields() << "\n" ;
}
Does anyone know why the string data is missing from the result?
Please help me, thanks!

Access protocol buffers extension fields

I am working with protocol buffers in C++. My message has only one extension range, and I want to access all the extension fields without knowing their names, using only their numbers. How can I do this?
message Base {
    optional int32 id = 1;
    extensions 1000 to 1999;
}
extend Base {
    optional int32 id2 = 1000;
}
So far, I have obtained the ExtensionRange.
const google::protobuf::Descriptor::ExtensionRange* rng = desc->extension_range(0);
std::cerr << "rng " << rng->start << " " << rng->end << std::endl;
But I do not know how to get the FieldDescriptor* of the extensions.
One odd thing is that extension_count() returns 0, although I have used an extension in my .proto file. Similarly, FindExtensionBy[Name/Number] does not work as expected.

I found a solution using reflection.
const Reflection* ref = message_.GetReflection();
const FieldDescriptor* cfield = ref->FindKnownExtensionByNumber(33);
std::cerr << "cfield->name() " << cfield->name() << std::endl;
My current solution is to loop over all the numbers in the extension range and get the required FieldDescriptors of the extensions.
I am still open to any better or different solution.

To cite from the official descriptor.h documentation:
To get a FieldDescriptor for an extension, do one of the following:
Get the Descriptor or FileDescriptor for its containing scope, then call Descriptor::FindExtensionByName() or FileDescriptor::FindExtensionByName().
Given a DescriptorPool, call DescriptorPool::FindExtensionByNumber().
Given a Reflection for a message object, call Reflection::FindKnownExtensionByName() or Reflection::FindKnownExtensionByNumber().
Use DescriptorPool to construct your own descriptors.
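The reason extension_count() returns 0 is that it only counts extensions declared nested inside that message type's scope (that is, extensions of other message types), not extensions that are added to the message from elsewhere. In your .proto the extend block sits at file scope, so it is counted by the FileDescriptor rather than by the Descriptor of Base.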

Write output to new files at every nth iteration

A fragment of my code is:
for (int iter = 0; iter < flags.total_iterations_; ++iter) {
    if (iter%20==0) {
        std::ofstream mf(flags.model_file_.c_str());
        accum_model.AppendAsString(word_index_map, mf);
    }
    else {
        std::cout << "Model not created for ";
    }
    std::cout << "Iteration " << iter << " ...\n";
So, I am trying to write the output of accum_model at every 20th iteration. The problem is that the output has to go to a new file each time a 20th iteration is reached. At the moment, my output is being overwritten.
I run this code through an executable, which is invoked as follows:
./lda --num_topics 15 --alpha 0.1 --beta 0.01 --training_data_file testdata/test_data.txt --model_file MF/lda_model.txt --burn_in_iterations 120 --total_iterations 150
MF/lda_model.txt is the output file passed on the command line. I don't understand how to connect the code to this command, since I would need 5 different new files (for 100 iterations, as data is written to a new file every 20th iteration).
I am new to C++; so far I have been coding in Python. I got as far as this loop, but I am confused about how to create new files and get the corresponding outputs. Please help! Thanks in advance.

Use std::stringstream, and build a new file name to open each time.
std::string uniquePathFileNamePostFix (int i) {
    std::stringstream ss;
    ss << '-' << i ;
    return (ss.str());
}
The idea is to use the stringstream to create a unique modifier based on i, which you then append or prepend to the file name. Anything else convenient works too; I have used time stamps.

If I understand your question correctly, you are overwriting the file instead of appending to it.
You'll want to specify the 'app' flag in the mode of the ofstream constructor:
std::ofstream mf(flags.model_file_.c_str(), std::ios_base::app);
If you need to start the output with a new, empty file, just leave out the mode (ofstream defaults to std::ios_base::out, whether you specify it or not; see std::ofstream::ofstream).
And if you need a new output file each time (as your question says), you need to change the file name in flags.model_file_.

I'm not sure that I understand your question correctly, but I think you want to write the output of every 20th iteration to a new file. To do so, you just need to append the value of iter to the name of the file, or otherwise add a "dynamic" element to it.
The way to do it using only standard C++ is with a stringstream:
std::stringstream file_name;
file_name << flags.model_file_ << iter;
std::string result = file_name.str();
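Appending iter directly to flags.model_file_ produces names such as lda_model.txt20. A variation that keeps the extension at the end is sketched below. The helper name iterationFileName and the "-<iter>" naming scheme are assumptions for illustration, not part of the asker's code base.

```cpp
#include <sstream>
#include <string>

// Insert "-<iter>" before the extension of basePath, or at the end if the
// file name has no extension.
std::string iterationFileName(const std::string& basePath, int iter) {
    std::string::size_type slash = basePath.find_last_of("/\\");
    std::string::size_type dot = basePath.find_last_of('.');
    // A dot that belongs to a directory name is not a file extension.
    if (dot != std::string::npos && slash != std::string::npos && dot < slash) {
        dot = std::string::npos;
    }
    std::ostringstream name;
    if (dot == std::string::npos) {
        name << basePath << '-' << iter;
    } else {
        name << basePath.substr(0, dot) << '-' << iter << basePath.substr(dot);
    }
    return name.str();
}
```

Inside the loop, the stream could then be opened with std::ofstream mf(iterationFileName(flags.model_file_, iter).c_str()); so that each 20th iteration writes to its own file.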

Creation of RRD files with C++ on Raspberry behaves strangely

I want to use my Raspberry Pi to record temperature from a series of sensors. For this purpose I am writing a C++ program which uses librrd.
For every connected sensor I want to create a rrd with 12 rra. The following call should create my wanted rrd:
rrd_create(mNumberOfCreateParams, mCreateParams);
mNumberOfCreateParams is 17 and the content of mCreateParams is the following:
rrdcreate
28-000005fd934f.rrd
--step=300
--no-overwrite
DS:temperature:GAUGE:600:-55:125
RRA:AVERAGE:0.5:1:288
RRA:AVERAGE:0.5:3:672
RRA:AVERAGE:0.5:12:744
RRA:AVERAGE:0.5:72:1464
RRA:MAX:0.5:1:288
RRA:MAX:0.5:3:672
RRA:MAX:0.5:12:744
RRA:MAX:0.5:72:1464
RRA:MIN:0.5:1:288
RRA:MIN:0.5:3:672
RRA:MIN:0.5:12:744
RRA:MIN:0.5:72:1464
The second line changes each time corresponding to the id of the sensor.
Now the problem: sometimes the call to rrd_create works as intended, but at some point it stops working and only produces errors on further calls. This is true even if I try to recreate an rrd that was successfully created previously.
By changing mNumberOfCreateParams I can alter the number of parsed arguments. If the parameter is in the range 13 to 17, the error returned by rrd_get_error() is "can't parse argument ' ' " (space added between the ' characters for readability). If I let the function parse 10 to 12 parameters, it will "work" the first time and return "opening '#': No such file or directory" the second time, because the first time the following file was created:
[image: created file in file browser]
If the number of parsed parameters is below 10, it works as intended.
It makes no difference if I change the order of the RRA lines.
If I call rrdtool create [...same parameters as above] from the terminal, everything works fine regardless of how many parameters are parsed.
Hoping that rrd_create would start working again, I restarted the Raspberry several times, and it even worked once for a short time (one run of my application).
Are there any suggestions as to what I am doing wrong, or how I can get rrd_create into a more stable state?
Edit:
I'm using version 1.4.7 of RRDtool (rrdtool version in shell).
Here is the code I'm using for creation of rrd files:
// mCreateParams & mNumberOfCreateParams will be set here
setupRrdCreateParamsDS18B20(lStepSize);
char lCurrentPath[255];
getcwd(lCurrentPath, sizeof(lCurrentPath));
// since I wasn't able to create rrd files outside current working directory I
// change working directory to where I want all files
chdir(DS18B20_PATH.c_str());
// the dump of mCreateParams posted above was created here
int lStatus = rrd_create(mNumberOfCreateParams, mCreateParams);
Since I dumped mCreateParams just before calling rrd_create(...) I think they shouldn't be corrupted.
My current workaround uses popen(); mCreateParams is used to build a shell command that calls rrdtool.
stringstream ss;
// create shell command from create params
ss << "rrdtool create ";
for (int i = 1; i < mNumberOfCreateParams - 1; i++) {
    ss << mCreateParams[i] << " ";
}
ss << mCreateParams[mNumberOfCreateParams - 1];
// needed for capturing output from executed command
FILE * in;
char buff[512];
if(!(in = popen(ss.str().c_str(), "r"))){
    return false;
}
ss.str("");
ss.clear();
// get output
while(fgets(buff, sizeof(buff), in)!=NULL){
    ss << buff;
}
lRrdError = ss.str();
ss.str("");
ss.clear();
int lTemp = pclose(in);
// get exit code from rrdtool create
lStatus = WEXITSTATUS(lTemp);
I am thankful for any advice.

When you call rrd_create(int argc, char **argv), you need to pass the parameters in an argv list, very similar to the way a normal C main() function takes its parameters.
In particular, you do not need to pass the create function name (this is implicit) and, of course, the argc parameter must match the number of elements in the argv array.
So, in short: your parameter list to rrd_create should not include the rrdcreate parameter, and your argc MUST match the number of argv parameters passed.
If you still get errors returned from the rrd_create function call, then print out the error message.
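One way to keep argc and argv in step is to derive both from a single container. The sketch below only builds the argument array and deliberately does not call librrd. The helper name makeArgv is an assumption for illustration, and which entries belong in the list is for you to decide based on the advice above.

```cpp
#include <string>
#include <vector>

// Build a C-style argument array whose pointers refer into `args`.
// The pointers stay valid only while `args` is alive and unmodified.
std::vector<char*> makeArgv(std::vector<std::string>& args) {
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);
    for (std::string& a : args) {
        argv.push_back(&a[0]);  // writable, null-terminated buffer (C++11)
    }
    argv.push_back(nullptr);    // conventional terminator, not counted in argc
    return argv;
}
```

With this, argc is simply static_cast<int>(args.size()) and the array is argv.data(), so the count cannot drift away from the number of strings actually passed.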

Reading text file by scanning for keywords

As part of a bigger application I am working on a class for reading input from a text file for use in the initialization of the program. Now I am myself fairly new to programming, and I only started to learn C++ in December, so I would be very grateful for some hints and ideas on how to get started! I apologise in advance for a rather long wall of text.
The text file format is "keyword-driven" in the following way:
There are a rather small number of main/section keywords (currently 8) that need to be written in a given order. Some of them are optional, but if they are included they should adhere to the given ordering.
Example:
Suppose there are 3 potential keywords, ordered as follows:
"KEY1" (required)
"KEY2" (optional)
"KEY3" (required)
If the input file only includes the required ones, the ordering should be:
"KEY1"
"KEY3"
Otherwise it should be:
"KEY1"
"KEY2"
"KEY3"
If all the required keywords are present, and the total ordering is ok, the program should proceed by reading each section in the sequence given by the ordering.
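The ordering rule described above can be checked in one pass over a table of section specifications. The following is a minimal sketch under the assumption that the main keywords found in the file have already been collected in file order. SectionSpec and sectionsAreValid are hypothetical names, not part of the code shown later in the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One main keyword and whether it must appear in the input file.
struct SectionSpec {
    std::string name;
    bool required;
};

// Returns true if `found` (main keywords in file order) respects the order in
// `spec`, contains no repeats or unknown names, and includes every required one.
bool sectionsAreValid(const std::vector<SectionSpec>& spec,
                      const std::vector<std::string>& found) {
    std::size_t next = 0;  // index into `found` of the next keyword to match
    for (const SectionSpec& s : spec) {
        if (next < found.size() && found[next] == s.name) {
            ++next;        // section present at the expected position
        } else if (s.required) {
            return false;  // required section missing or out of order
        }
    }
    // Leftover entries mean unknown, repeated, or misordered keywords.
    return next == found.size();
}
```

With the three-keyword example, the sequences KEY1, KEY3 and KEY1, KEY2, KEY3 are accepted, while KEY3, KEY1 or a file containing only KEY1 is rejected.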
Each section will include a (possibly large) amount of subkeywords, some of which are optional and some of which are not, but here the order does NOT matter.
Lines starting with characters '*' or '--' signify commented lines, and they should be ignored (as well as empty lines).
A line containing a keyword should (preferably) include nothing else than the keyword. At the very least, the keyword must be the first word appearing there.
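The two line-level rules above (skip comments and empty lines, take the first word as the keyword) fit into two small helpers. This is a sketch with hypothetical names, isCommentOrEmpty and firstWord. Treating whitespace-only lines as empty is an assumption that goes slightly beyond the rules as stated.

```cpp
#include <sstream>
#include <string>

// True for empty or whitespace-only lines and for lines starting with '*' or "--".
bool isCommentOrEmpty(const std::string& line) {
    if (line.find_first_not_of(" \t\r") == std::string::npos) return true;
    if (line[0] == '*') return true;
    return line.compare(0, 2, "--") == 0;
}

// First whitespace-delimited word of the line, or "" if there is none.
std::string firstWord(const std::string& line) {
    std::istringstream iss(line);
    std::string word;
    iss >> word;  // leaves `word` empty if the line holds no word
    return word;
}
```

Note that a data line such as "-5 3" is not a comment under this rule, because only a double dash at the start of the line counts.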
I have already implemented parts of the framework, but I feel my approach so far has been rather ad hoc. Currently I have manually created one method per section/main keyword, and the first task of the program is to scan the file to locate these keywords and pass the necessary information on to the methods.
I first scan through the file using an std::ifstream object, removing empty and/or commented lines and storing the remaining lines in an object of type std::vector<std::string>.
Do you think this is an ok approach?
Moreover, I store the indices where each of the keywords start and stop (in two integer arrays) in this vector. This is the input to the above-mentioned methods, and it would look something like this:
bool readMAINKEY(int start, int stop);
Now I have already done this, and even though I do not find it very elegant, I guess I can keep it for the time being.
However, I feel that I need a better approach for handling the reading inside of each section, and my main issue is how should I store the keywords here? Should they be stored as arrays within a local namespace in the input class or maybe as static variables in the class? Or should they be defined locally inside relevant functions? Should I use enums? The questions are many!
Now I've started by defining the sub-keywords locally inside each readMAINKEY() method, but I found this to be less than optimal. Ideally I want to reuse as much code as possible inside each of these methods, calling upon a common readSECTION() method, and my current approach seems to lead to much code duplication and potential for error in programming. I guess the smartest thing to do would simply be to remove all the (currently 8) different readMAINKEY() methods, and use the same function for handling all kinds of keywords. There is also the possibility for having sub-sub-keywords etc. as well (i.e. a more general nested approach), so I think maybe this is the way to go, but I am unsure on how it would be best to implement it?
Once I've processed a keyword at the "bottom level", the program will expect a particular format of the following lines depending on the actual keyword. In principle each keyword will be handled differently, but here there is also potential for some code reuse by defining different "types" of keywords depending on what the program expects to do after triggering the reading of it. Common task include e.g. parsing an integer or a double array, but in principle it could be anything!
If a keyword for some reason cannot be correctly processed, the program should attempt as far as possible to use default values instead of terminating the program (if reasonable), but an error message should be written to a logfile. For optional keywords, default values will of course also be used.
In order to summarise, therefore, my main questions are the following:
1. Do you think my approach of storing the relevant lines in a std::vector<std::string> is reasonable?
This will of course require me to do a lot of "indexing work" to keep track of where in the vector the different keywords are located. Or should I work more "directly" with the original std::ifstream object? Or something else?
2. Given such a vector storing the lines of the text file, how can I best go about detecting the keywords and start reading the information following them?
Here I will need to take account of possible ordering and whether a keyword is required or not. Also, I need to check if the lines following each "bottom level" keyword is in the format expected in each case.
One idea I've had is to store the keywords in different containers depending on whether they are optional or not (or maybe use object(s) of type std::map<std::string,bool>), and then remove them from the container(s) if correctly processed, but I am not sure exactly how I should go about it..
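The std::map<std::string,bool> idea can be sketched as a small checklist class: keywords are erased as they are processed, and whatever required entries remain at the end of a section are the missing ones. The class name KeywordChecklist and its member names are assumptions made for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Tracks which sub-keywords of one section are still outstanding.
class KeywordChecklist {
public:
    // `keywords` maps keyword name -> true if required, false if optional.
    explicit KeywordChecklist(const std::map<std::string, bool>& keywords)
        : pending_(keywords) {}

    // Call when a keyword has been read successfully.
    // Returns false if the keyword is unknown or was already processed.
    bool markProcessed(const std::string& keyword) {
        return pending_.erase(keyword) == 1;
    }

    // Required keywords that were never processed, in alphabetical order.
    std::vector<std::string> missingRequired() const {
        std::vector<std::string> missing;
        for (const auto& entry : pending_) {
            if (entry.second) missing.push_back(entry.first);
        }
        return missing;
    }

private:
    std::map<std::string, bool> pending_;
};
```

Optional keywords that remain pending at the end are the ones that should fall back to default values, while each name returned by missingRequired() would warrant an error message in the log file.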
I guess there is really a thousand different ways one could answer these questions, but I would be grateful if someone more experienced could share some ideas on how to proceed. Is there e.g. a "standard" way of doing such things? Of course, a lot of details will also depend on the concrete application, but I think the general format indicated here can be used in a lot of different applications without a lot of tinkering if programmed in a good way!
UPDATE
Ok, so let me try to be more concrete. My current application is supposed to be a reservoir simulator, so as part of the input I need information about the grid/mesh, about rock and fluid properties, about wells/boundary conditions throughout the simulation, and so on. At the moment I've been thinking about using (almost) the same set-up as the commercial Eclipse simulator when it comes to input; for details see
http://petrofaq.org/wiki/Eclipse_Input_Data.
However, I will probably change things a bit, so nothing is set in stone. Also, I am interested in making a more general "KeywordReader" class that can be adapted for use in other applications with slight modifications, or at least within a reasonable amount of time.
As an example, I can post the current code that does the initial scan of the text file and locates the positions of the main keywords. As I said, I don't really like my solution very much, but it seems to work for what it needs to do.
At the top of the .cpp file I have the following namespace:
//Keywords used for reading input:
namespace KEYWORDS{
    /*
     * Main keywords and corresponding boolean values to signify whether or not they are required as input.
     */
    enum MKEY{RUNSPEC = 0, GRID = 1, EDIT = 2, PROPS = 3, REGIONS = 4, SOLUTION = 5, SUMMARY = 6, SCHEDULE = 7};
    std::string mainKeywords[] = {std::string("RUNSPEC"), std::string("GRID"), std::string("EDIT"), std::string("PROPS"),
                                  std::string("REGIONS"), std::string("SOLUTION"), std::string("SUMMARY"), std::string("SCHEDULE")};
    bool required[] = {true,true,false,true,false,true,false,true};
    const int n_key = 8;
}//end KEYWORDS namespace
Then further down I have the following function. I am not sure how understandable it is though..
bool InputReader::scanForMainKeywords(){
    logfile << "Opening file.." << std::endl;
    std::ifstream infile(filename);
    //Test if file was opened. If not, write error message:
    if(!infile.is_open()){
        logfile << "ERROR: Could not open file! Unable to proceed!" << std::endl;
        std::cout << "ERROR: Could not open file! Unable to proceed!" << std::endl;
        return false;
    }
    else{
        logfile << "Scanning for main keywords..." << std::endl;
        int nkey = KEYWORDS::n_key;
        //Initially no keywords have been found:
        startIndex = std::vector<int>(nkey, -1);
        stopIndex = std::vector<int>(nkey, -1);
        //Variable used to control that the keywords are written in the correct order:
        int foundIndex = -1;
        //STATISTICS:
        int lineCount = 0;//number of non-comment lines in text file
        int commentCount = 0;//number of commented lines in text file
        int emptyCount = 0;//number of empty lines in text file
        //Create lines vector:
        lines = std::vector<std::string>();
        //Remove comments and empty lines from text file and store the result in the member variable lines:
        std::string str;
        while(std::getline(infile,str)){
            if(str.size()>=1 && str.at(0)=='*'){
                commentCount++;
            }
            else if(str.size()>=2 && str.at(0)=='-' && str.at(1)=='-'){
                commentCount++;
            }
            else if(str.size()==0){
                emptyCount++;
            }
            else{
                //Found a non-empty, non-comment line.
                lines.push_back(str);//store in std::vector
                //Start by checking if the first word of the line is one of the main keywords. If so, store the location of the keyword:
                std::string fw = IO::getFirstWord(str);
                for(int i=0;i<nkey;i++){
                    if(fw.compare(KEYWORDS::mainKeywords[i])==0){
                        if(i > foundIndex){
                            //Found a valid keyword!
                            foundIndex = i;
                            startIndex[i] = lineCount;//store where the keyword was found!
                            //logfile << "Keyword " << fw << " found at line " << lineCount << " in lines array!" << std::endl;
                            //std::cout << "Keyword " << fw << " found at line " << lineCount << " in lines array!" << std::endl;
                            break;//fw cannot equal several different keywords at the same time!
                        }
                        else{
                            //we have found a keyword, but in the wrong order... Terminate program:
                            std::cout << "ERROR: Keywords have been entered in the wrong order or been repeated! Cannot continue initialisation!" << std::endl;
                            logfile << "ERROR: Keywords have been entered in the wrong order or been repeated! Cannot continue initialisation!" << std::endl;
                            return false;
                        }
                    }
                }//end for loop
                lineCount++;
            }//end else (found non-comment, non-empty line)
        }//end while (reading ifstream)
        logfile << "\n";
        logfile << "FILE STATISTICS:" << std::endl;
        logfile << "Number of commented lines: " << commentCount << std::endl;
        logfile << "Number of non-commented lines: " << lineCount << std::endl;
        logfile << "Number of empty lines: " << emptyCount << std::endl;
        logfile << "\n";
        /*
        Print lines vector to screen:
        for(int i=0;i<lines.size();i++){
            std::cout << "Line nr. " << i << " : " << lines[i] << std::endl;
        }*/
        /*
         * So far, no keywords have been entered in the wrong order, but have all the necessary ones been found?
         * Otherwise return false.
         */
        for(int i=0;i<nkey;i++){
            if(KEYWORDS::required[i] && startIndex[i] == -1){
                logfile << "ERROR: Incorrect input of required keywords! At least " << KEYWORDS::mainKeywords[i] << " is missing!" << std::endl;
                logfile << "Cannot proceed with initialisation!" << std::endl;
                std::cout << "ERROR: Incorrect input of required keywords! At least " << KEYWORDS::mainKeywords[i] << " is missing!" << std::endl;
                std::cout << "Cannot proceed with initialisation!" << std::endl;
                return false;
            }
        }
        //If everything is in order, we also initialise the stopIndex array correctly:
        int counter = 0;
        //Find first existing keyword:
        while(counter < nkey && startIndex[counter] == -1){
            //Keyword doesn't exist. Leave stopindex at -1!
            counter++;
        }
        //Store stop index of each keyword:
        while(counter<nkey){
            int offset = 1;
            //Find next existing keyword:
            while(counter+offset < nkey && startIndex[counter+offset] == -1){
                offset++;
            }
            if(counter+offset < nkey){
                stopIndex[counter] = startIndex[counter+offset]-1;
            }
            else{
                //reached the end of array!
                stopIndex[counter] = lines.size()-1;
            }
            counter += offset;
        }//end while
        /*
        //Print out start/stop-index arrays to screen:
        for(int i=0;i<nkey;i++){
            std::cout << "Start index of " << KEYWORDS::mainKeywords[i] << " is : " << startIndex[i] << std::endl;
            std::cout << "Stop index of " << KEYWORDS::mainKeywords[i] << " is : " << stopIndex[i] << std::endl;
        }
        */
        return true;
    }//end else (file opened properly)
}//end scanForMainKeywords()

You say your purpose is to read initialization data from a text file.
It seems you need to parse (syntactically analyze) this file and store the data under the right keys.
If the syntax is fixed and each construct starts with a keyword, you could write a recursive descent (LL(1)) parser that builds a tree (each node holding a std::vector of sub-branches) to store your data.
If the syntax is free, you might pick JSON or XML and use an existing parsing library.
