Parse and remove part of a QString - c++

I want to parse a small XML-like fragment out of a QString.
My QString looks like this:
<a>cat</a>My cat is very nice.
I want to obtain two strings:
cat, and My cat is very nice.
I don't think a full XML parser is necessary for this, but in the future I will have more tags in the same string, so that option is also interesting.

In Qt, the QRegExp class can help you parse your QString.
Adapting the example from the documentation:
QRegExp rxlen("^<a>(.*)</a>(.*)$");
int pos = rxlen.indexIn("<a>cat</a>My cat is very nice.");
QStringList list;
if (pos > -1) {
    list << rxlen.cap(1); // "cat"
    list << rxlen.cap(2); // "My cat is very nice."
}
The QStringList list will then contain cat and My cat is very nice.

You could use the regular expression <a>(.*)</a>(.*).
If you use Boost, you could implement it as follows:
// input_string is assumed to be a std::string holding the text to parse
boost::regex expr( "^<a>(.*)</a>(.*)$" );
boost::match_results<std::string::const_iterator> what;
if( boost::regex_search( input_string, what, expr ) ) {
    std::string tag( what[1].first, what[1].second );
    std::string value( what[2].first, what[2].second );
}
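For comparison, the same capture-group idea can be sketched with the standard library's std::regex, for readers who have neither Qt nor Boost at hand. The helper name splitTagged is hypothetical, and the non-greedy .*? (so that the first </a> closes the tag) is my own choice rather than something taken from the answers above. It assumes C++17 and single-line input.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: splits "<a>TAG</a>REST" into {TAG, REST}.
// Returns std::nullopt when the input does not start with an <a>...</a> element.
std::optional<std::pair<std::string, std::string>> splitTagged(const std::string& input) {
    // .*? is non-greedy, so the first </a> ends the tag; group 2 takes the rest.
    static const std::regex pattern("^<a>(.*?)</a>(.*)$");
    std::smatch match;
    if (!std::regex_match(input, match, pattern)) {
        return std::nullopt;
    }
    return std::make_pair(match[1].str(), match[2].str());
}
```

Calling splitTagged on the string from the question yields the pair of the tag content and the remaining text. Once several different tags appear in one string, a real XML parser is likely to be the more robust choice.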

Related

BaseX XML database in C++ encoding issue

I work with BaseX and am trying to integrate an XML database with C++ on Windows 7. I use the BaseX client API from https://github.com/JohnLeM/BasexCPPAPI/
It contains the pugixml parser and uses the Boost library. I got it to work, but I have issues with the encoding. The XML documents in my database are UTF-8 and contain some letters and symbols that are not displayed correctly in the console output (like ä and °).
I set the console code page with chcp 65001.
I changed the locale with std::setlocale(LC_ALL, ""); in C++, and when I cout these letters and symbols directly in my program rather than from the database, they are displayed correctly. The database output also changed but is still wrong.
I also set the pugi parser with pugi::xml_encoding::encoding_utf8; but the database output is not affected. Here is a code example from the string list interface:
virtual ~my_string_list_interface() {};
my_string_list_interface(const std::string& DBHOST, const std::string& DBPORT, const std::string& DBUSER,const std::string& DBPASSWD) : base_type(DBHOST,DBPORT,DBUSER,DBPASSWD) {};
virtual my_string_list get(string query,int stage){
my_string_list my_string_list_;
pugi::xml_encoding::encoding_auto;
pugi::xml_parse_result parse_;
pugi::xml_document doc;
string results = session().execute("XQUERY "+query);
parse_ = doc.load(results.c_str());
pugi::xpath_node_set is = doc.select_nodes("/record/mid");
The API uses a boost::asio::streambuf to get the data. The code that reads from the streambuf looks like this:
std::string read_streambuffer(){return read_streambuffer(response_);};
std::string read_streambuffer(boost::asio::streambuf & response)
{
std::string results;
boost::system::error_code error;
boost::asio::streambuf::const_buffers_type bufs = response.data();
std::size_t size(0);
std::string line;
auto ptr_b = boost::asio::buffers_begin(bufs);
for(; ptr_b != boost::asio::buffers_end(bufs); ++ptr_b, ++ size)
{if (*ptr_b != 0) {line.push_back(*ptr_b);} else if (size > 1) break; };
response.consume(size);
return line;
};
Is there a way to specify the encoding for the buffer stream or string? I use a string list for the database output. Should I use wstring, or is there something I missed?
Thanks!

Read json with rapidjson pointer

I am trying to integrate rapidjson into my app. It is used to read a simple config file (validated with an online tool) like:
{
"filecontent": "appsettings",
"fileversion": 1,
"appsettings": {
"general": {
"sync": "false",
"sound": "true"
},
...
This is my code:
QString path = keypath( key ); //.prepend("/");
rapidjson::Value* hello = rapidjson::Pointer( "/appsettings/general/sound" ) //path.toStdString().c_str()
.Get(rapidJsonDoc_);
if ( hello ) {
QVariant retStr( hello->GetString() );
qDebug()<<"--> " <<path<<" --> " << retStr;
ret = QVariant::fromValue( retStr );
}else{
qDebug()<<"Value not found!";
}
return ret;
If I prepend the pointer string with /, as I understand the examples, it says value not found.
If I remove the slash, if (hello) is true, but does not return an expected value.
rapidJsonDoc_ is of type rapidjson::Document.
Please help me with the correct syntax. I am looking at the source code of rapidjson and can't understand a thing, it is so full of templates and complex signatures...
Update:
According to this post, modifying a Qt QJsonDocument is not possible in the way I want.

Regular Expression Matcher

I am using pattern matching to match a file extension against my expression string. The code is as follows:
public static enum FileExtensionPattern
{
WORDDOC_PATTERN( "([^\\s]+(\\.(?i)(txt|docx|doc))$)" ), PDF_PATTERN(
"([^\\s]+(\\.(?i)(pdf))$)" );
private String pattern = null;
FileExtensionPattern( String pattern )
{
this.pattern = pattern;
}
public String getPattern()
{
return pattern;
}
}
pattern = Pattern.compile( FileExtensionPattern.WORDDOC_PATTERN.getPattern() );
matcher = pattern.matcher( fileName );
if ( matcher.matches() )
icon = "blue-document-word.png";
When the file name is "Home & Artifact.docx", matcher.matches() still returns false. It works fine for a file name with a ".doc" extension.
Can you please point out what I am doing wrong?
"Home & Artifact.docx" contains spaces. Since you allow any char except whitespaces [^\s]+, this filename is not matched.
Try this instead (the outer group is now closed):
(.+?(\.(?i)(txt|docx|doc))$)
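The question is about Java, but the underlying idea (allow any character, including spaces, before the extension and match the extension case-insensitively) carries over to other regex engines. Below is a minimal C++ sketch using std::regex; the helper name isWordDocument is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when fileName ends in .txt, .docx or .doc (any letter case).
// ".+" accepts spaces in the base name, unlike the original "[^\s]+".
bool isWordDocument(const std::string& fileName) {
    static const std::regex pattern(R"(^.+\.(txt|docx|doc)$)",
                                    std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(fileName, pattern);
}
```

With this pattern, "Home & Artifact.docx" is accepted, while a name such as "report.pdf" is rejected.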
It is because you have spaces in filename ("Home & Artifact.docx") but your regex has [^\\s]+ which won't allow any spaces.
Use this regex instead for WORDDOC_PATTERN:
"(?i)^.+?\\.(txt|docx|doc)$"

File I/O with Windows Forms

I need to create a program with Windows Forms. I wrote part of the code in C++ and, at the same time, the forms in C++/CLI. Now I'm trying to adapt the C++ code to the forms, but I'm having problems with file handling, which is completely different from standard C++.
I have 2 forms. The first is for registration (it should register every student in a file). The second is for modifying a student's data, looked up by surname for example.
In registration.cpp I have created a list of objects, but when I write to the file I use StreamWriter, and I guess it has no relationship with my list.
So my problems are:
How can I WRITE my data list into a file?
How can I MODIFY that data?
Here is some code. It's in Italian :D as I am from Italy (sorry for my mistakes).
//.cpp of the registration
class studente
{
private:
string cognome;
string nome;
public:
studente(){
cognome="";
nome="";
};
~studente(){};
void set(string str1,string str2){
cognome=str1;
nome=str2;
}
class primo_anno:public studente
{
private:
int voto_diploma;
public:
primo_anno(){
cognome="";
nome="";
voto_diploma='0';
};
~primo_anno(){};
void set(string str1,string str2, int mark){ voto_diploma=mark; };
void stampa(){//I KNOW ITS NOT USEFUL HERE..BUT IN C++ I USED THAT
f<<"\ncognome: "<<cognome<<"\n";
f<<"nome: "<<nome<<"\n";
f<<"voto: "<<voto_diploma<<"\n";
};
};
list<primo_anno> l1;//DECLARE MY STL LIST
{//WHEN I CLICK ON MY REGISTER BUTTON THE PROGRAM RUN THIS
int mark;
primo_anno *s;
s=new primo_anno;
char* str1=(char*)(Marshal::StringToHGlobalAnsi(textBox1->Text)).ToPointer();
char* str2=(char*)(Marshal::StringToHGlobalAnsi(textBox2->Text)).ToPointer();
mark = Convert::ToInt16(textBox35->Text);
s->set(str1,str2,mark);
l1.push_back(*s);
list<primo_anno>::iterator it;
//I HAVE FOUND THIS METHOD BUT ITS NOT LINKED TO MY STL LIST.
//BY THE WAY I AM ABLE TO WRITE ON FILE WITH THIS.BUT LATER I DONT KNOW HOW TO MODIFY
//FOR EXAMPLE "DELETE THE LINE WHERE THERE IS Rossi SURNAME".HOW!!!
TextWriter ^tw = gcnew StreamWriter("primoAnno.txt", true);//true append
tw->WriteLine(textBox1->Text + "\t\t" + textBox2->Text + "\t\t" + textBox35->Text);
tw->Close();
Thank you in advance! And sorry again for my English... I'm just a student:)
Normally, you can convert a std::string into a System::String^ quite easily: something like gcnew String(myPrimoAnnoObj.cognome.c_str()) should give you a string with the right contents, which is easily written into the managed stream.
However, you appear to have misunderstood how new works for unmanaged objects: your code allocates a primo_anno object dynamically for no reason, then copies its value into the list and leaks the pointer. You also leak the unmanaged strings you obtained from the Marshal class (they should be released with Marshal::FreeHGlobal).
Are you sure you should be using unmanaged objects? It would be much easier to have everything in a managed System::Collections::Generic::List<> of managed objects...
Added: For writing everything in a file, you can try something like this:
ref class MyClass
{
public:
String^ cognome;
String^ nome;
int voto_diploma;
};
//...
List<MyClass^>^ primo = gcnew List<MyClass^>();
//...
MyClass^ myObj = gcnew MyClass();
myObj->cognome = textBox1->Text;
myObj->nome = textBox2->Text;
myObj->voto_diploma = Convert::ToInt32(textBox35->Text);
primo->Add(myObj);
//...
TextWriter ^tw = gcnew StreamWriter(L"primoAnno.txt", true);
for each(MyClass^ obj in primo)
{
//You can use any character or string as separator,
//as long as it's not supposed to appear in the strings.
//Here, I used pipes.
tw->Write(obj->cognome);
tw->Write(L"|");
tw->Write(obj->nome);
tw->Write(L"|");
tw->WriteLine(obj->voto_diploma);
}
tw->Close();
For reading, you can use a function like this:
MyClass^ ParseMyClass(String^ line)
{
array<String^>^ splitString = line->Split(L'|');
MyClass^ myObj = gcnew MyClass();
myObj->cognome = splitString[0];
myObj->nome = splitString[1];
myObj->voto_diploma = Convert::ToInt32(splitString[2]);
return myObj;
}
And for deleting:
TextWriter^ tw = gcnew StreamWriter(L"primoAnno2.txt", false); // false: overwrite, so a stale temp file is not appended to
TextReader^ tr = gcnew StreamReader(L"primoAnno.txt");
String^ line;
while((line=tr->ReadLine()) != nullptr)
{
MyClass^ obj = ParseMyClass(line);
if(obj->cognome != L"cat")
tw->WriteLine(line);
}
tr->Close();
tw->Close();
File::Delete(L"primoAnno.txt");
File::Move(L"primoAnno2.txt", L"primoAnno.txt");
It may not be the exact code, but it's overall what should work.
Note: If you want your separator to be spaces, and there can be spaces in the strings, things will get a lot more complicated.
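The write/parse/delete flow above can also be sketched in standard C++ rather than C++/CLI, which may help when testing the logic outside the form. The names Student, parseStudent and removeBySurname are hypothetical, and the sketch assumes every line is a well-formed cognome|nome|voto record.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type mirroring the cognome|nome|voto_diploma line layout.
struct Student {
    std::string cognome;
    std::string nome;
    int voto_diploma = 0;
};

// Parses one "cognome|nome|voto" line; assumes the line is well formed.
Student parseStudent(const std::string& line) {
    std::istringstream in(line);
    Student s;
    std::string voto;
    std::getline(in, s.cognome, '|');
    std::getline(in, s.nome, '|');
    std::getline(in, voto);
    s.voto_diploma = std::stoi(voto);
    return s;
}

// Returns only the lines whose surname differs from the given one.
std::vector<std::string> removeBySurname(const std::vector<std::string>& lines,
                                         const std::string& surname) {
    std::vector<std::string> kept;
    std::copy_if(lines.begin(), lines.end(), std::back_inserter(kept),
                 [&](const std::string& line) {
                     return parseStudent(line).cognome != surname;
                 });
    return kept;
}
```

The kept lines would then be written to a temporary file that replaces the original, exactly as in the delete step above.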
I have tried to use a generic list (thanks, MSDN). My doubts are in the comments below:
List<String^>^ primo=gcnew List<String^>();
int mark;
char* str1=(char*)(Marshal::StringToHGlobalAnsi(textBox1->Text)).ToPointer();
char* str2=(char*)(Marshal::StringToHGlobalAnsi(textBox2->Text)).ToPointer();
mark = Convert::ToInt16(textBox35->Text);
//here i add TEXTBOXES to my generic list...not objects of my stl list
primo->Add(textBox1->Text);
primo->Add(textBox2->Text);
primo->Add(textBox35->Text);
TextWriter ^tw = gcnew StreamWriter("primoAnno.txt", true);
for each(String^ prim in primo){
//here i write my string one by one in column..i want them all in a line!how?
tw->WriteLine(prim);
}
//i also have tried to delete an object..but i dont like the remove..i mean i want all the strings in a line, if i find "cat" for example i want to delete the ENTIRE line..not just "cat"
if(primo->Contains("cat"))tw->WriteLine("ok");primo->Remove("cat");
for each(String^ prim in primo){
tw->WriteLine(prim);
}
tw->Close();
Here is an example of my primoAnno.txt file.
The first time I write (and push the register button) I want this:
cat gae 5
The second time I write (and push the register button again) I want this:
cat gae 5
bla bla 1
Then, when I remove (if there is "cat" in a line, delete that line) I want this:
bla bla 1
Hope it's useful. Thanks to anyone who replies.

Extract data from CSV with Regex and convert it to JSON

Imagine you have a table in a CSV file with this kind of layout:
name,property1 [unit1],property2 [unit2]
name1,4.5,2.3
name2,3.2,7.4
name3,5.5,6.1
I need to convert each row to this kind of JSON structure (ie, for row 1):
{
"name1": [
{
"properties": [
{
"property_1": "_value_",
"unit": "unit1"
},
{
"property_2": "_value_",
"unit": "unit2"
}
]
}
]
}
On top of it all, I have to explain that I am using Qt 4.7 and can't update; also, I can't install Qxt, so I'm relying on qt-json for the JSON parsing/encoding. Moreover, the CSV file is not created or maintained by me, so I can't really change it either.
So with all of this, I realised I need a few things, which makes this a kind of multiple question:
How should I write the RegEx to read the unit in each column's header? Please note that the unit is enclosed in square brackets.
Imagine I extract both the header row and the other rows into a QList<QString>, separating each column as a string. How can I sync all the bits of data in order to create the JSON structure I need in a QString? (I think I need it in a QString so I can dump each row in a different file, but I'm open to other options as well.)
Just one final note: I also need this to be somewhat scalable. The CSV files to which this will be applied are very heterogeneous in column count: some have 8 columns, others have 20.
I know it is not a good practice to post "multiquestions", but the thing is I'm feeling too overwhelmed with all of this, and because I have virtually no experience with Qt, I can't even define a plan to attack this. Hope someone can share some pointers. Thanks!
EDIT
So, I've been thinking a little more about this and I don't actually know if this is a good idea/feasible but here is what I thought of:
when going through the header row, I would check if each column string had a hit for the RegEx. If so, I would store the column index and the unit string in a list;
then, when going through the other rows, in order to parse them into JSON, I would check in each column if it matched the index in the previous list, and if so, I would then add the unit to the map (as qt-json docs explains)
Does this make any sense? Can anyone mock up a skeleton I can work on for this?
EDIT2
I've managed to get a few things working so far, but still not working as it should. Right now I have managed to read properly from the CSV file, but the output isn't coming out right. Can anyone share some insight?
NOTE: the processLineFromCSV function returns a QStringList obtained like so: QStringList cells = line.split(separator_char);
NOTE2: the RegEx was obtained from this answer.
NOTE3: Check below for the type of output I'm getting. Right now I think the problem relates more to the usage of the qt-json lib than actually the rest of the code, but any help is welcome! :)
The code so far:
QFile file(csvfile);
if (file.open(QIODevice::ReadOnly | QIODevice::Text))
{
bool first = true;
QVariantMap map;
QVariantMap propertyMap;
QList<QVariant> generalList, propertiesList;
while (!file.atEnd())
{
QString line = file.readLine();
if(first == true){
headerList = processLineFromCSV(line, separator_char);
first = false;
}else{
QStringList cellList = processLineFromCSV(line, separator_char);
int i=0;
for(i; i<cellList.size(); i++)
{
// check the header cell for "[unit]" string
// returns -1 if does not have the string
// if it has the string, it's stored in capturedUnits[1]
int test = exp.indexIn(headerList.at(i));
// store the captured units in a QStringList
QStringList capturedUnits = exp.capturedTexts();
if(test==-1){ // if header does not have a captured unit - general column
QString name = headerList.at(i);
QString sanitizeName= name.remove(exp.capturedTexts().at(0), Qt::CaseSensitive);
map[sanitizeName] = cellList.at(i);
}
else{ // if header string has a captured unit - property column
QString propertyName = headerList.at(i); // extract string in header
QString sanitizedPropertyName = propertyName.remove(exp); //remove the unit regex from the string
sanitizedPropertyName.remove(QChar('\n'), Qt::CaseSensitive); // clear newlines
if(sanitizedPropertyName.startsWith('"') && sanitizedPropertyName.endsWith('"'))
{
sanitizedPropertyName.remove(0,1);
sanitizedPropertyName.remove(sanitizedPropertyName.length(),1);
}
QString value =cellList.at(i); // extract string in value
QString sanitizedValue = value.remove(QChar('\n'), Qt::CaseSensitive); // clear newlines
if(sanitizedValue.startsWith('"') && sanitizedValue.endsWith('"'))
{
sanitizedValue.remove(0,1);
sanitizedValue.remove(sanitizedValue.length(),1);
}
propertyMap[sanitizedPropertyName]= sanitizedValue; // map the property: value pair
propertyMap["unit"] = capturedUnits.at(1); // map the unit: [unit] value pair
QByteArray general = QtJson::serialize(map); // serialize the pair for general column
QByteArray properties = QtJson::serialize(propertyMap); // serialize the pair for property column
QVariant genVar(general);
QVariant propVar(properties);
generalList.append(genVar);
propertiesList.append(propVar);
}
}
}}
QByteArray finalGeneral = QtJson::serialize(generalList);
QByteArray finalProperties = QtJson::serialize(propertiesList);
qDebug() << finalGeneral;
qDebug() << finalProperties;
file.close();
}
The output:
"[
"{ \"name\" : \"name1\" }",
"{ \"name\" : \"name1\" }",
"{ \"name\" : \"name2\" }",
"{ \"name\" : \"name2\" }",
"{ \"name\" : \"name3\" }",
"{ \"name\" : \"name3\" }"
]"
"[
"{ \"property1 \" : \"4.5\", \"unit\" : \"unit1\" }",
"{ \"property1 \" : \"4.5\", \"property2 \" : \"2.3\", \"unit\" : \"unit2\" }",
"{ \"property1 \" : \"3.2\", \"property2 \" : \"2.3\", \"unit\" : \"unit1\" }",
"{ \"property1 \" : \"3.2\", \"property2 \" : \"7.4\", \"unit\" : \"unit2\" }",
"{ \"property1 \" : \"5.5\", \"property2 \" : \"7.4\", \"unit\" : \"unit1\" }",
"{ \"property1 \" : \"5.5\", \"property2 \" : \"6.1\", \"unit\" : \"unit2\" }"
]"
This should be a good start for you:
QString csv = "name,property1 [unit1],property2 [unit2],property3 [unit3]\n"
"name1,4.5,2.3\n"
"name2,3.2,7.4\n"
"name3,5.5,6.1,4.3\n";
QStringList csvRows = csv.split('\n', QString::SkipEmptyParts);
QStringList csvHeader = csvRows.takeFirst().split(',');
csvHeader.removeFirst();
foreach(QString row, csvRows) {
QStringList values = row.split(',');
QString rowName = values.takeFirst();
QVariantList properties;
for(int i = 0; i < values.size(); i++) {
QString value = values[i];
QStringList propParts = csvHeader[i].split(' ');
QString propName = propParts[0];
QString propType = propParts[1].mid(1, propParts[1].size() - 2);
QVariantMap property;
property[propName] = value;
property["unit"] = propType;
properties.append(property);
}
QVariantMap propertyObj;
propertyObj["properties"] = properties;
QVariantList propList;
propList.append(propertyObj);
QVariantMap root;
root[rowName] = propList;
QByteArray json = QtJson::serialize(root);
qDebug() << json;
// Now you can save json to a file
}
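To make the target structure explicit without depending on qt-json, here is a minimal standard C++ sketch that assembles the same per-row JSON as a string. Column, jsonQuote and rowToJson are hypothetical names, and the escaping only covers quotes and backslashes, on the assumption that cells contain no control characters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical column description: property name and its unit.
struct Column {
    std::string name;
    std::string unit;
};

// Wraps text in double quotes, escaping '"' and '\\'.
// Assumption: cells contain no control characters.
std::string jsonQuote(const std::string& text) {
    std::string out = "\"";
    for (char c : text) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out + "\"";
}

// Builds {"rowName":[{"properties":[{"prop":"value","unit":"u"},...]}]}.
// Values without a matching column are ignored.
std::string rowToJson(const std::string& rowName,
                      const std::vector<Column>& columns,
                      const std::vector<std::string>& values) {
    std::string json = "{" + jsonQuote(rowName) + ":[{\"properties\":[";
    for (std::size_t i = 0; i < values.size() && i < columns.size(); ++i) {
        if (i > 0) json += ",";
        json += "{" + jsonQuote(columns[i].name) + ":" + jsonQuote(values[i]) +
                ",\"unit\":" + jsonQuote(columns[i].unit) + "}";
    }
    return json + "]}]}";
}
```

In a Qt 4.7 project the QVariantMap/QVariantList route shown above is likely preferable, since the serializer then takes care of escaping.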
Joum.
Just seen your response to my comment. I don't have much experience with Qt either, but here is a quick outline.
Extract the data one line at a time, and 'split' it into an array. If you are using CSV, you need to be sure that no data points contain a comma, or the split will result in a real mess. Check with whoever extracted the data whether they can use another, less common separator (e.g. '|' is good). If your data is all numeric that is great, but be wary of locales that use the comma as a decimal separator :(
I hope that you have 1 'table' per file; if not, you need to be able to 'identify' when a new table starts somehow. This could be interesting / fun, depending on your outlook ;).
At the end you will have a collection of 'string arrays' (a table of some sort); hopefully the first is your header info. If you have multiple tables, you will deal with them one at a time.
You should now be able to 'output' each table in good JSON format.
Getting your 'units' from the header rows: if you know in advance where they are located (i.e. the index in the array), you can plan for extracting the info (using a regex if you wish) at the correct index locations.
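As a rough illustration of the header step, the sketch below pulls the unit out of a cell such as "property1 [unit1]" with std::regex. HeaderCell and parseHeaderCell are hypothetical names, and the pattern simply looks for the first bracketed group; it assumes C++17 for std::optional.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type: column name plus an optional unit.
struct HeaderCell {
    std::string name;
    std::optional<std::string> unit;
};

// Splits a header cell such as "property1 [unit1]" into name and unit.
// A cell without brackets (e.g. "name") yields no unit.
HeaderCell parseHeaderCell(const std::string& cell) {
    static const std::regex unitPattern(R"(\[([^\]]+)\])");
    HeaderCell result;
    std::smatch match;
    if (std::regex_search(cell, match, unitPattern)) {
        result.unit = match[1].str();
        // Keep everything around the "[unit]" part as the name.
        result.name = match.prefix().str() + match.suffix().str();
    } else {
        result.name = cell;
    }
    // Trim trailing whitespace left behind after removing "[unit]".
    const auto end = result.name.find_last_not_of(" \t\r\n");
    result.name.erase(end == std::string::npos ? 0 : end + 1);
    return result;
}
```

Running this once over the header row gives, per column index, the name and the unit to attach to each value in the data rows.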
Last point.
If your CSV file is very long (hundreds of lines), just grab the first few lines into a new test file for quicker debugging. Once you are happy, enlarge it a bit and check the output format, and once you are happy that there are no other bugs, run it on the whole file.
Likewise, if you have multiple tables in your file, start with the first one only, then add the first part of a second, test, add a third, test, and so on until you are happy.
David.
A possibly better solution, after reading your comment about wanting some form of 'synchronisation'.
NOTE: this may seem a little more complex, but I think it would be a more flexible solution in the end. Also, does this data not exist in a DB somewhere (who gave it to you?), and can they give you direct read access to the underlying DB and tables? If so, you can jump straight to the 'output each table to JSON' step.
Using an embedded DB (i.e. SQLite):
Extract the first 'header' row, and create a table in your DB that follows the info there (you should be able to add info regarding units to the 'metadata' ie a description). If all your files are the same you could even import all the data into the same single table, or auto create a new table (assuming the same format) for each new file using the same create table statement.
I'm sure there is a 'csvimport' in SQLite (I haven't checked the docs yet, and haven't done this in a while) or someone has written a library that will do this.
Output each table to JSON format, again I'm sure someone has written a library for this.
Using the answer by ExplodingRat, this is the final code (without the file creation at the end):
QString csvfile = ui->lineEditCSVfile->text();
QString separator_char = ui->lineEditSeparator->text();
QRegExp exp("\\[([^\\]]+)\\]");
QFile file(csvfile);
if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
return;
QString csv = file.readAll();
QStringList csvRows = csv.split('\n', QString::SkipEmptyParts);
QStringList csvHeader = csvRows.takeFirst().split(separator_char);
csvHeader.removeFirst();
foreach(QString row, csvRows) {
QStringList values = row.split(separator_char);
QString rowName = values.takeFirst();
QVariantList general;
QVariantList properties;
for(int i = 0; i < values.size(); i++) {
QString value = values[i];
int test = exp.indexIn(csvHeader[i]);
//qDebug() << test;
//qDebug() << csvHeader;
QStringList capturedUnits = exp.capturedTexts();
QString propName = csvHeader[i];
if(test==-1){
//QString propName = csvHeader[i].remove(exp);
//qDebug() <<"property name" << propName;
QVariantMap property;
property[propName] = value;
general.append(property);
}else{
propName.remove(exp);
//QStringList propParts = csvHeader[i].split(' ');
//QString propName = csvHeader[i].remove(exp);
QString propType = capturedUnits[1];
QVariantMap property;
property[propName] = value;
property["unit"] = propType;
properties.append(property);
}
}
QVariantMap propertyObj;
propertyObj["properties"] = properties;
QVariantList propList;
propList.append(propertyObj);
QVariantMap generalObj;
generalObj["general"] = general;
QVariantList generalList;
generalList.append(generalObj);
QVariantList fullList;
fullList.append(generalObj);
fullList.append(propertyObj);
QVariantMap root;
root[rowName] = fullList;
QByteArray json = QtJson::serialize(root);
json.prepend('[');
json.append(']');
qDebug() << json;
// Now you can save json to a file