Extract data from CSV with Regex and convert it to JSON - regex

Imagine you have a table in a CSV file with this kind of layout:
name,property1 [unit1],property2 [unit2]
name1,4.5,2.3
name2,3.2,7.4
name3,5.5,6.1
I need to convert each row to this kind of JSON structure (ie, for row 1):
{
"name1": [
{
"properties": [
{
"property_1": "_value_",
"unit": "unit1"
},
{
"property_2": "_value_",
"unit": "unit2"
}
]
}
]
}
On top of it all, I have to explain that I am using Qt 4.7 and can't update; also, I can't install Qxt so I'm relying on qt-json for the JSON parsing/encoding. More, the CSV file is not created/maintained by me, so I can't really change it either.
So with all of this, I realised I need a few things, so this is a kind of multiple question:
how should I write the RegEx to read the unit in each column's header? Please note that the unit is enclosed in rect-parenthesis.
imagine I extract both the header row and the other rows into a QList<QString>, separating each column as a string. How can I manage to sync all the bits of data in order to create the JSON structure I need on a QString? (I think I need it in a QString so I can dump each row in a different file, but I'm open to other options as well)
Just one final note - I also need to this to be somewhat scalable. The CSV files on which this will be apllied are very heterogenous in column count: some have 8 columns, others have 20.
I know it is not a good practice to post "multiquestions", but the thing is I'm feeling too overwhelmed with all of this, and because I have virtually no experience with Qt, I can't even define a plan to attack this. Hope someone can share some pointers. Thanks!
EDIT
So, I've been thinking a little more about this and I don't actually know if this is a good idea/feasible but here is what I thought of:
when going through the header row, I would check if each column string had a hit for the RegEx. If so, I would store the column index and the unit string in a list;
then, when going through the other rows, in order to parse them into JSON, I would check in each column if it matched the index in the previous list, and if so, I would then add the unit to the map (as qt-json docs explains)
Does this make any sense? Can anyone mock up a skeleton I can work on for this?
EDIT2
I've managed to get a few things working so far, but still not working as it should. Right now I have managed to read properly from the CSV file, but the output isn't coming out right. Can anyone share some insight?
NOTE: the processLineFromCSV function returns a QStringList obtained like so: QStringList cells = line.split(separator_char);
NOTE2: the RegEx was obtained from this answer.
NOTE3: Check below for the type of output I'm getting. Right now I think the problem relates more to the usage of the qt-json lib than actually the rest of the code, but any help is welcome! :)
The code so far:
QFile file(csvfile);
if (file.open(QIODevice::ReadOnly | QIODevice::Text))
{
bool first = true;
QVariantMap map;
QVariantMap propertyMap;
QList<QVariant> generalList, propertiesList;
while (!file.atEnd())
{
QString line = file.readLine();
if(first == true){
headerList = processLineFromCSV(line, separator_char);
first = false;
}else{
QStringList cellList = processLineFromCSV(line, separator_char);
int i=0;
for(i; i<cellList.size(); i++)
{
// check the header cell for "[unit]" string
// returns -1 if does not have the string
// if it has the string, it's stored in capturedUnits[1]
int test = exp.indexIn(headerList.at(i));
// store the captured units in a QStringList
QStringList capturedUnits = exp.capturedTexts();
if(test==-1){ // if header does not have a captured unit - general column
QString name = headerList.at(i);
QString sanitizeName= name.remove(exp.capturedTexts().at(0), Qt::CaseSensitive);
map[sanitizeName] = cellList.at(i);
}
else{ // if header string has a captured unit - property column
QString propertyName = headerList.at(i); // extract string in header
QString sanitizedPropertyName = propertyName.remove(exp); //remove the unit regex from the string
sanitizedPropertyName.remove(QChar('\n'), Qt::CaseSensitive); // clear newlines
if(sanitizedPropertyName.startsWith('"') && sanitizedPropertyName.endsWith('"'))
{
sanitizedPropertyName.remove(0,1);
sanitizedPropertyName.remove(sanitizedPropertyName.length(),1);
}
QString value =cellList.at(i); // extract string in value
QString sanitizedValue = value.remove(QChar('\n'), Qt::CaseSensitive); // clear newlines
if(sanitizedValue.startsWith('"') && sanitizedValue.endsWith('"'))
{
sanitizedValue.remove(0,1);
sanitizedValue.remove(sanitizedValue.length(),1);
}
propertyMap[sanitizedPropertyName]= sanitizedValue; // map the property: value pair
propertyMap["unit"] = capturedUnits.at(1); // map the unit: [unit] value pair
QByteArray general = QtJson::serialize(map); // serialize the pair for general column
QByteArray properties = QtJson::serialize(propertyMap); // serialize the pair for property column
QVariant genVar(general);
QVariant propVar(properties);
generalList.append(genVar);
propertiesList.append(propVar);
}
}
}}
QByteArray finalGeneral = QtJson::serialize(generalList);
QByteArray finalProperties = QtJson::serialize(propertiesList);
qDebug() << finalGeneral;
qDebug() << finalProperties;
file.close();
}
The ouput:
"[
"{ \"name\" : \"name1\" }",
"{ \"name\" : \"name1\" }",
"{ \"name\" : \"name2\" }",
"{ \"name\" : \"name2\" }",
"{ \"name\" : \"name3\" }",
"{ \"name\" : \"name3\" }"
]"
"[
"{ \"property1 \" : \"4.5\", \"unit\" : \"unit1\" }",
"{ \"property1 \" : \"4.5\", \"property2 \" : \"2.3\", \"unit\" : \"unit2\" }",
"{ \"property1 \" : \"3.2\", \"property2 \" : \"2.3\", \"unit\" : \"unit1\" }",
"{ \"property1 \" : \"3.2\", \"property2 \" : \"7.4\", \"unit\" : \"unit2\" }",
"{ \"property1 \" : \"5.5\", \"property2 \" : \"7.4\", \"unit\" : \"unit1\" }",
"{ \"property1 \" : \"5.5\", \"property2 \" : \"6.1\", \"unit\" : \"unit2\" }"
]"

This should be a good start for you:
QString csv = "name,property1 [unit1],property2 [unit2],property3 [unit3]\n"
"name1,4.5,2.3\n"
"name2,3.2,7.4\n"
"name3,5.5,6.1,4.3\n";
QStringList csvRows = csv.split('\n', QString::SkipEmptyParts);
QStringList csvHeader = csvRows.takeFirst().split(',');
csvHeader.removeFirst();
foreach(QString row, csvRows) {
QStringList values = row.split(',');
QString rowName = values.takeFirst();
QVariantList properties;
for(int i = 0; i < values.size(); i++) {
QString value = values[i];
QStringList propParts = csvHeader[i].split(' ');
QString propName = propParts[0];
QString propType = propParts[1].mid(1, propParts[1].size() - 2);
QVariantMap property;
property[propName] = value;
property["unit"] = propType;
properties.append(property);
}
QVariantMap propertyObj;
propertyObj["properties"] = properties;
QVariantList propList;
propList.append(propertyObj);
QVariantMap root;
root[rowName] = propList;
QByteArray json = QtJson::serialize(root);
qDebug() << json;
// Now you can save json to a file
}

Joum.
Just seen your response to my comment. I don't have much experience with QT either, but a quick outline....
Extract the data one line at a time, and 'split' it into an array. If you are using CSV you need to be sure that there are no data points that have a comma in them, or the split will result in a real mess. Check with whoever extracted the data if they can use another 'less common' separator (eg a '|' is good). if you data is all numeric that is great, but be wary of locations that use the comma as a decimal separator :(
I hope that you have 1 'table' per file, if not you need to be able to 'identify' when a new table starts somehow, this could be interesting / fun - depends on your outlook ;).
At the end you will have a collection of 'string arrays' (a table of some sort) hopefully the first is your header info. If you have mutliple tables, you will deal with them one at a time
You should now be able to 'output' each table in good JSON format.
Getting your 'units' from the header rows: If you know in advance where they are located (ie the index in the array) you can plan for extracting the info (using a regex if you wish) in the correct index locations.
Last point.
If your csv file is very long (hundreds of lines), just grab the first few into a new test file for quicker debuging, then once you are happy, enlarge it a bit and check the output format... then again once you are happy that there are no other bugs... for the whole file
Likewise if you have multiple tables in your file, start with the first one only, then add the first part of a second... test.... add a third.... test etc etc etc until you are happy
David.
A possibly better solution, after reading your comment about wanting some form of 'synchronisation'.
NOTE: this may seem a little more complex, but I think it would be a more flexible solution in the end. Also does this data not exist in a DB somewhere (who gave it to you?), can they give you direct read access to the underlying DB and tables? if so, you can jump straight to the 'output each table to JSON' step.
using an embeded DB (ie SQLite).
Extract the first 'header' row, and create a table in your DB that follows the info there (you should be able to add info regarding units to the 'metadata' ie a description). If all your files are the same you could even import all the data into the same single table, or auto create a new table (assuming the same format) for each new file using the same create table statement.
I'm sure there is a 'csvimport' in SQLite (I haven't checked the docs yet, and haven't done this in a while) or someone has written a library that will do this.
Output each table to JSON format, again I'm sure someone has written a library for this.

Using the answer by ExplodingRat this is the final code: (without file creation at the end)
QString csvfile = ui->lineEditCSVfile->text();
QString separator_char = ui->lineEditSeparator->text();
QRegExp exp("\\[([^\\]]+)\\]");
QFile file(csvfile);
if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
return;
QString csv = file.readAll();
QStringList csvRows = csv.split('\n', QString::SkipEmptyParts);
QStringList csvHeader = csvRows.takeFirst().split(separator_char);
csvHeader.removeFirst();
foreach(QString row, csvRows) {
QStringList values = row.split(separator_char);
QString rowName = values.takeFirst();
QVariantList general;
QVariantList properties;
for(int i = 0; i < values.size(); i++) {
QString value = values[i];
int test = exp.indexIn(csvHeader[i]);
//qDebug() << test;
//qDebug() << csvHeader;
QStringList capturedUnits = exp.capturedTexts();
QString propName = csvHeader[i];
if(test==-1){
//QString propName = csvHeader[i].remove(exp);
//qDebug() <<"property name" << propName;
QVariantMap property;
property[propName] = value;
general.append(property);
}else{
propName.remove(exp);
//QStringList propParts = csvHeader[i].split(' ');
//QString propName = csvHeader[i].remove(exp);
QString propType = capturedUnits[1];
QVariantMap property;
property[propName] = value;
property["unit"] = propType;
properties.append(property);
}
}
QVariantMap propertyObj;
propertyObj["properties"] = properties;
QVariantList propList;
propList.append(propertyObj);
QVariantMap generalObj;
generalObj["general"] = general;
QVariantList generalList;
generalList.append(generalObj);
QVariantList fullList;
fullList.append(generalObj);
fullList.append(propertyObj);
QVariantMap root;
root[rowName] = fullList;
QByteArray json = QtJson::serialize(root);
json.prepend('[');
json.append(']');
qDebug() << json;
// Now you can save json to a file

Related

QJsonDocument to object is empty

I'm trying to parse a simple JSON data in Qt5.
Code looks like this :
...
socket->readDatagram(Buffer.data(),Buffer.size(),&sender,&senderPort);
QJsonParseError jsonError;
QJsonDocument dataJson = QJsonDocument::fromJson(Buffer.data(),&jsonError);
if (jsonError.error != QJsonParseError::NoError){
qDebug() << jsonError.errorString();
}
QJsonObject map = dataJson.object();
//map["x"].toDouble()
But for some reason my map is empty, here is a debugging snap :
How can I resolve this ?
Data :
'{\"x\":1,\"y\":2,\"z\":3}'
Assuming that you're reading correctly, you should test with a command like this:
echo -n \{\"x\":1,\"y\":2,\"\z\":3\} > /dev/udp/127.0.0.1/8080
So, get rid of single quotes and escape curly brackets.
Even better: put your json data in a myfile file and use cat myfile > /dev/udp/127.0.0.1/8080

How to display Highlighted text using Solrj

I am new to Solr and SolrJ. I am trying to use for a desktop application and I have files(text files) to index and search. I wanted to use highlight feature and display the fragments with highlight,but I don't get them to display in yellow background as you highlight a text, please let me know how to display the text in yellow background.
here is my code snippet:
public void TestHighLight(SolrQuery query) throws
SolrServerException, IOException {
query.setQuery("*");
query.set("hl", "true");
query.set("hl.snippets", "5");
query.set("q", "text:Pune");
query.set("hl.fl", "*");
QueryResponse queryResponse = client.query(query);
SolrDocumentList docs = queryResponse.getResults();
Iterator iter = docs.iterator();
for (int i = 0; i < docs.size(); i++) {
iter = docs.get(i).getFieldNames().iterator();
String fldVal = (String) docs.get(i).getFieldValue("id");
String docID = (String) docs.get(i).get("id");
while (iter.hasNext()) {
String highlighText = getHighlightedText(queryResponse,
"text", docID);
System.out.println(" tHighlightedText is " + highlighText );
}
}
}
The output looks like this:how do I color it ?
[ for Java Developer at Pune
Thanks a lot !
Set the pre and post parameters of the highlighter. Specifies the “tag” to use before a highlighted term. This can be any string, but is most often an HTML or XML tag.
e.g:
solrQueryHandler.setHighlightSimplePre("<font color="yellow">");
solrQueryHandler.setHighlightSimplePost("/font");
But note that this will work only for the Original Highlighter

Creating json string using json lib

I am using jsonc-libjson to create a json string like below.
{ "author-details": {
"name" : "Joys of Programming",
"Number of Posts" : 10
}
}
My code looks like below
json_object *jobj = json_object_new_object();
json_object *jStr1 = json_object_new_string("Joys of Programming");
json_object *jstr2 = json_object_new_int("10");
json_object_object_add(jobj,"name", jStr1 );
json_object_object_add(jobj,"Number of Posts", jstr2 );
this gives me json string
{
"name" : "Joys of Programming",
"Number of Posts" : 10
}
How do I add the top part associated with author details?
To paraphrase an old advertisement, "libjson users would rather fight than switch."
At least I assume you must like fighting with the library. Using nlohmann's JSON library, you could use code like this:
nlohmann::json j {
{ "author-details", {
{ "name", "Joys of Programming" },
{ "Number of Posts", 10 }
}
}
};
At least to me, this seems somewhat simpler and more readable.
Parsing is about equally straightforward. For example, let's assume we had a file named somefile.json that contained the JSON data shown above. To read and parse it, we could do something like this:
nlohmann::json j;
std::ifstream in("somefile.json");
in >> j; // Read the file and parse it into a json object
// Let's start by retrieving and printing the name.
std::cout << j["author-details"]["name"];
Or, let's assume we found a post, so we want to increment the count of posts. This is one place that things get...less tasteful--we can't increment the value as directly as we'd like; we have to obtain the value, add one, then assign the result (like we would in lesser languages that lack ++):
j["author-details"]["Number of Posts"] = j["author-details"]["Number of Posts"] + 1;
Then we want to write out the result. If we want it "dense" (e.g., we're going to transmit it over a network for some other machine to read it) we can just use <<:
somestream << j;
On the other hand, we might want to pretty-print it so a person can read it more easily. The library respects the width we set with setw, so to have it print out indented with 4-column tab stops, we can do:
somestream << std::setw(4) << j;
Create a new JSON object and add the one you already created as a child.
Just insert code like this after what you've already written:
json_object* root = json_object_new_object();
json_object_object_add(root, "author-details", jobj); // This is the same "jobj" as original code snippet.
Based on the comment from Dominic, I was able to figure out the correct answer.
json_object *jobj = json_object_new_object();
json_object* root = json_object_new_object();
json_object_object_add(jobj, "author-details", root);
json_object *jStr1 = json_object_new_string("Joys of Programming");
json_object *jstr2 = json_object_new_int(10);
json_object_object_add(root,"name", jStr1 );
json_object_object_add(root,"Number of Posts", jstr2 );

Qt and JSON resource parsing - Empty QJSONDocument

I've got a trouble while parsing JSON using QJON objects.
I read a json file of mine referenced in a resource file, read the content and try to initialize a QJSONDocument from the QString I got. And it seems it's not working
Here is the code I use :
QFile myFile(":/mime/iconemapping.json");
myFile.open(QIODevice::ReadOnly);
QJsonDocument jsonContent;
QJsonObject root;
QString jsonString = QString::fromUtf8(myFile.readAll()).simplified();
jsonContent = QJsonDocument::fromJson(jsonString.toUtf8());
root = jsonContent.object();
QJsonObject ext = root["extensions"].toObject();
QStringList listeCle = ext.keys();
int s = listeCle.size();
for (int i = 0; i < listeCle.size(); i++) {
QString cle = listeCle.at(i).toLocal8Bit().constData();
MAP_ICONE_MIME.insert(cle, ext[cle].toString());
}
myFile.close();
Before I try QJSONDocument::fromJson() my jsonString contains : { "extensions" : { ".7z" : ":/mime/7zip.png", ".ace" : ":/mime/ace.png", ".ai" : ":/mime/ai.png", ".eps" : ":/mime/ai.png", ".alg" : ":/mime/algobox.png", ".rar" : ":/mime/archive.png", ".aiff" : ":/mime/audio-x-generic.png"}. (there is more data but I think you get it).
The program doesn't stop unexpectedly but listeCle.size() is always 0.
I tried to access directly to ext[".7z"].toString() but I still get "" as a result.
I probably made an enormous mistake, but until now that's the only JSON parsing that fails in the program.
Would you have any explanation or clue ?
Thank you for everything
So the JSON was not valid. I recommend using http://jsonformatter.curiousconcept.com/ in the future, it's a great website.
{
"extensions":{
".7z":":/mime/7zip.png",
".ace":":/mime/ace.png",
".ai":":/mime/ai.png",
".eps":":/mime/ai.png",
".alg":":/mime/algobox.png",
".rar":":/mime/archive.png",
".aiff":":/mime/audio-x-generic.png"
}
}

JSON parsing with Qt

Hei , I have this text in a JSON :
( without the returns all in one line)
[
{
"ERROR":false,
"USERNAME":"Benutzer",
"FORMAT":"HUMAN",
"LATITUDE_MIN":84,
"LATITUDE_MAX":36,
"LONGITUDE_MIN":5,
"LONGITUDE_MAX":20,
"RECORDS":203
},
[
{
"MMSI":233434540,
"TIME":"2014-10-09 06:19:06 GMT",
"LONGITUDE":8.86037,
"LATITUDE":54.12666,
"COG":347,
"SOG":0,
"HEADING":236,
"NAVSTAT":0,
"IMO":0,
"NAME":"HELGOLAND",
"CALLSIGN":"DK6068",
"TYPE":90,
"A":20,
"B":15,
"C":4,
"D":4,
"DRAUGHT":2,
"DEST":"BREMERHAVEN",
"ETA":"00-00 00:00"
},
{
"MMSI":319072300,
"TIME":"2014-10-09 06:08:53 GMT",
"LONGITUDE":9.71578,
"LATITUDE":54.31949,
"COG":343.6,
"SOG":0,
"HEADING":197,
"NAVSTAT":5,
"IMO":1012189,
"NAME":"M.Y. ESTER III",
"CALLSIGN":"ZGED3",
"TYPE":37,
"A":31,
"B":35,
"C":7,
"D":6,
"DRAUGHT":3.5,
"DEST":"SCHACT AUDORF",
"ETA":"09-16 08:00"
}
// many more lines but the Json IS VALID.
]
]
I would parse it and put that in a MYSQL table.
Not all, only name and MMSI first.
But this don't view anything in my consle because its dont jump in the foreach:
bool ok = true;
// my json data is in reply & ok is a boolean
QVariantList result = parser.parse(reply, &ok).toList();
foreach(QVariant record, result) {
QVariantMap map = record.toMap();
qDebug() << map.value("NAME");
}
What's wrong ?
When i debug, i only see that it doesn't jump in the foreach.
I use the QJson libary : QJson::Parser parser; But please anyone can tell me what i do wrong?
Your code looks like you are iterating over top level array, while the data you are looking for is in the nested array, which is effectively the second item of the top level array. So, you need to iterate over items in the inner array.
The following code works for me with your sample JSON:
QVariantList result = parser.parse(reply, &ok).toList().at(1).toList();
foreach (const QVariant &item, result) {
QVariantMap map = item.toMap();
qDebug() << map["NAME"].toString();
qDebug() << map["MMSI"].toLongLong();
}
If you are using Qt5 or above, you can make use of the awesome features provided by QJsonDocument, QJsonObject and QJsonArray.
I have copied your json into a file named test.txt in my D-drive and the code below works fine.
QJsonDocument jsonDoc;
QByteArray temp;
QFile file("D://test.txt");
if (file.open(QIODevice::ReadOnly | QIODevice::Text))
{
temp = file.readAll();
}
jsonDoc = QJsonDocument::fromJson(temp);
QJsonArray jsonArray = jsonDoc.array().at(1).toArray(); //Since you are interested in the json array which is the second item and not the first json object with error
for(int i =0; i < jsonArray.size(); ++i)
{
QJsonObject jsonObj = jsonArray.at(i).toObject();
int mmsi = jsonObj.find("MMSI").value().toInt();
QString name = jsonObj.find("NAME").value().toString();
qDebug() << mmsi;
qDebug() << name;
}
If you have to stick to Qt4, you can try using the qjson4 library which tries to mimic the behavior of the 'QJsonDocument' which is part of Qt5.