How do I parse a text file into variables using regex in C++?

Please help me fulfill my dreams of turning this sequence into a meaningful output. :)
See regex in action, it works!: http://regex101.com/r/iM4yN2/1
Now all I need is to know how to use it. If I could put this into a multidimensional array e.g. configFile[0][0] = [Tuner,] that would work. Or if I could turn this into a comma separated list, I could then parse that again and put it into arrays and finally out to individual variables. Anyway, you don't need to spell out how to actually assign the variables, I'll create another question if I really need help with that. Mainly I need help with the use of regex functions and outputting data into SOME variable where I can access the various text on either side of the = sign per line.
regex:
^[\t ]*(.*?)\s*=[\t ]*(.*?)(#.*)?$
test string:
### MODULES ###
Tuner =
PitchDetector = 0
PhaseLocker = 0
FileOutput = 1
### FILE MANAGER ###
RenameFile_AvgFreq = dfgsdfg dsf gdfs g #gdrgk
RenameFile_NoteName = 0
RenameFile_Prefix = "The String Is Good"
RenameFile_Suffix = ""
OutputFolder = "..\Folder\String\"
### PITCH DETECTOR ###
AnalysisChannel = 1 #int starting from 1
BlockSize = 8 #power of 2
Overlap = 16 #power of 2
NormalizeForDetection = 0
### TUNER ###
Smoothing = 0.68
Envelope = 0.45
### PHASELOCKER ###
FFTSize = 1024 #powert of 2
FFTOverlap = 54687
WindowType = 0
MaxFreq = 5000
my variables:
//Modules
bool Tuner;
bool PitchDetector;
bool PhaseLocker;
bool FileOutput;
//File Manager
bool RenameFile_AvgFreq;
bool RenameFile_NoteName;
std::string RenameFile_Prefix;
std::string RenameFile_Suffix;
std::string OutputFolder;
//Pitch Detector
int AnalysisChannel;
int BlockSize;
int Overlap;
bool NormalizeForDetection;
//Tuner
float Smoothing;
float Envelope;
//Phaselocker
int FFTSize;
int FFTOverlap;
int FFTWindowType;
float FFTMaxFreq;
final notes: i spent a long time looking at c++ regex functions... very confusing stuff. I know how to do this in python without thinking twice.

Include the following:
#include <string>
#include <regex>
Declare a string and regex type:
std::string s;
std::regex e;
In your main function, assign string and regex variables and call regex function (you could assign the variables when you declare them as well):
int main()
{
s = "i will only 349 output 853 the numbers 666";
e = "(\\d+)";
s = std::regex_replace(s, e, "$1\n", std::regex_constants::format_no_copy);
return 0;
}
Notice how I am putting the result right back into the string (s). Of course, you could use a different string to store the result. The std::regex_constants::format_no_copy flag tells regex_replace not to copy the parts of the input that don't match, so the output contains only the formatted matches (here, capture group 1 of each match). Also notice the double backslash in "\\d+": inside a C++ string literal a backslash must itself be escaped. Try doubling the backslashes if your regex pattern isn't working.
To find key/value pairs with regex, e.g. "BlockSize = 1024", you could create a pattern such as:
BlockSize\s*=\s*((?:[\d.]+)|(?:".*"))
in c++ you could create that regex pattern with:
expr = key+"\\s*=\\s*((?:[\\d.]+)|(?:\".*\"))";
and return the match with:
config = std::regex_replace(config, expr, "$1", std::regex_constants::format_no_copy);
and put it all together in a function with the ability to return a default value:
std::string Config_GetValue(std::string key, std::string config, std::string defval)
{
// \b keeps a key such as "Overlap" from also matching inside "FFTOverlap"
std::regex expr("\\b" + key + "\\s*=\\s*((?:[\\d.]+)|(?:\".*\"))");
config = std::regex_replace(config, expr, "$1", std::regex_constants::format_no_copy);
return config == "" ? defval : config;
}
FULL CODE (using std::stoi and std::stof to convert string to number when needed, and using auto type because right-hand side (RHS) makes it clear what the type is):
#include "stdafx.h"
#include <string>
#include <regex>
#include <iostream>
std::string Config_GetValue(std::string key, std::string config, std::string defval)
{
// \b keeps a key such as "Overlap" from also matching inside "FFTOverlap"
std::regex expr("\\b" + key + "\\s*=\\s*((?:[\\d.]+)|(?:\".*\"))");
config = std::regex_replace(config, expr, "$1", std::regex_constants::format_no_copy);
return config == "" ? defval : config;
}
int main()
{
//test string
std::string s = " ### MODULES ###\nTuner = \n PitchDetector = 1\n PhaseLocker = 0 \nFileOutput = 1\n\n### FILE MANAGER ###\nRenameFile_AvgFreq = dfgsdfg dsf gdfs g #gdrgk\nRenameFile_NoteName = 0\n RenameFile_Prefix = \"The String Is Good\"\nRenameFile_Suffix = \"\"\nOutputFolder = \"..\\Folder\\String\\\"\n\n### PITCH DETECTOR ###\nAnalysisChannel = 1 #int starting from 1\nBlockSize = 1024 #power of 2\nOverlap = 16 #power of 2\nNormalizeForDetection = 0\n\n### TUNER ###\nSmoothing = 0.68\nEnvelope = 0.45\n\n### PHASELOCKER ###\nFFTSize = 1024 #powert of 2\nFFTOverlap = 54687\nWindowType = 0\nMaxFreq = 5000";
//Modules
auto FileOutput = stoi(Config_GetValue("FileOutput", s, "0"));
auto PitchDetector = stoi(Config_GetValue("PitchDetector", s, "0"));
auto Tuner = stoi(Config_GetValue("Tuner", s, "0"));
auto PhaseLocker = stoi(Config_GetValue("PhaseLocker", s, "0"));
//File Manager
auto RenameFile_AvgFreq = stoi(Config_GetValue("RenameFile_AvgFreq", s, "0"));
auto RenameFile_NoteName = stoi(Config_GetValue("RenameFile_NoteName", s, "0"));
auto RenameFile_Prefix = Config_GetValue("RenameFile_Prefix", s, "");
auto RenameFile_Suffix = Config_GetValue("RenameFile_Suffix", s, "");
auto OutputFolder = Config_GetValue("OutputFolder", s, "");
//Pitch Detector
auto AnalysisChannel = stoi(Config_GetValue("AnalysisChannel", s, "1"));
auto BlockSize = stoi(Config_GetValue("BlockSize", s, "4096"));
auto Overlap = stoi(Config_GetValue("Overlap", s, "8"));
auto NormalizeForDetection = stoi(Config_GetValue("NormalizeForDetection", s, "0"));
//Tuner
auto Smoothing = stof(Config_GetValue("Smoothing", s, ".5"));
auto Envelope = stof(Config_GetValue("Envelope", s, ".3"));
auto TransientTime = stof(Config_GetValue("TransientTime", s, "0"));
//Phaselocker
auto FFTSize = stoi(Config_GetValue("FFTSize", s, "1"));
auto FFTOverlap = stoi(Config_GetValue("FFTOverlap", s, "1"));
auto FFTWindowType = stoi(Config_GetValue("WindowType", s, "1")); //the config file calls this key WindowType
auto FFTMaxFreq = stof(Config_GetValue("MaxFreq", s, "0.0")); //the config file calls this key MaxFreq
std::cout << "complete";
return 0;
}
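For the original goal of getting at the text on either side of the = sign for every line, a minimal sketch (not part of the answer above) could read the text line by line and store each key/value pair in a std::map. The function name ParseConfig and the slightly tightened pattern are assumptions made for illustration; the sketch assumes '\n' line endings and leaves quoted values with their quotes on.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: turns "key = value  #comment" lines into a key -> value map.
// Lines that do not look like "key = ..." (such as "### MODULES ###") are skipped.
std::map<std::string, std::string> ParseConfig(const std::string& text)
{
    // group 1 = key, group 2 = value (a quoted string, or everything up to a '#')
    static const std::regex line_re(
        R"re(^[\t ]*(\w+)[\t ]*=[\t ]*("[^"]*"|[^#]*?)[\t ]*(?:#.*)?$)re");

    std::map<std::string, std::string> result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_match(line, m, line_re)) {
            result[m[1].str()] = m[2].str();
        }
    }
    return result;
}
```

Each value stays a string, so it can be passed to std::stoi or std::stof where a number is needed, and a key with nothing after the = sign maps to an empty string.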

Another way of doing this is with regex_iterator:
#include <iostream>
#include <regex>
#include <string>
using std::cout;
using std::endl;
using std::regex;
using std::sregex_iterator;
using std::string;
void CreateConfig(string config)
{
//group 1,2,3,4,5 = key,float,int,string,bool
regex expr("^[\\t ]*(\\w+)[\\t ]*=[\\t ]*(?:(\\d+\\.+\\d+|\\.\\d+|\\d+\\.)|(\\d+)|(\"[^\\r\\n:]*\")|(TRUE|FALSE))[^\\r\\n]*$", std::regex_constants::icase);
for (sregex_iterator it(config.begin(), config.end(), expr), itEnd; it != itEnd; ++it)
{
if ((*it)[2] != "") cout << "FLOAT -> " << (*it)[1] << " = " <<(*it)[2] << endl;
else if ((*it)[3] != "") cout << "INT -> " << (*it)[1] << " = " <<(*it)[3] << endl;
else if ((*it)[4] != "") cout << "STRING -> " << (*it)[1] << " = " <<(*it)[4] << endl;
else if ((*it)[5] != "") cout << "BOOL -> " << (*it)[1] << " = " << (*it)[5] << endl;
}
}
int main()
{
string s = "what = 1\n: MODULES\nFileOutput = \"on\" :bool\nPitchDetector = TRuE :bool\nTuner = on:bool\nHarmSplitter = off:bool\nPhaseLocker = on\n\nyes\n junk output = \"yes\"\n\n: FILE MANAGER\nRenameFile AvgFreq = 1 \nRenameFile_NoteName = 0 :bool\nRenameFile_Prefix = \"The Strin:g Is Good\" :string\nRenameFile_Suffix = \"\":string\nOutputFolder = \"..\\Folder\\String\\\" :relative path\n\n: PITCH DETECTOR\nAnalysisChannel = 1 :integer starting from 1\nBlockSize = 8 :power of 2\nOverlap = 16 :power of 2\nNormalizeForDetection = 0 :bool\n\n: TUNER\nSmoothing = 0.68 :float\nEnvelope = 0.45 :float\n\n: PHASE LOCKER\nFFTSize = 1024 :power of 2\nFFTOverlap = 54687 :power of 2\nWindowType = 0 :always set to 0\nMaxFreq = 5000 :float";
CreateConfig(s);
return 0;
}
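Let's break this down. The regex I created uses a ^regexy stuff goes here$ format so that each line of text is considered individually: ^=start of line, $=end of line. (Whether ^ and $ also match at line breaks inside a longer string, rather than only at its very beginning and end, can differ between standard library implementations, so check this on your compiler.) The regex looks for: variable_name = decimal OR number OR string OR (true OR false). Because each type is stored in its own group, we know what type every match is going to be.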
To explain the for loop, I will write the code a few different ways:
//You can declare more than one variable of the same type:
for (sregex_iterator var1(str.begin(), str.end(), regexExpr), var2; var1 != var2; var1++)
//Or you can declare them outside the for loop:
sregex_iterator var1(str.begin(), str.end(), regexExpr);
sregex_iterator var2;
for (; var1 != var2; var1++)
//Or the more classic way:
sregex_iterator var1(str.begin(), str.end(), regexExpr);
for (sregex_iterator var2; var1 != var2; var1++)
Now for the body of the for loop. It says: "If group 2 is not blank, print group 2, which is a float. If group 3 is not blank, print group 3, which is an int. If group 4 is not blank, print group 4, which is a string. If group 5 is not blank, print group 5, which is a bool." Inside the loop, the syntax is:
//group0 is the entire text matched by the regex.
//group1 is my key group
//group2/3/4/5 are my values groups float/int/string/bool.
theString = (*iteratorVariableName)[groupNumber]
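As a variation on the loop body, here is a rough sketch that collects the typed matches instead of printing them. It reuses the same pattern but applies it to one line at a time with std::getline, so it does not depend on how ^ and $ treat line breaks, and it checks sub_match::matched rather than comparing against an empty string. The Setting struct and the ClassifySettings name are made up for this example.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one recognised "key = value" line.
struct Setting {
    std::string key;
    std::string type;   // "float", "int", "string" or "bool"
    std::string value;
};

// Hypothetical helper: classifies each line by the capture group that took part in the match.
std::vector<Setting> ClassifySettings(const std::string& config)
{
    // group 1,2,3,4,5 = key,float,int,string,bool (same pattern as in the answer)
    static const std::regex expr(
        R"re(^[\t ]*(\w+)[\t ]*=[\t ]*(?:(\d+\.+\d+|\.\d+|\d+\.)|(\d+)|("[^\r\n:]*")|(TRUE|FALSE))[^\r\n]*$)re",
        std::regex_constants::icase);
    static const char* const kTypes[] = {"float", "int", "string", "bool"};

    std::vector<Setting> out;
    std::istringstream in(config);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (!std::regex_match(line, m, expr)) {
            continue;  // headers, junk, and unsupported values such as "on"
        }
        for (int g = 2; g <= 5; ++g) {
            if (m[g].matched) {  // only one of the value groups participates
                out.push_back({m[1].str(), kTypes[g - 2], m[g].str()});
                break;
            }
        }
    }
    return out;
}
```

Storing the type name next to the value makes it straightforward to dispatch to std::stof, std::stoi, or a bool conversion afterwards.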

Related

TDLib: how to send bold text in the message? (C++)

Using the official TDLib C++ example, I'm trying to send a message with formatted markdown text.
Here's my code:
auto send_message = td_api::make_object<td_api::sendMessage>();
send_message->chat_id_ = -1001424068198;
auto message_content = td_api::make_object<td_api::inputMessageText>();
std::string text = "Hello! **how are u?**";
message_content->text_ = td_api::make_object<td_api::formattedText>();
message_content->text_->text_ = std::move(text);
send_message->input_message_content_ = std::move(message_content);
send_query(std::move(send_message), {});
I expect to see "Hello! how are u?" with the second part in bold, but the message arrives exactly as it is written in the code, without the markdown formatting applied.
I spent hours on google trying to figure out how to force TDLib to parse it.
UPDATE: SOLVED!
Thanks Azeem for help!
Using this example, the following code should send the parsed message (tested in VS 2019)
void sendMsg(INT64 chatID, INT64 ReplyTo, const char* textMsg) {
const std::string text = textMsg;
auto textParseMarkdown = td_api::make_object<td_api::textParseModeMarkdown>(2);
auto parseTextEntities = td_api::make_object<td_api::parseTextEntities>(text, std::move(textParseMarkdown));
td::Client::Request parseRequest{ 123, std::move(parseTextEntities) };
auto parseResponse = td::Client::execute(std::move(parseRequest));
if (parseResponse.object->get_id() == td_api::formattedText::ID) {
auto formattedText = td_api::make_object<td_api::formattedText>();
formattedText = td_api::move_object_as<td_api::formattedText>(parseResponse.object);
auto send_message = td_api::make_object<td_api::sendMessage>();
send_message->chat_id_ = chatID;
auto message_content = td_api::make_object<td_api::inputMessageText>();
message_content->text_ = std::move(formattedText);
send_message->input_message_content_ = std::move(message_content);
send_message->reply_to_message_id_ = ReplyTo;
send_query(std::move(send_message), {});
}
}
You can use td_api::textParseModeMarkdown, td_api::parseTextEntities and td::Client::execute() like this:
using namespace td;
const std::string text = "*bold* _italic_ `code`";
auto textParseMarkdown = td_api::make_object<td_api::textParseModeMarkdown>( 2 );
auto parseTextEntities = td_api::make_object<td_api::parseTextEntities>( text, std::move( textParseMarkdown ) );
td::Client::Request parseRequest { 123, std::move( parseTextEntities ) };
auto parseResponse = td::Client::execute( std::move( parseRequest ) );
auto formattedText = td_api::make_object<td_api::formattedText>();
if ( parseResponse.object->get_id() == td_api::formattedText::ID )
{
formattedText = td_api::move_object_as<td_api::formattedText>( parseResponse.object );
}
else
{
std::vector<td_api::object_ptr<td_api::textEntity>> entities;
formattedText = td_api::make_object<td_api::formattedText>( text, std::move(entities) );
}
std::cout << td_api::to_string( formattedText ) << '\n';
For debugging purposes, you can use td_api::to_string() to dump the contents of an object. For example, dumping parseTextEntities like this:
std::cout << td_api::to_string( parseTextEntities ) << '\n';
would give this:
parseTextEntities {
text = "*bold* _italic_ `code`"
parse_mode = textParseModeMarkdown {
version = 2
}
}

Search a string against multiple string arrays

I have an input string and need to run through it and see if it matches certain words. I have multiple string arrays but am not sure what an efficient way is to check the string against all the arrays.
String Arrays:
string checkPlayType(string printDescription)
{
const string DeepPassRight[3] = {"deep" , "pass" , "right"};
const string DeepPassLeft[3] = {"deep" , "pass" , "left"};
const string DeepPassMiddle[3] = {"deep" , "pass" , "middle"};
const string ShortPassRight[3] = {"short" , "pass" , "right"};
const string ShortPassLeft[3] = {"short" , "pass" , "left"};
const string ShortPassMiddle[3] = {"short" , "pass" , "middle"};
//Must contain right but not pass
const string RunRight = "right";
//Must contain left but not pass
const string RunLeft = "left";
//Must contain middle but not pass
const string RunMiddle = "middle";
const string FieldGoalAttempt[2] = {"field" , "goal" };
const string Punt = "punt";
}
Sample Input: (13:55) (Shotgun) P.Manning pass incomplete short right to M.Harrison.
Assuming this is our only input...
Sample Output:
Deep Pass Right: 0%
Deep Pass Left: 0%
Deep Pass Middle: 0%
Short Pass Right: 100%
Short Pass Left: 0%
...
..
..
you may want something similar to:
void checkPlayType(const std::vector<std::string>& input)
{
std::set<std::string> s;
for (const auto& word : input) {
s.insert(word);
}
const bool deep_present = s.count("deep");
const bool pass_present = s.count("pass");
const bool right_present = s.count("right");
const bool left_present = s.count("left");
// ...
if (deep_present && pass_present && right_present) { /* increase DeepPassRight counter */}
if (deep_present && pass_present && left_present) { /* increase DeepPassLeft counter */}
// ...
}
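The set-based check expects the description to be split into words already. As a rough sketch of how the sample input could be tokenized and classified in one go, the code below lower-cases the text, treats every non-letter as a separator, and then tests for the keywords from the question. The names extractWords and classifyPlay, the returned labels, and the priority order of the checks are assumptions made for illustration.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: the lower-cased alphabetic words of a play description.
std::set<std::string> extractWords(const std::string& description)
{
    std::set<std::string> words;
    std::string current;
    for (unsigned char c : description) {
        if (std::isalpha(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            words.insert(current);  // any non-letter ends the current word
            current.clear();
        }
    }
    if (!current.empty()) {
        words.insert(current);
    }
    return words;
}

// Hypothetical classifier using the keyword rules from the question.
std::string classifyPlay(const std::string& description)
{
    const std::set<std::string> words = extractWords(description);
    auto has = [&words](const char* word) { return words.count(word) > 0; };

    if (has("punt")) return "Punt";
    if (has("field") && has("goal")) return "Field Goal Attempt";

    std::string direction;
    if (has("right")) direction = "Right";
    else if (has("left")) direction = "Left";
    else if (has("middle")) direction = "Middle";
    if (direction.empty()) return "Unknown";

    if (has("pass")) {
        if (has("deep")) return "Deep Pass " + direction;
        if (has("short")) return "Short Pass " + direction;
        return "Unknown";
    }
    return "Run " + direction;  // contains a direction but not "pass"
}
```

Counting how often each label comes back over all input lines would then give the percentages shown in the sample output.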
Try regular expressions:
if found "pass" then
if regexp "(deep|short).*(left|right|middle)"
Hooray!
else if regexp "(left|right|middle).*(deep|short)"
Hooray!
else
Aye, Caramba!
else
Aye, Caramba!
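The pseudocode above could be sketched with std::regex roughly as follows. The function name isDirectedPass is hypothetical, and the \b word boundaries and case-insensitive matching are additions that the pseudocode does not spell out.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the description mentions "pass" together with
// a depth word and a direction word, in either order.
bool isDirectedPass(const std::string& description)
{
    static const std::regex pass_re(R"(\bpass\b)", std::regex::icase);
    static const std::regex depth_first(
        R"(\b(deep|short)\b.*\b(left|right|middle)\b)", std::regex::icase);
    static const std::regex direction_first(
        R"(\b(left|right|middle)\b.*\b(deep|short)\b)", std::regex::icase);

    if (!std::regex_search(description, pass_re)) {
        return false;  // the outer "Aye, Caramba!" branch
    }
    return std::regex_search(description, depth_first) ||
           std::regex_search(description, direction_first);
}
```

If the specific depth and direction are needed, the same searches can be run with a std::smatch argument and the two capture groups read from it.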
You can go over each array and search the input string for every word stored in it. Use std functions for better performance. For example:
const string DeepPassRight[3] = {"deep" , "pass" , "right"};
int i = 0;
for(;i<3;i++)
{
// pad the word with spaces so that only whole words match
// (a word at the very start or end of the input, or next to punctuation, is missed)
string s = " ";
s.append(DeepPassRight[i]);
s.append(" ");
std::size_t found = printDescription.find(s);
if (found ==std::string::npos)
break;
}
if(i == 3)
{
// printDescription contains all DeepPassRight's members!
}
else if(i == 2)
{
// only the first two words were found
}

extract domain between two words

I have in a log file some lines like this:
11-test.domain1.com Logged ...
37-user1.users.domain2.org Logged ...
48-me.server.domain3.net Logged ...
How can I extract each domain without the subdomains? Something between "-" and "Logged".
I have the following code in c++ (linux) but it doesn't extract well. Some function which is returning the extracted string would be great if you have some example of course.
regex_t preg;
regmatch_t mtch[1];
size_t rm, nmatch;
char tempstr[1024] = "";
int start;
rm=regcomp(&preg, "-[^<]+Logged", REG_EXTENDED);
nmatch = 1;
while(regexec(&preg, buffer+start, nmatch, mtch, 0)==0) /* Found a match */
{
strncpy(host, buffer+start+mtch[0].rm_so+3, mtch[0].rm_eo-mtch[0].rm_so-7);
printf("%s\n", tempstr);
start +=mtch[0].rm_eo;
memset(host, '\0', strlen(host));
}
regfree(&preg);
Thank you!
P.S. no, I cannot use perl for this because this part is inside of a larger c program which was made by someone else.
EDIT:
I replace the code with this one:
const char *p1 = strstr(buffer, "-")+1;
const char *p2 = strstr(p1, " Logged");
size_t len = p2-p1;
char *res = (char*)malloc(sizeof(char)*(len+1));
strncpy(res, p1, len);
res[len] = '\0';
which extracts the whole domain, including subdomains, very well.
How can I extract just domain.com or domain.net from abc.def.domain.com?
Is strtok a good option, and how can I work out which dot is the last one?
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>
int main()
{
boost::regex re(".+-(?<domain>.+)\\s*Logged");
std::string examples[] =
{
"11-test.domain1.com Logged ...",
"37-user1.users.domain2.org Logged ..."
};
std::vector<std::string> vec(examples, examples + sizeof(examples) / sizeof(*examples));
std::for_each(vec.begin(), vec.end(), [&re](const std::string& s)
{
boost::smatch match;
if (boost::regex_search(s, match, re))
{
std::cout << match["domain"] << std::endl;
}
});
}
http://liveworkspace.org/code/1983494e6e9e884b7e539690ebf98eb5
something like this with boost::regex. Don't know about pcre.
Is the input in a standard format?
It appears so. Is there a split function?
Edit:
Here is some logic.
Iterate through each line to be parsed.
Find the index of the first string, "-".
Next find the index of the second string, "Logged".
The text between those two indexes is the full domain.
Once you have the full domain, "Split" it on "." into your object of choice (I used an array).
Now that you have the array, take the last two elements and reassemble (concatenate) them to capture only the domain.
NOTE Written in C#
Main method which defines the first value and the second value
static void Main(string[] args)
{
string firstValue ="-";
string secondValue = "Logged";
List<string> domains = new List<string> { "11-test.domain1.com Logged", "37-user1.users.domain2.org Logged","48-me.server.domain3.net Logged"};
foreach (string dns in domains)
{
Debug.WriteLine(Utility.GetStringBetweenFirstAndSecond(dns, firstValue, secondValue));
}
}
Method to parse the string:
public static string GetStringBetweenFirstAndSecond(string str, string firstStringToFind, string secondStringToFind)
{
string domain = string.Empty;
if(string.IsNullOrEmpty(str))
{
//throw an exception, return gracefully, whatever you determine
}
else
{
//This can all be done in one line, but I broke it apart so it can be better understood.
//returns the first occurrence.
//int start = str.IndexOf(firstStringToFind) + 1;
//int end = str.IndexOf(secondStringToFind);
//domain = str.Substring(start, end - start);
//i.e. Definitely not quite as legible, but doesn't create object unnecessarily
//Trim() drops the space that sits between the domain and "Logged"
domain = str.Substring((str.IndexOf(firstStringToFind) + 1), str.IndexOf(secondStringToFind) - (str.IndexOf(firstStringToFind) + 1)).Trim();
string[] dArray = domain.Split('.');
if (dArray.Length > 0)
{
if (dArray.Length > 2)
{
domain = string.Format("{0}.{1}", dArray[dArray.Length - 2], dArray[dArray.Length - 1]);
}
}
}
return domain;
}
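Since the question asks for C++, the same logic could be sketched there with std::string::find and std::string::rfind, which also covers the question of how to locate the last dot. The function name extractDomain is hypothetical, and the sketch simply keeps the last two dot-separated labels, so a suffix such as co.uk would not be handled.

```cpp
#include <string>

// Hypothetical helper: returns the last two labels of the host name found
// between the first '-' and " Logged", or an empty string if the line does not fit.
std::string extractDomain(const std::string& line)
{
    const std::string::size_type dash = line.find('-');
    if (dash == std::string::npos) {
        return "";
    }
    const std::string::size_type end = line.find(" Logged", dash + 1);
    if (end == std::string::npos) {
        return "";
    }
    const std::string host = line.substr(dash + 1, end - dash - 1);

    // rfind searches backwards, so the first hit is the last dot in the host.
    const std::string::size_type last_dot = host.rfind('.');
    if (last_dot == std::string::npos || last_dot == 0) {
        return host;
    }
    const std::string::size_type prev_dot = host.rfind('.', last_dot - 1);
    return prev_dot == std::string::npos ? host : host.substr(prev_dot + 1);
}
```

Returning a std::string avoids the manual malloc/strncpy bookkeeping; in a C-only translation unit the same two backward searches can be done with strrchr on a copy of the host.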

Searching a C++ string and strip off text if present

I have to handle two types of string:
// get application name is simple function which returns application name.
// This can be debug version or non debug version. So return value for this
// function can be for eg "MyApp" or "MyApp_debug".
string appl = getApplicationName();
appl.append("Info.conf");
cout << "Output of string is " << appl << endl;
In the above code appl is MyAppInfo.conf or MyApp_debugInfo.conf.
My requirement is that, whether it is the debug or the non-debug version, I should always get the same output, i.e. MyAppInfo.conf. How can we check for _debug in the string and, if it is present, strip it off so that the output string is always MyAppInfo.conf?
I would wrap getApplicationName() and call the wrapper instead:
std::string getCanonicalApplicationName()
{
const std::string debug_suffix = "_debug";
std::string application_name = getApplicationName();
size_t found = application_name.find(debug_suffix);
if (found != std::string::npos)
{
application_name.replace(found, debug_suffix.size(), ""); // the second argument is a length, not an end position
}
return application_name;
}
See the documentation for std::string::find() and std::string::replace().
string appl = getApplicationName(); // "MyApp" or "MyApp_debug"
size_t pos = appl.find("_debug");
if ( pos != string::npos )
appl.erase(pos, 6); // 6 is the length of "_debug"
appl.append("Info.conf");
cout << appl;
Output is always:
MyAppInfo.conf
See sample output : http://www.ideone.com/x6ZRN

Remove '\n.\n' C++

If i have a string as such
"I am not here... \n..Hello\n.\n....Whats happening"
I want to replace the above string so:
"I am not here... \n..Hello\n. \n....Whats happening"
^ Space added
Just a bit of background on what I'm doing. I'm using sendmail from C++, and \n.\n is sendmail's end-of-message marker. I just created a class that uses sendmail to send mail, but obviously if a user from the outside passes that sequence to sendmail I want it to be neutralized. Here is my message function, just in case:
//Operator to add to the message
void operator<<(string imessage){
if (imessage != ""){ message += imessage; }
}
How would i go about doing this. Thanks in advance :D
This is my last version :)
This code handles the case mentioned by @Greg Hewgill
string& format_text(string& str)
{
const string::size_type dot_offset = 2; // offset of the second '\n' within "\n.\n"
// A message that starts with ".\n" has no leading '\n', so handle it separately.
if(str.compare(0, 2, ".\n") == 0)
str.insert(1, " ");
string::size_type found_at = str.find("\n.\n");
while(found_at != string::npos)
{
str.insert(found_at+dot_offset, " "); // "\n.\n" becomes "\n. \n"
found_at = str.find("\n.\n", found_at+1);
}
return str;
}
int main()
{
string text = ".\nn\n.\nn";
std::cout << format_text(text);
}
Look up std::string::find and std::string::replace.
For example (not tested)
string endOfMessage = "\n.\n";
string replacement = "\n. \n";
size_t position;
// find returns string::npos (not 0) when nothing is found, so compare against it
while ((position = message.find(endOfMessage)) != string::npos)
{
message.replace(position, endOfMessage.length(), replacement);
}
This is derived from Dan McG's answer so upvote him ;)
string endOfMessage = "\n.\n";
string replacement = "\n. \n";
size_t position = 0;
while ((position = message.find(endOfMessage, position)) != message.npos)
{
message.replace(position, endOfMessage.length(), replacement);
// resume at the trailing '\n' of the replacement, since it may start the next "\n.\n"
position += replacement.length() - 1;
}
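For reference, a self-contained sketch of the find-based approach is shown below; it also covers a message that begins with a lone dot and two dot-only lines in a row. The function name escapeLoneDots is made up for this example, and it only deals with '\n' line endings.

```cpp
#include <string>

// Hypothetical helper: adds a space after every dot that sits alone on a line,
// so that "\n.\n" can no longer be read as an end-of-message marker.
std::string escapeLoneDots(std::string message)
{
    // A lone dot on the very first line has no preceding '\n'.
    if (message.compare(0, 2, ".\n") == 0) {
        message.insert(1, " ");
    }

    const std::string endOfMessage = "\n.\n";
    std::string::size_type position = 0;
    while ((position = message.find(endOfMessage, position)) != std::string::npos) {
        message.insert(position + 2, " ");  // "\n.\n" becomes "\n. \n"
        position += 3;  // resume at the trailing '\n', which may start another match
    }
    return message;
}
```

Taking the message by value keeps the caller's string untouched; inside the operator<< from the question, the result could be appended to message in place of the raw input.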
Boost has Boost.Regex (a regular expression module). Might be overkill if this is the only replacement you need to do.
Use std::search and the insert method of sequence containers such as string, deque, or whatever you use to store the message text.
typedef std::string::iterator SIter; // or whatever container you use
static const char *end_seq = "\n.\n";
for ( SIter tricky_begin = msg.begin();
tricky_begin = std::search( tricky_begin, msg.end(), end_seq, end_seq+3 ),
tricky_begin != msg.end(); ) {
tricky_begin = msg.insert( tricky_begin + 2, ' ' );
}