clojure regex to match words and everything in between

This is more of a regex question than Clojure, but I am testing it in Clojure.
(re-seq #"\w+" "This is a test. Only a test!")
produces:
("This" "is" "a" "test" "Only" "a" "test")
I want to have this:
("This" " " "is" " " "a" " " "test" ". " "Only" " " "a" " " "test" "!")
Where I get all the words, but everything else between the words is included too.
I don't care whether the period and space are separate "." " " or together ". "
Is this simple to do with a regex?

Try using the following regex:
\w+|\W+
> (re-seq #"\w+|\W+" "This is a test. Only a test!")
("This" " " "is" " " "a" " " "test" ". " "Only" " " "a" " " "test" "!")
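The same alternation works outside Clojure. Here is a minimal C++ sketch using std::regex and std::sregex_iterator; the function name words_and_gaps is ours. Since every character is either a word character or not, the matches tile the input, so concatenating them gives back the original string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return alternating runs of word characters (\w+) and non-word
// characters (\W+), like (re-seq #"\w+|\W+" s) in Clojure.
std::vector<std::string> words_and_gaps(const std::string& text) {
    static const std::regex token(R"(\w+|\W+)");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(text.begin(), text.end(), token), end;
         it != end; ++it) {
        tokens.push_back(it->str());  // whole match: one word or one gap
    }
    return tokens;
}
```

For "This is a test. Only a test!" this yields the same 14 pieces as the re-seq call above.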

You could probably use \b, which matches word boundaries, together with clojure.string/split. The only problem is that it also matches at the beginning of the string, which produces a leading empty string that has to be dropped:
(rest (clojure.string/split "This is a test. Only a test!" #"\b"))
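Unlike re-seq, this won't be lazy either. (On Java 8 and later, Pattern.split no longer emits a leading empty string for a zero-width match at the start of the input, so there rest would drop "This"; call split without rest.)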

Related

std::regex - lookahead assertion not always working

I'm writing a module that makes some string substitutions in text that is then given to a scripting language. The language's syntax is vaguely Lisp-like, so expressions are bounded by parentheses and symbols are separated by spaces, most of them starting with '$'. A regular expression like this seems like it should give matches at the appropriate symbol boundaries:
auto re_match_abc = std::regex{ "(?=.*[[:space:]()])\\$abc(?=[()[:space:]].*)" };
But in my environment (Visual C++ 2017, 15.9.19, targeting C++17) it can match strings without a suitable boundary in front of them:
std::cout << " $abc -> " << std::regex_replace(" $abc ", re_match_abc, "***") << std::endl;
std::cout << " ($abc) -> " << std::regex_replace("($abc)", re_match_abc, "***") << std::endl;
std::cout << "xyz$abc -> " << std::regex_replace("xyz$abc ", re_match_abc, "***") << std::endl;
std::cout << " $abcdef -> " << std::regex_replace(" $abcdef", re_match_abc, "***") << std::endl;
// Result from VC++ 2017:
//
// $abc -> ***
// ($abc) -> (***)
// xyz$abc -> xyz*** <= What's going wrong here?
// $abcdef -> $abcdef
Why is that regex ignoring the positive-lookahead requirement to have at least one space or parenthesis before the matching text?
[I realize that there are other ways to do this job and to do it really robustly maybe I should use something to turn the string into a token stream, but for the immediate job I have (and because the person authoring the strings that get processed is sitting next to me, so we can coordinate) I thought that regex replacements would do for now.]
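You need to use a positive lookbehind instead. A lookahead only inspects the text after the current position, so (?=.*[[:space:]()]) succeeds in "xyz$abc " simply because a space appears somewhere later; it says nothing about what comes before $abc. What you really want is this: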
auto re_match_abc = std::regex{ "(?<=[[:space:]()])\\$abc(?=[()[:space:]])" };
You can try it out on a website like https://regex101.com/ (just remove the escaped backslash that's required for the C++ string). It explains what each piece of the regex is doing and shows you everything that matches.
Keep in mind that this will also match things like )$abc)
Edit: std::regex does not support lookbehind (its ECMAScript grammar is based on an edition that predates lookbehind). For your specific case you might try something like this:
auto re_match_abc = std::regex{ "([[:space:]()])\\$abc(?=[()[:space:]])" };
std::cout << " $abc -> " << std::regex_replace(" $abc ", re_match_abc, "$1***") << std::endl;
std::cout << " ($abc) -> " << std::regex_replace("($abc)", re_match_abc, "$1***") << std::endl;
std::cout << "xyz$abc -> " << std::regex_replace("xyz$abc ", re_match_abc, "$1***") << std::endl;
std::cout << " $abcdef -> " << std::regex_replace(" $abcdef", re_match_abc, "$1***") << std::endl;
output:
$abc -> ***
($abc) -> (***)
xyz$abc -> xyz$abc
$abcdef -> $abcdef
Here instead of a lookbehind we have a normal capture group. In the replacement we're emitting whatever we captured (a parenthesis or space) followed by the actual string we want to replace $abc with.
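The capture-group workaround can be wrapped in a small helper. This is a sketch, not the answerer's code: the function name replace_abc is hypothetical, and the ^ alternative in the group and the $ alternative in the lookahead are our own additions so that a $abc at the very start or end of the string is also replaced (the pattern above needs a delimiter on both sides).

```cpp
#include <regex>
#include <string>

// Replace the symbol $abc only where it stands alone: preceded by
// whitespace, '(' or ')' or the start of the string, and followed by
// whitespace, '(' or ')' or the end of the string.
// std::regex has no lookbehind, so the preceding delimiter is captured
// in group 1 and written back with $1 in the format string.
// Note: replacement becomes part of an ECMAScript format string, so it
// should not contain '$' or start with a digit.
std::string replace_abc(const std::string& input, const std::string& replacement) {
    static const std::regex symbol(R"((^|[[:space:]()])\$abc(?=[()[:space:]]|$))");
    return std::regex_replace(input, symbol, "$1" + replacement);
}
```

For example, replace_abc("($abc $abc)", "***") yields "(*** ***)", while "xyz$abc " is left unchanged.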

When trying to insert into a table I get QSqlError("", "Parameter count mismatch", ""). Here is my query:

QSqlQuery insert_emi_query;
insert_emi_query.prepare("INSERT INTO emi_info (emi-info_id, customer_id, down_payment, emi_start_date, emi_end_date, emi_amount, toatl_emi, intrest_rate, total_emi_amount) "
"VALUES(:emi-info_id, :customer_id, :down_payment, :emi_start_date, :emi_end_date, :emi_amount, :toatl_emi, :intrest_rate, :total_emi_amount)");
insert_emi_query.bindValue(":emi-info_id",emi_id);
insert_emi_query.bindValue(":customer_id",cutomer_id);
insert_emi_query.bindValue(":down_payment",ui->txtEMIDownPayment->text().toInt());
insert_emi_query.bindValue(":emi_start_date",ui->dateEMIStart->date());
insert_emi_query.bindValue(":emi_end_date",ui->dateEMIEnd->date());
insert_emi_query.bindValue(":emi_amount",ui->txtEMIPerMonth->text().toInt());
insert_emi_query.bindValue(":toatl_emi",ui->spinEMI->text().toInt());
insert_emi_query.bindValue(":intrest_rate",ui->txtEMIRate->text().toInt());
insert_emi_query.bindValue(":total_emi_amount",ui->txtEMIAfterPayment->text().toInt());
if(insert_emi_query.exec()){
qDebug() << "EMI Info Added---------------------";
}else{
qDebug() << "EMi not inserted" << insert_emi_query.lastError();
}
:emi-info_id is not a valid parameter name (- is not allowed in unquoted identifiers, and parameter names cannot be quoted).
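To catch this kind of typo before calling prepare(), you could scan the statement for placeholders yourself. The sketch below is a hypothetical helper, not part of Qt: it assumes a placeholder is a colon followed by everything up to the next space, comma or parenthesis, and flags names that are not made only of letters, digits and underscores. SQL that contains colons for other reasons (time literals, PostgreSQL-style :: casts) would produce false positives.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical pre-flight check (not a Qt API): collect every ":name"
// token in a SQL string and return the ones whose name contains
// anything other than letters, digits and underscores.
std::vector<std::string> suspicious_placeholders(const std::string& sql) {
    // A token runs from ':' up to whitespace, a comma or a parenthesis.
    static const std::regex token(R"(:([^ \t\r\n,()]+))");
    static const std::regex identifier(R"(\w+)");
    std::vector<std::string> bad;
    for (std::sregex_iterator it(sql.begin(), sql.end(), token), end;
         it != end; ++it) {
        const std::string name = (*it)[1].str();
        if (!std::regex_match(name, identifier)) {
            bad.push_back(":" + name);
        }
    }
    return bad;
}
```

Run on the INSERT statement from the question, it would report only :emi-info_id.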
I was not able to edit your question. Maybe you want to edit it yourself, using the following snippet, to improve readability.
1) Is the dash in "emi-info_id" right? Most databases do not allow a dash in an unquoted column name.
2) You may also want to check that you have specified values for all mandatory columns of the table.
QSqlQuery insert_emi_query;
insert_emi_query.prepare(
    "INSERT INTO emi_info ( "
    "  emi-info_id, customer_id, down_payment, emi_start_date, "
    "  emi_end_date, emi_amount, toatl_emi, intrest_rate, "
    "  total_emi_amount) "
    "VALUES( "
    "  :emi-info_id, :customer_id, :down_payment, :emi_start_date, "
    "  :emi_end_date, :emi_amount, :toatl_emi, :intrest_rate, "
    "  :total_emi_amount)"
);
insert_emi_query.bindValue(":emi-info_id",emi_id);
insert_emi_query.bindValue(":customer_id",cutomer_id);
insert_emi_query.bindValue(":down_payment",ui->txtEMIDownPayment->text().toInt());
insert_emi_query.bindValue(":emi_start_date",ui->dateEMIStart->date());
insert_emi_query.bindValue(":emi_end_date",ui->dateEMIEnd->date());
insert_emi_query.bindValue(":emi_amount",ui->txtEMIPerMonth->text().toInt());
insert_emi_query.bindValue(":toatl_emi",ui->spinEMI->text().toInt());
insert_emi_query.bindValue(":intrest_rate",ui->txtEMIRate->text().toInt());
insert_emi_query.bindValue(":total_emi_amount",ui->txtEMIAfterPayment->text().toInt());
if(insert_emi_query.exec()) {
qDebug() << "EMI Info Added---------------------";
}
else{
qDebug() << "EMi not inserted" << insert_emi_query.lastError();
}
It seems that the column name emi-info_id in your query is a typo; it should be emi_info_id, because an unquoted column name containing - is invalid.
If you are using Qt 5 or later (I'm not sure about Qt 4), you can get the error type with
insert_emi_query.lastError().type()
In the case above it should be QSqlError::StatementError, because of the typo in the column name.

vim syntax checking whitespace

I want to configure my .vimrc to do some automatic syntax formatting.
My problem is that I want to automatically replace some syntax with other syntax.
I'm dealing with the special characters used in programming, such as = ; , . ( { [ <.
An example is better than words:
void bibi(int param1,char *words)
{
unsigned int locale=param;
cout<<words<<endl;
}
should become:
void bibi( int param1,char* words)
{
unsigned int locale = param;
cout << words << endl;
}
It's just formatting by adding or removing some whitespace.
I wrote this:
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
" Formating of text in code
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
function! ChangeSpaces()
"" search and replace "= " or " =" or "= " to " = "
silent! %s/\s*[=]\s*/ = /g
endfunction
""autocmd CursorMovedI * call ChangeSpaces()
""autocmd BufWrite * call ChangeSpaces()
autocmd FileAppendPre * call ChangeSpaces()
But I don't get the result I want: if I write "i=e", nothing happens, but if I write "i= ", it works. The regex doesn't run; the replacement only happens after the end of the "pattern".
By the way, if you have a more elegant way to do what I want, let me know.
In fact, when I want to add some other special characters, the code becomes:
"function! ChangeSpaces()
"" search and replace "= " or " =" or "= " to " = "
"silent! %s/\s*[=]\s*/ = /g
""" search and replace "( " or " (" or "(" to " ( "
"" silent! %s/\s*[(]\s*/ ( /g
""" search and replace "[ " or " [" or "[" to " [ "
"" silent! %s/\s*[[]\s*/ [ /g
""" search and replace ", " or " ," or "," to " , "
"" silent! %s/\s*[,]\s*/ , /g
""" search and replace "== " or " ==" or "==" to " == "
"" silent! %s/\s*[==]\s*/ = /g
""" search and replace "> " or " >" or ">" to " > "
"" silent! %s/\s*[>]\s*/ > /g
""" search and replace ">= " or " >=" or ">=" to " >= "
" silent! %s/\s*[>=]\s*/ >= /g
""" search and replace "< " or " <" or "<" to " < "
"" silent! %s/\s*[<]\s*/ < /g
""" search and replace "<= " or " <=" or "<=" to " <= "
"" silent! %s/\s*[=]\s*/ <= /g
"" let repl=substitute(cline,\s*[= ]\s*," = ", "g")
"" call setline(".",repl)
"" let cline=line(".")
"" let ccol=col(".")
"" call cursor(cline, ccol)
"endfunction
""autocmd CursorMovedI * call ChangeSpaces()
""autocmd BufWrite * call ChangeSpaces()
"autocmd FileAppendPre * call ChangeSpaces()
Best regards.
PS: my bad, I want this kind of formatting for every language I use, not just C++.
What about filtering your file through an external C++ indenter? While GNU indent says it was not designed for C++, it works reasonably well. If it doesn't, you might try astyle. Then all you have to do is
map <F8> :w<CR>m':%!astyle<CR>`'
That way even folks using other editors can use the same indent style.
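If you do stay with substitutions instead of an external formatter, note that the commented-out attempts such as [==] and [>=] are character classes, which match a single character, not a two-character operator. Operators need alternation, with == listed before = so that == is not split into two separate = matches. In Vim that would be something like %s/\s*\(==\|!=\|<=\|>=\|=\)\s*/ \1 /g. The sketch below shows the same pattern with C++ std::regex so its behaviour can be checked; the operator list and the function name space_operators are our own choices, and like any regex approach it will also touch operators inside strings and comments.

```cpp
#include <regex>
#include <string>

// Surround a few assignment/comparison operators with exactly one space.
// "==" is listed before "=" so that "a==b" becomes "a == b" rather than
// "a =  = b"; operators such as "<<" are not in the list and stay as-is.
std::string space_operators(const std::string& line) {
    static const std::regex op(R"(\s*(==|!=|<=|>=|=)\s*)");
    return std::regex_replace(line, op, " $1 ");
}
```

For example, "unsigned int locale=param;" becomes "unsigned int locale = param;".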

Constructing boost regex

I want to match every single number in the following string:
-0.237522264173E+01 0.110011117918E+01 0.563118085683E-01 0.540571836345E-01 -0.237680494785E+01 0.109394729137E+01 -0.237680494785E+01 0.109394729137E+01 0.392277532367E+02 0.478587433035E+02
However, for some reason the following boost::regex doesn't work:
(.*)(-?\\d+\\.\\d+E\\+\\d+ *){10}(.*)
What's wrong with it?
EDIT: posting relevant code:
std::ifstream plik("chains/peak-summary.txt");
std::string mystr((std::istreambuf_iterator<char>(plik)), std::istreambuf_iterator<char>());
plik.close();
boost::cmatch what;
boost::regex expression("(.*)(-?\\d+\\.\\d+E\\+\\d+ *){10}(.*)");
std::cout << "String to match against: \"" << mystr << "\"" << std::endl;
if(regex_match(mystr.c_str(), what, expression))
{
std::cout << "Match!";
std::cout << std::endl << what[0] << std::endl << what[1] << std::endl;
} else {
std::cout << "No match." << std::endl;
}
output:
String to match against: " -0.237555275450E+01 0.109397523269E+01 0.560420828508E-01 0.556732715285E-01 -0.237472295761E+01 0.110192835331E+01 -0.237472295761E+01 0.110192835331E+01 0.393040553508E+02 0.478540190640E+02
"
No match.
Also posting the contents of file read into the string:
[dare2be#schroedinger multinest-peak]$ cat chains/peak-summary.txt
-0.237555275450E+01 0.109397523269E+01 0.560420828508E-01 0.556732715285E-01 -0.237472295761E+01 0.110192835331E+01 -0.237472295761E+01 0.110192835331E+01 0.393040553508E+02 0.478540190640E+02
The (.*) groups around your regex match and consume all the text at the start and end of the string, so if there are more than ten numbers, the first ones are swallowed by the leading (.*) and won't be matched by the repeated group.
Also, you're not allowing for negative exponents.
(-?\\d\\.\\d+E[+-]\\d+ *){10,}
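should work with regex_search. With regex_match it would still fail here, because regex_match must consume the entire input and the string read from the file ends with a newline that this pattern does not allow.
This will match all of the numbers as a single match; if you want to match each number separately, you have to apply (-?\\d\\.\\d+E[+-]\\d+) iteratively.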
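Here is a minimal sketch of the iterative approach, using std::regex in place of boost::regex (Boost provides boost::regex and boost::sregex_iterator under the same names). Searching repeatedly, rather than running regex_match over the whole string, also tolerates the leading space and trailing newline read from the file. The function name extract_numbers is ours.

```cpp
#include <regex>
#include <string>
#include <vector>

// Extract every number in scientific notation, such as
// -0.237522264173E+01 or 0.563118085683E-01, by applying the
// single-number pattern repeatedly instead of matching all ten at once.
std::vector<std::string> extract_numbers(const std::string& text) {
    static const std::regex number(R"(-?\d+\.\d+E[+-]\d+)");
    std::vector<std::string> numbers;
    for (std::sregex_iterator it(text.begin(), text.end(), number), end;
         it != end; ++it) {
        numbers.push_back(it->str());
    }
    return numbers;
}
```

Each string can then be converted with std::stod if you need the numeric values.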
Try with:
(-?[0-9]+\\.[0-9]+E[+-][0-9]+)
Your (.*) at the beginning greedily matches the whole string.
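To see what that greediness does, here is a small C++ sketch (std::regex standing in for boost::regex; the function name is ours) that keeps a greedy (.*) prefix and returns what the number group actually captures. The (.*) gives back only as much text as the group needs, so the group ends up holding just the last number on the line; if that number is negative, the prefix even keeps the minus sign, because -? is optional.

```cpp
#include <regex>
#include <string>

// Return the text captured by the number group when it is preceded by a
// greedy (.*), or an empty string if nothing matched.
std::string captured_after_greedy_prefix(const std::string& text) {
    static const std::regex pattern(R"((.*)(-?[0-9]+\.[0-9]+E[+-][0-9]+))");
    std::smatch m;
    if (!std::regex_search(text, m, pattern)) {
        return "";
    }
    return m[2].str();  // group 2: the number, after (.*) has backed off
}
```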

How can I access all matches of a repeated capture group, not just the last one?

My code is:
#include <boost/regex.hpp>
boost::cmatch matches;
boost::regex_match("alpha beta", matches, boost::regex("([a-z])+"));
cout << "found: " << matches.size() << endl;
And it shows found: 2 which means that only ONE occurrence is found… How to instruct it to find THREE occurrences? Thanks!
You should not call matches.size() before verifying that something was matched, i.e. your code should look rather like this:
#include <boost/regex.hpp>
boost::cmatch matches;
if (boost::regex_match("alpha beta", matches, boost::regex("([a-z])+")))
cout << "found: " << matches.size() << endl;
else
cout << "nothing found" << endl;
The output would be "nothing found", because regex_match tries to match the whole string. What you probably want is regex_search, which looks for a matching substring. The code below may suit you better:
#include <boost/regex.hpp>
boost::cmatch matches;
if (boost::regex_search("alpha beta", matches, boost::regex("([a-z])+")))
cout << "found: " << matches.size() << endl;
else
cout << "nothing found" << endl;
But it will output only 2, i.e. matches[0] holds "alpha" and matches[1] holds "a" (the last letter of alpha, because the group keeps only its last repetition).
To get the whole word in the group you have to change the pattern to ([a-z]+) and call the regex_search repeatedly as you did in your own answer.
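The same behaviour is easy to check with std::regex, which follows ECMAScript here: a quantified group keeps only its last repetition. The helper below (its name is ours) returns every sub-match of the first match, so for ([a-z])+ on "alpha beta" it yields two entries, "alpha" and "a", which is the found: 2 from the question.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return every sub-match of the first match of re in text: element 0 is
// the whole match, element i is capture group i. Empty if no match.
std::vector<std::string> first_match_groups(const std::string& text, const std::regex& re) {
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(text, m, re)) {
        for (const auto& sub : m) {
            groups.push_back(sub.str());
        }
    }
    return groups;
}
```

With ([a-z]+) instead, group 1 holds the whole word "alpha", but still only for the first match; getting "beta" as well requires searching again.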
Sorry to reply two years late, but if someone else lands here from a search as I did, maybe it will still be useful.
This is what I've found so far:
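string text = "alpha beta";
string::const_iterator begin = text.begin();
string::const_iterator end = text.end();
boost::regex word("([a-z]+)");  // compile the pattern once, not on every iteration
boost::match_results<string::const_iterator> what;
while (boost::regex_search(begin, end, what, word)) {
    // what[1] is the only capture group, so print exactly the text it matched
    cout << string(what[1].first, what[1].second) << endl;
    begin = what[0].second;  // continue searching after the current match
}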
And it works as expected. Maybe someone knows a better solution?
This works for me; maybe somebody will find it useful:
std::string arg = "alpha beta";
boost::sregex_iterator it{arg.begin(), arg.end(), boost::regex("([a-z])+")};
boost::sregex_iterator end;
for (; it != end; ++it) {
std::cout << *it << std::endl;
}
Prints:
alpha
beta
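If you port this to std::regex, note one difference: since C++14, std::sregex_iterator has a deleted constructor overload for a temporary regex (the iterator would be left pointing at a destroyed object), so the pattern cannot be constructed inline in the iterator call; keep it in a named variable. A minimal sketch (the function name all_matches is ours):

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect the whole-match text of every match of re in text.
// re is a named (lvalue) regex that outlives the iterators.
std::vector<std::string> all_matches(const std::string& text, const std::regex& re) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        found.push_back(it->str());  // (*it)[1] would be capture group 1
    }
    return found;
}
```

For example, with std::regex word("([a-z])+"), all_matches("alpha beta", word) returns {"alpha", "beta"}, the same output as the Boost version above.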