How to extract a string that is present between two brackets? - c++

For example if the string is:
XYZ ::[1][20 BB EC 45 40 C8 97 20 84 8B 10]
The output should be:
20 BB EC 45 40 C8 97 20 84 8B 10
int main()
{
const char *input = "XYZ ::[1][20 BB EC 45 40 C8 97 20 84 8B 10]";
char output[500];
// what to write here so that i can get the desired output as:
// output = "20 BB EC 45 40 C8 97 20 84 8B 10"
return 0;
}

In C, you could do this with a scanset conversion (though it's a bit RE-like, so the syntax gets a bit strange):
sscanf(input, "[%*[^]]][%[^]]]", second_string);
In case you're wondering how that works, the first [ matches an open bracket literally. Then you have a scanset, which looks like %[allowed_chars] or %[^not_allowed_chars]. In this case, you're scanning up to the first ], so it's %[^]]. In the first one, we have a * between the % and the rest of the conversion specification, which means sscanf will try to match that pattern, but ignore it -- not assign the result to anything. That's followed by a ] that gets matched literally.
Then we repeat essentially the same thing over again, but without the *, so the second data that's matched by this conversion gets assigned to second_string.
With the typo fixed and a bit of extra code added to skip over the initial XYZ ::, working (tested) code looks like this:
#include <stdio.h>
int main() {
char *input = "XYZ ::[1][20 BB EC 45 40 C8 97 20 84 8B 10]";
char second_string[64];
sscanf(input, "%*[^[][%*[^]]][%[^]]]", second_string);
printf("content: %s\n", second_string);
return 0;
}
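A slightly more defensive variant of the same idea, as a sketch only: it bounds the conversion with a field width so that a long bracket block cannot overflow the buffer, and it checks the return value of sscanf. The helper name extract_second_block and the 63-character limit are assumptions chosen for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the text of the second [...] block,
// or an empty string if the line does not have the expected shape.
std::string extract_second_block(const char *input)
{
    char buffer[64];
    // %*[^[]    skip everything before the first '[' (needs at least one character)
    // [%*[^]]]  match the first block and discard its contents
    // [%63[^]]  store at most 63 characters of the second block
    int assigned = std::sscanf(input, "%*[^[][%*[^]]][%63[^]]", buffer);
    // Suppressed (%*) conversions are not counted, so success means exactly 1.
    return assigned == 1 ? std::string(buffer) : std::string();
}
```

Note that a scanset must match at least one character, so this sketch reports failure when the line starts with '[' or when either block is empty.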

Just find the second [ and start extracting (or just printing) until next ]....

You can use string::substr if you are willing to convert to std::string
If you don't know the location of brackets, you can use string::find_last_of for the last bracket and again string::find_last_of to find the open bracket.
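A minimal sketch of that suggestion, assuming the block you want is the last bracketed one on the line. The function name last_bracket_block is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns the contents of the last [...] pair in the
// line, or an empty string if no such pair is found.
std::string last_bracket_block(const std::string &line)
{
    const std::string::size_type close = line.find_last_of(']');
    if (close == std::string::npos)
        return std::string();
    // Search backwards from the closing bracket for the matching '['.
    const std::string::size_type open = line.find_last_of('[', close);
    if (open == std::string::npos)
        return std::string();
    return line.substr(open + 1, close - open - 1);
}
```

Checking both positions against std::string::npos keeps substr from being called with nonsensical arguments on lines that contain no brackets.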

Well, say, your file looks like this:
XYZ ::[1][20 BB EC 45 40 C8 97 20 84 8B 10]
XYZ ::[1][Maybe some other text]
XYZ ::[1][Some numbers maybe: 123 98345 123 9-834 ]
XYZ ::[1][blah-blah-blah]
The code that will extract the data will look something like this:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
//opening the file to read from
std::ifstream file( "in.txt" );
if( !file.is_open() )
{
cout << "Cannot open the file";
return -1;
}
std::string in, out;
int blockNumber = 1;//Which bracket block we are looking for. We are currently looking for the second one.
while( getline( file, in ) )
{
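std::string::size_type n = std::string::npos;//Index in the string of the '[' that opens our target block
int i = 0;//Counter for [] blocks we have encountered.
while( i <= blockNumber )
{
//Search for the next [ symbol, starting just after the previous one.
//On the first pass npos + 1 wraps around to 0, so the search starts at the beginning.
n = in.find_first_of('[', n + 1);
if( n == std::string::npos )
break;//This line has fewer [] blocks than we need.
i++;
}
if( n == std::string::npos )
continue;//Skip lines that do not contain the block we are looking for.
//Getting our data and printing it.
out = in.substr( n + 1, ( in.find_first_of(']', n) - n - 1) );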
std::cout << out << std::endl;
}
return 0;
}
The output after executing this will be:
20 BB EC 45 40 C8 97 20 84 8B 10
Maybe some other text
Some numbers maybe: 123 98345 123 9-834
blah-blah-blah

The simplest solution is something along the lines of:
std::string
match( std::string const& input )
{
static boost::regex const matcher( ".*\\[[^]]*\\]\\[(.*)\\]" );
boost::smatch matched;
return regex_match( input, matched, matcher )
? matched[1]
: std::string();
}
The regular expression looks a bit complicated because you need to match
meta-characters, and because the compiler I use doesn't support raw
strings yet. (With raw strings, I think the expression would be
R"^(.*\[[^]]*\]\[(.*)\])^". But I can't verify that.)
This returns an empty string in case there is no match; if you're sure
about the format, you might prefer to throw an exception. You can also
extend it to do as much error checking as necessary: in general, the
more you validate a text input, the better it is, but you didn't give
precise enough information about what was legal for me to fill it out
completely. (For your example string, for example, you might replace
the ".*" at the beginning of the regular expression with
"\\u{3}\\s*::": three upper case characters followed by zero or more
whitespace, then two ':'. Or the first [] group might be
"\\[\\d\\]", if you're certain it's always a single digit.)
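For compilers that do support raw strings, the same function can be sketched with std::regex (C++11) in place of Boost. This is an illustration under that assumption, not the code above: the name match_second_block is hypothetical, and the ']' inside the character class is escaped because the default ECMAScript grammar of std::regex does not necessarily treat a leading ']' as a literal.

```cpp
#include <regex>
#include <string>

// Hypothetical std::regex counterpart of match(); returns the contents of
// the second [...] block, or an empty string when the input does not match.
std::string match_second_block(const std::string &input)
{
    // .*          anything before the first block
    // \[[^\]]*\]  the first [...] block, contents discarded
    // \[(.*)\]    the second block, contents captured as group 1
    static const std::regex matcher(R"(.*\[[^\]]*\]\[(.*)\])");
    std::smatch matched;
    return std::regex_match(input, matched, matcher) ? matched[1].str()
                                                     : std::string();
}
```

Calling .str() on the sub-match makes both branches of the conditional the same type.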

This could work for you in a very specific sense:
std::string str(input);
std::string::size_type open = str.find_last_of('[');
std::string::size_type close = str.find_last_of(']');
std::string result = str.substr(open + 1, close - open - 1);
const char *out = result.c_str();
This assumes both brackets are present; otherwise, check the positions against std::string::npos first. You probably need to define your question a little better as well, as this will only work if you want the bracketed string at the end.

Using the string library in C. I'll give a code snippet that processes a single line, which can be used in a loop that reads the file line by line. NOTE: string.h should be included.
int length = strlen( input );
char* output = 0;
// Search
char* firstBr = strchr( input, '[' );
if( 0 != firstBr++ ) // check for null pointer
{
char* secondBr = strchr( firstBr, '[' );
// we don't need '['
if( 0 != secondBr++ )
{
int nOutLen = strlen( secondBr ) - 1;
if( 0 < nOutLen )
{
output = new char[nOutLen+1];
strncpy( output, secondBr, nOutLen );
output[ nOutLen ] = '\0';
}
}
}
if( 0 != output )
{
cout << output;
delete[] output;
output = 0;
}
else
{
cout << "Error!";
}

You could use this scanf format (a scanset, not really a regex) to get what is inside "<" and ">":
// Format: "<%999[^>]>" (stores at most 999 bytes, so dest needs room for 1000)
int n1 = sscanf(source, "<%999[^>]>", dest);

Related

Printing elements of a tuple

I am trying to print the elements of a tuple returned by a function where I am comparing the elements of a vector of addresses to those in a database. The fields are: 32-bit int representing the address, int for prefix matching, string containing ASN, string containing matching address, string containing the original address being queried.
for (auto itr = IPs.begin(); itr != IPs.end(); itr++) {
tuple<int,int,string,string,string> entry = Compare(*itr, database);
string out = get<3>(entry) + "/" + to_string(get<1>(entry)) + " " + get<2>(entry) + " " + get<4>(entry) + "\n";
cout << out;
}
I want each line of the output to look like this:
"{prefix}/{# bits of prefix} {ASN} {address}\n"
However, the output looks like this:
12.105.69.1528 15314
12.125.142.190 6402
57.0.208.2450 6085
208.148.84.30 4293
208.148.84.16 4293
208.152.160.797 5003
192.65.205.2509 5400
194.191.154.806 2686
199.14.71.79 1239
199.14.70.79 1239
The expected output is:
12.105.69.144/28 15314 12.105.69.152
12.125.142.16/30 6402 12.125.142.19
57.0.208.244/30 6085 57.0.208.245
208.148.84.0/30 4293 208.148.84.3
208.148.84.0/24 4293 208.148.84.16
208.152.160.64/27 5003 208.152.160.79
192.65.205.248/29 5400 192.65.205.250
194.191.154.64/26 2686 194.191.154.80
199.14.71.0/24 1239 199.14.71.79
199.14.70.0/24 1239 199.14.70.79
The part that confuses me the most is the fact that when I print each element on separate lines by replacing each separator with line breaks, it prints the elements correctly:
12.105.69.144
28
15314
12.105.69.152
12.125.142.16
30
6402
12.125.142.19
57.0.208.244
30
6085
57.0.208.245
208.148.84.0
30
4293
208.148.84.3
208.148.84.0
24
4293
208.148.84.16
208.152.160.64
27
5003
208.152.160.79
192.65.205.248
29
5400
192.65.205.250
194.191.154.64
26
2686
194.191.154.80
199.14.71.0
24
1239
199.14.71.79
199.14.70.0
24
1239
199.14.70.79
I suppose that I could just write another function that formats the line breaks into the correct format afterwards, but I am curious about what is causing this. Any ideas?
Could you provide a little more code, so it can be debugged to precisely track the problem?
I think tuple and get are used correctly.
I guess the problem is in the content of the strings, or at least in the string returned by get<2>(entry).
Here is a little example which shows what might be wrong
std::string aa = "AAAAA\r"; //"\r" is extra character in aa string
std::string bb = "bbb";
std::cout << aa + " " + bb; //output is " bbbA" not "AAAAA bbb"
The problem obviously doesn't occur when each string is printed separately on its own line.
Double check that the strings returned by get<X> don't contain any special characters, or old Mac-style line endings ("\r") mixed with Linux ("\n") or Windows ("\r\n") line endings.
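If stray line endings do turn out to be the cause, one way to deal with them is to trim each field as it is read from the database. The following is a small sketch; strip_line_ending is a hypothetical helper, not something from the code in the question.

```cpp
#include <string>

// Hypothetical helper: removes any trailing '\r' and '\n' characters.
std::string strip_line_ending(std::string s)
{
    while (!s.empty() && (s.back() == '\r' || s.back() == '\n'))
        s.pop_back();
    return s;
}
```

Applied to the example above, strip_line_ending(aa) + " " + bb would print as "AAAAA bbb".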

Find string to regular expression programmatically?

Given a regular expression, is it possible to find a string that matches that expression programmatically? If so, please mention an algorithm for that, assuming that a string exists.
Bonus question: Give the performance/complexity of that algorithm, if able.
PS: Note I am not asking this: Programmatically derive a regular expression from a string. Rather, I am asking the reverse problem.
Generex is a Java library for generating String from a regular expression.
Check it out: https://github.com/mifmif/Generex
Here is the sample Java code demonstrating library usage:
Generex generex = new Generex("[0-3]([a-c]|[e-g]{1,2})");
// Generate random String
String randomStr = generex.random();
System.out.println(randomStr);// a random value from the previous String list
// generate the second String in lexicographical order that match the given Regex.
String secondString = generex.getMatchedString(2);
System.out.println(secondString);// it print '0b'
// Generate all String that matches the given Regex.
List<String> matchedStrs = generex.getAllMatchedStrings();
// Using Generex iterator
Iterator iterator = generex.iterator();
while (iterator.hasNext()) {
System.out.print(iterator.next() + " ");
}
// it prints:
// 0a 0b 0c 0e 0ee 0ef 0eg 0f 0fe 0ff 0fg 0g 0ge 0gf 0gg
// 1a 1b 1c 1e 1ee 1ef 1eg 1f 1fe 1ff 1fg 1g 1ge 1gf 1gg
// 2a 2b 2c 2e 2ee 2ef 2eg 2f 2fe 2ff 2fg 2g 2ge 2gf 2gg
// 3a 3b 3c 3e 3ee 3ef 3eg 3f 3fe 3ff 3fg 3g 3ge 3gf 3gg
Another one: https://code.google.com/archive/p/xeger/
Here is the sample Java code demonstrating library usage:
String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);
Assume you define regular expressions like this:
R :=
<literal string>
(RR) -- concatenation
(R*) -- kleene star
(R|R) -- choice
Then you can define a recursive function S(r) which finds a matching string:
S(<literal string>) = <literal string>
S(rs) = S(r) + S(s)
S(r*) = ""
S(r|s) = S(r)
For example: S(a*(b|c)) = S(a*) + S(b|c) = "" + S(b) = "" + "b" = "b".
If you have a more complex notion of regular expression, you can rewrite it in terms of the basic primitives and then apply the above. For example, R+ = RR* and [abc] = (a|b|c).
Note that if you've got a parsed regular expression (so you know its syntax tree), then the above algorithm takes at most time linear in the size of the regular expression (assuming you're careful to perform the string concatenations efficiently).
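The recursive definition of S can be sketched in C++ over a small syntax tree. The Regex type and the builder functions lit, cat, star and alt are illustrative assumptions; a real implementation would obtain the tree from a regex parser.

```cpp
#include <memory>
#include <string>

// A tiny syntax tree for the four constructs of the grammar.
struct Regex {
    enum Kind { Literal, Concat, Star, Choice } kind;
    std::string text;                    // used by Literal only
    std::shared_ptr<Regex> left, right;  // Star uses left; Concat and Choice use both
};
typedef std::shared_ptr<Regex> RegexPtr;

// Hypothetical builder functions for the tree.
RegexPtr lit(const std::string &s) { return std::make_shared<Regex>(Regex{Regex::Literal, s, nullptr, nullptr}); }
RegexPtr cat(RegexPtr a, RegexPtr b) { return std::make_shared<Regex>(Regex{Regex::Concat, "", a, b}); }
RegexPtr star(RegexPtr a) { return std::make_shared<Regex>(Regex{Regex::Star, "", a, nullptr}); }
RegexPtr alt(RegexPtr a, RegexPtr b) { return std::make_shared<Regex>(Regex{Regex::Choice, "", a, b}); }

// Appends S(r) to out. Each node is visited at most once and the result is
// built in a single buffer, which keeps the work linear in the tree size.
void sample_into(const Regex &r, std::string &out)
{
    switch (r.kind) {
    case Regex::Literal:                 // S(<literal string>) = <literal string>
        out += r.text;
        break;
    case Regex::Concat:                  // S(rs) = S(r) + S(s)
        sample_into(*r.left, out);
        sample_into(*r.right, out);
        break;
    case Regex::Star:                    // S(r*) = ""
        break;
    case Regex::Choice:                  // S(r|s) = S(r)
        sample_into(*r.left, out);
        break;
    }
}

// Returns one string matched by the expression.
std::string sample_match(const RegexPtr &r)
{
    std::string out;
    sample_into(*r, out);
    return out;
}
```

With these helpers, the worked example a*(b|c) is written as sample_match(cat(star(lit("a")), alt(lit("b"), lit("c")))).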
To find a string in a given source that matches the expression, I tried the algorithm below.
i) Create an array of all the strings available in the given source.
ii) Create a function with parameters for the array, the expression and the initial index.
iii) Call the function recursively, increasing the index on each call, until a matching string is found.
iv) Return from the function once a string matching the expression is found.
Below is the corresponding Java code:
public class ExpressionAlgo {
public static void main(String[] args) {
// TODO Auto-generated method stub
String data = "A quantifier defines how often an element can occur. The symbols ?, *, + and {} define the quantity of the regular expressions";
regCheck(data.split(" "), "sym", 0);
}
public static void regCheck(String[] ar, String expresion, int i) {
if(ar[i].contains(expresion)){
System.out.println(ar[i]);
return;
}
if(i<ar.length-1){
i=i+1;
regCheck(ar, expresion, i);
}
}
}
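As far as I can tell, the running time is roughly linear in the length of the input for a fixed expression: split scans the text once, regCheck visits each word at most once, and each contains call is bounded by the word length times the expression length.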

Getting full name and values from string with regex and c++

I have a project where I am reading data from a text file in C++ which contains a person's name and up to 4 numeric values like this (each line has one entry):
Dave Light 89 71 91 89
Hua Tran Du 81 79 80
I am wondering if regex would be an efficient way of splitting the name and numerical values or if I should find an alternative method.
I would also like to be able to pick up any errors in the text file when reading each entry, such as a letter instead of a number, as in an entry like this:
Andrew Van Den J 88 95 85
You would be better off using a separator instead of a space. The separator could be :, |, ^ or anything that cannot be part of your data. With this approach, your data should be stored as:
Dave Light:89:71:91:89
Hua Tran Du:81:79:80
And then you can use find, find_first_of, strchr or strstr or any other searching (and re-searching) to find relevant data.
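A minimal sketch of reading one such separated line, assuming ':' as the separator and the "up to 4 numbers" limit from the question. The Entry type and the parse_entry function are hypothetical names introduced for illustration.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Illustrative record type for one line of the file.
struct Entry {
    std::string name;
    std::vector<int> scores;
};

// Parses "Name:89:71:91" into entry. Returns false if the name is empty,
// if there are no scores or more than four, or if a score field contains
// anything other than decimal digits.
bool parse_entry(const std::string &line, Entry &entry)
{
    std::istringstream fields(line);
    std::string field;
    if (!std::getline(fields, field, ':') || field.empty())
        return false;
    entry.name = field;
    entry.scores.clear();
    while (std::getline(fields, field, ':')) {
        if (field.empty() ||
            field.find_first_not_of("0123456789") != std::string::npos)
            return false;  // e.g. a letter where a number was expected
        entry.scores.push_back(std::atoi(field.c_str()));
    }
    return !entry.scores.empty() && entry.scores.size() <= 4;
}
```

Because the name can no longer be confused with the numbers, a field such as "J" in a score position is reported as an error instead of being absorbed into the name.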
This non-regex solution:
std::string str = "Dave Light 89 71 91 89";
std::size_t firstDig = str.find_first_of("0123456789");
std::string str1 = str.substr (0,firstDig);
std::string str2 = str.substr (firstDig);
would give you the letter part in str1 and the number part in str2.
Check this code at ideone.com.
It sounds like it's something like this you want...(?) I'm not quite sure what kind of errors you mean to pick. As paxdiablo pointed out, a name could be quite complex, so getting the letter part probably would be the safest.
Try this code.
#include <iostream>
#include <regex>
#include <string>
#include <vector>
int main(){
std::vector<std::string> data {"Dave Light 89 71 91 ","Hua Tran Du 81 79 80","zyx 1 2 3 4","zyx 1 2"};
std::regex pat {R"((^[A-Za-z\s]*)(\d+)\s*(\d+)\s*(\d+)(\s*)$)"};
for(auto& line : data) {
std::cout<<line<<std::endl;
std::smatch matches; // matched strings go here
if (regex_search(line, matches, pat)) {
//std::cout<<"size:"<<matches.size()<<std::endl;
if (matches.size()==6)
std::cout<<"Name:"<<matches[1].str()<<"\t"<<"data1:"<<matches[2].str()<<"\tdata2:"<<matches[3].str()<<"\tdata3:"<<matches[4].str()<<std::endl;
}
}
}
With regex, the amount of code is greatly reduced. The main trick with regex is using the right pattern.
Hope this helps.

BOOST Regex global Search behavior

My question is about whether the boost regex engine can do "global searches".
I've tried and I can't get it to do it.
The match_results class contains the base pointer of the string, so after incrementing the
starting position manually then setting the match_flag_type to match_not_bob | match_prev_avail,
I would have thought the boost regex engine would be able to know it is in the middle of a string.
Since I'm using this engine in my software, I'd like to know whether the engine can in fact do this correctly and I'm doing something wrong, or whether global searching is not possible with it.
Below are sample code/output using BOOST regex, and an equivalent Perl script.
Edit: Just to clarify, in the Boost example below the Start iterator is always treated as a boundary. The engine doesn't seem to consider text to the left of that position when making a match.
At least in this case.
7/22/2014 - The Solution for Global Search
Posting this update as the solution. It's not a workaround or kludge.
After googling 'regex_iterator' I knew that regex_iterator sees the text to the left of the
current search position. And, I came across all the same source code. One site (like the others)
had a passing, simple explanation of how it works that said it calls 'regex_search()'
when the regex_iterator is incremented.
So down in the bowels of the regex_iterator class, I saw that it indeed called regex_search() when
the iterator was incremented ->Next().
This 'regex_search()' overload wasn't documented and comes in only 1 type.
It includes a BIDI parameter at the end named 'base'.
bool regex_search(BidiIterator first, BidiIterator last,
match_results<BidiIterator, Allocator>& m,
const basic_regex<charT, traits>& e,
match_flag_type flags,
BidiIterator base)
{
if(e.flags() & regex_constants::failbit)
return false;
re_detail::perl_matcher<BidiIterator, Allocator, traits> matcher(first, last, m, e, flags, base);
return matcher.find();
}
It appears that base is the wall to the left of the start BIDI: initial lookbehinds can examine the text between base and start to check their conditions.
So, I tested it out and it seemed to work.
The bottom line is to set base BIDI to the start of the input, and put the start BIDI anywhere after.
Effectively, this is like setting the pos() variable in Perl.
And, to emulate global positional increment on a zero-length match, a simple conditional is all that's
needed:
Start = ( _M[0].length() == 0) ? _M[0].first + 1 : _M[0].second; (see below)
BOOST Regex 1.54 regex_search() using 'base' BIDI
Note - in this example, Start always = _M[0].second;
The regex is purposely unlike the two other examples (below it), to demonstrate in fact
the text from 'Base' to 'Start' is considered each time when matching this regex.
typedef std::string::const_iterator SITR;
boost::regex Rx( "(?<=(.)).", regex_constants::perl );
regex_constants::match_flag_type Flags = match_default;
string str("0123456789");
SITR Start = str.begin();
SITR End = str.end();
SITR Base = Start;
boost::smatch _M;
while ( boost::regex_search( Start, End, _M, Rx, Flags, Base) )
{
string str1(_M[1].first, _M[1].second );
string str0(_M[0].first, _M[0].second );
cout << str1 << str0 << endl;
// This line implements the Perl global match flag m//g ->
Start = ( _M[0].length() == 0) ? _M[0].first + 1 : _M[0].second;
}
output:
01
12
23
34
45
56
67
78
89
Perl 5.10
use strict;
use warnings;
my $str = "0123456789";
while ( $str =~ /(?<=(..))/g )
{
print ("$1\n");
}
output:
01
12
23
34
45
56
67
78
89
BOOST Regex 1.54 regex_search() no 'base'
string str("0123456789");
std::string::const_iterator Start = str.begin();
std::string::const_iterator End = str.end();
boost::regex Rx("(?<=(..))", regex_constants::perl);
regex_constants::match_flag_type Flags = match_default;
boost::smatch _M;
while ( boost::regex_search( Start, End, _M, Rx, Flags) )
{
string str(_M[1].first, _M[1].second );
cout << str << "\n";
Flags |= regex_constants::match_prev_avail;
Flags |= regex_constants::match_not_bob;
Start = _M[0].second;
}
output:
01
23
45
67
89
Updated in response to the comments Live On Coliru:
#include <boost/regex.hpp>
int main()
{
using namespace boost;
std::string str("0123456789");
std::string::const_iterator start = str.begin();
std::string::const_iterator end = str.end();
boost::regex re("(?<=(..))", regex_constants::perl);
regex_constants::match_flag_type flags = match_default;
boost::smatch match;
while (start<end &&
boost::regex_search(start, end, match, re, flags))
{
std::cout << match[1] << "\n";
start += 1; // NOTE
//// some smartness that should work for most cases:
// start = (match.length(0)? match[0] : match.prefix()).first + 1;
flags |= regex_constants::match_prev_avail;
flags |= regex_constants::match_not_bob;
std::cout << "at '" << std::string(start,end) << "'\n";
}
}
Prints:
01 at '123456789'
12 at '23456789'
23 at '3456789'
34 at '456789'
45 at '56789'
56 at '6789'
67 at '789'
78 at '89'
89 at '9'

std::string search for numbers in a string & insert space before & after [duplicate]

This question already exists:
std::string search for numbers in a string & insert space before & after them [closed]
Closed 10 years ago.
I have this string:
string strInput = "33kfkdsfhk33 324234k334k 333 3 323434/545435436***33/rrrr34 e3mdgmflkgfdlglk3434424dfffff555555555555gggggg00000033lll-111111 1974-1-12";
I would like to format it as:
" 33 kfkdsfhk 33 324234 k 334 k 333 3 323434 / 545435436 * 33 /rrrr 34 e 3 mdgmflkgfdlglk 3434424 dfffff 555555555555 gggggg 00000033lll - 111111 1974 - 1 - 12 ";
That is, find a number and insert space before and after the number.
No Boost please... only standard C++ library.
This is what I tried. It inserts a space after a number, but I want to group all consecutive digits to get the desired output.
strInput = "33kfkdsfhk33 324234k334k 333 3 323434/545435436***33/rrrr34 e3mdgmflkgfdlglk3434424dfffff555555555555gggggg00000033lll-111111 1974-1-12";
for ( std::string::iterator it=strInput.begin(); it!=strInput.end(); ++it)
{
static bool flag = false;
if(isdigit(*it) && !flag)
{
strInput.insert(it,1,' ');
flag = true;
}
else
flag = false;
}
Your solution actually looks fairly good conceptually, but there is one major problem: After you insert into a string, all iterators pointing to it may be invalid, in particular your loop iterator it. That can lead to segfaults and all kinds of hard-to-explain bugs.
As an alternative solution, I would suggest not modifying the string you start with, but just reading from it and building a new one step by step, inserting spaces where you want them as you go along. This is really only a minor modification of your current code!
string strInput = ... // whatever;
string newString = "";
bool currentisdigit = false;
bool previouswasdigit = false;
for ( std::string::iterator it=strInput.begin(); it!=strInput.end(); ++it)
{
previouswasdigit = currentisdigit;
currentisdigit = isdigit(*it);
if(currentisdigit && !previouswasdigit)
newString.push_back(' ');
if(!currentisdigit && previouswasdigit)
newString.push_back(' ');
newString.push_back(*it);
}
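Wrapped into a self-contained function, the same loop might look like the sketch below. The name space_around_numbers is an assumption; the behaviour is meant to mirror the loop above, so no space is appended after a trailing digit run and spaces already present in the input are not collapsed.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of input with a space inserted at
// every transition between a non-digit and a digit (in either direction).
std::string space_around_numbers(const std::string &input)
{
    std::string result;
    bool previous_was_digit = false;
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        // Cast to unsigned char: passing a negative char to isdigit is undefined.
        const bool current_is_digit =
            std::isdigit(static_cast<unsigned char>(input[i])) != 0;
        if (current_is_digit != previous_was_digit)
            result.push_back(' ');
        result.push_back(input[i]);
        previous_was_digit = current_is_digit;
    }
    return result;
}
```

Building a new string sidesteps the iterator invalidation problem entirely, since the input is only read and never modified.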