The C++ equivalent of C's format string [closed] - c++

I have a C program that reads from keyboard, like this:
scanf("%*[ \t\n]\"%[^A-Za-z]%[^\"]\"", ps1, ps2);
For a better understanding of what this instruction does, let's split the format string as follows:
%*[ \t\n]\" => read all spaces, tabs and newlines ([ \t\n]) without storing them in any variable (hence the '*'), then match a double quote (\"); the double quote itself is not stored.
Once scanf() has found the double quote, it reads all characters that are not letters into ps1. This is accomplished with...
%[^A-Za-z] => input anything that is not an uppercase letter 'A' through 'Z' or a lowercase letter 'a' through 'z'.
%[^\"]\" => read all remaining characters up to, but not including, a double quote into ps2 ([^\"]); the string must end with a double quote (\"), which is not stored.
Can someone show me how to do the same thing in C++?
Thank you

C++ supports the scanf function. There is no simple alternative, especially if you want to replicate the exact semantics of scanf() with all the quirks.
Note however that your code has several issues:
You do not pass the maximum number of characters to read into ps1 and ps2. Any sufficiently long input sequence will cause a buffer overflow with dire consequences.
You could replace the first conversion, %*[ \t\n], with just a space in the format string. This would also allow for the case where no whitespace characters are present. As currently written, scanf() fails and returns 0 if no whitespace characters are present before the ".
Similarly, if no non-letter characters follow the first ", or if no other characters follow before the second ", scanf() returns a short count of 0 or 1 and leaves one or both destination arrays in an indeterminate state.
For all these reasons, it would be much safer and more predictable in C to first read a line of input with fgets() and then use sscanf() or parse the line by hand.
In C++, you definitely want to use the std::regex library defined in the <regex> header.
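As a rough sketch of that approach (not an exact replica of scanf()'s semantics), one could read a whole line with std::getline() and match it against a pattern. The helper name parse_quoted and the pattern itself are illustrative assumptions: unlike the original format string, leading whitespace is optional here and the first field also stops at a double quote.

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: extracts the two fields from a line such as
//   "123 abc"
// Returns false (leaving ps1 and ps2 untouched) if the line does not match.
bool parse_quoted(const std::string& line, std::string& ps1, std::string& ps2)
{
    // optional whitespace, opening quote, one or more non-letters,
    // one or more non-quote characters, closing quote
    static const std::regex pattern(R"re(^\s*"([^A-Za-z"]+)([^"]+)")re");

    std::smatch m;
    if (!std::regex_search(line, m, pattern))
        return false;

    ps1 = m[1].str();
    ps2 = m[2].str();
    return true;
}
```

A caller would typically do std::getline(std::cin, line) first and then pass line to parse_quoted(). Because the fields are std::string objects, there is no fixed-size buffer to overflow.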


Regex expression doesn't recognize dot at end of word - Regex (C++) [closed]

I'm trying to read a line out of a file using the following regex expression:
^([A-z.]+?\\s?[A-z]+)\\s([A-z]+)\\s(\\d{7})\\s(\\d?\\d.\\d)$
on the line:
W.W. Sneijder 0000574 10.0
(To be clear: the intent is to make any word with chars [a-z], [A-Z], or dots, match with the [A-z.]+ part.)
However, the regular expression doesn't recognize the second dot in W.W., which seems strange to me. Don't the square brackets combined with the + mean that any character from inside them is accepted, until (here) whitespace is encountered? I found a regex that does work but isn't that elegant:
^([A-z.]+[.\\s?[A-z]+)\\s([A-z]+)\\s(\\d{7})\\s(\\d?\\d.\\d)$
I'm hoping to find an elegant solution. It'd be great to hear your input.
Links such as RegEx - Not parsing dot(.) at the end of a sentence didn't seem to answer my question unfortunately.
Space separated data is just a different variant of the common CSV (Comma Separated Values) format. There are many ways to separate a string on arbitrary separators, but in C++ using space is actually very easy:
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> separate_on_space(std::string const& input)
{
    std::vector<std::string> output;
    std::istringstream iss(input);
    // Copy all space-separated "words" from the input to the vector
    std::copy(std::istream_iterator<std::string>(iss), // Begin iterator
              std::istream_iterator<std::string>(),    // End iterator
              std::back_inserter(output));             // Destination iterator
    return output;
}
Once you have separated the values into a vector of strings, you can then convert the numeric values to their actual type (for example using std::stod) and store into suitable objects.
Of course this doesn't handle names with spaces in them in a graceful way, but that can be handled at a higher level (by checking the size of the resulting vector, and by knowing that the last two elements should always be the numbers, and the rest are the name).
On the other hand the regular expression in the question doesn't handle it at all. :)
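The higher-level step could look something like the sketch below. The Record struct and the to_record() name are made up for illustration; the only assumption taken from the sample line is that the last two words are the number and the score, and everything before them is the name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for one line of the file.
struct Record {
    std::string name;   // may contain spaces, e.g. "W.W. Sneijder"
    std::string id;     // kept as text so leading zeros survive
    double score;
};

// Builds a Record from the space-separated words of one line.
// Returns false if there are too few words or the score is not a number.
bool to_record(const std::vector<std::string>& words, Record& out)
{
    if (words.size() < 3)
        return false;

    const std::size_t name_end = words.size() - 2;

    Record r;
    for (std::size_t i = 0; i < name_end; ++i) {
        if (i != 0)
            r.name += ' ';
        r.name += words[i];
    }
    r.id = words[name_end];

    try {
        r.score = std::stod(words[name_end + 1]);
    } catch (const std::logic_error&) {
        // std::invalid_argument and std::out_of_range both derive from logic_error
        return false;
    }

    out = r;
    return true;
}
```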
In your regex, the entire W.W. Sneijder is captured in the first group. Looking at your regex, I doubt you intended it that way.
I think the regex you wanted is ^([A-z.]+?\s?[A-z]+)\s(\d{7})\s(\d?\d.\d)$. Or if you wanted Sneijder to be in the second capture: ^([A-z.]+?)\s([A-z]+)\s(\d{7})\s(\d?\d.\d)$.
... or maybe you wanted ^([A-z.]+?\s?[A-z]*)\s([A-z]+)\s(\d{7})\s(\d?\d.\d)$ (* instead of + in the first capture group).
or ^([A-z.]+?(?:\s[A-z]+)?)\s([A-z]+)\s(\d{7})\s(\d?\d.\d)$ (optional space + text, again in the first capture groups).
All 4 expressions should match your test string, but behave differently on other test strings.
There certainly are improvements to be made to the regex, such as ensuring the string does not start with a dot (.).
As long as you touch the inside of each capture group but not the logic across capture groups, you can let the regex manage any level of control you desire and this will have no impact on the code that follows the text parsing.
There will always be 4 capture groups (except with the first regex I posted above, which has only 3), with some guarantees on the text of each group if you need to convert it to another type.
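For what it's worth, here is a minimal sketch of how the 4-group variant with Sneijder in the second capture could be used with std::regex. The function name capture_groups is an assumption for illustration; the pattern is copied unchanged from above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of the 4 capture groups,
// or an empty vector if the line does not match.
std::vector<std::string> capture_groups(const std::string& line)
{
    static const std::regex pattern(
        R"re(^([A-z.]+?)\s([A-z]+)\s(\d{7})\s(\d?\d.\d)$)re");

    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_match(line, m, pattern)) {
        // m[0] is the whole match; the capture groups start at index 1
        for (std::size_t i = 1; i < m.size(); ++i)
            groups.push_back(m[i].str());
    }
    return groups;
}
```

The last group can then be passed to std::stod, and the code after the parsing step does not need to change if the inside of a group is tightened later.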

Where to use empty character constant '' in C++? [closed]

The empty character constant '' cannot be printed with cout or assigned to a char in C++. The compiler says "error: expected expression". Can we put it in C++ source code? If not, what's the usage of ''? (The empty character constant '' is one ' followed by another '.)
Can we put it in C++ source code?
No, it would be a syntax error.
If not, what's the usage of ''?
There is no usage, unless your purpose is to cause a compilation error (for which there are probably better alternatives such as static_assert).
Can it be understood that empty character constant '' is just a pure grammar error just like a variable being named as 2018ch ?
Yes. The grammar says:
character-literal:
    encoding-prefix(opt) ' c-char-sequence '
Notice that unlike the encoding-prefix, c-char-sequence is not optional.
Side note: Yes, it is a character sequence - multi character literals exist. But you don't need to learn about them yet other than knowing that you probably won't need them. Just don't assume that they're strings.
Ok I think that the confusion comes from the fact that a string can be an empty string e.g. "", so maybe you draw a parallel and expect there to be an empty character something like ''.
Well remember what a string is: a series of characters (0, 1, or more) (terminated by the end of string character '\0'). So "" is a string of 0 characters (end of string character not counted, although it is there), aka the "empty string".
A character is well... just that one character. Not zero, not 2 or 3. A character always has a value. Thus the empty character '' does not exist and makes no sense.
'' makes no sense and thus it won't compile, what value is it supposed to have?
Remember, it's all just bits and bytes in memory at some point so what value should the bytes have that represent ''?
char a = 0;
//or
char a = '\0';
These represent "empty" chars which is the closest you'll get to ''.
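A small compile-time sketch of the difference between an empty string and a zero-valued char (the two helper function names are made up for illustration):

```cpp
#include <cstddef>
#include <string>

// The empty string literal still occupies one char: the terminating '\0'.
static_assert(sizeof("") == 1, "an empty literal holds only the terminator");

// '\0' is an ordinary character whose value happens to be zero.
static_assert('\0' == 0, "the null character has the value 0");

// An empty std::string contains zero characters...
std::size_t empty_string_length()
{
    return std::string("").size();
}

// ...while a string built from one '\0' contains exactly one character.
std::size_t one_null_char_length()
{
    return std::string(1, '\0').size();
}
```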

Perl: Regular expressions Pattern matching [closed]

Is [A] a regular expression that will match a string of characters which contains any number of occurrences of the letter A (and only the letter A, with no other characters or spaces) such as AAAA?
Anything in square brackets is a character class. This is complicated enough that it has its own Perl documentation page (in the link), so it's not a surprise it wasn't evident how it works.
A character class defines a set of possible characters; when pattern matching, a character class by itself matches one character from the input, no matter how many characters there are inside the square brackets.
/[A]/ # find one copy of 'A' anywhere in the string
/[abcd]/ # find one copy of any of 'a', 'b', 'c', or 'd' anywhere in the string
/[A-Z]/ # find any one uppercase ASCII letter somewhere in the string
If you want your class to match differently, you can add quantifiers:
/[A-Z]+/ # find one or more uppercase ASCII letters in a row
/[A]*/ # find zero or more 'A's in a row
The linked page will show you a lot of other options to specify sets of characters inside the square brackets. But the key is that one set of square brackets matches one character unless you add '+' (one or more of these) or '*' (zero or more of these).
No.
The regular expression pattern [A] can be simplified to just A. It will match any string that contains A. While that includes AAAA, it also includes ZAZ.
For starters, you will need to anchor the match.
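The effect of anchoring can be sketched in C++ as well, since std::regex uses a Perl-like (ECMAScript) syntax by default. The function names here are made up for illustration; in Perl the anchored pattern would be written /^A+$/.

```cpp
#include <regex>
#include <string>

// Unanchored: true if the string contains an 'A' anywhere.
bool contains_A(const std::string& s)
{
    static const std::regex re("[A]");
    return std::regex_search(s, re);
}

// Anchored with a quantifier: true only if the whole string
// consists of one or more 'A' characters and nothing else.
bool only_As(const std::string& s)
{
    static const std::regex re("^A+$");
    return std::regex_search(s, re);
}
```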

get characters (not separated with spaces) in array using scanf() [closed]

I need to store characters from the user's input in an array, BUT not one by one. The user will input them as one line like this:
....
I need to save each dot in an array, but I can't do this:
scanf("%s%s%s%s", &s[0], &s[1], &s[2], &s[3]);
because user can enter N number of dots. So it must be dynamic, I guess.
scanf() is a C runtime function. In C++, you should be using std::cin instead. For instance, with std::getline(). You can treat the returned std::string like an array of characters.
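A minimal sketch of that suggestion, assuming the input is a single line of arbitrary length (the function name read_line_chars is hypothetical):

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Reads one line from the stream and returns its characters.
// The line can be any length; std::string grows as needed.
std::vector<char> read_line_chars(std::istream& in)
{
    std::string line;
    std::getline(in, line);   // the newline is consumed but not stored
    return std::vector<char>(line.begin(), line.end());
}
```

Calling read_line_chars(std::cin) reads from the keyboard, and any other std::istream (for example a std::istringstream) works the same way. If an array-like view is all you need, the std::string itself can be indexed with line[i] and the copy into a vector skipped.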
User will input them as one line like this; .... I need to save each dot in array,
C solution:
Define a sane upper bound such as 1000 and use a scanset "%[]".
// Read up to 1000 `.`
char dot[1000 + 1];
if (scanf(" %1000[.]", dot) == 1) {
    // Success
    puts(dot);
}
Additional code is needed if other non-., non-white-space characters need to be handled.

How many byte read the std::istream::peek() function [closed]

I have tried to read the next character in a file containing only characters and in a file containing only integers. This function returns the next value (int or char). Now the question is: how many bytes does peek() read? For the first file it seems to read one byte, while for the second file it seems to read four bytes. How is that possible?
[H]ow many byte[s are] read [by] peek()?
std::ifstream::peek() reads one character (i.e. one byte) from the file, and returns it inside an int (the use of int is so that there is sufficient range to conditionally return EOF).
(Other std::basic_istream specialisations may have different char_type and int_type aliases, so the exact types and numbers given may differ if you use them. But the key is still that you read one character, even if you think your ASCII file contains "numbers".)
It doesn't matter what the bytes of your file are: human-readable ASCII text, human-readable ASCII numbers, an encoded ZIP, random values… std::ifstream::peek() is an "unformatted input function", which is a type of IOStream function that works on the characters in a file.
For first file it seems read one byte while for the second file it seems read four byte.
How it's possible?
It's not. You did something wrong.
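To make the difference concrete, here is a small sketch using a std::istringstream in place of a file (an assumption made so the example is self-contained; the struct and function names are made up). peek() looks at a single character without consuming it, whereas the formatted operator>> parses a whole integer from as many digit characters as it finds.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type for the demonstration below.
struct PeekResult {
    int peeked;   // value returned by peek(): one character, widened to int
    int number;   // value parsed by the formatted extraction that follows
};

PeekResult peek_then_extract(const std::string& text)
{
    std::istringstream in(text);
    PeekResult r;
    r.peeked = in.peek();  // unformatted: looks at one char, read position unchanged
    r.number = 0;
    in >> r.number;        // formatted: consumes all the digits of the integer
    return r;
}
```

For the text "1234 5", peek() returns the character code of '1' (not the number 1234), and the following extraction still sees the '1' and reads 1234.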
It depends entirely on the type of stream.
peek() will read one character, however streams have traits associated with them that dictate what is considered a character. The char_traits on a stream controls this.
Here's a handy guide:
char : always 1 byte
wchar_t : platform defined, dictated by its *underlying type*; typically 4 bytes on Linux and macOS, 2 bytes on Windows
char32_t: at least 32 bits (4 bytes where a byte is 8 bits)
char16_t: at least 16 bits (2 bytes where a byte is 8 bits)
fstream is actually a basic_fstream<char>, meaning that each character it reads has the size sizeof(char), which is 1 byte. peek() returns int_type, which is also controlled through the traits.
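These relationships can be checked at compile time. The sketch below only uses the char_type and int_type member aliases that every standard stream exposes; the helper name bytes_per_character is made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <type_traits>

// std::ifstream is std::basic_ifstream<char>, so its character type is char...
static_assert(std::is_same<std::ifstream::char_type, char>::value,
              "ifstream works on char");
// ...and peek() returns its int_type, which for char traits is int.
static_assert(std::is_same<std::ifstream::int_type, int>::value,
              "peek() on an ifstream returns int");
// The wide stream works on wchar_t instead.
static_assert(std::is_same<std::wifstream::char_type, wchar_t>::value,
              "wifstream works on wchar_t");

// Size in bytes of one character of the given stream type.
template <typename Stream>
constexpr std::size_t bytes_per_character()
{
    return sizeof(typename Stream::char_type);
}
```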