How to check if my string input follows the correct syntax in C++?

I'm new to this site and this is my first time to ask here.
My problem is that I want to check whether my string follows a correct pattern or syntax. I'm doing it with a C++ string (std::string). I have already done this using a C-style string; however, this time I want to do it with std::string. Sample problem below:
Input: 2y'' + 3y' - 2y = 0
or y'' = 4y
I want to check if the derivative input is in correct syntax like (a)y'' + (b)y' + (c)y = 0, a second order homogeneous equation. However, I also want to allow a non-standard form equation like the second sample input, which can be transposed and converted to standard form.
What I did before was remove all the whitespace, loop over the entire string, and check every index. E.g. if 'y' is found, the next char should be '\'' or an arithmetic symbol like '-', '+' or '='; if it does not match, the function must return false.
Or maybe I am just implementing this wrong. I'm new to programming and just taking a computer science course. Note: Sorry for my bad English and sorry that I did not include my code here; it's just way too long.

Regular expressions might be the answer. They're commonly used for checking whether a string fits a format, or for finding parts of a string that do.
RegExr is a great tool to both learn and test your regular expressions.
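For example, here is a minimal sketch with std::regex (C++11 and later), assuming the goal is just to validate the overall shape after stripping whitespace. The pattern below is only an illustration: it accepts terms with an optional coefficient, y with up to two primes, and bare constants joined by '+' or '-', and it does not check that the equation is genuinely second order or homogeneous.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <regex>
#include <string>

int main() {
    // One term: an optional coefficient followed by y with up to two primes,
    // or a bare numeric constant.
    const std::string term = R"((?:\d+(?:\.\d+)?)?y(?:'{1,2})?|\d+(?:\.\d+)?)";
    // One side of the '=': a term, followed by any number of +/- terms.
    const std::string side = "(?:" + term + ")(?:[+-](?:" + term + "))*";
    const std::regex ode("^" + side + "=" + side + "$");

    std::string input = "2y'' + 3y' - 2y = 0";   // also try: y'' = 4y
    // Strip whitespace first, as described in the question.
    input.erase(std::remove_if(input.begin(), input.end(),
                               [](unsigned char c) { return std::isspace(c) != 0; }),
                input.end());

    std::cout << (std::regex_match(input, ode) ? "looks valid" : "invalid") << '\n';
}
Once the shape is validated, you would still need a second pass to pull out the coefficients and transpose terms to standard form.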

Related

FLUTTER - Checking if a string contains another one

I am working on an English vocabulary learning app. Some of the exercises given to the users are written quizzes. They have to translate French words into English words and vice versa.
To make the checking a little more sophisticated than just "1" or "0" (TypedWord == expectedWord), I have been working with similarities between strings and that worked well (for spelling mistakes for example).
I had also used the contains function, so that, for example, if the user adds an article in front of the expected word, it isn't considered wrong (e.g. for "École", "school" is expected, but the user writes "A school").
So I was checking with lines such as "if (typedWord.contains(word)==true) then...". It works fine for the article problem.
But it prompts another issue :
E.g. "A bough" --> the expected French word is "branche". If the user types "une branche", it is considered correct, which is great. But if the user types "débrancher" (to unplug), it is considered correct as well, since the word "branche" is part of "débrancher"...
How could I keep this from happening? Any ideas for other ways to go about it?
I read the three proposed answers, which are really interesting. The thing is that some of the words are compound (e.g. "kitchen appliance", "garden tool"), so I think the "space" approaches might be problematic...
In this case, split the whole answer on spaces, then compare each part with the correct word.
For an example:
User's answer: That is my school
Separate it with space, so that you will find an array of words:
that, is, my, school.
Then compare each word with your word. It will give you the correct answer.
The flutter code will be like below:
usersAnswer?.split(" ").forEach((word) {
  if (word == correctAnswer)
    print("this is a correct answer");
});
You can split the string by space and check if the resulting array has the word you're looking for.
typedWord.split(' ').contains('branche');
So if typedWord is 'une branche', the split(' ') will turn it into this array: ['une', 'branche'].
Now when you check if this array contains('branche') it will check if the exact string branche exists which in this case it does and returns true.
However, if it's 'une debranche', the resulting array would be ['une', 'debranche'], and because this array has no value equal to 'branche', it will return false. Remember that split turns the string into an array; using contains on an array checks whether an item exactly equal to the value you provide exists, whereas on a string it checks whether part of that string matches the given value.
You could check for whitespaces before and after the correct word: something like if (typedWord.contains(' '+word+' ')==true) then..., so that "débrancher" gets marked as wrong. This is kind of strict, though: if the sentence must be completed with some punctuation, it would be rejected by this check. You'll probably want some RegExp that allows punctuation but not whitespaces.

Inverse regex processing to produce regex phrase

We take the normal regex processor and pass the input text and the regex phrase to capture the desired output text.
output = the_normal_regex(
             input = "12$abc##EF345",
             phrase = "\d+|[a-zA-Z]+")
       = ["12", "abc", "EF", "345"]
Can we invert the processing, so that it receives both the input text and the output text and produces an adequate regex phrase, especially if the text size is limited to a practical minimum, e.g. some dozens of characters? Is any tool available for this?
phrase = the_inverse_tool(
             input = "12$abc##EF345",
             output = ["12", "abc", "EF", "345"])
       = "\d+|[a-zA-Z]+"
What you're asking appears to be whether there is some algorithm or existing library that takes an input string (like "12$abc##EF345") and a set of matches (like ["12", "abc", "EF", "345"]) and produces an "adequate" regex that would produce the matches, given the input string.
However, what does 'adequate' mean in this context? For your example, a simple answer would be "12|abc|EF|345", but it appears you expect something more like the generalised "\d+|[a-zA-Z]+".
Note that your generalisation makes a number of assumptions, for example that words in French, Swedish or Chinese shouldn't be matched. And numbers containing , or . are also not included.
You cannot expect a generalised algorithm to make those kinds of distinctions, as those are essentially problems requiring general AI, understanding the problem domain at an abstract level and coming up with a suitable solution.
Another way of looking at it is: your question is the same as asking if there is some function or library that automates the work of a programmer (specific to the regex language). The answer is: no, not yet anyway, and by the time there is, there won't be people on StackOverflow asking or answering these questions, because we'll all be out of a job.
However, some more optimistic viewpoints can be found here: Is it possible for a computer to "learn" a regular expression by user-provided examples?

How to get the first digit on the left side of a string with python and regex?

I want to get a specific digit based on the string to its right.
This stretch of string is in body2.txt
string = "<li>3 <span class='text-info'>quartos</span></li><li>1 <span class='text-info'>suíte</span></li><li>96<span class='text-info'>Área Útil (m²)</span></li>"
with open("body2.txt", 'r') as f:
area = re.compile(r'</span></li><li>(\d+)<span class="text-info">Área Útil')
area = area.findall(f.read())
print(area)
output: []
expected output: 96
You have a quote mismatch. Note carefully the difference between 'text-info' and "text-info" in your example string and in your compiled regex. IIRC escaping quotes in raw strings is a bit of a pain in Python (if it's even possible?), but string concatenation sidesteps the issue handily.
area = re.compile(r'</span></li><li>(\d+)<span class='"'"'text-info'"'"'>Área Útil')
Focusing on the quotes, this is concatenating the strings '...class', "'", 'text-info', "'", and '>.... The rule there is that if you want a single quote ' in a single-quote raw string you instead write '"'"' and try to ignore Turing turning in his grave.
I haven't tested the performance, but I think it might behave much like '...class' + "'" + 'text-info' + "'" + '>.... If that's the case, there is a bunch of copying happening behind the scenes, and that strategy has a quadratic runtime in the number of pieces being concatenated (assuming they're roughly the same size and otherwise generally nice for such an analysis).
You'd be better off with nearly any other strategy (such as ''.join(...) or using triple quoted raw strings r'''...'''). It might not be a problem though. Benchmark your solution and see if it's good enough before messing with alternatives.
As one of the comments mentioned, you probably want to be parsing the HTML with something more powerful than regex. Regex cannot properly parse arbitrary HTML since it can't parse arbitrarily nested structures. There are plenty of libraries to make the job easier though and handle all of the bracket matching and string munging for you so that you can focus on a high-level description of exactly the data you want. I'm a fan of lxml. Without putting a ton of time into it, something like the following would be roughly equivalent to what you're doing.
from lxml import html

with open("body2.txt", 'r') as f:
    tree = html.fromstring(f.read())
    area = tree.xpath("//li[contains(span/text(), 'Área Útil')]/text()")
    print(area)
The html.fromstring() method parses your data as html. The tree.xpath method uses xpath syntax to query that parsed tree. Roughly speaking it means the following:
//                                    Arbitrarily far down in the tree
li                                    A list node
[*]                                   Satisfying whatever property is in the square brackets
contains(span/text(), 'Área Útil')    The li node needs to have a span/text() node containing the text 'Área Útil'
/text()                               We want any text that is an immediate child of the root li we're describing
I'm working on a pretty small amount of text here and don't know what your document structure is in the general case. You could add or change any of those properties to better describe the exact document you're parsing. When you inspect an element, any modern browser is able to generate a decent xpath expression to pick out exactly the element you're inspecting. Supposing this snippet came from a larger document I would imagine that functionality would be a time saver for you.
This will get the right digits no matter what form the target is in.
Capture group 1 contains the digits.
r"(\d*)\s*<span(?=\s)(?=(?:[^>\"']|\"[^\"]*\"|'[^']*')*?\sclass\s*=\s*(?:(['\"])\s*text-info\s*\2))\s+(?=((?:\"[\S\s]*?\"|'[\S\s]*?'|[^>]?)+>))\3\s*Área\s+Útil"
https://regex101.com/r/pMATkj/1

Tokenize the text depending on some specific rules. Algorithm in C++

I am writing a program which will tokenize the input text depending upon some specific rules. I am using C++ for this.
Rules
Letter 'a' should be converted to token 'V-A'
Letter 'p' should be converted to token 'C-PA'
Letter 'pp' should be converted to token 'C-PPA'
Letter 'u' should be converted to token 'V-U'
This is just a sample and in real time I have around 500+ rules like this. If I am providing input as 'appu', it should tokenize like 'V-A + C-PPA + V-U'. I have implemented an algorithm for doing this and wanted to make sure that I am doing the right thing.
Algorithm
All rules will be kept in a XML file with the corresponding mapping to the token. Something like
<rules>
    <rule pattern="a" token="V-A" />
    <rule pattern="p" token="C-PA" />
    <rule pattern="pp" token="C-PPA" />
    <rule pattern="u" token="V-U" />
</rules>
1 - When the application starts, read this XML file and keep the values in a std::map. This will be available until the end of the application (singleton pattern implementation).
2 - Iterate over the input text characters. For each character, look for a match. If found, become more greedy and look for longer matches by taking the next characters from the input text. Do this until there is no match. So for the input text 'appu', first look for a match for 'a'. If found, try to get a longer match by taking the next character from the input text. So it will try to match 'ap', find no match, and just return.
3 - Remove the letter 'a' from the input text, as we got a token for it.
4 - Repeat step 2 and 3 with the remaining characters in the input text.
Here is a simpler explanation of the steps:
input-text = 'appu'
tokens-generated=''
// First iteration
character-to-match = 'a'
pattern-found = true
// since pattern found, going recursive and check for more matches
character-to-match = 'ap'
pattern-found = false
tokens-generated = 'V-A'
// since no match found for 'ap', taking the first success and replacing it from input text
input-text = 'ppu'
// second iteration
character-to-match = 'p'
pattern-found = true
// since pattern found, going recursive and check for more matches
character-to-match = 'pp'
pattern-found = true
// since pattern found, going recursive and check for more matches
character-to-match = 'ppu'
pattern-found = false
tokens-generated = 'V-A + C-PPA'
// since no match found for 'ppu', taking the first success and replacing it from input text
input-text = 'u'
// third iteration
character-to-match = 'u'
pattern-found = true
tokens-generated = 'V-A + C-PPA + V-U' // we're done!
Questions
1 - Does this algorithm look fine for this problem, or is there a better way to address it?
2 - If this is the right method, is std::map a good choice here? Or do I need to create my own key/value container?
3 - Is there a library available which can tokenize strings like the above?
Any help would be appreciated
:)
So you're going through all of the tokens in your map looking for matches? You might as well use a list or array, there; it's going to be an inefficient search regardless.
A much more efficient way of finding just the tokens suitable for starting or continuing a match would be to store them as a trie. A lookup of a letter there would give you a sub-trie which contains only the tokens which have that letter as the first letter, and then you just continue searching downward as far as you can go.
Edit: let me explain this a little further.
First, I should explain that I'm not familiar with the C++ std::map beyond the name, which makes this a perfect example of why one learns the theory of this stuff as well as the details of particular libraries in particular programming languages: unless that library is badly misusing the name "map" (which is rather unlikely), the name itself tells me a lot about the characteristics of the data structure. I know, for example, that there's going to be a function that, given a single key and the map, will very efficiently search for and return the value associated with that key, and that there's also likely a function that will give you a list/array/whatever of all of the keys, which you could search yourself using your own code.
My interpretation of your data structure is that you have a map where the keys are what you call a pattern, those being a list (or array, or something of that nature) of characters, and the values are tokens. Thus, you can, given a full pattern, quickly find the token associated with it.
Unfortunately, while such a map is a good match for converting your XML input format to an internal data structure, it's not a good match for the searches you need to do. Note that you're not looking up entire patterns, but the first character of a pattern, producing a set of possible tokens, followed by a lookup of the second character of a pattern from within the set of patterns produced by that first lookup, and so on.
So what you really need is not a single map, but maps of maps of maps, each keyed by a single character. A lookup of "p" on the top level should give you a new map, with two keys: p, producing the C-PPA token, and "anything else", producing the C-PA token. This is effectively a trie data structure.
Does this make sense?
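As an illustration (not code from the question), here is a minimal C++17 sketch of that trie, using std::map for the per-character children; the names are made up for the example.
#include <cstddef>
#include <iostream>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// One node per pattern prefix; 'token' is set when the prefix is a complete pattern.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    std::optional<std::string> token;
};

// Insert one pattern/token pair read from the XML rules file.
void insert(TrieNode& root, const std::string& pattern, const std::string& token) {
    TrieNode* node = &root;
    for (char c : pattern) {
        auto& child = node->children[c];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->token = token;
}

// Greedy longest match starting at 'pos'; returns the token and the match length.
std::optional<std::pair<std::string, std::size_t>>
longestMatch(const TrieNode& root, const std::string& text, std::size_t pos) {
    const TrieNode* node = &root;
    std::optional<std::pair<std::string, std::size_t>> best;
    for (std::size_t i = pos; i < text.size(); ++i) {
        auto it = node->children.find(text[i]);
        if (it == node->children.end()) break;   // no deeper match possible
        node = it->second.get();
        if (node->token) best.emplace(*node->token, i - pos + 1);
    }
    return best;
}

int main() {
    TrieNode root;
    insert(root, "a", "V-A");
    insert(root, "p", "C-PA");
    insert(root, "pp", "C-PPA");
    insert(root, "u", "V-U");

    const std::string input = "appu";
    std::string result;
    for (std::size_t pos = 0; pos < input.size(); ) {
        auto match = longestMatch(root, input, pos);
        if (!match) { std::cerr << "no rule at position " << pos << '\n'; break; }
        if (!result.empty()) result += " + ";
        result += match->first;
        pos += match->second;
    }
    std::cout << result << '\n';   // V-A + C-PPA + V-U
}
Each lookup walks at most the length of the longest pattern instead of scanning all 500+ rules, which is the point of the trie.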
It may help if you start out by writing the parsing code first, in this manner: imagine someone else will write the functions to do the lookups you need, and he's a really good programmer and can do pretty much any magic that you want. Writing the parsing code, concentrate on making that as simple and clean as possible, creating whatever interface using these arbitrary functions you need (while not getting trivial and replacing the whole thing with one function!). Now you can look at the lookup functions you ended up with, and that tells you how you need to access your data structure, which will lead you to the type of data structure you need. Once you've figured that out, you can then work out how to load it up.
This method will work - I'm not sure that it is efficient, but it should work.
I would use the standard std::map rather than your own system.
There are tools like lex (or flex) that can be used for this. The issue would be whether you can regenerate the lexical analyzer that it would construct when the XML specification changes. If the XML specification does not change often, you may be able to use tools such as lex to do the scanning and mapping more easily. If the XML specification can change at the whim of those using the program, then lex is probably less appropriate.
There are some caveats - notably that both lex and flex generate C code, rather than C++.
I would also consider looking at pattern matching technology - the sort of stuff that egrep in particular uses. This has the merit of being something that can be handled at runtime (because egrep does it all the time). Or you could go for a scripting language - Perl, Python, ... Or you could consider something like PCRE (Perl Compatible Regular Expressions) library.
Better yet, if you're going to use the boost library, there's always the Boost tokenizer library -> http://www.boost.org/doc/libs/1_39_0/libs/tokenizer/index.html
You could use a regex (perhaps the boost::regex library). If all of the patterns are just strings of letters, a regex like "(pp|p|a|u)" (longer patterns listed first, so the alternation prefers the longest token) would find the match you want. So:
1. Run a regex_search using the above pattern to locate the next match.
2. Plug the match-text into your std::map to get the replace-text.
3. Print the non-matched consumed input and the replace-text to your output, then repeat from step 1 on the remaining input.
And done.
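For illustration, here is a rough sketch of that loop using std::regex (boost::regex has essentially the same interface), assuming the four sample rules; the alternation lists longer patterns first so that "pp" wins over "p".
#include <iostream>
#include <map>
#include <regex>
#include <string>

int main() {
    // Pattern text -> token, as loaded from the XML rules.
    const std::map<std::string, std::string> rules = {
        {"a", "V-A"}, {"p", "C-PA"}, {"pp", "C-PPA"}, {"u", "V-U"}};
    const std::regex pattern("pp|p|a|u");   // longest alternatives first

    std::string input = "appu", result;
    std::smatch m;
    auto begin = input.cbegin();
    while (std::regex_search(begin, input.cend(), m, pattern)) {
        if (!result.empty()) result += " + ";
        result += rules.at(m.str());   // map the matched text to its token
        begin = m[0].second;           // continue right after the match
    }
    std::cout << result << '\n';       // prints: V-A + C-PPA + V-U
}
Characters that match none of the patterns are silently skipped by regex_search here; you would add your own error handling for that case.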
It may seem a bit complicated, but the most efficient way to do that is to use a graph to represent a state chart. At first, I thought boost.statechart would help, but I figured it wasn't really appropriate. This method can be more efficient than using a simple std::map IF there are many rules, the number of possible characters is limited, and the length of the text to read is quite high.
So anyway, using a simple graph:
0) Create the graph with a "start" vertex.
1) Read the XML configuration file and create vertices when needed (a transition from one "set of characters" (e.g. "pp") to a longer one (e.g. "ppa")). Inside each vertex, store a transition table to the next vertices. If the "key text" is complete, mark the vertex as final and store the resulting text.
2) Now read the text and interpret it using the graph. Start at the "start" vertex. (*) Use the table to interpret one character and jump to the next vertex. If no new vertex can be selected, an error can be issued. Otherwise, if the new vertex is final, print the resulting text and jump back to the start vertex. Go back to (*) until there is no more text to interpret.
You could use boost.graph to represent the graph, but I think it is overly complex for what you need. Make your own custom representation.

checking float inside a string and return result?

I have a text file which I getline into a string. The file is like this: 0.2abc 0.2 .2abc .2 abc.2abc abc.2 abc0.20 .2 . 20
I want to check the result and then parse it into separate floats. The result is: 0.2 0.2abc 2 20 2abc abc0.20 abc
This is explained: check whether there are digits on both sides of the '.' (full stop), whether attached to other characters or not. If only one side of the '.' is a digit, the '.' is treated as a full stop.
How can I parse a string into separate results like that? I used an iterator to check the '.' and its position, but still got stuck.
The first thing you need to do is split the input into words. Easy: just don't use .getline(), but instead rely on while (cin >> strWord) { /* do stuff with word */ }.
The second thing is to kick out bad input words early: words of 2 characters or less, with more than one ., or with the . first or last.
You now know that the . is somewhere in the middle. find() will give you an iterator. ++ and -- give you the next and previous iterators. * gives you the character that the iterator points to. isdigit() tells you whether that character is a digit. Add ingredients together and you're done.
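To make that concrete, here is a rough sketch of that recipe using indices instead of iterators; looksLikeFloat is a made-up helper name, and this only flags the words rather than extracting the number.
#include <cctype>
#include <iostream>
#include <string>

// True when the word has exactly one '.', not at either end,
// with a digit immediately before and after it.
bool looksLikeFloat(const std::string& word) {
    if (word.size() < 3) return false;                    // too short to hold d.d
    std::size_t dot = word.find('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == word.size()) return false;
    if (word.find('.', dot + 1) != std::string::npos) return false;   // more than one '.'
    return std::isdigit(static_cast<unsigned char>(word[dot - 1])) &&
           std::isdigit(static_cast<unsigned char>(word[dot + 1]));
}

int main() {
    std::string word;
    while (std::cin >> word)                              // read word by word, not getline
        if (looksLikeFloat(word))
            std::cout << word << " contains a float\n";
}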
Seems like some fairly complicated advice above -- and not necessarily helpful.
Your question does not make it entirely clear what the end result should look like. Do you want an array of floating point numbers? Do you just want the sum? Do you want to print out the results?
If you want help with homework, the best policy is to post your own attempt and then others can help you improve it, to make it work.
One approach that might help is to try to break the string into sub-strings (tokens) and discard the junk.
Write a function that accepts a character and returns true (this is part of a floating point number) or false (it isn't).
Scan along the string using an iterator or an index.
While current char is not part of a token, skip it.
If you find a token char, while current char is part of a token, copy it to another string
etc. to get all floating point substrings.
Then you can use std::stringstream or ::atof() to convert.
Have a bit of a go and post what you can get done.
Sounds like you could use some regex to extract your numbers.
Try this regex in order to extract the floating values within a string.
[0-9]+\.[0-9]+
Keep in mind that this won't extract integer values, such as the 234 in 234abc.
I don't know if there is a built-in way to use regex in C++, but I found this library with a quick Google search which allows you to use regex in C++.
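For reference, C++11 and later do ship regex support in the standard library (std::regex in <regex>). Here is a minimal sketch applying the pattern above to the sample line; as noted, it will miss values without a digit on both sides of the dot, such as .2 or 20.
#include <iostream>
#include <regex>
#include <string>

int main() {
    const std::string input = "0.2abc 0.2 .2abc .2 abc.2abc abc.2 abc0.20 .2 . 20";
    const std::regex number(R"([0-9]+\.[0-9]+)");   // the pattern suggested above

    // Walk over every match in the input and print it.
    for (std::sregex_iterator it(input.begin(), input.end(), number), end; it != end; ++it)
        std::cout << it->str() << '\n';              // 0.2, 0.2, 0.20
}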
Sounds like you should look at the "Interpreter" Design Pattern.
Or you could use the "State" Design Pattern and do it by hand.
There should be plenty of examples of both on the web.