I'm working on a project that reads reaction strings from a file, in a form such as 5A+3B=c+10D. I need to parse each reaction string so that I can extract the integer coefficient next to each character and put the values into a vector; for the reaction above, the vector is [5 3 1 10].
I thought about using std::strtok, but I don't think it can separate the integer values.
Can anyone help me?
Here is my attempt:
int main()
{
std::string input;
std::getline(std::cin, input);
std::stringstream stream(input);
while(1) {
int n;
stream >> n;
char * pch;
pch = strtok (input," ");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.");
}
return 0;
}
}
To do some serious parsing work, you need to learn a little language theory. Fortunately, it isn't very difficult.
The method covered here is called top-down recursive parsing (also known as recursive descent).
A full source listing would be too long for this forum, so I will present some pseudo-code instead.
The first thing you need to do is define your grammar, that is, what is considered valid input and what is not. You can represent a grammar like this:
formula := term
:= term + formula
:= term - formula
term := variable
:= coefficient variable
So a formula C + 2D can be represented as
formula
    term
        variable
            C
    +
    formula
        term
            coefficient
                2
            variable
                D
With this in mind, we first solve a simpler problem. There are only a few types of things we need from the input string:
+
-
coefficient
variable
Only these four things are valid input, although you may also want to skip spaces. Splitting the input string into these four types of things is called lexical analysis. We typically implement a so-called scanner to do this.
A scanner typically looks like this:
class Scanner
{
public:
    Scanner(const char* text);
    Token GetToken(); // the current token
    void Scan();      // read the next token
};
Next, you will want to group these tokens into a tree like the one shown above. This logic is typically called parsing, and it is implemented as a parser. You can implement a parser in many ways; here is one way to do it with a top-down predictive parser:
class Parser
{
public:
private:
    bool ParseVariable()
    {
        if (s.GetToken() is variable) { s.Scan(); return true; }
        return false; // anything else is a syntax error
    }
    bool ParseTerm()
    {
        if (s.GetToken() is variable) { s.Scan(); return true; }
        if (s.GetToken() is coefficient) { s.Scan(); return this->ParseVariable(); }
        return false; // neither a variable nor a coefficient
    }
    Scanner s;
};
The rest of the code follows the same pattern. Obviously, you can extend the return type of those Parse() methods to return something useful to the caller and assemble the representation you need for your purpose.
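To make the pseudo-code above concrete, here is a minimal sketch of a scanner and a recursive-descent parser that collects the coefficients of a reaction. Several things are my own assumptions and not part of the answer: the names TokenKind, Token, Expect and ParseReaction, the extra rule reaction := formula '=' formula (the grammar above has no '='), single-letter variables, and a scanner that takes a std::string where the pseudo-code takes a const char*. The sign of a '-' term is ignored, since only the coefficients are wanted.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenKind { Plus, Minus, Equals, Coefficient, Variable, End, Invalid };

struct Token {
    TokenKind kind;
    int value;  // only meaningful for Coefficient tokens
};

// Lexical analysis: turns the text into tokens and skips whitespace.
class Scanner {
public:
    explicit Scanner(const std::string& text) : text_(text), pos_(0) { Scan(); }
    Token GetToken() const { return current_; }  // the current token
    void Scan() {                                // read the next token
        while (pos_ < text_.size() && IsSpace(text_[pos_])) ++pos_;
        if (pos_ == text_.size()) { current_ = Token{TokenKind::End, 0}; return; }
        const char c = text_[pos_];
        if (IsDigit(c)) {
            int value = 0;  // handles multi-digit coefficients such as 10
            while (pos_ < text_.size() && IsDigit(text_[pos_]))
                value = value * 10 + (text_[pos_++] - '0');
            current_ = Token{TokenKind::Coefficient, value};
            return;
        }
        ++pos_;
        TokenKind kind = TokenKind::Invalid;
        if (std::isalpha(static_cast<unsigned char>(c))) kind = TokenKind::Variable;
        else if (c == '+') kind = TokenKind::Plus;
        else if (c == '-') kind = TokenKind::Minus;
        else if (c == '=') kind = TokenKind::Equals;
        current_ = Token{kind, 0};
    }
private:
    static bool IsSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool IsDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    std::string text_;
    std::size_t pos_;
    Token current_;
};

class Parser {
public:
    explicit Parser(const std::string& text) : s_(text) {}
    // reaction := formula '=' formula   (assumed rule, see the note above)
    std::vector<int> ParseReaction() {
        std::vector<int> coefficients;
        ParseFormula(coefficients);
        Expect(TokenKind::Equals);
        ParseFormula(coefficients);
        Expect(TokenKind::End);
        return coefficients;
    }
private:
    // Consumes the current token if it has the expected kind, otherwise throws.
    void Expect(TokenKind kind) {
        if (s_.GetToken().kind != kind) throw std::runtime_error("unexpected token");
        s_.Scan();
    }
    // term := variable | coefficient variable
    void ParseTerm(std::vector<int>& out) {
        int coefficient = 1;  // a bare variable counts as coefficient 1
        if (s_.GetToken().kind == TokenKind::Coefficient) {
            coefficient = s_.GetToken().value;
            s_.Scan();
        }
        Expect(TokenKind::Variable);
        out.push_back(coefficient);
    }
    // formula := term | term + formula | term - formula
    void ParseFormula(std::vector<int>& out) {
        ParseTerm(out);
        const TokenKind kind = s_.GetToken().kind;
        if (kind == TokenKind::Plus || kind == TokenKind::Minus) {
            s_.Scan();
            ParseFormula(out);
        }
    }
    Scanner s_;
};
```

Calling Parser("5A+3B=c+10D").ParseReaction() should yield the vector 5, 3, 1, 10; malformed input throws std::runtime_error instead of returning false as in the pseudo-code.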
I have written a few parsers for different languages for my own purposes. You can take a look at them as samples.
This is a sample in Python:
https://github.com/cshung/MiscLab/blob/master/GreatestCommonDivisor/polynomial_module.py
This is a sample in C++ with a small twist: I parsed the string backwards to avoid 'left recursion'.
https://github.com/cshung/Competition/blob/master/Competition/LEET_BASIC_CALCULATOR.cpp
To see a top-down parser in action in a real-life product, see this example in ChakraCore, which I proudly worked on some time ago.
https://github.com/Microsoft/ChakraCore/blob/master/lib/Parser/Parse.cpp
I have implemented the basic structure of the shunting yard algorithm, but I'm not sure how to read in values that are either multidigit or functions. Here's what I have currently for reading in values:
string input;
getline(cin, input);
input.erase(remove_if(input.begin(), input.end(), ::isspace), input.end());
//passes into function here
for (int i = 0; i < input.length(); ++i) {
string s = input.substr(i, 1);
//code continues
}
As you can see, this method can only parse one character at a time, so it is extremely flawed. I also have tried searching up reading in values or parsing them but haven't found a result that is relevant here.
Full Code: https://pastebin.com/76jv8k9Y
In order to run shunting-yard, you're going to want to tokenize your string first. That is, turn 12+4 into {'12','+','4'}. Then you can just use the tokens to run shunting-yard. A naive infix lexing algorithm might look like this:
lex(string) {
    buffer = ""
    output = {}
    for character in string {
        if character is not whitespace {
            if character is operator {
                if buffer is not empty {
                    append buffer to output
                }
                append character to output
                buffer = ""
            } else {
                append character to buffer
            }
        }
    }
    if buffer is not empty {
        append buffer to output
    }
    return output
}
Real lexers are a lot more complicated and are a prime field of study in compiler design.
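As a rough illustration, the naive algorithm could be written in C++ along these lines. The operator set "+-*/^()" is an assumption on my part; adjust it to whatever your shunting-yard implementation supports. Multi-digit numbers and function names both end up as single tokens because they accumulate in the buffer until an operator is seen.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Naive infix lexer: single-character operators split the input,
// everything between them (digits, letters, '.') becomes one token.
std::vector<std::string> lex(const std::string& text) {
    const std::string operators = "+-*/^()";  // assumed operator set
    std::vector<std::string> output;
    std::string buffer;
    for (char c : text) {
        if (std::isspace(static_cast<unsigned char>(c))) continue;  // skip whitespace
        if (operators.find(c) != std::string::npos) {
            if (!buffer.empty()) output.push_back(buffer);  // finish the pending operand
            output.push_back(std::string(1, c));
            buffer.clear();
        } else {
            buffer += c;
        }
    }
    if (!buffer.empty()) output.push_back(buffer);  // last operand, if any
    return output;
}
```

With this sketch, lex("sin(3.5) * 2") gives the tokens sin, (, 3.5, ), *, 2. It does not distinguish unary minus from binary minus, so that still has to be handled by the code that consumes the tokens.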
I'm new to C++ and I'm trying to solve exercise 6 from chapter 4 of Bjarne Stroustrup's book "Programming: Principles and Practice Using C++", and I don't understand why my code doesn't work.
The exercise:
Make a vector holding the ten string values "zero", "one", ...,
"nine". Use that in a program that converts a digit to its
corresponding spelled-out value: e.g., the input 7 gives the output
seven. Have the same program, using the same input loop, convert
spelled-out numbers into their digit form; e.g., the input seven gives
the output 7.
My loop only executes one time for a string and one time for an int, the loop seems to continue but it doesn't matter which input I'm giving, it doesn't do what it's supposed to do.
One time it worked for multiple int inputs, but only every second time. It's really weird and I don't know how to solve this in a different way.
It would be awesome if someone could help me out.
(I'm also not a native speaker, so sorry, if there are some mistakes)
The library in this code is a library provided with the book, to make the beginning easier for us noobies I guess.
#include "std_lib_facilities.h"
int main()
{
vector<string>s = {"zero","one","two","three","four","five","six","seven","eight","nine"};
string input_string;
int input_int;
while(true)
{
if(cin>>input_string)
{
for(int i = 0; i<s.size(); i++)
{
if(input_string == s[i])
{
cout<<input_string<<" = "<<i<<"\n";
}
}
}
if(cin>>input_int)
{
cout<<input_int<<" = "<<s[input_int]<<"\n";
}
}
return 0;
}
When you (successfully) read input from std::cin, the input is extracted from the buffer. It is removed from the buffer and cannot be read again.
And when you first read as a string, that will read any possible integer input as a string as well.
There are two ways of solving this:
Attempt to read as int first. And if that fails clear the errors and read as a string.
Read as a string, and try to convert to an int. If the conversion fails you have a string.
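The second option could look roughly like the sketch below. The helper names to_int and convert are made up for illustration, and returning "?" for anything unrecognized is my own choice, not something the exercise asks for.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns true and stores the value only when the whole word is an integer.
bool to_int(const std::string& word, int& value) {
    std::istringstream stream(word);
    char leftover;
    // The conversion must succeed and nothing may be left over afterwards.
    return (stream >> value) && !(stream >> leftover);
}

// Converts a digit to its spelled-out name and a spelled-out name to its digit.
std::string convert(const std::string& word) {
    static const std::vector<std::string> names = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"};
    int digit = 0;
    if (to_int(word, digit)) {
        if (digit >= 0 && digit < static_cast<int>(names.size())) return names[digit];
        return "?";  // a number, but not a single digit
    }
    for (std::size_t i = 0; i < names.size(); ++i)
        if (names[i] == word) return std::to_string(i);
    return "?";  // neither a digit nor a known name
}
```

The input loop then needs only one read per iteration, for example while (std::cin >> word) std::cout << convert(word) << "\n";, so nothing is consumed twice.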
if(cin >> input) doesn't work properly in while loop?
A possible implementation of the input of your program would look something like:
std::string sentinel = "|";
std::string input;
// read whole line, then check if exit command
while (getline(std::cin, input) && input != sentinel)
{
// use string stream to check whether input digit or string
std::stringstream ss(input);
// if string, convert to digit
// else if digit, convert to string
// else clause containing a check for invalid input
}
To discriminate between int and string value you could use peek(), for example.
Preferably the last two actions of conversion (between int and string) are done by separate functions.
Assuming the inclusion of the headers:
#include <iostream>
#include <sstream>
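One way to use peek() for this is sketched below. The names InputKind and classify are hypothetical, and the check assumes non-negative numbers, as in the exercise; a leading '-' would be classified as a word.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

enum class InputKind { Number, Word, Empty };

// Looks at the next non-whitespace character without consuming it.
InputKind classify(std::istream& in) {
    in >> std::ws;  // skip leading whitespace
    const int next = in.peek();
    if (next == std::char_traits<char>::eof()) return InputKind::Empty;
    return std::isdigit(next) ? InputKind::Number : InputKind::Word;
}
```

Inside the getline loop you would construct the std::stringstream from the line, call classify(ss), and then extract either an int or a std::string from the same stream, since peek() leaves the character in place.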
I'm trying to read in a large matrix calculated from a text file for a finite element code. The matrix is spatially dependent though and thus I need to be able to conveniently organize the data. The outside source that calculated the values for the matrix was kind enough to put the following lines at the top of the text file
No. activity levels : 3
No. pitch-angles : 90
No. energies : 11
No. L-shells : 10
Which basically tell me the number of positions the matrix is known at. I want to be able to easily pick out these values because it will allow me to preallocate the size of the matrix, as well as know immediately how much I need to interpolate for values not given by this text file. I am trying to do that with the following code
#include<iostream>
#include<fstream>
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<vector>
using namespace std;
int main(){
string diffusionTensorFileName = "BAS_drift_averaged_chorus_kp.txt";
string sline;
int alphaSize=0;
ifstream diffusionTensorFile(diffusionTensorFileName.c_str());
while(getline(diffusionTensorFile,sline)){
if(strncmp(sline.c_str(),"No. pitch-angles : 90",sline.size()-1)==0 && sline.size()-1 != 0){
alphaSize = atoi(sline.c_str());
printf("alphaSize %d \n", alphaSize);
vector<double> alpha(alphaSize);
}
}
}
atoi of course doesn't work very well, and I can't seem to get strtod or any of those functions to work either. Any thoughts? I'm also open to this being the completely wrong way to do this and alternate suggestions on how to proceed.
I think the easiest way would be to use the scan_is method of the std::ctype facet imbued in the stream's locale. Its job is to search for the first character that matches a given classification and return a pointer to it. We'll take the result of that call and use std::stoi (C++11) to parse it into an integer.
// requires <locale> and <string>
std::locale loc(diffusionTensorFile.getloc());
auto& f = std::use_facet<std::ctype<char>>(loc);
while (std::getline(diffusionTensorFile, sline))
{
    // pointers to the first character and one past the last character
    const char* begin = sline.data();
    const char* end = begin + sline.size();
    const char* result = f.scan_is(f.digit, begin, end);
    if (result != end)
    {
        alphaSize = std::stoi(result);
        // do something with alphaSize
    }
}
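If you would rather match a specific header label than grab the first digit on any line, a different technique is to split each line at the ':' with std::string::find. The sketch below is an alternative to the facet approach, not part of it; the function name parse_count is hypothetical.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Reads the integer after the ':' of a line such as "No. pitch-angles : 90".
// Returns false when the line does not start with the label or has no number.
bool parse_count(const std::string& line, const std::string& label, int& count) {
    if (line.compare(0, label.size(), label) != 0) return false;  // wrong label
    const std::size_t colon = line.find(':', label.size());
    if (colon == std::string::npos) return false;
    try {
        count = std::stoi(line.substr(colon + 1));  // stoi skips leading whitespace
    } catch (const std::exception&) {
        return false;  // nothing numeric after the colon, or out of range
    }
    return true;
}
```

Inside the getline loop you could then write if (parse_count(sline, "No. pitch-angles", alphaSize)) { ... } and repeat the call for the other three labels.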
I'm trying to parse web data coming from a server, and I'm trying to find a more stl version of what I had.
My old code consisted of a for() loop and checked each character of the string against a set of escape characters and used a stringstream to collect the rest. As I'm sure you can imagine, this sort of loop leads to being a high point of failure when reading web data, as I need strict syntax checking.
I'm trying to instead start using the string::find and string::substr functions, but I'm unsure of the best implementation to do it with.
Basically, I want to read a string of data from a server, different data, separated by a comma. (i.e., first,lastname,email#email.com) and separate it at the commas, but read the data in between.
Can anyone offer any advice?
I'm not sure what kind of data you are parsing, but it's always a good idea to use a multi-layer architecture. Each layer should implement one abstract function and do only one job (such as unescaping characters).
The number of layers you use depends on the actual steps needed to decode the stream.
For your problem I suggest the following layers:
1st: tokenize by ',' and '\n': convert the input into some kind of vector of strings
2nd: resolve escapes: decode escape characters
You should use std::stringstream and process the characters with a loop. Unless your format is REALLY simple (for example, a single separator character and no escapes), you can't really use any standard function.
For the learning experience, this is the code I ended up using to parse data into a map. You can check web_parse_return.err to see if an error was hit, or use it for specific error codes.
struct web_parse_return {
map<int,string> parsedata;
int err;
};
web_parse_return* parsewebstring(char* escapechar, char* input, int tokenminimum) {
int err = 0;
map<int,string> datamap;
//a server-side string for data left out in the call; strcmp (from <cstring>) compares the contents, whereas == would compare pointers
if(strcmp(input, "MISSING_INFO") == 0) {
err++;
}
else {
char* nTOKEN;
char* TOKEN = strtok_s(input, escapechar,&nTOKEN);
if(TOKEN != 0) { //if the escape character is found
int tokencount = 0;
while(TOKEN != 0) {//since it finds the next occurrence, keep going
datamap.insert(pair<int,string>(tokencount,TOKEN));
TOKEN = strtok_s(NULL, escapechar,&nTOKEN);
tokencount++;
}
if(tokencount < tokenminimum) //check that the right number was hit
err++; //other wise, up the error count
}
else {
err++;
}
}
web_parse_return* p = new web_parse_return; //initializing a new struct
p->err = err;
p->parsedata = datamap;
return p;
}
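For comparison, the string::find and string::substr approach mentioned in the question could be sketched as follows. The function name split is my own. Unlike strtok_s, this version keeps empty fields (so "a,,b" yields three fields) and does not modify its input; it does not handle escaped separators.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits input at every occurrence of separator, keeping empty fields.
std::vector<std::string> split(const std::string& input, char separator) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = input.find(separator, start);
        if (end == std::string::npos) {
            fields.push_back(input.substr(start));  // last field
            return fields;
        }
        fields.push_back(input.substr(start, end - start));
        start = end + 1;  // continue after the separator
    }
}
```

The token-count check from the code above then becomes a comparison of fields.size() against tokenminimum.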
Ok so before I even ask my question I want to make one thing clear. I am currently a student at NIU for Computer Science and this does relate to one of my assignments for a class there. So if anyone has a problem read no further and just go on about your business.
Now for anyone who is willing to help, here's the situation. For my current assignment we have to read a file that is just a block of text. For each word in the file we are to strip any punctuation from the word (e.g. "can't" would end up as "can" and "that--to" would end up as "that", obviously without the quotes; the quotes are only there to mark the examples).
The problem I've run into is that I can clean the string fine and then insert it into the map that we are using, but for some reason the code I have written allows an empty string to be inserted into the map. I've tried everything I can come up with to stop this from happening, and the only thing that works is to use the erase method of the map itself.
So what I am looking for is two things: any suggestions about how I could a) fix this without simply erasing the entry and b) improve the code I have already written.
Here are the functions I have written to read in from the file and then the one that cleans it.
Note: the function that reads in from the file calls the clean_entry function to get rid of punctuation before anything is inserted into the map.
Edit: Thank you Chris. Numbers are allowed :). If anyone has any improvements to the code I've written or any criticisms of something I did, I'll listen. At school we really don't get feedback on the correct, proper, or most efficient way to do things.
int get_words(map<string, int>& mapz)
{
int cnt = 0; //set out counter to zero
map<string, int>::const_iterator mapzIter;
ifstream input; //declare instream
input.open( "prog2.d" ); //open instream
assert( input ); //assure it is open
string s; //temp strings to read into
string not_s;
input >> s;
while(!input.eof()) //read in until EOF
{
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
}
input.close(); //close instream
for(mapzIter = mapz.begin(); mapzIter != mapz.end(); mapzIter++)
cnt = cnt + mapzIter->second;
return cnt; //return number of words in instream
}
void clean_entry(const string& non_clean, string& clean)
{
int i, j, begin, end;
for(i = 0; isalnum(non_clean[i]) == 0 && non_clean[i] != '\0'; i++);
begin = i;
if(begin ==(int)non_clean.length())
return;
for(j = begin; isalnum(non_clean[j]) != 0 && non_clean[j] != '\0'; j++);
end = j;
clean = non_clean.substr(begin, (end-begin));
for(i = 0; i < (int)clean.size(); i++)
clean[i] = tolower(clean[i]);
}
The problem with empty entries is in your while loop. If you get an empty string, you clean the next one, and add it without checking. Try changing:
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() == 0)
{
input >> s;
clean_entry(s, not_s);
}
mapz[not_s]++; //increment occurence
input >>s;
to
not_s = "";
clean_entry(s, not_s);
if((int)not_s.length() > 0)
{
mapz[not_s]++; //increment occurence
}
input >>s;
EDIT: I notice you are checking if the characters are alphanumeric. If numbers are not allowed, you may need to revisit that area as well.
Further improvements would be to
declare variables only when you use them, and in the innermost scope
use C++-style casts instead of the C-style (int) casts
use empty() instead of length() == 0 comparisons
use the prefix increment operator for the iterators (i.e. ++mapzIter)
A blank string is a valid instance of the string class, so there's nothing special about adding it to the map. What you could do is first check whether it's empty, and only increment if it is not:
if (!not_s.empty())
mapz[not_s]++;
Style-wise, there's a few things I'd change, one would be to return clean from clean_entry instead of modifying it:
string not_s = clean_entry(s);
...
string clean_entry(const string &non_clean)
{
string clean;
... // as before
if(begin ==(int)non_clean.length())
return clean;
... // as before
return clean;
}
This makes it clearer what the function is doing (taking a string, and returning something based on that string).
The function 'get_words' is doing a lot of distinct actions that could be split out into other functions. There's a good chance that by splitting it up into its individual parts, you would have found the bug yourself.
From the basic structure, I think you could split the code into (at least):
getNextWord: Return the next (non blank) word from the stream (returns false if none left)
clean_entry: What you have now
getNextCleanWord: Calls getNextWord, and if 'true' calls clean_entry. Returns 'false' if no words left.
The signatures of 'getNextWord' and 'getNextCleanWord' might look something like:
bool getNextWord (std::ifstream & input, std::string & str);
bool getNextCleanWord (std::ifstream & input, std::string & str);
The idea is that each function does a smaller more distinct part of the problem. For example, 'getNextWord' does nothing but get the next non blank word (if there is one). This smaller piece therefore becomes an easier part of the problem to solve and debug if necessary.
The main component of 'get_words' can then be simplified down to:
std::string nextCleanWord;
while (getNextCleanWord (input, nextCleanWord))
{
++mapz[nextCleanWord];
}
An important aspect to development, IMHO, is to try to Divide and Conquer the problem. Split it up into the individual tasks that need to take place. These sub-tasks will be easier to complete and should also be easier to maintain.
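Put together, the split might look like the sketch below. It deviates from the signatures above in a few labeled ways: the functions take std::istream& so they can be exercised with a string stream, clean_entry returns its result by value, getNextCleanWord loops until it finds a word that is not empty after cleaning, and countCleanWords is a hypothetical name for the remaining core of get_words.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Returns the first run of alphanumeric characters in lower case, or "".
std::string clean_entry(const std::string& non_clean) {
    const auto is_alnum = [](unsigned char c) { return std::isalnum(c) != 0; };
    const auto begin = std::find_if(non_clean.begin(), non_clean.end(), is_alnum);
    const auto end = std::find_if_not(begin, non_clean.end(), is_alnum);
    std::string clean(begin, end);
    std::transform(clean.begin(), clean.end(), clean.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return clean;
}

// Reads the next whitespace-delimited word; false when the stream is exhausted.
bool getNextWord(std::istream& input, std::string& str) {
    return static_cast<bool>(input >> str);
}

// Reads words until one is non-empty after cleaning; false when none are left.
bool getNextCleanWord(std::istream& input, std::string& str) {
    std::string raw;
    while (getNextWord(input, raw)) {
        str = clean_entry(raw);
        if (!str.empty()) return true;
    }
    return false;
}

// Counts every clean word in the stream and returns the total.
int countCleanWords(std::istream& input, std::map<std::string, int>& mapz) {
    int cnt = 0;
    std::string nextCleanWord;
    while (getNextCleanWord(input, nextCleanWord)) {
        ++mapz[nextCleanWord];
        ++cnt;
    }
    return cnt;
}
```

Because the empty-string check lives in exactly one place, an empty key can no longer reach the map, and the total is counted during the read so the second pass over the map is not needed.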