I've created a start condition (for strings) in Flex and everything works fine. However,
when I parse the same string twice, the elements that use the start condition vanish.
How can I solve this?
Flex file:
%option stack noyywrap
%{
extern int lineNumber; // defined in prog.y, used by our code for \n
#include "h5parse.hpp"
#include <iostream>
#include <fstream>
using namespace std;
extern string initialdata;
extern string pdata;
extern bool loop;
string val;
string compile(string content);
string compilefile(string path);
void runwithargs(int argc ,char ** argv);
int saveoutput(string compileddata ,string outputpath="");
%}
%x strenv
i_command #include
e_command #extends
l_command #layout
f_command #field
command {i_command}|{e_command}|{l_command}|{f_command}
%%
"\"" { val.clear(); BEGIN(strenv); }
<strenv>"\"" { BEGIN(INITIAL);sprintf(yylval.str,"%s",val.c_str());return(STRING); }
<strenv><<EOF>> { BEGIN(INITIAL); sprintf(yylval.str,"%s",val.c_str());return(STRING); }
<strenv>. { val+=yytext[0]; }
{command} {sprintf(yylval.str,"%s",yytext);return (COMMAND);}
"(" { return LPAREN; }
")" { return RPAREN; }
"{" { return LBRACE; }
"}" { return RBRACE; }
.|\n {yylval.c=yytext[0];return TXT; }
%%
//our main function
int main(int argc,char ** argv)
{
if(argc>1)runwithargs(argc,argv);// if there are arguments run with them
system("pause");//don't quit the app at the end of assembly
return(0);
}
//run h5A by using arguments
void runwithargs(int argc ,char ** argv)
{
if(argc == 2)
saveoutput(compilefile(argv[1]));
}
//assemble a string
string compile(string content)
{
do
{
loop=false;
pdata.clear();
YY_BUFFER_STATE b =yy_scan_string(content.c_str());
yyparse();
content=pdata;
}while(loop==true);
return content;
}
//assemble file
string compilefile(string path)
{
string data;
ifstream inputfile(path,ios::in|ios::binary|ios::ate);
int length = inputfile.tellg();
inputfile.seekg(0, std::ios::beg);
char * buffer = new char[length];// allocate memory for a buffer of appropriate dimension
inputfile.read(buffer, length);// read the whole file into the buffer
inputfile.close();
cout<<"start assembly : "<<path<<endl;
return compile(string(buffer));
}
//save assembled file to a specified path
int saveoutput(string compileddata ,string outputpath)
{
outputpath=(outputpath=="")?"output":outputpath;
ofstream outputfile ("output");
//show the compiled data in console if we're in debug
outputfile<<compileddata;
cout<<compileddata<<endl;
cout<<"operation terminated successfuly , output at :"
<<outputpath<<endl;
return 0;
}
Bison file:
%{
#include <stdio.h>
#include <iostream>
#include<fstream>
#include<map>
using namespace std;
typedef void* yyscan_t;
int lineNumber; // our line counter
map <string,string> clayouts;
void yyerror ( char const *msg);
typedef union YYSTYPE YYSTYPE;
void yyerror ( char const *msg);
int yylex();
bool loop;
string pdata="";
%}
/* token definition */
%token STRING
%token COMMAND
%token LPAREN RPAREN LBRACE RBRACE
%token TXT
%union { char c; char str [0Xfff]; double real; int integer; }
%type<c> TXT;
%type<str> STRING COMMAND;
%start program
%%
program:value | command_call |txt | program program ;
value: STRING {pdata+='\"'+$1+'\"'; };
command_call : COMMAND LPAREN STRING RPAREN {
if(string($1)=="#field")
{
cout<<"define field :"<<$3;
}
else if(string($1)=="#include")
{
ifstream t;
int length;
char * buffer;
t.open($3);
t.seekg(0, std::ios::end);
length = t.tellg();
t.seekg(0, std::ios::beg);
buffer = new char[length];
t.read(buffer, length);
t.close();
pdata+=buffer;
}
else if (string($1)=="#layout")
{
cout<<"define layout for field "<<$3;
}
else if (string($1)=="#repeat")
{
cout<<"reapeat instruction"<<$3;
}
else
{
cout<<"extend with : "<<$3;
ifstream t;
int length;
char * buffer;
t.open($3);
t.seekg(0, std::ios::end);
length = t.tellg();
t.seekg(0, std::ios::beg);
buffer = new char[length];
t.read(buffer, length);
t.close();
}
loop=true;
};//LPAREN RPAREN ;
txt: TXT {pdata+=$1;};
%%
void yyerror (const char *msg)
{
cout<<msg;
}
This is the output:
Please help me understand why the strings disappear.
The full code is in my repository.
Thanks in advance.
Nothing here is disappearing and you're not parsing the same string twice.
The second parse is on a new string which you yourself created, consisting of data copied during the first parse. So they're different strings, and neither Flex nor Bison knows about any relationship between them.
The reason that the second string does not contain the same data as the first string is simple: you didn't copy all of the data. Anything you don't copy "disappears".
In particular, your scanner only sends the data between double quotes to the parser. The parser attempts to add the double quotes, but it doesn't manage because the line:
pdata+='\"'+$1+'\"';
means
pdata += ('\"' + $1 + '\"');
Since character literals are integers and $1 is an array of characters, which decays to a character pointer, that is the same as:
pdata += &$1[68]; // '\"' is 34
which is really undefined behaviour unless $1 has at least 67 characters, but in practice will be an empty string because Bison zero initializes stack values. (You shouldn't depend on that, though.)
In short, the second time you parse, the double quoted strings are not present, something you could easily have noted by debugging your parser actions.
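One way to make the concatenation do what was intended is to start the expression with a std::string, so that every + is a string concatenation rather than pointer arithmetic. A minimal sketch, assuming a hypothetical helper named quoted (it is not part of the original code):

```cpp
#include <string>

// Hypothetical helper: wraps a C string in double quotes.
// Starting the expression with a std::string makes operator+ concatenate
// instead of adding the character's integer value to the pointer `text`.
std::string quoted(const char* text)
{
    return std::string("\"") + text + "\"";
}
```

With such a helper, the Bison action could read pdata += quoted($1); instead of the original expression.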
Honestly, I don't think this is an appropriate architecture for a macro preprocessor. In general, you should let Flex handle reading from a file; it's good at doing that. Also, the Flex manual illustrates a couple of ways to handle "include files", and macro expansions can be incorporated using a similar technique.
Moreover, using a semantic value which occupies 4kb is not a good way of managing memory. It can easily result in blowing up the parser stack. And constantly converting back and forth between std::string and C-style null-terminated arrays is also extremely inefficient.
But those are different questions.
Related
I am reading in a text file with lines of the format:
date =20170422,line =10,index =3,field =partType,lock =productCode1,bookmark=2/19/56,
I need to extract the name of the field (date, line, index, etc.) and its corresponding value into char field[] and char value[] variables. If necessary, I am allowed to modify the format of the lines.
My initial thinking was to use while loops and check for = and , characters but it was getting messy and it seems like there may be a cleaner way.
You could do something like the example below. Read comma-separated pieces from the file with getline, then use an istringstream and getline again to split each piece at the equals sign.
#include<iostream>
#include<fstream>
#include<string>
#include<sstream>
int main()
{
std::ifstream file("test.txt");
std::string wholeLine, partOfLine;
while(std::getline(file, wholeLine, ',')) {
std::istringstream wholeLineSS(wholeLine);
while(std::getline(wholeLineSS, partOfLine, '=')) {
std::cout<<partOfLine<<std::endl;
}
}
return 0;
}
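The same two-step split can also collect the names and values as pairs instead of printing them. This is only a sketch under some assumptions: the helper names trim and parseFields are hypothetical, whitespace around names and values is assumed to be insignificant, and pieces without an equals sign are skipped.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: strips leading and trailing whitespace.
static std::string trim(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

// Splits "name =value,name =value," into (name, value) pairs.
std::vector<std::pair<std::string, std::string>> parseFields(const std::string& line)
{
    std::vector<std::pair<std::string, std::string>> fields;
    std::istringstream lineStream(line);
    std::string item;
    while (std::getline(lineStream, item, ',')) {      // split at commas
        const std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) continue;         // skip pieces without '='
        fields.emplace_back(trim(item.substr(0, eq)),  // field name
                            trim(item.substr(eq + 1))); // field value
    }
    return fields;
}
```

If char arrays are required, each pair's first.c_str() and second.c_str() can be copied into field[] and value[] buffers of sufficient size.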
The program below extracts the parameters from one or more strings formatted as you describe. The function extract reads all the parameters contained in a string (in the format you specified) and stores their names and values in an array of structures (struct sParms).
You can compile the program as extract and run it from the shell prompt as:
username: ./extract "date =20170422,line =10,index =3,field =partType,lock =productCode1,bookmark=2/19/56,"
The output will be the following:
[date]=[20170422]
[line]=[10]
[index]=[3]
[field]=[partType]
[lock]=[productCode1]
[bookmark]=[2/19/56]
You can also run the program with more than one string:
username: ./extract "date =20170422,line =10,index =3,field =partType,lock =productCode1,bookmark=2/19/56," "yes=1, no=0"
The output will be the following:
[date]=[20170422]
[line]=[10]
[index]=[3]
[field]=[partType]
[lock]=[productCode1]
[bookmark]=[2/19/56]
[yes]=[1]
[no]=[0]
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libgen.h>
//Parameters: date =20170422,line =10,index =3,field =partType,lock =productCode1,bookmark=2/19/56,
#define SEPARATOR ','
#define ASSIGNEMENT '='
typedef struct sParms {
char * fieldName;
char * fieldValue;
} tsParms;
int loadString(char *to, const char *from);
int extract(tsParms **oparms, const char *inb);
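// Copy the next token (a field name or value) from `from` into `to`,
// skipping leading whitespace. Returns the number of bytes written,
// including the terminating NUL.
int loadString(char *to, const char *from)
{
int len=0;
// Skip leading whitespace, but never run past the end of the string
while(*from!=0 && *from<=32)
from++;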
// Get the string value
while(*from>32 && *from!=SEPARATOR && *from!=ASSIGNEMENT) {
*(to+len++)=*from;
++from;
}
*(to+len++)=0;
return len;
}
int extract(tsParms ** oparms, const char *inb)
{
int cnt=0,j;
const char * end, *equ, *start;
char * buff;
tsParms * parms;
if (inb == NULL || strlen(inb) == 0 || oparms == NULL)
return 0;
// It counts the number of parms
end=strchr(inb,ASSIGNEMENT);
while(end) {
cnt++;
end=strchr(end+1,ASSIGNEMENT);
}
if (!cnt)
return 0;
/* We may assume that the memory needed to store the field names and
* values is no larger than the input string (inb).
*
* The space needed for the array of structures is cnt * sizeof(tsParms).
*/
j=cnt * sizeof(tsParms) + strlen(inb);
parms = malloc(j+1);
memset(parms,0,j+1);
buff = (char *)(parms+cnt); // The memory area where we can save data!
start=inb;end=start;cnt=0;
do {
end=strchr(start,SEPARATOR);
equ=strchr(start,ASSIGNEMENT);
if (equ) {
//Get the field name
parms[cnt].fieldName=buff;
buff+=loadString(buff,start);
//Get the field value
start=equ+1;
parms[cnt].fieldValue=buff;
buff+=loadString(buff,start);
cnt++;
}
if (end)
start=end+1;
} while(end);
*oparms = parms;
return cnt;
}
int main(int argc, char *argv[])
{
int i,j,cnt=0,retval=0;
tsParms * parms=NULL;
if (argc<2) {
printf("Usage: %s \"string-1\" [\"string-2\" ...\"string-n\"]\n",basename(argv[0]));
return 1;
}
for(i=1; i<argc; i++) {
cnt=extract(&parms, argv[i]);
if (cnt!=0 && parms!=NULL) {
for(j=0;j<cnt;j++) {
printf("[%s]=[%s]\n",parms[j].fieldName,parms[j].fieldValue);
}
puts("");
free((void *)parms);
} else {
retval=1;
break;
}
}
return retval;
}
The task is to replace the spaces in a string with "%20", that is, to insert "%20" wherever there is a space. I want to replace all the spaces, but only part of the string is replaced. I can see the correct output inside the replace function.
#include<iostream>
#include<string>
using namespace std;
int spaces(char* s,int size) /*calculate number of spaces*/
{
int nspace=0;
for(int i=0;i<size;i++)
{
if(s[i]==' ')
{
nspace++;
}
}
return nspace;
}
int len_new_string(char* inp,int l) /*calculate the length of the new string*/
{
int new_length=l+spaces(inp,l)*2;
return new_length;
}
char* replace(char* s,int length) /*function to replace the spaces within a string*/
{
int len=len_new_string(s,length);
char new_string[len];
int j=0;
for(int i=0;i<length;i++)
{
if(s[i]==' ') /*code to insert %20 if space is found*/
{
new_string[j]='%';
new_string[j+1]='2';
new_string[j+2]='0';
j=j+3;
}
else /*copy the original string if no space*/
{
new_string[j]=s[i];
j++;
}
}
cout<<"Replaced String: "<<new_string<<endl;
return s=new_string;
}
int main()
{
char str[]="abc def ghi ";
int length=sizeof(str)/sizeof(str[0]);
cout<<"String is: "<<str<<endl;
char *new_str=replace(str,length);
cout<<"Replaced String is: "<<new_str<<endl;
}
The array new_string is local to replace, so it goes out of scope when the function returns and the returned pointer dangles; reading through it is undefined behaviour. It only appears to work in part because nothing has overwritten that stack memory yet. To avoid this, let the caller provide a sufficiently large char array, pass it by pointer and fill it in place:
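#include <cstring>
void replace(char *in, char *out, size_t length)
{
size_t j = 0;
for (size_t i = 0; i < length; i++)
{
if (in[i] == ' ') /* insert replacement for spaces */
{
out[j++] = '%';
out[j++] = '2';
out[j++] = '0';
}
else /* copy as-is for non-spaces */
out[j++] = in[i];
}
out[j] = '\0'; /* out must hold length + 2*spaces + 1 bytes */
}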
int main()
{
char str[]="abc def ghi";
size_t buflen(strlen(str)+2*spaces(str, strlen(str)));
char output[buflen+1];
memset(output, 0, buflen+1);
replace(str, output, strlen(str));
}
Another option is to new[] the return array (remember to delete[] it afterwards, then!) or, which I think you left out for a reason, use std::string all along to avoid the array issue.
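For the std::string route, a minimal sketch could look like the following. The function name replaceSpaces is hypothetical; the point is only that the result is returned by value, so no buffer outlives its owner.

```cpp
#include <string>

// Returns a copy of `input` in which every space is replaced by "%20".
std::string replaceSpaces(const std::string& input)
{
    std::string output;
    output.reserve(input.size());   // lower bound; grows as needed
    for (char c : input) {
        if (c == ' ')
            output += "%20";        // insert replacement for a space
        else
            output += c;            // copy other characters unchanged
    }
    return output;
}
```

Because the string manages its own memory, there is no need to pre-compute the new length or to remember a matching delete[].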
When I read from a file word by word, the >> operator returns the first string with a leading "i". For example, if the first string is "street", it is read as "istreet".
The other strings are fine. I tried different text files and the result is the same: the first string starts with "i". What is the problem?
Here is my code:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
int cube(int x){ return (x*x*x);}
int main(){
int maxChar;
int lineLength=0;
int cost=0;
cout<<"Enter the max char per line... : ";
cin>>maxChar;
cout<<endl<<"Max char per line is : "<<maxChar<<endl;
fstream inFile("bla.txt",ios::in);
if (!inFile) {
cerr << "Unable to open file datafile.txt";
exit(1); // call system to stop
}
while(!inFile.eof()) {
string word;
inFile >> word;
cout<<word<<endl;
cout<<word.length()<<endl;
if(word.length()+lineLength<=maxChar){
lineLength +=(word.length()+1);
}
else {
cost+=cube(maxChar-(lineLength-1));
lineLength=(word.length()+1);
}
}
}
You're seeing a UTF-8 Byte Order Mark (BOM). It was added by the application that created the file.
To detect and ignore the marker you could try this (untested) function:
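#include <cstring>
#include <istream>
bool SkipBOM(std::istream & in)
{
char test[4] = {0};
in.read(test, 3);
if (strcmp(test, "\xEF\xBB\xBF") == 0)
return true;
in.clear(); // a file shorter than 3 bytes sets failbit, which would turn seekg into a no-op
in.seekg(0);
return false;
}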
With reference to the excellent answer by Mark Ransom above, adding this code skips the BOM (Byte Order Mark) on an existing stream. Call it after opening a file.
// Skips the Byte Order Mark (BOM) that defines UTF-8 in some text files.
void SkipBOM(std::ifstream &in)
{
char test[3] = {0};
in.read(test, 3);
if ((unsigned char)test[0] == 0xEF &&
(unsigned char)test[1] == 0xBB &&
(unsigned char)test[2] == 0xBF)
{
return;
}
in.clear(); // clear failbit from a short read so that seekg takes effect
in.seekg(0);
}
To use:
ifstream in(path);
SkipBOM(in);
string line;
while (getline(in, line))
{
// Process lines of input here.
}
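When the file is read word by word with >>, as in the question, another possibility is to strip the marker from the first word after reading it rather than repositioning the stream. A sketch, assuming a hypothetical helper named stripUtf8Bom:

```cpp
#include <string>

// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF), if present.
// Returns true when a BOM was removed.
bool stripUtf8Bom(std::string& text)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) {
        text.erase(0, bom.size());
        return true;
    }
    return false;
}
```

It would be called once, on the first word only, since a BOM can only appear at the very start of the file.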
Here are two more ideas:
If you are the one who creates the files, save their length along with them, and when reading them, cut off the prefix using this simple calculation: trueFileLength - savedFileLength = numOfBytesToCut
Create your own prefix when saving the files, and when reading, search for it and delete everything found before it.
I am trying to build a program that copies text from one .txt file to another and then takes the first letter of each word in the text and switches it to an uppercase letter. So far, I have only managed to copy the text with no luck or idea on the uppercase part. Any tips or help would be greatly appreciated. This is what I have so far:
int main()
{
std::ifstream fin("source.txt");
std::ofstream fout("target.txt");
fout<<fin.rdbuf(); //sends the text string to the file "target.txt"
system("pause");
return 0;
}
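Try this: read the file content into a string, process it, and then write it to the target file.
#include <fstream>
#include <locale>
#include <sstream>
#include <string>
int main()
{
    std::ifstream fin("source.txt");
    std::ofstream fout("target.txt");
    if (!fin || !fout)
        return 1;
    // get pointer to associated buffer object
    std::filebuf* pbuf = fin.rdbuf();
    // get file size using buffer's members
    std::size_t size = pbuf->pubseekoff (0,fin.end,fin.in);
    pbuf->pubseekpos (0,fin.in);
    // allocate memory to contain file data
    char* buffer=new char[size];
    // get file data; sgetn returns the number of characters actually read
    std::streamsize got = pbuf->sgetn (buffer,size);
    fin.close();
    std::locale loc;
    // the buffer is not null-terminated, so pass the length explicitly
    std::string fileBuffer(buffer, got);
    delete[] buffer;
    std::stringstream ss;
    bool startOfWord = true; // the first character starts a word
    for (std::string::size_type i=0; i<fileBuffer.length(); ++i){
        char c = fileBuffer[i];
        if (std::isspace(c,loc)) {
            ss << c;
            startOfWord = true; // the next non-space character starts a word
        }
        else if (startOfWord) {
            ss << std::toupper(c,loc);
            startOfWord = false;
        }
        else
            ss << c;
    }
    std::string outString = ss.str();
    fout << outString;
    fout.close();
}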
Instead of copying the entire file at once, you'll need to read part or all of it into a local "buffer" variable, perhaps using while (getline(in, my_string)). Then you can simply iterate along the string, capitalising letters that are either in position 0 or preceded by a non-letter (you can use std::isalpha and std::toupper), and stream the string to out. If you have a go at that and get stuck, append your new code to the question and someone's sure to help you out.
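The per-line step described above could be sketched as follows. The function name capitaliseWords is hypothetical, and a "word start" is taken to mean any letter that is not preceded by another letter, as the answer suggests.

```cpp
#include <cctype>
#include <string>

// Upper-cases every letter that is at position 0 or preceded by a non-letter.
std::string capitaliseWords(std::string text)
{
    bool previousWasLetter = false;
    for (char& c : text) {
        // <cctype> functions require a value representable as unsigned char
        const unsigned char uc = static_cast<unsigned char>(c);
        const bool isLetter = std::isalpha(uc) != 0;
        if (isLetter && !previousWasLetter)
            c = static_cast<char>(std::toupper(uc));
        previousWasLetter = isLetter;
    }
    return text;
}
```

Inside the getline loop each line would be written as out << capitaliseWords(my_string) << '\n'; the newline has to be added back because getline discards it.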
Copying the whole file in one go doesn't give you a chance to edit it. You can use get() and put() to process the file one character at a time, then figure out how to detect the start of a word and make it uppercase.
Something like this:
int main()
{
std::ifstream fin("source.txt");
std::ofstream fout("target.txt");
char c;
while(fin.get(c))
{
// figure out which chars are the start
// of words (previous char was a space)
// and then use std::toupper(c)
fout.put(c);
}
}
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>
int main() {
FILE* fpin;
FILE* fpout;
int counter = 0;
int currentCharacter; /* int, not char, so that EOF can be told apart from valid characters */
int previousCharacter=' ';
fpin = fopen("source.txt", "r"); /* open for reading */
if (fpin == NULL)
{
printf("Fail to open source.txt!\n");
return 1;
}
fpout = fopen("target.txt", "w");/* open for writing */
if (fpout == NULL)
{
printf("Fail to open target.txt!\n");
return 1;
}
/* read a character from source.txt until END */
while((currentCharacter = fgetc(fpin)) != EOF)
{
/* find first letter of word */
if(!isalpha(previousCharacter) && previousCharacter != '-' && isalpha(currentCharacter))
{
currentCharacter = toupper(currentCharacter); /* lowercase to uppercase */
counter++; /* count number of words */
}
fputc(currentCharacter, fpout); /* put a character to target.txt */
/* printf("%c",currentCharacter); */
previousCharacter = currentCharacter; /* reset previous character */
}
printf("\nNumber of words = %d\n", counter);
fclose(fpin); /* close source.txt */
fclose(fpout); /* close target.txt */
return 0;
}