Bison does not appear to recognize C string literals appropriately - c++

My problem is that I am trying to run a program that I coded using a flex-bison scanner-parser. What my program is supposed to do is take user input (in my case, queries for a database system I'm designing), lex and parse it, and then execute the corresponding actions. What actually happens is that my parser code is not correctly interpreting the string literals that I feed it.
Here's my code:
130 insertexpr : "INSERT" expr '(' expr ')'
131
132 {
133 $$ = new QLInsert( $2, $4 );
134 }
135 ;
And my input, following the "Query: " prompt:
Query: INSERT abc(5);
input:1.0-5: syntax error, unexpected string, expecting end of file or end of line or INSERT or ';'
Now, if I remove the "INSERT" string literal from my parser.yy code on line 130, the program runs just fine. In fact, after storing the input data (namely, "abc" and the integer 5), it's returned right back to me correctly.
At first, I thought this was an issue with character encodings. Bison code needs to be compiled and run using the same encodings, which should not be an issue seeing as I am compiling and running from the same terminal.
My system details:
Ubuntu 8.10 (Linux 2.6.24-16-generic)
flex 2.5.34
bison 2.3
gcc 4.2.4
If you need any more info or code from me, let me know!

This is a classic error: if you use flex to lex your input into tokens, you must not refer to keywords in the parser as literal strings. Declare tokens for them and use those in the grammar instead.
For details, see this similar question

Thankee, thankee, thankee!
Just to clarify, here is how I implemented my solution, based on the comments from jpalecek. First, I declared an INSERT token in the bison code (parser.yy):
%token INSERT
Next, I defined that token in the flex code (scanner.ll):
"INSERT INTO" { return token::INSERT; }
Finally, I used the token INSERT in my grammar rule:
insertexpr : INSERT expr '(' expr ')'
{
$$ = new QLInsert( $2, $4 );
}
;
And voila! my over-extended headache is finally over!
Thanks, jpalecek :).
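For anyone who wants to see the idea outside of flex, here is a rough Python stand-in for the scanner side. It is only a sketch: the token names INTEGER, STRING and PUNCT and the tokenize helper are made up for illustration and are not taken from the code above. The point it tries to show is that the keyword needs its own rule, tried before the generic identifier rule, so the parser receives an INSERT token rather than a plain string token.

```python
import re

# Hypothetical token table; order matters because earlier rules win.
TOKEN_SPEC = [
    ("INSERT", r"INSERT\b"),        # keyword rule, tried before the generic one
    ("INTEGER", r"\d+"),
    ("STRING", r"[A-Za-z_]\w*"),    # generic identifier/string rule
    ("PUNCT", r"[();]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Return (token_kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, match.group()))
    return tokens


print(tokenize("INSERT abc(5);"))
```

If the INSERT rule is removed from the table, the same input yields a STRING token for the keyword, which is the kind of mismatch behind the "unexpected string" message in the question.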

Related

Error! Unknown Op code for conditional MS Word 365

I am trying to do a mail merge and am using the code below. Note: the {} are made with CTRL +F9.
and I keep getting the above error. Similar code is used in many other parts of my document and works fine.
Can anyone help with what is triggering this error?
It would be surprising if either field in your post was working properly given that neither is properly constructed. Both lack an operator before the first "" and a space between the = and 0
In any event, all you need is:
{IF{MERGEFIELD MidYear_MAP_Test_Points_Grade_Numeric_}<> "" {MERGEFIELD MidYear_MAP_Test_Points_Grade_Numeric_ \# "0.00,,'Missing'"}}
and:
{IF{MERGEFIELD USATP_4_Points_Grade_Numeric_MaxPoints}<> "" {MERGEFIELD USATP_4_Points_Grade_Numeric_MaxPoints \# "0.00,,'Missing'"}}

unexpected behavior with replaceList function

We recently moved from CF 10 to CF 2016 and stumbled upon the following issue:
<cfscript>
x = "abc";
x = replaceList(x, "ab|cd", "1|2", "|");
writeDump(x);
// CF 11, CF 2016
// >> 12
// CF 10, Railo/Lucee
// >> 1c
// --------------------
x = "abc";
x = replaceList(x, "ab,cd", "1,2", ",");
writeDump(x);
// CF 11, CF 2016
// >> 1c
// CF 10, Railo/Lucee
// >> 1c
</cfscript>
What is going on here? Why is this change not documented by Adobe? Is it even an intended change to begin with?
Update:
Bug Report #4164200 filed with Adobe
Short answer:
I suspect it is unintentional and would file a bug report. As a workaround, try escaping the pipe symbol: replaceList(x, "ab|cd", "1|2", "\|");.
Longer answer:
Internally, this function almost certainly uses some sort of regular expression (where the pipe symbol | has a special meaning, i.e. logical OR). My guess is that CF first uses String.split("regex") to break the two lists into arrays, then loops through the arrays to perform the replacements.
Based on the results, CF is not escaping the pipe symbol, causing the lists to be split differently than expected. Each individual character becomes a separate element, which obviously ends up matching more than you intended, i.e. every character in the base string.
list = "ab|cd";
writeDump(list.split("|") );
However, if you escape the pipe symbol with \, you get the expected results:
list = "ab|cd";
writeDump(list.split("\|"));
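To make the guess above concrete, here is a small Python simulation. It is not ColdFusion's implementation: replace_list is a hypothetical stand-in that applies the replacements one pair at a time, and it assumes that a target with no matching replacement is simply deleted. Under those assumptions, splitting the lists into single characters reproduces the "12" result, while splitting on a literal pipe reproduces "1c".

```python
def replace_list(text, targets, replacements):
    """Hypothetical stand-in for replaceList: replace pair by pair, in order."""
    for index, target in enumerate(targets):
        # Assumption: a target without a corresponding replacement is removed.
        replacement = replacements[index] if index < len(replacements) else ""
        text = text.replace(target, replacement)
    return text


# Delimiter treated as a literal pipe: each list has two elements.
literal = replace_list("abc", "ab|cd".split("|"), "1|2".split("|"))

# Delimiter treated as an unescaped regex: every character is its own element.
per_char = replace_list("abc", list("ab|cd"), list("1|2"))

print(literal)   # 1c
print(per_char)  # 12
```

In the per-character case the steps are a -> 1, b -> |, | -> 2, then c and d are deleted, which is how "abc" can end up as "12".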

c++: Decoding with mbtowsc() not working with special characters

After learning a lot from that post (to sum up: I need to decode special characters such as 'ñ' in Eclipse on Fedora), I now think I was doing it wrong: I was decoding the text as a normal string and then trying to convert it from char to wchar_t. Now I'm using mbtowsc() to decode it from the beginning, which leaves me with another couple of doubts/problems.
What I have:
A server sends me the text "$ Euroñ $" (I don't know how it is encoded, only that the string is encoded as ANSI, 1 byte per character). I load that text into descr[0], so when I debug it, some of the values are:
descr[0][2]= 69 'E'
descr[0][3]= 117 'u'
descr[0][4]= 114 'r'
descr[0][5]= 111 'o'
descr[0][6]= -15 'ñ'
With that, I suppose you can know if it is utf-8 or not, which I suppose it is.
My goal is to be able to do something like:
for (int i = 0; i < total; i++)
{
if(descr[0][i]=='a') //do things
else if (descr[0][i]=='ñ') // do other stuff
}
This is what I can't get to work, because of course descr[0][6] never equals 'ñ'...
I've tried with:
if(descr[0][i]=='\x00D1')
if(descr[0][i]=='\x00F1')
if(descr[0][i]=='ñ') //this compiles but then ignores that if like it is an error
if(descr[0][i]==L'ñ')
But it never enters any of those ifs, so what should I compare descr[0][i] to in order to know whether it is an 'ñ'?
Thanks.
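One way to check what the debugger value says about the encoding is to look at the raw byte. The Python sketch below only inspects the single value -15 quoted above; the helper is_valid_utf8 is made up for illustration. It suggests the byte is 0xF1, which is 'ñ' in a single-byte encoding such as ISO-8859-1 (Latin-1), and that this byte on its own is not valid UTF-8, where 'ñ' takes two bytes.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


signed_value = -15                 # what the debugger shows for descr[0][6]
byte_value = signed_value & 0xFF   # the same bits viewed as an unsigned byte
raw = bytes([byte_value])

print(hex(byte_value))                     # 0xf1
print(raw.decode("latin-1") == "\u00f1")   # True: 0xF1 is n-tilde in Latin-1
print("\u00f1".encode("utf-8"))            # b'\xc3\xb1': two bytes in UTF-8
print(is_valid_utf8(raw))                  # False
```

If the data really is single-byte as this suggests, comparing the character as an unsigned value against 0xF1 is probably the check to try in the loop, rather than a wide or multi-byte literal.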

Python Regex to Extract Genome Sequence

I’m trying to use a Python Regular Expression to extract a genome sequence from a genome database; I’ve pasted a snippet of the database below.
>GSVIVT01031739001 pacid=17837850 polypeptide=GSVIVT01031739001 locus=GSVIVG01031739001 ID=GSVIVT01031739001.Genoscope12X annot-version=Genoscope.12X ATGAAAACGGAACTCTTTCTAGGTCATTTCCTCTTCAAACAAGAAAGAAGTAAAAGTTGCATACCAAATATGGACTCGAT TTGGAGTCGTAGTGCCCTGTCCACAGCTTCGGACTTCCTCACTGCAATCTACTTCGCCTTCATCTTCATCGTCGCCAGGT TTTTCTTGGACAGATTCATCTATCGAAGGTTGGCCATCTGGTTATTGAGCAAGGGAGCTGTTCCATTGAAGAAAAATGAT GCTACACTGGGAAAAATTGTAAAATGTTCGGAGTCTTTGTGGAAACTAACATACTATGCAACTGTTGAAGCATTCATTCT TGCTATTTCCTACCAAGAGCCATGGTTTAGAGATTCAAAGCAGTACTTTAGAGGGTGGCCAAATCAAGAGTTGACGCTTC CCCTCAAGCTTTTCTACATGTGCCAATGTGGGTTCTACATCTACAGCATTGCTGCCCTTCTTACATGGGAAACTCGCAGG AGGGATTTCTCTGTGATGATGTCTCATCATGTAGTCACTGTTATCCTAATTGGGTACTCATACATATCAAGTTTTGTCCG GATCGGCTCAGTTGTCCTTGCCCTGCACGATGCAAGTGATGTCTTCATGGAAGCTGCAAAAGTTTTTAAATATTCTGAGA AGGAGCTTGCAGCAAGTGTGTGCTTTGGATTTTTTGCCATCTCATGGCTTGTCCTACGGTTAATATTCTTTCCCTTTTGG GTTATCAGTGCATCAAGCTATGATATGCAAAATTGCATGAATCTATCGGAGGCCTATCCCATGTTGCTATACTATGTTTT CAATACAATGCTCTTGACACTACTTGTGTTCCATATATACTGGTGGATTCTTATATGCTCAATGATTATGAGACAGCTGA AAAATAGAGGACAAGTTGGAGAAGATATAAGATCTGATTCAGAGGACGATGAATAG
>GSVIVT01031740001 pacid=17837851 polypeptide=GSVIVT01031740001 locus=GSVIVG01031740001 ID=GSVIVT01031740001.Genoscope12X annot-version=Genoscope.12X ATGGGTATTACTACTTCCCTCTCATATCTTTTATTCTTCAACATCATCCTCCCAACCTTAACGGCTTCTCCAATACTGTT TCAGGGGTTCAATTGGGAATCATCCAAAAAGCAAGGAGGGTGGTACAACTTCCTCATCAACTCCATTCCTGAACTATCTG CCTCTGGAATCACTCATGTTTGGCTTCCTCCACCCTCTCAGTCTGCTGCATCTGAAGGGTACCTGCCAGGAAGGCTTTAT GATCTCAATGCATCCCACTATGGTACCCAATATGAACTAAAAGCATTGATAAAGGCATTTCGCAGCAATGGGATCCAGTG CATAGCAGACATAGTTATAAACCACAGGACTGCTGAGAAGAAAGATTCAAGAGGAATATGGGCCATCTTTGAAGGAGGAA CCCCAGATGATCGCCTTGACTGGGGTCCATCTTTTATCTGCAGTGATGACACTCTTTTTTCTGATGGCACAGGAAATCCT GATACTGGAGCAGGCTTCGATCCTGCTCCAGACATTGATCATGTAAACCCCCGGGTCCAGCGAGAGCTATCAGATTGGAT GAATTGGTTAAAGATTGAAATAGGCTTTGCTGGATGGCGATTCGATTTTGCTAGAGGATACTCCCCAGATTTTACCAAGT TGTATATGGAAAACACTTCGCCAAACTTTGCAGTAGGGGAAATATGGAATTCTCTTTCTTATGGAAATGACAGTAAGCCA AACTACAACCAAGATGCTCATCGGCGTGAGCTTGTGGACTGGGTGAAAGCTGCTGGAGGAGCAGTGACTGCATTTGATTT TACAACCAAAGGGATACTCCAAGCTGCAGTGGAAGGGGAATTGTGGAGGCTGAAGGACTCAAATGGAGGGCCTCCAGGAA TGATTGGCTTAATGCCTGAAAATGCTGTGACTTTCATAGATAATCATGACACAGGTTCTACACAAAAAATTTGGCCATTC CCATCAGACAAAGTCATGCAGGGATATGTTTATATCCTCACTCATCCTGGGATTCCATCCATATTCTATGACCACTTCTT TGACTGGGGTCTGAAGGAGGAGATTTCTAAGCTGATCAGTATCAGGACCAGGAACGGGATCAAACCCAACAGTGTGGTGC GTATTCTGGCATCTGACCCAGATCTTTATGTAGCTGCCATAGATGAGAAAATCATTGCTAAGATTGGACCAAGGTATGAT GTTGGGAACCTTGTACCTTCAACCTTCAAACTTGCCACCTCTGGCAACAATTATGCTGTGTGGGAGAAACAGTAA
>GSVIVT01031741001 pacid=17837852 polypeptide=GSVIVT01031741001 locus=GSVIVG01031741001 ID=GSVIVT01031741001.Genoscope12X annot-version=Genoscope.12X ATGTCCAAATTAACTTATTTATTATCTCGGTACATGCCAGGAAGGCTTTATGATCTGAATGCATCCAAATATGGCACCCA AGATGAACTGAAAACACTGATAAAGGTGTTTCACAGCAAGGGGGTCCAGTGCATAGCAGACATAGTTATAAACCACAGAA CTGCAGAGAAGCAAGACGCAAGAGGAATATGGCCATCTTTGAAGGAGGAACCCCAGATGATCGCCTTGACTGGACCCCAT CTTTCCTTTGCAAGGACGACACTCCTTATTCCGACGGCACCGGAAACCCTGATTCTGGAGATGACTACAGTGCCGCACCA GACATCGACCACATCAACCCACGGGTTCAGCAAGAGCTAA
What I’m trying to do is get the genome (ACGT) sequence for GSVIVT01031740001 (the middle sequence), and none of the others. My current regex is
sequence = re.compile('(?<=>GSVIVT01031740001) pacid=.*annot-version=.*\n[ACGT\n]*[^(?<!>GSVIVT01031740001) pacid]')
with my logic being find the header with the genbank ID for the correct organism, give me that line, then go to a new line and give me all ACGT and new lines until I get to a header for an organism with a different genbank ID. This fails to give any results.
Yes, I know that re.compile doesn’t actually perform a search; I’m searching against a file opened as ‘target’ so my execution looks like
>>> for nucl in target:
...     if re.search(sequence, nucl):
...         print(nucl)
Can someone tell me what I’m doing wrong, either in my regex or by using regex in the first place? When I try this on regex101.com, it works, but when I try it in the Python interpreter (2.7.1), it fails.
Thanks!
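If I understand correctly, you want just the genomic sequence for a given locus. You can do something like this (it assumes your data is in a file, one record per line):
# Split every record into its space-separated fields
lines = [line.split(' ') for line in open('results.txt')]
somedict = {}
for each in lines:
    # The fourth field looks like locus=GSVIVG...; keep the part after '='
    locus = each[3].split('=')[-1]
    # Everything after the six header fields is sequence; join the chunks
    seq = ''.join(each[6:])
    somedict[locus] = seq
print(somedict)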
It outputs a dictionary with the locus as key and sequence as value
{'GSVIVG01031741001': 'ATGTCCAAATTAACTTATTTATTATCTCGGTACATGCCAGGAAGGCTTTATGATCTGAATGCATCCAAATATGGCACCCAAGATGAACTGAAAACACTGATAAAGGTGTTTCACAGCAAGGGGGTCCAGTGCATAGCAGACATAGTTATAAACCACAGAACTGCAGAGAAGCAAGACGCAAGAGGAATATGGCCATCTTTGAAGGAGGAACCCCAGATGATCGCCTTGACTGGACCCCATCTTTCCTTTGCAAGGACGACACTCCTTATTCCGACGGCACCGGAAACCCTGATTCTGGAGATGACTACAGTGCCGCACCAGACATCGACCACATCAACCCACGGGTTCAGCAAGAGCTAA\n', 'GSVIVG01031740001': 'ATGGGTATTACTACTTCCCTCTCATATCTTTTATTCTTCAACATCATCCTCCCAACCTTAACGGCTTCTCCAATACTGTTTCAGGGGTTCAATTGGGAATCATCCAAAAAGCAAGGAGGGTGGTACAACTTCCTCATCAACTCCATTCCTGAACTATCTGCCTCTGGAATCACTCATGTTTGGCTTCCTCCACCCTCTCAGTCTGCTGCATCTGAAGGGTACCTGCCAGGAAGGCTTTATGATCTCAATGCATCCCACTATGGTACCCAATATGAACTAAAAGCATTGATAAAGGCATTTCGCAGCAATGGGATCCAGTGCATAGCAGACATAGTTATAAACCACAGGACTGCTGAGAAGAAAGATTCAAGAGGAATATGGGCCATCTTTGAAGGAGGAACCCCAGATGATCGCCTTGACTGGGGTCCATCTTTTATCTGCAGTGATGACACTCTTTTTTCTGATGGCACAGGAAATCCTGATACTGGAGCAGGCTTCGATCCTGCTCCAGACATTGATCATGTAAACCCCCGGGTCCAGCGAGAGCTATCAGATTGGATGAATTGGTTAAAGATTGAAATAGGCTTTGCTGGATGGCGATTCGATTTTGCTAGAGGATACTCCCCAGATTTTACCAAGTTGTATATGGAAAACACTTCGCCAAACTTTGCAGTAGGGGAAATATGGAATTCTCTTTCTTATGGAAATGACAGTAAGCCAAACTACAACCAAGATGCTCATCGGCGTGAGCTTGTGGACTGGGTGAAAGCTGCTGGAGGAGCAGTGACTGCATTTGATTTTACAACCAAAGGGATACTCCAAGCTGCAGTGGAAGGGGAATTGTGGAGGCTGAAGGACTCAAATGGAGGGCCTCCAGGAATGATTGGCTTAATGCCTGAAAATGCTGTGACTTTCATAGATAATCATGACACAGGTTCTACACAAAAAATTTGGCCATTCCCATCAGACAAAGTCATGCAGGGATATGTTTATATCCTCACTCATCCTGGGATTCCATCCATATTCTATGACCACTTCTTTGACTGGGGTCTGAAGGAGGAGATTTCTAAGCTGATCAGTATCAGGACCAGGAACGGGATCAAACCCAACAGTGTGGTGCGTATTCTGGCATCTGACCCAGATCTTTATGTAGCTGCCATAGATGAGAAAATCATTGCTAAGATTGGACCAAGGTATGATGTTGGGAACCTTGTACCTTCAACCTTCAAACTTGCCACCTCTGGCAACAATTATGCTGTGTGGGAGAAACAGTAA\n', 'GSVIVG01031739001': 
'ATGAAAACGGAACTCTTTCTAGGTCATTTCCTCTTCAAACAAGAAAGAAGTAAAAGTTGCATACCAAATATGGACTCGATTTGGAGTCGTAGTGCCCTGTCCACAGCTTCGGACTTCCTCACTGCAATCTACTTCGCCTTCATCTTCATCGTCGCCAGGTTTTTCTTGGACAGATTCATCTATCGAAGGTTGGCCATCTGGTTATTGAGCAAGGGAGCTGTTCCATTGAAGAAAAATGATGCTACACTGGGAAAAATTGTAAAATGTTCGGAGTCTTTGTGGAAACTAACATACTATGCAACTGTTGAAGCATTCATTCTTGCTATTTCCTACCAAGAGCCATGGTTTAGAGATTCAAAGCAGTACTTTAGAGGGTGGCCAAATCAAGAGTTGACGCTTCCCCTCAAGCTTTTCTACATGTGCCAATGTGGGTTCTACATCTACAGCATTGCTGCCCTTCTTACATGGGAAACTCGCAGGAGGGATTTCTCTGTGATGATGTCTCATCATGTAGTCACTGTTATCCTAATTGGGTACTCATACATATCAAGTTTTGTCCGGATCGGCTCAGTTGTCCTTGCCCTGCACGATGCAAGTGATGTCTTCATGGAAGCTGCAAAAGTTTTTAAATATTCTGAGAAGGAGCTTGCAGCAAGTGTGTGCTTTGGATTTTTTGCCATCTCATGGCTTGTCCTACGGTTAATATTCTTTCCCTTTTGGGTTATCAGTGCATCAAGCTATGATATGCAAAATTGCATGAATCTATCGGAGGCCTATCCCATGTTGCTATACTATGTTTTCAATACAATGCTCTTGACACTACTTGTGTTCCATATATACTGGTGGATTCTTATATGCTCAATGATTATGAGACAGCTGAAAAATAGAGGACAAGTTGGAGAAGATATAAGATCTGATTCAGAGGACGATGAATAG\n'}
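If you would rather stay with a regex, note that looping over the file yields one line at a time, so a pattern that spans the header and the lines after it can never match inside that loop. Here is a minimal sketch that searches the whole text instead. It assumes the usual FASTA layout (one header line starting with '>', then sequence lines), and extract_sequence is a made-up helper name, not part of any library.

```python
import re


def extract_sequence(text, seq_id):
    """Return the ACGT sequence of the record whose header starts with '>' + seq_id."""
    pattern = re.compile(
        r'^>' + re.escape(seq_id) + r'\b[^\n]*\n'  # the header line for this ID
        r'([ACGT\n]+)',                            # sequence lines up to the next '>'
        re.MULTILINE,
    )
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1).replace('\n', '')


# Tiny made-up sample in the assumed layout.
sample = (
    ">ID1 pacid=1\nACGT\nTTAA\n"
    ">ID2 pacid=2\nGGCC\nAATT\n"
    ">ID3 pacid=3\nCCCC\n"
)
print(extract_sequence(sample, "ID2"))  # GGCCAATT
```

With a real file you would pass in the result of reading the whole file (for example open('results.txt').read()) rather than iterating over its lines.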

How do I nest syntax highlighting in Vim?

How do I get nested syntax highlighting in Vim to actually highlight the syntax?
I'm working on a syntax file for a language where strings can contain code from other languages, specifically, SQL, Groovy, JavaScript, PHP and AppleScript. I'm just trying to get the first one working and hoping to make it easy on myself by beginning such strings with a comment in the embedded language that specifies what the language is. Here's an example:
ExecuteSQL( "/*lang=sql*/ SELECT *
FROM table
WHERE id = ?";
""; "";
456
)
In my syntax file I have the following lines of code:
runtime! syntax/sql.vim
unlet b:current_syntax
syntax include @sql_code syntax/sql.vim
syntax region fm_sql_code start=/"\/\*lang=sql\*\// keepend end=/\v"/ contains=@sql_code
At the moment this code is at the end of my syntax/plugin.vim file. But I've also tried putting the code in ftplugin/plugin.vim but using autocmds to execute it:
au FileType fmcalc runtime! syntax/sql.vim
au FileType fmcalc unlet b:current_syntax
au FileType fmcalc syntax include @sql_code syntax/sql.vim
au FileType fmcalc syntax region fm_sql_code start=/"\/\*lang=sql\*\// keepend end=/\v"/ contains=@sql_code
So far the location of the code hasn't changed the result.
I have a mapping that allows me to see the syntax stack for the current character by pressing <c-s-p>.
The result is that with the above, when my cursor is in the string I see the stack as ['fm_sql_code', 'sqlString']. So it's detecting the embedded syntax but flagging it as a string. The problem is that the text isn't highlighted as SQL language code, just as a string literal.
My first thought was that it was doing this because the regex I'm using for the region is including the double-quotes, so I tried changing that to the following:
syntax region fm_sql_code start=/"\@=\/\*lang=sql\*\// keepend end=/"\@=/ contains=@sql_code
But that fails to see the strings as SQL at all. The syntax stack for the above example returns ['fm_string_literal'], which is the default syntax name for strings.
Any help would be appreciated.
let csyn = b:current_syntax
unlet b:current_syntax
syntax include @sql syntax/sql.vim
syntax region sqlSnip start='\(\)\(/\* *lang=sql *\*/\)\@=' end='\(\)"\@=' contains=@sql containedin=ALL
let b:current_syntax = csyn
unlet csyn
EDIT:
Temporarily unsetting b:current_syntax is required since Vim syntax files are usually not loaded when the variable exists. For example, syntax/sql.vim starts as follows:
" For version 5.x: Clear all syntax items
" For version 6.x: Quit when a syntax file was already loaded
if version < 600
  syntax clear
elseif exists("b:current_syntax")
  finish
endif