losing type information when recognizing tokens in LEX - regex

I am trying to parse a config file using lex and yacc/bison.The sample file is shown below.
[section1]
attr1=1234567
attr2=848329832499934
[section2]
attr3=1233422
attr2=849999934834798
To recognize the values for the attributes listed above,I use the following regular expressions.
DIGIT [0-9]
NUM {DIGIT}+
Now in my specific example attr1 has a type of uint32_t and attr2 has a type of uint64_t
However I cannot recognize this.In otherwords I need to have a symbol table (which I statically define) from where I can lookup the type so that I can populate the appropriate type.
Is there any other approach to solve this problem? How do dynamically typed languages like python solve this issue.?

The usual way is to type the constant (or other token) based on the token itself. So if a sequence of digits token has value in the range 0..232-1, then it would be a unit32_t, otherwise it would be some other type (and corresponding token) capable of holding the value.
In the case of python, there's int type capable of holding integers that fit in a single word (what that is depends on the machine you're running on), and then there's long type capable of handling any larger integer.

Related

What is the number after the class name in llvn type?

I got function types in LLVM pass by getFunctionType(), when I print them, the function type contains something like: (%"class.xalanc_1_8::ReusableArenaBlock.10232"*).
The former part is the class name class.xalanc_1_8::ReusableArenaBlock, what about the number. Some of them contain multiple numbers. I'm wondering about the usage of the numbers.
The frontend (which typically parses a source code language) needs to create LLVM types with unique names. Most frontends use type names and other strings from the source code to construct the IR type's name, but sometimes that's not (guaranteed to be) unique. Appending a number is the normal way to achieve uniqueness.
In fact appending a number it is what LLVM itself does if you create certain types or many other things, and there's no error except that your requested name is already taken.
It's not the only way to avoid conflicts. My own code adds line numbers from the source code in some cases (not for types, though).

How do I prevent strings containing the word "yes" or "no" from printing out as true or false? [duplicate]

This question already has answers here:
How can I prevent SerializeJSON from changing Yes/No/True/False strings to boolean?
(7 answers)
Closed 6 years ago.
I'm currently setting a number of variables like so:
<cfset printPage = "YES">
Eventually, when I print these variables out, anything that I try to set to "YES" prints out as "true". Anything set to "NO", prints out as "false". I'm not opposed to using the YesNoFormat function. In fact I might end up using that function in this application, but in the mean time I would like to know if ColdFusion is actually storing the words "YES" and "NO" in memory, or if it is converting them to a boolean format behind the scenes.
If CF is storing my variables exactly the way that I declare them, how would I go about retrieving these variables as strings? If CF is changing the variables in some way, are there any special characters or keywords that I could use to force it to store the variables as strings?
Thank you to everyone that commented / answered. I did a little more experimenting and reading, and it seems that the serializeJSON function will automatically convert "Yes" to "true" and "No" to "false". I either need to deal with this problem in my javascript, or I can add a space in the affected properties to circumvent this behavior.
You already know how to display the boolean value as "yes" or "no" (using YesNoFormat()). I don't think there is a way to force ColdFusion to store a variable a certain way. It just doesn't support that. I guess you could call out the Java type directly by using JavaCast(). I just don't see why you would want to go through that extra work for something like this. You can certainly research that a bit more if you like. Here is a link to the document for JavaCast.
Have a look at this document regarding data types in ColdFusion. I will post some of the relevant points from that document here but please read that page for more information.
ColdFusion is often referred to as typeless because you do not assign types to variables and ColdFusion does not associate a type with the variable name. However, the data that a variable represents does have a type, and the data type affects how ColdFusion evaluates an expression or function argument. ColdFusion can automatically convert many data types into others when it evaluates expressions. For simple data, such as numbers and strings, the data type is unimportant until the variable is used in an expression or as a function argument.
ColdFusion variable data belongs to one of the following type categories:
Simple One value. Can use directly in ColdFusion expressions. Include numbers, strings, Boolean values, and date-time values.
Binary Raw data, such as the contents of a GIF file or an executable program file.
Complex A container for data. Generally represent more than one value. ColdFusion built-in complex data types include arrays, structures, queries, and XML document objects. You cannot use a complex variable, such as an array, directly in a ColdFusion expression, but you can use simple data type elements of a complex variable in an expression. For example, with a one-dimensional array of numbers called myArray, you cannot use the expression myArray * 5. However, you could use an expression myArray[3] * 5 to multiply the third element in the array by five.
Objects Complex constructs. Often encapsulate both data and functional operations.
It goes on to say this regarding Data Types:
Data type notes
Although ColdFusion variables do not have types, it is often convenient to use “variable type” as a shorthand for the type of data that the variable represents.
ColdFusion provides the following functions for identifying the data type of a variable:
IsArray
IsBinary
IsBoolean
IsImage
IsNumericDate
IsObject
IsPDFObject
IsQuery
IsSimpleValue
IsStruct
IsXmlDoc
ColdFusion also includes the following functions for determining whether a string can be represented as or converted to another data type:
IsDate
IsNumeric
IsXML
So in your code you could use something like IsBoolean(printPage) to check if it contains a boolean value. Of course that doesn't mean it is actually stored as a boolean but that ColdFusion can interpret it's value as a boolean.

Declaring a variable without specifying its data type in C++

Why is it necessary to specify the datatype of a variable? What if my program requires the user to enter data that could belong to any one of two non intersecting data types? Shouldn't the option of declaring a variable without specifying a variable be provided so as to account for a situation. Why can't we let the computer decide the data type on the basis of user input? If the compiler has enough capability to identify a type error, I'm sure it can easily specify a data type on the basis of the input.
Compilers don't deal with input, so that's no option.
There's boost::variant<T,U> which is a type can hold either T or U values, but you still have to specify to the compiler all possible options, and you have to make clear what you put in.
User input always starts out as a string. Parsing turns that into types, but the result depends on the actual parsing. If you're parsing float values, 0 is a perfectly fine float value.

Return Set for a Command Parser

I need to write a parser to parse commands. 5 such commands are:
"a=10"
"b=foo"
"c=10,10"
"clear d"
"c push_back 2"
In the case of the first example, set is the command, a is the object and 10 is the value.
What do you think the parser should return for each line above?
Here is my idea:
"a=10" -> SET (COMMAND_ENUM), INT (VALUE_TYPE), "a", ("10")
"b=foo" -> SET (COMMAND_ENUM), STRING (VALUE_TYPE), "b", ("foo")
Is this a good approach? What is the standard approach for this problem? Should I dispatch instead?
I have a function which checks the type associated with an object. For example, a above is of type INT and must be assigned an INT value, otherwise the parser should return or throw an error of some sort. I also have a convert function for converting values from strings to the desired type. These throw if the conversion is not possible. If the parser tries to convert the values from strings to the required type, then it is probably a good idea to return them via a boost::variant.
You need to come up with at least a semi-formal grammar for the command language you want to recognize, since you've left a whole lot of things really vaguely specified (e.g. in b=foo you want b to be a variable name but foo to be a string literal. How do you distinguish them?. Does a sequence of characters represent an identifier if it's on the right side of an assignment, but a literal if it's on the left side? Or does a single character represent an identifier, but multiple characters represent a literal?) In c=10,10 does 10,10 represent a list or a vector? Writing a grammar will at least force you to think about such things, and it will also serve at least as a guide to how to write your parser (at most it will be something that can be automatically translated into your parser).
You're on the right track by thinking of how statements should be represented as Abstract Syntax Trees (ASTs), but you need to take a step backwards and look at what you want in terms of concrete syntax.

Passing lists from Mathematica to c++ (Mathlink)

I simply want to pass a list of integers to a function written in C++. I've set up the template (.tm) file and all, and I can successfully call a test function whith scalar arguments. Calling the function with the list argument behaves as though the function was not defined at all. I suspect that the argument types don't match.
In the documentation for templates (http://reference.wolfram.com/mathematica/ref/file/file.tm.html) the datatype for lists is something like "Int32List". When I use that, my C++ function must contain an extra long parameter for the list length. The only example code which uses a list is "sumalist.tm". This example uses IntegerList (a type which doesn't appear in the doku).
When I use Int32List, the mprep result requires a function with an extra integer argument (not long as written in the doku). When I use the undocumented IntegerList type, the extra argument is of type long.
During my experiments with scalar types, I had a similar problem - a c++ function was called properly when using "Integer" in the tm-file, and not recognized with "Integer32".
The "sumalist.tm" example also uses a strange Pattern (list:{___Integer}) about which I didn't find any documentation. I'd also like to understand what the Evaluate line means (I suspect that it's used make the function callable without the curly braces around the list).
So who know which datatypes are really appropriate to call a c++ function with a list - maybe also with reals... ?
The mapping of MathLink data types (e.g., Integer32, Integer32List, ...) to C/C++ types is described on the MathLink template file documentation page.
The page no longer documents the old interface types Integer, Real, IntegerList and RealList. These should no longer be used, because the mapping of these types depends on C types whose bit length is platform and compiler dependent (e.g., long). Use the corresponding new type with explicit bit length instead (i.e., Integer32 or Integer64 instead of Integer). The old interface types are still documented in the somewhat dated MathLink reference guide.
The following talk slides contain a simple MathLink example that shows how to implement a MathLink function that adds a scalar value to a vector of reals. This may serve as a starting point.
I don't know much about MathLink, but I can explain the pattern, list:{___Integer}.
The colon is just the general form for a named pattern, that is symbol:pattern just says that the object referred to by symbol has to match pattern. Indeed, pattern like a_Integer or b__List are really just short forms for a:_Integer and b:__List.
So what we are left with interpreting is {___Integer}. This is a pattern matching a list of arbitrary many (including zero) integers. It works as follows:
{Pattern} is the Pattern for a list whose contents matches Pattern
___Integer is the Pattern for a sequence of zero or more Integers.