Python If statement with variable and string

Is it possible to construct a python if-then statement where the equality depends on both a variable and a string?
For example, the possible values might be "4","4A", or "4B". But I am running over a loop from 4-12 (so I might have 5A, 6, 7B, etc.; always an integer 4-12, and there may or may not be a string afterward).
I attempted:
B_steps=np.zeros(8)
A_steps=np.zeros(8)
No_steps=np.zeros(8)
for i in range(0,9,1):
if data[i]=="i+4""A":
A_steps[i]=A_steps[i]+1
elif data[i]=="i+4B":
B_steps[i]=B_steps[i]+1
else:
No_Steps[i]=No_steps[i]+1
But this does not work; my syntax is improper to identify both a variable (i+4) and a possible letter. Can someone advise what the proper syntax is to have both a variable and a string in an if/then statement?

In this scenario your syntax is improper because i is a variable of type int, but you wrote it inside a string literal, so it is never evaluated.
Try if data[i] == "%sA" % (i + 4) instead.
Also, the code you posted here is not indented correctly, although that could be caused by copy-pasting.
Apart from that, if you're not familiar with the language itself and the conventions in Python, or with some basic programming concepts, help yourself to the Python documentation. I would recommend the "Tutorial", "Library Reference" and "Python HOWTOs" chapters.
For more complex scenarios, you might try regular expressions with the re module from the Python standard library, like:
for i in range(0, 9, 1):
    if re.match(pattern_A, data[i]):
        ...
    ...
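A minimal sketch of that regular-expression approach, assuming each entry is an integer optionally followed by a single letter A or B, as in the question. The sample data list and the classify helper are made up for illustration and are not from the original post.

```python
import re

# An integer, optionally followed by one letter A or B (e.g. "4", "4A", "12B").
STEP_RE = re.compile(r"(\d+)([AB]?)")

def classify(entry, expected):
    """Return 'A', 'B' or '' if entry is `expected` plus that suffix, else None."""
    m = STEP_RE.fullmatch(entry)
    if m is None or int(m.group(1)) != expected:
        return None
    return m.group(2)

data = ["4A", "5", "6B", "7", "8A", "9", "10B", "11", "12"]  # hypothetical input
counts = {"A": 0, "B": 0, "": 0}
for i, entry in enumerate(data):
    suffix = classify(entry, i + 4)  # i + 4 is computed, not quoted
    if suffix is not None:
        counts[suffix] += 1

print(counts)
```

Using fullmatch rather than match rejects trailing junk such as "4AB", which a prefix match would let through.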

Related

Why are if expressions and if statements in ada, also for case

Taken from Introduction to Ada—If expressions:
Ada's if expressions are similar to if statements. However, there are a few differences that stem from the fact that it is an expression:
All branches' expressions must be of the same type
It must be surrounded by parentheses if the surrounding expression does not already contain them
An else branch is mandatory unless the expression following then has a Boolean value. In that case an else branch is optional and, if not present, defaults to else True.
I do not understand the need to have two different ways of constructing code with the if keyword. What is the reasoning behind this?
Also, there are case expressions and case statements. Why is this?
I think this is best answered by quoting the Ada 2012 Rationale Chapter 3.1:
One of the key areas identified by the WG9 guidance document [1] as
needing attention was improving the ability to write and enforce
contracts. These were discussed in detail in the previous chapter.
When defining the new aspects for preconditions, postconditions, type
invariants and subtype predicates it became clear that without more
flexible forms of expressions, many functions would need to be
introduced because in all cases the aspect was given by an expression.
However, declaring a function and thus giving the detail of the
condition, invariant or predicate in the function body makes the
detail of the contract rather remote for the human reader. Information
hiding is usually a good thing but in this case, it just introduces
obscurity. Four forms are introduced, namely, if expressions, case
expressions, quantified expressions and expression functions. Together
they give Ada some of the flexible feel of a functional language.
In addition, if statements and case statements often assign different values to the same variable in all branches, and do nothing else:
if Foo > 10 then
   Bar := 1;
else
   Bar := 2;
end if;
In this case, an if expression may increase readability and more clearly state in the code what's going on:
Bar := (if Foo > 10 then 1 else 2);
We can now see that there's no longer a need for the maintainer of the code to read a whole if statement in order to see that only a single variable is updated.
Same goes for case expressions, which can also reduce the need for nesting if expressions.
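The same statement-versus-expression contrast exists outside Ada. A minimal Python sketch, where foo and bar stand in for the Foo and Bar above and the two function names are invented for illustration:

```python
def bar_with_statement(foo):
    # Statement form: the reader must scan both branches to see that
    # only `bar` is ever assigned.
    if foo > 10:
        bar = 1
    else:
        bar = 2
    return bar

def bar_with_expression(foo):
    # Expression form: one assignment whose value depends on the test.
    bar = 1 if foo > 10 else 2
    return bar
```

Both functions return the same result for every input; the second just makes the single assignment obvious.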
Also, I can throw the question back to you: Why do C-based languages have the ternary operator ?: in addition to if statements?
Egilhh already covered the main reason, but there are sometimes other useful reasons to implement expressions. Sometimes you make packages where only one or two methods are needed and they are the only reason to make a package body. You can use expressions to make expression functions which allow you to define the operations in the spec file.
Additionally, if you ever end up with some complex variant record combinations, sometimes expressions can be used to setup default values for them in instances where you normally would not be able to as cleanly. Consider the following example:
with Ada.Text_IO; use Ada.Text_IO;
procedure Hello is
   type Binary_Type is (On, Off);
   type Inner(Binary : Binary_Type := Off) is record
      case Binary is
         when On =>
            Value : Integer := 0;
         when Off =>
            null;
      end case;
   end record;
   type Outer(Some_Flag : Boolean) is record
      Other : Integer := 32;
      Thing : Inner := (if Some_Flag then
                           (Binary => Off)
                        else
                           (Binary => On, Value => 23));
   end record;
begin
   Put_Line("Hello, world!");
end Hello;
I had something come up with a more complex setup that was meant to map to a complex messaging interface at the hardware level. It's nice to have defaults whenever possible. Now, I could have used a case inside of Outer, but then I would have had to come up with two separately named versions of the message field, one for each case, which really isn't optimal when you want your code to map to an ICD. Again, I could have used a function to initialize it as well, but as noted in the other poster's answer, that isn't always a good way to go.
Another place that outlines the motivation for adding conditional expressions to Ada can be found in the ARG document, AI05-0147-1, which explains the motivation and gives some examples of use.
An example of a place where I find them quite useful is in processing command line parameters, for the case when a default value is used if the parameter is not specified on the command line. Generally, you'd want to declare such values as constants in your program. Conditional expressions make it easier to do that.
with Ada.Command_Line; use Ada;
procedure Main
is
   N : constant Positive :=
     (if Command_Line.Argument_Count = 0 then 2_000_000
      else Positive'Value (Command_Line.Argument (1)));
...
Otherwise, without conditional expressions, in order to achieve the same effect you'd need to declare a function, which I find more difficult to read:
with Ada.Command_Line; use Ada;
procedure Main
is
   function Get_N return Positive is
   begin
      if Command_Line.Argument_Count = 0 then
         return 2_000_000;
      else
         return Positive'Value (Command_Line.Argument (1));
      end if;
   end Get_N;
   N : constant Positive := Get_N;
...
The if expression in Ada feels and works a lot like a statement using the ternary operator in the C-based languages. I took the liberty of copying some code from learn.adacore.com that introduces the if expression:
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
procedure Check_Positive is
   N : Integer;
begin
   Put ("Enter an integer value: ");
   Get (N);
   Put (N,0);
   declare
      S : constant String :=
        (if N > 0 then " is a positive number"
         else " is not a positive number");
   begin
      Put_Line (S);
   end;
end Check_Positive;
And I translated it to a C-based language, in this case Java. I believe the main point to notice is that both languages, although syntactically different, are effectively doing the same thing: testing a condition and assigning one of two values to a variable, all within one statement. I realize this is an oversimplification for most here on Stack Overflow; my goal is to help the beginner understand the basic concept with introductory examples. Cheers.
import java.util.Scanner;
public class IfExpression {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter an integer value: ");
        var N = in.nextInt();
        System.out.print(N);
        var S = N > 0 ? " is a positive number" : " is not a positive number";
        System.out.println(S);
        in.close();
    }
}

How to split a simple Lisp-like code to tokens in C++?

Basically, the language has 3 list types and 3 fixed-length types, one of which is string.
It is simple to detect the type of a token using regular expressions, but splitting the source into tokens is not that trivial.
A string is delimited by double quotes, and a double quote inside a string is escaped with a backslash.
EDIT:
Some example code
{
print (sum (1 2 3 4))
if [( 2 + 3 ) < 6] : {print ("Smaller")}
}
Lists like
() are argument lists that are only evaluated when necessary.
[] are special lists to express 2-operand operations in a prettier way.
{} are lists that are always evaluated. First element is a function
name, second is a list of arguments, and this repeats.
anything : anything [ : anything [: ...]] translate to argument lists that have the elements joined by the :s. This is only for making loops and conditionals look better.
All functions take a single argument. Argument lists can be used for functions that need more. You can force an argument list to evaluate using different types of eval functions. (There would be eval functions for each list model.)
So, if you understand this, it works very similarly to Lisp; it only has different list types for prettifying the code.
EDIT:
@rici
[[2 + 3] < 6] is OK too. As I mentioned, argument lists are evaluated only when necessary. Since < is a function that requires an argument list of length 2, (2 + 3) must be evaluated somehow; otherwise [(2 + 3) < 6] would translate to < (2 + 3) : 6, which equals < (2 + 3 6), which is an invalid argument list for <. But I see your point: it's not trivial how automatic parsing should work in this case. In the version I described above, [...] evaluates argument lists with a function like eval_as_oplist (...). But I guess you are right, because this way you couldn't use an argument list in the regular way inside a [...], which is problematic even if you don't have a reason to do so, because it doesn't lead to better code. So [[. . .] . .] is better code, I agree.
Rather than inventing your own "Lisp-like, but simpler" language, you should consider using an existing Lisp (or Scheme) implementation and embedding it in your C++ application.
Although designing your own language and then writing your own parser and interpreter for it is surely good fun, you will have a hard time coming up with something better designed, more powerful, and implemented more efficiently and robustly than, say, Scheme and its numerous implementations.
Chibi Scheme: http://code.google.com/p/chibi-scheme/ is particularly well suited for embedding in C/C++ code, it's very small and fast.
I would suggest using Flex (possibly with Bison) or ANTLR, which has a C++ output target.
Since google is simpler than finding stuff on my own file server, here is someone else's example:
http://ragnermagalhaes.blogspot.com/2007/08/bison-lisp-grammar.html
This example has formatting problems (which can be resolved by viewing the HTML in a text editor) and only supports one type of list, but it should help you get started and certainly shows how to split the items into tokens.
I believe Boost.Spirit would be suitable for this task provided you could construct a PEG-compatible grammar for the language you're proposing. It's not obvious from the examples as to whether or not this is the case.
More specifically, Spirit has a generalized AST called utree, and there is example code for parsing symbolic expressions (i.e. Lisp syntax) into utree.
You don't have to use utree in order to take advantage of Spirit's parsing and lexing capabilities, but you would have to have your own AST representation. Maybe that's what you want?
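For the tokenizing step itself, the idea can be sketched with one combined regular expression. The sketch below is in Python rather than C++ to keep it short, and the token categories (string, bracket, colon, atom) and the tokenize function are assumptions made for illustration, not part of the question's language definition.

```python
import re

TOKEN_RE = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")     # double-quoted, backslash escapes any char
  | (?P<bracket>[()\[\]{}])           # the three list delimiters
  | (?P<colon>:)                      # the joining operator
  | (?P<atom>[^\s()\[\]{}:"]+)        # names, numbers, operators
''', re.VERBOSE)

def tokenize(source):
    """Split source into (kind, text) pairs; raise ValueError on bad input."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(source, pos)
        if m is None:  # e.g. a string with no closing quote
            raise ValueError("cannot tokenize at offset %d" % pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for kind, text in tokenize('if [( 2 + 3 ) < 6] : {print ("Smaller")}'):
    print(kind, text)
```

The string alternative comes first so that brackets or colons inside a quoted string are never split off. The same pattern should carry over to std::regex or a hand-written scanner in C++.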

Regular Expression Equivalent for other data types used for Data Validation

I am creating a data quality framework for a database that looks at single cells of each data type and sees whether or not their values are acceptable.
For data type string:
I just use a regular expression to define what is valid
For other data types (Integer, Timestamp, Boolean, TimeDelta, Float, ... ):
I don't have any standard way of recording what is valid
Is there an equivalent to Regular Expressions for other data types? Like IntegerRegEx's?
For example, let's say I have a field that must contain numbers between 0 and 65535, or I have a field that can only contain odd numbers...
It would be nice if this IntegerRegEx was also a string (just like normal RegEx's), so I could store IntRegEx's and StringRegEx's in the same table.
Thanks in advance!
If you want something that's a string and regex-like, you could just use regexes. Just have a standard way of converting each type to a string, and write regexes against the string form. It might be awkward for some and error-prone for others, but it's simple and doesn't involve creating your own expression language or loading code straight from the db and evaling it.
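A rough sketch of that idea in Python, using the two rules from the question. It assumes str() is the agreed conversion to string form; the pattern names and the is_valid helper are hypothetical.

```python
import re

# Integers 0..65535, written out digit range by digit range.
UINT16_RE = re.compile(
    r"6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3}|0"
)
# Odd non-negative integers: any digits ending in an odd digit.
ODD_RE = re.compile(r"\d*[13579]")

def is_valid(pattern, value):
    """Convert the value to its string form, then match the whole string."""
    return pattern.fullmatch(str(value)) is not None

print(is_valid(UINT16_RE, 65535), is_valid(UINT16_RE, 65536))
print(is_valid(ODD_RE, 7), is_valid(ODD_RE, 8))
```

The range pattern shows the "awkward and error-prone" part: a numeric bound has to be spelled out one digit position at a time.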
I guess depending on what language you're programming in, say PHP, you could store a mathematical expression (a string), for example $x >= 0 && $x <= 65535 or $x % 2 == 1.
With regex, you would write something like this, right?
if (!preg_match($regexFromDb, $fieldValueFromDb)) {
// validation fails
}
So with mathematical expressions, you'd do the same thing, e.g.
$x = $fieldValueFromDb;
if (!eval("return $mathExprFromDb")) {
// validation fails
}
This is just example code. Of course you should safeguard your code against the dangers of running arbitrary stored executable code, and also against gibberish expressions crashing your script.
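One way to get that safeguard without a full eval is to parse the stored expression and accept only a small whitelist of syntax. The following is a sketch in Python rather than PHP; it assumes the rules are stored in Python syntax with the variable named x (for example "x >= 0 and x <= 65535"), and the check function is invented for illustration.

```python
import ast
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Mod: operator.mod}
_COMPARE = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
            ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}

def check(rule, x):
    """Evaluate a stored rule such as 'x % 2 == 1' against the value x."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "x":
            return x
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.BoolOp):
            values = [ev(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, right_node in zip(node.ops, node.comparators):
                if type(op) not in _COMPARE:
                    raise ValueError("unsupported comparison")
                right = ev(right_node)
                if not _COMPARE[type(op)](left, right):
                    return False
                left = right
            return True
        # Calls, attribute access, other names, etc. are all rejected.
        raise ValueError("unsupported syntax in rule: " + type(node).__name__)

    tree = ast.parse(rule, mode="eval")  # raises SyntaxError on gibberish
    return bool(ev(tree.body))

print(check("x >= 0 and x <= 65535", 70000))
print(check("x % 2 == 1", 7))
```

Callers would still need to catch SyntaxError, ValueError and ZeroDivisionError so that a bad stored rule fails validation instead of crashing the script.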
I think this is as close as you're gonna get, because the "IntegerRegEx" you seek already has a name... Math. ;)

regex for name with capital letters in Xcode

I am trying to change all names that look like thisForExample to this_for_example in Xcode with a regex. Does anyone know how to do that?
I have tried with this: ([a-z][A-Z])*[a-z]?
but it does not find anything.
You just need to find [a-z][A-Z][a-z].
Automating the replacement process will be tricky though - how do you plan on changing an arbitrary upper case letter to its lower case equivalent ?
Perl would be a good tool for this (if you insist on using regex), as it supports case modification in substitution patterns:
\l => change first char (of following match variable) to lower case
\L => change all chars (of following match variable) to lower case
\u => change single char (of following match variable) to upper case
\U => change all chars (of following match variable) to upper case
If all you care about is converting (simple/trivial!) variable and method names à la thisForExample into this_for_example, a single regex like this would be sufficient:
echo 'thisForExample = orThisForExample();' \
| perl -pe 's/(?<=[^A-Z])([A-Z]+)(?=[^A-Z])/_\L\1/g;'
//output: "this_for_example = or_this_for_example();"
However, as soon as you expect to come across (quite common) names like…
fooURL = URLString + URL + someURLFunction();
…you're in trouble. Deep trouble.
Let's see what our expression does with it:
echo 'fooURL = URLString + URL + someURLFunction();' \
| perl -pe 's/(?<=[^A-Z])([A-Z]+)(?=[^A-Z])/_\L\1/g;'
//output: "foo_url = _urlstring + _url + some_urlfunction();"
Pretty bad, huh?
And to make things even worse:
It is syntactically impossible to distinguish between a (quite common) variable name "URLString" and a class name "NSString".
Conclusion: Regex alone is pretty hack-ish and error-prone for this kind of task. And simply insufficient. And you don't want a single shell call to potentially mess up your entire code base, do you?
There is a reason why Xcode has a refactor tool that utilizes clang's grammar tree to differentiate between syntactically identical (pattern-wise at least) variable and class names.
This is a problem for context-free languages, not regular languages. Hence regular expressions cannot deal with it. You'd need a context-free grammar to generate a language tree, etc. (and at that point you've just started building a compiler).
Also: Why use under_scores anyway? If you're using Xcode then you're probably coding in ObjC(++) or similar anyway, where it's common sense to use camelCase. I and probably pretty much everybody else would hate you for making us one day deal with your underscored ObjC/C/… code.
Alternative Answer:
In a comment answer to Paul R you said you were basically merging two projects, one with under_scored naming, one with camelCased naming.
I'd advise you then to switch your under_scored code base to camelCase. For two reasons:
Turning under_scored names into camelCased ones is way less error-prone than vice versa. (That is: in a camelCase-dominated environment only, of course! It would be just as error-prone if you mainly dealt with under_scored code in Xcode. Think of it as "there's simply less code to potentially break" ;) )
Quoting my own answer:
[…] I and probably pretty much
everybody else would hate you for making us one day deal with your
underscored ObjC/C/… code. […]
Here is a simple regex for converting under_score to camelCase:
echo 'this_for_example = _leading_under_score + or_this_for_example();' \
| perl -pe 's/(?<=[\w])_([\w])/\u\1/g;'
//output: "thisForExample = _leadingUnderScore + orThisForExample();"
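For readers without Perl at hand, the two substitutions in this answer can be approximated in Python. This is a sketch: the function names are made up, and a lambda replaces Perl's \L and \u case modifiers, which Python's re module does not offer in replacement strings.

```python
import re

def camel_to_snake(text):
    # Same pattern as the Perl one-liner: a run of capitals between
    # two non-capital characters becomes "_" plus the lower-cased run.
    return re.sub(r"(?<=[^A-Z])([A-Z]+)(?=[^A-Z])",
                  lambda m: "_" + m.group(1).lower(), text)

def snake_to_camel(text):
    # An underscore that follows a word character is dropped and the
    # next character is upper-cased; a leading underscore is left alone.
    return re.sub(r"(?<=\w)_(\w)", lambda m: m.group(1).upper(), text)

print(camel_to_snake("thisForExample = orThisForExample();"))
print(camel_to_snake("fooURL = URLString + URL + someURLFunction();"))
print(snake_to_camel("this_for_example = _leading_under_score + or_this_for_example();"))
```

The second call reproduces the broken fooURL result shown earlier, so the Python version inherits exactly the same limitations as the Perl one.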
Something like ([a-zA-Z][a-z]+)+?
The process could look like this:
You dump all the names to a file, generate the replacements there automatically, and then (again automatically) build a sed script that changes each name to its replacement.

Why can't variable names start with numbers?

I was working with a new C++ developer a while back when he asked the question: "Why can't variable names start with numbers?"
I couldn't come up with an answer except that some numbers can have text in them (123456L, 123456U) and that wouldn't be possible if the compilers were thinking everything with some amount of alpha characters was a variable name.
Was that the right answer? Are there any more reasons?
string 2BeOrNot2Be = "that is the question"; // Why won't this compile?
Because then a string of digits would be a valid identifier as well as a valid number.
int 17 = 497;
int 42 = 6 * 9;
String 1111 = "Totally text";
Well think about this:
int 2d = 42;
double a = 2d;
What is a? 2.0? or 42?
Hint, if you don't get it, d after a number means the number before it is a double literal
It's a convention now, but it started out as a technical requirement.
In the old days, parsers of languages such as FORTRAN or BASIC did not require the use of spaces. So, basically, the following are identical:
10 V1=100
20 PRINT V1
and
10V1=100
20PRINTV1
Now suppose that numeral prefixes were allowed. How would you interpret this?
101V=100
as
10 1V = 100
or as
101 V = 100
or as
1 01V = 100
So, this was made illegal.
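The ambiguity can be made concrete with a small Python sketch that lists every way to split a spaceless line into a line number and a statement. The function names are invented for illustration, and the only rule modelled is "the statement starts with a variable name".

```python
def possible_readings(line):
    """All (line_number, statement) splits if variables may start with a digit."""
    digits = 0
    while digits < len(line) and line[digits].isdigit():
        digits += 1
    # Any non-empty prefix of the leading digit run could be the line number.
    return [(line[:k], line[k:]) for k in range(1, digits + 1)]

def readings_with_letter_rule(line):
    """Keep only splits where the statement starts with a letter."""
    return [(num, rest) for num, rest in possible_readings(line)
            if rest[:1].isalpha()]

print(possible_readings("101V=100"))
print(readings_with_letter_rule("101V=100"))
```

With digit-initial names allowed there are three candidate readings, the same three listed above; with the letter-first rule only one survives.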
Because backtracking is avoided in lexical analysis while compiling. For a variable like:
Apple;
the compiler will know it's an identifier right away when it meets the letter 'A'.
However, for a variable like:
123apple;
the compiler won't be able to decide whether it's a number or an identifier until it hits 'a', and it needs backtracking as a result.
Compilers/parsers/lexical analyzers were a long, long time ago for me, but I think I remember there being difficulty in unambiguously determining whether a numeric character in the compilation unit represented a literal or an identifier.
Languages where space is insignificant (like ALGOL and the original FORTRAN if I remember correctly) could not accept numbers to begin identifiers for that reason.
This goes way back - before special notations to denote storage or numeric base.
I agree it would be handy to allow identifiers to begin with a digit. One or two people have mentioned that you can get around this restriction by prepending an underscore to your identifier, but that's really ugly.
I think part of the problem comes from number literals such as 0xdeadbeef, which make it hard to come up with easy-to-remember rules for identifiers that can start with a digit. One way to do it might be to allow anything matching [A-Za-z0-9_]+ that is NOT a keyword or number literal. The problem is that it would lead to weird things like 0xdeadpork being allowed, but not 0xdeadbeef. Ultimately, I think we should be fair to all meats :P.
When I was first learning C, I remember feeling the rules for variable names were arbitrary and restrictive. Worst of all, they were hard to remember, so I gave up trying to learn them. I just did what felt right, and it worked pretty well. Now that I've learned a lot more, it doesn't seem so bad, and I finally got around to learning it right.
It's likely a decision that was made for a few reasons. When you're parsing a token, you only have to look at the first character to determine whether it's an identifier or a literal, and then send it to the correct function for processing. So that's a performance optimization.
The other option would be to check if it's not a literal and leave the domain of identifiers to be the universe minus the literals. But to do this you would have to examine every character of every token to know how to classify it.
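The two options can be contrasted in a small Python sketch. It is deliberately simplified: literals are assumed to be plain runs of digits, identifiers are letters, digits and underscores, and both function names are hypothetical.

```python
def classify_by_first_char(token):
    """Conventional rule: the first character alone decides the category."""
    first = token[0]
    if first.isdigit():
        return "number"       # any later letter is then a lexical error
    if first.isalpha() or first == "_":
        return "identifier"
    return "other"

def classify_by_elimination(token):
    """Alternative rule: whatever is not a valid literal is an identifier."""
    # Every character must be inspected before the token can be classified.
    if all(ch.isdigit() for ch in token):
        return "number"
    if all(ch.isalnum() or ch == "_" for ch in token):
        return "identifier"
    return "other"

for tok in ("apple", "123", "123apple"):
    print(tok, classify_by_first_char(tok), classify_by_elimination(tok))
```

The first function decides after one character; the second has to read 123apple to the end before it can say the token is not a number.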
There are also stylistic implications: identifiers are supposed to be mnemonics, and words are much easier to remember than numbers. When many of the original languages were being written, setting the styles for the next few decades, nobody was thinking about substituting "2" for "to".
Variable names cannot start with a digit, because that can cause problems like the ones below:
int a = 2;
int 2 = 5;
int c = 2 * a;
What is the value of c? Is it 4, or is it 10?
Another example:
float 5 = 25;
float b = 5.5;
Is the first 5 a number, or is it an object (followed by the . operator)?
There is a similar problem with the second 5.
There may be other reasons as well. So, we shouldn't use a digit at the beginning of a variable name.
The restriction is arbitrary. Various Lisps permit symbol names to begin with numerals.
COBOL allows variables to begin with a digit.
Use of a digit to begin a variable name makes error checking during compilation or interpretation a lot more complicated.
Allowing variable names that begin like a number would probably cause huge problems for the language designers. During source code parsing, whenever a compiler/interpreter encountered a token beginning with a digit where a variable name was expected, it would have to search through a huge, complicated set of rules to determine whether the token was really a variable or an error. The complexity added to the language parser may not justify this feature.
As far back as I can remember (about 40 years), I don't think that I have ever used a language that allowed use of a digit to begin variable names. I'm sure that this was done at least once. Maybe, someone here has actually seen this somewhere.
As several people have noticed, there is a lot of historical baggage about valid formats for variable names. And language designers are always influenced by what they know when they create new languages.
That said, pretty much any time a language doesn't allow variable names to begin with numbers, it is because those are the rules of the language design. Often it is because such a simple rule makes the parsing and lexing of the language vastly easier. Not all language designers know this is the real reason, though. Modern lexing tools help, because if you try to define it as permissible, they will give you parsing conflicts.
OTOH, if your language has a uniquely identifiable character to herald variable names, it is possible to set it up so that they can begin with a number. Similar rule variations can also be used to allow spaces in variable names. But the resulting language is unlikely to resemble any popular conventional language very much, if at all.
For an example of a fairly simple HTML templating language that does permit variables to begin with numbers and have embedded spaces, look at Qompose.
Because if you allowed keywords and identifiers to begin with numeric characters, the lexer (part of the compiler) couldn't readily differentiate between the start of a numeric literal and a keyword without getting a whole lot more complicated (and slower).
C++ can't have it because the language designers made it a rule. If you were to create your own language, you could certainly allow it, but you would probably run into the same problems they did and decide not to allow it. Examples of variable names that would cause problems:
0x, 2d, 5555
One of the key problems about relaxing syntactic conventions is that it introduces cognitive dissonance into the coding process. How you think about your code could be deeply influenced by the lack of clarity this would introduce.
Wasn't it Dijkstra who said that the "most important aspect of any tool is its effect on its user"?
The compiler has 7 phases, as follows:
Lexical analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Symbol Table
Backtracking is avoided in the lexical analysis phase while compiling the piece of code. For a variable like Apple, the compiler will know it's an identifier right away when it meets the letter 'A' in the lexical analysis phase. However, for a variable like 123apple, the compiler won't be able to decide whether it's a number or an identifier until it hits 'a', and it would need backtracking in the lexical analysis phase to identify that it is a variable. But that is not supported in the compiler.
When you’re parsing the token you only have to look at the first character to determine if it’s an identifier or literal and then send it to the correct function for processing. So that’s a performance optimization.
Probably because it makes it easier for the human to tell whether it's a number or an identifier, and because of tradition. Having identifiers that could begin with a digit wouldn't complicate the lexical scans all that much.
Not all languages have forbidden identifiers beginning with a digit. In Forth, they could be numbers, and small integers were normally defined as Forth words (essentially identifiers), since it was faster to read "2" as a routine to push a 2 onto the stack than to recognize "2" as a number whose value was 2. (In processing input from the programmer or the disk block, the Forth system would split up the input according to spaces. It would try to look the token up in the dictionary to see if it was a defined word, and if not would attempt to translate it into a number, and if not would flag an error.)
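The lookup order described for Forth (dictionary first, number second, error last) can be sketched in a few lines of Python. This is a toy model, not real Forth: the interpret function and the words dictionary are invented for illustration.

```python
def interpret(source, dictionary):
    """Run space-separated tokens against a stack, Forth-style."""
    stack = []
    for token in source.split():
        if token in dictionary:          # 1. defined word?
            dictionary[token](stack)
        else:
            try:                         # 2. otherwise, a number?
                stack.append(int(token))
            except ValueError:           # 3. otherwise, an error
                raise ValueError("unknown word: " + token) from None
    return stack

words = {
    "+": lambda s: s.append(s.pop() + s.pop()),
    "2": lambda s: s.append(2),          # "2" defined as a word, as described
}

print(interpret("2 40 +", words))
```

Because the dictionary is consulted first, "2" never reaches the number parser here, while "40" falls through to it. Nothing in this scheme requires a word to start with a letter.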
Suppose you did allow symbol names to begin with numbers. Now suppose you want to name a variable 12345foobar. How would you differentiate this from 12345? It's actually not terribly difficult to do with a regular expression. The problem is actually one of performance. I can't really explain why this is in great detail, but it essentially boils down to the fact that differentiating 12345foobar from 12345 requires backtracking. This makes the regular expression non-deterministic.
There's a much better explanation of this here.
It is easier for a compiler to identify a variable when it begins with an ASCII letter rather than a number.
I think the simple answer is that it can, the restriction is language based. In C++ and many others it can't because the language doesn't support it. It's not built into the rules to allow that.
The question is akin to asking why the King can't move four spaces at a time in chess. It's because in chess that is an illegal move. Can it in another game? Sure. It just depends on the rules being played by.
Originally it was simply because it is easier to remember (you can give it more meaning) variable names as strings rather than numbers although numbers can be included within the string to enhance the meaning of the string or allow the use of the same variable name but have it designated as having a separate, but close meaning or context. For example loop1, loop2 etc would always let you know that you were in a loop and/or loop 2 was a loop within loop1.
Which would you prefer (has more meaning) as a variable: address or 1121298? Which is easier to remember?
However, if the language uses something to denote that it is not just text or numbers (such as the $ in $address), it really shouldn't make a difference, as that would tell the compiler that what follows is to be treated as a variable (in this case).
In any case it comes down to what the language designers want to use as the rules for their language.
The compiler may also treat the variable as a value at compile time, so the value may refer to the value again and again, recursively.
Backtracking is avoided in the lexical analysis phase while compiling the piece of code. For a variable like Apple; the compiler will know it's an identifier right away when it meets the letter 'A' in the lexical analysis phase. However, for a variable like 123apple; the compiler won't be able to decide whether it's a number or an identifier until it hits 'a', and it would need backtracking in the lexical analysis phase to identify that it is a variable. But that is not supported in the compiler.
Reference
There might be nothing wrong with it when it comes to declaring the variable, but there is ambiguity when you try to use that variable somewhere else, like this:
let 1 = "Hello world!"
print(1)
print(1)
print is a generic method that accepts variables of all types, so in that situation the compiler does not know which 1 the programmer is referring to: the integer value 1 or the variable 1 that stores a string.
Maybe it would be better for the compiler to allow such a definition, but to raise an error, with a suggested fix that clears up the ambiguity, when the ambiguous name is actually used.