How to create conditional if statement based on value + wildcard in Python? - python-2.7

I have a string that may be either:
my_string = "part1"
or:
my_string = "part1/part2"
I need to handle each of the above scenarios conditionally ie (pseudo code):
if my_string = "part1/" + *:
# do this
where * could be any value.
Once I can catch this condition, I will split my_string and assign the second part of the path to a new variable ie:
my_new_string = my_string.split("/")[1]
Is it possible to set up this sort of 'wildcard'?
Edit:
Actually, I just realised I could probably do something like:
if "/" in my_string:
    my_new_string = my_string.split("/")[1]
I'd still be interested to know about whether such a 'wildcard' operation exists.

Well, you can always use regular expressions to match the condition; see http://docs.python.org/2/library/re.html#re.match
re.match(r'part1/.+', your_string)
Note the + instead of the *, which makes sure a non-empty string follows the /.

Related

Invalid regular expression - Invalid property name in character class

I am using a Fastify server with a TypeScript file that calls a function which makes sure people won't send unwanted characters. Here is the function:
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Latin}\p{Zs}\p{M}\p{Nd}\-\'\s]/gu;
function secure(text:string) {
    return text.replace(SAFE_STRING_REPLACE_REGEXP, "").trim();
}
But when I try to launch my server, I get an error message:
"Invalid regular expression - Invalid property name in character class".
It used to work just fine with my previous regex :
const SAFE_STRING_REPLACE_REGEXP = /[^0-9a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð\-\s\']/g;
function secure(text:string) {
    return text.replace(SAFE_STRING_REPLACE_REGEXP, "").trim();
}
But I have been told it wasn't optimized enough. I have also been told that split/join performs better than regex/replace, but I don't know if I can use it in my case.
You need to use
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Script=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/gu;
// or
const SAFE_STRING_REPLACE_REGEXP = /[^\p{sc=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/gu;
You need to prefix scripts with sc= or Script= in Unicode property escapes, so \p{Latin} should be written as \p{Script=Latin}. See the ECMAScript reference.
Also, when you use the u flag, you cannot escape non-special chars, so do not escape ', and it is better to move the - char to the end of the character class.
You can use split and join, too:
const SAFE_STRING_REPLACE_REGEXP = /[^\p{Script=Latin}\p{Zs}\p{M}\p{Nd}'\s-]/u;
console.log("Ącki-Łał русский!!!中国".split(SAFE_STRING_REPLACE_REGEXP).join(""))
Note that you don't need the g modifier with split; splitting at every match is the default behavior.

Efficient C++ way of giving literal meaning to special symbols (") in a C++ string

I want to put this:
<script src = "Script2.js" type = "text/javascript"> < / script>
in a std::string, so I put a backslash (\) before every double quote (") to give it the literal meaning of ", instead of acting as a string delimiter in C++, like this:
std::string jsFilesImport = "<script src = \"Script2.js\" type = \"text/javascript\"> < / script>";
If I have a big string with many ("), adding (\) for every (") becomes difficult. Is there a simple way to achieve this in C++?
The easiest way is to use a raw string literal:
std::string s = R"x(<script src = "Script2.js" type = "text/javascript"> < / script>)x";
//              ^^^^                                                                ^^^
You just need to take care that the closing delimiter sequence )x" doesn't appear in the text provided inside. All other characters appearing within the x( )x delimiters are rendered as is (including newlines). The x is arbitrarily chosen; it could be something( and )something as well.
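As a rough sketch of the point above, the following compares the escaped form with the raw string literal form and shows that newlines inside a raw literal are kept as is. The function names (escaped_tag, raw_tag, raw_multiline) are made up for this illustration, and a C++11 or later compiler is assumed.

```cpp
#include <string>

// Escaped form: every double quote inside the literal needs a backslash.
std::string escaped_tag() {
    return "<script src = \"Script2.js\" type = \"text/javascript\"> < / script>";
}

// Raw string literal form: no escaping needed between x( and )x.
// "x" is an arbitrary delimiter chosen for this example.
std::string raw_tag() {
    return R"x(<script src = "Script2.js" type = "text/javascript"> < / script>)x";
}

// A raw literal may span several lines; the line break becomes part of the string.
std::string raw_multiline() {
    return R"x(line "one"
line "two")x";
}
```

Both escaped_tag() and raw_tag() should produce the same string, so the raw form is purely a convenience at the source level.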
To prevent further questions regarding this:
No, it's not possible (1) to do something like
std::string s = R"resource(
#include "MyRawTextResource.txt"
)resource";
The preprocessor won't recognize the #include "MyRawTextResource.txt" directive, because it's enclosed within the double quotes (") of the string literal.
For such cases, consider introducing a custom pre-build step that does the text replacement before compilation.
1) At least I wasn't able to find a workaround. I'd appreciate it if someone proves me wrong about that.

Swift overriding + with multiple types and long expressions

I am trying to make a convenience class for building complex regular expressions in swift. This part works as intended:
/**
A RegexAtom contains a regular expression pattern, or fragment of a pattern. Capture groups can be named with the groupNames array.
RegexAtom does no syntax checking on the pattern. Typical usage is to define several fragments of a regex pattern, and then combine
them using finalPattern = foo + bar + soom
*/
public class RegexAtom{
    var regex: String = ""
    var groupNames: [String] = []
    public init(regex: String, groupNames:[String]){
        self.regex = regex
        self.groupNames = groupNames
    }
    public init(regex: String){
        self.regex = regex
        self.groupNames = []
    }
}
func +(left: RegexAtom, right: RegexAtom) -> RegexAtom{
    var foo = RegexAtom(regex: left.regex,groupNames: left.groupNames)
    foo.regex += right.regex
    foo.groupNames.extend(right.groupNames)
    return foo
}
I can add complex RegexAtoms using +, and it compiles and runs just fine.
The problem comes from wanting to be able to handle literal strings a bit more compactly, as is done in PyParsing (I miss PyParsing :( )
This function kind of works:
func +(left:RegexAtom, right: String)->RegexAtom{
    var foo = RegexAtom(regex: left.regex,groupNames: left.groupNames)
    foo.regex.extend(right)
    return foo
}
but if I try to build up a complex regex like this
let WS: RegexAtom = RegexAtom(regex:"\\s*")
let tableHeader1 = RegexAtom(regex: "Date") + WS + "Flight" + WS + "Depart" + WS + "Arrive" + WS + "Eq" + WS + "Blk" + WS + "Grnd" + WS + "Blk" + WS + "Duty" + WS + "Cred"
I get a compile error:
"Expression was too complex to be solved in a reasonable time. Consider breaking up the expression into distinct sub-expressions."
But this works fine:
let tableHeader1 = RegexAtom(regex: "Date") + WS + "Flight"
The longer expression doesn't seem ridiculously long to me, but I have minimal experience with Xcode or Swift.
Any suggestions?
There are a few things here that combine to cause your problem.
Firstly, this kind of thing is best avoided:
func +(left: RegexAtom, right: String) -> RegexAtom { ... }
for two reasons. This kind of silent type coercion is a bit frowned-upon in Swift. For example, there is no standard library operator that allows you to add an integer to a float. And people tend to assume + is commutative (i.e. a + b == b + a), so you'd need to define a version where left is a string and right is a regex.
The second issue is that "thing" is not a String. It's a string literal, which can create any type that implements StringLiteralConvertible. The default type is a String, but it could also be a Character or a StaticString or any other custom type.
Swift's type inference engine is pretty accommodating: it will try hard to find a possible match for all the types you've thrown into an expression. And then if there are multiple matches, it'll try to pick the best one based on a bunch of precedence rules. But in the big expression, there are just too many possibilities, and investigating them all is a combinatorial nightmare, so it just gives up.
Think about your simple example. The interpretation Swift is picking is this:
let tableHeader1 = (RegexAtom(regex: "Date") + WS) + ("Flight" as String)
This is because + is left associative and string literals default to String. But it also had to consider some other alternatives:
// this will have been considered and rejected, because there's no operator
// that adds characters to regexes
let tableHeader1 = (RegexAtom(regex: "Date") + WS) + ("Flight" as Character)
// this will compile, but the other version is preferred because + is left-
// associative. But if the first version resulted in an expression that
// didn't compile (say a regex plus a regex equals an integer), it would
// fall back to this possibility
let tableHeader1 = RegexAtom(regex: "Date") + (WS + ("Flight" as String))
Now, imagine the combinatorial explosion of possibilities that your extra-long expression of plusses implies.
As an alternative, you could consider ditching the +(RegexAtom,String) operator, and instead making RegexAtom conform to StringLiteralConvertible:
extension RegexAtom: StringLiteralConvertible {
    public init(stringLiteral: String) {
        self.init(regex: stringLiteral)
    }
    public init(extendedGraphemeClusterLiteral: String) {
        self.init(regex: extendedGraphemeClusterLiteral)
    }
    public init(unicodeScalarLiteral: String) {
        self.init(regex: unicodeScalarLiteral)
    }
}
If you do this, it looks like your complex expression will compile OK.
In some sense, this is more of a Swift bug than anything of your doing. If you can roll it up into a minimal example to attach to a bug report I'm sure it'd help Apple sort out cases like this.
Meanwhile, it's probably better to use the join function for cases like this anyway. If you had a bunch of strings you wanted to concat with the same separator, you'd write something like:
join(" and ", ["lions", "tigers", "bears"])
// returns "lions and tigers and bears", oh my!
If you can make your RegexAtom type conform to the ExtensibleCollectionType and StringLiteralConvertible protocols, you'll be able to write something like:
let WS: RegexAtom = RegexAtom(regex:"\\s*")
let tableHeader1 = join(WS, ["Date", "Flight", /*etc*/ ] as [RegexAtom])
The as trick above is one way to get the compiler to type-infer an array of string literals as an array of RegexAtom. Here are a couple more:
let a: [RegexAtom] = ["a", "b"] // infer via type declaration
let b = [RegexAtom](["a", "b"]) // infer via array constructor
let c = [RegexAtom("a"), "b"] // infer from first element
What's presumably going on here is that every + operator in that long expression adds depth to the type-checking process. For example, consider the expression a + b + c + d: in order to perform type inference and select the right + functions to use, the compiler has to look at everything in the expression and decide whether it looks like this:
+ ——— + ——— String
 \     \——— String
  \—— + ——— String
       \——— String
Or like this:
+ ——— + ——— RegexAtom
 \     \——— RegexAtom
  \—— + ——— RegexAtom
       \——— RegexAtom
Or like some other variation involving any number of other types (arrays, numbers) or values whose type is indeterminate (like literals, whose type is decided by the expression they're used in via protocols like StringLiteralConvertible). There are a lot of possible types and conversions available just for one a + b operation, and every time you chain another operator you exponentially increase the complexity of type checking.
Nobody wants the compiler to hang indefinitely while trying to evaluate an expression, so if the type-checking operation starts taking too long, it just gives up. There are probably ways Apple can improve the efficiency here so that operations that are linguistically legal can be practically supported, so file bugs and you can help them catch the ugly performance cases.
(If you want to better understand the theory going into things like this, read some books with dragons on the covers.)

How to concatenate string in ROOT,c++?

I want to concatenate two strings. In my program I wrote String Filename = name+ "" + extension, where extension is an integer value that I read just above this line and name is a path that is already defined.
But in ROOT I get an error like: Error: + illegal operator for pointer 1
What went wrong here? Is there another way to do this?
If extension is an integer, then convert it to a string first.
std::string Filename = name+ "" + std::to_string(extension);
+""+ does nothing, btw
The TString class in ROOT has a function called "Format", which you can use to concatenate strings the same way you format a print statement. Here is the documentation for the Format method: https://root.cern.ch/root/html/TString.html#TString:Format
And here is the documentation for how the formatting works: http://www.cplusplus.com/reference/cstdio/printf/
I'm going to go ahead and assume the 'name' is a char*.
char const* name = "john";
char const* space = " ";
Here name and space are two pointers to character arrays.
When you try to add these two together, the compiler tries to add the values of the two pointers. This makes no sense to the compiler: you can only add an integer offset to a pointer.
The solution is to make sure that at least one of the two things you are adding is a std::string and not a C string.
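A minimal sketch of that fix, assuming name is a C string and extension is an int as described in the question. The helper names make_filename and make_filename_stream are invented for this illustration; std::to_string needs C++11, so the second variant uses std::ostringstream in case the environment does not support it.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: wrapping name in std::string makes + a string
// concatenation instead of pointer arithmetic.
std::string make_filename(const char* name, int extension) {
    return std::string(name) + std::to_string(extension);
}

// Alternative without std::to_string: stream both parts into a buffer.
std::string make_filename_stream(const char* name, int extension) {
    std::ostringstream out;
    out << name << extension;
    return out.str();
}
```

Either helper would be called as make_filename("run_", 42); whether std::to_string is available inside a given ROOT interpreter session is something to check for your version.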

Regular Expression for removing suffix

What is the regular expression for removing the suffix of file names? For example, if I have a file name in a string such as "vnb.txt", what is the regular expression to remove ".txt"?
Thanks.
Do you really need a regular expression to do this? Why not just look for the last period in the string, and trim the string up to that point? Frankly, there's a lot of overhead for a regular expression, and I don't think you need it in this case.
As suggested by tstenner, you can try one of the following, depending on what kinds of strings you're using:
std::strrchr
std::string::find_last_of
First example:
#include <cstring>
const char* str = "Directory/file.txt";
size_t index = 0;
const char* pStr = strrchr(str, '.');
if(nullptr != pStr)
{
    index = pStr - str; // offset of the last '.'
}
Second example:
size_t index = std::string("Directory/file.txt").find_last_of('.');
If you are using Qt already, you could use QFileInfo, and use the baseName() function to get just the name (if one exists), or the suffix() function to get the extension (if one exists).
If you're looking for a solution that will give you anything except for the suffix, you should use string::find_last_of.
Your code could look like this:
std::string removesuffix(const std::string& s) {
    size_t suffixbegin = s.find_last_of('.');
    //This will handle cases like "directory.foo/bar"
    size_t dir = s.find_last_of('/');
    if(dir != std::string::npos && dir > suffixbegin) return s;
    if(suffixbegin == std::string::npos) return s;
    else return s.substr(0,suffixbegin);
}
If you're looking for a regular expression, use \.[^.]+$.
You have to escape the first ., otherwise it will match any character, and put a $ at the end, so it will only match at the end of a string.
Different operating systems may allow different characters in filenames, so the simplest regex might be (.+)\.txt$. Take the first capture group to get the filename without the extension.
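For completeness, here is one way the \.[^.]+$ pattern from above could be applied in C++ with std::regex (C++11). This is only a sketch: remove_suffix_regex is a name made up for this example, and the pattern knows nothing about directories, so a path like "directory.foo/bar" loses everything after the dot.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: strips the last ".something" at the end of the string.
std::string remove_suffix_regex(const std::string& filename) {
    // \. matches a literal dot, [^.]+ matches one or more non-dot characters,
    // and $ anchors the match at the end of the string.
    static const std::regex suffix(R"(\.[^.]+$)");
    return std::regex_replace(filename, suffix, "");
}
```

If paths with dotted directory names matter, the find_last_of approach shown earlier handles that case explicitly and avoids the regex overhead.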