How to build JAPE rules in GATE

I need to build a rule whose LHS checks whether a word begins with "b", and then checks whether the rest of the word (without the first character) is found in a Lookup.

This is sample code for something similar to what you want (copied from https://gate.ac.uk/wiki/jape-repository/strings.html#section-1.). You can read a little more there and work out the exact solution:
Rule:GetMobile
(
  {Phone}
):tag
-->
:tag{
  // get the offsets of the matched annotations
  Long phoneStart = tagAnnots.firstNode().getOffset();
  Long phoneEnd = tagAnnots.lastNode().getOffset();
  // check the number is at least 2 characters long (just in case)
  if(phoneEnd - phoneStart >= 2) {
    try {
      String firstTwoChars = doc.getContent()
          .getContent(phoneStart, phoneStart + 2).toString();
      // check it starts with 07
      if("07".equals(firstTwoChars)) {
        // create the new annotation over the same span
        gate.FeatureMap features = Factory.newFeatureMap();
        features.put("kind", "mobile");
        outputAS.add(tagAnnots.firstNode(),
            tagAnnots.lastNode(), "Phone", features);
      }
    }
    catch(InvalidOffsetException e) {
      // not possible: the offsets come from an existing annotation
      throw new LuckyException("Invalid offset from annotation");
    }
  }
}
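Without the GATE machinery, the check the question asks for is small. The sketch below is plain Java, not JAPE. It assumes the word's text has already been extracted (the rule above shows how to read document content in a RHS). It models the gazetteer as a hypothetical `Set<String>` of entries instead of real Lookup annotations, and it treats the "b" as case-sensitive lowercase. The class and method names are illustrative only.

```java
import java.util.Set;

final class BWordCheck {
    // lookupEntries is a hypothetical stand-in for the gazetteer behind the Lookup annotations.
    static boolean startsWithBAndRestIsListed(String word, Set<String> lookupEntries) {
        // need a leading 'b' plus at least one more character
        if (word.length() < 2 || word.charAt(0) != 'b') {
            return false;
        }
        // drop the first character and look up the remainder
        return lookupEntries.contains(word.substring(1));
    }

    public static void main(String[] args) {
        Set<String> gazetteer = Set.of("oat", "rain");
        System.out.println(startsWithBAndRestIsListed("boat", gazetteer));
        System.out.println(startsWithBAndRestIsListed("coat", gazetteer));
    }
}
```

Inside a JAPE RHS, the same test could run on the string covered by the matched annotation.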
Here are some places where you can read up:
https://gate.ac.uk/wiki/jape-repository/
https://gate.ac.uk/sale/talks/gate-course-jun14/module-1-jape/module-1-jape.pdf

Related

Generate string lexicographically larger than input

Given an input string A, is there a concise way to generate a string B that is lexicographically larger than A, i.e. A < B == true?
My raw solution would be to say:
B = A;
++B.back();
but in general this won't work because:
A might be empty
The last character of A may already be at its maximum value. In that case incrementing it wraps around and gives a smaller value, i.e. B < A.
Adding an extra character every time is wasteful and will quickly result in unreasonably large strings.
So I was wondering whether there's a standard library function that can help me here, or if there's a strategy that scales nicely when I want to start from an arbitrary string.
You can duplicate A into B and then look at the final character. If it isn't the last character in your range, you can simply increment it by one.
Otherwise, look at last-1, last-2, last-3, and so on, and increment the first character that isn't at the end of the range. If you reach the front of the string without finding one, append a character to increase the length.
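Here is a minimal Java sketch of this approach. It assumes the printable-ASCII range `' '` to `'~'`, the same range the question's refinement settles on. The class and constant names are hypothetical. Java's `String.compareTo` compares UTF-16 code units, so bumping a char that is below `'~'` always yields a larger string.

```java
final class GreaterString {
    // Assumed alphabet: printable ASCII, as in the question's refinement.
    static final char MIN_C = ' ';
    static final char MAX_C = '~';

    static String makeGreater(String input) {
        char[] chars = input.toCharArray();
        // Scan from the right for the first character that can still be incremented.
        for (int i = chars.length - 1; i >= 0; i--) {
            if (chars[i] < MAX_C) {
                chars[i]++;
                return new String(chars);
            }
        }
        // Every character is already at (or past) MAX_C, or the input is empty:
        // append the smallest character, which makes the string longer.
        return input + MIN_C;
    }

    public static void main(String[] args) {
        for (String a : new String[] {"", "abc", "ab~", "~~"}) {
            String b = makeGreater(a);
            System.out.println("\"" + a + "\" -> \"" + b + "\" (a < b: " + (a.compareTo(b) < 0) + ")");
        }
    }
}
```

Incrementing an earlier character leaves any later `'~'` characters in place. The result is still larger, because strings are compared at the first position where they differ.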
Here is my dummy solution:
#include <limits>
#include <string>

std::string make_greater_string(std::string const &input)
{
    std::string ret{std::numeric_limits<
        std::string::value_type>::min()};
    if (!input.empty())
    {
        if (std::numeric_limits<std::string::value_type>::max()
            == input.back())
        {
            ret = input + ret;
        }
        else
        {
            ret = input;
            ++ret.back();
        }
    }
    return ret;
}
Ideally I'd like to avoid explicitly handling all the special cases and instead use some facility that handles them more naturally. Looking at the answer by @JosephLarson, I see that I could increment more than just the last character. That would increase the range reachable without adding more characters.
And here's the refinement after the suggestions in this post:
#include <iterator>
#include <string>

std::string make_greater_string(std::string const &input)
{
    constexpr char minC = ' ', maxC = '~';
    // Working with limits was a pain,
    // using ASCII typical limit values instead.
    std::string ret{minC};
    auto rit = input.rbegin();
    while (rit != input.rend())
    {
        if (maxC == *rit)
        {
            ++rit;
            if (rit == input.rend())
            {
                ret = input + ret;
                break;
            }
        }
        else
        {
            ret = input;
            ++(*(ret.rbegin() + std::distance(input.rbegin(), rit)));
            break;
        }
    }
    return ret;
}
You can copy the string and append any character. The result is lexicographically larger, because a string always compares less than a longer string that it is a prefix of.
B = A + "a"

Regex.IsMatch for only letters and numbers [duplicate]

How can I validate a string using Regular Expressions to only allow alphanumeric characters in it?
(I don't want to allow for any spaces either).
In .NET 4.0 you can use LINQ:
if (yourText.All(char.IsLetterOrDigit))
{
//just letters and digits.
}
yourText.All stops executing and returns false the first time char.IsLetterOrDigit returns false, since the condition of All can no longer be satisfied.
Note: this answer does not strictly check for alphanumerics (which typically means A-Z, a-z and 0-9). It also allows local characters such as åäö.
Update 2018-01-29
The syntax above only works when you use a single method that has a single argument of the correct type (in this case char).
To use multiple conditions, you need to write a lambda like this:
if (yourText.All(x => char.IsLetterOrDigit(x) || char.IsWhiteSpace(x)))
{
}
Use the following expression:
^[a-zA-Z0-9]*$
For example:
using System.Text.RegularExpressions;
Regex r = new Regex("^[a-zA-Z0-9]*$");
if (r.IsMatch(SomeString)) {
...
}
You could do it easily with an extension method rather than a regex ...
public static bool IsAlphaNum(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
for (int i = 0; i < str.Length; i++)
{
if (!(char.IsLetter(str[i])) && (!(char.IsNumber(str[i]))))
return false;
}
return true;
}
Per comment :) ...
public static bool IsAlphaNum(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
return (str.ToCharArray().All(c => Char.IsLetter(c) || Char.IsNumber(c)));
}
While I think the regex-based solution is probably the way I'd go, I'd be tempted to encapsulate this in a type.
public class AlphaNumericString
{
public AlphaNumericString(string s)
{
Regex r = new Regex("^[a-zA-Z0-9]*$");
if (r.IsMatch(s))
{
value = s;
}
else
{
throw new ArgumentException("Only alphanumeric characters may be used");
}
}
private string value;
static public implicit operator string(AlphaNumericString s)
{
return s.value;
}
}
Now, when you need a validated string, you can have the method signature require an AlphaNumericString, and know that if you get one, it is valid (apart from nulls). If someone attempts to pass in a non-validated string, it will generate a compiler error.
You can get fancier and implement all of the equality operators, or an explicit cast to AlphaNumericString from plain ol' string, if you care.
I needed to check for A-Z, a-z, 0-9; without a regex (even though the OP asks for regex).
This blends various answers and comments here with the discussion from https://stackoverflow.com/a/9975693/292060. It tests for a letter or digit while rejecting letters from other languages and other numeric characters such as fractions.
if (!String.IsNullOrEmpty(testString)
&& testString.All(c => Char.IsLetterOrDigit(c) && (c < 128)))
{
// Alphanumeric.
}
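^\w+$ allows letters, digits and the underscore. In .NET, \w also matches non-ASCII letters and digits unless RegexOptions.ECMAScript is used.
Use ^[a-zA-Z0-9]+$ to allow only ASCII letters and digits, without the underscore.
Note that both of these require the string to be non-empty. Using * instead of + allows empty strings.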
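For comparison, here is the same check sketched in Java. The class and method names are hypothetical. The regex dialects differ: Java's `\w` is ASCII-only (`[a-zA-Z_0-9]`) unless `Pattern.UNICODE_CHARACTER_CLASS` is set. Also, `Matcher.matches()` must consume the whole input, so `^` and `$` are not needed. The second method is a non-regex version that compares against the ASCII ranges directly.

```java
import java.util.regex.Pattern;

final class AlnumCheck {
    // '+' rejects the empty string; '*' would accept it.
    static final Pattern ASCII_ALNUM = Pattern.compile("[a-zA-Z0-9]+");

    static boolean isAsciiAlnumRegex(String s) {
        // matches() succeeds only if the whole string matches the pattern
        return s != null && ASCII_ALNUM.matcher(s).matches();
    }

    // Same check without a regex: compare each char against the ASCII ranges.
    static boolean isAsciiAlnumLoop(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"smith23", "smith 23", "a_b", ""}) {
            System.out.println("\"" + s + "\": " + isAsciiAlnumRegex(s) + " / " + isAsciiAlnumLoop(s));
        }
    }
}
```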
If you want a non-regex check for ASCII A-Z, a-z and 0-9, you cannot use char.IsLetterOrDigit(), because it also accepts other Unicode characters.
What you can do is check the character code ranges.
48 -> 57 are numerics
65 -> 90 are capital letters
97 -> 122 are lower case letters
The following is a bit more verbose, but it's for ease of understanding rather than for code golf.
public static bool IsAsciiAlphaNumeric(this string str)
{
if (string.IsNullOrEmpty(str))
{
return false;
}
for (int i = 0; i < str.Length; i++)
{
if (str[i] < 48) // Numeric are 48 -> 57
{
return false;
}
if (str[i] > 57 && str[i] < 65) // Capitals are 65 -> 90
{
return false;
}
if (str[i] > 90 && str[i] < 97) // Lowers are 97 -> 122
{
return false;
}
if (str[i] > 122)
{
return false;
}
}
return true;
}
To check that the string contains both letters and digits, you can rewrite @jgauffin's answer as follows using .NET 4.0 and LINQ. Note that this does not reject other characters such as spaces or symbols:
if(!string.IsNullOrWhiteSpace(yourText) &&
yourText.Any(char.IsLetter) && yourText.Any(char.IsDigit))
{
// do something here
}
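Here is a Java sketch of the same idea, with hypothetical names. It also includes a stricter variant that additionally requires every character to be a letter or digit. Like .NET's char.IsLetter, Java's `Character.isLetter` accepts non-ASCII letters too.

```java
final class LetterAndDigit {
    // True if s contains at least one letter and at least one digit; other characters are allowed.
    static boolean hasLetterAndDigit(String s) {
        return s != null
                && s.chars().anyMatch(Character::isLetter)
                && s.chars().anyMatch(Character::isDigit);
    }

    // Stricter variant: additionally, every character must be a letter or a digit.
    static boolean isAlnumWithLetterAndDigit(String s) {
        return hasLetterAndDigit(s) && s.chars().allMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc123", "abc", "ab 12"}) {
            System.out.println(s + ": " + hasLetterAndDigit(s) + ", " + isAlnumWithLetterAndDigit(s));
        }
    }
}
```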
Based on cletus's answer, you can create a new extension method:
public static class StringExtensions
{
public static bool IsAlphaNumeric(this string str)
{
if (string.IsNullOrEmpty(str))
return false;
Regex r = new Regex("^[a-zA-Z0-9]*$");
return r.IsMatch(str);
}
}
While there are many ways to skin this cat, I prefer to wrap such code into reusable extension methods that make it trivial to do going forward. When using extension methods, you can also avoid RegEx as it is slower than a direct character check. I like using the extensions in the Extensions.cs NuGet package. It makes this check as simple as:
Add the https://www.nuget.org/packages/Extensions.cs package to your project.
Add "using Extensions;" to the top of your code.
"smith23".IsAlphaNumeric() will return True, whereas "smith 23".IsAlphaNumeric(false) will return False. By default the .IsAlphaNumeric() method ignores spaces, but this can be overridden as shown above. If you want to allow spaces, so that "smith 23".IsAlphaNumeric() returns True, simply leave the argument at its default.
Every other check in the rest of the code is simply MyString.IsAlphaNumeric().
For anyone coming across this question 12 years and 7 months later:
Compiled RegEx actually has the best performance in .NET 5 and .NET 6.
Please look at the following link where I compare several different answers given on this question. Mainly comparing Compiled RegEx, For-Loops, and Linq Predicates: https://dotnetfiddle.net/WOPQRT
Notes:
As stated, this method is only faster in .NET 5 and .NET 6.
.NET Core 3.1 and below show RegEx being the slowest.
Regardless of the version of .NET, the For-Loop method is consistently faster than the Linq Predicate.
I advise against depending on ready-made, built-in code in the .NET Framework; try to come up with your own solution. This is what I do:
public bool isAlphaNumeric(string N)
{
bool YesNumeric = false;
bool YesAlpha = false;
bool BothStatus = false;
for (int i = 0; i < N.Length; i++)
{
if (char.IsLetter(N[i]) )
YesAlpha=true;
if (char.IsNumber(N[i]))
YesNumeric = true;
}
if (YesAlpha==true && YesNumeric==true)
{
BothStatus = true;
}
else
{
BothStatus = false;
}
return BothStatus;
}

C++ Windows Form - If statements

I'm trying to make a password strength checker. At the moment I've got it set up so that if 'password' is typed into the password field the strength box goes red, and if you type anything else it goes green.
I've done this using the following if statement:
try{
if (password_textbox_form3->Text == "password")
{
strength_color_textbox->BackColor = Color::Red;
}
else
{
strength_color_textbox->BackColor = Color::Green;
}
}
catch (Exception^ )
{
strength_color_textbox->BackColor = Color::Black;
}
What I'm stuck on now is how to create a field called passwordscore that goes through a list of if statements and adds 10 for each condition met, for example if the password they entered has more than 8 characters. I could then use this score to change the color of the strength box from red to green.
String ^ strength = password_textbox_form3->Text; //makes whatever the user enters in pw tb now called string
int passwordscore=0;
while // some sort of while loop to increment passwordscore? //passwordscore=passwordscore+1;
try{
if (strength //contains more than 8 characters)
{
//passwordscore +10
}
if (strength //contains a special character !"£$%^&*)
{
//password score +10
}
if (passwordscore <=10)
{
strength_color_textbox->BackColor = Color::Red;
}
if (passwordscore <=20)
{
strength_color_textbox->BackColor = Color::Green;
}
I've started by assigning the contents of the password textbox to a string called strength (I think), and then got stuck on the if statements, such as how to check whether strength has more than 8 characters.
Any help or direction is appreciated, thanks
EDIT - I found this on MSDN, but I think it's in C#. It can't be that different from what I'm trying to do, can it?
String ^ strength = password_textbox_form3->Text;
int numberOfDigits = 0;
int numberOfLetters = 0;
int numberOfSymbols = 0;
foreach (char c in strength)
{
if (char.IsDigit(c))
{
numberOfDigits++;
}
else if (char.IsLetter(c))
{
numberOfLetters++;
}
else if(char.IsSymbol(c))
{
numberOfSymbols++;
}
}
Take in the password as characters and count how many there are, so that if the password has 8 or more characters you can add ten strength points. Alternatively, you can copy the individual characters of the string into a vector and use the vector's size to count them. (In C++/CLI, strength->Length already gives the number of characters.)
EDIT TO FIRST EDIT:
Just to explain the new code posted:
A character can be an alphabetical character (a, b, c), a digit (1, 2, 3) or a symbol (+, *, ^).
In the code, a single foreach loop contains three if statements that check whether each character is a letter, a digit or a symbol, using the .NET Char methods IsDigit, IsLetter and IsSymbol.
Whenever a character falls into one of the three categories, the code adds one to the corresponding counter declared before the loop.
For your purpose, you could use a similar technique. Declare two ints, Pw_Str and Total_Char, and add an if statement that increments Total_Char as needed. When Total_Char exceeds 8, add 10 to Pw_Str and set the color based on Pw_Str.
To keep such code compact, rather than writing the same if statements over and over, I would suggest a for loop that walks through each character and updates the necessary variables.
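Here is a minimal sketch of the scoring idea, written in Java rather than C++/CLI. The two +10 rules and the special-character set `!"£$%^&*` come from the question. The green/red threshold of 20 is an assumed value, and the class and method names are hypothetical. `String.length()` plays the role of `strength->Length`, so no manual character counter is needed.

```java
final class PasswordScore {
    // Special characters listed in the question; the pound sign is written as a Unicode escape.
    static final String SPECIALS = "!\"\u00A3$%^&*";

    static int score(String password) {
        int score = 0;
        // +10 for more than 8 characters (the question's rule)
        if (password.length() > 8) {
            score += 10;
        }
        // +10 if any character is one of the listed special characters
        for (char c : password.toCharArray()) {
            if (SPECIALS.indexOf(c) >= 0) {
                score += 10;
                break; // award the bonus once, not once per special character
            }
        }
        return score;
    }

    // Assumed threshold: 20 or more is "strong" (green), anything lower is "weak" (red).
    static String colorFor(int score) {
        return score >= 20 ? "Green" : "Red";
    }

    public static void main(String[] args) {
        for (String pw : new String[] {"password", "password123", "pass$word99"}) {
            int s = score(pw);
            System.out.println(pw + " -> " + s + " (" + colorFor(s) + ")");
        }
    }
}
```

A single if/else on the final score also avoids the problem in the question's draft, where a later `if` can overwrite the color chosen by an earlier one.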

Match a structure against set of patterns

I need to match a structure against set of patterns and take some action for each match.
Patterns should support wildcards, and I need to determine which patterns match an incoming structure. Example set:
action=new_user email=*
action=del_user email=*
action=* email=*@gmail.com
action=new_user email=*@hotmail.com
These patterns can be added or removed in real time. There can be thousands of connections, each with its own pattern, and I need to notify each connection when I receive a structure A that matches its pattern. The patterns are not full regexes; I just need to match a string with the wildcard * (which simply matches any number of characters).
When the server receives a message (let's call it message A) with the structure action=new_user email=testuser@gmail.com, I need to find out that patterns 1 and 3 match this message. Then I should perform the action for each matching pattern (send structure A to the corresponding connection).
How can this be done most efficiently? I can iterate over the patterns and check them one by one, but I'm looking for a more efficient and thread-safe way to do this. Perhaps the patterns could be grouped to reduce the number of checks. Any suggestions?
UPD: Please note I want to match multiple patterns (thousands) against a fixed "string" (actually a struct), not vice versa. In other words, I want to find which patterns fit a given structure A.
Convert the patterns to regular expressions, and match them using RE2, which is written in C++ and is one of the fastest.
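Here is a minimal Java sketch of the convert-to-regex idea. It uses `java.util.regex` instead of RE2. The registry class, its method names and the connection ids are hypothetical. Each pattern is matched against the whole message text (e.g. `action=new_user email=testuser@gmail.com`), so every message still costs one regex test per registered pattern. Grouping or indexing the patterns would be a further optimization. A `ConcurrentHashMap` lets patterns be added and removed while other threads are matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WildcardRegistry {
    // Hypothetical registry: connection id -> compiled pattern.
    private final Map<String, Pattern> patterns = new ConcurrentHashMap<>();

    // Translate a '*'-only wildcard into a regex: quote each literal piece and join the pieces with ".*".
    static Pattern toRegex(String wildcard) {
        String regex = Arrays.stream(wildcard.split("\\*", -1)) // -1 keeps trailing empty pieces
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        return Pattern.compile(regex);
    }

    void add(String connectionId, String wildcard) {
        patterns.put(connectionId, toRegex(wildcard));
    }

    void remove(String connectionId) {
        patterns.remove(connectionId);
    }

    // Linear scan: each pattern must match the whole message.
    List<String> matchingConnections(String message) {
        List<String> hits = new ArrayList<>();
        patterns.forEach((id, p) -> {
            if (p.matcher(message).matches()) {
                hits.add(id);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        WildcardRegistry registry = new WildcardRegistry();
        registry.add("conn1", "action=new_user email=*");
        registry.add("conn2", "action=del_user email=*");
        registry.add("conn3", "action=* email=*@gmail.com");
        registry.add("conn4", "action=new_user email=*@hotmail.com");
        System.out.println(registry.matchingConnections("action=new_user email=testuser@gmail.com"));
    }
}
```

Because `toRegex` quotes every literal piece, characters such as the `.` in `gmail.com` are matched literally rather than as regex metacharacters.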
Actually, if I understood correctly, the fourth pattern is redundant, since the first pattern is more general and includes every string matched by the fourth. That leaves only 3 patterns, which can easily be checked by this function:
#include <stdbool.h>
#include <string.h>

bool matches(const char* name, const char* email)
{
    return strstr(name, "new_user") || strstr(name, "del_user") || strstr(email, "@gmail.com");
}
And if you prefer to parse the whole string, not just match the values of action and email, the following function should do the trick:
bool matches2(const char* str)
{
    bool match = strstr(str, "action=new_user ") || strstr(str, "action=del_user ");
    if (!match)
    {
        const char* emailPtr = strstr(str, "email=");
        if (emailPtr)
        {
            match = strstr(emailPtr, "@gmail.com") != NULL;
        }
    }
    return match;
}
Note that the strings you pass as arguments must be null-terminated (end with \0). You can read about the strstr function here.
This strglobmatch supports * and ? only.
#include <string.h> /* strlen */
/* Compare n chars of s with the pattern piece p; '?' in p matches any single char. */
static int piecematch(const char *s, const char *p, int n) {
  int i;
  for (i = 0; i < n; i++)
    if (p[i] != '?' && p[i] != s[i]) return 0;
  return 1;
}
/* Return the leftmost position in s1 where the needle_len-char piece matches, or 0 if none. */
char* strfixstr(char *s1, char *needle, int needle_len) {
  int l1 = strlen(s1);
  while (l1 >= needle_len) {
    if (piecematch(s1, needle, needle_len)) return s1;
    l1--; s1++;
  }
  return 0;
}
int strglobmatch(char *str, char *glob) {
  /* Test: strglobmatch("almamxyz","?lmam*??") */
  int min;
  while (glob[0]!='\0') {
    if (glob[0]!='*') {
      if ((glob[0]=='?') ? (str[0]=='\0') : (str[0]!=glob[0])) return 0;
      glob++; str++;
    } else {
      while (glob[0]=='*') glob++; /* collapse a run of stars */
      /* the piece runs up to the next star and may contain '?' */
      min=0; while (glob[0]!='*' && glob[0]!='\0') { glob++; min++; }
      if (min==0) return 1; /* glob ends with star */
      if (glob[0]=='\0') { /* last piece: it must sit at the very end of str */
        int len = strlen(str);
        return len >= min && piecematch(str + len - min, glob - min, min);
      }
      /* middle piece: the leftmost match leaves the most room for what follows */
      if (!(str=strfixstr(str, glob-min, min))) return 0;
      str+=min;
    }
  }
  return str[0]=='\0';
}
If all you want is wildcard matching, you might try this algorithm. The idea is to check that all the substrings that are not wildcards occur in the string in order.
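patterns = ["*@gmail.com", "akalenuk@*", "a*a@*", "ak*@gmail.*", "ak*@hotmail.*", "*@*.ua"]
string = "akalenuk@gmail.com"
preprocessed_patterns = [p.split('*') for p in patterns]
def match(s, pp):
    # no '*' at all: the pattern must equal the string
    if len(pp) == 1:
        return s == pp[0]
    first, last = pp[0], pp[-1]
    # the piece before the first '*' must be a prefix and the piece after the last '*' a suffix,
    # without the two overlapping
    if len(first) + len(last) > len(s) or not s.startswith(first) or not s.endswith(last):
        return False
    i, end = len(first), len(s) - len(last)
    # the pieces in between must appear in order; taking the leftmost occurrence each time is safe
    for w in pp[1:-1]:
        wi = s.find(w, i, end)
        if wi == -1:
            return False
        i = wi + len(w)
    return True
print([match(string, pp) for pp in preprocessed_patterns])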
But it might be better to use regular expressions anyway, in case you need more than wildcards in the future.

What is the most efficient way to check if a string is part of a bigger string?

I have a string which is formed by concatenation of IP addresses, for example:
"127.272.1.43;27.27.1.43;127.127.27.67;128.27.1.43;127.20.1.43;111.27.1.43;127.27.1.43;"
When a new IP address is given, I need to check if the first half of the IP is part of the IP address string. For example, if "127.27.123.23" is given I need to find if any of the IP address in the string starts with "127.27"
I have the following code, where userIP = "127.27."
int i = StringUtils.indexOf(dbIPString, userIP);
do {
    if (i > 0) {
        char ch = dbIPString.charAt(i - 1);
        if (ch == ';') {
            System.out.println("IP is present in db");
            break;
        } else {
            // search again after this hit; starting at i would find the same hit forever
            i = StringUtils.indexOf(dbIPString, userIP, i + 1);
        }
    } else if (i == 0) {
        System.out.println("IP is present in db");
        break;
    } else {
        System.out.println("IP is not present in db");
        break;
    }
} while (true);
Can it be more efficient? Or can I use regular expression? Which one is more efficient?
Plain string matches are usually faster than regex matches. I'd keep it simple and do something like this:
if (StringUtils.startsWith(dbIPString, userIP)) {
... // prefix is present
} else if (StringUtils.indexOf(dbIPString, ";" + userIP) > 0) {
... // prefix is present
} else {
... // prefix is not present
}
If you can arrange to have the list always begin with a ';' then searching the first entry would no longer be a special case and the logic can be simplified.
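The ';'-prefix idea can be sketched in Java as follows. The class and method names are hypothetical. `userIP` keeps its trailing dot, as in the question, so that `127.2.` does not match an entry starting with `127.27.`.

```java
final class PrefixSearch {
    // Prepending ';' makes the first entry look like every other entry, so one search covers all.
    // userIP is the first half of the address including its trailing dot, e.g. "127.27.".
    static boolean hasPrefix(String dbIPString, String userIP) {
        return (";" + dbIPString).contains(";" + userIP);
    }

    public static void main(String[] args) {
        String db = "127.272.1.43;27.27.1.43;127.127.27.67;128.27.1.43;127.20.1.43;111.27.1.43;127.27.1.43;";
        System.out.println(hasPrefix(db, "127.27.")); // true
        System.out.println(hasPrefix(db, "127.2."));  // false
    }
}
```

If the list is stored with the leading ';' already in place, the concatenation (and its copy of the whole string) disappears from every lookup.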
If the list will be large and you're going to be doing a lot of these searches and speed really matters then perhaps you could add each prefix to some sort of hash or tree as you build the list of addresses. Lookups in those data structures should be faster than string matches.
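Here is a minimal sketch of the hash-based idea. It assumes the "first half" of an address means its first two octets, and the class and helper names are hypothetical. The prefixes are collected into a `HashSet` once, so each lookup is a single hash probe instead of a scan of the whole string.

```java
import java.util.HashSet;
import java.util.Set;

final class IpPrefixIndex {
    private final Set<String> prefixes = new HashSet<>();

    // Hypothetical helper: the first two octets, e.g. "127.27" for "127.27.123.23".
    static String firstHalf(String ip) {
        int first = ip.indexOf('.');
        int second = first < 0 ? -1 : ip.indexOf('.', first + 1);
        return second < 0 ? ip : ip.substring(0, second);
    }

    // Build the index once from the ';'-separated list.
    IpPrefixIndex(String dbIPString) {
        for (String ip : dbIPString.split(";")) {
            if (!ip.isEmpty()) {
                prefixes.add(firstHalf(ip));
            }
        }
    }

    boolean contains(String ip) {
        return prefixes.contains(firstHalf(ip));
    }

    public static void main(String[] args) {
        IpPrefixIndex index = new IpPrefixIndex(
                "127.272.1.43;27.27.1.43;127.127.27.67;128.27.1.43;127.20.1.43;111.27.1.43;127.27.1.43;");
        System.out.println(index.contains("127.27.123.23"));
        System.out.println(index.contains("127.2.0.1"));
    }
}
```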
Assuming that you only care about entire IP address matches, and that you don't want 127.255.1.43 to match when you're looking for 127.25, then
(?<=^|;)127\.25\.\d+\.\d+
would be a fitting regex.
In Java:
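import java.util.regex.Pattern;
// userIP is the prefix without a trailing dot, e.g. "127.25"
Pattern regex = Pattern.compile(
    "(?<=^|;) # Assert position at the start of the string or after ;\n" +
    Pattern.quote(userIP) +
    "\\.\\d+\\.\\d+ # Match .nnn.nnn",
    Pattern.COMMENTS);
boolean present = regex.matcher(dbIPString).find();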