How to validate accented characters with coffeescript regex? - regex

I need to validate alphabetical characters in a text field. What I have now works fine, but there is a catch: I need to allow accented characters (like āēīūčļ), and on a Latvian keyboard these are obtained by typing the single quote first ('c -> č), so my validator fails if the user types the single quote and then a disallowed character such as a number, obtaining '1.
I have this coffeescript-flavor jQuery webpage text entry field validator that only allows alphabetical characters (for entering a name).
allowAlphabeticalEntriesOnly = (target) ->
  target.keypress (e) ->
    regex = new RegExp("[a-zA-Z]")
    str = String.fromCharCode((if not e.charCode then e.which else e.charCode))
    return true if regex.test(str)
    e.preventDefault()
    false
And it gets called with:
allowAlphabeticalEntriesOnly $("#user_name_text")
The code and regex work fine, denying input of most everything except small and large letters and the singlequote, where things get tricky.
Is there a way to allow accented characters with the singlequote layout, but deny entry of forbidden characters after the quote?
EDIT: If all else fails, one can implement back-end invalid character deletion a-la .gsub(/[^a-zA-Z\u00C0-\u017F]/, ''), which is what I ended up doing

Try using [a-zA-Z\u00C0-\u017F] to match a-z and accented characters (all characters within unicode range specified).
See: Matching accented characters with Javascript regexes
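As a rough illustration of the suggested character class, here is a minimal Python sketch. It is written in Python rather than CoffeeScript, so it is closer to the back-end fallback mentioned in the question's edit than to the keypress handler. The names is_valid_name and strip_invalid are made up for this example. Note that the range \u00C0-\u017F also contains × (U+00D7) and ÷ (U+00F7), which you may want to exclude separately.

```python
import re

# Basic Latin letters plus the U+00C0-U+017F range suggested above.
ALLOWED = r"a-zA-Z\u00C0-\u017F"
NAME_RE = re.compile(f"[{ALLOWED}]+")
INVALID_RE = re.compile(f"[^{ALLOWED}]")


def is_valid_name(text):
    """Return True if the whole string consists of allowed letters."""
    return NAME_RE.fullmatch(text) is not None


def strip_invalid(text):
    """Delete every character outside the allowed set (the gsub approach)."""
    return INVALID_RE.sub("", text)
```

Validating the finished string (rather than each keypress) sidesteps the dead-key problem, because the check only ever sees the composed character.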

Related

What is the regular expression for all pages except "/"?

I am using NextAuth for Next.js for session management. In addition, I am using the middleware.js to protect my routes from unauthenticated users.
According to https://nextjs.org/docs/advanced-features/middleware#matcher,
if we want to exclude a path, we do something like
export const config = {
  matcher: [
    /*
     * Match all request paths except for the ones starting with:
     * - api (API routes)
     * - static (static files)
     * - favicon.ico (favicon file)
     */
    '/((?!api|static|favicon.ico).*)',
  ],
}
In this example, we exclude /api, /static, and /favicon.ico. However, I want the matcher to cover every path except the home page, "/". What is the regular expression for that? I tried '/(*)', but it doesn't seem to work.
The regular expression which matches everything but a specific one-character string / is constructed as follows:
we need to match the empty string: empty regex.
we need to match all strings two characters long or longer: ..+
we need to match one-character strings which are not that character: [^/]
Combining these three together with the | branching operator: "|..+|[^/]".
If we are using a regular expression tool that performs substring searching rather than a full match, we need to use its anchoring features; perhaps it supports the ^ and $ notation for that: "^(|..+|[^/])$".
I'm guessing that you might not want to match empty strings; in which case, revise your requirement and drop that branch from the expression.
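The three-branch expression above can be checked with a few lines of Python. This is only a sketch using Python's re module, not the Next.js matcher; the helper name is hypothetical.

```python
import re

# Empty string | two or more characters | one character that is not "/".
NOT_JUST_SLASH = re.compile(r"^(|..+|[^/])$")


def is_anything_but_slash(s):
    """True for every single-line string except the one-character string '/'."""
    return NOT_JUST_SLASH.search(s) is not None
```

Dropping the leading empty branch, giving ^(..+|[^/])$, would make the empty string fail as well.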
Suppose we wanted to match all strings, except for a specific fixed word like abc. Without negation support in the regex language, we can use a generalization of the above trick.
Match the empty string, like before, if desired.
Match all one-character strings: .
Match all two-character strings: ..
Match all strings longer than three characters: ....+
Those simple cases taken care of, we focus on matching just those three-symbol strings that are not abc. How can we do that?
Match all three-character strings that don't start with a: [^a]..
Match all three-character strings that don't have a b in the middle: .[^b].
Match all three-character strings that don't end in c: ..[^c]
Combine it all together: "|.|..|....+|[^a]..|.[^b].|..[^c]".
For longer words, we might want to take advantage of the {m,n} notation, if available, to express "match from zero to nine characters" and "match eleven or more characters".
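The same construction can be generated for any fixed word. The sketch below is one possible way to do it in Python, using the {m,n} notation for the shorter and longer cases; the function names are invented for this example, and re.DOTALL is an added assumption so that . also covers newlines.

```python
import re


def everything_but(word):
    """Build a lookahead-free pattern that matches every string except `word`."""
    n = len(word)
    branches = [""]  # the empty string
    if n > 1:
        branches.append(".{1,%d}" % (n - 1))  # strings shorter than the word
    branches.append(".{%d,}" % (n + 1))  # strings longer than the word
    # Same length as the word, but differing in at least one position.
    for i, ch in enumerate(word):
        branches.append("." * i + "[^" + re.escape(ch) + "]" + "." * (n - i - 1))
    return "|".join(branches)


def is_not_word(word, candidate):
    """True if `candidate` is any string other than `word`."""
    pattern = re.compile(everything_but(word), re.DOTALL)
    return pattern.fullmatch(candidate) is not None
```

For abc this produces |.{1,2}|.{4,}|[^a]..|.[^b].|..[^c], which is the expression above with the short and long branches written in {m,n} form.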
I also need to exclude the sign-in and register pages. If the sign-in page is not excluded, it causes an infinite redirect loop and an error. If the register page is not excluded, you cannot register, because you are redirected to the sign-in page.
So "/", "/auth/signin", and "/auth/register" are excluded. Here is what I needed:
export const config = {
  matcher: [
    '/((?!auth).*)(.+)'
  ]
}
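To see roughly which paths this pattern covers, it can be tried as an ordinary regular expression in Python. This is an approximation under a stated assumption: the matcher string is treated as a plain regex anchored at both ends, which is not necessarily how Next.js compiles it. The function name is hypothetical.

```python
import re

# Assumption: the matcher behaves like a regex anchored at start and end.
MATCHER = re.compile(r"^/((?!auth).*)(.+)$")


def is_protected(path):
    """True if the middleware would run for this path under the assumption above."""
    return MATCHER.match(path) is not None
```

Under that assumption, "/" fails because (.+) needs at least one character after the slash, and anything beginning with /auth fails the lookahead. That also covers paths such as /authors, which may or may not be what you want.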

How to execute a structured query containing symbols in AWS Cloudsearch

I'm trying to execute a structured prefix query in Cloudsearch.
Here's a snippet of the query args (csattribute is of type text)
{
  "query": "(prefix field=csattribute '12-3')",
  "queryParser": "structured",
  "size": 5
}
My above query will result in No matches for "(prefix field=csattribute '12-3')".
However, if I change my query to
{
  "query": "(prefix field=csattribute '12')",
  "queryParser": "structured",
  "size": 5
}
Then I will get a list of results I expect.
I haven't found much in my brief googling. How do I include the - in the query? Does it need to be escaped? Are there other characters that need to be escaped?
I got pointed in the right direction via this SO question: How To search special symbols AWS Search
Below is a snippet from https://docs.aws.amazon.com/cloudsearch/latest/developerguide/text-processing.html
Text Processing in Amazon CloudSearch ... During tokenization, the
stream of text in a field is split into separate tokens on detectable
boundaries using the word break rules defined in the Unicode Text
Segmentation algorithm.
According to the word break rules, strings separated by whitespace
such as spaces and tabs are treated as separate tokens. In many cases,
punctuation is dropped and treated as whitespace. For example, strings
are split at hyphens (-) and the at symbol (@). However, periods that
are not followed by whitespace are considered part of the token.
From what I understand, text and text-array fields are tokenized based on the analysis scheme (in my case, English). The text was tokenized, and the - symbol is treated as a word break.
This field doesn't need to be tokenized. Updating the index type to literal prevents all tokenization on the field, which allows the query in my question to return the expected results.
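The difference between the two field types can be pictured with a toy model in Python. This is not CloudSearch's analyzer. It is a deliberately simplified stand-in that splits on anything other than word characters and periods, and all function names are invented for the illustration.

```python
import re


def tokenize_text(value):
    """Toy tokenizer: split on anything that is not a word character or period."""
    return [tok for tok in re.split(r"[^\w.]+", value.lower()) if tok]


def prefix_match_text(value, prefix):
    """Prefix search against a tokenized (text) field: compares token by token."""
    return any(tok.startswith(prefix.lower()) for tok in tokenize_text(value))


def prefix_match_literal(value, prefix):
    """Prefix search against a literal field: the whole value is one term."""
    return value.startswith(prefix)
```

In this model the value 12-345 is stored as the tokens 12 and 345, so no token starts with 12-3, whereas the literal field keeps 12-345 intact and the prefix matches.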

How do i get password value in data connection using regex

I need to grab the value of password from db connection string using regex.
This is my current regex .*;(?i)Password=([^;]*).
This works only if there is no ; character in the password:
add key="myKey" value="Data Source=MyDataSource;Initial Catalog=MyDB;User ID=test-user;Password=pA&-pass; unicode=True"
But it fails if there is a ; character in the password:
add key="myKey" value="Data Source=MyDataSource;Initial Catalog=MyDB;User ID=test-user;Password=pass&gt;; unicode=True"
Brief
There will always be ways for your code to break since someone can create a password such as ;Password= such that your string is actually ;Password=;Password=;.
Assuming that is not possible (and also assuming it's not possible for someone to use similar variations such as portions of the password being in the following format ;s= where s is any word or space character), this should work for you.
Code
See regex in use here
(?<=;Password=)(?:(?!;[\w ]+=).)*
Results
Input
add key="myKey" value="Data Source=MyDataSource;Initial Catalog=MyDB;User ID=test-user;Password=pA&-pass; unicode=True"
add key="myKey" value="Data Source=MyDataSource;Initial Catalog=MyDB;User ID=test-user;Password=pass&gt;; unicode=True"
Output
pA&-pass
pass&gt;
Explanation
(?<=;Password=) Positive lookbehind ensuring what precedes matches ;Password= literally
(?:(?!;[\w ]+=).)* Tempered greedy token matching any character, but ensuring it doesn't match ;, followed by any word or space characters, followed by =
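The pattern also works unchanged with Python's re module, since the lookbehind is fixed-width. A minimal sketch follows; extract_password is a name chosen for this example, and the match is case-sensitive, unlike the (?i) in the question.

```python
import re

# Lookbehind for ";Password=", then consume characters until a ";key=" follows.
PASSWORD_RE = re.compile(r"(?<=;Password=)(?:(?!;[\w ]+=).)*")


def extract_password(line):
    """Return the password portion of a connection string, or None if absent."""
    match = PASSWORD_RE.search(line)
    return match.group(0) if match else None
```

A semicolon inside the password is kept as long as it is not followed by something that looks like the next key, such as ; unicode=.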

CodeIgniter - Does 'name' regex match in Cart class support Unicode?

I want to include Unicode characters (to be more specific, Tamil words) in the 'name' of my CodeIgniter cart. I found this example. I tried the following, so that the regex could match anything:
$this->cart->product_name_rules = '.+';
$this->cart->product_name_rules = '.*';
$this->cart->product_name_rules = '.';
But for all these, I get the error "An invalid name was submitted as the product name: சும்மாவா சொன்னாங்க பெரியவங்க The name can only contain alpha-numeric characters, dashes, underscores, colons, and spaces" in my log.
Also, thinking it could be due to unicode support, I tried the following:
$this->cart->product_name_rules = '\p{Tamil}';
But to no avail. Can you please point out what is wrong here?
Try adding each Tamil character individually to your regex. I had to do this for special characters in input keys:
if ( ! preg_match("/^[a-z0-9àÀâÂäÄáÁãÃéÉèÈêÊëËìÌîÎïÏòÒôÔöÖõÕùÙûÛüÜçÇ’ñÑß¡¿œŒæÆåÅøØö:_\.\-\/-\\\,]+$/i", $str))
{
exit('Disallowed Key Characters.');
}
Here he posted how he managed to save Cyrillic characters in CodeIgniter 1.7.2's cart.
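As an alternative to listing every character by hand, a Unicode range can stand in for the whole script. The sketch below shows the idea in Python, not PHP, and it is not CodeIgniter's own validation. It assumes the Tamil Unicode block (U+0B80 to U+0BFF) plus the characters named in the error message, and the function name is made up.

```python
import re

# Assumption: Tamil block U+0B80-U+0BFF, plus alphanumerics, underscore,
# colon, dash, and space as listed in the cart's error message.
PRODUCT_NAME_RE = re.compile(r"[\u0B80-\u0BFFa-zA-Z0-9_:\- ]+")


def is_valid_product_name(name):
    """True if the whole name uses only the allowed characters."""
    return PRODUCT_NAME_RE.fullmatch(name) is not None
```

The range includes the Tamil vowel signs and the pulli, which a plain letter class would miss.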

Django url parsing -pass raw string

I'm trying to pass a 'string' argument to a view with a url.
The urls.py goes
('^add/(?P<string>\w+)', add ),
I'm having problems with strings including punctuation, newlines, spaces and so on.
I think I have to change the \w+ into something else.
Basically the string will be something copied by the user from a text of his choice, and I don't want to change it. I want to accept any character and special character so that the view acts exactly on what the user has copied.
How can I change it?
Thanks!
Note that you can only use strings that form a valid URL; passing an arbitrary string as part of the URL path is not a good idea.
I use this regex to allow strings values in my urls:
(?P<string>[\w\-]+)
This allows you to have 'slugs' in your URL (like 'this-is-my_slug').
Well, first off, there are a lot of characters that aren't allowed in URLs. Think ? and spaces for starters. Django will probably prevent these from being passed to your view no matter what you do.
Second, you want to read up on the re module. It is what sets the syntax for those URL matches. \w means any upper or lowercase letter, digit, or _ (basically, identifier characters, except it doesn't disallow a leading digit).
The right way to pass a string to a URL is as a form parameter (i.e. after a ?paramName= in the URL, and with special characters escaped, such as spaces changed to +).
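A minimal sketch of that approach using only Python's standard library is shown below. The /add/ path and the parameter name string are taken from the question, and the helper names are hypothetical. In a Django view the decoded value would typically be read from request.GET instead of being parsed by hand.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_add_url(text, base="/add/"):
    """Put arbitrary text into the query string, escaping special characters."""
    return base + "?" + urlencode({"string": text})


def read_string_param(url):
    """Recover the original text from the URL's query string."""
    query = urlsplit(url).query
    return parse_qs(query, keep_blank_values=True)["string"][0]
```

Because the text is escaped on the way in and decoded on the way out, punctuation, spaces, and newlines survive the round trip unchanged.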