Regex to match substring of a string [duplicate] - regex

This question already has answers here:
Regular expression to match A, AB, ABC, but not AC. ("starts with")
(4 answers)
Closed 4 years ago.
I have an enumeration that I use as fixed parameter values to a program and I use regex to sanitize the user input.
I want the user to be able to enter a partial match to one of the values and accept that value and not other values.
For example if the enumeration is:
end
end now
start
swarm
condition
and the user entered
s
st
sta
etc...
it will be ok because it is part of start;
but if the user entered
ending
it will not be ok because it's not part of any of the other words.
I know I can specify each prefix in a group (s|st|sta|star|start) and it will do the work, but doing this for around 12 different values seems very hard to maintain and "ugly"...
Is there an easier way to match one of a set of fixed values, or a leading part of one of those values?
I'm not searching for something that is specific to one engine/language (for example, Java code).

Regex is not the correct tool for this job.
Just find the length of the user's input (call it N), then loop through your valid values and see if the first N characters of each value match the input.
If only one item matches, you've got yourself a result! If more than one matches, you'll need more letters from the user to identify the correct one. And if none match, it's invalid.
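A minimal sketch of that loop in Python, assuming the valid values are stored in lowercase and that an exact match should win over longer values sharing the same prefix (for example "end" versus "end now"). The names VALUES and classify are hypothetical and only serve the illustration.

```python
# Hypothetical list of valid values, taken from the question's example.
VALUES = ["end", "end now", "start", "swarm", "condition"]

def classify(user_input, valid_values=VALUES):
    """Return ("ok", value), ("ambiguous", [values]) or ("invalid", None)."""
    text = user_input.strip().lower()
    if not text:
        return ("invalid", None)
    # Assumption: an exact match is accepted even if it is also a prefix
    # of another value ("end" vs "end now").
    if text in valid_values:
        return ("ok", text)
    # Compare the input against the first len(text) characters of each value.
    matches = [v for v in valid_values if v.startswith(text)]
    if len(matches) == 1:
        return ("ok", matches[0])
    if matches:
        return ("ambiguous", matches)
    return ("invalid", None)
```

With these values, "sta" resolves to start, "s" is ambiguous between start and swarm, and "ending" is rejected.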

Related

Regex matching either positive/negative floats, ints or string

I want to be able to match and parse some parameters read from a file such as :
"type:int,register_id:15,value:123456"
"type:int,register_id:16,value:-456789"
"type:double,register_id:17,value:123.456"
"type:double,register_id:18,value:-456.789"
"type:bool,register_id:19,value:true"
"type:bool,register_id:20,value:false"
"type:string,register_id:17,value:Test Set Data Register"
I've come up with the following Regex expression :
(^(type:)\b(bool|int|double|string)\b,(\bregister_id:\b)([1-9][0-9]),(\bvalue:\b)(.)$)
but I have issues when there are negative floats or ints: I can't get the hyphen sorted properly.
Can someone point me in the right direction?
https://regex101.com/r/WhXmBE/3
Thanks !
Tried [\s\S] but it reads everything, tried -? as well
Given your example, this seems to work:
(^(type:)(bool|int|double|string),(register_id:)([1-9][0-9]*),(value:)(.*)$)
At least from the example, I didn't see why the \b are necessary. Apologies if I missed something.
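As a quick check, here is a sketch that applies the pattern above in Python. It assumes each line is read without the surrounding double quotes; parse_with_regex is a hypothetical helper name, and the group numbers follow the pattern's nesting (group 3 is the type, group 5 the register id, group 7 the value).

```python
import re

# The pattern from the answer above, unchanged.
PATTERN = re.compile(
    r"(^(type:)(bool|int|double|string),(register_id:)([1-9][0-9]*),(value:)(.*)$)"
)

def parse_with_regex(line):
    """Return (type, register_id, raw_value) or None if the line does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # The value is returned as raw text; a leading hyphen is simply part of it.
    return m.group(3), int(m.group(5)), m.group(7)
```

Because the value group is (.*), negative numbers need no special handling at the regex level; the sign is validated later when the value is converted.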
Looking at what you try to achieve, I would actually consider moving away from regexes, as regexes by themselves add complexity. You will likely have an easier life if you approach it like this:
Split the line by "," to get the key value pairs
Split each key value pair by the first ":" to split key and value
Validate that all keys are present and that every value matches the format for the key (e.g. if the type is bool then the value should parse to a bool)
You can easily adjust every step, e.g. to trim whitespace.
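The steps above could be sketched in Python roughly as follows. This assumes the three keys always appear with value last (so a string value may itself contain commas), that lines arrive without surrounding quotes, and that bools are written exactly as true or false. parse_line and parse_bool are hypothetical names.

```python
def parse_bool(text):
    """Accept only the literal strings used in the sample data."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a bool: {text!r}")

# One converter per declared type; each raises ValueError on bad input.
CONVERTERS = {"int": int, "double": float, "bool": parse_bool, "string": str}

def parse_line(line):
    """Return (type, register_id, typed_value) or raise ValueError."""
    fields = {}
    # Split into at most three pairs so commas inside the value survive.
    for pair in line.strip().split(",", 2):
        key, sep, value = pair.partition(":")  # split on the first ":" only
        if not sep:
            raise ValueError(f"missing ':' in {pair!r}")
        fields[key.strip()] = value.strip()
    missing = {"type", "register_id", "value"} - fields.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if fields["type"] not in CONVERTERS:
        raise ValueError(f"unknown type: {fields['type']!r}")
    typed_value = CONVERTERS[fields["type"]](fields["value"])
    return fields["type"], int(fields["register_id"]), typed_value
```

Negative ints and floats need no special treatment here, since int() and float() already accept a leading hyphen.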
Edit: Fixed typo

Regex to match a few specific 6 digit numbers (Indian pin codes)

I am using Google Forms to create an application form. Need to restrict submission for a few specific pin codes only. Here are some of the pin codes that I'm trying to limit it to.
560078 560070 560085 560069 560011 560080 560004 570070 560089 560060
First 4 digits are same in all the pin codes. I need to match the last two digits to the ones from a list. The list might end up being about 30, hence looking for a regex.
Which regex should I use for that?
You haven't specified the exact ranges. The pattern 5600(04|11|[6-8]\d) will match:
560004
560011
all the numbers from 560060 to 560089
It will need to be corrected according to the acceptable ranges.
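If the list grows to around 30 codes and does not fall into neat ranges, one option is to generate a plain alternation from the list rather than hand-writing ranges. The sketch below is in Python; build_pin_pattern is a hypothetical helper, and the anchors are added on the assumption that the validator might otherwise accept a partial match. The resulting pattern string can then be pasted into the form's validation field.

```python
import re

def build_pin_pattern(pin_codes):
    """Build an anchored alternation that matches exactly the given codes."""
    # De-duplicate and sort so the generated pattern is stable.
    alternatives = "|".join(sorted(re.escape(p) for p in set(pin_codes)))
    return f"^(?:{alternatives})$"

# The codes listed in the question, copied verbatim (including 570070).
PINS = ["560078", "560070", "560085", "560069", "560011",
        "560080", "560004", "570070", "560089", "560060"]

PIN_PATTERN = build_pin_pattern(PINS)
```

Unlike the range-based pattern, this accepts only the listed codes, at the cost of a longer expression.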
It is surprising to see 570070 in the list when you note that the first 4 digits are always the same. Is it an error or an exception?

Regex extract number from a string with a specific pattern in Alteryx [duplicate]

This question already has answers here:
Find numbers after specific text in a string with RegEx
(3 answers)
Closed 3 years ago.
I have a string like this, which looks like a URL:
mainpath/path2/abc/PI 6/j
From the string I need to get the number that comes with PI.
The main problem is that the position of the PI part won't always be the same. Sometimes it could be at the end, sometimes in the middle.
So how can I get that number extracted using regex?
I'm really stuck on this.
It's as simple as using the RegEx Tool. A Regular Expression of /PI (\d+) and the Output Method of "Parse" should do the trick.
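Outside Alteryx, the same expression can be tried in any regex engine. Here is a small Python sketch, assuming the number is always preceded by "/PI " with a single space; extract_pi_number is a hypothetical helper name.

```python
import re

def extract_pi_number(path):
    """Return the digits following '/PI ' as a string, or None if absent."""
    # search() scans the whole string, so the position of the PI segment
    # (middle or end) does not matter.
    m = re.search(r"/PI (\d+)", path)
    return m.group(1) if m else None
```

The capture group (\d+) is what the "Parse" output method would emit as a new field.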
If you're using Alteryx... suppose your field name is [s] and you're looking for [f] (in your example the value of [f] is "PI")... then you could have a Formula tool that first locates /PI and creates a new field [tmp] holding everything after that slash:
SubString([s],FindString([s],"/"+[f])+1)
and then creating the field you're after [target]:
SubString([tmp],0,FindString([tmp],"/"))
From there run [target] through a "Text to Columns" tool to split on the space, which will give you "PI" and "6".

How to have multiple regex for same element

I have a text box which has a regular expression which is something like below
^AB[a-zA-Z0-9]{20}$
which basically allows the characters AB, followed by 20 alphanumeric characters. For example, let's say the validation error for not matching this regex is Some Test Error.
I have a scenario where the user enters AB1234 and tabs out of the text box, and the error Some Test Error shows. But I have a requirement not to show the same error message, Some Test Error, if the user is trying to follow the format but not satisfying the entire regex.
Scenario 1: User enters CD12345675438976524381
I need to show Some Test Error
Scenario 2: User enters AB12345
I need to show Different Test Error, because the user tried to enter a value starting with AB
How can I achieve this? Is there a way of specifying multiple regexes?
I am not sure which language you are using, but I suppose you could change the regex once the user has seen the message. While the user is still entering the string, don't check the length unless they enter a 21st character or something that does not belong to [a-zA-Z0-9].
I hope I made myself understood; the point is that you change the regex over time.
I think you can, for example, use multiple regexes and check the input:
if input is valid, everything is OK,
if input is invalid, check: a) if it starts with AB (regex: ^AB) or b) if it has a valid length (regex ^([^A][^B][a-zA-Z0-9]{20})$), and show the proper info
if it is totally invalid, give another info
OR you can use one long regex, like:
^(AB[a-zA-Z0-9]{20})$|^(AB[a-zA-Z0-9]{0,19}|AB[a-zA-Z0-9]{21,})$|^([^A][^B][a-zA-Z0-9]{20})$
which captures each type of input in a separate group,
and then find which group was captured to determine the level of correctness:
if group 1 exists - valid string,
if group 2 - starts with AB but improper length,
if group 3 - proper length, invalid beginning
I'm sure there are also other solutions.
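The multiple-regex idea can be sketched as a sequence of checks, from strictest to loosest. This Python version is only an illustration: validation_message is a hypothetical function, the two messages are the placeholders from the question, and the looser check is simply "starts with AB" rather than the [^A][^B] pattern above (which, as written, would also reject inputs such as AC... or XB...).

```python
import re

FULL = re.compile(r"AB[a-zA-Z0-9]{20}")   # the complete, valid format
PREFIX = re.compile(r"AB")                # user at least started correctly

def validation_message(value):
    """Return None for valid input, otherwise the message to display."""
    if FULL.fullmatch(value):
        return None
    # match() anchors at the start only, so this means "begins with AB".
    if PREFIX.match(value):
        return "Different Test Error"
    return "Some Test Error"
```

Ordering the checks this way keeps each regex simple and makes it easy to add further tiers later.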

How do I find strings that only differ by their diacritics?

I'm comparing three lexical resources. I use entries from one of them to create queries — see first column — and see if the other two lexicons return the right answers. All wrong answers are written to a text file. Here's a sample out of 3000 lines:
réincarcérer<IND><FUT><REL><SG><1> réincarcèrerais réincarcérerais réincarcérerais
réinsérer<IND><FUT><ABS><PL><1> réinsèrerons réinsérerons réinsérerons
macérer<IND><FUT><ABS><PL><3> macèreront macéreront macéreront
répéter<IND><FUT><ABS><PL><1> répèterons répéterons répéterons
The first column is the query, the second is the reference. The third and fourth columns are the results returned by the lexicons. The values are tab-separated.
I'm trying to identify answers that only differ from the reference by their diacritics. That is, répèterons répéterons should match because the only difference between the two is that the second part has an acute accent on the e rather than a grave accent.
I'd like to match the entire line. I'd be grateful for a regex that would also identify answers that differ by their gemination — the following two lines should match because martellerait has two ls while martèlerait only has one.
modeler<IND><FUT><ABS><SG><2> modelleras modèleras modèleras
marteler<IND><FUT><REL><SG><3> martellerait martèlerait martèlerait
The last two values will always be identical. You can focus on values #2 and 3.
The first part can be achieved by doing a lossy conversion to ASCII and then doing a direct string comparison. Note, converting to ASCII effectively removes the diacritics.
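One way to do that conversion, sketched in Python with the standard unicodedata module: decompose each string (NFD) so accents become separate combining marks, drop those marks, then compare. The function names are hypothetical, and this assumes that removing every combining mark is acceptable for the data (it also removes, for example, the cedilla from ç).

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def differs_only_by_diacritics(reference, answer):
    """True if the two words differ, but only in their diacritics."""
    # Normalise to NFC first so two encodings of the same word count as equal.
    same = (unicodedata.normalize("NFC", reference)
            == unicodedata.normalize("NFC", answer))
    return (not same) and strip_diacritics(reference) == strip_diacritics(answer)
```

For each tab-separated line, the second and third columns would be passed as reference and answer.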
The second part is not possible (as far as I know) with a regex pattern alone. You will need to do some research into things like the Levenshtein distance.
EDIT:
This regex will match duplicate consonants. It might be helpful for your gemination problem.
([b-df-hj-np-tv-xz])\\1+
Which means:
([b-df-hj-np-tv-xz]) # Match only consonants
\\1+ # Match one or more further occurrences of what was captured in the first capture group
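As a rough sketch of how this pattern might be combined with diacritic stripping, the Python code below reduces each word to a "skeleton" (no accents, doubled consonants collapsed) and compares the skeletons. In a Python raw string the backreference is written \1 rather than \\1. The names skeleton and differs_by_diacritics_or_gemination are hypothetical, and this is a heuristic rather than a full linguistic treatment.

```python
import re
import unicodedata

# Same consonant class as above; \1+ matches the repeats of the captured letter.
DOUBLED_CONSONANT = re.compile(r"([b-df-hj-np-tv-xz])\1+")

def skeleton(word):
    """Lowercase, strip diacritics, and collapse doubled consonants."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return DOUBLED_CONSONANT.sub(r"\1", plain)

def differs_by_diacritics_or_gemination(reference, answer):
    """True if the words differ only in accents and/or doubled consonants."""
    return reference != answer and skeleton(reference) == skeleton(answer)
```

With this, martellerait and martèlerait both reduce to martelerait, so the line would be flagged as a match.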