Regex Check Char Location - regex

I need to verify a Swedish postal code, which must be 6 characters long, with a space as the 4th character.
An example is 114 55.
I know I can verify a length of 6 with ^[a-zA-Z]{6}$, but can I verify that the 4th character is a space and check the length as well?
This must be a regex, as it is stored in an XML file which is parsed.
I have been looking at the regex provided, as well as some other solutions, and for the most part it works.
I now see that I need to accept 11455 or 114 55, but reject 11455- 9085 and 11455 6625.
I am using the regex ^\d{5}|(\d{3}\s\d{2})$, but it also treats the last two as valid. How can I exclude them?
Looking into this more, I have one case I cannot solve. If the postal code is entered as 60922- 62264, the last 5 digits are recognized as valid.
My regex is (\d{5}$)|(\d{3}\s\d{2}$), and it works for all cases except the one just mentioned.
Any advice?

As much fun as I was having learning about regex, I have a deadline, so I googled a regex for validating Swedish postal codes and found this:
^(s-|S-){0,1}[0-9]{3}\s?[0-9]{2}$
Which works in every case so far!
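The failing cases above come down to alternation precedence: in ^\d{5}|(\d{3}\s\d{2})$ the ^ binds only to the first branch and the $ only to the second, so 11455- 9085 passes because it merely starts with five digits. Below is a minimal Python sketch of the difference, assuming only the two formats 11455 and 114 55 should be accepted; the names LOOSE, STRICT, and is_swedish_postcode are illustrative and not from the original post.

```python
import re

# Pattern from the question: the anchors apply to one branch each.
LOOSE = re.compile(r"^\d{5}|(\d{3}\s\d{2})$")

# Grouping the alternatives makes both anchors apply to both branches.
STRICT = re.compile(r"^(?:\d{5}|\d{3}\s\d{2})$")


def is_swedish_postcode(text):
    """Return True for 'NNNNN' or 'NNN NN' and nothing else (illustrative helper)."""
    return STRICT.search(text) is not None


for sample in ["11455", "114 55", "11455- 9085", "11455 6625", "60922- 62264"]:
    print(repr(sample), bool(LOOSE.search(sample)), is_swedish_postcode(sample))
```

The same grouped pattern, without the Python wrapper, can be stored as-is in the XML file.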

Related

Open Refine regex for alphabets

I want to extract only the alphabetic characters from my cell.
What I have done:
value.match(/.*?(\^[a-zA-Z]*$).*?/)
but it returns null.
I am trying to clean the address column in my data set. Here are some sample addresses:
H3656 GALI#4 BLOCK-D, AREA 1
H#36/17 SECTOR 5D AREA 2
AREA 3 BLOCK-B NORTH NAZIMABAD
GERMANY AL JANNAT BENQUET SECTOR 16 Area 2 with short name
So I first tried to remove all the numbers from my string.
If you want to remove all the numbers, the most direct approach is probably:
value.replace(/\d+/, "")
If for any reason you want to find only the alphabetic characters, as indicated by the title of your question, this will be more effective than a value.match() :
value.find(/\p{L}\s?/).join("")
(\p{L} is a Java regular expression class, available because OpenRefine is written in Java. It is similar to [a-zA-Z], but it also matches Unicode letters such as accented characters.)
In general, you should avoid using the .match() method unless you know exactly what you are doing. In 90% of cases, it is actually .find() that is desired.
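To see what the two expressions above do to a sample address outside OpenRefine, here is a rough Python analogue. It is a sketch, not GREL: Python's re module has no \p{L}, so [^\W\d_] is used as an assumed stand-in for "any Unicode letter", and the function names are made up for illustration.

```python
import re


def remove_digits(value):
    """Analogue of value.replace(/\\d+/, ""): drop every run of digits."""
    return re.sub(r"\d+", "", value)


def letters_only(value):
    """Analogue of value.find(/\\p{L}\\s?/).join(""): keep letters, plus a space that directly follows a letter."""
    return "".join(re.findall(r"[^\W\d_]\s?", value))


address = "H3656 GALI#4 BLOCK-D, AREA 1"
print(remove_digits(address))
print(letters_only(address))
```

Note that the second form drops any space that follows a digit or punctuation mark, so words can end up glued together.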

Using RegEx to Find a Block of Text

I'm attempting to block a long string of unnecessary text that's on every page of a document.
Ex: "36075 This is another page and this is the date March 4 2013"
I know this must be very simple, but I'm hoping there is a way to block text verbatim. Is the only way to block this text by using a lot of \d, \s, \w+ etc., or is there a way to say, "match 36075 This is another page and this is the date March 4 2013"?
This would be SO HELPFUL to know. Thank you for helping!
From what you wrote, I assume you need to get the leading digits from the string. To do that, use the pattern ^\d+, which for this input:
36075 This is another page and this is the date March 4 2013
will return this:
36075
In the future, when asking this kind of question, please provide an example string and the expected output, as well as what you have tried.
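On the original question of matching a piece of text verbatim: most regex engines treat letters, digits, and spaces literally, so the sentence itself is already a valid pattern. The only care needed is escaping characters with special meaning. A small Python sketch, assuming the boilerplate is known in advance (strip_boilerplate is a hypothetical helper name):

```python
import re


def strip_boilerplate(page, literal):
    """Remove every verbatim occurrence of `literal` from `page`."""
    # re.escape neutralizes any regex metacharacters in the literal text.
    return re.sub(re.escape(literal), "", page)


boilerplate = "36075 This is another page and this is the date March 4 2013"
page = "Intro text. " + boilerplate + " Body text."
print(strip_boilerplate(page, boilerplate))
```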
I realized the issue I was having. I didn't need to use RegEx. The program I was using has the functionality to match specific words or groups of words and pronounce them differently. What I discovered is that it will not match the words unless the word groups are input exactly the way the program typically reads them.
Ergo --> The channel saw
the end of the British hold over
Would have to be listed as one group for, "The channel saw" and a second group for "the end of the British hold over"
In addition, there were some numbers --> 11960_30_o_ho_
and if the program naturally read 119 and then 60_3 and then _o_ho_ then three strings would need to be input for each section.
A few frustrating hours later, problem solved :) Thank you for your assistance.

RegEx to match Bitcoin addresses?

I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:
A Bitcoin address, or simply address, is an identifier of 27-34
alphanumeric characters, beginning with the number 1 or 3 [...]
I figured it would look something like this
/^[13][a-zA-Z0-9]{27,34}/
Thing is, I'm not good with regular expressions and I haven't found a single source to confirm this would not create false negatives.
I've found one online that's ^1[1-9A-Za-z][^OIl]{20,40}, but I don't even know what the [^OIl] part means and it doesn't seem to match the 3 a Bitcoin address could start with.
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
will match a string that starts with either 1 or 3 and, after that, 25 to 34 characters of either a-z, A-Z, or 0-9, excluding l, I, O and 0 (not valid characters in a Bitcoin address).
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
A Bitcoin address is:
- an identifier of 26-35 alphanumeric characters
- beginning with the number 1 or 3
- made up of random digits and uppercase and lowercase letters
- with the exception that the uppercase letter O, uppercase letter I, lowercase letter l, and the number 0 are never used, to prevent visual ambiguity.
[^OIl] matches any character that's not O, I or l. The problems in your regex are:
You don't have a $ at the end, so it'd match any string beginning with a BC address.
You didn't count the first character in your {27,34} - that should be {26,33}
However, as mentioned in a comment, a regex is not a good way to validate a bitcoin address.
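One reason a regex falls short is that legacy addresses carry a checksum, which a pattern cannot verify. The following is a minimal sketch of a Base58Check test in Python, assuming legacy addresses only (version byte 0x00 for addresses starting with 1, 0x05 for those starting with 3); Bech32 addresses use a different scheme and are not handled. The function name is illustrative, and the sample address is the widely published genesis-block address, which does not appear in the thread.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_is_valid(address):
    """Return True if `address` decodes to 25 bytes with a matching checksum."""
    number = 0
    for char in address:
        index = BASE58_ALPHABET.find(char)
        if index < 0:
            return False  # character outside the Base58 alphabet
        number = number * 58 + index
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' in the text stands for one leading zero byte.
    leading_zeros = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * leading_zeros + body
    if len(raw) != 25:
        return False
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    # Assumption: only mainnet P2PKH (0x00) and P2SH (0x05) are accepted.
    return digest[:4] == checksum and payload[0] in (0x00, 0x05)


print(base58check_is_valid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
```

A regex can still serve as a cheap first filter before this check.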
^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
Based on the new address type Bech32
Based on the answers by runeks and Erhard Dinhobl, I arrived at this, which accepts both Bech32 and legacy addresses:
\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b
Including testnet address:
\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b
Only testnet:
\b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b
Based on the description here: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki I would say the regex for a Bech32 bitcoin address for Version 1 and Version 0 (only for mainnet) is:
\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b
Here are some other links where I found information:
https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
http://r6.ca/blog/20180106T164028Z.html
The OP didn't give a specific use case (only matching criteria). I came across this question while researching ways to detect BitCoin addresses and wanted to share my findings with the community.
The RegEx provided above only find BitCoin addresses that sit at the start and/or end of a line. My use case was finding BitCoin addresses in the body of an email, given the rise of blackmail/sextortion (Reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/), so these weren't effective solutions (as outlined later). The proposed RegEx also catch many FPs (false positives) in email, due to filenames and other identifiers within URLs. I am not knocking the solutions, as they work for certain use cases, but they simply don't work for mine. One variation caught many spam emails within a short timeframe of passive alerting (examples follow).
Here are my test cases:
--------------------------------------------------------
BitCoin blackmail formats observed (my org and online):
--------------------------------------------------------
BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4
BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67
Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
--------------------------------------------------------
Other possible BitCoin test cases I added:
--------------------------------------------------------
- What if text comes before and/or after on same line? Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
- Standalone address:
1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72
--------------------------------------------------------
Redacted Body content generating FPs from spam emails:
--------------------------------------------------------
src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"
"cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah
src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg"
src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"
href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah
Some RegEx samples I tested (won't list those I'd knock for greedy globbing with backtraces):
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
(Too narrow and misses BitCoin addresses within a paragraph)
(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
(Still misses text after BTC on same line and triples execution time)
\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
(Too broad and catches URL formats)
The current RegEx I am evaluating which catches all my known/crafted sample cases and eliminates known FPs (specifically avoiding end of sentence period for URL filename FPs):
[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s
One reference point for execution times (shows cost in steps and time): https://regex101.com/
Please feel free to weigh in or provide suggestions on improvements (I am by no means a RegEx master). As I further vet it against email detection of Body content, I will update if other FP cases are observed or more efficient RegEx is derived.
Seth
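One possible variation on the pattern under evaluation, sketched in Python: lookarounds replace the trailing \s so that an address at the very end of the text still matches and the whitespace is not part of the match. This is an assumption-laden sketch, not a vetted detection rule; CANDIDATE and find_candidates are hypothetical names, and the lookbehind is an addition not present in the patterns above.

```python
import re
from typing import List

CANDIDATE = re.compile(
    r"(?<![A-Za-z0-9])"                  # not preceded by a letter or digit
    r"[13][a-km-zA-HJ-NP-Z1-9]{25,34}"   # legacy address shape, as above
    r"(?=\s|$)"                          # followed by whitespace or end of text
)


def find_candidates(body: str) -> List[str]:
    """Return every substring of `body` shaped like a legacy address."""
    return CANDIDATE.findall(body)


print(find_candidates(
    "Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe."
))
print(find_candidates(
    'src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"'
))
```

Like the \s version, this deliberately skips an address followed directly by a period, which is the trade-off that keeps hex filenames in URLs from matching.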
for mainnet bitcoin
/^([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
if you don't want to understand the above regex you can skip the detail below
breaking it down
For regular addresses
/[13]{1}/
The address starts with 1 or 3; {1} means exactly one character from the square brackets is matched.
/[13]{1}[a-km-zA-HJ-NP-Z1-9]/
cannot have l (small el), I (capital eye), O (capital O) and 0 (zero)
/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/
can be 27 to 34 characters long, remember we already checked the first character to be 1 or 3, so remaining address will be 26 to 33 characters long
For segwit
/bc1/
starts with bc1
/bc1[a-z0-9]/
can only contain lower case letters and numbers
/bc1[a-z0-9]{39,59}/
can be 42 to 62 characters long, we already checked first three characters to be bc1, so remaining address will be 39 to 59 characters long
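The breakdown above can be tried out with a short Python sketch. The pattern is the one from this answer (with the redundant {1} dropped); the helper name and the two sample addresses are assumptions for illustration: the genesis-block address and a P2WPKH example in the form given in BIP 173.

```python
import re

# Legacy branch: 1 or 3 plus 26-33 Base58 characters.
# SegWit branch: bc1 plus 39-59 lowercase letters or digits.
MAINNET = re.compile(r"^(?:[13][a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$")


def looks_like_mainnet_address(text):
    """Shape check only; it does not verify any checksum."""
    return MAINNET.match(text) is not None


print(looks_like_mainnet_address("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
print(looks_like_mainnet_address("bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"))
print(looks_like_mainnet_address("bc1" + "q" * 38))  # one character too short
```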
I am not into complicated solutions, and this regex served the purpose for the simplest kind of validation, when you just don't want to receive complete nonsense.
\w{25,}
For matching legacy, nested SegWit, and native SegWit addresses:
/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/
Source: Regex for Bitcoin Addresses.

telephone number regex

I am currently trying to validate UK telephone numbers:
The format I'm looking for is: 01234 567891 or 01234567891. So I need the number to have 5 digits, then a space, then 6 digits, or simply 11 digits.
The number must start with a 0.
I've had a look at a couple of examples:
/^[0-9]{10,11}$/ - to check that the chars are all numbers
/^0[0-9]{9,10}$/ - to check that the first number is a 0
I'm just unsure how to put all these together and check if there is a space or not.
Could someone help me with this regex?
Thanks
Try this regex:
/^0\d{4}\s?\d{6}$/
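A quick way to check this pattern against the two formats from the question is a small Python sketch (the names UK_SIMPLE and is_simple_uk_number are illustrative):

```python
import re

# 0, four more digits, an optional single whitespace character, six digits.
UK_SIMPLE = re.compile(r"^0\d{4}\s?\d{6}$")


def is_simple_uk_number(text):
    return UK_SIMPLE.match(text) is not None


for sample in ["01234 567891", "01234567891", "1234 567891", "01234 56789"]:
    print(repr(sample), is_simple_uk_number(sample))
```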
Many people try to do input validation and formatting in a single step.
It is better to separate these processes.
Match UK telephone number in any format
^(?:(?:\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?(?:\(?0\)?[\s-]?)?)|(?:\(?0))(?:(?:\d{5}\)?[\s-]?\d{4,5})|(?:\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3}))|(?:\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4})|(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}))(?:[\s-]?(?:x|ext\.?|\#)\d{3,4})?$
The above pattern allows the user to enter the number in any format they are comfortable with. Don't constrain the user into entering specific formats.
Extract NSN, prefix and extension
^(\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)(44)\)?[\s-]?)?\(?0?(?:\)[\s-]?)?([1-9]\d{1,4}\)?[\d[\s-]]+)((?:x|ext\.?|\#)\d{3,4})?$
Next, extract the various elements.
$2 will be '44' if international format was used, otherwise assume national format with leading '0'.
$4 contains the extension number if present.
$3 contains the NSN part.
Validation and formatting
Use further RegEx patterns to check the NSN has the right number of digits for this number range. Finally, store the number in E.164 format or display it in E.123 format.
There's a very detailed list of validation and display formatting RegEx patterns for UK numbers at:
http://www.aa-asterisk.org.uk/index.php/Regular_Expressions_for_Validating_and_Formatting_UK_Telephone_Numbers
It's too long to reproduce here and it would be difficult to maintain multiple copies of this document.
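The separation of validation from formatting can be illustrated with a deliberately simplified Python sketch. It is not the set of patterns from the linked page: it ignores extensions, does not check number ranges, and assumes an NSN of 9 or 10 digits that does not start with 0. The function name to_e164_uk is hypothetical.

```python
import re
from typing import Optional


def to_e164_uk(raw: str) -> Optional[str]:
    """Normalize a loosely formatted UK number to +44 form, or return None."""
    # Step 1: discard formatting (spaces, hyphens, brackets); keep digits and '+'.
    compact = re.sub(r"[^\d+]", "", raw)

    # Step 2: strip the international or national prefix.
    if compact.startswith("+44"):
        nsn = compact[3:]
    elif compact.startswith("0044"):
        nsn = compact[4:]
    elif compact.startswith("0"):
        nsn = compact[1:]
    else:
        return None
    if nsn.startswith("0"):
        nsn = nsn[1:]  # handles the "+44 (0)20 ..." style

    # Step 3: validate the NSN length (simplified assumption: 9 or 10 digits).
    if re.fullmatch(r"[1-9]\d{8,9}", nsn) is None:
        return None

    # Step 4: format for storage.
    return "+44" + nsn


print(to_e164_uk("01234 567891"))
print(to_e164_uk("+44 (0)20 7123 4567"))
```

Keeping the steps apart means the accepted input formats can be widened later without touching the stored format.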
If you want to match all UK numbers, I'd allow for more than just that format; some are written as 020 7123 4567, etc.
^\s*\(?(020[78]{1}\)?[ ]?[1-9]{1}[0-9]{2}[ ]?[0-9]{4})|(0[1-8]{1}[0-9]{3}\)?[ ]?[1-9]{1}[0-9]{2}[ ]?[0-9]{3})\s*$
/^[\d()+-]*$/
A simple telephone regex that allows +, ( ), and - anywhere, as well as digits.
I think ^0[\d]{4}\s?[\d]{5,6}$ will work for you. I have used [\d] instead of [0-9].
I find that RegExr is a useful online tool to check and try your regular expressions. It also has a nice library of examples to help point you in the right direction
Regarding the suggestion that "you should just count the number of digits and check that it's 10":
Some UK numbers have only 9 digits, not 10 (not including the leading 0).
These include 40 of the 01 area codes (using "4+5" format), the 016977 area code (using "5+4" format), all 0500 numbers and some 0800 numbers.
There's a list at: http://www.aa-asterisk.org.uk/index.php/01_numbers
This pattern for US numbers also accepts the following formats:
800-432-4500, Opt: 9, Ext: 100316
800-432-4500, Opt: 9, Ext: X100316
800-432-4500, Option #3
(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4}),?(?:\s*(?:#|x\.?|opt(\.|:|\.:)?|option)\s*#?(\d+))?,?(?:\s*(?:#|x\.?|ext(\.|:|\.:)?|extension)\s*(\d+))?
(I used an answer from another topic as a starting point.)

How to validate with regex that a string is OK as long as it contains 10 digits?

I'm processing input from a Web form. Basically, all I care about is that the value provided includes 10 digits, no more, no less.
These would be valid inputs:
1234567890
123 456 789 0 Hello!
My number is: 123456-7890 thanks
These would be invalid inputs:
123456789033 (too long)
123 Hello! (too short)
My number is one five zero nine thanks (no digits)
I've tried many different things with Regextester but it never matches correctly. I'm using the 'preg' setting (which is what I figured my CMS Typo3 uses) and my closest attempt is:
([0-9][^0-9]*){10}
which is kinda lame but is the closest I got.
Cheers!
EDIT: I cannot use any programming language to implement this. Imagine that I have an admin console field in front of me, in which I must enter a regular expression that will be used to validate the value. That's all the latitude I have. Cheers.
I think you've got the right idea. Maybe you can simplify it as (\d\D*){10}
If the regex has to match the complete string, you would want \D*(\d\D*){10}
UPDATE: It looks like you need ^\D*(\d\D*){10}$ to make sure you match the complete string.
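To check the anchored pattern against the sample inputs from the question, here is a short Python sketch. The asker can only enter a regex, so this is purely a test harness; the name has_exactly_ten_digits is illustrative.

```python
import re

# Optional non-digits, then exactly ten "digit followed by non-digits" groups.
TEN_DIGITS = re.compile(r"^\D*(\d\D*){10}$")


def has_exactly_ten_digits(text):
    return TEN_DIGITS.match(text) is not None


samples = [
    "1234567890",
    "123 456 789 0 Hello!",
    "My number is: 123456-7890 thanks",
    "123456789033",
    "123 Hello!",
    "My number is one five zero nine thanks",
]
for sample in samples:
    print(repr(sample), has_exactly_ten_digits(sample))
```

Without the ^ and $ anchors, the 12-digit input would pass, because ten of its digits are enough to satisfy the group.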
A regular expression is not always the best tool for this kind of job. In this case it's probably easier and simpler to write a function to count the number of digits in a string. (Since you didn't mention a programming language, I'll use Python in my example.)
def count_digits(s):
    # Count the characters in s that are digits.
    return len([x for x in s if x.isdigit()])
Then, you can use it like this:
s = "My number is: 123456-7890 thanks"
if count_digits(s) == 10:
    print("looks okay")
else:
    print("doesn't contain 10 digits")