Random Text generator based on regex [duplicate]


This question already has answers here:
Using Regex to generate Strings rather than match them
(12 answers)
Closed 3 years ago.
I would like to know if there is software that, given a regex and of course some other constraints like length, produces random text that always matches the given regex.
Thanks

Yes, software that can generate a random match to a regex:
Exrex, Python
Pxeger, Javascript
regex-genex, Haskell
Xeger, Java
Xeger, Python
Generex, Java
rxrdg, C#
String::Random, Perl
regldg, C
paggern, PHP
ReverseRegex, PHP
randexp.js, Javascript
EGRET, Python/C++
MutRex, Java
Fare, C#
rstr, Python
randexp, Ruby
goregen, Go
bfgex, Java
regexgen, Javascript
strgen, Python
random-string, Java
regexp-unfolder, Clojure
string-random, Haskell
Regexp::Genex, Perl
StringGenerator, Python
strrand, Go
regen, Go
Rex, C#
regexp-examples, Ruby
genex.js, JavaScript
genex, Go

Xeger is capable of doing it:
String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);

All regular expressions can be expressed as context free grammars. And there is a nice algorithm already worked out for producing random sentences, from any CFG, of a given length. So upconvert the regex to a cfg, apply the algorithm, and wham, you're done.
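As a rough illustration of that idea in Python: the grammar below is a hand-written, hypothetical translation of the regex [ab]+c (with no empty productions), and the count-then-sample scheme is one common approach, not necessarily the algorithm this answer has in mind. `count_sym` counts how many derivations of a given length each symbol has, and `sample` uses those counts as weights. Because this particular grammar is unambiguous, every string of the requested length is equally likely.

```python
import random
from functools import lru_cache

# Hypothetical grammar for the regex [ab]+c. Keys are non-terminals,
# every other symbol is a terminal character.
GRAMMAR = {
    "S": [("a", "S"), ("b", "S"), ("a", "c"), ("b", "c")],
}

@lru_cache(maxsize=None)
def count_sym(sym, n):
    """Number of derivations of length-n strings from one symbol."""
    if sym not in GRAMMAR:  # terminal
        return 1 if n == len(sym) else 0
    return sum(count_seq(prod, n) for prod in GRAMMAR[sym])

@lru_cache(maxsize=None)
def count_seq(seq, n):
    """Number of derivations of length-n strings from a symbol sequence."""
    if not seq:
        return 1 if n == 0 else 0
    if n < len(seq):  # every symbol yields at least one character
        return 0
    head, rest = seq[0], seq[1:]
    return sum(count_sym(head, k) * count_seq(rest, n - k)
               for k in range(1, n - len(rest) + 1))

def sample_sym(sym, n, rng):
    if sym not in GRAMMAR:
        return sym
    prods = GRAMMAR[sym]
    weights = [count_seq(prod, n) for prod in prods]
    return sample_seq(rng.choices(prods, weights=weights)[0], n, rng)

def sample_seq(seq, n, rng):
    if not seq:
        return ""
    head, rest = seq[0], seq[1:]
    splits = list(range(1, n - len(rest) + 1))
    weights = [count_sym(head, k) * count_seq(rest, n - k) for k in splits]
    k = rng.choices(splits, weights=weights)[0]
    return sample_sym(head, k, rng) + sample_seq(rest, n - k, rng)

def sample(n, rng=None):
    """Return a random string of exactly n characters derived from S."""
    rng = rng or random.Random()
    if count_sym("S", n) == 0:
        raise ValueError(f"the grammar has no string of length {n}")
    return sample_sym("S", n, rng)

print(count_sym("S", 5))  # 16 strings of length 5
print(sample(5))
```

The sketch assumes a grammar without empty productions or cycles of unit productions; handling those would need extra care in the counting step.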

If you want a Javascript solution, try randexp.js.

Check out the RandExp Ruby gem. It does what you want, though only in a limited fashion. (It won't work with every possible regexp, only regexps which meet some restrictions.)

It's late, but this could help newcomers: here is a useful Java library that provides many features for using a regex to generate strings (random generation, generating a string by its index, generating all strings, ...). Check it out here.
Example:
Generex generex = new Generex("[0-3]([a-c]|[e-g]{1,2})");
// Generate the second string, in lexicographical order, that matches the given regex.
String secondString = generex.getMatchedString(2);
System.out.println(secondString); // prints '0b'
// Generate all String that matches the given Regex.
List<String> matchedStrs = generex.getAllMatchedStrings();
// Using Generex iterator
Iterator iterator = generex.iterator();
while (iterator.hasNext()) {
System.out.print(iterator.next() + " ");
}
// prints: 0a 0b 0c 0e 0ee 0e 0e 0f 0fe 0f 0f 0g 0ge 0g 0g 1a 1b 1c 1e
// 1ee 1e 1e 1f 1fe 1f 1f 1g 1ge 1g 1g 2a 2b 2c 2e 2ee 2e 2e 2f 2fe 2f 2f 2g
// 2ge 2g 2g 3a 3b 3c 3e 3ee 3e 3e 3f 3fe 3f 3f 3g 3ge 3g 3g 1ee
// Generate random String
String randomStr = generex.random();
System.out.println(randomStr);// a random value from the previous String list

We did something similar in Python not too long ago for a RegEx game that we wrote. We had the constraint that the regex had to be randomly generated, and the selected words had to be real words. You can download the completed game EXE here, and the Python source code here.
Here is a snippet:
def generate_problem(level):
    keep_trying = True
    while(keep_trying):
        regex = gen_regex(level)
        # print 'regex = ' + regex
        counter = 0
        match = 0
        notmatch = 0
        goodwords = []
        badwords = []
        num_words = 2 + level * 3
        if num_words > 18:
            num_words = 18
        max_word_length = level + 4
        while (counter < 10000) and ((match < num_words) or (notmatch < num_words)):
            counter += 1
            rand_word = words[random.randint(0,max_word)]
            if len(rand_word) > max_word_length:
                continue
            mo = re.search(regex, rand_word)
            if mo:
                match += 1
                if len(goodwords) < num_words:
                    goodwords.append(rand_word)
            else:
                notmatch += 1
                if len(badwords) < num_words:
                    badwords.append(rand_word)
        if counter < 10000:
            new_prob = problem.problem()
            new_prob.title = 'Level ' + str(level)
            new_prob.explanation = 'This is a level %d puzzle. ' % level
            new_prob.goodwords = goodwords
            new_prob.badwords = badwords
            new_prob.regex = regex
            keep_trying = False
    return new_prob

Instead of starting from a regexp, you should look into writing a small context-free grammar; this will allow you to easily generate such random text. Unfortunately, I know of no tool which will do it directly for you, so you would need to write a bit of code yourself to actually generate the text. If you have not worked with grammars before, I suggest you read a bit about BNF and "compiler compilers" before proceeding...

I'm not aware of any, although it should be possible. The usual approach is to write a grammar instead of a regular expression, and then create functions for each non-terminal that randomly decide which production to expand. If you could post a description of the kinds of strings that you want to generate, and what language you are using, we may be able to get you started.
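A minimal Python sketch of that approach, assuming a made-up toy grammar for sums of integers (the grammar, the function names and the 50/50 choice between productions are all illustrative assumptions, not something from the question):

```python
import random

rng = random.Random()

def digit():
    # digit -> "0" | "1" | ... | "9"
    return rng.choice("0123456789")

def number():
    # number -> digit | digit number
    if rng.random() < 0.5:
        return digit()
    return digit() + number()

def expr():
    # expr -> number | number "+" expr
    if rng.random() < 0.5:
        return number()
    return number() + "+" + expr()

print(expr())
```

Each function corresponds to one non-terminal and picks a production at random. Because the recursive productions are chosen with probability 0.5, the expansion terminates quickly in practice; a different weighting would change the typical length of the output.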

Related

How to combine the Output of Regex Findall in Pandas

I'm exploring regex with pandas in a jupyter notebook.
My goal is to extract housenumberadditions from an addressline, using a set of regex patterns.
I'm building upon this post: https://gist.github.com/christiaanwesterbeek/c574beaf73adcfd74997
and I use this for input from a .csv:
Afleveradres
Dorpstraat 2
Dorpstr. 2
Dorpstraat 2
Laan 1933 2
18 Septemberplein 12
Kerkstraat 42-f3
Kerk straat 2b
42nd street, 1337a
1e Constantijn Huigensstraat 9b
Maas-Waalweg 15
De Dompelaar 1 B
Kümmersbrucker Straße 2
Friedrichstädter Straße 42-46
Höhenstraße 5A
Saturnusstraat 60-75
Saturnusstraat 60 - 75
Plein \'40-\'45 10
Plein 1945 1
Steenkade t/o 56
Steenkade a/b Twee Gezusters
1, rue de l\'eglise
Herestraat 49 BOX1043
Maas-Waalweg 15 15
My goal is to extract the streetnames, housenumbers & housenumberadditions.
So far I basically use:
import pandas as pd
# get data
file_base_name = 'examples'
dfa = pd.read_csv(file_base_name + '.csv', sep=';')
# get the house number: the first run of digits preceded by a comma or whitespace
dfa['num'] = dfa['Afleveradres'].str.extract(r"([,\s]+\d+)\s*")
dfa['num'] = dfa['num'].str.strip()
# replace the number with ';' so the leftover splits into street & addition
# (regex=True is needed in pandas 2.0+, where str.replace matches literally by default)
dfa['tmp'] = dfa.Afleveradres.str.replace(r"([,\s]+\d+)\s*", ';', regex=True)
# new data frame with the two split columns
new = dfa["tmp"].str.split(";", n=1, expand=True)
# the street name is the part before the number
dfa["str"] = new[0]
# the house number addition is the part after the number
dfa["add"] = new[1]
dfa.drop(['tmp'], axis=1, inplace=True)
which results in:
a listing of street names, numbers & additions:
;Afleveradres;str;add;num
0;Dorpstraat 2;Dorpstraat;;2
1;Dorpstr. 2;Dorpstr.;;2
2;Dorpstraat 2;Dorpstraat;;2
3;Laan 1933 2;Laan;2;1933
4;18 Septemberplein 12;18 Septemberplein;;12
5;Kerkstraat 42-f3;Kerkstraat;-f3;42
6;Kerk straat 2b;Kerk straat;b;2
7;42nd street, 1337a;42nd street;a;, 1337
8;1e Constantijn Huigensstraat 9b;1e Constantijn Huigensstraat;b;9
9;Maas-Waalweg 15;Maas-Waalweg;;15
10;De Dompelaar 1 B;De Dompelaar;B;1
So far so good, for now.
Next, I'd like to correct for house number ranges, like '42-46' and '60 - 75'.
A re.findall returns expected values:
import re
def rem(str):
    pattern = r'[,#\'?\.$%_]'
    if re.match(pattern, str):
        tmp = 'Y'
    else:
        tmp = 'N'
    return tmp
def extract_numrange(row):
    r = ''+row['Afleveradres']
    num_range1 = re.findall(r'([,\s]+\d+\-+\d+)\s*|([,\s]+\d+\s+\-+\s+\d+)\s*',r)
    return num_range1
    # return rem(num_range1)
dfa['excep'] = dfa.apply(extract_numrange, axis=1)
dfa
output re.findall
15 Friedrichstädter Straße 42-46 Friedrichstädter Straße -46 42 [( 42-46, )]
16 Höhenstraße 5A Höhenstraße A 5 []
17 Saturnusstraat 60-75 Saturnusstraat -75 60 [( 60-75, )]
18 Saturnusstraat 60 - 75 Saturnusstraat -; 60 [(, 60 - 75)]
But how do I clean this output, from [( 42-46, )] and [(, 60 - 75)] into something like 42-46 and 60 - 75 in a new column?
Or are there better approaches for my question?

The problem comes from the fact there are two capturing groups. You need to re-vamp the pattern to use only a single capturing group, or get rid of the group altogether.
Your pattern is of the (Group1)\s*|(Group2)\s* type. As you see, all you need is to re-group the parts into (Group1|Group2)\s*.
So, the quickest fix is
([,\s]+\d+\-+\d+|[,\s]+\d+\s+\-+\s+\d+)\s*
See the regex demo.
However, I think you do not need the whitespaces on both ends. Then, move those patterns you do not want to capture out of the grouping:
[,\s]+(\d+\-+\d+|\d+\s+\-+\s+\d+)\s*
^^^^^^
See this regex demo.
Probably, you may reduce this even further to
[,\s](\d+(?:-+|\s+-+\s+)\d+)
See this regex demo, the (?:-+|\s+-+\s+) is a non-capturing group that won't result in additional tuple item.
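To see the effect of the single capturing group, here is a small sketch using only the standard `re` module on four of the addresses from the question (the list and variable names are illustrative):

```python
import re

addresses = [
    "Friedrichstädter Straße 42-46",
    "Höhenstraße 5A",
    "Saturnusstraat 60-75",
    "Saturnusstraat 60 - 75",
]

# One capturing group, so findall returns plain strings instead of tuples.
pattern = re.compile(r"[,\s](\d+(?:-+|\s+-+\s+)\d+)")

for address in addresses:
    print(address, "->", pattern.findall(address))

# Keep the first range per address, or None when there is no range.
first_ranges = [(pattern.findall(a) or [None])[0] for a in addresses]
print(first_ranges)  # ['42-46', None, '60-75', '60 - 75']
```

In pandas, the same pattern could probably be passed to `Series.str.extract`, which returns the first match of the single group per row, but that is not tested here.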

Python HMAC-SHA1 calculation going wrong

I'm stuck while developing a security access script in Python that uses an HMAC-SHA1 key.
I have Python 2.7, which already includes the HMAC and SHA-1 libraries. Using them, I wrote the script below. Unfortunately, when I execute the script, the calculated key is different from the expected key given to me.
---------------Code Start--------------------------
from hashlib import sha1
import hmac
import base64
import hashlib, binascii
SecurityConst_key = "121a3ace5827a3b6" #(0x12 1A 3A CE 58 27 A3 B6)
msg = "4272696C6C69616E63655F6175746F21" # Brilliance_auto!
key = hmac.new(SecurityConst_key, msg, sha1).digest()
key = base64.b64encode(key)
print binascii.hexlify(key)
----------------Code End----------------------
Key calculated is : 4d416963747a41737a546f774530464373536e4d646b6c323972673d
Which is different from the leftmost 128 bits.
Expected key is: 0x15 4A ED 59 CF B3 2E DC 37 8D 30 6B 0F 02 AB 6B
(Truncate the 160 bits result. Output the leftmost 128 bits of the HMAC, it is the Key)
Can some one please help me to fix the possible issue.

What you need to be really careful about is whether something is a byte string or a hex string. In Python, most crypto, MAC and SHA functions take binary strings. So your line:
SecurityConst_key = "121a3ace5827a3b6" #(0x12 1A 3A CE 58 27 A3 B6)
is not true. That's actually 0x31 32 31 ... You need to get your data right first. Try hex-decoding your key and your data before passing them in: "121a3ace5827a3b6".decode("hex").
On the other hand, for the data, you could just use the raw string.
Secondly, you are also base64-encoding the output of the MAC before printing it as hex. That seems wrong. You either base64 it and look at that, or you hex-encode it and look at that. Base64 is ASCII-printable by design (hashes are traditionally printed in hex).
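Put together, the advice could look like the following Python 3 sketch (the question uses Python 2.7, where `.decode("hex")` plays the role of `binascii.unhexlify`). Whether the result equals the expected key from the question has not been verified here, so no output is claimed.

```python
import binascii
import hashlib
import hmac

# Hex-decode the key and the message so that the real bytes are hashed,
# not the ASCII characters of their hex representation.
key = binascii.unhexlify("121a3ace5827a3b6")
msg = binascii.unhexlify("4272696C6C69616E63655F6175746F21")  # b"Brilliance_auto!"

digest = hmac.new(key, msg, hashlib.sha1).digest()  # 20 bytes (160 bits)
truncated = digest[:16]                             # leftmost 128 bits

# Print as hex only; no base64 step in between.
print(truncated.hex())
```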

How to identify unknown compression algorithm?

I'm trying to parse some text information from within project files of an Adobe program (Adobe Premiere Pro). The project files are gzip-compressed XML files. These contain base64-encoded fields which hold further information about a component of the project. I'm trying to look at the information within these fields.
I can't for the life of me identify the compression scheme used here. It doesn't look like GZip as it doesn't have the appropriate headers. Can anyone help?
An example of a base64 encoded field is:
AQAAAAAAAACk1QAAAAAAAENvbXByZXNzZWRUaXRsZQB4nO0da1MbSa5/iuu+L34Eg6maY4sQ2OWWhBRmk/hTytiGeM+xfbZJYH/83enRPf2cGfMIMyZdFOCR1Gp1t6TWSOPp//03Eb+KW/FVTERNfBMjsRBLMRYzMRX/FP8QTbElGvC/BpipGAB8CNipuCbsn+JCHItfgGqHaH4V+yIRB0AzE5fQ4rM4h08zsfLgF8BlBX2OCPPB6ndftKDPhujA7y78toCi7tEk4i1x5uvPogv4FV2jdEuieA899MUdYI+BaiG+w9UCpNgHuoW4gRbIOZsqAa4ruloBVvUzgOsRzce+uALsBHpjTkXUKNEinZcJ/OzD+FgGF56II+C1JOgJtWZKF4p0U5AWe7yCn3wJ82lRvhnQroIS+hhTxjOYz1VASgVHaPGK1XO0BMe7JHnHhF+IU1q3GfWwBP37kKm/2N6kdq+PaD5GwH1F/VwBPMSvJXmZ9BfAZU4yqrGHcCw/Ws/AWxUTg9dFkrnSv4G2X2kNlax5ss/FJ1jNE/h7BFy6AGuKPbI57NvHYosefD4R7wDG9GydTG/jkLorDkFLjuDnHaznObVgWh+TwNovQMbvMI4vNJKFHK2ydIU/B8g10OB48XpKPItau/Ppz5U7mweklaxleD2SutkFGlwJrZUH1FufsHh9JS1gS45XwxL4O4Pxm1gFSagvHhnj9whvQhOQZUZyzUB6k8qGsxfKk1NhD4xZyh5HQ7RzR6LweWNprzmadsF4siSur7V6r4FmIP4t9eWGLC5kJdvSSrogax9oEH9Oox9Rn7gLoj1vg9R7YAUsYT5tAl7Q98VnUvfmAEUfqDzhOVzjf2w5pR3pK7UzR8nWlN0O8aERI+YNSdgHyATa4p51l/rYopGGR1fP5WnjLuDqluxwlHrzuWWvTyNDcT+/AawPsC/kec9AX/6SvsPmxPozovk7pBbaqy/Ff4CqT7KgDGGqBDjPaJUU7BA4z0gTcE30+iCPfFqf1xn9nRby0XRo5YgZiY/wf0i+cz+1dB+TgAaOyZZwzx3LK9UijEvAJ49JlteA+xs+s6W5VpBFlRijOSQrmBLuGGgmpFu8Mq8gRtwCSWpP9Il91To9J6ArE9KqOa3Nc0u5fu9IfT9tV9ZzSJ4NVyhEZXontXf00zsEFQ+5UPb+t2QlU5IL929XL8I0qFUzis4/yvHYsbyPTch60A/iLJ1K7V5CfOP2mE2XxaO3Jo+evP+Z0B44lfEHWonaVzny9fEJ3VFcUuSM+4vpWT/KsfL+m0WVkE3OifeAbPOCaLu0fwzSVWnSPRfyWo8+i6/2eubYimjVOJW22leunrFuvNtw7TNbPkz3ijlEzXuc5oU17TKd7WrooLujRi18aVqYpW9LyXUquWye/sU9eDN00NW0+poxYv59WDhrp+7B89vWAneC3BrvFXdSHjy+Lv0dSKsZw73GiOLyHTnjGobzcCt+h886Y7ENkTfPiYth6j9ICp2ltWFob5wPVTPYpThc5VnY2rIp1rdr7Lcr7z9x/nfT/IwNdylD+YkQhWo3Ia+wssZrQpnuT7qHwlnQmuP3EaLi9uY9qLkC9r0pt2J7932BiUsAO4Y7TcxfD1PZbRjTnJD2s62YdDacaa9I3yfgm/vkNfYpOzUmuregf6qtT6faz2jWjgmOd253QV6aTx69zdP0Qmgt1yQD5itcbra/MjHaQ7S9Vqb3uKJZ6VMu4Jr84h21QRvrUCVnDzxcK+Xh07JX8e21XuAJHusrdqOviL4i+oroK9bwFZ3oK6KviL7ip/EVRTSqNqxG6V7XYG3fBO5LNMVraaF5d0IrY64a8oe1Vs/TnKpE41SyN1RDRT3tC1UnUzX7YkpdI+LKAXoDdUccsoUQXUJPjPAo2afup7PAvtHFJsQbJcA6PeJxDVeUAVDWkoXPatsraNuTknINF7Mcn+FqRj6ScwfLlEMRVUKrsSDMVFr0inRQzXwYez9v
cwSrdUk+fUIj+ZJmK7pytzkne/T94n1a4tNAqsLMle0+ZRC0bbLmosUh7yJqpPG1HuHHpDnXqSyncn1CFtGWFuG2sfdff+/lJ56wR7272LBEcP18YnAwIQntzfwUlDuvJiYh6Y/IZ4zkHjRMnxFAzfFlt7ndvz3Wp12o+1xSHgXajo07o7lxJcuiSig7xNb1u9zzcOXc9llUPPdY51mR91b+dwfWGz19J10Nnyarb/OpjBM5j7cCny5s0q6/DVxxD1ERwP24mM9tIO4tfEZ4k+qzP+4vSxrqGTEhzXicpTTXsJVtuYM/l7WYFYcq2YopV1mWsrsxltIAbX7M7/0twYeqnYYjMi1/3g7UzHxCrObsge9AG77KZ1XNJzCXXks3vjAxvG+G4PeX4FXpEmyXLkG7dAl2Spdgt3QJOqVLsFe6BM302fQyZWhWQIbyPWOzAr6xWQHv2Mz1jy3KuWzDXLUp6us8i0y/5OqoK9Pes9luoxL2i1JUwYYb9F2tashRBVtGOapgzw36lkY15Cg/8mE5yo9/WI7yoyCWoxqxUDXioUZFYqJGReKiRkVio0ZF4qNGQYz0EDlsuJmVMHPmLHFWZUn5tJ+7srQrYmUpVpZiZSlWlpgqVpZiZSlWlmJlKVaWfmxcXP4dQvn3BuVnWcrPr5SfWSk/p1KNbEr5MlQhi1K+Z6xG9qR877hOZakts5F7IG9VKktapvaz2W6sLMXKUqwsxcpSrCw9Vca+Gv60GjFRrCzFytJ9KkvKl/zclSX1rFOsLMXKUqwsxcpSrCzFylKsLMXKUqws/di4uPw7hPLvDcrPspSfXyk/s1J+TqUa2ZTyZahCFqV8z1iN7En53nG9ytIOxRuvnkmm9SpLSqbms9lurCzFylKsLMXKUqwsPVXGvhr+tBoxUawsxcqSX1kyr5feGHxIXlVpICVAfPh8qDBFQnODObcbgtsjD+OwN8wVYc3lzHov4LbsKYy155jfcGnm2XxsQjnIsVXV0HnMEM5swXk6u06QhcVzlbjKwSc6Xgg+uU290TOMdVu9FuYpba1gW5tGczjwKhYhTEL6iyeSTqjuwu8ANSuC2Xjd1yFo3SDYl4nhk5z6VA/DypCpvQ250ll43Zebj/XhCZ3SxBqj3kNZo1Uakv6hnqvdBPVwaPBSdbRr+szneKmV1hA+m+TGmiV1nUAPmHtcpDh9jVe+bOtJ3Nw4iVsbJ/GrjZN425G4RU8BZEut8VmSa4rHSq99o++LwjjTG2kf7Poh0ztraOhUUrdl+ORSjX8vxtKHuGe0+BRmO398YZw9Ptyd5nIHckeocW4bnLWxMN8jnY0322LkoN56nTU6k8Zs+5b2UD7z0txhwxS8L7Nuhd5M7WLdGOfxcUr4/Tn3i1jMN+jGuCXGLdWMW1pU28339+1Cf6/OBI4xTIxhYgwTY5gYw1Qvhtl7QAzzsKyL2tN+bPQSY5efPXb5MTsTZsZ3Y8yyURJvfsxStsQxTolxSplxivuN8ofkWtQJYTHXEuOVKsYrMdcS45aXFrfEXEuMYWIME34DTcy1xNjlpcQuMdcSY5aXErOULXGMU2KcUm6cYn/H6iG5ls6DopWYa4nxSsy1xLglxi0x1xJjmBjDPEUM03xADBNzLTF2qW7sEnMtMWZ5KTFL2RLHOCXGKQ+NU2wIf//5QpgxQwgWimjUOyEe/k0hG/6O9hmNV2+M8Tm4lBzd4B6l3tNrvpdV52eyKBJa1280whXZnE3pv280nxrfOXhDEQNLOJGW4fLJojLbq3cSuhoSpkANGVAfI/HJGKF6x08IZ7cJ0do0vRy+vQy+vSAt68A47cUcnwnXVL0Mqp6ctQnZ2TQw9jDObtMF+KWBbXgtXQpXhz+BJ0SrxRUJR+Q2hdm/P69hnN2mSGafwpW5Vyhzz5HZxuo3ufot9RsmUaI5eZ5uqgHmOH2c289BSvUNtOgPgNxl9BqmNCP3EN6N4cM87Aj6Qu61rmVnUbljCnnxEAW+
ExRnkve9pYwFVun4s7CIsz350/n2/CcTo2+Pvj369ujbo2/fRN+eXwmNvj369ujbo2+Pvv15fbsL40zNKXiSa/JjTGNfn4KMd0LnevQ1Xr2heftO7/C7TnsJQVXPIdghzfdYjlhfvQc+MzGXHvcuuBupN2l+pyznUHykvufe7Pr4RNZn9LlMnE/dB54t+N2i74lzBop1GTWhQ3cuW0DRkdWeOznqEDczI4o82tBuC1o206qD4qvOelDczJZoF5hxs7OLLhTpZrD+fG7RKcFxjV0Pm02VxcH1vtlUmJucwagOBVahfqP1++6tRZgmIT9zCb5jRtlFs7r0Ua4e20kWFe7naMMTsj5ciQui7VJOfJDOVDONL9ajz+KrK1oT2slXOTztc5myddy2AG6rMqr42X5XZQiW1ZptTa1XyJawFlSDq0vxl/Qk6iy1JmHmaYul4PNfbArsR50zZY9rTBqSd77UCqTTZ1TtgeW9AkvZpco2r/qtdYbVJ2lvIWvVuOIzpr6lEvIOpObJ1dlsusSasVPHKluy1zwaxOfNG1IcUVV3In0MtjszOGrtukytE7mu2yqhfY8jB32Cy0B+QlmQW4gGNeucrHCRjudfMBbWTheTUDwypRlAyAp0lWVkax5K7V2Q/XHPB2IleVxS7KtmZD3KmuwR4/UbkllruvYdHInrUwN3Ca/WOAv7ieKDRVrXUXVrxrEOrjJp1Iqr8/y4RnRFq25T1h893mbBeHdyx7tTynjrT6grK4o61P3jfrqPuDDXS9pQO06pavSyQ88wb8HfbXkfrqMMPFOKfWVT3h/fJ3pBvh1ojc8Zxfglxi+bE7+0CuMX9RzM08Qv/DTelnwmLz9+8e21SvHL5kYvpk6vG7kMBT4DcQWeYkyfb+DzF/g/rHg8w2/qz9vhO7k7fOcF7vBT+Vn39tgdX8PNLIZ+wob3ghvQSpZJX9foiS7+7N4xzWkOBtDTBfziLuWfTelTJGRnfemjdUbMhSaWzWDfNfAFyB13Qx0bqjnSMhePoFWREbRyRlDPWZ+6k82qe9muDwDp0qdjMTbsynyiOu/MyivZyvUa9UwMzthhmkXE+0fOnttQRcd2YtdMbAxCsmVnvB6lmoEZ2RZDToAWs/dsc3xehqZRM3kghrRLjMRnqk3MqP3/AUeAlD0=

It's a Zlib-compressed string with a 32-byte uncompressed header.
var encoding = @""; // paste the base64 field here
byte[] data = Convert.FromBase64String(encoding);
var compressedArray = new byte[data.Length - 32];
Array.Copy(data, 32, compressedArray, 0, data.Length - 32);
var decompressed = ZlibStream.UncompressBuffer(compressedArray);
var str = Encoding.Unicode.GetString(decompressed);
The header contains the uncompressed length of the data in little-endian order at offset 8: a4 d5, or 0xd5a4, which equals 54692. It is hard to tell from this example whether the uncompressed length is stored as a4 d5, a4 d5 00 00, or a4 d5 00 00 00 00 00 00.
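The same steps can be sketched in Python with the standard library. The first part reads the header from the first 44 base64 characters of the field in the question; the second part round-trips a synthetic field built to mimic that layout, since the full field is too long to repeat. Treating the length as a 64-bit integer is an assumption (the width is not certain), and `unpack_field` is a hypothetical helper name.

```python
import base64
import struct
import zlib

HEADER_SIZE = 32

def unpack_field(b64_text):
    """Split header and zlib body, then return (length, tag, text)."""
    raw = base64.b64decode(b64_text)
    header, body = raw[:HEADER_SIZE], raw[HEADER_SIZE:]
    # Assumption: 64-bit little-endian length at offset 8.
    (declared_length,) = struct.unpack_from("<Q", header, 8)
    tag = header[16:].rstrip(b"\x00").decode("ascii")
    text = zlib.decompress(body).decode("utf-16-le")
    return declared_length, tag, text

# First 44 base64 characters (33 bytes) of the field from the question.
FIELD_PREFIX = (
    "AQAAAAAA" "AACk1QAA" "AAAAAENv"
    "bXByZXNz" "ZWRUaXRs" "ZQB4"
)
prefix = base64.b64decode(FIELD_PREFIX)
print(struct.unpack_from("<Q", prefix, 8)[0])  # 54692
print(prefix[16:31])                           # b'CompressedTitle'
print(hex(prefix[32]))                         # 0x78, a typical first zlib byte

# Synthetic field with the same layout, for a full round trip.
text = "<Title>demo</Title>"
payload = text.encode("utf-16-le")
header = struct.pack("<QQ", 1, len(payload)) + b"CompressedTitle\x00"
demo_field = base64.b64encode(header + zlib.compress(payload)).decode("ascii")
print(unpack_field(demo_field))
```

Comparing the declared length with the length of the decompressed bytes is a cheap sanity check that the header was interpreted correctly.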

Validate Mobile number using regular expression

I need to validate a mobile number. My requirements:
The number may start with +8801, 8801 or 01.
The next digit can be 1, 5, 6, 7, 8 or 9.
Then there must be exactly 8 digits.
How can I write a regular expression for these conditions?
The mobile numbers I tried:
+8801811419556
01811419556
8801711419556
01611419556
8801511419556

Should be pretty simple:
^(?:\+?88)?01[15-9]\d{8}$
^ - From start of the string
(?:\+?88)? - optional 88, which may be preceded by +
01 - mandatory 01
[15-9] - "1 or 5 or 6 or 7 or 8 or 9"
\d{8} - 8 digits
$ - end of the string
Working example: http://rubular.com/r/BvnSXDOYF8
Update 2020
As BTRC approved 2 new prefixes, 013 for Grameenphone and 014 for Banglalink, updated expression for now:
^(?:\+?88)?01[13-9]\d{8}$
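A quick way to check the updated expression against the numbers from the question is a short Python script; the rejected samples are made up for illustration:

```python
import re

BD_MOBILE = re.compile(r"^(?:\+?88)?01[13-9]\d{8}$")

accepted = [
    "+8801811419556",
    "01811419556",
    "8801711419556",
    "01611419556",
    "8801511419556",
]
rejected = [
    "01211419556",       # 012 is not an allowed prefix
    "0181141955",        # one digit too short
    "+88018114195567",   # one digit too long
]

for number in accepted + rejected:
    print(number, bool(BD_MOBILE.match(number)))
```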

You may use either of the following regular expressions to validate a Bangladeshi mobile number.
Solution 1:
/(^(\+88|0088)?(01){1}[56789]{1}(\d){8})$/
Robi, Grameen Phone, Banglalink, Airtel and Teletalk mobile numbers are allowed.
Solution 2:
/(^(\+8801|8801|01|008801))[15-9]{1}(\d){8}$/
Citycell, Robi, Grameen Phone, Banglalink, Airtel and Teletalk mobile numbers are allowed.
Allowed mobile number patterns:
+8801812598624
008801812598624
01812598624
01712598624
01919598624
01672598624
01512598624
................
.................

Use the following regular expression; if you want, you can test it quickly on this site:
regex pal
[8]*01[15-9]\d{8}

I know that this question was asked a long time ago, but I assume that @G. M. Nazmul Hossain wants to validate a mobile number against a chosen country. I'll show you how to do it with libphonenumber, a free library from Google. It's available for Java, C++ and JavaScript, and there are also forks for PHP and, I believe, other languages.
+880 tells me that it's the country code for Bangladesh. Let's try to validate the example numbers with the following code in Java:
String bdNumberStr = "8801711419556";
PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
try {
    // "BD" is the default region for Bangladesh (used for numbers without 880 at the beginning)
    PhoneNumber bdNumberProto = phoneUtil.parse(bdNumberStr, "BD");
    boolean isValid = phoneUtil.isValidNumber(bdNumberProto); // returns true
} catch (NumberParseException e) {
    System.err.println("NumberParseException was thrown: " + e.toString());
}
That code will also handle numbers with spaces in them (for example "880 17 11 41 95 56"), or even with 00880 at the beginning (+ is sometimes replaced with 00).
Try it out yourself on the demo page. It validates all of the provided examples and more.

Have a look at libphonenumber at:
https://code.google.com/p/libphonenumber/

Bangladeshi phone number (Citycell, Robi, Grameen Phone, Banglalink, Airtel and Teletalk operators) validation using a regular expression:
$pattern = '/(^(\+8801|8801|01|008801))[1-9]{1}(\d){8}$/';
$BangladeshiPhoneNo = "+8801840001417";
if(preg_match($pattern, $BangladeshiPhoneNo)){
    echo "It is a valid Bangladeshi phone number";
}

Laravel validation for Bangladeshi phone numbers (Citycell, Robi, Grameen Phone, Banglalink, Airtel and Teletalk): the number may start with +88 or 88, followed by 01, then one of 3, 5, 6, 7, 8 or 9, then 8 digits.
public function rules()
{
    return [
        'mobile' => 'sometimes|regex:/^(?:\+?88)?01[35-9]\d{8}$/',
    ];
}
public function messages()
{
    return [
        'mobile.regex' => 'Mobile no should be bd standard',
    ];
}

Example invalid utf8 string?

I'm testing how some of my code handles bad data, and I need a few series of bytes that are invalid UTF-8.
Can you post some, and ideally, an explanation of why they are bad/where you got them?

Take a look at Markus Kuhn's UTF-8 decoder capability and stress test file
You'll find examples of many UTF-8 irregularities, including lonely start bytes, continuation bytes missing, overlong sequences, etc.

In PHP:
$examples = array(
'Valid ASCII' => "a",
'Valid 2 Octet Sequence' => "\xc3\xb1",
'Invalid 2 Octet Sequence' => "\xc3\x28",
'Invalid Sequence Identifier' => "\xa0\xa1",
'Valid 3 Octet Sequence' => "\xe2\x82\xa1",
'Invalid 3 Octet Sequence (in 2nd Octet)' => "\xe2\x28\xa1",
'Invalid 3 Octet Sequence (in 3rd Octet)' => "\xe2\x82\x28",
'Valid 4 Octet Sequence' => "\xf0\x90\x8c\xbc",
'Invalid 4 Octet Sequence (in 2nd Octet)' => "\xf0\x28\x8c\xbc",
'Invalid 4 Octet Sequence (in 3rd Octet)' => "\xf0\x90\x28\xbc",
'Invalid 4 Octet Sequence (in 4th Octet)' => "\xf0\x90\x8c\x28",
'Valid 5 Octet Sequence (but not Unicode!)' => "\xf8\xa1\xa1\xa1\xa1",
'Valid 6 Octet Sequence (but not Unicode!)' => "\xfc\xa1\xa1\xa1\xa1\xa1",
);
From http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php#54805

The idea of patterns of ill-formed byte-sequences can be gotten from the table of well-formed byte sequences. See "Table 3-7. Well-Formed UTF-8 Byte Sequences" in the Unicode Standard 6.2.
Code Points           First Byte   Second Byte   Third Byte   Fourth Byte
U+0000 - U+007F       00 - 7F
U+0080 - U+07FF       C2 - DF      80 - BF
U+0800 - U+0FFF       E0           A0 - BF       80 - BF
U+1000 - U+CFFF       E1 - EC      80 - BF       80 - BF
U+D000 - U+D7FF       ED           80 - 9F       80 - BF
U+E000 - U+FFFF       EE - EF      80 - BF       80 - BF
U+10000 - U+3FFFF     F0           90 - BF       80 - BF      80 - BF
U+40000 - U+FFFFF     F1 - F3      80 - BF       80 - BF      80 - BF
U+100000 - U+10FFFF   F4           80 - 8F       80 - BF      80 - BF
Here are the examples generated from U+24B62. I used them for a bug report: Bug #65045 mb_convert_encoding breaks well-formed character
// U+24B62: "\xF0\xA4\xAD\xA2"
"\xF0\xA4\xAD" ."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"
"\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD"
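Various libraries oversimplify the range of trailing bytes to [0x80, 0xBF] for every position. The following sequences pass such a check but are ill-formed according to the table rows named in the comments: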
// U+0800 - U+0FFF
\xE0\x80\x80
// U+D000 - U+D7FF
\xED\xBF\xBF
// U+10000 - U+3FFFF
\xF0\x80\x80\x80
// U+100000 - U+10FFFF
\xF4\xBF\xBF\xBF
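One way to confirm that a decoder rejects these sequences is to feed them to a strict UTF-8 decoder. The sketch below uses Python's built-in decoder as the reference; `is_well_formed_utf8` is a hypothetical helper name.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 with strict error handling."""
    try:
        data.decode("utf-8")  # strict is the default error mode
    except UnicodeDecodeError:
        return False
    return True

ill_formed = [
    b"\xE0\x80\x80",      # after E0 the second byte must be A0-BF
    b"\xED\xBF\xBF",      # after ED the second byte must be 80-9F
    b"\xF0\x80\x80\x80",  # after F0 the second byte must be 90-BF
    b"\xF4\xBF\xBF\xBF",  # after F4 the second byte must be 80-8F
    b"\xF0\xA4\xAD" + b"\xF0\xA4\xAD\xA2",  # truncated copy before U+24B62
]

for seq in ill_formed:
    print(seq.hex(), is_well_formed_utf8(seq))

print(is_well_formed_utf8(b"\xF0\xA4\xAD\xA2"))  # U+24B62 on its own is fine
```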

,̆ was particularly evil. I see it as combined on ubuntu.
comma-breve

This might not be exactly what the OP asked, but it's somewhat related:
If you already have byte ordinal values (0 - 255) and want to know whether a byte is a valid UTF-8 start byte, I came up with this strange unified formula that returns 1 (true) or 0 (false). It appears to be awk, where ^ is exponentiation and binds tighter than <; in JavaScript, ^ is XOR and the expression would not behave as intended:
function newUTF8start(__) {
    return 118^(+__< 194) < (246-__)
}
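For comparison, here is a Python rendering of the formula, assuming that ^ in the original means exponentiation, next to a plain range check derived from the first-byte column of Table 3-7 (00-7F and C2-F4). Both function names are illustrative.

```python
def is_utf8_start_byte_formula(b: int) -> bool:
    # 118 ** True == 118 and 118 ** False == 1, so this reads:
    #   b < 194  ->  118 < 246 - b  ->  b <= 127
    #   b >= 194 ->    1 < 246 - b  ->  b <= 244
    return 118 ** (b < 194) < 246 - b

def is_utf8_start_byte(b: int) -> bool:
    # Valid first bytes per Table 3-7: 00-7F and C2-F4.
    return b <= 0x7F or 0xC2 <= b <= 0xF4

mismatches = [b for b in range(256)
              if is_utf8_start_byte_formula(b) != is_utf8_start_byte(b)]
print(mismatches)  # [] when both agree on every byte value
```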

Fuzz Testing - generate a random sequence of octets. Most likely you'll get some illegal sequences sooner than later.
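A small Python sketch of that idea: draw random octets and keep the first sequence that a strict decoder rejects. The function name and the default length of 8 bytes are arbitrary choices for illustration.

```python
import random

def random_invalid_utf8(rng: random.Random, length: int = 8) -> bytes:
    """Draw random byte strings until one fails strict UTF-8 decoding."""
    while True:
        candidate = bytes(rng.randrange(256) for _ in range(length))
        try:
            candidate.decode("utf-8")
        except UnicodeDecodeError:
            return candidate

rng = random.Random()
for _ in range(3):
    print(random_invalid_utf8(rng).hex())
```

Random bytes at or above 0x80 rarely line up into well-formed multi-byte sequences, so the loop usually returns on the first attempt.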
