Regular expression to match an ASCII character - regex

I want to write a regular expression that matches the string
2=abc\u000148=123\u0001
Explanation
Key-value pairs are separated by the SOH (\u0001) character
Key - a number
Value - a string that can contain digits, letters, and decimals
The key and value are separated by "="
The regex I tried is
[0-9]=.*[u0001]+
but it does not match properly
Update
I have a list of numbers: val num = Seq(2,3,4)
Instead of finding the matches, I want to remove them from the string.
The keys to remove are the values in the list num.
Input
2=abc\u000148=123\u00013=def\u0001
Output (the filtered string)
48=123\u0001, where the pairs whose keys match 2 and 3 have been removed
import scala.util.matching.Regex

object Main extends App {
  val s = "2=abc\u000148=123\u00013=def\u0001"
  val num = Seq(2,3)
  for (e <- num) {
    val p = s"(\\$e+)=([^\u0001]*)".r
    test(p)
  }
  private def test(p: Regex) = {
    p.findAllIn(s).matchData foreach {
      m => println(m.group(1) + " : " + m.group(2))
    }
  }
}

You need to build the pattern dynamically like this:
s"\\b(?:${num.mkString("|")})=[^\\u0001]*\\u0001*"
Details
\b - a word boundary
(?:num1|num2...|numN) - any of the values in the num variable
= - an equal sign
[^\u0001]* - zero or more chars other than a SOH char (a char with the decimal code of 1)
\u0001* - zero or more SOH chars.
See a Scala demo:
val num = Seq(2,3)
val s = "1041=pqr\u000148=xyz\u000122=8\u00012=abc\u000148=123\u00013=def\u0001"
val pattern = s"\\b(?:${num.mkString("|")})=[^\\u0001]*\\u0001*"
// println(pattern) // => \b(?:2|3)=[^\u0001]*\u0001*
println(s.replaceAll(pattern, ""))
// => 1041=pqr\u000148=xyz\u000122=8\u000148=123\u0001
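The dynamic pattern above can be wrapped in a small helper. The following is only a sketch and not part of the original answer: the object name SohFilter and both method names are hypothetical. removeKeys reuses the regex from the answer, while removeKeysBySplit is an alternative that avoids a dynamic regex by splitting on SOH, and it assumes every field is terminated by SOH as in the sample input.

```scala
object SohFilter {
  private val Soh = "\u0001"

  // Regex approach from the answer: \b keeps key 2 from matching inside 22 or 1042.
  def removeKeys(s: String, keys: Seq[Int]): String =
    if (keys.isEmpty) s
    else s.replaceAll(s"\\b(?:${keys.mkString("|")})=[^\\u0001]*\\u0001*", "")

  // Alternative without a dynamic regex: split into fields and drop the unwanted keys.
  def removeKeysBySplit(s: String, keys: Seq[Int]): String = {
    val unwanted = keys.map(_.toString).toSet
    s.split(Soh)
      .filter(_.nonEmpty)
      .filterNot(field => unwanted.contains(field.takeWhile(_ != '=')))
      .map(_ + Soh)
      .mkString
  }

  def main(args: Array[String]): Unit = {
    val s = "2=abc\u000148=123\u00013=def\u0001"
    // Show SOH as '|' so the result is readable on a terminal.
    println(removeKeys(s, Seq(2, 3)).replace(Soh, "|"))        // 48=123|
    println(removeKeysBySplit(s, Seq(2, 3)).replace(Soh, "|")) // 48=123|
  }
}
```

The split-based variant compares whole keys as strings, so it does not depend on word boundaries at all.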

Related

How to group similar characters in a string in scala?

Let's assume I have a string such as:
val a = "aaaabbbcccss"
and I want to group only the a's and b's, like this:
"a4b3cccss"
I have tried a.toList.groupBy(identity).mapValues(_.size), but that returns a map with no ordering, so I cannot convert it into the form I want. Is there a function in Scala that can achieve this?
You may use
val a = "aaaabbbcccss"
val p = """([ab])\1*""".r
println(p replaceAllIn (a, m => s"${m.group(1)}${m.group(0).size}") )
See Scala demo
The regex matches:
([ab]) - Group 1: a or b
\1* - zero or more occurrences of the char captured into Group 1.
In the replacement part, m.group(1) is the char captured into Group 1 and m.group(0).size is the size of the whole match.
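If the set of characters to collapse is not fixed, the same idea can be parameterised. This is a minimal sketch under stated assumptions: RunCompressor and compress are hypothetical names, and the pattern is built as an alternation of quoted characters rather than the [ab] class used above, so that regex metacharacters are treated literally.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object RunCompressor {
  // Collapse each run of a character from `chars` into the character plus the run length.
  def compress(s: String, chars: Set[Char]): String =
    if (chars.isEmpty) s
    else {
      // Quote every character so metacharacters such as '.' are matched literally.
      val alternatives = chars.map(c => Pattern.quote(c.toString)).mkString("|")
      val runs = s"($alternatives)\\1*".r
      // quoteReplacement protects '$' and '\' in the replacement text.
      runs.replaceAllIn(s, m => Regex.quoteReplacement(s"${m.group(1)}${m.group(0).length}"))
    }

  def main(args: Array[String]): Unit = {
    println(compress("aaaabbbcccss", Set('a', 'b')))            // a4b3cccss
    println(compress("aaaabbbcccssaaaabb", Set('a', 'b', 'c'))) // a4b3c3ssa4b2
  }
}
```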
As an alternative, you could write a function that takes the string and a list of characters and processes the string recursively, using takeWhile to take the run of consecutive equal characters from the front of the list.
Then drop that many characters from the list and pass along, as the accumulator, the piece to append for this run: the character plus the run length if the character is in the given list, or the run unchanged otherwise. When the list is empty, the accumulated string is returned.
def countSimilar(str: String, ch: List[Char]): String = {
  def process(l: List[Char], acc: String = ""): String = {
    l match {
      case Nil => acc
      case h :: _ =>
        val tw = l.takeWhile(_ == h)
        acc + process(
          l.drop(tw.length),
          if (ch.contains(h)) h + tw.length.toString else tw.mkString("")
        )
    }
  }
  process(str.toList)
}
println(countSimilar("aaaabbbcccss", List('a', 'b')))
println(countSimilar("aaaabbbcccssaaaabb", List('a', 'b', 'c')))
That will give you:
a4b3cccss
a4b3c3ssa4b2
See the Scala demo

Regex Matching using Matcher and Pattern

I am trying to clean a number with regexes based on the conditions below; however, it is returning an empty string.
import java.util.regex.Matcher
import java.util.regex.Pattern
object clean extends App {
  val ALPHANUMERIC: Pattern = Pattern.compile("^[a-zA-Z0-9]*$")
  val SPECIALCHAR: Pattern = Pattern.compile("[a-zA-Z0-9\\-#\\.\\(\\)\\/%&\\s]")
  val LEADINGZEROES: Pattern = Pattern.compile("^[0]+(?!$)")
  val TRAILINGZEROES: Pattern = Pattern.compile("\\.0*$|(\\.\\d*?)0+$")
  def evaluate(codes: String): String = {
    var str2: String = codes.toString
    var text: Matcher = LEADINGZEROES.matcher(str2)
    str2 = text.replaceAll("")
    text = ALPHANUMERIC.matcher(str2)
    str2 = text.replaceAll("")
    text = SPECIALCHAR.matcher(str2)
    str2 = text.replaceAll("")
    text = TRAILINGZEROES.matcher(str2)
    str2 = text.replaceAll("")
    str2 // return the cleaned value
  }
}
The code returns an empty string for the LEADINGZEROES match.
scala> println("cleaned value :" + evaluate("0001234"))
cleaned value :
What change should I make so the code works as I expect? Basically, I am trying to remove leading/trailing zeroes, and if the number contains special characters or letters, the entire value should be returned as null.
Your LEADINGZEROES pattern works correctly, as
val LEADINGZEROES: Pattern = Pattern.compile("^[0]+(?!$)")
println(LEADINGZEROES.matcher("0001234").replaceAll(""))
gives
//1234
But then the next step is
text = ALPHANUMERIC.matcher(str2)
which matches the whole string when it is entirely alphanumeric and replaces it with "", and this makes str2 empty ("").
That is, when you do
val ALPHANUMERIC: Pattern = Pattern.compile("^[a-zA-Z0-9]*$")
val LEADINGZEROES: Pattern = Pattern.compile("^[0]+(?!$)")
println(ALPHANUMERIC.matcher(LEADINGZEROES.matcher("0001234").replaceAll("")).replaceAll(""))
it prints an empty string.
Updated
As you have commented
if there is a code that is alphanumeric i want to make that value NULL
but in case of leading or trailing zeroes its pure number, which should return me the value after removing zeroes
but its also returning null for trailing and leading zeroes matches
and also how can I skip a match , suppose i want the regex to not match the number 0999 for trimming leading zeroes
You can write your evaluate function and regexes as below (note that \d{4} assumes the significant part of the code is exactly four digits):
val LEADINGTRAILINGZEROES = """(0*)(\d{4})(0*)""".r
val ALPHANUMERIC = """[a-zA-Z]""".r
def evaluate(codes: String): String = {
  val LEADINGTRAILINGZEROES(first, second, third) =
    if (ALPHANUMERIC.findAllIn(codes).length != 0) "0010" else codes
  if (second.equalsIgnoreCase("0010")) "NULL" else second
}
which should give you
println("cleaned value : " + evaluate("000123400"))
// cleaned value : 1234
println("alphanumeric : " + evaluate("0001A234"))
// alphanumeric : NULL
println("skipping : " + evaluate("0999"))
// skipping : 0999
I hope the answer is helpful
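For codes whose significant part is not exactly four digits, a more general variant could look like the sketch below. It is not the answerer's method. CodeCleaner, clean and keepAsIs are hypothetical names, and it rests on several assumptions: a valid code is digits with an optional decimal part, trailing zeroes are only stripped after a decimal point, an invalid code is reported as None rather than "NULL", and values to skip (such as 0999) are listed explicitly. The two zero-stripping patterns are the ones from the question, but the trailing match is replaced with $1 so the kept decimal digits survive.

```scala
object CodeCleaner {
  // Patterns taken from the question.
  private val LeadingZeroes = "^0+(?!$)"
  private val TrailingZeroes = "\\.0*$|(\\.\\d*?)0+$"

  // Returns None for codes containing letters or special characters.
  def clean(code: String, keepAsIs: Set[String] = Set.empty): Option[String] =
    if (keepAsIs.contains(code)) Some(code) // explicitly skipped values stay untouched
    else if (!code.matches("\\d+(\\.\\d*)?")) None
    else Some(code.replaceAll(LeadingZeroes, "").replaceAll(TrailingZeroes, "$1"))

  def main(args: Array[String]): Unit = {
    println(clean("0001234"))              // Some(1234)
    println(clean("0001A234"))             // None
    println(clean("0999", Set("0999")))    // Some(0999)
    println(clean("12.500"))               // Some(12.5)
  }
}
```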

Need to separate out Letters from numbers in a string, via vb.net

I would like my code to find anywhere in the string that there is a number next to a letter and insert a space between the two.
I have several addresses that have no spaces and I need to insert spaces by separating out the numbers from the letters. For example:
123MainSt.Box123
Should be 123 Main St. Box 123
or
123Parkway134
should be: 123 Parkway 134
Here is where I started with my code, but it was combining both numbers at the beginning:
Dim Digits() As String = Regex.Split(Address, "[^0-9]+")
'MsgBox(Digits.Count)
If Digits.Length > 2 Then
    ' For Each item As String In Digits
    Dim Letters As String = Regex.Replace(Address, "(?:[0-9]+\.?[0-9]*|\.[0-9]+)", "")
    rCell.Value = Digits(0) & Letters & Digits(1)
End If
If Digits.Length < 3 Then
    If Address.Contains("th") Then
    Else
        Dim Part1 As String = Regex.Replace(Address, "[^0-9]+", "")
        Dim Part2 As String = Regex.Replace(Address, "(?:[0-9]+\.?[0-9]*|\.[0-9]+)", "")
        'MsgBox(Part1 & " " & Part2)
        rCell.Value = Part1 & " " & Part2
    End If
End If
I would like my code to find anywhere in the string that there is a number next to a letter and insert a space between the two.
The regex you may use is
Regex.Replace(input, "(?<=\d)(?=\p{L})|(?<=\p{L})(?=\d)", " ")
The first alternative - (?<=\d)(?=\p{L}) - matches the location between a digit and a letter, and the second alternative - (?<=\p{L})(?=\d) - matches a location between a letter and a digit.
Note that (?<=\p{L}) is a positive lookbehind that requires a letter before the current position and (?=\d) is a positive lookahead that requires a digit after the current position. These lookarounds do not consume text, so the regex matches the empty strings between such characters, and replacing them with a space effectively inserts the space.
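The same lookaround pattern can be tried outside VB.NET. Below is a minimal sketch in Scala (not the language of the question), relying on the fact that java.util.regex also supports lookbehind, lookahead and \p{L}. AddressSpacer and addSpaces are hypothetical names.

```scala
object AddressSpacer {
  // Zero-width positions between a digit and a letter, in either order.
  private val Boundary = "(?<=\\d)(?=\\p{L})|(?<=\\p{L})(?=\\d)"

  def addSpaces(input: String): String = input.replaceAll(Boundary, " ")

  def main(args: Array[String]): Unit = {
    println(addSpaces("123Parkway134"))    // 123 Parkway 134
    println(addSpaces("123MainSt.Box123")) // 123 MainSt.Box 123
  }
}
```

As the second line shows, the pattern only separates digits from letters. Splitting "MainSt.Box" into words would need additional rules, for example on capital letters or after a period.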
Here's a quick function:
Private Function AddSpaces(ByVal input As String) As String
    If input.Length < 2 Then Return input
    Dim ReturnValue As String = String.Empty
    Dim CurrentChar, NextChar As String
    For x As Integer = 1 To input.Length
        CurrentChar = Mid(input, x, 1)
        NextChar = Mid(input, x + 1, 1)
        ReturnValue &= CurrentChar
        ' Skip the last character so that no trailing space is appended
        If x < input.Length AndAlso
           ((IsNumeric(CurrentChar) AndAlso (Not IsNumeric(NextChar))) OrElse
            ((Not IsNumeric(CurrentChar)) AndAlso IsNumeric(NextChar))) Then
            ReturnValue &= " "
        End If
    Next
    Return ReturnValue
End Function

Dart how to add commas to a string number

I'm trying to adapt this:
Insert commas into number string
to work in Dart, but no luck.
Neither of these works:
print("1000200".replaceAllMapped(new RegExp(r'/(\d)(?=(\d{3})+$)'), (match m) => "${m},"));
print("1000300".replaceAll(new RegExp(r'/\d{1,3}(?=(\d{3})+(?!\d))/g'), (match m) => "$m,"));
Is there a simpler/working way to add commas to a string number?
You just forgot to capture the first digits in a group. Use this short one:
'12345kWh'.replaceAllMapped(RegExp(r'(\d{1,3})(?=(\d{3})+(?!\d))'), (Match m) => '${m[1]},')
Here is a more readable version. In the last part of the expression I check for any non-digit character, including the end of the string, so you can use it with '12 Watt' too.
RegExp reg = RegExp(r'(\d{1,3})(?=(\d{3})+(?!\d))');
String Function(Match) mathFunc = (Match match) => '${match[1]},';
List<String> tests = [
'0',
'10',
'123',
'1230',
'12300',
'123040',
'12k',
'12 ',
];
for (String test in tests) {
String result = test.replaceAllMapped(reg, mathFunc);
print('$test -> $result');
}
It works perfectly:
0 -> 0
10 -> 10
123 -> 123
1230 -> 1,230
12300 -> 12,300
123040 -> 123,040
12k -> 12k
12 -> 12
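For comparison, here is a minimal sketch of the same regex in Scala rather than Dart, using String.replaceAll with a $1 replacement. Thousands, group and groupDecimal are hypothetical names. The second helper illustrates one way to avoid grouping the fractional digits: it applies the regex only to the text before the first decimal point, which assumes '.' is the decimal separator.

```scala
object Thousands {
  // One to three digits followed by complete groups of three digits.
  private val Group = "(\\d{1,3})(?=(\\d{3})+(?!\\d))"

  def group(s: String): String = s.replaceAll(Group, "$1,")

  // Group only the part before the first '.', leaving the fraction untouched.
  def groupDecimal(s: String): String = {
    val (intPart, rest) = s.span(_ != '.')
    group(intPart) + rest
  }

  def main(args: Array[String]): Unit = {
    println(group("1000200"))           // 1,000,200
    println(group("12345kWh"))          // 12,345kWh
    println(group("1233.45677"))        // 1,233.45,677
    println(groupDecimal("1233.45677")) // 1,233.45677
  }
}
```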
import 'package:intl/intl.dart';
var f = NumberFormat("###,###.0#", "en_US");
print(f.format(int.parse("1000300")));
prints 1,000,300.0
See Dart's NumberFormat documentation.
The format is specified as a pattern using a subset of the ICU formatting patterns.
0 A single digit
# A single digit, omitted if the value is zero
. Decimal separator
- Minus sign
, Grouping separator
E Separates mantissa and exponent
+ Before an exponent, to say it should be prefixed with a plus sign
% In prefix or suffix, multiply by 100 and show as percentage
‰ (\u2030) In prefix or suffix, multiply by 1000 and show as per mille
¤ (\u00A4) Currency sign, replaced by currency name
' Used to quote special characters
; Used to separate the positive and negative patterns (if both present)
Try the following regex: (\d{1,3})(?=(\d{3})+$)
It defines two capturing groups, but only the first is consumed by the match (the second sits inside the lookahead), so replacing each match with $1, adds the commas where they are supposed to be.
Let's take the example amount 12000. Our expected output is 12,000.00,
so the solution is
double rawAmount = 12000;
String amount = rawAmount.toStringAsFixed(2).replaceAllMapped(RegExp(r'(\d{1,3})(?=(\d{3})+(?!\d))'), (Match m) => '${m[1]},');
Or, if you don't want to add .00, use toString() instead of toStringAsFixed():
String amount = rawAmount.toString().replaceAllMapped(RegExp(r'(\d{1,3})(?=(\d{3})+(?!\d))'), (Match m) => '${m[1]},');
extension on int {
  String get priceString {
    final numberString = toString();
    final numberDigits = List.from(numberString.split(''));
    int index = numberDigits.length - 3;
    while (index > 0) {
      numberDigits.insert(index, ',');
      index -= 3;
    }
    return numberDigits.join();
  }
}
For a double, the output depends on which approach you use, so check the results below.
If you only need to format an integer, any of them works.
// 1233.45677 => 1,233.45677 (only the integer part is grouped)
String num = '1233.45677';
RegExp pattern = RegExp(r'(?<!\.\d*)(\d)(?=(?:\d{3})+(?:\.|$))');
String Function(Match) replace = (m) => '${m[1]},';
print(num.replaceAllMapped(pattern, replace));
// 1233.45677 => 1,233.45,677 (the fractional digits are grouped too)
String num = '1233.45677';
RegExp pattern = RegExp(r'(\d{1,3})(?=(\d{3})+(?!\d))');
String Function(Match) replace = (m) => '${m[1]},';
print(num.replaceAllMapped(pattern, replace));
// 1233.45677 => 1,233.46
// requires importing the intl package to be able to use NumberFormat
String num = '1233.45677';
var f = NumberFormat("###,###.0#", "en");
print(f.format(double.parse(num)));
If the number is a String, parse it first:
// for an int
int.parse(num);
// for a double
double.parse(num);

Using regex in Scala to group and pattern match

I need to process phone numbers using regex and group them by (country code) (area code) (number). The input format:
country code: between 1-3 digits
, area code: between 1-3 digits
, number: between 4-10 digits
Examples:
1 877 2638277
91-011-23413627
And then I need to print out the groups like this:
CC=91,AC=011,Number=23413627
This is what I have so far:
val s = scala.io.StdIn.readLine()
val pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
val ret = pattern.findAllIn(s)
println("CC=" + ret.group(1) + "AC=" + ret.group(2) + "Number=" + ret.group(3));
The compiler said "empty iterator." I also tried:
val (cc,ac,n) = s
and that didn't work either. How to fix this?
The problem is with your pattern. I would recommend using a tool like RegexPal to test patterns. Put the pattern in the first text box and your examples in the second one, and it will highlight the matched parts.
You had added spaces between your groups and the [ -] separators, so the regex was expecting literal spaces there. The correct pattern is:
val pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
Also, if you want to access the groups explicitly, you need a Match. For example, findFirstMatchIn returns an Option[Match] for the first match, and findAllMatchIn returns an iterator over all matches:
val allMatches = pattern.findAllMatchIn(s)
allMatches.foreach { m =>
  println("CC=" + m.group(1) + ",AC=" + m.group(2) + ",Number=" + m.group(3))
}
val matched = pattern.findFirstMatchIn(s)
matched match {
  case Some(m) =>
    println("CC=" + m.group(1) + ",AC=" + m.group(2) + ",Number=" + m.group(3))
  case None =>
    println("There wasn't a match!")
}
I see you also tried extracting the string into variables. You have to use the Regex extractor in the following way:
val Pattern = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r
val Pattern(cc, ac, n) = s
println(s"CC=$cc,AC=$ac,Number=$n")
And if you want to handle errors:
s match {
  case Pattern(cc, ac, n) =>
    println(s"CC=$cc,AC=$ac,Number=$n")
  case _ =>
    println("No match!")
}
You can also take a look at string interpolation to make your strings easier to read: s"..."
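Putting the extractor and string interpolation together, a small helper might look like the sketch below. PhoneParser and format are hypothetical names, and trimming the input is an added assumption to tolerate stray whitespace from readLine. Because the regex extractor only succeeds when the whole string matches, malformed input falls through to None.

```scala
object PhoneParser {
  private val Phone = """([0-9]{1,3})[ -]([0-9]{1,3})[ -]([0-9]{4,10})""".r

  // Some("CC=...,AC=...,Number=...") when the whole input matches, otherwise None.
  def format(s: String): Option[String] = s.trim match {
    case Phone(cc, ac, n) => Some(s"CC=$cc,AC=$ac,Number=$n")
    case _                => None
  }

  def main(args: Array[String]): Unit = {
    println(format("91-011-23413627")) // Some(CC=91,AC=011,Number=23413627)
    println(format("1 877 2638277"))   // Some(CC=1,AC=877,Number=2638277)
    println(format("not a number"))    // None
  }
}
```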