how to remove double characters and spaces from string - regex

Please let me know how to remove double spaces and repeated characters from the string below.
String = Test----$$$$19****45#### Nothing
Clean String = Test-$19*45# Nothing
I have used the regex "\s+", but it only removes the double spaces. I have tried other regex patterns, but they get too complex. Please help me.
I am using VB.NET.

What you'll want to do is capture any single character in a group, and then match the repeats that follow it with a backreference to that group. The pattern (.)\1+ usually does this; replace each match with the captured character ($1) so that exactly one copy remains. The exact syntax depends on the programming language. In VB.NET (with Imports System.Text.RegularExpressions):
Dim text As String = "Test###_&aa&&&"
Dim result As String = New Regex("(.)\1+").Replace(text, "$1")
result will now contain Test#_&a&. Alternatively, you can use a lookbehind so that the first character of each run is never part of the match, and replace the repeats with an empty string:
Dim text As String = "Test###_&aa&&&"
Dim result As String = New Regex("(?<=(.))\1+").Replace(text, "")

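For a faster alternative that avoids regex, try the following (StringBuilder needs Imports System.Text):
Dim text As String = "Test###_&aa&&&"
Dim sb As New StringBuilder(text.Length)
Dim lastChar As Char ' defaults to the NUL character
For Each c As Char In text
    ' Append a character only when it differs from the previous one
    If c <> lastChar Then
        sb.Append(c)
        lastChar = c
    End If
Next
Console.WriteLine(sb.ToString()) ' Test#_&a&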

Here is a Perl way to replace each run of a repeated non-word character with a single occurrence:
my $String = 'Test----$$$$19****45#### Nothing';
$String =~ s/(\W)\1+/$1/g;
print $String;
output:
Test-$19*45# Nothing

Here's how it would look in Java...
String raw = "Test----$$$$19****45#### Nothing";
String cleaned = raw.replaceAll("(.)\\1+", "$1");
System.out.println(raw);
System.out.println(cleaned);
prints
Test----$$$$19****45#### Nothing
Test-$19*45# Nothing
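The Perl and Java answers above use slightly different patterns, and the difference matters once the input contains doubled letters. The sketch below is only an illustration under that assumption: the class and method names are made up for this example, and the sample input is a variation of the one in the question.

```java
// Hypothetical helper class; the names are illustrative only.
class RunCollapser {
    // (.)\1+ collapses runs of ANY repeated character, letters and digits included.
    static String collapseAll(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    // (\W)\1+ collapses runs of repeated non-word characters only
    // (punctuation and whitespace), leaving doubled letters alone.
    static String collapseNonWord(String s) {
        return s.replaceAll("(\\W)\\1+", "$1");
    }

    public static void main(String[] args) {
        String raw = "Hello----$$$$19****45####  World";
        System.out.println(collapseAll(raw));     // Helo-$19*45# World
        System.out.println(collapseNonWord(raw)); // Hello-$19*45# World
    }
}
```

If words such as "Hello" must keep their doubled letters, the \W variant is probably the safer choice. The same two patterns should behave the same way with Regex.Replace in VB.NET.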

Related

Replace 2 step Regex with 1 step Regex to get one upper case letter between underscores

I have a string, myFile, that looks like: Name_2019-11-29_D_HPSeries.txt. I need to extract the letter D between the underscores...the letter could be any uppercase letter. Right now I am using a 2 step Regex code.
Dim bC As String = Regex.Match(myFile, "_[A-Z]+_").ToString
boatClass = Regex.Match(bC, "[A-Z]+").ToString
This works but I believe it could be done with one line. I tried the code below but it doesn't work.
boatClass = Regex.Replace(myFile, "_[A-Z]_", "[A-Z]").ToString
You can use positive lookarounds to avoid a 2-step process, checking that the characters before and after the letter are underscores without capturing them:
Dim myFile As String = "Name_2019-11-29_D_HPSeries.txt"
Dim bC As String = Regex.Match(myFile, "(?<=_)[A-Z](?=_)").ToString()
Console.WriteLine(bC)
Output:
D
You were almost there by matching a single character A-Z. Wrap it in a capturing group and then read the letter through the Match.Groups property:
_([A-Z])_
For example
Dim myFile As String = "Name_2019-11-29_D_HPSeries.txt"
Dim bC As String = Regex.Match(myFile, "_([A-Z])_").Groups(1).Value
Console.WriteLine(bC)
Result
D
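The two one-step approaches from the answers above (lookarounds and a capturing group) can be compared side by side. This is a minimal sketch in Java, the other language used in this thread. The class and method names are hypothetical, and returning an empty string when nothing matches is an assumption chosen to mirror what Regex.Match(...).ToString() gives in .NET.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class BoatClassExtractor {
    // The underscores are part of the match; the letter is read from group 1.
    private static final Pattern WITH_GROUP = Pattern.compile("_([A-Z])_");
    // The underscores are only asserted, so the whole match is the letter.
    private static final Pattern WITH_LOOKAROUND = Pattern.compile("(?<=_)[A-Z](?=_)");

    static String viaGroup(String fileName) {
        Matcher m = WITH_GROUP.matcher(fileName);
        return m.find() ? m.group(1) : "";
    }

    static String viaLookaround(String fileName) {
        Matcher m = WITH_LOOKAROUND.matcher(fileName);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String myFile = "Name_2019-11-29_D_HPSeries.txt";
        System.out.println(viaGroup(myFile));      // D
        System.out.println(viaLookaround(myFile)); // D
    }
}
```

Both variants require exactly one uppercase letter between the underscores, so the "H" of "HPSeries" is not picked up.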

Regex VB.Net Regex.Replace

I'm trying to perform a simple regex find and replace, adding a tab into the string after some digits as outlined below.
From
a/users/12345/badges
To
a/users/12345 /badges
I'm using the following:
s = regex.replace(s, "(a\/users\/\d*)("a\/users\/\d*\t)", $1 $2")
But I'm clearly doing something wrong.
Where am I going wrong? I know it's a stupid mistake, but help would be gratefully received.
VBVirg
You can achieve that with a mere look-ahead that will find the position right before the last /:
Dim s As String = Regex.Replace("a/users/12345/badges", "(?=/[^/]*$)", vbTab)
Output:
a/users/12345 /badges
Or, you can just use LastIndexOf with Insert:
Dim str As String = "a/users/12345/badges"
Dim str2 As String = str ' falls back to the original when there is no slash
Dim idx As Integer = str.LastIndexOf("/") ' -1 means no slash was found
If idx >= 0 Then
    str2 = str.Insert(idx, vbTab)
End If
When I read, "adding a tab into the string after some digits" I think there could be more than one set of digits that can appear between forward slashes. This pattern:
"/(\d+)/"
Will capture only digits that are between forward slashes and will allow you to insert a tab like so:
Imports System.Text.RegularExpressions
Module Module1
    Sub Main()
        Dim str As String = "a/54321/us123ers/12345/badges"
        str = Regex.Replace(str, "/(\d+)/", String.Format("/$1{0}/", vbTab))
        Console.WriteLine(str)
        Console.ReadLine()
    End Sub
End Module
Results (NOTE: The tab spaces can vary in length):
a/54321 /us123ers/12345 /badges
When String is "a/54321/users/12345/badges" results are:
a/54321 /users/12345 /badges
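The two regex strategies in the answers above can be sketched in Java as follows. The class and method names are hypothetical. Note that the second method deliberately swaps in a variation, (/\d+)(?=/), for the /(\d+)/ pattern shown above: because /(\d+)/ consumes the closing slash, it skips the second of two adjacent numeric segments (for example in "a/1/2/b"), whereas a lookahead leaves that slash available for the next match.

```java
// Hypothetical helper class; the names are illustrative only.
class TabInserter {
    // Zero-width lookahead: matches the position right before the last '/'.
    static String beforeLastSlash(String path) {
        return path.replaceAll("(?=/[^/]*$)", "\t");
    }

    // Inserts a tab after every all-digit segment that is followed by '/'.
    // The closing slash is only asserted, not consumed.
    static String afterNumericSegments(String path) {
        return path.replaceAll("(/\\d+)(?=/)", "$1\t");
    }

    public static void main(String[] args) {
        System.out.println(beforeLastSlash("a/users/12345/badges"));
        System.out.println(afterNumericSegments("a/54321/us123ers/12345/badges"));
        System.out.println(afterNumericSegments("a/1/2/b"));
    }
}
```

A segment such as "us123ers" is left alone by the second method, because its digits are not directly preceded by a slash.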

Removing data from string using regular expressions in C Sharp

I'm definitely not good with regular expressions, but they are really cool! Now I want to be able to get only the name "table" from this string:
[schema].[table]
I want to remove the schema name, the square brackets and the dot,
so that I get only the word table.
I tried this:
string output = Regex.Replace(reader["Name"].ToString(), @"[\[\.\]]", "");
So you came up with a new idea? Here is what you can try:
string input = "[schema].[table]";
// remove the leading bracketed name together with the dot that follows it
string one = Regex.Replace(input, @"^\[.*?\]\.", "");
// one is now "[table]"; the brackets around the table name remain
// or remove everything up to and including the last dot
// string two = Regex.Replace(input, @".*[.]", "");
Try this:
string strRegex = @"^\[.*?\]\.";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = @"[schema].[table]";
string strReplace = @"";
var result = myRegex.Replace(strTargetString, strReplace);
Console.WriteLine(result); // [table]
Why do you want to replace anything if you just want to extract part of the string?
string table = Regex.Match("[schema].[table]", @"\w+(?=]$)").Value;
It works even if there is no schema part.
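The extraction approach (match the word characters that sit directly before the final closing bracket) can be sketched in Java like this. The class and method names are hypothetical, and returning an empty string on no match is an assumption that mirrors Match.Value in .NET.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class TableNameExtractor {
    // One or more word characters that are immediately followed by "]"
    // at the very end of the input.
    private static final Pattern LAST_BRACKETED = Pattern.compile("\\w+(?=\\]$)");

    static String tableName(String qualifiedName) {
        Matcher m = LAST_BRACKETED.matcher(qualifiedName);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(tableName("[schema].[table]")); // table
        System.out.println(tableName("[table]"));          // table
    }
}
```

Because \w does not match spaces, a name such as [my table] would yield only "table", so this sketch assumes names made of letters, digits and underscores.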

Get/split text inside brackets/parentheses

Just have a list of words, such as:
gram (g)
kilogram (kg)
pound (lb)
I am just wondering how I would get the text within the parentheses, for example get the "g" in "gram (g)", and Dim it as a new string.
Possibly using regex?
Thanks.
Use the Split function:
strArr = str.Split("(") ' splitting "gram (g)" returns the array {"gram ", "g)"} at index 0 and 1
strArr2 = strArr(1).Split(")") ' splitting "g)" returns the array {"g", ""}
The string is in
strArr2(0)
Edit
You want getAbbrev and getAbbrev2 to be arrays.
Try
Dim getAbbrev As String() = Str.Split("(")
Dim getAbbrev2 As String() = getAbbrev(1).Split(")")
To do it without declaring arrays you can do
"gram (g)".Split("(")(1).Split(")")(0)
but that's unreadable.
Edit
You have some very trivial errors. I would suggest you strengthen your understanding of objects and declarations first. Then you can look into invoking methods. I would rather have you understand it than give it to you. Re-read the book you have or look for a basic tutorial.
Dim unit As String = "gram (g)" ' make sure this is the actual string you are getting; it is not clear where the value is supposed to come from
Dim getAbbrev As String() = unit.Split("(") ' use unit, not Str; Str is not a variable you declared
Dim getAbbrev2 As String() = getAbbrev(1).Split(")") ' arrays are indexed with parentheses in VB.NET
For the last line, reference getAbbrev2 instead of the unknown abbrev2.
Fun with regular expressions (I'm really not an expert here, but this is tested and works):
Imports System.Text.RegularExpressions
.....
Dim charsToTrim() As Char = { "("c, ")"c }
Dim test As String = "gram (g)" + Environment.NewLine +
    "kilogram (kg)" + Environment.NewLine +
    "pound (lb)"
Dim pattern As String = "\([a-zA-Z0-9]*\)"
Dim r As Regex = New Regex(pattern, RegexOptions.IgnoreCase)
Dim m As Match = r.Match(test)
While (m.Success)
    System.Diagnostics.Debug.WriteLine("Match" + "=" + m.Value.ToString())
    Dim tempText As String = m.Value.ToString().Trim(charsToTrim)
    System.Diagnostics.Debug.WriteLine("String Trimmed" + "=" + tempText)
    m = m.NextMatch()
End While
You can split at the space and remove the parentheses from the second token (by replacing them with an empty string).
A regex is also an option and is very simple. Its pattern is
\w+\s+\((\w+)\)
This means: a word, then at least one whitespace character, then a literal opening parenthesis, then a word inside a capturing group, and finally a literal closing parenthesis. The escaped parentheses \( and \) match the characters themselves; the unescaped inner parentheses form the capturing group, which makes it possible to refer to the unit abbreviation (g, kg, lb).
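As a rough illustration of that pattern, the following Java sketch collects the captured abbreviation from every line of the list in the question. The class and method names are hypothetical and not taken from any of the answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class UnitAbbreviations {
    // word, whitespace, literal "(", captured word, literal ")"
    private static final Pattern UNIT = Pattern.compile("\\w+\\s+\\((\\w+)\\)");

    static List<String> abbreviations(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = UNIT.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the text between the parentheses
        }
        return result;
    }

    public static void main(String[] args) {
        String units = "gram (g)\nkilogram (kg)\npound (lb)";
        System.out.println(abbreviations(units)); // [g, kg, lb]
    }
}
```

In VB.NET the equivalent read would presumably be Regex.Match(...).Groups(1).Value, as in the capturing-group answers elsewhere in this thread.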

Regular expression to extract numbers from long string containing lots of punctuation

I am trying to separate numbers from a string that also contains characters such as %, / and so on, for example %2459348?: or :2434545/%. How can I do this in VB.NET?
you want only the numbers right?
then you could do it like this
Dim theString As String = "/79465*44498%464"
Dim ret = Regex.Replace(theString, "[^0-9]", String.Empty)
hth
edit:
or do you want to split by all non number chars?
then it would go like this
Dim ret = Regex.Split(theString, "[^0-9]")
You could loop through each character of the string and call Char.IsNumber() on it.
This should do:
Dim test As String = "%2459348?:"
Dim match As Match = Regex.Match(test, "\d+")
If match.Success Then
    Dim result As String = match.Value
    ' Do something with result
End If
Result = 2459348
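Here's a function which will extract all of the numeric characters from a string and return them as one string (StringBuilder needs Imports System.Text):
Public Function GetNumbers(ByVal str As String) As String
    Dim builder As New StringBuilder()
    For Each c In str
        ' Char.IsNumber also accepts other Unicode numeric characters; use Char.IsDigit for decimal digits only
        If Char.IsNumber(c) Then
            builder.Append(c)
        End If
    Next
    Return builder.ToString()
End Function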
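The answers above produce two different kinds of result: one string with every digit glued together, or the separate runs of digits. A minimal Java sketch of both follows, with hypothetical class and method names and the sample input from the first answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class NumberExtractor {
    private static final Pattern DIGIT_RUN = Pattern.compile("\\d+");

    // Removes everything that is not an ASCII digit, joining all digits together.
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    // Returns each maximal run of digits as its own entry.
    static List<String> digitRuns(String s) {
        List<String> runs = new ArrayList<>();
        Matcher m = DIGIT_RUN.matcher(s);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        String input = "/79465*44498%464";
        System.out.println(digitsOnly(input)); // 7946544498464
        System.out.println(digitRuns(input));  // [79465, 44498, 464]
    }
}
```

Collecting matches of \d+ avoids the empty entries that splitting on [^0-9] produces when the input starts with a separator or contains several separators in a row.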