Regular expression to remove lines that start with certain text

I know this may be quite easy for you.
I have a text that contains 40 lines, and I want to remove the lines that start with a constant text.
Please see the data below.
When I use (?mn)[\+CMGL:].*($), it removes the whole text; when I use (?mn)[\+CMGL:].*(\r), it leaves only the first line.
+CMGL: 0,1,,159
07910201956905F0440B910201532762F20008709021225282808
+CMGL: 1,1,,159
07910201956905F0240B910201915589F7000860013222244480
+CMGL: 2,1,,151
07910201956905F0240B910201851177F6000850218122415
+CMGL: 3,1,,159
07910201956905F0440B910201532762F200087090311
+CMGL: 4,1,,159
07910221020020F0440B910221741514F40008802041120481808C050
I want to remove all lines that start with +CMGL and keep only the other lines.
Thanks...

Why do you need Regex for this? String.StartsWith was created for this purpose.
Dim result = lines.Where(Function(l) Not l.StartsWith("+CMGL")).ToList()
Edit: If you don't have "lines" but a single text that contains newline characters:
Dim result = text.Split({ControlChars.CrLf, ControlChars.Lf}, StringSplitOptions.None).
Where(Function(l) Not l.StartsWith("+CMGL")).ToList()
If you want it converted back to a single string (using a new variable name, since text is already taken by the input):
Dim output = String.Join(Environment.NewLine, result)
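The same filtering can be sketched in Python for comparison (Python is used here only as a neutral illustration; the names drop_prefixed_lines, drop_prefixed_lines_regex and SAMPLE are hypothetical). The sketch also shows why the pattern in the question misbehaved: [\+CMGL:] is a character class, so it matches any single one of the characters +, C, M, G, L or : anywhere, not the literal prefix "+CMGL". Escaping the + and anchoring the literal prefix with ^ in multiline mode matches only the intended lines.

```python
import re

# Sample in the question's format, with Windows (CRLF) line endings.
SAMPLE = (
    "+CMGL: 0,1,,159\r\n"
    "07910201956905F0440B910201532762F20008709021225282808\r\n"
    "+CMGL: 1,1,,159\r\n"
    "07910201956905F0240B910201915589F7000860013222244480\r\n"
)


def drop_prefixed_lines(text, prefix="+CMGL"):
    """Non-regex approach: keep only the lines that do not start with prefix."""
    # splitlines() understands \r\n, \n and \r line endings.
    return [line for line in text.splitlines() if not line.startswith(prefix)]


# Regex approach: \+ escapes the literal plus sign, ^ is line-aware under
# re.MULTILINE, and (?:\r?\n|\Z) also consumes the line break (or end of text).
PREFIXED_LINE = re.compile(r"^\+CMGL.*(?:\r?\n|\Z)", re.MULTILINE)


def drop_prefixed_lines_regex(text):
    """Regex approach: delete every +CMGL line including its line break."""
    return PREFIXED_LINE.sub("", text)
```

The first function mirrors the StartsWith answer; the second keeps the text as one string, which is closer to what the original regex attempt was aiming for.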

Related

Remove lines whose text after the : is shorter than or equal to 5 characters, using Notepad++

This question is similar to "Remove lines that are shorter than 5 characters before the # using Notepad++", but it differs a bit.
My data looks like this:
abc:123
abc:1234
abc:12345
Please note: abc is not on every line; it is just an example.
I want to remove the first line in the previous example because 123, the part after the :, is shorter than 5 characters.
Any help would be appreciated.
Thanks!
Open Notepad++'s Find and Replace dialog, select the Regular expression search mode, enter the following pattern in Find what, leave Replace with empty, and click Replace All:
^((?!.+:\d{5,}).)*$
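To see what the pattern does, it can be tried outside Notepad++. The Python sketch below (an illustration, not part of the answer) applies the same expression in multiline mode. Replacing a match with nothing leaves an empty line behind; the hypothetical variant SHORT_LINE_WITH_BREAK appends \n? so the line break is removed too. In Notepad++ itself, the ". matches newline" option should stay unchecked so that each match stays within one line.

```python
import re

# The pattern from the answer. A line matches only if no position in it starts
# ".+:" followed by five or more digits, i.e. the line lacks a long enough value.
SHORT_LINE = re.compile(r"^((?!.+:\d{5,}).)*$", re.MULTILINE)

# Hypothetical variant: the optional \n also removes the line break.
SHORT_LINE_WITH_BREAK = re.compile(r"^((?!.+:\d{5,}).)*$\n?", re.MULTILINE)

text = "abc:123\nabc:1234\nabc:12345"

blanked = SHORT_LINE.sub("", text)             # short lines become empty lines
removed = SHORT_LINE_WITH_BREAK.sub("", text)  # short lines disappear entirely
```

Note that \d{5,} keeps values with five or more digits, so abc:1234 is removed as well as abc:123.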
Without knowing the language there is only so much help I can offer. I'll give you an example of how I would solve this problem in C#.
Start by creating a string for your updated file (without the short lines):
string content = "";
Read each line from your file.
Then take the part of the line after the colon (that is, drop the abc: portion) and check its length:
int colon = line.IndexOf(':');
string afterColon = colon >= 0 ? line.Substring(colon + 1) : "";
if (afterColon.Length >= 5)
{
    // keep the whole original line, not just the part after the colon
    content += line + Environment.NewLine;
}
At the end, truncate your file and write content to it.
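A minimal Python sketch of the same substring idea, assuming the file has already been read into a list of lines. keep_long_values and min_len are hypothetical names; min_len=5 mirrors the \d{5,} in the regex answer, and dropping lines that have no colon is an assumption about the data.

```python
def keep_long_values(lines, min_len=5):
    """Keep lines whose text after the first ':' has at least min_len characters.

    Lines without a colon are dropped (an assumption about the data).
    """
    kept = []
    for line in lines:
        # partition() splits at the first ':'; sep is "" when there is no colon.
        _, sep, after_colon = line.partition(":")
        if sep and len(after_colon) >= min_len:
            kept.append(line)  # keep the whole original line
    return kept
```

Writing the result back to the file could then be done with "\n".join(kept).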

How to require two words in a regex, with results based only on both words (VB.NET)

I would like to know how to require two or more keywords in a regex, so that results are shown only when all of the defined words are present, not just one of them.
What I currently have works with multiple keywords, but it matches either one of them; I want it to require BOTH words.
For example:
Dim pattern As String = "(?i)[\t ](?<w>((arma)|(crapo))[a-z0-9]*)[\t ]"
The code currently works by matching 'arma' or 'crapo'. I want it to match only when BOTH 'arma' AND 'crapo' are present; otherwise it should not show any results.
I am searching for certain keywords in PDF documents, and I only want to see results if a PDF document contains BOTH 'arma' and 'crapo'. It works fine for 'arma' OR 'crapo', but I want results based on 'arma' AND 'crapo'.
Sorry for sounding so repetitive.
Edit: Here is my code. Please read the comment.
Dim filesz() As String = GetPatternedFiles("c:\temp\", New String() {"tes*.pdf", "fes*.pdf", "Bas*.pdf"})
' GetPatternedFiles and GetTextFromPDF are my own helper functions (not shown).
For Each s As String In filesz
Dim thetext As String = Nothing
Dim pattern As String = "(?i)[\t ](?<w>(crapo)|(arma)[a-z0-9]*)[\t ]"
thetext = GetTextFromPDF(s)
For Each m As Match In Regex.Matches(thetext, pattern)
ListBox1.Items.Add(s)
Next
Next
You can use this regex:
\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b
The idea is to match arma followed by anything and then crapo, or crapo followed by anything and then arma, using word boundaries (\b) to avoid matching words like karma.
However, if you want to match karma or crapotos, as you asked in your comment, you can use:
arma.*?crapo|crapo.*?arma
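A Python sketch of the same idea (the function names are hypothetical). One detail matters for PDF text: by default . does not match a line break, so if the two words can be on different lines, the pattern needs a "dot matches newline" flag, which is re.DOTALL in Python and RegexOptions.Singleline in .NET. The second function shows an alternative that avoids the combined pattern altogether: one whole-word search per keyword, all of which must succeed, which also scales to more than two words.

```python
import re

# Both words in either order, as whole words (\b is a word boundary, so
# "karma" does not count as "arma"). re.DOTALL lets .*? cross line breaks.
BOTH_WORDS = re.compile(
    r"\barma\b.*?\bcrapo\b|\bcrapo\b.*?\barma\b",
    re.IGNORECASE | re.DOTALL,
)


def contains_both(text):
    """True when 'arma' and 'crapo' both occur as whole words."""
    return BOTH_WORDS.search(text) is not None


def contains_all_words(text, words):
    """Alternative: one whole-word search per keyword; all must succeed."""
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in words
    )
```

In the VB.NET loop from the question, such a check would be done once per file (for example with Regex.IsMatch and RegexOptions.IgnoreCase Or RegexOptions.Singleline) before adding the file name to the ListBox, which also avoids adding the same file once per match.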

Writing a Word macro to organize chat logs

I need some help writing a Word macro to organize some chat logs. I want to eliminate repeated consecutive occurrences of names, regardless of timestamp. Besides this, each person will be using their own formatting style (font, font color, etc.). Edit: the raw logs have no formatting (e.g. specific fonts, font colors, etc.). I want the macro to automatically apply a specific (already existing) Word style to each user.
So, what I have is:
[12:40] Steve: this is an example text.
[12:41] Steve: this is another example text.
[12:41] Steve: this is yet another example text.
[12:45] Bob: some more text.
[12:46] Bob: even more text.
[12:47] Steve: yadda yadda yadda.
The expected output would be:
[12:40] Steve: *style1*this is an example text.
this is another example text.
this is yet another example text.*/style1*
[12:45] Bob: *style2*some more text.
even more text.*/style2*
[12:47] Steve: *style1*yadda yadda yadda.*/style1*
As of now, unfortunately, I know next to nothing about Visual Basic for Applications (VBA). I was thinking of searching for the names with a regex pattern, assigning them to a variable, comparing each match to the previous one and, if they're equal, deleting the latter. The problem is that I'm not fluent in VBA, so I don't know how to do what I want.
So far, all I've got is this:
Sub Organize()
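Dim re As RegExp
Dim names As MatchCollection, name As Match
' Needs a reference to "Microsoft VBScript Regular Expressions 5.5" (Tools > References)
Set re = New RegExp ' create the object before setting its properties
re.Pattern = "\[[0-9]{2}:[0-9]{2}\] [a-zA-Z]{1,20}:"
re.IgnoreCase = True
re.Global = True
Set names = re.Execute(ActiveDocument.Range.Text)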
For Each name In names
'This is where I get lost
Next name
End Sub
So, in the interest of solving this problem and me learning some VBA, could I get some help?
EDIT: the question has been edited to better reflect what I want the macro to do.
Assuming that each line in your log is a separate paragraph, I would do it without a regex, using the .Find object instead. The following code works fine for the sample data you provided.
Sub qTest()
Dim PAR As Paragraph
Dim PrevName As String
For Each PAR In ActiveDocument.Content.Paragraphs
PAR.Range.Select 'highlight current paragraph
'find name in paragraph
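With Selection.Find
.ClearFormatting
.Text = "\]*\:"
.MatchWildcards = True 'the pattern uses Word wildcards; otherwise it is searched literally
.Wrap = wdFindStop 'stop at the end of the selected paragraph
.Execute
End With
If Selection.Text = PrevName Then
'extend the range back to the start of the paragraph
'and delete the repeated "[time] Name: " prefix
ActiveDocument.Range(PAR.Range.Start, Selection.End + 1).Delete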
Else
PrevName = Selection.Text
Debug.Print PrevName
End If
Next
End Sub
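The core logic of the macro, comparing each line's speaker with the previous one and dropping the repeated prefix, can be sketched independently of Word. The Python below assumes plain-text lines in the "[hh:mm] Name: message" format of the sample and limits names to letters as in the question's [a-zA-Z]{1,20}; collapse_speakers and LOG_LINE are hypothetical names, and applying the per-user Word styles is not covered.

```python
import re

# "[hh:mm] Name: message", names restricted to letters as in the question.
LOG_LINE = re.compile(r"^\[\d{2}:\d{2}\] ([A-Za-z]{1,20}): (.*)$")


def collapse_speakers(lines):
    """Drop the "[time] Name: " prefix when a speaker posts consecutive lines."""
    out = []
    prev_name = None
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            out.append(line)  # not a chat line: keep it unchanged
            continue
        name, message = m.group(1), m.group(2)
        if name == prev_name:
            out.append(message)  # same speaker as before: keep only the text
        else:
            out.append(line)  # new speaker: keep timestamp and name
            prev_name = name
    return out
```

This is the same comparison the VBA answer performs with PrevName, just on strings instead of Word paragraphs.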

Regex Matching and Deleting/Replacing a string

I am trying to parse a file that has multiple "footers". The file is output that was designed for printing, and my company wants to keep it stored electronically; each footer marks a new page, and the page divisions are no longer needed.
I am trying to look for and remove lines that look like:
1 of 2122 PRINTED 07/01/2013 04:46 Page : 1 of 11
2 of 2122 PRINTED 07/01/2013 04:46 Page: 2 of 11
3 of 2122 PRINTED 07/01/2013 04:46 Page: 3 of 11
and so on
I then want to replace the final line (which would read something like "2122 of 2122") with a "custom" footer.
I am using regular expressions, but I am very new to them, so what should my regex look like to accomplish this? I plan to use the "Count" of the matches to find out when I've reached the last line and then call .Replace on it.
I am using VB.NET, but can translate C# if required. How can I accomplish this? Specifically, I only want to match and remove footers as long as the number of matches is greater than 1.
Here's one I created with RegExr:
/^(\d+\s+of\s+\d+)(?=\s+printed)/gim
It matches (number)(whitespace)('of')(whitespace)(number) at the beginning of a line, but only if it is followed by (whitespace)('printed'), case-insensitively. The /m flag makes ^ and $ match at line boundaries.
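Python regular expressions take no /.../ delimiters; the g, i and m flags become findall (or finditer), re.IGNORECASE and re.MULTILINE. A minimal sketch with the same pattern, using made-up sample lines:

```python
import re

# /^(\d+\s+of\s+\d+)(?=\s+printed)/gim written for Python's re module.
FOOTER_PREFIX = re.compile(
    r"^(\d+\s+of\s+\d+)(?=\s+printed)", re.IGNORECASE | re.MULTILINE
)

sample = (
    "1 of 2122 PRINTED 07/01/2013 04:46 Page : 1 of 11\n"
    "some body text\n"
    "2 of 2122 PRINTED 07/01/2013 04:46 Page: 2 of 11\n"
)

# findall() returns the captured group of every match (the "g" behaviour).
prefixes = FOOTER_PREFIX.findall(sample)
```

Because of the lookahead, only the leading "N of M" part is matched, not the rest of the footer line; removing the whole line needs a pattern that continues to the end of the line.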
This is how I ended up doing it...
Private Function FixFooters(ByVal fileInput As String, Optional ByVal numberToLeaveAlone As Integer = 1) As String
Dim matchpattern As String = "^\d+\W+of\W+\d+\W+PRINTED.*$"
Dim myRegEx As New Regex(matchpattern, RegexOptions.IgnoreCase Or RegexOptions.Multiline)
Dim replacementstring As String = String.Empty
Dim matchCounter As Integer = myRegEx.Matches(fileInput).Count
If numberToLeaveAlone > matchCounter Then numberToLeaveAlone = matchCounter
Return myRegEx.Replace(fileInput, replacementstring, matchCounter - numberToLeaveAlone, 0)
End Function
I used myregextester.com to get the initial match pattern. Since I wanted to leave the last footer alone (to manipulate it further later on), I created the numberToLeaveAlone parameter to ensure we don't remove ALL of the footers. For the purposes of this program I made the default value 1, but it could be changed to zero (I only did it for readability in the calling code, as I know I will ALWAYS want to leave one... but I do like to reuse code). It's fairly fast; I'm sure there are better ways out there, but this one made the most sense to me.
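The same "remove all but the last N footers" idea, sketched in Python with the same pattern (fix_footers is a hypothetical port, not the author's code). One pitfall when porting: Python's re.sub treats count=0 as "replace every match", so the case where nothing should be removed is handled before calling it.

```python
import re

FOOTER = re.compile(r"^\d+\W+of\W+\d+\W+PRINTED.*$", re.IGNORECASE | re.MULTILINE)


def fix_footers(text, number_to_leave_alone=1):
    """Blank out every footer line except the last number_to_leave_alone."""
    match_count = len(FOOTER.findall(text))
    to_remove = max(match_count - number_to_leave_alone, 0)
    if to_remove == 0:
        # re.sub with count=0 would replace ALL matches, so return early.
        return text
    # count limits the replacements to the first to_remove footers.
    return FOOTER.sub("", text, count=to_remove)
```

As in the VB.NET version, removed footers leave empty lines behind.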

Regular expressions and VBA

Does anyone know how to extract matches as strings from the result of RegExp.Execute()?
Here is what I have so far:
Regex.Pattern = "^[^*]*[*]+"
Set myMatches = Regex.Execute(temp)
I want the object myMatches, which holds the matches, to be converted to a string. I know that there will only be one match per execution.
Does anyone know how to extract the matches from the object as strings, to be displayed, say, in a MsgBox?
Try this:
Dim sResult As String
'// Your expression code here...
If myMatches.Count > 0 Then
sResult = myMatches.Item(0).Value
'// or, because Item and Value are default members: sResult = myMatches(0)
MsgBox "The matching text was: " & sResult
End If
The Execute method returns a MatchCollection, and you can use its Item property to retrieve each Match by index; a Match's Value property holds the matched text.
Since you only ever have one match, the index is zero. If you have more than one match, you can use the index of the match you need or loop over the entire collection.
This page has a lot of information on regex and seems to have what you want.
http://www.regular-expressions.info/vbscript.html
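To check what the pattern in the question actually returns, here is the same expression in Python (an illustration only; first_match_text is a hypothetical name). There, m.group(0) plays the role of the VBA Match's Value, and checking for None corresponds to checking myMatches.Count > 0 before reading Item(0).

```python
import re

# ^[^*]*[*]+ : from the start of the text, any run of characters that are not
# asterisks, followed by one or more asterisks.
PATTERN = re.compile(r"^[^*]*[*]+")


def first_match_text(s):
    """Return the matched text, or None when there is no match."""
    m = PATTERN.search(s)
    return m.group(0) if m else None
```

For example, "abc**def*" yields "abc**": the match stops at the end of the first run of asterisks.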