How to remove the first ":"-terminated word from each line in a txt file? - regex

Please see my text file data below.
I want to delete the first word of a line, up to and including the first ":".
If a line already starts with sumonkhan, it is fine as is; but if something followed by ":" comes before sumonkhan, that part needs to be removed.
The data in my .txt file looks like this:
Every line contains sumonkhan, so a line that starts with sumonkhan is good; otherwise, delete only the leading word and its ":", not the whole line.

I hope this regex helps. It deletes everything up to and including the first colon (:).
If you are reading a file, read it line by line and run the following regex on each line.
$str = 'roydwk27:teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057';
$str =~ s/^(?:.*?):(.*)/$1/g;
This code is in Perl; you can rewrite the equivalent in any other language.
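As one possible rewrite in another language, here is a minimal Python sketch of the same substitution. The helper names strip_first_field and strip_file are made up for illustration, and the UTF-8 encoding is an assumption about the file.

```python
import re

# Non-greedy match from the start of the line through the first colon.
FIRST_FIELD = re.compile(r"^.*?:")

def strip_first_field(line: str) -> str:
    """Remove everything up to and including the first ':' in one line."""
    return FIRST_FIELD.sub("", line, count=1)

def strip_file(path: str) -> list[str]:
    """Read a file line by line and strip the first field from each line."""
    with open(path, encoding="utf-8") as handle:
        return [strip_first_field(line.rstrip("\n")) for line in handle]

sample = "roydwk27:teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057"
print(strip_first_field(sample))
# prints: teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057
```

A line without any colon is returned unchanged, because the pattern then does not match.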

The pattern, piece by piece:
^ // match the beginning of a line
[\w\d]+ // match one or more word characters (\w already includes digits, so \w+ is equivalent)
: // match ":" literally
( // start of the capturing group
.* // match any characters up to the end of the line
) // end of the capturing group
The first group of each match then holds the text you want to keep. Note the g (global) and m (multiline) modifiers, which are needed when the pattern is applied to the whole file at once.
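A rough Python sketch of applying such a pattern to a whole multi-line string follows. It rests on two assumptions that are not in the answer itself: the negative lookahead (?!sumonkhan:) is added to leave lines that already start with sumonkhan untouched, as the question seems to ask, and the sample lines are invented. re.MULTILINE plays the role of the m modifier, and re.sub replaces every match, like g.

```python
import re

# Assumption: a line that already begins with "sumonkhan:" must be left alone.
LEADING_WORD = re.compile(r"^(?!sumonkhan:)\w+:(.*)", re.MULTILINE)

def drop_leading_words(text: str) -> str:
    """Apply the pattern to every line of a multi-line string."""
    return LEADING_WORD.sub(r"\1", text)

# Made-up sample lines, not the asker's real data.
text = "roydwk27:sumonkhan:abc:123\nsumonkhan:def:456"
print(drop_leading_words(text))
```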


Remove duplicate lines containing same starting text

I have a massive list of numbers in which every line has the same format.
What I am trying to do is remove all duplicate lines that start with the same hex code, regardless of the text after it.
For example, the hex code #976B4B in the first line, #976B4B|B|0|0, shows up again in line 32 as #976B4B|B|0|31. I want all lines EXCEPT the first occurrence to be removed.
I have been attempting to solve this with a regex, and found that replacing ^(.*)(\r?\n\1)+$ with $1 removes duplicate lines, but that is obviously not what I need. I am looking for some guidance, and maybe a chance to learn from this.
You can use the following regex replacement. Make sure you click Replace All as many times as necessary, until no match is found:
Find What: ^((#[[:xdigit:]]+)\|.*(?:\R.+)*?)\R\2\|.*
Replace With: $1
Details:
^ - start of a line
((#[[:xdigit:]]+)\|.*(?:\R.+)*?) - Group 1 ($1, it will be kept):
    (#[[:xdigit:]]+) - Group 2: # and one or more hex chars
    \| - a | char
    .* - the rest of the line
    (?:\R.+)*? - any zero or more non-empty lines, as few as possible (if they can be empty, replace .+ with .*)
\R\2\|.* - a line break, Group 2 value, | and the rest of the line.
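If a script is an option, the same result can be had in a single pass without repeated Replace All. The sketch below is a different technique from the regex replacement above: it remembers each key in a set and keeps only the first line per key. The function name keep_first_per_key and the second sample line are invented for illustration.

```python
def keep_first_per_key(lines, sep="|"):
    """Keep only the first line for each key (the text before the first sep)."""
    seen = set()
    kept = []
    for line in lines:
        key = line.split(sep, 1)[0]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

# "#AD8557|B|0|1" is a made-up line; the other two come from the question.
sample = ["#976B4B|B|0|0", "#AD8557|B|0|1", "#976B4B|B|0|31"]
print(keep_first_per_key(sample))
# prints: ['#976B4B|B|0|0', '#AD8557|B|0|1']
```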

I need help with a regex

I have some MATLAB code, and one of the lines doesn't have a semicolon (;) at its end.
I want to find that line.
For a start:
sad= sdfsdf ; %this is comment
sad = awaww ;
n= sdfdsfd ;
m = (asd + adsf(asd,asd)) %this is comment
Let's say I want to find the 4th line because it doesn't have a (;) at the end of the line.
So far I'm stuck at this:
/(^[-a-zA-Z0-9]+\s*=[-a-zA-Z0-9#:%,_\+.()~#?&//= ]+)(?!;)$/gim
This works fine: it finds only the fourth line.
But what if there is a (;) in the middle of the line, but not at the end or before the comment?
w=sss (;)aaa ; % I don't want this line to be selected
w=sss (;)aaa %I want this line to be selected
Well, let's find all lines which end with a semicolon, optionally followed by horizontal whitespace:
^.+?;[ \t]*
and an optional comment:
^.+?;[ \t]*(?:%.*)?
This expression easily matches all the lines you don't want. So, invert it with a negative lookahead anchored at the end of the line:
^(?!.+?;[ \t]*(?:%.*)?$).+
Unfortunately, that's too easy. It fails to match lines which contain a semicolon in a comment. We could replace .+? with [^%\r\n]+? but this would fail on lines containing a % in a string.
If you need a more robust pattern, you'll have to account for all of this.
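To see both the behaviour and the limitation of the simple inverted pattern, here is a small Python sketch that applies it one line at a time. The function name unterminated_lines is made up, and the last two sample lines are shortened versions of the ones in the question.

```python
import re

# The simple inverted pattern from above, applied one line at a time.
MISSING_SEMICOLON = re.compile(r"^(?!.+?;[ \t]*(?:%.*)?$).+")

def unterminated_lines(code: str) -> list[str]:
    """Return the lines that the pattern flags as lacking a final semicolon."""
    return [line for line in code.splitlines() if MISSING_SEMICOLON.match(line)]

code = """sad= sdfsdf ; %this is comment
sad = awaww ;
n= sdfdsfd ;
m = (asd + adsf(asd,asd)) %this is comment
w=sss (;)aaa ; % not selected
w=sss (;)aaa %selected"""
print(unterminated_lines(code))
# prints: ['m = (asd + adsf(asd,asd)) %this is comment', 'w=sss (;)aaa %selected']
```

A line such as `x = 1 % note;` is not flagged by this version, which is the comment problem described above.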
So let's start the same way, by defining what a "correct" line should look like. I'll use the PCRE syntax for atomic grouping, so you'll have to use perl = TRUE.
A string is: '(?>[^']+|'')*'
Other code (except string, comments and semicolons) is covered by: [^%';\r\n]+
So "normal" code is:
(?>[^%';\r\n]+|'(?>[^']+|'')*'|;)+?
Then, we add the required semicolon and optional comment:
(?>[^%';\r\n]+|'(?>[^']+|'')*'|;)+?;[ \t]*(?:%.*)?$
Finally, we invert all of this:
^(?!(?>[^%';\r\n]+|'(?>[^']+|'')*'|;)+?;[ \t]*(?:%.*)?$).+
And we have the final pattern.
You don't need to fully tokenize the input, you only have to recognize the different "lexer modes". I hope handling strings and comments is enough, but I didn't check the Matlab syntax thoroughly.
You could use this with other regex engines that do not support atomic groups by replacing (?> with (?: but you'll expose yourself to the catastrophic backtracking problem.
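As a hedged illustration of that last point, the sketch below uses the final pattern with every (?> rewritten as (?:, so that it also runs on Python versions before 3.11 (the first release whose re module accepts atomic groups). The sample lines are invented, and on long or malformed input this variant may backtrack heavily, as warned above.

```python
import re

# Robust pattern from above with (?> rewritten as (?: for portability.
ROBUST = re.compile(
    r"^(?!(?:[^%';\r\n]+|'(?:[^']+|'')*'|;)+?;[ \t]*(?:%.*)?$).+"
)

def needs_semicolon(line: str) -> bool:
    """True if the line is not terminated by a semicolon outside strings/comments."""
    return ROBUST.match(line) is not None

# Made-up MATLAB-like lines for illustration.
print(needs_semicolon("x = 1 % note;"))     # True: the only semicolon is in a comment
print(needs_semicolon("s = 'a%b;c' ;"))     # False: % and ; sit inside a string
print(needs_semicolon("s = 'it''s' % hi"))  # True: doubled quote, no terminator
```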

Regular expression to get only the first word from each line

I have a text file
#sp_id int,
#sp_name varchar(120),
#sp_gender varchar(10),
#sp_date_of_birth varchar(10),
#sp_address varchar(120),
#sp_is_active int,
#sp_role int
Here, I want to get only the first word from each line. How can I do this? The spaces between the words may be space or tab etc.
Here is what I suggest:
Find what: ^([^ \t]+).*
Replace with: $1
Explanation: ^ matches the start of line, ([^ \t]+) matches 1 or more (due to +) characters other than space and tab (due to [^ \t]), and then any number of characters up to the end of the line with .*.
In case you might have leading whitespace, you might want to use
^\s*([^ \t]+).*
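Outside an editor, the same find-and-replace could look roughly like this in Python. The function name first_word is made up, the sample lines are taken from the question (one with an added leading tab), and the pattern is applied per line so that \s cannot run across line breaks.

```python
import re

# Optional leading whitespace, then the first run of non-space, non-tab chars.
FIRST_WORD = re.compile(r"^\s*([^ \t]+).*")

def first_word(line: str) -> str:
    """Replace the whole line by its first word, as in the Replace dialog."""
    return FIRST_WORD.sub(r"\1", line)

lines = ["#sp_id int,", "\t#sp_name varchar(120),", "#sp_role int"]
print([first_word(line) for line in lines])
# prints: ['#sp_id', '#sp_name', '#sp_role']
```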
I did something similar with this:
import re

with open('handles.txt', 'r') as handles:
    handlelist = [line.rstrip('\n') for line in handles]
# take the first run of word characters on each line
newlist = [re.findall(r"\w+", line)[0] for line in handlelist]
This reads all the lines of the document into a list, then uses a regex to extract the first word of each line (ignoring whitespace).
My file (handles.txt) contained info like this:
JoIyke - personal twitter link;
newMan - another twitter handle;
yourlink - yet another one.
The code will return this list:
['JoIyke', 'newMan', 'yourlink']
Find What: ^(\S+).*$
Replace with: \1
You can simply use this to get the first word. Here we capture the first word in a group and replace the whole line with the captured group.
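Find the first word of each line with /^\w+/gm. Note that \w only matches letters, digits and underscore, so this finds nothing on lines that begin with another character, such as the # in the sample above.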

Regular Expression: Extract the lines

I am trying to extract name1 (first row), name2 (second row), name3 (third row) and the street name (last row) with a regex:
Company Inc.
Industrieterrein 13
The very last row is the street name and this part is already working (the text is stored in the variable "S2").
REGEXREPLACE(S2, "(.*\n)+(?!(.*\n))", "")
This expression returns the very last line. I am also able to extract the first row:
REGEXREPLACE(S2, "(\n.*)", "")
My problem is that I do not know how to extract the second and third rows.
Also, how do I test whether the text contains one, two, three or more rows?
The regex is used in the context of Scribe (an ETL tool). The problem is that I cannot execute source code; I only have the following functions:
REGEXMATCH(input, pattern)
REGEXREPLACE(input, pattern, replacement)
If the regex language supports lookaheads, you can count rows backwards from the end and thus get (assuming . does not match a newline):
(.*)$ # matching the last line
(.*)(?=(\n.*){1}$) # matching the second last line (excl. newline)
(.*)(?=(\n.*){2}$) # matching the third last line (excl. newline)
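Scribe itself cannot run code, but the lookahead idea is easy to check elsewhere first. Below is a small Python sketch under these assumptions: rows are separated by \n, there is no trailing newline, the function name line_from_end is made up, and the two middle rows of the sample address are invented.

```python
import re
from typing import Optional

def line_from_end(text: str, k: int) -> Optional[str]:
    """Return the line that has exactly k lines after it (k=0 is the last line)."""
    pattern = r"(.*)(?=(?:\n.*){%d}$)" % k
    match = re.search(pattern, text)
    return match.group(1) if match else None

# First and last rows are from the question; the middle rows are made up.
address = "Company Inc.\nPurchasing Dept.\nBuilding 7\nIndustrieterrein 13"
print(line_from_end(address, 0))  # Industrieterrein 13
print(line_from_end(address, 2))  # Purchasing Dept.
print(line_from_end(address, 4))  # None, since there are only four rows
```

The None result hints at one way to test the row count: a pattern that requires k following rows only matches when the text has at least k + 1 rows, which is something a REGEXMATCH-style function could report.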
just use this regex:
. - Wildcard: matches any single character except \n.
+ - Matches the previous element one or more times.
As for a regular expression that will match each of four rows, how about this:
The parentheses capture each row, and \n matches a newline. Note: you may have to use \r\n instead of just \n, depending on the line endings; try both.
You can try the following:

Regex to match all lines starting with a specific string

I have a very long cfg file in which I need to find the last occurrence of a line starting with a specific string. An example of the cfg file:
# format: - search.index.[number] = [search field]:element.qualifier
search.index.1 = author:dc.contributor.*
search.index.12 = language:dc.language.iso
... = ANY
I need to get the last occurrence of a line starting with search.index.[number]; more specifically, I need that number. For the above snippet, that number would be 12.
As you can see, there are other lines too containing that pattern, but I do not want to match those.
I'm using Groovy as a programming/scripting language.
Any help is appreciated!
Have you tried:
def m = lines =~ /(?m)^search\.index\.(\d+)/
m[ -1 ][ 1 ]
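For comparison, the same multiline pattern can be tried in Python; this is only a sketch for checking the regex, not a Groovy replacement. The function name last_search_index is made up, and the config text is the snippet from the question.

```python
import re

def last_search_index(config_text: str):
    """Return the number from the last line starting with search.index.N."""
    numbers = re.findall(r"(?m)^search\.index\.(\d+)", config_text)
    return int(numbers[-1]) if numbers else None

config = """# format: - search.index.[number] = [search field]:element.qualifier
search.index.1 = author:dc.contributor.*
search.index.12 = language:dc.language.iso
"""
print(last_search_index(config))
# prints: 12
```

The comment line is skipped because ^ anchors the match to the start of a line. Note that this returns the number of the last matching line, not the highest number in the file.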
Try this as your expression :
And then with Groovy you can get your result with:
Here is an explanation page.
I don't think you should go for it but...
If you can do a multi-line search (and you have to here anyway), the only way would be to read the file backward. So first, eat everything with a greedy .* (om nom nom), provided you can make the dot match newlines; use (?:.|\s)* if you can't. Then match your pattern search\.index\.(\d+). Since you want this pattern to match only at the beginning of a line, put (?:^|\n) in front of it (hoping you're not using some crazy format that doesn't use \n as the newline character).
The number should be in the 1st matching group. (Test in JavaScript)
PS: I don't know groovy, so sorry if it's totally not appropriate.
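A rough Python check of this greedy "read backward" idea is sketched below, assuming \n line endings. re.DOTALL makes the dot match newlines, and the function name last_index_greedy is invented for illustration.

```python
import re

# Greedy .* swallows the whole text, then backtracks to the last line start
# that is followed by search.index.<digits>.
BACKWARD = re.compile(r".*(?:^|\n)search\.index\.(\d+)", re.DOTALL)

def last_index_greedy(config_text: str):
    match = BACKWARD.match(config_text)
    return match.group(1) if match else None

config = """# format: - search.index.[number] = [search field]:element.qualifier
search.index.1 = author:dc.contributor.*
search.index.12 = language:dc.language.iso
"""
print(last_index_greedy(config))
# prints: 12
```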
This should also work: