I'm a bit new to regex and am looking to search for multiple lines/instances of some wildcard strings such as *8768, *9875, *2353.
I would like to pull all instances of these (within one file) rather than searching them individually.
Any help is greatly appreciated. I've tried things such as *8768,*9875 etc...
If I understand what you are asking, it is a regular expression like this:
^(8768|9875|2353)
This matches any of the three digit strings, but only at the beginning of a line.
To get the lines that contain the texts 8768, 9875 or 2353, use:
^.*(8768|9875|2353).*$
What it means:
^ from the beginning of the line
.* get any character except \n (0 or more times)
(8768|9875|2353) if the line contains the string '8768' OR '9875' OR '2353'
.* and get any character except \n (0 or more times)
$ until the end of the line
If you do want the literal * char, you'd have to escape it:
^.*(\*8768|\*9875|\*2353).*$
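To see the "lines that contain" pattern in action outside an editor, here is a minimal Python sketch. The sample text and the function name matching_lines are made up for illustration; only the three digit strings come from the question.

```python
import re

# Hypothetical sample text; only the three numbers come from the question.
TEXT = """order 118768 shipped
nothing to see here
ref *9875 pending
id 2353-A closed
"""

# One alternation matches any of the three digit strings anywhere in a line.
# re.MULTILINE makes ^ and $ anchor at every line, not just the whole text.
PATTERN = re.compile(r"^.*(8768|9875|2353).*$", re.MULTILINE)


def matching_lines(text):
    """Return every full line that contains one of the target strings."""
    return [m.group(0) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for line in matching_lines(TEXT):
        print(line)
```

Because . never matches a newline here, each match stays within a single line, so all three numbers are picked up in one pass over the file.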
I suggest a much better solution. The task in my case: add the http://google.com/ path before each record and import multiple fields.
CSV single field value (all images just have filenames, separated by |):
"123.jpg|345.jpg|567.jpg"
Tamper 1st plugin: find and replace by REGEXP:
pattern: /([a-zA-Z0-9]*)./
replacement: http://google.com/$1
Tamper 2nd plugin: explode
setting: explode by |
In this case you don't need any additional field mappings and can use one field in the CSV.
I want to match two lines like the following using a Regular Expression:-
abcmnoxyz
=========
The first line is essentially random, the second line will be all the same character of a limited number of possibles (=, - and maybe a couple more). The lines can probably be required to be the same length but it would be nice if they didn't have to be. It would be OK to have multiple REs, one for each possible 'underline' character.
Can anyone come up with a way to do this?
This regex should do what you're trying to do :
regex = "(.*)\n(.)\2{2,}$"
Group 1 will give you the line before the repeated-character line.
EXPLANATION
(.*)\n: match anything followed by a new line
(.)\2{2,} : capture one character, then check that it is followed by the same character at least 2 more times. You don't need to worry about which character is repeated.
If only a known set of characters can be repeated, you can use a character set such as [=-] instead of the dot (.)
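The backreference idea above can be sketched in Python roughly as follows. This is only an illustration: it uses the [=-] character set instead of the dot, and the sample text and the name find_underlined are invented for the example.

```python
import re

# A line of any text, then a line made of one repeated '=' or '-'
# (at least three in a row, mirroring \2{2,} in the pattern above).
HEADING = re.compile(r"^(.*)\n([=-])\2{2,}$", re.MULTILINE)

# Hypothetical sample: two underlined lines and one mixed "underline".
SAMPLE = "intro\nabcmnoxyz\n=========\nbody text\nnotes\n-----\nmixed\n=-=-=\n"


def find_underlined(text):
    """Return the text lines that are directly followed by an underline."""
    return [m.group(1) for m in HEADING.finditer(text)]


if __name__ == "__main__":
    print(find_underlined(SAMPLE))
```

The mixed line =-=-= is not treated as an underline, because \2 only repeats whichever single character group 2 captured.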
Use Grep's -B Flag
Matching with Alternation
Given your example, you can use extended regular expressions with alternations and a range operator. The -B flag tells grep how many lines before the match to include in the output.
$ grep -E -B1 '^(={5,}|-{5,})$' sample.txt
abcmnoxyz
=========
You can add alternations for additional characters if you want, although boundary markers ought to be as consistent as you can make them. You can also adjust the minimum number of sequential characters required for a match to suit your needs. I used a five-character range in the example because that's what was posted as the criterion in your original topic sentence, and because a shorter boundary marker is more likely to accidentally match truly random text.
Matching with a Character Class
Also, note that the following does the same job, but is a bit more concise. It uses a character class and a backreference to avoid alternations, which can get messy if you add many more boundary characters. Both versions are equally effective at matching your example.
$ grep -E -B1 '^([=-])\1{4,}$' sample.txt
abcmnoxyz
=========
A regex like this
^([^=\v]+)\v=+$
will do. Check it out at example 1
Explanation:
^([^=\v]+) # 1 or more matches of anything that is not a '=' or vertical space \v
\v=+$ # match a vertical space followed by 1 or more '='
If you want to extend this to more characters like '-' you could do this:
^([^=\-\v]+)\v(-|=)\2+$
Look at example 2
And, thanks to Ashish Ranjan, suppose you wanted to have = and/or - on the first line, use something like this:
^(.+)\v(-|=)\2+$
which would even allow you to have a first line like "=====". Having my doubts if OP had this in mind, though. Look at example 3
Hope this works
^([a-z]{1,})\n([=-]{1,})
You may have to try both \n and \r, depending on the file format (Unix or DOS).
\1 will give you the first line
\2 will give you the second line
If the same pattern occurs repeatedly in the text, you may get a lot of matches.
This answer works regardless of the number of characters in a line.
I have exported a database table. I need to replace the spaces in the image file names with underscores and would like to use Notepad++ and regex to do so. I have:
'data/green tea powder.jpg'
'data/prod_img/lumina herbal shampoo.JPG'
'data/ALL GREEN HERBS.jpeg'
'data/prod_img/PSORIASIS KIT (640x530) (2).jpg'
and need to make them look like this:
'data/green_tea_powder.jpg'
'data/prod_img/lumina_herbal_shampoo.JPG'
'data/ALL_GREEN_HERBS.jpeg'
'data/prod_img/PSORIASIS_KIT_(640x530)_(2).jpg'
I just want to change the spaces between the quotes (I don't want to change the capitalization). To be more specific I would like to replace any and all spaces between 'data/ and ' because there are other spaces between quotes in the DB, for example:
'data/ REPLACE ANY SPACE HERE '
I found this:
\s(?!(?:[^']*'[^']*')*[^']*$)
but there are other places where there are spaces between quotes, so I'd like to search for data/ at the beginning and not just a single quote, but I can't figure out how. I tried \s(?!(?:[^'data\/]*'[^']*')*[^']*$) but it didn't work and I am not familiar enough with regex to make it do so.
An example of a full line from the database is:
(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),
I don't want to replace the spaces in the date and time stamp at the end of the line, just the ones in the image file names.
Thanks in advance for your help!
You have to use a \G based pattern to ensure that matches are contiguous.
search: (?:\G(?!^)|'data/)[^' ]*\K[ ]
replace: _
The first match uses the second branch of the alternation, then the next matches are contiguous and use the first branch.
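If you ever need to do the same thing in a script: Python's built-in re module supports neither \G nor \K, so the sketch below swaps in a different technique. It matches each whole 'data/...' string and replaces the spaces inside it with a callback. The function name underscore_data_paths is made up for the example, and it assumes the file names contain no single quotes.

```python
import re

# A single-quoted string that starts with data/ (assumes no quotes inside it).
DATA_PATH = re.compile(r"'data/[^']*'")

# Full sample row from the question.
ROW = "(712, 'GRTE-P', '', 'data/green tea powder.jpg', '2014-03-12 22:52:03'),"


def underscore_data_paths(line):
    """Replace spaces with underscores only inside 'data/...' strings."""
    return DATA_PATH.sub(lambda m: m.group(0).replace(" ", "_"), line)


if __name__ == "__main__":
    print(underscore_data_paths(ROW))
```

The other quoted values, including the date and time stamp, are left alone because they do not start with 'data/.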
I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is that the User column can also contain a |, to make it even worse.
I can use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open for suggestions on how to do this with another program, command line utility, either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Replace $1
Basically, anchor to the end of the line and remove the two columns just before the last one. (I assume columns can't be empty. Replace + with * if they can.)
If you need finer detail than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
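For what it's worth, a Python script along those lines could be as small as the sketch below. It splits from the right instead of using a regex, so a | inside the user name does not matter. The function names are invented for the example, and it assumes every line has at least four fields (otherwise the unpacking raises ValueError).

```python
def drop_two_before_last(line, sep="|"):
    """Drop the two fields just before the last one, counting from the right."""
    # rsplit works from the end of the line, so extra separators inside
    # the user name (third sample line in the question) are left alone.
    head, _fifth, _sixth, last = line.rstrip("\n").rsplit(sep, 3)
    return f"{head}{sep}{last}"


def convert(lines):
    """Yield converted lines one at a time, so a large file can be streamed."""
    for line in lines:
        yield drop_two_before_last(line)


if __name__ == "__main__":
    sample = [
        "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1\n",
        "8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389\n",
    ]
    for out in convert(sample):
        print(out)
```

Passing an open file object to convert processes the file line by line, which should cope with a few million lines without loading everything into memory.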
I've captured three groups and given them names. If you use a replace utility like sed or Vim, you can replace the remove group with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
From Notepad++ hit ctrl + h then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex explanation:
\|\d+ : match 1st string that starts with | followed by number
\|\d+ : match 2nd string that starts with | followed by number
(\|[0-9a-z]+): match and capture the string after the 2nd number.
$ : This forces the regex to match at the end of the line.
Replacement:
$1 : replace the matched text with the contents of the captured group, that is, whatever matched between the parentheses (\|[0-9a-z]+)
I am trying to extract a report of all incidents matching a certain pattern and then need to plot how many occurrences there are of each type. For example, the lines below.
File: ../../../transfer/200.FILETYPE1.0000003115.20160419-082708-089.xml successfully imported.
some other logs....
File: ../../../transfer/200.FILETYPE1.0000003116.20160419-082708-090.xml successfully imported.
some other logs...
File: ../../../transfer/201.FILETYPE2.0000003117.20160419-082708-091.xml successfully imported.
Please note that there are many filetypes but the pattern is same "/transfer/" prefix and "successfully imported." suffix and these prefix and suffix must match as other lines may also contain same file name before completion.
So in the above case I need to find all occurrences of such lines and get the count of each of FILETYPE1 and FILETYPE2 in Splunk.
Can someone help me with regex that can match above pattern and give me all such lines so that I can extract counts of each file type?
Straightforward:
^File:.*FILETYPE\d.*$
# ^ beginning of the line
# File: literally
# .* anything, up to the next part of the pattern
# FILETYPE + a number literally
# .* anything afterwards
# $ the end of the line
See a demo on regex101.com.
Hint: If you only have these two strings (FILETYPE1 and FILETYPE2) you might be faster with string functions only.
Edit FILETYPE1/FILETYPE2 for counting
\.\.\/.*\/\d+\.FILETYPE1\..*?\.xml
Try this one:
\/transfer\/.*FILETYPE(\d+).*successfully imported
The file type number will be captured by the capture group, so you can count the file occurrences
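Outside Splunk, the counting step could be sketched in Python like this. It is only an illustration of the capture-and-count idea: the pattern is a slightly tightened variant of the one above that captures the whole FILETYPEn token, and the function name count_filetypes is made up.

```python
import re
from collections import Counter

# Require the /transfer/ prefix and the "successfully imported" suffix,
# and capture the file type token between the numeric prefix and the next dot.
IMPORTED = re.compile(r"/transfer/\d+\.(FILETYPE\d+)\..*successfully imported")

# Sample log lines from the question.
LOG = [
    "File: ../../../transfer/200.FILETYPE1.0000003115.20160419-082708-089.xml successfully imported.",
    "some other logs....",
    "File: ../../../transfer/200.FILETYPE1.0000003116.20160419-082708-090.xml successfully imported.",
    "some other logs...",
    "File: ../../../transfer/201.FILETYPE2.0000003117.20160419-082708-091.xml successfully imported.",
]


def count_filetypes(lines):
    """Count successfully imported files per file type."""
    counts = Counter()
    for line in lines:
        match = IMPORTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for filetype, n in sorted(count_filetypes(LOG).items()):
        print(filetype, n)
```

Lines that mention the same file name without the "successfully imported" suffix are skipped, which is the behavior the question asks for.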
I'm trying to create a single regex pattern to match a string where 2 fields (separated by a comma) could either be
a) empty,
b) a single word, or
c) 2 words separated by a backslash (\).
This is a log file where position 1 is a source username field and position 2 is a destination user field, but both could be separated with a backslash if domain name is present (domain\username)
I've tried everything I can think of and can get 2 out of 3 to match, but not all conditions. Below are the possible variants that this string could be in. (something1 and something2 are known patterns that occur before and after this condition)
something1,,,something2
something1,,dstuser,something2
something1,,dstdomain\dstuser,something2
something1,srcdomain\srcuser,,something2
something1,srcdomain\srcuser,dstdomain\dstuser,something2
something1,srcuser,dstdomain\dstuser,something2
something1,srcuser,dstuser,something2
something1,srcuser,,something2
something1,srcdomain\srcuser,dstuser,something2
something1,srcdomain\srcuser,dstdomain\dstuser,something2
For example, I've tried this:
^.*something1,(,|(?J)(?<src_username>[^\\]*),|(?<src_domain>.*?)\\(?<src_username>[^\\]*),).*?,something2*
This matches some of the time, but I'm curious whether this is possible with a single line of regex.
Thanks in advance....
I think you are looking for this regex:
(?J)^.*something1,(?:,|(?<src_username>[^,\\]+),|(?<src_domain>[^,\\]+)\\(?<src_username>[^,\\]+),)(?:,|(?<dst_username>[^\\,]+),|(?<dst_domain>[^,\\]+)\\(?<dst_username>[^,\\]*),)something2.*
Check the demo
I am using the negated character class [^,\\] extensively so as not to overmatch and to stay within the boundaries of a "cell". I also use (?:...) non-capturing groups to avoid cluttering the captured groups, which helps keep the output clean.
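If your regex flavor lacks the (?J) duplicate-names option (Python's built-in re module, for example, rejects duplicate group names), one possible workaround is to make the domain part optional instead of using three alternatives. The sketch below is a simplified variant, not the exact pattern above: it is a bit looser, since it would also accept a domain followed by an empty user name. The name parse is made up for the example.

```python
import re

# Each field is: an optional "domain\" prefix, then a possibly empty user name.
# [^,\\] keeps every piece inside its own comma-separated cell.
FIELDS = re.compile(
    r"something1,"
    r"(?:(?P<src_domain>[^,\\]+)\\)?(?P<src_username>[^,\\]*),"
    r"(?:(?P<dst_domain>[^,\\]+)\\)?(?P<dst_username>[^,\\]*),"
    r"something2"
)


def parse(line):
    """Return the four named fields as a dict, or None if the line does not match."""
    match = FIELDS.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse(r"something1,srcdomain\srcuser,dstuser,something2"))
    print(parse(r"something1,,,something2"))
```

A missing domain comes back as None and an empty user name as an empty string, so the ten variants listed in the question all go through the same pattern.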