Using regex to extract string after certain number of characters

How do I extract the string after certain number of characters?
Eg. 0 HOPOPT IPv6 Hop-by-Hop Option Y [RFC2460]
I want to select the highlighted part, which is every character after the 57th character. I need to do a replace in the Sublime Text editor and have to select specifically the highlighted portion. How can I do that?

This should work:
.{57}(.*)
Everything after the 57th character will be captured in group 1.
Alternatively, if your platform supports it, you can use a lookbehind:
(?<=.{57}).*
In Sublime Text, press Ctrl+F to bring up the find panel, enable the regular expression option (the .* button), and enter one of the regular expressions above. Note that what you say you want to highlight actually appears to begin at the 32nd character, so you may need to adjust the count in the expression.
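Outside the editor, both patterns can be checked with a short script. This is a minimal sketch in Python using its built-in re module; the helper name text_after and the sample line (57 filler characters plus a tail) are made up for illustration and are not from the question.

```python
import re

def text_after(line, n):
    """Return everything after the first n characters, or None if the line is shorter than n."""
    m = re.match(r".{%d}(.*)" % n, line)
    return m.group(1) if m else None

# Hypothetical sample: 57 filler characters followed by the part we want.
sample = "x" * 57 + "[RFC2460]"

# Capturing-group variant: the wanted text is in group 1.
captured = text_after(sample, 57)

# Lookbehind variant: the whole match is the wanted text, no group needed.
lookbehind = re.search(r"(?<=.{57}).*", sample).group(0)

print(captured)
print(lookbehind)
```

The lookbehind form is convenient in an editor because the selection covers only the wanted text, whereas the group form also matches (and would select) the first 57 characters.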

Related

Select specific text in specific column of CSV with variable text length

I'm working on a CSV file in Notepad++, and I need to match a specific occurrence of a set number of characters in the text. Example data:
19256506_1.MSG,19256506,1.MSG,RE: New Consent Language,
19256505_1.MSG,19256505,1.DOCX,RE: New Consent Language,
19256433_1.MSG,19256433,1.MSG,RE: New Consent Language,
What I need to select is the file extension in the 3rd column, leaving only the number. The problem is that it could be .MSG, .DOCX, .PDF, etc. Basically, I need to select anything in the 3rd column after and including the ., but up to and excluding the next ,.
How can I match this using regex?

You could use
^[^,]*,[^,]*,[^.,]*(\.[^,]+)
Use the first group, see a demo on regex101.com.
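To see what the first group holds, here is a small Python sketch run against the example rows from the question. The variable names are illustrative, and the final substitution (which drops the extension and keeps the number) is my own variation on the pattern, not part of the answer.

```python
import re

# The pattern from the answer: skip two columns, then capture from the dot
# up to (but not including) the next comma.
EXT = re.compile(r"^[^,]*,[^,]*,[^.,]*(\.[^,]+)")

rows = [
    "19256506_1.MSG,19256506,1.MSG,RE: New Consent Language,",
    "19256505_1.MSG,19256505,1.DOCX,RE: New Consent Language,",
    "19256433_1.MSG,19256433,1.MSG,RE: New Consent Language,",
]

# Group 1 is the extension in the 3rd column of each row.
extensions = [EXT.match(row).group(1) for row in rows]

# Variation (an assumption about the goal): capture everything before the
# extension instead, so that replacing with \1 leaves only the number.
STRIP = re.compile(r"^((?:[^,]*,){2}[^.,]*)\.[^,]+")
cleaned = [STRIP.sub(r"\1", row) for row in rows]

print(extensions)
print(cleaned[0])
```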

Press ctrl + h (or Search > Replace) and try replacing:
(?s)^(?:[^,]*,){2}[^,]*(\.[^,]+).*?$
With
\1
Make sure Search Mode is set to Regular Expression
Hit Replace All

Find and Replace with Regex in Microsoft Word 2013

I am editing an e-book document with a lot of unnecessary markup. I have a number of sections in the text with code similar to this:
<i>Some text here</i>
I am trying to run a regex find and replace that will find any phrase between the two i-tags, remove the i-tags, and apply a style to the text.
Here is what I'm using to search:
Find: (<i>)(*)(</i>)
Replace: \2
I'm also selecting Styles > i (for italic). This tells our conversion software to apply italics to the text. If I leave the i-tags, what ends up happening is ScribeNet's conversion process converts them to hex-values so that they show up as literal text in the e-book. Messy.
When I run this search, I get no results. I have "use wildcards" checked. What am I missing? According to Microsoft's help website, * is used to represent any number or type of characters, and individual strings are supposed to be enclosed in parentheses.

To search for a character that's defined as a wildcard, place a backslash (\) before that character. The * itself matches any string of characters, so add the range quantifier {1,} to require one or more of them:
Find: \<i\>(*{1,})\</i\>
Replace: \1

Search for \<i\>(*{1,})\</i\> and replace with \1. Don't forget to check "Use wildcards".
There is a reference table for Word's "regular expressions" here: http://office.microsoft.com/en-ca/word-help/find-and-replace-text-by-using-regular-expressions-advanced-HA102350661.aspx
< and > are special characters that need to be escaped
* means any character
{1,} means one or more times
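For comparison, here is the same find-and-replace in standard regular expression syntax rather than Word's wildcard syntax. This is only a sketch in Python with an invented sample string; applying a Word style to the matched text is specific to Word and is not reproduced here.

```python
import re

# Hypothetical sample text containing two italic spans.
sample = "Plain <i>Some text here</i> and <i>more</i>."

# In standard regex, < and > need no escaping. The lazy .+? stops at the
# nearest closing tag, so each <i>...</i> pair is handled separately.
stripped = re.sub(r"<i>(.+?)</i>", r"\1", sample)

print(stripped)
```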

There is a special tool for Microsoft Word called Multiple Find & Replace (see http://www.translatortools.net/products/transtoolsplus/word-multiplefindreplace) which lets you work around Word's wildcard limitations. This tool can use standard regular expression syntax to search and replace any text within a Word document. For example, to search for any HTML tags, you can just use <[^>]+>, which will find opening, closing and standalone HTML tags. You can add any number of expressions to a list and then search the document for all of them, replace everything, see all matches for all the search expressions entered, replace only selected matches, and a few more things.
I created it for translators and editors, but it is great for any advanced search/replace operations in Word, and I am sure you will find it very useful.
Stanislav

Remove everything before and after variable=int

I'm terrible at regex and need to remove everything from a large portion of text except for a certain variable declaration that occurs numerous times. I'd like to remove everything except for instances of mc_gross=anyint.

Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (not sure about notepad++, and even then probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using notepad++, I'd recommend selecting your target, then inverting the selection.
This will select each instance, allowing for optional white space either side of the '=' sign.
mc_gross\s*=\s*\d+
The following answer over on Super User explains how to use bookmarks in Notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex they're using over there, with the one above.
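If the text can be processed outside Notepad++, selecting the targets directly avoids the inversion step altogether. A minimal Python sketch, assuming a made-up input string:

```python
import re

# Hypothetical input; only the mc_gross assignments matter.
text = "txn_id=9XJ&mc_gross = 25&payer=abc\nmc_fee=1&mc_gross=130&x=y"

# Same pattern as above: optional whitespace on either side of the '=' sign.
matches = re.findall(r"mc_gross\s*=\s*\d+", text)

print(matches)
```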

You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained. In the search the ^.*\b matches from start-of-line to a word boundary before the wanted text; the \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13 then delete the \bs from the search.
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
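The two steps above can be sketched in Python as follows. The input is invented for illustration, and re.MULTILINE is used so that ^ and $ match at each line, as they do in the editor.

```python
import re

# Hypothetical input: two lines with the wanted text, one without.
text = "a=1&mc_gross=25&b=2\nno match here\nx mc_gross = 7 y"

# Step 1: on each line that contains the wanted text, keep only that text.
step1 = re.sub(r"^.*\b(mc_gross\s*=\s*\d+)\b.*$", r"\1", text, flags=re.MULTILINE)

# Step 2: drop every line that is not exactly the wanted text
# (the equivalent of "Remove unmarked lines").
kept = [line for line in step1.splitlines()
        if re.fullmatch(r"mc_gross\s*=\s*\d+", line)]

print(step1)
print(kept)
```

Because the leading .* is greedy, a line with several occurrences keeps the last one.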

Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
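The same replacement can be tried in Python to see what it leaves behind. This is a sketch with an invented input; note that in this run the surviving matches end up directly next to each other, because everything between them (including line breaks) is consumed.

```python
import re

# Hypothetical input spanning two lines.
text = "foo mc_gross=12 bar\nmc_gross=34 baz"

# Lazily consume anything up to the next mc_gross=<digits>, or up to the
# end of the text (\Z), and keep only what the group captured.
result = re.sub(r"[\s\S]*?(mc_gross=\d+|\Z)", r"\1", text)

print(result)
```

If a separator is wanted between the kept items, the replacement string could append one, for example a newline after \1.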

Including Regular Expressions in AutoHotKey Script

I am currently developing a very "simple" script in AutoHotKey, but it involves using hotstrings following the format:
::btw::by the way
which would detect whenever a user types "btw" and replace it with "by the way".
However, whenever I try to put a regular expression in between the colons, it interprets it literally. Is there any way to use regular expressions with hotstrings? Workarounds are accepted.

Hotstrings don't natively support RegEx, but there is RegEx Powered Dynamic Hotstrings, which I've never tried.
Your other option is a Loop with the Input command inside of it. That would require an end character, such as space. Then you would have the script analyze what the Input command returns with RegExReplace.
Place the number in the regular expression in a capturing group and use it as a back-reference in the replacement. But unless the pattern always has the digit in the same place I think it would require two steps (with RegExMatch) as shown in this working example:
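loop
{
    ; Collect typed input until space is pressed (V keeps the keystrokes visible)
    Input, retrieved, V, {space}
    ; Step 1: find a run of six alphanumeric characters
    RegExMatch(retrieved, "[a-zA-Z0-9]{6}", match)
    ; Step 2: pull the first digit out of that run
    RegExMatch(match, "\d", output)
    ; Erase the six characters plus the space, then type the digit
    If (output != "")
        Sendinput, {bs 7}%output%
}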
Type any sequence of six with five letters and one digit,
press space and it will replace the sequence with only the number.
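The two-step matching logic in the script can be checked on its own, away from the keyboard hooks. Here is a rough Python sketch of just that logic; the function name digit_from_token is hypothetical, and nothing here sends keystrokes or replaces typed text.

```python
import re

def digit_from_token(token):
    """Mirror the two RegExMatch calls: find a six-character alphanumeric run, then the first digit in it."""
    run = re.search(r"[a-zA-Z0-9]{6}", token)
    if run is None:
        return None
    digit = re.search(r"\d", run.group(0))
    return digit.group(0) if digit else None

print(digit_from_token("abc4de"))
print(digit_from_token("abcdef"))
```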

Regular expression question

I have some text like this:
dagGeneralCodes$_ctl1$_ctl0
Some text
dagGeneralCodes$_ctl2$_ctl0
Some text
dagGeneralCodes$_ctl3$_ctl0
Some text
dagGeneralCodes$_ctl4$_ctl0
Some text
I want to create a regular expression that extracts the last occurrence of dagGeneralCodes$_ctl[number]$_ctl0 from the text above.
the result should be: dagGeneralCodes$_ctl4$_ctl0
Thanks in advance
Wael

This should do it:
.*(dagGeneralCodes\$_ctl\d\$_ctl0)
The .* at the front is greedy so initially it will grab the entire input string. It will then backtrack until it finds the last occurrence of the text you want.
Alternatively you can just find all the matches and keep the last one, which is what I'd suggest.
Also, specific advice will probably depend on what language you're doing this in. In Java, for example, you will need to use DOTALL mode to make . match newlines, because ordinarily it doesn't. Other languages have their own names for this mode (Perl and .NET call it single-line mode, while Ruby calls it multiline mode). JavaScript has a slightly different workaround for this, and so on.

You can use:
[\d\D]*(dagGeneralCodes\$_ctl\d+\$_ctl0)
I'm using [\d\D] instead of . to make it match new-line as well. The * is used in a greedy way so that it will consume all but the last occurrence of dagGeneralCodes$_ctl[number]$_ctl0.
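The approaches from the answers can be compared side by side. The following is a minimal Python sketch using the sample text from the question; in Python the mode that lets . match newlines is re.DOTALL, and the variable names are illustrative.

```python
import re

text = "\n".join([
    "dagGeneralCodes$_ctl1$_ctl0", "Some text",
    "dagGeneralCodes$_ctl2$_ctl0", "Some text",
    "dagGeneralCodes$_ctl3$_ctl0", "Some text",
    "dagGeneralCodes$_ctl4$_ctl0", "Some text",
])

TARGET = r"dagGeneralCodes\$_ctl\d+\$_ctl0"

# Greedy .* with DOTALL: grabs everything, then backtracks to the last occurrence.
greedy = re.search(r".*(" + TARGET + r")", text, re.DOTALL).group(1)

# [\d\D] matches any character including newlines, so no flag is needed.
char_class = re.search(r"[\d\D]*(" + TARGET + r")", text).group(1)

# Find all matches and keep the last one.
last = re.findall(TARGET, text)[-1]

print(greedy, char_class, last)
```

Finding all matches and taking the last avoids the backtracking over the whole input that the greedy forms rely on.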

I really like using this Regular Expression Cheatsheet; it's free, a single page, and printed, fits on my cube wall.
