In Notepad++, I want to select text up to a certain text match, including the match.
The txt file I am working with contains a lot of text, including whitespace, line breaks and some special characters. In this text, there are characters that mark an end. Let's call these stop characters "ZZ." for now.
Using RegEx, I tried to create an expression that finds the next "ZZ." and selects everything before it. This is what it looks like:
+., \c ZZ.\n
But I seem to have gotten something wrong. As it is similar to this
problem, I tried to use their RegEx with a slight modification. Here is a picture so you can see what I'd like to accomplish:
Find the next stop marker, select the marker and everything before it.
In the actual file, the stop marker is "გვ."
If I want to use those, maybe I need to change the RegEx even more, as those are not ASCII characters? Like so, as stated in the RegEx Wiki?
\c+ (\x{nnnn}\x{nnnn}.)\n
Not quite sure if the \c works that way. I have seen expressions that use something like (A-Za-z)(0-9) but this is a different alphabet.
To match any text up to and including some pattern, use .*? (to match any zero or more characters, as few as possible) with the ". matches newline" option ON and add the გვ after it, for example .*?გვ\. (the escaped \. matches the literal period that ends the marker).
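For anyone who wants to try this outside Notepad++, here is a minimal Python sketch of the same idea. The sample text is made up for illustration, and re.DOTALL is assumed to be the counterpart of the ". matches newline" checkbox:

```python
import re

# Hypothetical sample text; "გვ." is the stop marker from the question.
text = "first line\nsecond line გვ.\nthird line გვ.\n"

# re.DOTALL lets "." match line breaks as well.
# ".*?" is lazy, so the match stops at the first marker it reaches.
pattern = re.compile(r".*?გვ\.", re.DOTALL)

match = pattern.search(text)
first_chunk = match.group(0) if match else ""
print(repr(first_chunk))
```

Because the quantifier is lazy, only the text up to the first marker is selected; a greedy .* would run on to the last marker in the file.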
I'm probably going to get pilloried for asking this question, but after searching and trying to figure out this regex on my own, I'm just tired of wasting time on it. Here's the problem I'm trying to solve. I frequently use EditPad Pro to convert character strings so they will fit into a mainframe.
For instance, I want to convert a column of words from excel into an IN clause for sql. The column is 5000 words or so.
I can easily copy and paste that into the text editor and then using find and replace convert that from a column of words to a single row with ',' separating each word.
Once that's done, though I want to use a regex to split this row before or after a comma after 70 characters have gone by.
(?P<start>^.{0,70})
This will give me the first 70 characters, but then I get stuck as I can't figure out how to create the next group to find all the characters up to the next comma so I can refer to it like this
(?P<start>^.{0,70})(?P<next>????,)
If I could get that, then I could do a find and replace that would break the row after the first comma that appears after the 70th character.
I know given the rest of the day I could figure it out, but I need to move on. I've tried this before. I would even be willing to only find the first 70 characters and then the next few characters until the comma, and then have to repeat the find and replace multiple times if necessary, but I cannot get the regex to work.
Any assistance with this would be greatly appreciated.
Here is some sample data that I have added line breaks into as an example of what I want it to look like after the regex runs.
'Ability','Absence','Absolute','Absorb','Accident','Acclaim','Accompany',
'Accomplish','Achievement','Acquaintance','Acquire','Across','Acting','Address',
'Admire','Adorable','Advance','Advertisement','Afraid','Agriculture','Align',
'All','Allow','Allowance','Allowed','Alone','Aluminium','Always','America',
'Analyze','Android','Angle','Announce','Annual','Ant','Antarctica','Antler',
I think you should consider restricting your initial concatenation, but here's a solution to your specific implementation:
^.{0,70}[^,]*
This will select the first 70 characters (if available), then every character up to the one before the next comma.
I don't think you need groups here, but you can obviously add them to the regex:
(?P<start>^.{0,70})(?P<next>[^,]*)
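If it helps to see the whole wrap done in one pass, here is a rough Python sketch built on the same idea. The helper name wrap_after_comma and its width parameter are made up for this example, and it assumes the values themselves never contain a comma:

```python
import re

def wrap_after_comma(row: str, width: int = 70) -> str:
    """Break a comma-separated row after the first comma past `width` characters."""
    # Exactly `width` characters, then everything up to and including the
    # next comma; a newline is appended after each such chunk.
    pattern = re.compile(r"(.{%d}[^,]*,)" % width)
    return pattern.sub(r"\1\n", row)

# Small width so the effect is easy to see.
row = "'aa','bb','cc','dd','ee'"
print(wrap_after_comma(row, 6))
```

Each output line ends up slightly longer than the width, because the break only happens at the first comma after that many characters. The last piece is left alone when it is shorter than the width or has no further comma.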
Shortform: searching:
"{,[0-9][0-9]," inserting Space+00... getting replaced string segment:
"{,SPACE00[0-9][0-9]," or other so-garbaged data for found [0-9][0-9] sequence ... so how do I search with a regex and insert in the middle???
Longform question:
I'm trying to do a series of simple character insertions -- digits actually -- in a series of mixed model CSV profiling data (five files each with different model parameters, several hundred lines each).
I'm visually challenged and want to insert padding characters to columnize the data, so I can focus on tweaking key values instead of keeping my place from data file to data file.
The CSV data lines have this format:
*Variable_symbolic-name*,{##,##,* ... ('Set of CSV Numerical Data lists' ...},\n*
an actual data line:
61,parameter17,{,70,6,1,-1,3, 00,0,0,0,0,},,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
to be morphed to:
61,parameter17,\t\t{, 0070,6,1,-1,3, 00,0,0,0,0,},,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Give or take a tab character to align all the { numeric field starts...
I've found searching: "{,[0-9][0-9]," failed but "\{,[0-9][0-9]," succeeds for the find part of the search and replace operation... but have hit a proverbial brick wall in how to do the actual replace (with an insert) of such a short length. (Obviously with so many parameters and files, I'm moving cautiously!)
However, this Perl Help tutorial leaves me in the dark as to how to keep the found ranges and insert padding before them (space, zero, zero to be specific if positive, '-00' if negative). In short, I need to know how to insert 2-3 places in the replace field in Notepad++... and retain the original data without prejudicing it!
Articles herein have cited replacing paragraphs and lines, adding newlines, etc. but this simple insertion alteration seems too simple for you all. But it's been several hours of frustration for me!
Thanks! // Frank
Resolved:
Good news: ({,)([0-9][0-9],) and \1 xx\2 works fine as does ({,)(#[0-9][0-9],) and replacing with \1 xx#\2 ... whether or not tabs are utilized. Obviously the key was ([0-9][0-9],) which included the discrimination of the comma... though I have no idea why that seemed to fail an hour ago with trials made using Sobrinho's help. Must have not tried the sequence. Thanks all!
Try to type this in the search box:
(.+)(\{,[0-9][0-9].*)
And in the replace:
\1\t\t\2
When you put things between parentheses, they are "stored" by Notepad++ and can be reused in the replace box.
The groups are numbered from one, in the order of their opening parentheses, and are accessed as \1, \2, ...
You tagged it as Perl, so here is how you do it in Perl ...
I prefer to use lookahead assertions rather than backreferences:
s/(?= \{,[0-9][0-9], ) /\t\t/x
Alternatively, $& contains the matched string ($0 is something different):
s/ \{,[0-9][0-9], /\t\t$&/x
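For comparison, here is a rough Python equivalent of those two substitutions. This is only a sketch: the sample line is a shortened version of the data line from the question, and \g<0> is Python's way of referring to the whole match, standing in for Perl's $&:

```python
import re

# Shortened version of the data line from the question.
line = "61,parameter17,{,70,6,1,"

# Lookahead: match the empty position just before "{,NN," and put two tabs there.
with_lookahead = re.sub(r"(?=\{,[0-9][0-9],)", "\t\t", line)

# Whole-match reference: replace the match with two tabs plus itself.
with_whole_match = re.sub(r"\{,[0-9][0-9],", r"\t\t\g<0>", line)

print(repr(with_lookahead))
print(repr(with_whole_match))
```

Both calls give the same result. The lookahead version never consumes any text, so nothing from the original line has to be written back.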
You will need a backreference here, meaning something which, in the replace part, will be equal to what you have matched.
Usually, the whole matched part is stored in the $0 backreference. (With capture groups you also get $1 for the first group, $2 for the second, and so on.)
Back to your question, you could try this:
Find:
(\{,)([0-9][0-9],)
Replace by:
\t\t$1 00$2
This inserts two tab characters before the part that matched \{,[0-9][0-9], and then rebuilds the match: the first captured part ({,), then a space and two zeros, then the second captured part (the two digits and the following comma).
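The same find and replace can be tried out in Python as a quick check. This is only a sketch: the line is a shortened copy of the sample data, and Python writes the group references as \1 and \2 rather than $1 and $2:

```python
import re

# Shortened copy of the data line from the question.
line = "61,parameter17,{,70,6,1,-1,3, 00,0,0,0,0,},,,"

# Group 1 keeps the "{,"; group 2 keeps the two digits and the comma.
# The replacement puts two tabs in front and " 00" between the two groups.
padded = re.sub(r"(\{,)([0-9][0-9],)", r"\t\t\1 00\2", line)

print(repr(padded))
```

Only the "{,70," part is touched; the rest of the line, including the existing " 00" field, passes through unchanged.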
I found this regex somewhere online that finds strings in my files that are likely presented to the user for a localization clean-up. However, I have a new task to find specific instances of two words and I thought I could use the same regex. I have tried several combinations but I'm just not good enough at regex to get it right.
Current regex for finding strings:
(?<=text=|label=|prompt=|toolTip=|title=|icon=|String=|Error=|Separator=|Symbol=)(("(?:\.|(\\\")|[^\""\n])*")|('(?:\.|(\\\')|[^\''\n])*'))
But now I want it to match only when the words "catalog" or "in stock" appear anywhere between the quotes.
Any help would be appreciated.
OK, this should do it, I believe:
(?<=text=|label=|prompt=|toolTip=|title=|icon=|String=|Error=|Separator=|Symbol=)((?:"(?:\.|(\\\")|[^\""\n])*\b(?:catalog|in stock)\b(?:\.|(\\\")|[^\""\n])*")|(?:'(?:\.|(\\\')|[^\''\n])*\b(?:catalog|in stock)\b(?:\.|(\\\')|[^\''\n])*'))
All I did was add \b(?:catalog|in stock)\b in the quote section. For example, for the double-quote section, it used to be this:
"(?:\.|(\\\")|[^\""\n])*"
I.e. any number of non-quote (unless escaped), non-return characters between double-quotes.
Now it is this:
"(?:\.|(\\\")|[^\""\n])*\b(?:catalog|in stock)\b(?:\.|(\\\")|[^\""\n])*"
Which is a double-quote, any number of legal characters as above, "catalog" or "in stock", any number of more legal characters, and a quote.
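Here is a cut-down Python sketch of the same technique, in case it helps to test the idea. Several things in it are assumptions rather than the original regex: the attribute list is shortened, an escaped character is written as \\. (a backslash followed by any character), and the lookbehind is replaced by a plain prefix plus a capture group, because Python's re module only accepts fixed-width lookbehinds. The sample markup is invented.

```python
import re

# Assumption: shortened attribute list; extend as needed.
ATTRS = r"(?:text|label|prompt|toolTip|title)="
# Quoted-string bodies: an escaped character, or anything that is not
# the closing quote, a backslash, or a newline.
BODY_DQ = r'(?:\\.|[^"\\\n])*'
BODY_SQ = r"(?:\\.|[^'\\\n])*"
WORDS = r"\b(?:catalog|in stock)\b"

pattern = re.compile(
    ATTRS
    + "("
    + '"' + BODY_DQ + WORDS + BODY_DQ + '"'
    + "|"
    + "'" + BODY_SQ + WORDS + BODY_SQ + "'"
    + ")"
)

# Invented sample markup.
sample = "\n".join([
    '<Label text="Browse the catalog" title="Home"/>',
    "<Button label='Only 3 in stock' toolTip=\"Click me\"/>",
])

# findall returns the captured quoted strings only.
hits = pattern.findall(sample)
print(hits)
```

Strings that do not contain either phrase, such as "Home" and "Click me" here, are skipped.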
I have a text file containing a number of lines, which I need to turn into a CSV. What is the easiest way to replace all the line breaks with ", "? I have TextWrangler and read that it can do this using grep and regular expressions, but I have very little experience with regular expressions and don't know how grep works. Can anyone help me get started?
Choose Find from the Search menu. TextWrangler opens the Find window.
Select the "Grep" checkbox.
Type the string you are looking for ("\n" or "\r\n" or "\r") in the Find text field.
Type the replace string (", ") in the Replace text field.
Click "Replace All".
See chapters 7 and 8 of the TextWrangler User Manual if you have problems.
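If a script is an option, the same replacement can be sketched in a few lines of Python. The sample text is made up, and the pattern simply covers all three line-break styles mentioned above:

```python
import re

# Made-up sample with Windows, Unix and old Mac line breaks mixed together.
text = "alpha\r\nbeta\ngamma\rdelta"

# "\r\n" is listed first so a Windows line break is replaced once, not twice.
csv_row = re.sub(r"\r\n|\r|\n", ", ", text)

print(csv_row)
```

Note that a line break at the very end of the file would also be replaced, leaving a trailing ", " that may need trimming.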
Alternatively, you can do it with only two pieces of software: Excel and Notepad++ (which is also free and AWESOME).
Take your list (I assume it's one per line, in a column, for example):
Remove any empty cells.
Copy the addresses, and Paste Special>Transpose (this puts them into cells going from A-->).
Copy the list into Notepad++. You'll note that it shoves them in as one long list, losing that irritating table structure.
Find the shortest clear space (but use the entire space) between two email addresses, and replace all with a ",".
Then find all remaining spaces (I go with one space at a time), and replace with nothing (i.e. the 'replace with' box is empty).
Et voila!
And you know you can also save stuff as comma-separated values, right?