How to convert a *.txt file (copy/pasted variables) into a tabular format - regex

I have a bunch of variables (roughly 80) which I copy+paste into my editor (get those variables from a different *.txt file). After this, it looks a bit messy like
ka15 1-2 tre15 3-4 hsha15 5
juso15 6
kl15 7-9 kkjs15 10
but I'd like to have it structured to get a better idea of what's going on inside the code. I also have to strip away the 15 from each variable. Ideally I would get something like
ka 1-2 tre 3-4 hsha 5
juso 6 kl 7-9 kkjs 10
Is there a clever way to achieve this? I am using the SAS Enterprise Guide editor and VSCode but couldn't find a way. For example, when I find and replace the 15, I wish I could replace it with a tab, but I couldn't find that option in either editor. Any ideas to get this automated, or at least to avoid doing everything by hand?

I found a hacky solution to your problem, if anyone finds a better solution, I'll delete mine, but here it goes ¯\_(ツ)_/¯:
1) Copy all the content of the file (for example, I copied yours twice):
ka15 1-2 tre15 3-4 hsha15 5
juso15 6
kl15 7-9 kkjs15 10
ka15 1-2 tre15 3-4 hsha15 5
juso15 6
kl15 7-9 kkjs15 10
2) Ctrl+H and replace all 15 with nothing (leave empty) using Ctrl+Alt+Enter.
3) Ctrl+F and turn on Regular expressions in the search box. Now type \s to select whitespace; it should select one whitespace character after every word. Now select all occurrences with Alt+Enter and press Backspace followed by Enter. This deletes the spaces between the words and places one word per line, like so:
ka
1-2
tre
3-4
hsha
...
Press Escape to remove multiple cursors.
4) Press Ctrl+F again and type $ in the search box. This will select the end of every line. Again, press Alt+Enter to select all occurrences and press Space 5-8 times. Notice, however, that the cursors are not properly aligned. Press Escape to remove the multiple cursors.
5) Place the cursor a few spaces after the first word. Then hold Ctrl+Alt+↓ to add multiple cursors below the first one. Press Shift+End to select all the whitespace to the end of every line and press Delete to delete it. Press Delete again to join all the words onto one line, separated by n spaces.
6) Unfortunately, I couldn't find a regex for the last part. The cursor should be placed after every 6th variable; I solved it by placing the cursor next to every 7th word and pressing Enter.
I usually don't type too much like this, but I liked the problem you had. It was more puzzle than a problem to me.

I've come up with 3 regex's that will do what you want. In order to run them all sequentially you will need the regreplace extension or similar.
This goes in your settings:
"regreplace.on-save": false,
"regreplace.commands": [
{
"name": "Transform Data to Table Format, step 1",
"regexp": "([a-zA-Z]+|[\\d-]+)(15)?(\\s[\r\n]?)*",
"replace": "$1 \n",
"priority": 1,
},
{
"name": "Transform Data to Table Format, step 2",
"regexp":
"(([\\S-] {6})(.*))|(([\\S-]{2} {5})(.*))|(([\\S-]{3} {4})(.*))|(([\\S-]{4} {3})(.*))",
"replace": "$2$5$8$11",
"priority": 2,
},
{
"name": "Transform Data to Table Format, step 3",
"regexp":
"((.*)\n)((.*)\n)((.*)\n)((.*)\n)((.*)\n)((.*?)(\\s*\\n))",
"replace": "$2$4$6$8$10$12\n",
"priority": 3,
}
],
It creates a rule for each of the three regex steps. All three rules can be run sequentially by running the regreplace.regreplace command. Here is a demo:
The regex's are designed to look good with data items up to 4 characters long but could be easily modified for longer items.
In step 1, increase the number of spaces before the \n in the replace rule to something like 16 or so.
In step 2, you will have to sense the pattern of the regex groups like (([\\S-]{4} {3})(.*)) to modify them. A 13-character-long variable might require something like (([\\S-]{13} {3})(.*)) as the last group and (([\\S-] {15})(.*)) as the first in the sequence, etc., modifying all the other groups in order. Let me know if you need help with that.
Step 3 needs no modification unless you want to change how many data items appear on each line - right now there are 3 variables with their data on each line hence 6 groups in that regex.
It does not matter how many data-value pairs are in any row prior to running the command.
[Two items of caution: There should not be any empty lines before the start of the data, although if necessary you could add a regex as the first rule to remove empty lines. Empty lines within the data or at the end are not a problem.
Secondly, the extension cannot be run on selected text only so you will have to place your data at the top of an empty file to convert it and then copy it elsewhere if you wish.]
There is also the Replace Rules extension, which works like regreplace and, according to its docs, can run on a selection only, but it didn't work for me here for some unknown reason. It does have a nicer interface, though: all the regexes could go into a single rule, which could then be run independently.
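If leaving the editor is acceptable, the same transformation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of either extension: the function name to_table, the tab separator, and the defaults (suffix "15", three name/value pairs per line) are my own choices, and it assumes the input strictly alternates between a variable name and its column range.

```python
def to_table(text, pairs_per_line=3, suffix="15"):
    """Strip `suffix` from each variable name and lay the pairs out in rows.

    Assumes whitespace-separated tokens that alternate name, range, name, range...
    """
    tokens = text.split()
    names = tokens[0::2]    # ka15, tre15, ...
    ranges = tokens[1::2]   # 1-2, 3-4, ...

    pairs = []
    for name, column_range in zip(names, ranges):
        if suffix and name.endswith(suffix):
            name = name[: -len(suffix)]
        pairs.append(f"{name} {column_range}")

    # Group the pairs into rows and separate the pairs with a tab.
    rows = [
        "\t".join(pairs[i : i + pairs_per_line])
        for i in range(0, len(pairs), pairs_per_line)
    ]
    return "\n".join(rows)
```

Because the text is split on any whitespace first, it does not matter how many pairs sit on each input line, which mirrors the behaviour of the three-step regex above.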

Related

Regex to extract all strings from source code used when calling a function

We have an old, grown project with thousands of php files and need to clean it up.
Throughout the whole project we do have a lot of function calls similar to:
trans('somestring1');
trans("SomeString2");
trans('more_string',$somevar);
trans("anotherstring4",$somevar);
trans($tx_key);
trans($anotherKey,$somevar);
All of those are embedded into the code and represent translation keys. I would like to find a way to extract all "translation keys" in all occurrences.
The PHP project is in VS Code, so a RegEx Search would be helpful to list the results.
Or I could search through the project with any other tool you would recommend
However I would also need to "export" just the strings to a textfile or similar.
The ideal result would be:
somestring1
SomeString2
more_string
anotherstring4
$tx_key
$anotherKey
As a bonus - if someone knows, how I could get the above list including filename where the result has been found - that would be really fantastic!
Any help would be greatly appreciated!
Update:
The RegEx I came up with:
/(trans)+\([^\)]*\)(\.[^\)]*\))?/gim
list the full occurrence - How can I just get the first part of the result (between Single Quotes OR between Double Quotes OR beginning with $)
See here: regexr.com/548d4
Here are some steps to get exactly what you want. Using this you can do a find and replace on your search results!
So you could do sequential regex find/replaces in the right circumstances.
The replace can be just within the search results editor and not affect the underlying files at all - which is what you want.
You can also have the replace action actually edit the underlying files if you wish.
[Hint: This technique can also make doing a find item a / replace with b in files that contain term c much easier to do.]
(1) Open a new search editor: press Ctrl+Shift+P and run the Open New Search Editor command.
(That command is currently unbound to a keybinding.)
(2) Paste this regex into the Search input box (with the regex option .* selected):
`(.*?)(\btrans\(['"]?)([^,'")]+)(.*)` - a relatively simple regex
regex101 demo
See my other answer for a regex to work with up to 6 entries per line:
(\s*\d+:\s)?((.*?)(\btrans\(['"]?)([^,'")]*)((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?)(.*)
(3) You will get a list of files with the search results. Now open a Find widget (Ctrl+F) in this Search editor.
(4) Put the same regex into that Find input. Regex option selected. Put $3 into the Replace field. This only replaces in this Search editor - not the original files (although that can be done if you want it in some case). Replace All.
If using the 1-6 version regex, replace with:
$1$5 $9 $13 $17 $21 $25
(5) Voila. You can now save this Search Editor as a file.
The first answer works for one desired capture per line as in the original question. But that relatively simple regex won't work if there are two or more per line.
The regex below works for up to 6 entries per line, like
trans('somestring1');
stuff trans("SomeString2"); some content trans("SomeString2a");more stuff [repeat, repeat]
But it doesn't for 7+ - you'll need a regex guru for that.
Here is the process again with a twist of using a snippet in the Search Editor instead of a Find/Replace. Using a snippet allows more control over the formatting of the final result.
(1) Open a new search editor: press Ctrl+Shift+P and run the Open New Search Editor command. (That command is currently unbound to a keybinding.)
(2) Paste this regex into the Search input box (with the regex option .* selected):
`((.*?)(\btrans\(['"]?)([^,'")]*)((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?)(.*)`
regex101 demo
(3) You will get a list of files with the search results. Now select all your results individually with Ctrl+Shift+L.
(4) Trigger this keybinding:
{
"key": "alt+i", // whatever keybinding you like
"command": "editor.action.insertSnippet",
"when": "editorTextFocus",
"args": {
"snippet": "${TM_SELECTED_TEXT/((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*)((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?)(.*)/$4${8:+\n }$8${12:+\n }$12${16:+\n }$16${20:+\n }$20${24:+\n }$24/g}"
}
},
That snippet will be applied to each selection in your search result. This part ${8:+\n } is a conditional which adds a newline and some spaces if there is a capture group 8 - which would be a second trans(...) on a line.
Demo: (unfortunately, it doesn't properly show the Ctrl+Shift+L selecting all lines individually or the Alt+i snippet trigger)
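For the "export to a text file, ideally with filenames" part of the question, a small script outside VS Code is another option. The following is a rough sketch, not a tested tool for any particular project: the names TRANS_KEY, extract_keys and scan_project are made up here, it only looks at *.php files, and it assumes the first argument of trans() is a plain quoted string or a simple $variable (no concatenation or nested calls).

```python
import re
from pathlib import Path

# First argument of trans(...): 'single quoted', "double quoted", or a $variable.
TRANS_KEY = re.compile(r"""\btrans\(\s*(?:'([^']*)'|"([^"]*)"|(\$\w+))""")


def extract_keys(source):
    """Return every translation key found in a piece of PHP source, in order."""
    keys = []
    for match in TRANS_KEY.finditer(source):
        # Exactly one of the three groups took part in the match.
        keys.append(next(g for g in match.groups() if g is not None))
    return keys


def scan_project(root):
    """Yield (path, key) pairs for every *.php file below root."""
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for key in extract_keys(text):
            yield path, key
```

Something like `for path, key in scan_project("."): print(f"{path}\t{key}")` would then print one key per line together with the file it came from, and there is no limit on the number of trans() calls per line.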

Search in VS Code for multiple terms

Suppose I search on VS Code the terms 'word1 word2'. Then it finds all the occurrences where 'word1' is followed by 'word2'. In reality I want to find all the files where word1 and word2 occur, but they don't have to be consecutive. How can I do it?
Use regex flag and search for (word1[\s\S\n]*word2)|(word2[\s\S\n]*word1)
Made a small extension based on @tonix's regex:
https://marketplace.visualstudio.com/items?itemName=usernamehw.search
Here is also a simple way for simple needs - use this as regex
(word1)|(word2)|(word3)
It may not cover some cases, but has been working fine for me, and easy to remember to type it in.
VSCode has an open issue to support multiple searches. You may want to get on there and push them a little.
To apply logical and
(?=.*word1)(?=.*word2)(?=.*word3)
To apply logical or
(word1)|(word2)|(word3)
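One caveat on the logical-AND pattern: . does not cross line breaks by default, so those lookaheads only succeed when all the words sit on the same line. A small sketch of the difference follows. Python's re module is used purely to illustrate the semantics; VS Code's search engine is a different implementation and may not behave identically, and the helper names are made up for this example.

```python
import re

# AND restricted to a single line: "." stops at line breaks.
SAME_LINE_AND = re.compile(r"(?=.*word1)(?=.*word2)")

# AND across the whole text: [\s\S] also matches line breaks.
WHOLE_TEXT_AND = re.compile(r"^(?=[\s\S]*word1)(?=[\s\S]*word2)")


def on_same_line(text):
    """True if some line contains both words."""
    return SAME_LINE_AND.search(text) is not None


def anywhere_in_text(text):
    """True if both words occur somewhere in the text, in any order."""
    return WHOLE_TEXT_AND.search(text) is not None
```

So for a file where word2 is on the first line and word1 on the second, only the [\s\S] variant reports a hit.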
For you guys,
if you want to search for multiple words (more than 2) at once in a single file and all the words must appear in the file at least once (logical AND), you can use the following regex which leverages lookahead assertions:
^(?=[\s\S\n]*(word1))(?=[\s\S\n]*(word2))(?=[\s\S\n]*(word3))(?=[\s\S\n]*(word4))[\s\S\n]*$
A global search with this pattern will only return all the files that contain word1 AND word2 AND word3 AND word4 in any order (e.g. word4 may appear at the beginning and/or word2 may appear at the end of the file).
I also wrote a little Python CLI helper which creates the regex automatically for you given the patterns you want to AND (though creating the regex by hand is pretty straightforward).
Copy the following code, paste it in a new file and save it somewhere on your machine (I've called it regex_and_lookahead.py). Then make the file executable with chmod +x ./regex_and_lookahead.py (important: I used Python 3.6; the f-string literal rf'(?=[\s\S\n]*({arg}))' won't work in earlier versions):
#!/usr/bin/env python3
from sys import argv

# Every command-line argument is a pattern that must occur somewhere in the file.
args = argv[1:]
regex = '^'
for arg in args:
    # One lookahead per pattern; raw strings keep \s, \S and \n as regex escapes.
    regex += rf'(?=[\s\S\n]*({arg}))'
regex += r'[\s\S\n]*$'
print(regex)
Usage:
./regex_and_lookahead.py word1 word2 word3 word4
Will generate the above regex. You can also use it to generate more complex regexes, because each parameter can contain regex characters!
As an example:
./regex_and_lookahead.py "pattern with space" "option1|option2" "\bword3\b" "(repeated pattern\.){6}"
Will generate the following regex:
^(?=[\s\S\n]*(pattern with space))(?=[\s\S\n]*(option1|option2))(?=[\s\S\n]*(\bword3\b))(?=[\s\S\n]*((repeated pattern\.){6}))[\s\S\n]*$
Which will match a file if and only if all of the following conditions are true:
There's at least one occurrence of the string pattern with space;
There's at least one occurrence of either option1 or option2;
There's at least one occurrence of the word word3 delimited by word boundary assertions;
There is at least one occurrence of the string repeated pattern. repeated 6 times (i.e.: repeated pattern.repeated pattern.repeated pattern.repeated pattern.repeated pattern.repeated pattern.).
As you can see, the sky is the only limit. Have fun!
This is now supported: you can search for the term, then open the results in an editor and use Ctrl+F to search within the search results. Thanks @pushkin.
The Find and Transform extension (I am the author) makes it quite easy to run any number of sequential searches across files, using only the files from the previous search's results for later searches.
There is a variable ${resultsFiles} that resolves to those previous search results files and can be used in the "filesToInclude" argument. Here is a sample keybinding
{
"key": "alt+b",
"command": "runInSearchPanel",
"args": {
"find": ["first", "second"],
"delay": 2000, // necessary to allow results to populate
// delay may need to be longer if you are searching a lot of files
"replace": ["", "knuckles"], // optional
"filesToInclude": ["", "${resultsFiles}"],
"filesToExclude": "Users\\Mark\\AppData\\Roaming\\Code\\User\\keybindings.json",
"isRegex": true,
// so that the first search will be triggered and produce results
"triggerSearch": true,
"triggerReplaceAll": [false, true] // optional
}
}
"find": ["first", "second"], : search for first and then search for second
"filesToInclude": ["", "${resultsFiles}"], : clear the filesToInclude on the first search, on second search use the resultFiles from the first search
You can do as many sequential searches as you like
The finds can be regex's and as complex as you wish
The original question asked for a single search for files containing two separate words. Below is what I do to search for two (or more) words in the same file by using multiple searches:
1. Search like you normally do.
2. Click on "Open in editor".
3. Adjust the context line count. (The higher the context count, the more text you can search for that second term, but the more irrelevant matches you bring in.)
4. Hit Cmd+F (or the equivalent if not on a Mac) and search there. In the image below I have narrowed it down to 53 hits. I can manually skip through until I find it.
Need even more fine-tuning?
1. Same as steps 1-3 above.
2. Copy the contents to a file. (In the image below I saved it to a file called haystack.ts.)
3. Search there for a third word. (In the image below I have now narrowed it down to 7 hits.)
Try the Open New Search Editor command through the Command Palette. You can map it to any keybinding you'd like in the Keybindings editor; I mapped it to Cmd+Shift+I.
This is helpful for me! There is one more way: using the up/down arrow keys in the search editor moves through the search history, which is also useful.
It takes a little mental adjustment to accept that this is equivalent to having multiple search editors (as IntelliJ and others provide), but without persistence!
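If the goal is only a list of files that contain all of the words, and a script is acceptable, no regex is needed at all. A minimal sketch, with names (contains_all, files_containing_all) and the glob default chosen for illustration only; it does plain substring matching, so "word1" would also match inside "word10".

```python
from pathlib import Path


def contains_all(text, words):
    """True if every word occurs somewhere in the text (plain substring match)."""
    return all(word in text for word in words)


def files_containing_all(root, words, pattern="*"):
    """Return the files below root whose contents include every word."""
    matches = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if contains_all(text, words):
            matches.append(path)
    return matches
```

For example, `files_containing_all("src", ["word1", "word2"], pattern="*.ts")` would return the TypeScript files under a hypothetical src folder in which both words appear, in any order and on any lines.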

EditPad: Need a regex that handles multiple possible data formats

First, I'm using EditPadPro for my regex cleaning, so any answers given should work within that environment.
I get a large spreadsheet full of data that I have to clean every day. I've managed to get it down to a couple of different regexes that I run, and this works... but I'm curious to see if it's possible to reduce down to a single regex.
Here is some sample data:
3-CPC_114851_70095_70095_CAN-bre
3-CPC_114851_70095_70095_CAN
b11-ao1-113775-bre
b7-ao-114441
b7-ao-114441-bre
b7-ao1-114441
b7-ao1-114441-bre
http://go.nlvid.com/results1/?http://bo
go.nlv/results1/?click
b4-sm-1359
b6-sm-1356-bre
1359_195_1453814569-bre
1356_104_1456856729
b15-rad-8905
b15-rad-8905-bre
Here is how the above data needs to end up:
114851-bre
114851
113775-bre
114441
114441-bre
114441
114441-bre
http://go.nlvid.com/results1/
go.nlv/results1/
sm-1359
sm-1356-bre
sm-1359-bre
sm-1356
rad-8905
rad-8905-bre
So, there are numerous rules, such as:
In cases of more than 2 underscores, the result needs to contain only the value immediately after the first underscore, and everything from the dash onwards.
In cases where the string contains "-ao-", "-ao1-", everything prior to the final numeric string should be removed.
If a question mark is present, everything from the mark onwards should be removed.
If the string contains "-sm-" or "-rad-", everything prior to those alpha strings should be removed.
If the string contains 2 underscores, everything after the first numeric string up to a dash (if present) should be removed, and the string "sm-" should be prepended.
Additionally there is other data that must be left untouched, including but not limited to:
113535|24905|24905
as well as many variations on this pattern of xxxxxx|yyyyy|zzzzz (and not always those string lengths)
This may be asking way too much of regex, I'm not sure as I'm not great with it. But I've seen some pretty impressive things done with it, so I thought I'd put this out to the community and see what you come back with.
Jonathan, I can wrap all of those into one regex, except the last one (where you prepend sm- to a string that does not contain sm). It is not possible in this context, because we cannot capture "sm" to reuse in the replacement, and because there is no "conditional replacement" syntax in EPP.
That being said, you can achieve what you want in EPP with two regexes and one macro to chain the two.
Here is how.
The solution below is tested in EPP.
Regex 1
Press Ctrl + Shift + F to enter Search / Replace mode
Enter the following Search and Replace in the appropriate boxes
At the top right of the Search bar, click the Favorite Searches pull-down, select "Add", give it a name, e.g. Regex 1
Search:
(?mx)^
(?=(?:[^_\r\n]*?_){3})[^_\r\n]+?_([^_\r\n]+)[^-\r\n]+(-[^\r\n]+)?
|
[^\r\n]*?-ao1?-\D*([^\r\n]+)
|
([^\r\n?]*)(?=\?)[^\r\n]+
|
[^\r\n]*?-((?:sm|rad)-[^\r\n]+)
Replace:
\1\2\3\4\5
Regex 2
Same 1-2-3 steps as above.
Search
^(?!(?:[^_\r\n]*?_){3})(?=(?:[^_\r\n]*?_){2})(\d+)(?:[^-\r\n]+(-[^\r\n]+)?)
Replace
sm-\1\2
Chaining Regex 1 and Regex 2
Top menu: Macros, Record Macro, give it a name.
Click the Favorite searches pulldown, select Regex 1
Hit Replace All.
Click the Favorite searches pulldown, select Regex 2
Hit Replace All.
Macros, Stop recording.
Whenever you want to do your sequence of replacements, pull it by name under the Macros menu.
Testing This
I have tested my "Jonathan macro" on your input. Here is the result:
114851-bre
114851
113775-bre
114441
114441-bre
114441
114441-bre
http://go.nlvid.com/results1/
go.nlv/results1/
sm-1359
sm-1356-bre
sm-1359-bre
sm-1356
rad-8905
rad-8905-bre
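To cross-check the rules outside the editor, they can also be written as ordinary code. The sketch below is Python, not EditPad Pro syntax, so it does not replace the macro above; the function name clean_value and the order in which the rules are tried are my own reading of the question, and anything that matches no rule (such as 113535|24905|24905) is returned untouched.

```python
import re


def clean_value(value):
    """Apply the cleaning rules from the question to a single cell."""
    # Question mark present: drop it and everything after it.
    if "?" in value:
        return value.split("?", 1)[0]

    underscores = value.count("_")

    # More than 2 underscores: keep the field after the first underscore,
    # plus everything from the dash onwards (if any).
    if underscores >= 3:
        m = re.match(r"[^_]*_([^_]+)_[^-]*(-.*)?$", value)
        if m:
            return m.group(1) + (m.group(2) or "")

    # Exactly 2 underscores: keep the leading number and any dash suffix,
    # and prepend "sm-".
    if underscores == 2:
        m = re.match(r"(\d+)_[^-]*(-.*)?$", value)
        if m:
            return "sm-" + m.group(1) + (m.group(2) or "")

    # -ao- / -ao1-: keep the final numeric string and what follows it.
    m = re.search(r"-ao1?-(\d.*)$", value)
    if m:
        return m.group(1)

    # -sm- / -rad-: drop everything before "sm" or "rad".
    m = re.search(r"-((?:sm|rad)-.*)$", value)
    if m:
        return m.group(1)

    return value
```

Keeping each rule as its own branch makes it easier to add or reorder rules than extending a single alternation, at the cost of no longer being a one-step search and replace.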
Try this:
Toggle the Search Panel : SHIFT+CTRL+F
SEARCH: .*?((?:sm-|rad-)?(?:(?:\d+|[\w\.]+\/.*?))(?:-\w+)?$)
REPLACE: $1
Check REGEX and WORDS
Click Replace All or Hit CTRL+ALT+F3

Add trailing zeroes to line in notepad++

I have a file containing (hundreds) of blocks of numbers like below;
This one is fine (16x20, correct number of rows and columns)
11111111111111111110
16666616666666661110
16111616111111162610
16111646111663132610
16162616261623132610
16162313261623132610
16162313261623132610
16162313261623132610
16162313261623132610
16162313261623132610
16162313261623132610
16162313261626132610
16166313661116632610
16111111111116132610
16666666666666136610
11111111111111111110
This one needs to be padded with trailing zeroes so it is (16x20)
111111111111111111
166616666666663661
166611111111111661
166666366663661661
113161111111161611
1316166666616161
1616162262616161
11616166112616161
16616166116616161
16616162262616161
16616166266616161
16616111161116161
1661666666666616111
1661666166163366661
1641666166166613661
1111111111111111111
I would like to pad them with zeroes so they are all like the first example. I'm aware of the regular expressions feature in notepad++ but am struggling to get it to work. I appreciate any help given.
You could do it via a macro.
First append a large number of zeroes to the end of each line using a macro.
Caret on the first entry
click record macro
press end
type out 20 zeroes
press down arrow
click stop recording
play the macro until all lines look like this
11111111111111111100000000000000000000000000000000000000000000
16661666666666366100000000000000000000000000000000000000000000
16661111111111166100000000000000000000000000000000000000000000
16666636666366166100000000000000000000000000000000000000000000
11316111111116161100000000000000000000000000000000000000000000
131616666661616100000000000000000000000000000000000000000000
161616226261616100000000000000000000000000000000000000000000
1161616611261616100000000000000000000000000000000000000000000
1661616611661616100000000000000000000000000000000000000000000
1661616226261616100000000000000000000000000000000000000000000
1661616626661616100000000000000000000000000000000000000000000
1661611116111616100000000000000000000000000000000000000000000
166166666666661611100000000000000000000000000000000000000000000
166166616616336666100000000000000000000000000000000000000000000
164166616616661366100000000000000000000000000000000000000000000
111111111111111111100000000000000000000000000000000000000000000
Then...
Caret on first line
click record
press home key
press the right arrow key 20 times
hold shift and press end key
press delete key
press down arrow
click stop recording
play the macro until all lines are processed
You could save the entire process as a single macro so it's just a single click in the future.
I can give you a macro solution
go to the beginning of your text
select Macro/Start Recording
press end, press 0 16 times then press Home and down arrow key
select Macro/End Recording
You now have a macro to add sixteen zeros to the end of all lines.
Playback this macro on all lines.
You now have appended zeroes to all lines.
Pressing Alt key and using mouse select the required block(columns) of text you want and paste it into another empty notepad tab
Help on column mode editing is available inside Notepad++ under the ? / Help Contents menu.
Good luck
You can use the plugin ConyEdit to do this.
With ConyEdit running in the background, follow these steps:
Use the command line cc.aal 00000000000000000000 to append twenty zero characters to each line.
Use the command line cc.gc 1/\d{20}/ to get the first column of the regex match.
Looking to do this manually and not programmatically?
Open Find/Replace.
Copy from the end of one line to the start of the next, WITHOUT any numbers, so...
in this example
111111111111111111 <---from here
to here ---> 166616666666663661
166611111111111661
Paste that into the Find box (yes, you're effectively copying the line break, which some applications let you enter manually and others won't).
Then in the Replace box, type '0' followed by your line break.
Hit that magic Replace All :D
This adds a 0 at the end of every line and then re-inserts the line break.
Edit: quickly reviewing another method for alternative options :P give me 10.
Edit 2:
Ah OK, something like this will work :P just tested it.
Use [0-9] in Find/Replace. So if I'm looking for 123123123123 (which is 12 long) and I need to pad it up to 20,
your FIND must be in ()
so..
the find would be
([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] )
and the replacement refers to it as \1, not the regex (this was my mistake):
\100000000
Tested and confirmed! Don't forget: YOU NEED MATCH ALL on, WRAP off!
And so on for your other line lengths. Not sure if you can loop this with macros and stuff :P but I hope it helps more than what you have now.
Two good resources:
http://blog.creativeitp.com/posts-and-articles/editors/understanding-regex-with-notepad/comment-page-1/
http://regexpal.com/
Based on the OP's comment: you could try an editor called vim/gvim.
Open your file in vim, then type:
:%s/.*/\=printf("%-20s",getline("."))/|%s/ *$/\=substitute(submatch(0)," ","0","g")/
Don't forget to press <Enter> after typing the above.
Then you will see the text has been changed into what you want.
Of course a vim macro can work as well; however, I find the command better... :)
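For completeness, the same padding is nearly a one-liner in a script, if running one on the file is an option. This is a minimal sketch rather than a Notepad++ feature; the function name pad_lines and the default width of 20 are assumptions taken from the question, and lines already 20 characters or longer are left unchanged.

```python
def pad_lines(text, width=20, fill="0"):
    """Right-pad every line with `fill` until it is `width` characters long."""
    # str.ljust leaves lines that are already long enough untouched.
    return "\n".join(line.ljust(width, fill) for line in text.splitlines())
```

Reading the file, passing its contents through pad_lines and writing the result back would process all the blocks in one go; note that blank separator lines between blocks would also be padded unless they are skipped explicitly.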

Use cases for regular expression find/replace

I recently discussed editors with a co-worker. He uses one of the less popular editors and I use another (I won't say which ones since it's not relevant and I want to avoid an editor flame war). I was saying that I didn't like his editor as much because it doesn't let you do find/replace with regular expressions.
He said he's never wanted to do that, which was surprising since it's something I find myself doing all the time. However, off the top of my head I wasn't able to come up with more than one or two examples. Can anyone here offer some examples of times when they've found regex find/replace useful in their editor? Here's what I've been able to come up with since then as examples of things that I've actually had to do:
Strip the beginning of a line off of every line in a file that looks like:
Line 25634 :
Line 632157 :
Taking a few dozen files with a standard header which is slightly different for each file and stripping the first 19 lines from all of them all at once.
Piping the result of a MySQL select statement into a text file, then removing all of the formatting junk and reformatting it as a Python dictionary for use in a simple script.
In a CSV file with no escaped commas, replace the first character of the 8th column of each row with a capital A.
Given a bunch of GDB stack traces with lines like
#3 0x080a6d61 in _mvl_set_req_done (req=0x82624a4, result=27158) at ../../mvl/src/mvl_serv.c:850
strip out everything from each line except the function names.
Does anyone else have any real-life examples? The next time this comes up, I'd like to be more prepared to list good examples of why this feature is useful.
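To make two of these concrete, here is how the first and last examples might look as regexes. Python is used only so the patterns can be run; the same expressions should carry over to most editors with minor syntax changes. The names LINE_PREFIX, GDB_FRAME and the two helper functions are invented for this sketch, and the GDB pattern assumes frames shaped like the one shown above.

```python
import re

# "Line 25634 : rest of line" -> "rest of line"
LINE_PREFIX = re.compile(r"^Line \d+ :\s*")

# "#3 0x080a6d61 in some_function (args...) at file.c:850" -> "some_function"
GDB_FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)\s*\(")


def strip_line_prefix(line):
    """Remove a leading 'Line <number> :' marker if present."""
    return LINE_PREFIX.sub("", line)


def gdb_function_name(line):
    """Return the function name from a GDB stack frame line, or None."""
    match = GDB_FRAME.match(line)
    return match.group(1) if match else None
```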
Just last week, I used regex find/replace to convert a CSV file to an XML file.
Simple enough to do really, just chop up each field (luckily it didn't have any escaped commas) and push it back out with the appropriate tags in place of the commas.
Regexes make it easy to replace whole words using word boundaries.
(\b\w+\b)
So you can replace unwanted words in your file without disturbing words like Scunthorpe
Yesterday I took a create table statement I made for an Oracle table and converted the fields to setString() method calls using JDBC and PreparedStatements. The table's field names were mapped to my class properties, so regex search and replace was the perfect fit.
Create Table text:
...
field_1 VARCHAR2(100) NULL,
field_2 VARCHAR2(10) NULL,
field_3 NUMBER(8) NULL,
field_4 VARCHAR2(100) NULL,
....
My Regex Search:
/([a-z0-9_]+) .*/
My Replacement:
pstmt.setString(1, \1);
The result:
...
pstmt.setString(1, field_1);
pstmt.setString(1, field_2);
pstmt.setString(1, field_3);
pstmt.setString(1, field_4);
....
I then went through and manually set the position int for each call and changed the method to setInt() (and others) where necessary, but that worked handy for me. I actually used it three or four times for similar field to method call conversions.
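The manual renumbering step is the part a plain find/replace cannot do, since the replacement has no counter. A short script can, though. The following is a sketch only: the function name to_setters is hypothetical, every column is emitted as setString (so the setInt() fix-ups would still be manual), and it assumes one lowercase column name at the start of each line.

```python
import re

# A column definition: a lowercase name followed by its type.
FIELD = re.compile(r"^\s*([a-z0-9_]+)\s+\S+")


def to_setters(ddl):
    """Turn column definition lines into numbered pstmt.setString(...) calls."""
    calls = []
    for line in ddl.splitlines():
        match = FIELD.match(line)
        if match:
            position = len(calls) + 1  # JDBC parameter indexes start at 1
            calls.append(f"pstmt.setString({position}, {match.group(1)});")
    return calls
```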
I like to use regexps to reformat lists of items like this:
int item1
double item2
to
public void item1(int item1){
}
public void item2(double item2){
}
This can be a big time saver.
I use it all the time when someone sends me a list of patient visit numbers in a column (say 100-200) and I need them in a '0000000444','000000004445' format. works wonders for me!
I also use it to pull out email addresses in an email. I send out group emails often and all the bounced returns come back in one email. So, I regex to pull them all out and then drop them into a string var to remove from the database.
I even wrote a little dialog prog to apply regex to my clipboard. It grabs the contents applies the regex and then loads it back into the clipboard.
One thing I use it for in web development all the time is stripping some text of its HTML tags. This might need to be done to sanitize user input for security, or for displaying a preview of a news article. For example, if you have an article with lots of HTML tags for formatting, you can't just do LEFT(article_text,100) + '...' (plus a "read more" link) and render that on a page at the risk of breaking the page by splitting apart an HTML tag.
Also, I've had to strip img tags in database records that link to images that no longer exist. And let's not forget web form validation. If you want to make sure a user has entered a correct email address (syntactically speaking) into a web form, this is about the only way of checking it thoroughly.
I've just pasted a long character sequence into a string literal, and now I want to break it up into a concatenation of shorter string literals so it doesn't wrap. I also want it to be readable, so I want to break only after spaces. I select the whole string (minus the quotation marks) and do an in-selection-only replace-all with this regex:
/.{20,60} /
...and this replacement:
/$0"¶ + "/
...where the pilcrow is an actual newline, and the number of spaces varies from one incident to the next. Result:
String s = "I recently discussed editors with a co-worker. He uses one "
+ "of the less popular editors and I use another (I won't say "
+ "which ones since it's not relevant and I want to avoid an "
+ "editor flame war). I was saying that I didn't like his "
+ "editor as much because it doesn't let you do find/replace "
+ "with regular expressions.";
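The same break-after-a-space idea can be tried out in a few lines of code. This is a rough sketch of the technique, not the editor workflow itself: the function names chunk_text and split_literal and the default indent are made up, the input is assumed to be a single line, and the exact break positions may differ from the result shown above.

```python
import re


def chunk_text(text):
    """Split text into pieces of 20-60 characters that each end after a space.

    Whatever is left once no such piece fits becomes the final chunk.
    """
    return re.findall(r".{20,60} |.+", text)


def split_literal(text, indent="        "):
    """Format text as a concatenation of shorter string literals."""
    joiner = '"\n' + indent + '+ "'
    return '"' + joiner.join(chunk_text(text)) + '"'
```

The greedy .{20,60} takes as much as it can and then backs off to the last space within range, which is why the lines come out close to the maximum width.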
The first thing I do with any editor is try to figure out its regex oddities. I use it all the time. Nothing really crazy, but it's handy when you've got to copy/paste stuff between different types of text - SQL <-> PHP is the one I do most often - and you don't want to fart around making the same change 500 times.
Regex is very handy any time I am trying to replace a value that spans multiple lines. Or when I want to replace a value with something that contains a line break.
I also like that you can match things in a regular expression and not replace the full match using the $# syntax to output the portion of the match you want to maintain.
I agree with you on points 3, 4, and 5 but not necessarily points 1 and 2.
In some cases 1 and 2 are easier to achieve using an anonymous keyboard macro.
By this I mean doing the following:
Position the cursor on the first line
Start a keyboard macro recording
Modify the first line
Position the cursor on the next line
Stop record.
Now all that is needed to modify the next line is to repeat the macro.
I could live without support for regex but could not live without anonymous keyboard macros.