Using regex to eliminate chunks in a file (categorized events in iCal file)

Using regex to eliminate chunks in a file (categorized events in iCal file) - regex

I have one .ics file from which I would like to create individual new .ics files depending on the event categories (I can't get egroupware to export only events of one category, I want to create new calendars depending on category). My intended approach is to repeatedly eliminate all events but those of one category and then save the file using EditPad Lite 7 (Windows).
I am struggling to get the regular expression right. .+? is still too greedy and negating the string (e.g. to eliminate all but events from one category) doesn't work either.
Sample
BEGIN:VEVENT
DESCRIPTION:Event 2
END:VEVENT
BEGIN:VEVENT
DESCRIPTION:Event 3
CATEGORIES:Sports
END:VEVENT
BEGIN:VEVENT
DESCRIPTION:Event 4
END:VEVENT
The regex BEGIN:VEVENT.+?CATEGORIES:Sports.+?END:VEVENT should only match sports events but it catches everything from the first BEGINto the first ENDfollowing the category.
Edit: negating doesn't work either: BEGIN:VEVENT.+?((?!CATEGORIES:Sports).).+?END:VEVENT.
What am I missing? Any pointers are highly appreciated.

I guess newlines are removed or ignored, because your regex does not care about them.
I only have a correction to the match after CATEGORIES
BEGIN:VEVENT.+?CATEGORIES:Sports.*?END:VEVENT
^
Zero or more
The first part of your regex looks good, maybe the regex engine in EditPad is not so good.
Try it with a different editor or scripting language (like Eclipse or perl or Notepad+ or Notepad2)
You could split the input and then grep the matching Sports events
#sportevents = grep /Sports/, split /END:VEVENT/, $input
map $_.="END:VEVENT", #sportevents
This was perl, maybe you can launch a script from EditPad to do it.
The second line just restores the END:VEVENT that was stripped during split.

OK. Solved it. I found something here which can be used to split ics files. I tweaked it to use the category rather than the summary in the file name and then merged the individually generated files according to category. I added the usual ics header and footer to all files and, voilà, I had individual calendar files.

Related

Regex to extract all strings from source code used when calling a function

We have an old, grown project with thousands of php files and need to clean it up.
Throughout the whole project we do have a lot of function calls similar to:
trans('somestring1');
trans("SomeString2");
trans('more_string',$somevar);
trans("anotherstring4",$somevar);
trans($tx_key);
trans($anotherKey,$somevar);
All of those are embedded into the code and represent translation keys. I would like to find a way to extract all "translation keys" in all occurrences.
The PHP project is in VS Code, so a RegEx Search would be helpful to list the results.
Or I could search through the project with any other tool you would recommend
However I would also need to "export" just the strings to a textfile or similar.
The ideal result would be:
somestring1
SomeString2
more_string
anotherstring4
$tx_key
$anotherKey
As a bonus - if someone knows, how I could get the above list including filename where the result has been found - that would be really fantastic!
Any help would be greatly appreciated!
Update:
The RegEx I came up with:
/(trans)+\([^\)]*\)(\.[^\)]*\))?/gim
list the full occurrence - How can I just get the first part of the result (between Single Quotes OR between Double Quotes OR beginning with $)
See here: regexr.com/548d4

Here are some steps to get exactly what you want. Using this you can do a find and replace on your search results!
So you could do sequential regex find/replaces in the right circumstances.
The replace can be just within the search results editor and not affect the underlying files at all - which is what you want.
You can also have the replace action actually edit the underlying files if you wish.
[Hint: This technique can also make doing a find item a / replace with b in files that contain term c much easier to do.]
(1) Open a new search editor: Ctrl+Shift+P
(That command is currently unbound to a keybinding.)
(2) Paste this regex into the Search input box (with the regex option .* selected):
`(.*?)(\btrans\(['"]?)([^,'")]+)(.*)` - a relatively simple regex
regex101 demo
See my other answer for a regex to work with up to 6 entries per line:
(\s*\d+:\s)?((.*?)(\btrans\(['"]?)([^,'")]*)((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?)(.*)
(3) You will get a list of files with the search results. Now open a Find widget Shift+F in this Search editor.
(4) Put the same regex into that Find input. Regex option selected. Put $3 into the Replace field. This only replaces in this Search editor - not the original files (although that can be done if you want it in some case). Replace All.
If using the 1-6 version regex, replace with:
$1$5 $9 $13 $17 $21 $25
(5) Voila. You can now save this Search Editor as a file.

The first answer works for one desired capture per line as in the original question. But that relatively simple regex won't work if there are two or more per line.
The regex below works for up to 6 entries per line, like
trans('somestring1');
stuff trans("SomeString2"); some content trans("SomeString2a");more stuff [repeat, repeat]
But it doesn't for 7+ - you'll need a regex guru for that.
Here is the process again with a twist of using a snippet in the Search Editor instead of a Find/Replace. Using a snippet allows more control over the formatting of the final result.
(1) Open a new search editor: Ctrl+Shift+P (That command is currently unbound to a keybinding.)
(2) Paste this regex into the Search input box (with the regex option .* selected):
`((.*?)(\btrans\(['"]?)([^,'")]*)((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?((.*?)(\btrans\(['"]?)([^,'")]*))?)(.*)`
regex101 demo
(3) You will get a list of files with the search results. Now select all your results individually with Ctrl+Shift+L.
(4) Trigger this keybinding:
{
"key": "alt+i", // whatever keybinding you like
"command": "editor.action.insertSnippet",
"when": "editorTextFocus",
"args": {
"snippet": "${TM_SELECTED_TEXT/((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*)((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?((.*?)(\\btrans\\([\\'\\\"]?)([^,\\'\\\")]*))?)(.*)/$4${8:+\n }$8${12:+\n }$12${16:+\n }$16${20:+\n }$20${24:+\n }$24/g}"
}
},
That snippet will be applied to each selection in your search result. This part ${8:+\n } is a conditional which adds a newline and some spaces if there is a capture group 8 - which would be a second trans(...) on a line.
Demo: (unfortunately, it doesn't properly show the Ctrl+Shift+L selecting all lines individually or the Alt+i snippet trigger)

Search/replace in block selection in Notepad++

Is there a way to limit search/replace only to a columnar block selection in Notepad++?
Here is what I am trying to do:
I am bulk-editing metadata extracted from large numbers of photos.
The metadata comes to me as a csv file with no quotes around fields in header line and no quotes around first field in each succeeding line.
I edit this file in Open Office calc which exports with quotes around all fields.
I can easily edit header row but the problem comes in stripping quotes from only first field in successive lines.
I can use notepad in columnar mode but, after selecting the first column, the 'search only in selection' option box is greyed out.
I can do this by hand but it means lots of hand-work and increased chance of error.

I know, this probably won't help you any more, but I just had the same problem and stumbled across this question.
I found moving the block in question to a new file and performing the find/replace there works quite decently. When moving the block back, be sure to select it in block mode (see this question).

No. Another editor may have this feature.

sort of a late reply but... I had the same problem when I moved to a new machine with Notepad++ installed. Previously, I was using a text editor called Boxer that had this feature, which I found invaluable. Its not free-ware however.

You may not be able to Search/Replace within a columnar selection, but you can easily carry out your task within Notepad++. Use Find and Replace feature, with the Regular Expressions box checked.
If you want to remove quotes only from a target column, use the following regular expression in the Find field:
(^([^,]*,){i})"([^,\n\r]*)"(.*$)
Replace i with the position of the target column minus 1.
(i.e.- Us 2 if you want quotes around the third column, 0 for the first column, etc)
In the Replace field use:
\1\3\4
Clicking "Replace All" will strip quotes from the target column.
If you want to blow away all quotes surrounding each element in your csv without prejudice, use the following regular expression in the Find field:
((?<=,)|(?<=^))"(.*?)"((?=$|,))
In the Replace field use:
\1\2\3
Clicking Replace All will strip quotes form the columns.
Example
Since you didn't provide an example csv file, I'll walk through my own working example. Below is my csv:
"0","1","2","3","4","5","6","7","8","9"
"10","11","12","13","14","15","16","17","18","19"
"20","21","22","23","24","25","26","27","28","29"
"30","31","32","33","34","35","36","37","38","39"
"40","41","42","43","44","45","46","47","48","49"
"50","51","52","53","54","55","56","57","58","59"
"60","61","62","63","64","65","66","67","68","69"
"70","71","72","73","74","75","76","77","78","79"
"80","81","82","83","84","85","86","87","88","89"
"90","91","92","93","94","95","96","97","98","99"
"100","101","102","103","104","105","106","107","108","109"
"110","111","112","113","114","115","116","117","118","119"
"120","121","122","123","124","125","126","127","128","129"
"130","131","132","133","134","135","136","137","138","139"
"140","141","142","143","144","145","146","147","148","149"
"150","151","152","153","154","155","156","157","158","159"
"160","161","162","163","164","165","166","167","168","169"
"170","171","172","173","174","175","176","177","178","179"
"180","181","182","183","184","185","186","187","188","189"
"190","191","192","193","194","195","196","197","198","199"
If I wanted to remove quotes from the second column, I would use the below Find and Replace fields
(^([^,]*,){1})"([^,\n\r]*)"(.*$)
\1"\3"\4
Clicking Replace All yields the below result:
"0",1,"2","3","4","5","6","7","8","9"
"10",11,"12","13","14","15","16","17","18","19"
"20",21,"22","23","24","25","26","27","28","29"
"30",31,"32","33","34","35","36","37","38","39"
"40",41,"42","43","44","45","46","47","48","49"
"50",51,"52","53","54","55","56","57","58","59"
"60",61,"62","63","64","65","66","67","68","69"
"70",71,"72","73","74","75","76","77","78","79"
"80",81,"82","83","84","85","86","87","88","89"
"90",91,"92","93","94","95","96","97","98","99"
"100",101,"102","103","104","105","106","107","108","109"
"110",111,"112","113","114","115","116","117","118","119"
"120",121,"122","123","124","125","126","127","128","129"
"130",131,"132","133","134","135","136","137","138","139"
"140",141,"142","143","144","145","146","147","148","149"
"150",151,"152","153","154","155","156","157","158","159"
"160",161,"162","163","164","165","166","167","168","169"
"170",171,"172","173","174","175","176","177","178","179"
"180",181,"182","183","184","185","186","187","188","189"
"190",191,"192","193","194","195","196","197","198","199"

My search on internet, to to see weather notepad++ suports this; brought me here.
I have used TextPad and confirm that it supports find-and-replace within column selected block. Also TextPad is free for personal use.

Stripping superscript from plaintext

I often grab quotes from articles that include citations that include superscripted footnotes, which when copied are a pain in the ass. They show up as actual letters in the text as they are pasted in plaintext and not in html.
Is there a way I could run this through a regex to take out these superscripts?
For example
In the abeginning bGod ccreated the dheaven and the eearth.
Should become
In the beginning God created the heaven and the earth.
I can't think of a way to have regex search for misspellings and a corresponding sequential set of numbers and letters.
Any thoughts? I'm also using Sublime Text 3 for the majority of my writing, but I wouldn't mind outsourcing this to an AppleScript, or text replacement app (aText, textExpander, etc.).

Matching Code vs. Matching a Screen
It's hard to tell without seeing an example, but this should be doable if you copy the text from code view, as opposed to the regular browser view. (Ctrl or Cmd-J is your friend). Since writing the rules will take time, this will only be worthwhile for large chunks of text.
In code view, your superscript will be marked up in a way that can be targetted by regex. For instance:
and therefore bananas make you smartera
in the browser view (where the a at the end is a citation note) may look like this in code view:
and therefore bananas make you smarter<span class="mycitations">a</span>
In your editor, using regex, you can process the text to remove all tags, or just certain tags. The rules may not always be easy to write, and of course there are many disclaimers about using regex to parse html.
However, if your source is always the same (Wikipedia for instance), then you can create and save rules that should work across many pages.

Very basic image renaming with regex

I spent most of yesterday putting together a collection of regular expressions to convert all my image names and paths to lower case. Today, I processed a folder full of files and was surprised to discover that many image names are still capitalized.
So I decided to try it one step at a time, first renaming .jpg's, then .gif's, .png's, etc.
I'm working on a Mac, using Dreamweaver and TextWrangler as my text editors. The following regex works perfectly for jpg's, with one major flaw - it deletes the extension...
([\w/-]+)\.jpe?g
\L\1
In other words, it changes South-America.jpg to south-america.
How can I change it so that it retains the file extension? I assume I can then just change it to...
([\w/-]+)\.png
\L\1
...to process png's, etc.

([\w\/-]+)(\.jpe?g)
and replace with \L\1\2
its deleting your extension because you are never saving it in a matchgroup.

You could perhaps capture the extension too?
([\w/-]+)(\.jpe?g)
\L\1\2
And I think you should be able to use something like this for all the files:
([\w/-]+)(\.[^.]+$)
\L\1\2
Or if you specifically want to convert those jpegs, pngs and gifs:
([\w/-]+)(\.(?:jpe?g|gif|png))
\L\1\2

If it's okay for the extension to become lowercase as well, you could just do
^(.*)$
\L\1
As long as you're certain that all lines contain file names.
If you want to process only certain file formats, use
^(.*\.(jpe?g|png|gif))$
\L\1

Use cases for regular expression find/replace

I recently discussed editors with a co-worker. He uses one of the less popular editors and I use another (I won't say which ones since it's not relevant and I want to avoid an editor flame war). I was saying that I didn't like his editor as much because it doesn't let you do find/replace with regular expressions.
He said he's never wanted to do that, which was surprising since it's something I find myself doing all the time. However, off the top of my head I wasn't able to come up with more than one or two examples. Can anyone here offer some examples of times when they've found regex find/replace useful in their editor? Here's what I've been able to come up with since then as examples of things that I've actually had to do:
Strip the beginning of a line off of every line in a file that looks like:
Line 25634 :
Line 632157 :
Taking a few dozen files with a standard header which is slightly different for each file and stripping the first 19 lines from all of them all at once.
Piping the result of a MySQL select statement into a text file, then removing all of the formatting junk and reformatting it as a Python dictionary for use in a simple script.
In a CSV file with no escaped commas, replace the first character of the 8th column of each row with a capital A.
Given a bunch of GDB stack traces with lines like
#3 0x080a6d61 in _mvl_set_req_done (req=0x82624a4, result=27158) at ../../mvl/src/mvl_serv.c:850
strip out everything from each line except the function names.
Does anyone else have any real-life examples? The next time this comes up, I'd like to be more prepared to list good examples of why this feature is useful.

Just last week, I used regex find/replace to convert a CSV file to an XML file.
Simple enough to do really, just chop up each field (luckily it didn't have any escaped commas) and push it back out with the appropriate tags in place of the commas.

Regex make it easy to replace whole words using word boundaries.
(\b\w+\b)
So you can replace unwanted words in your file without disturbing words like Scunthorpe

Yesterday I took a create table statement I made for an Oracle table and converted the fields to setString() method calls using JDBC and PreparedStatements. The table's field names were mapped to my class properties, so regex search and replace was the perfect fit.
Create Table text:
...
field_1 VARCHAR2(100) NULL,
field_2 VARCHAR2(10) NULL,
field_3 NUMBER(8) NULL,
field_4 VARCHAR2(100) NULL,
....
My Regex Search:
/([a-z_])+ .*?,?/
My Replacement:
pstmt.setString(1, \1);
The result:
...
pstmt.setString(1, field_1);
pstmt.setString(1, field_2);
pstmt.setString(1, field_3);
pstmt.setString(1, field_4);
....
I then went through and manually set the position int for each call and changed the method to setInt() (and others) where necessary, but that worked handy for me. I actually used it three or four times for similar field to method call conversions.

I like to use regexps to reformat lists of items like this:
int item1
double item2
to
public void item1(int item1){
}
public void item2(double item2){
}
This can be a big time saver.

I use it all the time when someone sends me a list of patient visit numbers in a column (say 100-200) and I need them in a '0000000444','000000004445' format. works wonders for me!
I also use it to pull out email addresses in an email. I send out group emails often and all the bounced returns come back in one email. So, I regex to pull them all out and then drop them into a string var to remove from the database.
I even wrote a little dialog prog to apply regex to my clipboard. It grabs the contents applies the regex and then loads it back into the clipboard.

One thing I use it for in web development all the time is stripping some text of its HTML tags. This might need to be done to sanitize user input for security, or for displaying a preview of a news article. For example, if you have an article with lots of HTML tags for formatting, you can't just do LEFT(article_text,100) + '...' (plus a "read more" link) and render that on a page at the risk of breaking the page by splitting apart an HTML tag.
Also, I've had to strip img tags in database records that link to images that no longer exist. And let's not forget web form validation. If you want to make a user has entered a correct email address (syntactically speaking) into a web form this is about the only way of checking it thoroughly.

I've just pasted a long character sequence into a string literal, and now I want to break it up into a concatenation of shorter string literals so it doesn't wrap. I also want it to be readable, so I want to break only after spaces. I select the whole string (minus the quotation marks) and do an in-selection-only replace-all with this regex:
/.{20,60} /
...and this replacement:
/$0"¶ + "/
...where the pilcrow is an actual newline, and the number of spaces varies from one incident to the next. Result:
String s = "I recently discussed editors with a co-worker. He uses one "
+ "of the less popular editors and I use another (I won't say "
+ "which ones since it's not relevant and I want to avoid an "
+ "editor flame war). I was saying that I didn't like his "
+ "editor as much because it doesn't let you do find/replace "
+ "with regular expressions.";

The first thing I do with any editor is try to figure out it's Regex oddities. I use it all the time. Nothing really crazy, but it's handy when you've got to copy/paste stuff between different types of text - SQL <-> PHP is the one I do most often - and you don't want to fart around making the same change 500 times.

Regex is very handy any time I am trying to replace a value that spans multiple lines. Or when I want to replace a value with something that contains a line break.
I also like that you can match things in a regular expression and not replace the full match using the $# syntax to output the portion of the match you want to maintain.

I agree with you on points 3, 4, and 5 but not necessarily points 1 and 2.
In some cases 1 and 2 are easier to achieve using a anonymous keyboard macro.
By this I mean doing the following:
Position the cursor on the first line
Start a keyboard macro recording
Modify the first line
Position the cursor on the next line
Stop record.
Now all that is needed to modify the next line is to repeat the macro.
I could live with out support for regex but could not live without anonymous keyboard macros.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Using regex to eliminate chunks in a file (categorized events in iCal file) - regex

Related

Regex to extract all strings from source code used when calling a function

Search/replace in block selection in Notepad++

Stripping superscript from plaintext

Very basic image renaming with regex

Use cases for regular expression find/replace

Categories

Resources