Find and replace in Notepad++ for a large file - regex

I have an XML file that has 9000 lines in it.
Each XML node has about 10 child elements in it.
One of them is:
<CreatedDate>2009-10-26T02:39:24</CreatedDate>
What I need to do is change the format of the DateTime to:
<CreatedDate>27/05/2010 07:30:16</CreatedDate>
But I do not know how to do it.
I know I could write a regex to identify each value that needs to be replaced, but how can I make it change only the values I want and keep the rest?
I have thought about writing a macro, but the document is too big to format in a way that would let me predict where the element I want to change is, and searching does not seem to work inside a macro.
Any ideas? - I am sure it can be done.

If you want to change the datetime format only inside <CreatedDate> tags, try a regex replace in Notepad++ like this:
Replace this:
<CreatedDate>(\d{4})\-(\d{2})\-(\d{2})T([\d\:]*)</CreatedDate>
With this:
<CreatedDate>$3/$2/$1 $4</CreatedDate>
Each parenthesized group is referenced with a $ followed by its position, so we can reuse those values in the replacement.
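If you ever need the same replacement outside Notepad++ (say, in a script over many files), here is a minimal sketch of it in Python. It assumes the dates always look like `2009-10-26T02:39:24`; the function name `reformat_created_dates` is just for illustration. Note that Python's `re` module writes backreferences in the replacement as `\1`..`\9` rather than `$1`.

```python
import re

# Same pattern as the Notepad++ answer, minus the unnecessary escapes.
PATTERN = re.compile(r"<CreatedDate>(\d{4})-(\d{2})-(\d{2})T([\d:]*)</CreatedDate>")


def reformat_created_dates(xml_text: str) -> str:
    """Rewrite <CreatedDate>YYYY-MM-DDThh:mm:ss</CreatedDate> as DD/MM/YYYY hh:mm:ss."""
    # \3 = day, \2 = month, \1 = year, \4 = time part
    return PATTERN.sub(r"<CreatedDate>\3/\2/\1 \4</CreatedDate>", xml_text)
```

For example, `reformat_created_dates("<CreatedDate>2009-10-26T02:39:24</CreatedDate>")` returns `<CreatedDate>26/10/2009 02:39:24</CreatedDate>`, and other elements are left alone.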

Find:
<CreatedDate>(\d\d\d\d)-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)</CreatedDate>
Replace:
<CreatedDate>\3/\2/\1 \4:\5:\6</CreatedDate>
\d matches a digit.
The parentheses create groups that you reference with e.g. \1.

Find:
<CreatedDate>(\d{4})-(\d{2})-(\d{2})T(\d{2}:\d{2}:\d{2})<\/CreatedDate>
Replace with:
<CreatedDate>\3/\2/\1 \4</CreatedDate>
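All three answers rearrange the text without checking it. If the file is processed by a script anyway, one alternative (not what the answers above do) is to parse each value with Python's `datetime` so malformed dates raise an error instead of being silently shuffled. A sketch, assuming the input format is exactly `%Y-%m-%dT%H:%M:%S`; the function name is hypothetical:

```python
import re
from datetime import datetime

# Grab whatever text sits inside each <CreatedDate> element.
TAG = re.compile(r"<CreatedDate>([^<]+)</CreatedDate>")


def _convert(match: re.Match) -> str:
    # strptime raises ValueError for malformed or impossible dates (e.g. month 13).
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    return "<CreatedDate>" + parsed.strftime("%d/%m/%Y %H:%M:%S") + "</CreatedDate>"


def reformat_created_dates_checked(xml_text: str) -> str:
    # re.sub accepts a function as the replacement; it is called once per match.
    return TAG.sub(_convert, xml_text)
```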

Related

Search and replace with particular phrase

I need help with a mass search and replace using regex.
I have long strings where I need to look for any number followed by a particular string, e.g. 321BS, and replace just the text string I was looking for. So I need to find BS in "gf test test2 321BS test" (the pattern is always the same, only the position differs) and change just BS.
Can you please help me find the right regex for this?
Update: I need to keep the number and change just the text string. I will be doing this in Notepad++. However, I need a general function for this if possible. I am a rookie in regex. Moreover, is it possible to do it in Trados SDL Studio? Or how can I do it in bulk in an Excel file?
Thank you very much!

Your question is a bit vague, but as I understand it, you want to match any digits followed by BS, i.e. 123BS. You want to keep 123 but replace BS?
Regex: (\d+)BS matches 123BS
In notepad++ you can:
match (\d+)BS
replace \1NEWTEXT
This will replace 123BS with 123NEWTEXT.
\1 substitutes the capture group (\d+), which matches one or more digits.
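Since the question asks for a general function, here is one way it could look in Python, assuming "number followed by a fixed suffix" is the whole pattern. The name `replace_suffix` and its defaults are made up for the example; `re.escape` keeps any special characters in the suffix literal.

```python
import re


def replace_suffix(text: str, old: str = "BS", new: str = "NEWTEXT") -> str:
    """Replace `old` only where it directly follows one or more digits."""
    pattern = r"(\d+)" + re.escape(old)
    # A function replacement avoids "\1" being misread if `new` starts with a digit.
    return re.sub(pattern, lambda m: m.group(1) + new, text)
```

For example, `replace_suffix("gf test test2 321BS test")` gives `"gf test test2 321NEWTEXT test"`, while a `BS` that is not preceded by digits (as in `ABS`) is left alone.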

You could do this in Trados Studio using an app. The SDLXLIFF Toolkit may be the most appropriate for you. The advantage over Notepad++ is that it's controlled and will only affect the translatable text and not anything that might break the integrity of the file if you make a mistake. You can also handle multiple files, or even multiple Trados Studio projects in one go.
The syntax would be very similar to the suggestion above... you would:
match (\d+)BS
replace $1NEWTEXT

Find and replace with regular expression in Notepad++

At the moment, I have a PHP function that gets the contents of a CSV file and puts it into a multi-dimensional array, which contains text that I print out in various places, using the indexes.
an example of use would be:
$localText[index][pageText][conceptQualityText][$lang];
The first index, [index], would be the name of the page. The second index [pageText] would indicate what it is (text for the page). The third index, [conceptQualityText] indicates what the actual text is. The last index, [$lang] gets the text in the desired language.
so:
->page location
->what is it
->the content
->what language it should be displayed in.
This all worked fine in previous PHP versions. However, after upgrading to 7.2, PHP seems to be a bit stricter. I was a bit more green ~2 years ago when I first made this solution, and now know that since these indexes aren't written as strings (i.e. wrapped in single quotes, like ['index']), PHP reads them as constants (the kind created with define()). I didn't give it much thought back then, but PHP now treats them that way, and so I get an error that x is an undefined constant.
My initial thought is to make a search and replace on my example string:
$localText[index][pageText][conceptQualityText][$lang];
using the regular expression functionality in Notepad++.
However, the example is just one of many, the notation of the array indexing is basically:
$localText[index][index2][index3][$lang];
So my question is:
How can I make use of the Notepad++ search and replace, using a regular expression, so that my index pointers become strings, instead of acting as superglobal variables?
e.g. make:
$localText[index][index2][index3][$lang];
into:
$localText['index']['index2']['index3'][$lang];
I will need some sort of logic that checks for whatever is inside the brackets and encapsulates them with single quotes, except for the last index, [$lang].
I tried to give as much information as possible, let me know if anything needs to be elaborated.
I tried to refer to these docs without much luck.

I found a solution using this:
find: \b(localText\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)
replace: $1'$2'$3'$4'$5'$6'
and it works like a charm. Thanks to everyone who took their time to help.
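The same find/replace can be scripted over many PHP files. A sketch in Python using the accepted pattern (with the character class written as `A-Z`; `A-z` would also match `[`, `\`, `]`, `^` and the backtick). It assumes exactly three bare indices follow `localText`, as in the question; `quote_indices` is an illustrative name.

```python
import re

# Accepted pattern; Python writes the backreferences as \1..\6 instead of $1..$6.
PATTERN = re.compile(
    r"\b(localText\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)"
)
REPLACEMENT = r"\1'\2'\3'\4'\5'\6'"


def quote_indices(php_source: str) -> str:
    # Already-quoted indices don't match (' is not in the class), so rerunning is harmless.
    return PATTERN.sub(REPLACEMENT, php_source)
```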

You can use the following regex to match:
\[(\w+)\]
The regex matches a word between square brackets unless it is quoted, because ' is not a word character.
Replace with:
['$1']
The regex will not match the last brackets either, because $ is not a word character.
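A quick way to check that pattern before running it in Notepad++ is to try it in Python (replacement written as `['\1']` there). This is only a sketch: it quotes every bare word index in the text it is given, not just those on `$localText`, and numeric indices such as `[0]` would become `['0']` too.

```python
import re

# Only word characters [A-Za-z0-9_] between the brackets, so ['x'] and [$lang] are skipped.
BARE_INDEX = re.compile(r"\[(\w+)\]")


def quote_bare_indices(line: str) -> str:
    return BARE_INDEX.sub(r"['\1']", line)
```

For example, `quote_bare_indices("$localText[index][index2][index3][$lang];")` returns `"$localText['index']['index2']['index3'][$lang];"`.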

Regex capture words inside tags

Given an XML document, I'd like to be able to pick out individual key/value pairs from a particular tag:
<aaa>key0:val0 key1:val1 key2:val2</aaa>
I'd like to get back
key0:val0
key1:val1
key2:val2
So far I have
(?<=<aaa>).*(?=<\/aaa>)
Which will match everything inside, but as one result.
I also have
[^\s][\w]*:[\w]*[^\s] which will also match correctly in groups on this:
key0:val0 key1:val1 key2:val2
But not with the tags. I believe this is an issue with searching for subgroups and I'm not sure how to get around it.
Thanks!

You cannot combine the two expressions in the way you want, because you have to match each occurrence of "key:value".
So in what you came up with, (?<=<aaa>)([\w]*:[\w]*[\s]*)+(?=<\/aaa>), there are two results: the overall match, which covers everything inside the tags, and the single capturing group, which matches one "key:value" occurrence. A repeated capturing group only keeps its most recent match, so the regex engine cannot give you each individual occurrence; it just gives you the last one.
In Python, the match object you get after applying your regex gives you match.group(0) (the whole match) and match.group(1) (the last repetition of the ( ) group), and nothing in between.
But what you want is the n occurences of "key:value". So it's easier to just run the simpler \w+:\w+ regex on the string inside the tags.
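The two-step approach described above can be sketched in Python like this: first pull out the text inside each `<aaa>` element, then run `\w+:\w+` over it with `findall`, which returns every occurrence. The helper name and the `tag` parameter are illustrative.

```python
import re
from typing import List


def key_value_pairs(text: str, tag: str = "aaa") -> List[str]:
    # Step 1: non-greedy match of each <tag>...</tag> body.
    inner_pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)), re.DOTALL)
    pairs: List[str] = []
    for inner in inner_pattern.findall(text):
        # Step 2: findall yields every key:value, not only the last one.
        pairs.extend(re.findall(r"\w+:\w+", inner))
    return pairs
```

For example, `key_value_pairs("<aaa>key0:val0 key1:val1 key2:val2</aaa>")` returns `["key0:val0", "key1:val1", "key2:val2"]`.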

I uploaded this one at parsemarket, and I'm not sure it's what you are looking for, but maybe something like this:
(<aaa>)((\w+:\w+\s)*(\w+:\w+)*)(<\/aaa>)
AFAIK, unless you know how many k:v pairs are in the tags, you can't capture all of them in one regex. So, if there are only three, you could do something like this:
<(?:aaa)>(\w+:\w+\s*)+(\w+:\w+\s*)+(\w+:\w+\s*)+<(?:\/aaa)>
But I would think you would want to do some sort of loop with whatever language you are using. Or, as some of the comments suggest, use the parser classes in the language. I've used BeautifulSoup in Python for HTML.
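As a sketch of the parser route, here is the same extraction using Python's standard-library `xml.etree.ElementTree` (a swap-in for BeautifulSoup, which is a third-party package). It assumes the input is well-formed XML and that pairs are whitespace-separated; `parse_pairs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_pairs(xml_text: str, tag: str = "aaa") -> Dict[str, str]:
    root = ET.fromstring(xml_text)
    result: Dict[str, str] = {}
    # iter() visits the root itself and every descendant with this tag.
    for element in root.iter(tag):
        for token in (element.text or "").split():
            key, sep, value = token.partition(":")
            if sep:  # ignore tokens without a colon
                result[key] = value
    return result
```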

If duplicate within brackets, delete one of the lines

Hi, I have a long list of items (~6k) that comes in this format:
'Entry': ['Entry'],
What I want to do is this: if the first quoted word (the key) is the same on several lines, i.e.:
'ACT': ['KOSOV'],
'ACT': ['STIG'],
I want to keep only one of the entries. It doesn't matter which one (the first, the second or whatever); I just need it to leave one of them.
If possible, I would like to do this in Sublime Text or Notepad++ using a regexp, and if there is no way, then do whatever you think is best to solve this.
UPD: The AWK command did the job indeed, thank you

You can't solve this using just regular expressions. You either need to remember all entries you've seen so far while scanning the text (would require writing a small utility program, probably), or you could sort the entries and then remove any repeated entries.
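The "remember all entries seen so far" option is a few lines of Python. This sketch keeps the first line for each key and preserves the original order; it assumes each entry starts with a single-quoted key followed by a colon, and leaves any other line untouched. The function name is illustrative.

```python
import re
from typing import Iterable, List

# Captures the first quoted word, e.g. ACT in  'ACT': ['KOSOV'],
KEY = re.compile(r"^\s*'([^']*)'\s*:")


def drop_duplicate_keys(lines: Iterable[str]) -> List[str]:
    seen = set()
    kept: List[str] = []
    for line in lines:
        match = KEY.match(line)
        if match is None:
            kept.append(line)  # not an entry; keep as-is
            continue
        key = match.group(1)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```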
If you have a sorted file, then you can solve it using a regular expression, such as this one:
^(([^:]+):.+\n)(?:\2.+\n)+
Replace with \1.
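The sort-then-regex route can also be tried in Python. This sketch sorts the lines, normalizes line endings so every line ends in `\n` (the pattern needs that), and applies the pattern above in multiline mode; `\n` is added to the negated class as a small assumption so a key can never run across lines.

```python
import re

# The pattern from the answer; [^:\n] keeps the key on a single line.
DUPLICATE_RUN = re.compile(r"^(([^:\n]+):.+\n)(?:\2.+\n)+", re.MULTILINE)


def dedupe_sorted(text: str) -> str:
    # The regex only sees adjacent duplicates, so sort first.
    lines = sorted(text.splitlines())
    return DUPLICATE_RUN.sub(r"\1", "\n".join(lines) + "\n")
```

For example, `dedupe_sorted("'ACT': ['STIG'],\n'BOB': ['X'],\n'ACT': ['KOSOV'],\n")` returns `"'ACT': ['KOSOV'],\n'BOB': ['X'],\n"`. Unlike the seen-set approach, this does not preserve the original order.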

Exclude a certain String from variable in regex

Hi, I have a stylesheet where I use xsl:analyze-string with the following regex:
(&journal_abbrevs;)[\s ]*([0-9]{{4}})[,][\s ][S]?[\.]?[\s ]?([0-9]{{1,4}})([\s ][(][0-9]{{1,4}}[)])?
You don't need to look at the whole thing :)
&journal_abbrevs; looks like this:
"example-String1|example-String2|example-String3|..."
What I need to do now is exclude one of the strings in &journal_abbrevs; from this regex. E.g. I don't want example-String1 to be matched.
Any ideas on how to do that ?

It seems XSLT regex does not support look-around. So I don't think you'll be able to get a solution for this that does not involve writing out all strings from journal_abbrevs in your regex.
To minimize the amount of writing out, you could split journal_abbrevs into, say, journal_abbrevs1, journal_abbrevs2 and journal_abbrevs3 (or however many you decide to use) and only write out the one that contains the string you wish to exclude. If journal_abbrevs1 contains the string, you'd then end up with something like:
((&journal_abbrevs2;)|(&journal_abbrevs3;)|example-String2|example-String3|...)...
If it supported look-around, you could've used a very simple:
(?!example-String1)(&journal_abbrevs;)...
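If the list behind &journal_abbrevs; is long, one option (not part of the answer above) is to generate the "everything except X" alternation with a small script instead of writing it out by hand. The Python sketch below does that and, for comparison, shows the look-ahead version working in an engine that supports it. The list contents are placeholders taken from the question, and `re.escape` is Python-specific; for XSLT you would check the escaping yourself.

```python
import re
from typing import Iterable


def alternation_without(names: Iterable[str], excluded: str) -> str:
    # Look-around-free workaround: list every alternative except the excluded one.
    return "|".join(re.escape(name) for name in names if name != excluded)


abbrevs = ["example-String1", "example-String2", "example-String3"]
pattern_without = "(" + alternation_without(abbrevs, "example-String1") + ")"

# Python's re supports negative look-ahead, so the answer's idea works directly here.
pattern_lookahead = r"(?!example-String1)(" + "|".join(map(re.escape, abbrevs)) + ")"
```

Note that `(?!example-String1)` would also reject a longer name such as `example-String10`, whereas the generated alternation excludes only the exact entry.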

We Keep Coding
