I've been trying to write a regex to match all the " - " delimiters in a filename except the first and last, so I can combine all the data in the middle into one group. For example, a filename like:
Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc
Has to become:
Ann M Martin - Baby sitters Club- Baby sitters Little Sister- Super Special 04- Karen, Hannie and Nancy - The Three Musketeers.doc
So basically I'm trying to replace " - " with "- ", but not the first or last instance. The filenames can have 1 to 6 " - " delimiters, but only those with 3, 4, 5 or 6 delimiters should be affected.
It's for use in File Renamer; the regex flavor is JavaScript. Thanks.
Can you not use a regex? If so:
var s = "Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc";
var p = s.split(' - ');
var r = ''; // result output
p.forEach(function(e, i) {
    switch (i) {
        case 0: r += e; break;                            // first part: nothing before it
        case 1: case p.length - 1: r += ' - ' + e; break; // keep the first and last delimiter
        default: r += '- ' + e;                           // tighten every delimiter in between
    }
});
console.log(r);
http://jsfiddle.net/c7zcp8z6/1/
s=Ann M Martin - Baby sitters Club - Baby sitters Little Sister - Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc
r=Ann M Martin - Baby sitters Club- Baby sitters Little Sister- Super Special 04- Karen, Hannie and Nancy - The Three Musketeers.doc
This assumes that the separator is always " - " (1 space, 1 dash, 1 space). If not, you need to split on "-" only, then trim each token before reconstructing.
Two options:
1 - You'll need to do some processing of your own by iterating through the matches using
( - )
and building a new string (see this post about getting match indices).
You'll have to check that the match count is greater than 2 and skip the first and last matches.
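2 - Use
.+? - ((?:.+ - )+).+ - .+
to get the part of the string to be modified, do a replace on the dashes in that group, and then build your string (again using the indices from the above regex). The first .+? is lazy so that the group captures every middle segment; with a greedy .+ it would capture only the last one.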
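Option 2 can be sketched roughly as follows. This is a minimal illustration in Python rather than in File Renamer itself; the function name `merge_middle` and the extra capture groups around the head and tail are assumptions added for the example, and the first quantifier is lazy so that the middle group covers all inner segments.

```python
import re

# Lazy first segment, greedy middle group, then the last two segments.
# The pattern only matches names with at least three " - " delimiters.
MIDDLE = re.compile(r"^(.+?) - ((?:.+ - )+)(.+ - .+)$")

def merge_middle(name):
    """Replace every " - " except the first and last with "- "."""
    m = MIDDLE.match(name)
    if m is None:
        return name  # fewer than three delimiters: leave the name unchanged
    head, middle, tail = m.groups()
    # `middle` ends with the delimiter that precedes the second-to-last part,
    # so tightening every " - " inside it handles all the inner delimiters.
    return head + " - " + middle.replace(" - ", "- ") + tail

name = ("Ann M Martin - Baby sitters Club - Baby sitters Little Sister - "
        "Super Special 04 - Karen, Hannie and Nancy - The Three Musketeers.doc")
print(merge_middle(name))
```

Names with one or two delimiters do not match the pattern at all, which is what restricts the change to names with three or more.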
Thanks for the suggestions.
I got it to work this way:
It replaces the first and last " - " with " ! ", so I can then do a simple find and replace of all remaining " - " with "- ", then change all the " ! " back to " - ".
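The placeholder approach described here might look like this in Python. It is only a sketch: the function name is made up for the example, and it assumes that " ! " never occurs in a real filename.

```python
PLACEHOLDER = " ! "  # assumption: this sequence never appears in a filename

def merge_middle_with_placeholder(name):
    """Protect the first and last " - ", tighten the rest, then restore."""
    if name.count(" - ") < 3:
        return name  # only names with three or more delimiters are changed
    # Protect the first delimiter.
    name = name.replace(" - ", PLACEHOLDER, 1)
    # Protect the last delimiter.
    head, tail = name.rsplit(" - ", 1)
    name = head + PLACEHOLDER + tail
    # Tighten the remaining (middle) delimiters.
    name = name.replace(" - ", "- ")
    # Restore the protected delimiters.
    return name.replace(PLACEHOLDER, " - ")

print(merge_middle_with_placeholder("A - B - C - D - E"))
```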
Example:
I want to extract everything from "Item:" up to the closing "*".
Item: *Sofa (1 SET), 2 × Mattress, 3 × Baby Mattress, 5
Seaters Car (Fabric)*
Total price: 100.00
Subtotal: 989.00
But I only managed to extract "Item: *" and " Seaters Car (Fabric)* " by using (.*?)\*
After matching Item:, match anything but a colon with [^:]+, and then lookahead for a newline, ensuring that the match ends at the end of a line just before another label (like Total price:) starts:
Item: ([^:]+)(?=\n)
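To see how the pattern behaves on the sample, here is a small Python check. It assumes the item text is split across two lines exactly as shown in the question; the helper name `extract_item` is only for illustration.

```python
import re

ITEM = re.compile(r"Item: ([^:]+)(?=\n)")

def extract_item(text):
    """Return the text after "Item: " up to the end of the last item line."""
    m = ITEM.search(text)
    return m.group(1) if m else None

sample = (
    "Item: *Sofa (1 SET), 2 × Mattress, 3 × Baby Mattress, 5\n"
    "Seaters Car (Fabric)*\n"
    "Total price: 100.00\n"
    "Subtotal: 989.00\n"
)
print(extract_item(sample))
```

Because `[^:]+` also matches newlines, it first runs up to the colon in "Total price:" and then backtracks to the nearest preceding newline, so the captured group spans both item lines.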
I need a regular expression for the following rules:
should not start or end with a space
should contain just letters (lower / upper), digits, #, single quotes, hyphens and spaces (spaces just inside, but not at the beginning and the end, as I already said)
should contain at least one letter (lower or upper).
Thank you
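I think
^(?=.*[a-zA-Z])[a-zA-Z0-9#'-](?:[a-zA-Z0-9#' -]*[a-zA-Z0-9#'-])?$
should help you. The lookahead at the start requires at least one letter anywhere in the string, and the first and last characters are limited to the allowed characters other than a space.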
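A quick way to check a pattern for these three rules against sample inputs, sketched in Python. The pattern used here puts the letter lookahead at the very start and limits the first and last characters to the allowed non-space set; the name `is_valid` is an assumption for the example.

```python
import re

# At least one letter somewhere; first and last characters are allowed
# non-space characters; spaces may appear only in between.
PATTERN = re.compile(
    r"^(?=.*[A-Za-z])[A-Za-z0-9#'-](?:[A-Za-z0-9#' -]*[A-Za-z0-9#'-])?$"
)

def is_valid(s):
    # fullmatch also rejects a trailing newline that "$" alone would tolerate
    return PATTERN.fullmatch(s) is not None

for candidate in ["a1 - 'super' 1", " a1", "a1 ", "a1?", "1 - '123'", "a"]:
    print(repr(candidate), is_valid(candidate))
```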
"Does it really matter guys?"
With regard to the dialect of regex: yes, it does matter. Different languages may have different dialects. One example off the top of my head is that the regex library in PHP supports lookbehinds, whereas JavaScript only gained them in ES2018, so older JavaScript engines do not support them. This is why it is important to state the underlying language that you're using. Also, for future reference, it helps those wanting to answer your question if you provide sample input and the matches you expect from it.
Given the information you provided, I also think this is a case where you should combine regex with plain JavaScript checks to validate the input. Take a look at this example:
window.onload = function() {
    var valid = "a1 - 'super' 1";
    var invalid1 = " a1 - 'super' 1"; // leading whitespace
    var invalid2 = "a1 - 'super' 1 "; // trailing whitespace
    var invalid3 = "a1 - 'super' 1?"; // invalid (?) char
    var invalid4 = "1 - '123'"; // no letters
    console.log(valid + ": " + validation(valid));
    console.log(invalid1 + ": " + validation(invalid1));
    console.log(invalid2 + ": " + validation(invalid2));
    console.log(invalid3 + ": " + validation(invalid3));
    console.log(invalid4 + ": " + validation(invalid4));
};

function validation(input) {
    // Matches any character outside the allowed set:
    // letters, digits, #, single quote, hyphen and space.
    var disallowedChars = /[^a-zA-Z\d#' -]/;
    var containsLetter = /[a-zA-Z]/;
    return input.length > 0 && input.trim().length === input.length && !disallowedChars.test(input) && containsLetter.test(input);
}
I have a returned string formatted as below:
PR ER
89
>
from which the number can be extracted by using \n(\d+), but sometimes it returns:
23 PR P 10000>
Or, it could be something like:
23
PR P
10000
>
In these scenarios, how can I extract the number 10000 between PR and >?
This might work for you:
\d+(?=\s*>)
It looks for any sequence of digits followed by any amount of whitespace and a '>'.
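As a rough check, the lookahead pattern can be run against the three response shapes from the question. This sketch uses Python's `re` module; the helper name `number_before_prompt` is hypothetical.

```python
import re

NUMBER_BEFORE_GT = re.compile(r"\d+(?=\s*>)")

def number_before_prompt(text):
    """Return the digits that sit directly before the '>' (ignoring whitespace)."""
    m = NUMBER_BEFORE_GT.search(text)
    return m.group(0) if m else None

samples = [
    "PR ER\n89\n>",
    "23 PR P 10000>",
    "23\nPR P\n10000\n>",
]
for s in samples:
    print(number_before_prompt(s))
```

The leading 23 is skipped in the last two samples because it is followed by "PR", not by optional whitespace and '>'.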
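For Java, if you need it (this prints every run of digits together with its position):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "23 PR P 10000>";
Pattern reg = Pattern.compile("(\\d+)");
Matcher m = reg.matcher(str);
while (m.find()) {
    System.out.println("group : " + m.group() + " - start :" + m.start() + " - end :" + m.end());
}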
I might just answer this myself:
\d+\n>
worked!
thanks all
Best
I would like to have a regular expression that transforms the following sentences
heb/MD/B-VP/O/hebben ik/PRP/B-NP/O/ik zitten/MD/B-VP/O/zitten slapen/VB/I-VP/O/slapen ?/./O/O/?
of/CC/O/O/of heb/MD/B-VP/O/hebben ik/PRP/B-NP/O/ik het/PRP/I-NP/O/het samenwonen/NN/I-NP/O/samenwonen zo/RB/B-ADJP/O/zo lang/JJ/I-ADJP/O/lang uitgesteld/VBN/B-VP/O/uitstellen omdat/CC/O/O/omdat ik/PRP/B-NP/O/ik het/PRP/I-NP/O/het onbewust/JJ/B-ADJP/O/onbewust niet/RB/B-ADVP/O/niet wil/MD/B-VP/O/willen ?/./O/O/?
ben/MD/B-VP/O/zijn ik/PRP/B-NP/O/ik wel/RB/B-VP/O/wel gaan/MD/I-VP/O/gaan houden/VB/I-VP/O/houden van/IN/B-PP/O/van haar/MD/B-VP/O/haren ,/,/O/O/, maar/CC/O/O/maar niet/RB/B-ADVP/O/niet van/IN/B-PP/B-PNP/van haar/PRP$/B-NP/I-PNP/haar -/./O/O/- echte/JJ/B-ADJP/O/echt -/./O/O/- leven/NN/B-NP/O/leven ?/./O/O/?
http:&slash;&slash;www.google.be&slash;test/NNP/B-NP/O/http://www.google.be/test
Into this desired result:
hebben ik zitten slapen ? of hebben ik het samenwonen zo lang uitstellen omdat ik het onbewust niet willen ? zijn ik wel gaan houden van haren , maar niet van haar - echt - leven ? http://www.google.be/test
Therefore, I would like to select "each word" (e.g. heb/MD/B-VP/O/hebben) -> ([^\s]+) and take all the characters (a-z&é"'(§234567etc") until the 4th slash (heb/MD/B-VP/O/).
In such a way that I can replace those matches by " "
Kind regards
I'd use ([^\/]+\/){4}, which looks for 4 segments of at least one non-slash character followed by a /. Then, after splitting the input on whitespace, you replace that pattern in each word with an empty string.
import re
input_str='heb/MD/B-VP/O/hebben ik/PRP/B-NP/O/ik zitten/MD/B-VP/O/zitten slapen/VB/I-VP/O/slapen ?/./O/O/? of/CC/O/O/of heb/MD/B-VP/O/hebben ik/PRP/B-NP/O/ik het/PRP/I-NP/O/het samenwonen/NN/I-NP/O/samenwonen zo/RB/B-ADJP/O/zo lang/JJ/I-ADJP/O/lang uitgesteld/VBN/B-VP/O/uitstellen omdat/CC/O/O/omdat ik/PRP/B-NP/O/ik het/PRP/I-NP/O/het onbewust/JJ/B-ADJP/O/onbewust niet/RB/B-ADVP/O/niet wil/MD/B-VP/O/willen ?/./O/O/? ben/MD/B-VP/O/zijn ik/PRP/B-NP/O/ik wel/RB/B-VP/O/wel gaan/MD/I-VP/O/gaan houden/VB/I-VP/O/houden van/IN/B-PP/O/van haar/MD/B-VP/O/haren ,/,/O/O/, maar/CC/O/O/maar niet/RB/B-ADVP/O/niet van/IN/B-PP/B-PNP/van haar/PRP$/B-NP/I-PNP/haar -/./O/O/- echte/JJ/B-ADJP/O/echt -/./O/O/- leven/NN/B-NP/O/leven ?/./O/O/? http:&slash;&slash;www.google.be&slash;test/NNP/B-NP/O/http://www.google.be/test'
regex = re.compile(r'([^\/]+\/){4}')
s = []
for word in input_str.split():
    s.append(regex.sub('', word))
print(' '.join(s))
I'm fairly new to regex. I can write expressions to do most simple file renaming jobs now, but this one has me stuck.
I'm just trying to change the delimiter in a bunch of filenames from " -" to " - ". Some examples:
"Author Name -Series 00 -Title.txt" needs to become:
"Author Name - Series 00 - Title.txt"
"Author_Name -[Series 01] -Title -Genre.txt" needs to become:
"Author_Name - [Series 01] - Title - Genre.txt"
The expression needs to be able to cope with 1, 2 or 3 " -" delimiters, and must ignore all other hyphens: "-", "- " and existing " - " should all be left alone. For example:
"File_Name1 - Sometext- more-info (V1.0).txt" Should not be changed at all.
It's for use in File Renamer, which is in Python.
You can use a positive lookahead: search with the following pattern and replace each match with " - ". The pattern begins with a literal space (it is shown in quotes here so that the space is visible; the quotes are not part of the pattern):
" -(?=[^ ])"
or, with the whitespace character class \s in place of the literal space:
\s-(?=[^ ])
Here is an example to test the pattern in JavaScript:
// expected:
// "Author Name -Series 00 -Title.txt" ->
// "Author Name - Series 00 - Title.txt"
// "Author_Name -[Series 01] -Title -Genre.txt" ->
// "Author_Name - [Series 01] - Title - Genre.txt"
// "File_Name1 - Sometext- more-info (V1.0).txt" ->
// no change
var regex = / -(?=[^ ])/g;
var texts = [
    "Author Name -Series 00 -Title.txt",
    "Author_Name -[Series 01] -Title -Genre.txt",
    "File_Name1 - Sometext- more-info (V1.0).txt"
];
for (var i = 0; i < texts.length; i++) {
    var text = texts[i];
    console.log(text, "->", text.replace(regex, ' - '));
}
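Since the question mentions that File Renamer is in Python, the same lookahead can be tried with Python's `re` module. This is only a sketch outside of File Renamer; the function name `normalize` is an assumption for the example.

```python
import re

# A space, a hyphen, then a lookahead requiring a non-space character.
LOOSE_DELIMITER = re.compile(r" -(?=[^ ])")

def normalize(name):
    """Turn every " -" that is not already followed by a space into " - "."""
    return LOOSE_DELIMITER.sub(" - ", name)

for name in [
    "Author Name -Series 00 -Title.txt",
    "Author_Name -[Series 01] -Title -Genre.txt",
    "File_Name1 - Sometext- more-info (V1.0).txt",
]:
    print(name, "->", normalize(name))
```

The lookahead consumes nothing, so the character after the hyphen stays in place and only the missing space is added.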