R Regular expression for string containing full stops - regex

I have a bunch of strings, some of which end with the literal text ..t. (two dots, a t and a dot). I am trying to find a regular expression to match these strings, but dealing with the full stops is giving me a headache!
I have tried
grep('^.+(..t.)$', myStrings)
but this also matches strings such as w...gate. I think I am dealing with the full stops incorrectly. Any help at all appreciated.
Note: I am using grep within R.

Since you are only checking if the end of the string ends with ..t., you can eliminate ^.+ in your pattern.
The dot . in regular expression syntax is a character with special meaning: it matches any character except a newline. To match a literal dot, or any other character with special meaning, you need to escape it with a backslash, which must itself be written as \\ inside an R string.
> x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')
> grep('\\.{2}t\\.$', x)
# [1] 1 4
Or place that character inside of a character class.
> x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')
> grep('[.]{2}t[.]$', x)
# [1] 1 4
Note: I used the quantifier {2} (\\.{2}) to match two dots instead of writing the escaped dot twice (\\.\\.).
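For comparison, here is a minimal sketch of the same check in Python's re module rather than R. The helper matching_positions is hypothetical and only mimics the 1-based positions that R's grep() prints. In a Python raw string r'...' a single backslash escapes the dot, whereas an R string literal needs \\.

```python
import re

# Strings from the answer above; only two of them end in the literal text "..t."
x = ['foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.']

escaped = re.compile(r'\.{2}t\.$')       # backslash-escaped dots
char_class = re.compile(r'[.]{2}t[.]$')  # dots inside a character class
naive = re.compile(r'..t.$')             # unescaped dots match any character


def matching_positions(pattern, strings):
    """Return 1-based positions of strings the pattern matches, like R's grep()."""
    return [i + 1 for i, s in enumerate(strings) if pattern.search(s)]


print(matching_positions(escaped, x))     # [1, 4]
print(matching_positions(char_class, x))  # [1, 4]
print(matching_positions(naive, x))       # [1, 2, 4]  ('w...gate' ends in "gate")
```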

OK, a little better googling provided the answer:
grep("^.+(\\.\\.t\\.)$", myStrings)
This works because the dot has to be escaped as \\. in an R string.

The dot (.) matches any single character. To remove the special meaning of the dot, put a double backslash (\\) before it.
Try this instead:
grep('^.+(\\.\\.t\\.)$', myStrings)
Satheesh Appu

Related

Regex Check Whether a string contains characters other than specified

How to check whether a string contains characters other than:
Alphabets (lower-case/upper-case)
digits
Space
Comma(,)
Period (.)
Bracket ( )
&
'
$
Arithmetic operators: + (plus), - (minus), * and =
/
using regular expression in ColdFusion?
I want to make sure a string doesn't contain even a single character other than the specified ones.
You can find if there are any invalid characters like this:
<cfif refind( "[^a-zA-Z0-9 ,.&'$()\-+*=/]" , Input ) >
<!--- invalid character found --->
</cfif>
Here [...] is a character class (it matches any single character listed inside it), and the ^ at the start means "NOT". So if the string contains any character that is not an accepted one, refind returns its (non-zero) position, which ColdFusion treats as true.
I don't understand "Small Bracket(opening closing)", but I guess you mean < and > there? If you want () or {}, just swap them in. For [] you need to escape them as \[\].
Character Class Escaping
Inside a character class, only a handful of characters need escaping with a backslash, these are:
\ - if you want a literal backslash, escape it.
^ - a caret must be escaped if it's the first character, otherwise it negates the class.
- - a dash creates a range. It must be escaped unless it is first or last (but escaping it always is recommended).
[ and ] - both brackets should be escaped.
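A minimal sketch of the negated-class check using Python's re module instead of ColdFusion's refind; has_invalid_char is a hypothetical helper name. It also follows the escaping rules above: inside the class only the dash gets a backslash, while $, (, ), +, *, = and / are literal as-is.

```python
import re

# Negated character class: matches any single character NOT in the allowed set.
# The '-' is escaped so it is a literal hyphen rather than a range operator.
INVALID_CHAR = re.compile(r"[^a-zA-Z0-9 ,.&'$()\-+*=/]")


def has_invalid_char(s: str) -> bool:
    """True if s contains at least one character outside the allowed set."""
    return INVALID_CHAR.search(s) is not None


print(has_invalid_char("A & B (x+y)=z/2"))  # False
print(has_invalid_char("Price: $5"))        # True (':' is not allowed)
```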
ColdFusion uses Java's engine to parse regular expressions. Anyway, to make sure a string doesn't contain any character other than the ones you mentioned, try:
^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$']).*$
The above expression only works if you are parsing the file line by line. If you want to apply it to text which contains multiple lines, you need to use the global modifier and the multi-line modifier and change the expression slightly, like this:
^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$'\r\n]).*$
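To illustrate both lookahead patterns, here is a sketch in Python's re (not ColdFusion): match() stands in for the single-line case, and re.MULTILINE plus findall() play the role of the multi-line and global modifiers. The sample text is made up.

```python
import re

# Single-line version: the negative lookahead fails as soon as it can reach a
# character outside the allowed set, so only fully valid strings match.
SINGLE = re.compile(r"^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$']).*$")
print(bool(SINGLE.match("Tom & Jerry, $5.")))  # True
print(bool(SINGLE.match("colon: here")))       # False

# Multi-line version: \r and \n are excluded from the negated class so the
# lookahead cannot run past the end of the current line.
VALID_LINE = re.compile(r"^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$'\r\n]).*$", re.MULTILINE)
text = "Tom & Jerry, $5.\nbad line: colon\nok line"
print(VALID_LINE.findall(text))  # ['Tom & Jerry, $5.', 'ok line']
```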
The regular expression:
[^][a-zA-Z0-9 ,.&'$]
will match if the string contains any characters other than the ones in your list.

Regular expression: what's the meaning of s#^.*/##s

What is the meaning of s#^.*/##s?
I know that in a pattern '.' can represent any character except \n,
so '.*' should represent any number of arbitrary characters.
But the book says that this would delete the Unix-style path.
My question is: does it mean I could replace any number of arbitrary characters with a space?
s -> substitution
# -> pattern delimiter
^.* -> all chars 0 or more times from the beginning
/ -> literal /
## -> replace by nothing (2 delimiters)
s -> single line mode ( the dot can match newline)
Substitutions conventionally use the / character as a delimiter (s/this/that/), but you can use other punctuation characters if it's more convenient. In this case, # is used because the regexp itself contains a / character; if / were used as the delimiter, any / in the pattern would have to be escaped as \/. (# is not the character I would have chosen, but it's perfectly valid.)
^ matches the beginning of the string (or line; see below)
.*/ matches any sequence of characters up to and including a / character. Since * is greedy, it will match all characters up to and including the last / character; any preceding / characters are "eaten" by the .*. (The final / is not, because if .* matched all / characters the final / would fail to match.)
The trailing s modifier treats the string as a single line, i.e., causes . to match any character including a newline. See the m and s modifiers in perldoc perlre for more information.
So this:
s#^.*/##s
replaces everything from the beginning of the string ($_ in this case, since that's the default) up to the last / character by nothing.
If there are no / characters in $_, the match fails and the substitution does nothing.
This might be used to replace all directory components of an absolute or relative path name, for example changing /home/username/dir/file.txt to file.txt.
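A minimal Python sketch of the same substitution, assuming re.sub with re.DOTALL is an acceptable stand-in for Perl's s#^.*/##s (DOTALL does the job of the trailing s modifier). The function name strip_dirs is hypothetical.

```python
import re


def strip_dirs(path: str) -> str:
    # Delete everything from the start of the string up to and including the
    # last '/'. re.DOTALL lets '.' match newlines too, like Perl's s modifier.
    return re.sub(r"^.*/", "", path, count=1, flags=re.DOTALL)


print(strip_dirs("/home/username/dir/file.txt"))  # file.txt
print(strip_dirs("file.txt"))                     # file.txt (no '/', nothing changes)
print(repr(strip_dirs("a/b\nc/d.txt")))           # 'd.txt'
# Without re.DOTALL the '.' stops at the newline, so only "a/" is removed.
print(repr(re.sub(r"^.*/", "", "a/b\nc/d.txt", count=1)))  # 'b\nc/d.txt'
```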
It will delete all characters in a string up to and including the last slash, line breaks included because of the s modifier.
Please excuse a little pedantry. But I keep seeing this and I think it's important to get it right.
s#^.*/##s is not a regular expression.
^.*/ is a regular expression.
s/// is the substitution operator.
The substitution operator takes two arguments. The first is a regular expression. The second is a replacement string.
The substitution operator (like many other quote-like operators in Perl) allows you to change the delimiter character that you use.
So s### is also a substitution operator (just using # instead of /).
s#^.*/## means "find the text that matches the regular expression ^.*/ and replace it with an empty string". The s on the end is an option which changes the regex so that . matches "\n" as well as all other characters.

Match a string with regexp

I have a string like
-------- AGG x y PORT-16385-INFO ----------------------------+
I want to extract "AGG x y PORT-16385-INFO ". However, this pattern is not always the same: it can have any number of spaces in between.
Please help me with a regexp to get this string.
I am using this regexp
regexp {\s+(.*)\-\-*} $a - am
Output
AGG PORT-16385-INFO ---------------------------
This is not what I want. Please help me with the regexp.
Well, I'll assume your delimiter is at least two hyphens long and is separated from the contents by whitespace. Then a trivial regex like
--\s+(.*?)\s+--
would already work. The *? quantifier does non-greedy matching, to terminate as early as possible.
Whether this regex works depends strongly on the allowed values and the exact format of your input, which you have not sufficiently explained.
I am also surprised you tagged this as Perl; I am quite sure your code isn't valid Perl code.
If you do not want to use the . character class, then we can rewrite it to match all non-hyphen characters or a single hyphen followed by a non-hyphen:
--\s+((?:[^-]+|-[^-])*)\s+--
You might want to disallow newlines along with the hyphens as well.
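A quick check of both patterns from this answer, written as a Python sketch (Python's re rather than Tcl's regexp). The second input, other, is made up to show where the greedy and non-greedy versions actually differ.

```python
import re

line = "-------- AGG x y PORT-16385-INFO ----------------------------+"

# Non-greedy: (.*?) stops at the first whitespace followed by "--".
lazy = re.search(r"--\s+(.*?)\s+--", line).group(1)
# Without '.': runs of non-hyphens, or a single hyphen followed by a non-hyphen.
no_dot = re.search(r"--\s+((?:[^-]+|-[^-])*)\s+--", line).group(1)
print(lazy)    # AGG x y PORT-16385-INFO
print(no_dot)  # AGG x y PORT-16385-INFO

# Greedy and non-greedy differ once a line has several "--" delimiters.
other = "-- A -- B --"
print(re.search(r"--\s+(.*)\s+--", other).group(1))   # A -- B
print(re.search(r"--\s+(.*?)\s+--", other).group(1))  # A
```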
Using .*? can work, like amon says, however, I sometimes find that the non-greedy quantifier is somewhat unpredictable. You can use anchors to make the greedy quantifier do the same thing:
^-+ (.*) -+\+$
Here we require the string to start and end with the specified sequence of dashes (and a plus sign at the end), so the greedy match is not allowed to match too much.
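The anchored pattern can be checked the same way; this is a Python sketch, and the padded second input is made up to show one limitation: the literal space in the pattern matches exactly one space, so any extra padding ends up in the capture.

```python
import re

line = "-------- AGG x y PORT-16385-INFO ----------------------------+"

# Greedy (.*) is kept in check by the anchors: the line must start with dashes
# and a space, and end with a space, dashes and a literal '+'.
anchored = re.match(r"^-+ (.*) -+\+$", line).group(1)
print(anchored)  # AGG x y PORT-16385-INFO

# Extra spaces around the content are kept in the captured text.
print(repr(re.match(r"^-+ (.*) -+\+$", "--  AGG  --+").group(1)))  # ' AGG '
```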
In Tcl, you can easily handle it using string trim.
set a "-------- AGG x y PORT-16385-INFO ----------------------------+"
set b [string trim $a +-]; # strip leading and trailing + and - characters
set b [string trim $b]; # strip leading and trailing whitespace
puts $b
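The same no-regex idea as a Python sketch, assuming str.strip() is an acceptable stand-in for Tcl's string trim: both remove the listed characters from the ends only, so the hyphens inside PORT-16385-INFO survive.

```python
line = "-------- AGG x y PORT-16385-INFO ----------------------------+"

# strip("+-") removes any '+' or '-' from both ends; strip() then removes
# the surrounding whitespace, mirroring the two Tcl string trim calls.
b = line.strip("+-").strip()
print(b)  # AGG x y PORT-16385-INFO
```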

meaning of a regexp if ($_ =~ /-\n/)

I am a beginner at Perl scripting.
I know the hyphen (-) is used to specify a range.
But what if it appears at the beginning of the expression?
Example:
if ($_ =~ /-\n/) {
    # do something
}
How should I interpret the above code?
"if the parameter is equal to a range of newline"?
(No, that is a weird interpretation :-/)
Please help.
Outside of [], - just means a literal "-". As far as I know, it only indicates a range within a [] block.
Here is a more complete answer I found
How to match hyphens with Regular Expression? (look at the second answer)
So the expression matches a - followed by a newline, i.e. a line ending with -.
The pattern will match hyphens "-" followed by a newline \n.
The hyphen is treated as a range operator inside character classes, as explained in perldoc perlrequick:
The special character '-' acts as a range operator within character
classes, so that the unwieldy [0123456789] and [abc...xyz] become
the svelte [0-9] and [a-z] :
/item[0-9]/; # matches 'item0' or ... or 'item9'
/[0-9a-fA-F]/; # matches a hexadecimal digit
If '-' is the first or last character in a character class, it is
treated as an ordinary character.
This means: the condition is true if there is a hyphen immediately followed by a newline character, no matter where this pair of characters is located inside the string.
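Python's re module treats the hyphen the same way as the Perl documentation quoted above, so a short sketch can show both cases; the sample strings are made up.

```python
import re

# Outside a character class '-' is an ordinary character, so -\n means
# "a hyphen immediately followed by a newline", anywhere in the string.
HYPHEN_NEWLINE = re.compile(r"-\n")
print(bool(HYPHEN_NEWLINE.search("first line ends with -\nsecond line")))  # True
print(bool(HYPHEN_NEWLINE.search("a-b\n")))  # False: the hyphen is not right before \n

# Inside a class, '-' forms a range unless it is the first or last character.
print(re.findall(r"[a-c]", "a-c b"))  # ['a', 'c', 'b']
print(re.findall(r"[-ac]", "a-c b"))  # ['a', '-', 'c']
```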

Replace repeating characters with one with a regex

I need a regex to remove repetitions of these particular characters: if one of them occurs twice or more in a row, replace the run with a single occurrence.
/[\s.'-,{2,0}]
These are the characters that, when repeated, I need to replace with a single instance of the same character.
Is this the regex you're looking for?
/([\s.',-])\1+/
Okay, now that will match it. If you're using Perl, you can replace it using the following expression:
s/([\s.',-])\1+/$1/g
Edit: If you're using :ahem: PHP, then you would use this syntax:
$out = preg_replace('/([\s.\',-])\1+/', '$1', $in);
The () group matches the character and the \1 means that the same thing it just matched in the parentheses occurs at least once more. In the replacement, the $1 refers to the match in the first set of parentheses. The - is placed last in the character class so that it is a literal hyphen; written between ' and , it would instead define the range from ' to , which covers ( ) * + but not the hyphen itself.
Note: this is Perl-Compatible Regular Expression (PCRE) syntax.
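A minimal Python version of the same backreference idea (Python's re.sub writes the replacement as \1 where Perl and PHP use $1); the function name collapse and the sample strings are illustrative only.

```python
import re

# ([\s.',-]) captures one of the listed characters; \1+ requires one or more
# further copies of that same character. '-' is last so it is a literal hyphen.
REPEATS = re.compile(r"([\s.',-])\1+")


def collapse(s: str) -> str:
    """Replace each run of a repeated listed character with a single copy."""
    return REPEATS.sub(r"\1", s)


print(collapse("a  b...c,,,d--e''f"))  # a b.c,d-e'f
print(collapse("a.,b"))                # a.,b (different characters are left alone)
```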
From the perlretut man page:
Matching repetitions
The examples in the previous section display an annoying weakness. We were only matching 3-letter words, or chunks of words of 4 letters or less. We'd like to be able to match words or, more generally, strings of any length, without writing out tedious alternatives like \w\w\w\w|\w\w\w|\w\w|\w.
This is exactly the problem the quantifier metacharacters ?, *, +, and {} were created for. They allow us to delimit the number of repeats for a portion of a regexp we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:
a? means: match 'a' 1 or 0 times
a* means: match 'a' 0 or more times, i.e., any number of times
a+ means: match 'a' 1 or more times, i.e., at least once
a{n,m} means: match at least "n" times, but not more than "m" times.
a{n,} means: match at least "n" or more times
a{n} means: match exactly "n" times
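To see the quantifier table in action, here is a small Python sketch; accepted_lengths is a hypothetical helper that lists which lengths of a run of a's each quantifier accepts as a whole-string match.

```python
import re


def accepted_lengths(pattern: str, max_len: int = 4) -> list:
    """Lengths n (0..max_len) for which the whole string 'a' * n matches pattern."""
    return [n for n in range(max_len + 1) if re.fullmatch(pattern, "a" * n)]


for p in ["a?", "a*", "a+", "a{2,3}", "a{2,}", "a{2}"]:
    print(p, accepted_lengths(p))
# a? [0, 1]
# a* [0, 1, 2, 3, 4]
# a+ [1, 2, 3, 4]
# a{2,3} [2, 3]
# a{2,} [2, 3, 4]
# a{2} [2]
```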
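As others said, it depends on your regex engine, but here is a small example of how you could do this (the - goes last in the bracket expression so that it is literal; [ _-,.] would be parsed as a backwards range from _ to the comma):
s/([ _,.-])\1*/\1/g
With sed:
$ echo "foo , bar" | sed 's/\([ _,.-]\)\1*/\1/g'
foo , bar
$ echo "foo,. bar" | sed 's/\([ _,.-]\)\1*/\1/g'
foo,. bar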
Using JavaScript, as mentioned in a comment, and assuming (it's not too clear from your question) that the characters you want to replace are whitespace characters, ., ', -, and ,:
var str = 'a b....,,';
str = str.replace(/(\s){2}|(\.){2}|('){2}|(-){2}|(,){2}/g, '$1$2$3$4$5');
// Now str === 'a b..,'
If I understand correctly, you want to do the following: given a set of characters, replace any multiple occurrence of each of them with a single character. Here's how I would do it in perl:
perl -pi.bak -e "s/\.{2,}/\./g; s/\-{2,}/\-/g; s/'{2,}/'/g" text.txt
If, for example, text.txt originally contains:
Here is . and here are 2 .. that should become a single one. Here's
also a double -- that should become a single one. Finally here we have
three ''' which should be substituted with one '.
it is modified as follows:
Here is . and here are 2 . that should become a single one. Here's
also a double - that should become a single one. Finally here we have
three ' which should be substituted with one '.
I simply use the same replacement regex for each character in the set: for example
s/\.{2,}/\./g;
replaces two or more occurrences of a dot character with a single dot. I chain several of these expressions, one for each character of your original set.
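The same one-substitution-per-character approach as a Python sketch; CHARS and squeeze are hypothetical names, and re.escape() takes care of characters such as . that are special in a regex.

```python
import re

# Hypothetical set of characters to de-duplicate, mirroring the perl one-liner.
CHARS = [".", "-", "'"]


def squeeze(text: str) -> str:
    # One substitution per character: two or more copies become a single copy.
    for ch in CHARS:
        text = re.sub(re.escape(ch) + "{2,}", ch, text)
    return text


print(squeeze("here are 2 .. and a double -- and three '''"))
# here are 2 . and a double - and three '
```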
There may be more compact ways of doing this, but, I think this is simple and it works :)
I hope it helps.