I have R objects that contain domain names and IP addresses. For example:
11.22.44.55.test.url.com.localhost
I used a regex in R to capture the IP addresses. My problem is that when there is no match, the whole string is returned. This becomes a problem because I work on a very large dataset. I currently have the following regex:
sub("([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+).*","\\1.\\2.\\3.\\4","11.22.44.55.test.url.com.localhost")
which gives me
11.22.44.55
But if I have the following
sub("([0-9]+)\\.([0-9]+)\\.([0-9]+)\\.([0-9]+).*","\\1.\\2.\\3.\\4","11.22.44.test.url.com.localhost")
Then it gives me
11.22.44.test.url.com.localhost
which is not correct. Is there a solution for this?
You could pre-process with grep to keep only the strings that are formatted the way you want them, then use gsub on those.
x <- c("11.22.44.55.test.url.com.localhost", "11.22.44.test.url.com.localhost")
gsub("((\\d+\\.){3}\\d+)(.*)", "\\1", grep("(\\d+\\.){4}", x, value=TRUE))
#[1] "11.22.44.55"
Indeed, your code is working. When sub() fails to match, it returns the original string. From the manual:
For sub and gsub return a character vector of the same length and with the same attributes as x (after possible coercion to character). Elements of character vectors x which are not substituted will be returned unchanged (including any declared encoding). If useBytes = FALSE a non-ASCII substituted result will often be in UTF-8 with a marked encoding (e.g. if there is a UTF-8 input, and in a multibyte locale unless fixed = TRUE). Such strings can be re-encoded by enc2native.
Emphasis added
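The same fall-through behavior shows up in other regex APIs. As a rough illustration in Java rather than R (the class and the leadingIp helper are names made up for this sketch), String.replaceFirst also returns the input unchanged when the pattern does not match, so checking for a match first is one way to get an empty result instead:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpPrefix {
    // Four dot-separated digit groups at the start, followed by anything.
    private static final Pattern IP =
        Pattern.compile("^((?:\\d+\\.){3}\\d+).*");

    // Hypothetical helper: returns the leading IP-like prefix,
    // or "" when the string does not start with one.
    static String leadingIp(String s) {
        Matcher m = IP.matcher(s);
        return m.matches() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String bad = "11.22.44.test.url.com.localhost";
        // No match: replaceFirst hands back the input untouched.
        System.out.println(bad.replaceFirst(IP.pattern(), "$1"));
        // Checking the match explicitly avoids that.
        System.out.println(leadingIp("11.22.44.55.test.url.com.localhost"));
        System.out.println(leadingIp(bad).isEmpty());
    }
}
```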
You could try this pattern:
(?:\d{1,3}+\.){3}+\d{1,3}
I have tested it in Java:
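import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpFinder {
    // Three "digits + dot" groups followed by a final digit group.
    static final Pattern p = Pattern.compile("(?:\\d{1,3}+\\.){3}+\\d{1,3}");

    public static void main(String[] args) {
        final String s1 = "11.22.44.55.test.url.com.localhost";
        final String s2 = "11.24.55.test.url.com.localhost";
        System.out.println(getIps(s1));
        System.out.println(getIps(s2));
    }

    // Collects every IP-like match found in the input.
    public static List<String> getIps(final String string) {
        final Matcher m = p.matcher(string);
        final List<String> strings = new ArrayList<>();
        while (m.find()) {
            strings.add(m.group());
        }
        return strings;
    }
}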
Output:
[11.22.44.55]
[]
Look at the gsubfn or strapply functions in the gsubfn package. When you want to return the match rather than replace it, these functions work better than sub.
I want to remove all special symbols from a string and keep only the words.
I tried this, but it returns the input unchanged:
main() {
String s = "Hello, world! i am 'foo'";
print(s.replaceAll(new RegExp('\W+'),''));
}
output : Hello, world! i am 'foo'
expected : Hello world i am foo
There are two issues:
'\W' is not a valid escape sequence. To put a backslash in a regular string literal you need to write \\, or use a raw string literal (r'...').
The \W regex pattern matches any character that is not a word character, and that includes whitespace. To keep the spaces, use a negated character class containing both the word and the whitespace classes: [^\w\s].
Use
void main() {
String s = "Hello, world! i am 'foo'";
print(s.replaceAll(new RegExp(r'[^\w\s]+'),''));
}
Output: Hello world i am foo.
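For comparison, the same negated class can be sketched in Java (not Dart; the class and method names are invented for this example). Java has no raw string literals, so each regex backslash has to be doubled, which is the first of the two issues above in another guise:

```java
public class StripSymbols {
    // Hypothetical helper: drops every character that is neither a word
    // character (\w) nor whitespace (\s).
    static String stripSymbols(String s) {
        return s.replaceAll("[^\\w\\s]+", "");
    }

    public static void main(String[] args) {
        // prints: Hello world i am foo
        System.out.println(stripSymbols("Hello, world! i am 'foo'"));
    }
}
```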
Fully Unicode-aware solution
Based on the post "What's the correct regex range for javascript's regexes to match all the non word characters in any script?", and bearing in mind that \w in a Unicode-aware regex is equal to [\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}], you can use the following in Dart:
void main() {
String s = "Hęllo, wórld! i am 'foo'";
String regex = r'[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}\s]+';
print(s.replaceAll(RegExp(regex, unicode: true),''));
}
// => Hęllo wórld i am foo
The docs for the RegExp class state that you should use raw strings (a string literal prefixed with an r, like r"Hello world") if you're constructing a regular expression that way. This is particularly necessary where you're using escapes.
In addition, your regex is going to catch spaces as well, so you'll need to modify that. You can use RegExp(r"[^\s\w]") instead - that matches any character that's not whitespace or a word character
I found this question looking for how to remove a symbol from a string. For others who come here wanting to do that:
final myString = 'abc=';
final withoutEquals = myString.replaceAll(RegExp('='), ''); // abc
First solution
s.replaceAll(RegExp(",|!|'"), ""); // The | operator works as OR
Second solution
s.replaceAll(",", "").replaceAll("!", "").replaceAll("'", "");
Removing characters "," from string:
String myString = "s, t, r";
myString = myString.replaceAll(",", ""); // myString is "s t r"
How would I remove the first two characters of a QString? Or, to put it in Stack Overflow layman's terms, go from:
QString str = "##Name"; // output: ##Name
to
output: Name
So far I have used this small piece of code:
if(str.contains("##"))
{
str.replace("##","");
}
...but it doesn't work, because some other strings need to keep "##" when it is not at the beginning.
The first two characters may also be "%$" rather than "##", which is the main reason I need to delete the first two characters regardless of what they are.
Any ideas?
This is the syntax to remove the first two characters:
str.remove(0, 2);
You can use the QString::mid function for this:
QString trimmed = str.mid(2);
But if you wish to modify the string in place, you would be better off using QString::remove as others have suggested.
You can use remove(const QRegExp &rx)
Removes every occurrence of the regular expression rx in the string, and returns a reference to the string. For example:
QString str = "##Name"; // output: ##Name
// Note: this removes every '#' followed by any character, wherever it occurs.
str.remove(QRegExp("[#]."));
// str == "Name"
I need to write a simple program that takes two parameters (a regex pattern and a string). If the string does not match, the program should answer whether a larger string exists that contains the given one and matches the pattern.
Example 1
Input: "^\w+\s+\w+$" and "hello" do not match, but the program returns true because the string "hello word" contains "hello" and matches the pattern.
Example 2
Input: "^(abc)*$" and "ca" do not match, but the program returns true because the string "abcabc" contains "ca" and matches the pattern.
In short, the program needs to answer whether such a string exists (true/false).
C# (C++, Java) would do, and any help will be appreciated, even just some direction on how to approach it.
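One possible starting point, sketched in Java, is Matcher.hitEnd(), which reports whether the last match attempt ran into the end of the input. This is only a partial answer under a stated assumption: it considers extending the string on the right, so Example 2, which also needs characters in front, is outside its scope. The class and method names are invented for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialMatch {
    // Hypothetical helper: true if `input` already matches, or if the
    // failed attempt ran out of input, meaning that appending characters
    // on the right might still lead to a match.
    static boolean canExtendToMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            return true;
        }
        // hitEnd() is true when the engine reached the end of the input
        // during the attempt that just failed.
        return m.hitEnd();
    }

    public static void main(String[] args) {
        String regex = "^\\w+\\s+\\w+$";
        System.out.println(canExtendToMatch(regex, "hello"));      // Example 1
        System.out.println(canExtendToMatch(regex, "hello word")); // full match
    }
}
```

Handling extension on both sides would need more than this, for example working on the automaton behind the pattern instead of on the regex API.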
The problem is with the regex you are passing.
In the first one, remove the backslash \ before $.
In the second one, replace () with []: (abc)* matches abcabcabc and the like, whereas [abc]* matches a, b, c, ac, ab, aaa, bc, and the empty string.
In Java (from OP's comment : Java is OK)
import java.util.regex.*;
public class RegExTest{
public static boolean check(String pat, String test){
return test.matches(pat);
}
public static void main(String[] args){
System.out.println(check("^\\w+\\s+\\w+$","hello"));
System.out.println(check("^[abc]*$","ca"));
}
}
Here's the regular expression:
let legalStr = "(?:[eE][\\+\\-]?[0-9]{1,3})?$"
Here's the invocation:
if let match = sender.stringValue.rangeOfString(legalStr, options: .RegularExpressionSearch) {
print("\(sender.stringValue) is legal")
}
else {
print( "\(sender.stringValue) is not legal")
}
If I type garbage, like "abcd", it reports an illegal string.
If I type something like "e123", it reports a legal string.
(Note that the empty string is also legal.)
However, if I type "e1234", it still reports "legal". I'd expect "not legal". Am I missing something here? Note the "$" at the end of the regular expression: the three digits should appear at the end of the string.
If it's not immediately clear, the source of the string is a text edit box.
Your pattern is only anchored at the end, and matches the empty string. So any string at all will match successfully by just matching your pattern as an empty string at the end.
Add a ^ to the front to anchor it on that side, too.
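The effect of the missing anchor can be sketched in Java (not Swift; class, field, and method names are made up for the illustration). With find(), the end-anchored pattern succeeds on any input by matching the empty string at the end, while the version anchored on both sides accepts only an optional exponent and nothing else:

```java
import java.util.regex.Pattern;

public class ExponentCheck {
    // Anchored at the end only, as in the question.
    static final Pattern END_ONLY =
        Pattern.compile("(?:[eE][+-]?[0-9]{1,3})?$");
    // Anchored on both sides, as suggested above.
    static final Pattern BOTH =
        Pattern.compile("^(?:[eE][+-]?[0-9]{1,3})?$");

    // Hypothetical helper: legal means "empty, or an exponent of 1 to 3 digits".
    static boolean isLegal(String s) {
        return BOTH.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"e123", "e1234", "abcd", ""}) {
            System.out.println("\"" + s + "\""
                + " end-only=" + END_ONLY.matcher(s).find()
                + " both=" + isLegal(s));
        }
    }
}
```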
I need to find a variable in a C program and need to convert its 1st letter to upper case. For example:
int sum;
sum = 50;
I need to find sum and I should convert it to Sum. How can I achieve this using regular expressions (find and replace)?
This can't be done with a regex. You need a C language parser for that, otherwise how would you know what is a variable, what is a keyword, what is a function name, what is a word inside a string or a comment...
.NET's Regex.Replace supports what you want to do (if you can come up with the regular expression you need). The ReplaceCC method at the bottom is invoked to provide the replacement value for each match.
using System;
using System.Text.RegularExpressions;

class MyClass
{
    static void Main(string[] args)
    {
        string sInput, sRegex;
        // The string to search (verbatim literal, so it may span lines).
        sInput = @"int sum;
sum = 1;";
        // A very simple regular expression.
        sRegex = "sum";
        Regex r = new Regex(sRegex);
        MyClass c = new MyClass();
        // Assign the replace method to the MatchEvaluator delegate.
        MatchEvaluator myEvaluator = new MatchEvaluator(c.ReplaceCC);
        // Write out the original string.
        Console.WriteLine(sInput);
        // Replace matched characters using the delegate method.
        sInput = r.Replace(sInput, myEvaluator);
        // Write out the modified string.
        Console.WriteLine(sInput);
    }

    // Upper-cases the first character of each match.
    public string ReplaceCC(Match m)
    {
        return char.ToUpper(m.Value[0]) + m.Value.Substring(1);
    }
}
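A comparable callback-based replacement exists in Java 9 and later as Matcher.replaceAll with a function argument. The sketch below uses invented class and method names and adds word boundaries so that "summary" is left alone. As the first answer warns, a regex still cannot tell a variable from the same word inside a string literal or a comment, so this is only a text-level approximation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapitalizeIdentifier {
    // Hypothetical helper: upper-cases the first letter of every
    // whole-word occurrence of `name` in `source`.
    static String capitalizeWord(String source, String name) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
        return p.matcher(source).replaceAll(mr -> {
            String w = mr.group();
            String replaced = Character.toUpperCase(w.charAt(0)) + w.substring(1);
            // Quote the result so '$' and '\' are not treated specially.
            return Matcher.quoteReplacement(replaced);
        });
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWord("int sum;\nsum = 50;", "sum"));
    }
}
```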