How can I split a String on certain phrases? - regex

I have a String containing some search terms, and I want to split it into an array of Strings.
Example:
String text = "java example \"this is a test\" hello world";
I want to get the following results
result[0] = "java";
result[1] = "example";
result[2] = "\"this is a test\"";
result[3] = "hello";
result[4] = "world";
In short, I want to combine text.split(" ") and text.split("\"");
Is there an easy way to code it?
Thanks!

You can use this regex in String#split method:
(?=(([^\"]*\"){2})*[^\"]*$)\\s+
Code:
String text = "java example \"this is a test\" hello world";
String[] tok = text.split("(?=(([^\"]*\"){2})*[^\"]*$)\\s+");
// print the array
System.out.println(Arrays.toString(tok));
Output:
[java, example, "this is a test", hello, world]

This regex should match (\\".+?\\")|([^\s]+)
It matches anything within \" including the \" OR single words.
Check here for results: http://www.regexr.com/399a4
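A minimal Java sketch of this matching approach, assuming the quotes in the input are plain " characters and that quoted phrases contain no escaped quotes. The class and method names (QuotedTokenizer, tokenize) are made up for illustration; only the pattern comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTokenizer {
    // Either a double-quoted phrase (quotes included) or a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("(\".+?\")|([^\\s]+)");

    // Hypothetical helper: collects every match in order of appearance.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group()); // whole match, whichever alternative hit
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "java example \"this is a test\" hello world";
        // prints [java, example, "this is a test", hello, world]
        System.out.println(tokenize(text));
    }
}
```

Matching tokens instead of splitting on separators avoids the lookahead that counts quotes, at the cost of a short loop.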

I think you are a bit confused and there are errors in your code!
Composing your string should be:
String text = "java example \"this is a test\" hello world";
The value of the variable text would then be:
java example "this is a test" hello world
I am rather assuming that you want to extract this into the following array:
result[0] = "java";
result[1] = "example";
result[2] = "\"this is a test\"";
result[3] = "hello";
result[4] = "world";
You can do this by using a regular expression, for example:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
public static void main(String[] args) {
String data = "java example \"this is a test\" hello world";
Pattern p = Pattern.compile("((?:\"[a-z\\s]+\")|[a-z]+)");
Matcher m = p.matcher(data);
List<String> lst = new ArrayList<String>();
while(m.find()) {
lst.add(m.group(1));
}
String[] result = lst.toArray(new String[0]);
for(String s: result) {
System.out.println(s);
}
}
}
The regular expression ((?:\"[a-z\\s]+\")|[a-z]+) will match on either:
1) sequences of characters a to z or whitespace between double quotes
2) sequence of characters a to z.
We then extract these matches by calling m.find() in a loop.

Related

How to trim substrings after a non-letter token in Java

I have a string. In my code, I'm trying to trim substrings after a non-letter token if there are any. What do you think would be a better way to do that?
I tried split, replaceAll functions and matches function with regex but couldn't deliver a good solution.
String initialString = "Brown 1fox jum'ps over 9 the_t la8zy dog.";
String[] splitString = initialString.split(" ");
String finalString= new String();
for (int i = 0; i < splitString.length; i++) {
finalString+=splitString[i].split("[^a-zA-Z]",2)[0]+" ";
}
finalString=finalString.trim().replaceAll("\\s+", " ");
Actual Result (as expected): "Brown jum over the la dog"
As an alternative, you might use the pattern [^a-zA-Z ]+\S* to replace each match with an empty string,
and then collapse any run of two or more whitespace characters into a single space using \\s{2,}:
String string = "Brown 1fox jum'ps over 9 the_t la8zy dog.";
String result = string.replaceAll("[^a-zA-Z ]+\\S*", "").replaceAll("\\s{2,}", " ");
All you have to do is this,
String initialString = "Brown 1fox jum'ps over 9 the_t la8zy dog.";
String resultStr = Stream.of(initialString.split(" "))
.map(s -> s.replaceAll("[^A-Za-z].*", ""))
.filter(s -> !s.isEmpty())
.collect(Collectors.joining(" "));

Replacing the 1st regex-match group instead of the 0th

I was expecting this
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
println(string.replace(regex, ""))
to result in this:
hello, world
Instead, it prints this:
hello world
I see that the replace function cares about the whole match. Is there a way to replace only the 1st group instead of the 0th one?
Add the comma in the replacement:
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
println(string.replace(regex, ","))
Or use a lookahead, which Kotlin supports:
val string = "hello , world"
val regex = Regex("""\s+(?=,)""")
println(string.replace(regex, ""))
You can retrieve the range of the first group through the groups property of the MatchResult (a MatchGroupCollection) and then pass that range to the String.removeRange method:
val string = "hello , world"
val regex = Regex("""(\s+)[,]""")
val result = string.removeRange(regex.find(string)!!.groups[1]!!.range)
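For comparison, the first two ideas above can be sketched in Java, whose java.util.regex engine also backs Kotlin's Regex on the JVM. This is only an illustration: the class and method names are hypothetical, and the first variant captures the comma (rather than the whitespace) so that it can be written back with $1.

```java
class GroupReplace {
    // Capture the comma and write it back, so only the whitespace disappears.
    static String viaBackreference(String s) {
        return s.replaceAll("\\s+(,)", "$1");
    }

    // Match only the whitespace; the lookahead checks for the comma without consuming it.
    static String viaLookahead(String s) {
        return s.replaceAll("\\s+(?=,)", "");
    }

    public static void main(String[] args) {
        System.out.println(viaBackreference("hello , world")); // hello, world
        System.out.println(viaLookahead("hello , world"));     // hello, world
    }
}
```

Both variants leave the space after the comma untouched, because that space is not followed by a comma.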

Trying to split a long String into a List (where each item in the list represents a word) using Python

Given (for example):
text = "Hello world Hello Stack"
I need to make a list which contains each word (only) in text.
The list should look like:
the_list = ["Hello","world","Hello","Stack"]
I tried to do that by
the_list = text.split(' ')
Of course, it doesn't work.
Can someone explain what I should write in order to get the desired list?
You're very close. You can just call text.split without any arguments and it should work.
text = "Hello world Hello Stack"
the_list = text.split()
Using the regex library, you can get what you want:
import re
text = "Hello world Hello Stack"
answer = re.sub(r"[^\w]", " ", text).split()
Regular expressions would work in this case
text = "Hello world Hello Stack"
import re
my_list = re.sub(r'\s+', ' ', text).split(' ') #replacing one or more whitespaces with a single whitespace
print(my_list) #prints ['Hello', 'world', 'Hello', 'Stack']
Works fine on Python 3
text = "Hello world Hello Stack"
the_list = []
t = ''
count = 0
for char in text:
    count += 1
    if len(the_list) > 0 and the_list[len(the_list) - 1] == ' ':
        del the_list[len(the_list) - 1]
    t += char
    if char == ' ' or count == len(text):
        the_list.append(t)
        t = ''
print(the_list)

Regular expression: get strings between new lines

I want to extract every string that is located on its own line, using a regular expression.
string someStr = "first
second
third
"
example:
string str1 = "first";
string str2 = "second";
string str3 = "third";
Or, if you just want the first word of each line:
^(\w+).*$ with multi-line flag.
Regex101 has a nice regex testing tool: https://regex101.com/r/JF3cKR/1
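A short Java sketch of the first-word pattern, assuming Java's Pattern.MULTILINE as the multi-line flag. The class and method names are invented for illustration, and lines that do not start with a word character are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWords {
    // With MULTILINE, ^ and $ anchor at the start and end of every line.
    private static final Pattern LINE = Pattern.compile("^(\\w+).*$", Pattern.MULTILINE);

    // Hypothetical helper: returns group 1 (the leading word) of each matching line.
    static List<String> firstWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [first, second, third]
        System.out.println(firstWords("first line\nsecond line\nthird\n"));
    }
}
```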
Just split it on "\n":
someStr.split("\n")
And you can filter the empty strings if you'd like
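In Java, the split-and-filter idea could look roughly like the sketch below. The names LineSplit and nonEmptyLines are assumptions made for this example, and it only handles "\n" line endings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LineSplit {
    // Hypothetical helper: split on "\n" and drop lines that are empty.
    static List<String> nonEmptyLines(String text) {
        return Arrays.stream(text.split("\n"))
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [first, second, third]
        System.out.println(nonEmptyLines("first\n\nsecond\nthird\n"));
    }
}
```

String.split already discards trailing empty strings, so the filter matters mainly for blank lines in the middle or at the start of the text.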
Or if you really want regex, do /^.*$/ with multiline flag
List<String> listOfLines = new ArrayList<String>();
Pattern pattern = Pattern.compile("^.*$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher("first\nsecond\nthird\n");
while (matcher.find()) {
listOfLines.add(matcher.group());
}
Then you have:
listOfLines.get(0) = first
listOfLines.get(1) = second
listOfLines.get(2) = third
You can use the following regex :
(\w+)(?=\n|"|$)

C# Regex - Remove strings that consist only of a combination of special characters

I am looking for a regular expression with which I can ignore strings that consist only of special characters.
Example
List<string> liststr = new List<string>() { "a b", "c%d", " ", "% % % %" ,"''","&","''","'"}; etc...
I need result of this one
{ "a b", "c%d"}
You can use this, too, to match string without any Unicode letter:
var liststr = new List<string>() { "a b", "c%d", " ", "% % % %", "''", "&", "''", "'" };
var rx2 = @"^\P{L}+$";
var res2 = liststr.Where(p => !Regex.IsMatch(p, rx2)).ToList();
Output: "a b", "c%d"
I also suggest creating the regex object as a private static readonly field, with Compiled option, so that performance is not impacted.
private static readonly Regex rx2 = new Regex(@"^\P{L}+$", RegexOptions.Compiled);
... (and inside the caller)
var res2 = liststr.Where(p => !rx2.IsMatch(p)).ToList();
Use this one :
.*[A-Za-z0-9].*
It matches any string that contains at least one alphanumeric character, so strings made up only of symbols or special characters are rejected. It produces the output you want.
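The filtering step built on that pattern can be sketched as follows. The sketch is in Java rather than C#, purely for illustration, and the names AlnumFilter and keepMeaningful are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AlnumFilter {
    // Full-string match that succeeds only if an ASCII letter or digit occurs somewhere.
    private static final Pattern HAS_ALNUM = Pattern.compile(".*[A-Za-z0-9].*");

    // Hypothetical helper: keeps only the strings that contain an alphanumeric character.
    static List<String> keepMeaningful(List<String> items) {
        return items.stream()
                .filter(s -> HAS_ALNUM.matcher(s).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a b", "c%d", " ", "% % % %", "''", "&", "''", "'");
        // prints [a b, c%d]
        System.out.println(keepMeaningful(input));
    }
}
```

Unlike the \P{L} approach, this character class is ASCII-only, so a string made up solely of non-ASCII letters would be dropped.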
You can use a very simple regex like
Regex regex = new Regex(@"^[% &']+$");
Where
[% &'] Is the list of special characters that you wish to include
Example
List<string> liststr = new List<string>() { "a b", "c%d", " ", "% % % %" ,"''","&","''","'"};
List<string> final = new List<string>();
Regex regex = new Regex(@"^[% &']+$");
foreach ( string str in liststr)
{
if (! regex.IsMatch(str))
final.Add(str);
}
Will give an output as
final = {"a b", "c%d"}