Break String on full stop - regex

I want to break a string on . (full stop).
Ex. String str="We are going there.How are you."
then output should be
We are going there.
How are you.
It should split on "."
But if my string is
Dr.Harry is going. then it should not break like
Dr.
Harry is going.
It should stay as Dr.Harry is going.
I have a list of such words; if any of them appears in the string, it should not break there:
StringBuffer regex = new StringBuffer(
    "Dr[\\.]|Gr[\\.]|[Aa][\\.][Mm][\\.]|"
    + "[Pp][\\.][Mm][\\.]|Emp[\\.]|Rs[\\.]|Ms[\\.]|No[\\.]|Nos[\\.]|"
    + "Dt[\\.]|Sh[\\.]|(Mr|mr)[\\.]|(Mrs|mrs)[\\.]|Admn[\\.]|Ad[\\.]|Smt[\\.]|"
    + "GOVT[\\.]|Govt[\\.]|Deptt[\\.]|Tel[\\.]|Secy[\\.]|Estt[\\.]|"
    + "Asstt[\\.]|Hqrs[\\.]|DY[\\.]|Supdt[\\.]|w[\\.]e[\\.]f[\\.]|"
    + "I[\\.]|N[\\.]|[0-9]+[\\.][0-9]+[\\.][0-9]|K[\\.]|NSI[\\.]|"
    + "Prof[\\.]|Dte[\\.]|no[\\.]|nos[\\.]|Agri[\\.]|R[\\.]|"
    + "K[\\.]|Y[\\.]|C[\\.]|N[\\.]|Dept[\\.]|S[\\.]|Spl[\\.]|N[\\.]|"
    + "Sr[\\.]|Addl[\\.]|i[\\.]e[\\.]|Sl[\\.]|CS[\\.]|M[\\.]|IPS[\\.]|"
    + "Jt[\\.]|viz[\\.]|hrs[\\.]|S/Sh[\\.]|Jr[\\.]|E[\\.]|S[\\.]|"
    + "Pers[\\.]|Deptts[\\.]|OM[\\.]|DT[\\.]|Proj[\\.]|Instrum[\\.]|"
    + "Div[\\.]|Dev[\\.]|Env[\\.]|e[\\.]g[\\.]|etc[\\.]|Misc[\\.]|"
    + "vig[\\.]|Dr[\\.]|Nos[\\.]|Ltd[\\.]|Maj[\\.]|"
    + "Gen[\\.]|MAJ[\\.]|GEN[\\.]|Su[\\.]|/Ess[\\.]|Com[\\.]|St[\\.]");
These are some of the words after which the string should not split, as in Dr.Harry is going.
Is there a regular expression that can do this, or some other method?
Thanks

Use this:
search : (?<!(Mr|Dr|Gr|Aa))\.
replace : \n
You can add as many words as you want using | after the Aa.
Demo here: http://regex101.com/r/fP6hN9
I tried the code below and it works fine for me:
import java.util.*;
import java.lang.*;
import java.io.*;
/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String str1 = "We are going there.How are you.Mr.Gordon is also coming with us.Are you sure you want to take him", str2;
        String substr = "\n", regex = "(?<!(Mr|Dr|Gr|Aa))\\.";
        // prints string1
        System.out.println("String = " + str1);
        /* replaces each substring of this string that matches the given
           regular expression with the given replacement */
        str2 = str1.replaceAll(regex, substr);
        System.out.println("After Replacing = " + str2);
    }
}
outputs :
We are going there
How are you
Mr.Gordon is also coming with us
Are you sure you want to take him
checked here : http://ideone.com/YfLU7v

Use this regex:
(?<!(Dr|Gr|Aa|Mm|Pp))\.
Fill in the rest as required. This uses a lookaround, specifically a negative lookbehind.
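Both patterns above consume the full stop, whereas the expected output in the question keeps it. A minimal sketch of one way to keep it, assuming Java's String.split with a zero-width match is acceptable. The class name SentenceSplitter, the method splitSentences, and the shortened abbreviation list are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

class SentenceSplitter {
    // Hypothetical, shortened abbreviation list; extend the alternation as needed.
    // (?<=\.)  : the split point sits right after a full stop
    // (?<!...) : ...unless that full stop ends one of the listed abbreviations
    private static final String SPLIT_POINT =
            "(?<=\\.)(?<!\\b(?:Mr|Mrs|Dr|Gr)\\.)";

    static List<String> splitSentences(String text) {
        // The match is zero-width, so the split removes no characters.
        return Arrays.asList(text.split(SPLIT_POINT));
    }

    public static void main(String[] args) {
        String text = "We are going there.How are you.Mr.Gordon is also coming with us.";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

The \b in the lookbehind is there so that a longer word that merely ends in "Dr" or "Mr" does not suppress a split.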

Related

How can I replace all words in a string except one

So, I would like to change all words in a string except one, that stays in the middle.
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string test = "You want to join player group";
    string find = "You want to join group";
    string replace = "This is a test about group";
    boost::replace_all(test, find, replace);
    cout << test << endl;
}
The output was expected to be:
This is a test about player group
But it doesn't work, the output is:
You want to join player group
The problem is in finding the words, since they are treated as a single string.
Is there a function that reads all the words, no matter their position, and changes just what I want?
EDIT2:
This is the best example of what I want to happen:
const char* a = "This is MYYYYYYYYY line in the void Translate"; // This is the main line
const char* b = "This is line in the void Translate"; // This is what needs to be found in the main line
const char* c = "Testing - is line twatawtn thdwae voiwd Transwlate"; // This needs to replace ALL the words in b, preserving the MYYYYYYYYY
// The output is expected to be:
Testing - is MYYYYYYYY is line twatawtn thdwae voiwd Transwlate
You need to invert your thinking here. Instead of matching "All words but one", you need to try to match that one word so you can extract it and insert it elsewhere.
We can do this with Regular Expressions, which became standardized in C++11:
#include <boost/format.hpp>
#include <iostream>
#include <regex>
#include <string>
int main()
{
    std::string test = "You want to join player group";
    static const std::regex find{R"(You want to join (\S+) group)"};
    std::smatch search_result;
    if (!std::regex_search(test, search_result, find))
    {
        std::cerr << "Could not match the string\n";
        return 1;
    }
    else
    {
        std::string found_group_name = search_result[1];
        auto replace = boost::format("This is a test about %1% group") % found_group_name;
        std::cout << replace;
    }
}
To match the word "player" I used a pretty simple regular expression, (\S+), which means "match one or more non-whitespace characters (greedily) and put that into a group".
"Groups" in regular expressions are enclosed by parentheses. The 0th group is always the entire match, and since we only have one set of parentheses, your word is therefore in group 1, hence the resulting access of the match result at search_result[1].
To create the regular expression, you'll notice I used the perhaps-unfamiliar string literal syntax R"(...)". This is called a raw string literal and was also standardized in C++11. It was basically made for describing regular expressions without needing to escape backslashes. If you've used Python, it's the same as r'...'. If you've used C#, it's the same as @"...".
I threw in some boost::format to print the result because you were using Boost in the question and I thought you'd like to have some fun with it :-)
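For comparison, the same capture-and-reinsert idea can be sketched in Java with java.util.regex. This is only an illustration under stated assumptions: the class name MiddleWordSwap and the method rewrite are invented here, and the two sentence templates are taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleWordSwap {
    // Group 1 captures the one word that must survive the rewrite.
    private static final Pattern FIND =
            Pattern.compile("You want to join (\\S+) group");

    // Returns the rewritten sentence, or null when the input does not match.
    static String rewrite(String input) {
        Matcher m = FIND.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group(0) is the whole match; group(1) is the captured word.
        return "This is a test about " + m.group(1) + " group";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("You want to join player group"));
    }
}
```

Returning null on a failed match is just one choice for the sketch; throwing or returning the input unchanged would work as well.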
In your example, find is not a substring of test, so boost::replace_all(test, find, replace); has no effect.
Removing group from find and replace solves it:
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
int main()
{
    std::string test = "You want to join player group";
    std::string find = "You want to join";
    std::string replace = "This is a test about";
    boost::replace_all(test, find, replace);
    std::cout << test << std::endl;
}
Output: This is a test about player group.
In this case, only the beginning of the string needs replacing, because the end of the string is already correct. You could call replace_all a second time to change the end if needed.
Some other options:
One is in the other answer.
Split the strings into a vector (or array) of words, insert the desired word (player) at the right spot of the replacement vector, then build your output string from it.

check if certain char sequences exists in a string builder and remove it

In Java, I am trying to find out whether a given string contains any of several substrings, using multiple ORs in a single if condition, and to remove whichever of those substrings are present. I am not sure how to do it. Also, the search needs to be case-insensitive.
Here is the sample code
if (inputString contains any of the subStrings i.e. "_LOCATION" OR "_MANAGEMENT" Or "_ZIPCODE")
{
remove the subString from inPutString
}
Ex: Given the string - "STATE_CAPITAL_LOCATION_MANAGEMENT_PHONE_EMAIL_zipcode"
Resulting string should be - "STATE_CAPITAL_PHONE_EMAIL"
What is the best way to do it.
Thanks
Using separate if statements makes this easier.
Try this code:
String a = "STATE_CAPITAL_LOCATION_MANAGEMENT_PHONE_EMAIL_zipcode";
if (a.contains("_LOCATION")) // replace with your string
{
    a = a.replace("_LOCATION", "");
    System.out.println(a);
}
if (a.contains("_MANAGEMENT")) // replace with your string
{
    a = a.replace("_MANAGEMENT", "");
    System.out.println(a);
}
// .....
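The snippet above is case-sensitive, so _zipcode in the sample input would be left in place. A minimal sketch of a single-pass, case-insensitive variant, assuming a regex alternation is acceptable here. The class and method names (SuffixStripper, strip) are hypothetical:

```java
import java.util.regex.Pattern;

class SuffixStripper {
    // CASE_INSENSITIVE makes _zipcode, _ZipCode and _ZIPCODE all match.
    private static final Pattern UNWANTED = Pattern.compile(
            "_(?:LOCATION|MANAGEMENT|ZIPCODE)", Pattern.CASE_INSENSITIVE);

    static String strip(String input) {
        // replaceAll returns the input unchanged when nothing matches,
        // so no separate contains() check is needed.
        return UNWANTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("STATE_CAPITAL_LOCATION_MANAGEMENT_PHONE_EMAIL_zipcode"));
    }
}
```

Note that this sketch would also strip the _LOCATION part of a longer token such as _LOCATIONS; if that matters, the pattern needs a check for what follows the match.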

regular expression to replace a pattern

I use Microsoft Visual Studio and have a file with some text delimited by | . I need to find a particular pattern and remove it from the file
sometext|maxusage=sometext,,,,...|somemoretext
I want to isolate any | followed by maxusage=, followed by any text up to the next |.
In the above case, I need to isolate
|maxusage=sometext,,,,...|
It's simple, a single statement:
File.WriteAllText("c:\\test.txt", Regex.Replace(File.ReadAllText("c:\\test.txt"), @"\|maxusage=[^|]+\|", ""));
Note that this should work too (it would not if Visual Studio did not implement lazy quantifiers):
/\|maxusage=.*?\|/
Use this regex: \|maxusage.*?\|
Why not use string.Split() in order to split your string and investigate it?
string[] parts = text.Split('|');
foreach (string s in parts) {
    // iterate over the array and find what you are looking for
}
Here is C# code my friend,
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"\p{Sc}*(?<amount>\s?\d+[.,]?\d*)\p{Sc}*";
        string replacement = "${amount}";
        string input = "$16.32 12.19 £16.29 €18.29 €18,29";
        string result = Regex.Replace(input, pattern, replacement);
        Console.WriteLine(result);
    }
}
// The example displays the following output:
// 16.32 12.19 16.29 18.29 18,29
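One detail worth checking with the maxusage patterns above: they consume both the leading and the trailing |, which joins the two neighbouring fields together. A minimal sketch that removes the field along with only its leading delimiter, written in Java rather than C# and assuming the remaining fields should stay delimited. FieldRemover and removeMaxUsage are illustrative names:

```java
class FieldRemover {
    // Matches "|maxusage=" plus everything up to, but not including, the next "|".
    private static final String MAXUSAGE_FIELD = "\\|maxusage=[^|]*";

    static String removeMaxUsage(String line) {
        return line.replaceAll(MAXUSAGE_FIELD, "");
    }

    public static void main(String[] args) {
        System.out.println(removeMaxUsage("sometext|maxusage=sometext,,,,...|somemoretext"));
    }
}
```

Using [^|]* instead of a lazy .*? also keeps the match from running across a line break into a later field.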

Appending to end of line in eclipse

Is there a way to append a string to the end of lines in Eclipse?
Find and replace seems like it should work, but using Find with just the regex $ does not find any strings. .$ will find something, but running find/replace with this deletes the last character of each line, which is undesirable. Does anyone know a way to accomplish this in Eclipse? Is there something wrong with my regex that makes Eclipse not understand it, while other editors like vim handle it just fine (in Vi / Vim: :0,$s/$/appended to end of line/)?
Surely I am not the only person who wishes there was this functionality... It's offered by most other good editors. Could this be considered a bug?
I agree that this is a bug in eclipse. I tried the same as you with the same results. I also tried to use the regex search string "(?<=.)$" to try to ignore the single character match in the replace but that failed as well. One should be able to search for end of string to append.
Here's a trick to make it work,
Find: "(.)$"
Replace: $1foo
This replaces the single character match before the end of line and appends foo.
That's a lot of hoop jumping but at least it works.
I'm wondering if the best bet would be to run a Java program on the list of variables before you copy them in. I'm not sure of the format of the file which you have cut and paste from but if it is just a list with only the variable names on each line, try this:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
// The class name is arbitrary.
class AppendToLines {
    static ArrayList<String> importarray = new ArrayList<String>();
    static ArrayList<String> rebuildarray = new ArrayList<String>();
    public static void main(String[] args) {
        readFile();
        processFile();
    }
    // Read every line of the file into importarray.
    static void readFile() {
        String file = "C:\\path\\file.txt";
        try (BufferedReader importstart = new BufferedReader(new FileReader(file))) {
            String line;
            for (line = importstart.readLine(); line != null; line = importstart.readLine()) {
                importarray.add(line);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    // Append "; to each line and collect the results in rebuildarray.
    static void processFile() {
        String complete = "";
        for (String string : importarray) {
            complete = string + "\";";
            rebuildarray.add(complete);
        }
    }
}
Adding this in would provide an array of variable names with "; on the end.
Alternatively, you could use this array in the String declaration and do:
for (String variable : rebuildarray) {
    final String string = variable;
    doSomething(string);
}
This would negate the need for the addition of ";.
Not sure if this is a bit too much, or even quite what you were looking for, but these are a couple of ideas.
In my case, using Eclipse Luna (4.4.0), the accepted solution didn't work: it only replaced the first line and left the others. But this worked (I wanted to add a semicolon):
find: ^.*$
Replace: $0;
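The find/replace pair above can also be tried outside the editor with Java's own regex engine, which is a handy way to check the pattern on sample text. A minimal sketch, assuming multi-line mode is switched on with the (?m) flag. LineAppender and appendToEachLine are made-up names:

```java
import java.util.regex.Matcher;

class LineAppender {
    static String appendToEachLine(String text, String suffix) {
        // (?m) lets ^ and $ match at every line boundary, not just at the
        // start and end of the whole input. $0 is the entire matched line.
        // quoteReplacement keeps any '$' or '\' in the suffix literal.
        return text.replaceAll("(?m)^.*$", "$0" + Matcher.quoteReplacement(suffix));
    }

    public static void main(String[] args) {
        System.out.print(appendToEachLine("first\nsecond\nthird\n", ";"));
    }
}
```

Matching the whole line with ^.*$ rather than a bare $ avoids appending a stray suffix after a trailing newline at the end of the text.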

Regex Rejecting matches because of Instr

What's the easiest way to do an "instring" type function with a regex? For example, how could I reject a whole string because of the presence of a single character such as :? For example:
this - okay
there:is - not okay because of :
More practically, how can I match the following string:
//foo/bar/baz[1]/ns:foo2/@attr/text()
For any node test on the xpath that doesn't include a namespace?
(/)?(/)([^:/]+)
Will match the node tests but includes the namespace prefix which makes it faulty.
I'm still not sure whether you just wanted to detect if the Xpath contains a namespace, or whether you want to remove the references to the namespace. So here's some sample code (in C#) that does both.
using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main(string[] args)
    {
        string withNamespace = @"//foo/ns2:bar/baz[1]/ns:foo2/@attr/text()";
        string withoutNamespace = @"//foo/bar/baz[1]/foo2/@attr/text()";
        ShowStuff(withNamespace);
        ShowStuff(withoutNamespace);
    }
    static void ShowStuff(string input)
    {
        Console.WriteLine("'{0}' does {1}contain namespaces", input, ContainsNamespace(input) ? "" : "not ");
        Console.WriteLine("'{0}' without namespaces is '{1}'", input, StripNamespaces(input));
    }
    static bool ContainsNamespace(string input)
    {
        // a namespace must start with a character, but can have characters and numbers
        // from that point on.
        return Regex.IsMatch(input, @"/?\w[\w\d]+:\w[\w\d]+/?");
    }
    static string StripNamespaces(string input)
    {
        return Regex.Replace(input, @"(/?)\w[\w\d]+:(\w[\w\d]+)(/?)", "$1$2$3");
    }
}
Hope that helps! Good luck.
Match on :? I think the question isn't clear enough, because the answer is so obvious:
if (Regex.IsMatch(input, ":")) // reject
You might want \w which is a "word" character. From javadocs, it is defined as [a-zA-Z_0-9], so if you don't want underscores either, that may not work....
I don't know regex syntax very well, but could you not do:
[any alphanumeric]*:[any alphanumeric]*
I think something like that should work no?
Yeah, my question was not very clear. Here's a solution but rather than a single pass with a regex, I use a split and perform iteration. It works as well but isn't as elegant:
string xpath = "//foo/bar/baz[1]/ns:foo2/@attr/text()";
string[] nodetests = xpath.Split( new char[] { '/' } );
for (int i = 0; i < nodetests.Length; i++)
{
    if (nodetests[i].Length > 0 && Regex.IsMatch( nodetests[i], @"^(\w|\[|\])+$" ))
    {
        // does not have a ":", we can manipulate it.
    }
}
xpath = String.Join( "/", nodetests );
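The split-and-check approach above carries over to other languages. A minimal Java sketch, assuming a step counts as namespace-free whenever it contains no colon. NodeTests and withoutNamespace are hypothetical names, and the sample path is modelled on the one in the question:

```java
import java.util.ArrayList;
import java.util.List;

class NodeTests {
    // Returns the steps of the path that carry no namespace prefix.
    static List<String> withoutNamespace(String xpath) {
        List<String> result = new ArrayList<>();
        for (String step : xpath.split("/")) {
            // [^:]+ must cover the whole step, so a single ':' rejects it;
            // empty steps (from the leading "//") are rejected as well.
            if (step.matches("[^:]+")) {
                result.add(step);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(withoutNamespace("//foo/bar/baz[1]/ns:foo2/@attr/text()"));
    }
}
```

This is deliberately naive: a slash inside a predicate, or an explicit axis such as child::foo, would be misread, so it only suits simple paths like the one shown.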