I have a function similar to the one below appearing in multiple files. I want to use regex to get rid of all references to outputString, since clearly, they're wasteful.
... other functions, class declarations, etc
public String toString()
{
String outputString = "";
return ... some stuff
+ outputString;
}
... other functions, class declarations, etc
I'm happy to do this in multiple passes. So far I've got regexes to find the first and last line (String outputString = "";$ and ( \+ outputString;)$). However, I've got two problems: first, I want to get rid of the whitespace that results from deleting the two lines that refer to outputString. Second, I need the final ; on the second-to-last line to move up to the line above it.
As a bonus, I'd also like to know what's wrong with adding the line start anchor (^) to either of the regexes I specified. It seems like doing so would tighten them up, but when I try something like ^( \+ outputString;)$ I get zero results.
After all's said and done the function above should look like this:
... other functions, class declarations, etc
public String toString()
{
return ... some stuff;
}
... other functions, class declarations, etc
Here's an example of what "some stuff" might be:
"name" + ":" + getName()+ "," +
"id" + ":" + getId()+ "]" + System.getProperties().getProperty("line.separator") +
" " + "student = "+(getStudent()!=null?Integer.toHexString(System.identityHashCode(getStudent())):"null")
Here's a concrete example:
Current:
public void delete()
{
Student existingStudent = student;
student = null;
if (existingStudent != null)
{
existingStudent.delete();
}
}
public String toString()
{
String outputString = "";
return super.toString() + "["+
"name" + ":" + getName()+ "," +
"id" + ":" + getId()+ "]" + System.getProperties().getProperty("line.separator") +
" " + "student = "+(getStudent()!=null?Integer.toHexString(System.identityHashCode(getStudent())):"null")
+ outputString;
}
public String getId()
{
return id;
}
Required:
public void delete()
{
Student existingStudent = student;
student = null;
if (existingStudent != null)
{
existingStudent.delete();
}
}
public String toString()
{
return super.toString() + "["+
"name" + ":" + getName()+ "," +
"id" + ":" + getId()+ "]" + System.getProperties().getProperty("line.separator") +
" " + "student = "+(getStudent()!=null?Integer.toHexString(System.identityHashCode(getStudent())):"null");
}
public String getId()
{
return id;
}
1st pass:
Find:
.*outputString.*\R
Replace with empty string.
Demo:
https://regex101.com/r/g3aYnp/2
2nd pass:
Find:
(toString\(\)[\s\S]+\))(\s*\R\s*?\})
Replace:
$1;$2
https://regex101.com/r/oxsNRW/3
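As a rough illustration of the two passes above outside an editor, here is a minimal Python sketch. It is an assumption-laden translation, not the answerer's code: the function name remove_output_string and the sample text are made up, the input is assumed to use \n line endings (Python's re module has no \R), and the second pass uses a lazy [\s\S]+? instead of the greedy form so that it stops at the first closing parenthesis that is followed by a line break and a closing brace.

```python
import re


def remove_output_string(source: str) -> str:
    """Two-pass cleanup, assuming '\n' line endings."""
    # Pass 1: delete every line that mentions outputString, line break included.
    without_var = re.sub(r"^.*outputString.*\n?", "", source, flags=re.MULTILINE)
    # Pass 2: after toString(), find the first ')' that is followed only by
    # whitespace, a line break and '}', and put the missing ';' after it.
    return re.sub(
        r"(toString\(\)[\s\S]+?\))(\s*\n\s*\})",
        r"\1;\2",
        without_var,
    )


# Hypothetical input shaped like the example in the question.
sample = (
    "public String toString()\n"
    "{\n"
    'String outputString = "";\n'
    'return super.toString() + "[" +\n'
    '"id" + ":" + getId()\n'
    "+ outputString;\n"
    "}\n"
)

if __name__ == "__main__":
    print(remove_output_string(sample))
```

Like the original second pass, this only works when the return expression ends in a closing parenthesis.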
Assuming that the wanted part of the return expression does not contain any semicolons (i.e. ;), you can do it in one replace. Search for:
^ +String outputString = "";\R( +return [^;]+?)\R +\+ outputString;
and replace with:
\1;
The idea is to match all three lines in one go, to keep the wanted part and to add the ;.
One interesting point about this replacement: my first attempt had ... return [^;]+)\R +\+ ... and it failed, whereas ... return [^;]+)\r\n +\+ ... worked. The \R version appeared to leave a line break before the final ;. Turning on menu => View => Show symbol => Show end of line revealed that the greedy term within the capture group had collected the \r, so the \R matched only the \n. Changing to the non-greedy form allowed the \R to match the entire \r\n.
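The greedy versus non-greedy effect can be reproduced outside the editor. The following Python sketch is only an approximation of the search above: the names strip_output_string and NL and the sample text are hypothetical, Python's re module has no \R so the line break is spelled out as an alternation, and [ \t]+ replaces the run of spaces so that tab indentation also matches.

```python
import re

# Python's re has no \R, so the line break is written out (assumption of this sketch).
NL = r"(?:\r\n|\n|\r)"


def strip_output_string(source: str, lazy: bool = True) -> str:
    """Single-pass removal; lazy=False shows the stray '\r' problem."""
    body = r"[^;]+?" if lazy else r"[^;]+"
    pattern = re.compile(
        r'^[ \t]+String outputString = "";' + NL
        + r"([ \t]+return " + body + r")" + NL
        + r"[ \t]+\+ outputString;",
        re.MULTILINE,
    )
    # Keep the return expression (group 1) and append the ';'.
    return pattern.sub(r"\1;", source)


# Hypothetical input with Windows (\r\n) line endings.
sample = (
    "  public String toString()\r\n"
    "  {\r\n"
    '    String outputString = "";\r\n'
    '    return super.toString() + "[" +\r\n'
    '    "id" + ":" + getId() + "]"\r\n'
    "    + outputString;\r\n"
    "  }\r\n"
)

if __name__ == "__main__":
    print(repr(strip_output_string(sample, lazy=True)))
    print(repr(strip_output_string(sample, lazy=False)))
```

With lazy=False the captured group ends in \r, so the ; lands after a bare carriage return. The leading ^ works here because the pattern allows any amount of indentation after it; a pattern such as ^( \+ outputString;)$ can only match a line that starts with exactly one space, which is one likely reason it found nothing.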
Related
I'm having an issue removing expressions from a QString using QRegExp. I have tried countless regexes to no avail. What am I doing wrong?
Sample Text (QString myString) In this instance, myString contains "\u0006\u0007\u0013Hello".
myString.remove(QRegExp("\\[u][0-9]{4}"));
It does not remove any instances of \uXXXX where X = numbers.
However, when I am specific such as:
myString.remove("\u0006");
It does remove it.
String literals are not always the same as the character sequences they denote:
for (char c : "\u0006\u0007\u0013Hello".toCharArray()) {
System.out.println( c + " (" + (int)c + ")" );
}
System.out.println( "--------------" );
for (char c : "\\u0006\\u0007\\u0013Hello".toCharArray()) {
System.out.println( c + " (" + (int)c + ")" );
}
In the first example \u0006 encodes a Unicode code point, whereas in the second the string actually contains a backslash.
String literals exist only at compile time; at runtime they are character sequences.
Regexes work on character sequences, not on string literals. Also, the backslash has a special meaning in a regex and needs to be escaped.
Also note that \u0041 is another way to encode A.
Maybe what you are looking for is Unicode categories; the following may help:
string.replaceAll( "\\p{Cc}", "" )
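The same distinction can be checked in a few lines of Python. This is only a sketch with made-up names (decoded, escaped, strip_escape_text, strip_control_chars); Python's re module does not support \p{Cc}, so the control-character filter uses unicodedata instead, which is a substitute for the replaceAll call above rather than a translation of it.

```python
import re
import unicodedata

# Escapes are resolved when the literal is parsed: 3 control characters + "Hello".
decoded = "\u0006\u0007\u0013Hello"
# Here the backslashes are real characters: each "\u0006" is six characters long.
escaped = "\\u0006\\u0007\\u0013Hello"


def strip_escape_text(text: str) -> str:
    """Remove the literal text: backslash, 'u', four digits."""
    return re.sub(r"\\u[0-9]{4}", "", text)


def strip_control_chars(text: str) -> str:
    """Remove characters whose Unicode category is Cc (control)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


if __name__ == "__main__":
    print(len(decoded), len(escaped))
    print(repr(strip_escape_text(escaped)))
    print(repr(strip_escape_text(decoded)))
    print(repr(strip_control_chars(decoded)))
```

A regex that looks for a backslash followed by u and digits finds nothing in the decoded string, because that string contains no backslash at all.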
I have a function that turns the string:
select * from run_on_hive(server('hdp230m2.labs.teradata.com'),username('vijay'),password('vijay'),dbname('default'),query('analyze table default.test01 compute statistics'));
to:
select * from run_on_hive(server('hdp230m2.labs.teradata.com'),username('vijay'),'****',dbname('default'),query('analyze table default.test01 compute statistics'));
The function looks like :
static SimpleRegexMask::Ptr newUDFMask(const String &udfName, const int paramPos)
{
return SimpleRegexMask::newInstance(
udfName,
udfName + "([^[:alpha:]]*)\\((([^,]*,){" + toString(paramPos - 1) + "})([^,]*)(,[^\\)]*)\\)",
udfName + "\\1\\(\\2'****'\\5\\)"
);
}
These are the functions used by the one above. I hope this explains what I am trying to do:
static Ptr newInstance(
const String &baseRegex,
const String &replaceRegex,
const String &matchFormatString
)
{
return new SimpleRegexMask(baseRegex, replaceRegex, matchFormatString);
}
SimpleRegexMask::SimpleRegexMask(
const String &baseRegex,
const String &replaceRegex,
const String &matchFormatString
)
{
try {
basePattern_ = boost::regex(
baseRegex, boost::regex_constants::icase|boost::regex_constants::perl
);
replacePattern_ = boost::regex(
replaceRegex, boost::regex_constants::icase|boost::regex_constants::perl
);
matchFormatString_ = matchFormatString;
} catch (const boost::regex_error& ex) {
// programming error i.e. the regex supplied is not valid
NOT_REACHED;
}
}
However, I want to modify the string to
select * from run_on_hive(server('hdp230m2.labs.teradata.com'),username('vijay'),password('****'),dbname('default'),query('analyze table default.test01 compute statistics'));
How should I modify the above function to do that? Where am I going wrong? Please let me know.
TIA.
You may use
return SimpleRegexMask::newInstance(
udfName,
"(" + udfName + "[^[:alpha:]]*)\\(((?:[^,]*,){" + toString(paramPos - 1) + "}[^,(]*\\(['\"])[^,'\"]*(['\"]\\),[^)]*)\\)",
"\\1\\(\\2****\\3\\)"
);
See the regex demo.
I optimized capturing group usage here (reducing their number) and used the [^,(]*\\(['\"])[^,'\"]*(['\"]\\) pattern to match the password part in the following way: it captures password(' or password(" into Group 2, then matches any 0+ chars other than ,, " or ', and then captures the ') or ") into the subsequent capturing group.
Note that the UDF name is captured into Group 1 and you do not need to hardcode it in the replacement string. So, if it is RUN_ON_HIVE in the string, it will be RUN_ON_HIVE in the result even if you have run_on_hive in the pattern (since you are using a case insensitive modifier).
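To experiment with the pattern without building the C++ code, here is a hedged Python sketch of the same idea. The function mask_udf_param and its parameters are hypothetical names, [^A-Za-z] stands in for the POSIX class [^[:alpha:]], and re.IGNORECASE plays the role of the icase flag.

```python
import re


def mask_udf_param(sql: str, udf_name: str, param_pos: int, mask: str = "****") -> str:
    """Mask the quoted value inside the param_pos-th argument of udf_name(...)."""
    pattern = re.compile(
        r"(%s[^A-Za-z]*)\(((?:[^,]*,){%d}[^,(]*\(['\"])[^,'\"]*(['\"]\),[^)]*)\)"
        % (re.escape(udf_name), param_pos - 1),
        re.IGNORECASE,
    )
    # Group 1: UDF name as written in the input; group 2: everything up to the
    # opening quote of the value; group 3: closing quote, ')' and the next argument.
    return pattern.sub(
        lambda m: m.group(1) + "(" + m.group(2) + mask + m.group(3) + ")",
        sql,
    )


if __name__ == "__main__":
    query = (
        "select * from run_on_hive(server('hdp230m2.labs.teradata.com'),"
        "username('vijay'),password('vijay'),dbname('default'),"
        "query('analyze table default.test01 compute statistics'));"
    )
    print(mask_udf_param(query, "run_on_hive", 3))
```

As in the pattern above, the masked argument must be followed by at least one more argument, because a comma is required after the closing parenthesis of the masked call.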
For example
int val = 13;
Serial.begin(9600);
val = DigitalWrite(900,HIGH);
I want to extract special symbols like = and ;.
I've been able to extract symbols that appear adjacent to each other in the code, but I need all occurrences.
I tried [^ "//"A-Za-z\t\n0-9]* and [\;\=\{\}\,]+. Neither worked.
What's wrong?
I made the rules below for my scanner (they have since been changed).
semicolon [;]([\n]|[^ "//"])
assignment (.)?[=]+
brace ([{]|[}])([\n]|[^ "//"])
roundbarcket ("()")" "
The problem occurred in situations like these:
int val= 13; // "=" was not recognized because "val" and "=" are adjacent; I want to recognize them whether they are adjacent or not
serial.read(); // () and ; were not recognized individually; with both the semicolon rule and the roundbarcket rule, (); was recognized
How can I solve these?
You want to break "DigitalWrite(900,HIGH);" into "DigitalWrite" "(" "900" "," "HIGH" ")" ";". I think looping over each character is the fastest way.
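// Needs: using System; using System.Collections.Generic;
string text = "val = DigitalWrite(900,HIGH);";
string[] symbols = new string[] { "(", ")", ",", "=", ";" };
List<string> tokens = new List<string>();
string word = "";
for( int i = 0; i < text.Length; i++ )
{
    string letter = text.Substring( i, 1 );
    // Look the character up in symbols (not in tokens, which starts out empty).
    bool isSymbol = Array.IndexOf( symbols, letter ) >= 0;
    if( isSymbol || letter.Equals( " " ) )
    {
        // A symbol or a space ends the word collected so far.
        if( word.Length > 0 )
        {
            tokens.Add( word );
            word = "";
        }
        if( isSymbol )
            tokens.Add( letter );
    }
    else
    {
        word += letter;
    }
}
// Flush a trailing word that is not followed by a symbol.
if( word.Length > 0 )
    tokens.Add( word );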
So is finding ";" and "=" the ultimate goal?
In that case, why not just use something like a .find() function?
Or you can split the string on ";" first and then search for "=".
If you want to grab the text between "=" and ";", try =([^;]*); or =(.*?);
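Both suggestions above can also be expressed with a single regex per task. The Python sketch below is an illustration only; the names TOKEN, tokenize and right_hand_sides and the exact symbol set are assumptions, and it is not a replacement for proper scanner rules.

```python
import re
from typing import List

# One alternation per token kind: identifiers, integers, single-character symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[=;(){},]")


def tokenize(code: str) -> List[str]:
    """Split code into words, numbers and symbols, adjacent or not."""
    return TOKEN.findall(code)


def right_hand_sides(code: str) -> List[str]:
    """Return the text between each '=' and the following ';'."""
    return [part.strip() for part in re.findall(r"=([^;]*);", code)]


if __name__ == "__main__":
    print(tokenize("val = DigitalWrite(900,HIGH);"))
    print(tokenize("int val= 13;"))
    print(right_hand_sides("int val = 13;"))
```

Because each symbol is its own alternative rather than part of a longer rule, = is found whether or not it touches the identifier before it, and ( ) ; are reported one by one.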
I am struggling with a huge Excel sheet where I need to extract from a certain cell (A1),
all occurrences of a string pattern e.g. "TCS" + the following 4 characters after the pattern match e.g. TCS1234 comma-separated into another cell (B1).
Example:
Cell A1 contains the following string:
HRS164, SRS3439(s), SRS3440(s), SRS3441(s), SRS3442(s), SRS3443(s), SRS3444(s), SRS3445(s), SRS3449(s), SRS3450(s), SRS3451(s), SRS3452(s), SYSBASE.SSS300(s), TCS3715(s), TCS3716(s), TCS3717(s), TCS4037(s), TCS1234
All TCS-Numbers shall be comma-separated in B1:
TCS3715, TCS3716, TCS3717, TCS4037, TCS1234
It is not necessary to also extract the trailing "(s)".
Could someone please help me (excel rookie) with this challenge?
TIA Erika
Here is what I would use for something like that, also a user-defined function:
Function GetTCS(TheString)
For Each TItem In Split(TheString, ", ")
If Left(TItem, 3) = "TCS" Then GetTCS = GetTCS & TItem & " "
Next
GetTCS = Replace(Trim(GetTCS), " ", ", ")
End Function
This returns "TCS3715(s), TCS3716(s), TCS3717(s), TCS4037(s), TCS1234" from your string. If you don't know how to create a user-defined function, just ask; it's pretty straightforward and I'd be happy to show you. Hope this helps.
Try the following User Defined Function:
Public Function Xtract(r As Range) As String
Dim s As String, L As Long, U As Long
Dim msg As String, i As Long
s = Replace(r(1).Text, " ", "")
ary = Split(s, ",")
L = LBound(ary)
U = UBound(ary)
Xtract = ""
msg = ""
For i = L To U
If Left(ary(i), 3) = "TCS" Then
If msg = "" Then
msg = Left(ary(i), 7)
Else
msg = msg & "," & Left(ary(i), 7)
End If
End If
Next i
Xtract = msg
End Function
If the TCS-parts are always at the end of the string as in your example, I would use (in B1):
=REPLACE(A1,1,FIND("TCS",A1)-1,"")
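For comparison, if the cell text is processed outside Excel, the "TCS plus the next four characters" rule from the question maps directly onto a regex. The following Python sketch is hypothetical (the function name extract_tcs is made up) and is not an Excel formula or VBA code.

```python
import re


def extract_tcs(cell_text: str) -> str:
    """Collect every 'TCS' followed by four characters, joined with ', '."""
    return ", ".join(re.findall(r"TCS.{4}", cell_text))


if __name__ == "__main__":
    a1 = "HRS164, SRS3439(s), SYSBASE.SSS300(s), TCS3715(s), TCS3716(s), TCS1234"
    print(extract_tcs(a1))
```

Unlike the REPLACE formula, this does not depend on the TCS entries being at the end of the string, and it drops the "(s)" suffix.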
Given the text
public void MyFunction(int i, String str, boolean doIt) {
Log.i(TAG, "Enter MyFunction(int i, String str, boolean doIt)");
I want to make some replacements on the second line, but not the first
public void MyFunction(int i, String str, boolean doIt) {
Log.i(TAG, "Enter MyFunction( i:" + i + ", str:" + str + ", doIt:" + doIt + ")");
So far using the following regex I manage to get these results:
find "\w+\s+(\w+)([,\)])"
replace with "$1:" + $1 + "$2"
public void MyFunction(i:" + i + ", str:" + str ", doIt:" + doIt + ") ") {
Log.i(TAG, "Enter MyFunction( i:" + i + ", str:" + str ", doIt:" + doIt + ") ");
Is there any way to force the replace to be executed only on the Log.i lines?
EDIT:
I tried the following regex
"Log\.i\(.*?\((\s*(\w+\s+(\w+)([,\)]))+"
but $1, $2, $3 contain only the last match (the last argument: doIt)
$1=boolean doIt)
$2=doIt
$3=)
when there should be 3 sets of $1,$2,$3, one for each argument.
If you know how to retrieve multiple matches, that would also make for a solution
I caved and used this little Perl script to do the job:
next unless /Log\.i/;
s/TAG,/TAGG/;
s/(final\s+)?[^ \(]+\s+(\w+)([,\)])/$2:\" \+ $2 \+ \"$3/g;
s/TAGG/TAG,/;
with the command line:
perl -pi <scriptname> <file>
If someone still wants to contribute: I understand I could have run perl as an Eclipse external tool to process the Java files. How do I do that?
UPDATE:
I wrote a post on how to use external perl to run the script from within Eclipse IDE
see the post
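The idea behind the Perl script, filtering on Log.i first and only then substituting, can be sketched in Python as follows. The names ARG and rewrite_log_lines are made up, and the pattern is the simpler one from the question rather than the Perl version with the optional final.

```python
import re

# "<type> <name>" followed by ',' or ')', as in the question's find pattern.
ARG = re.compile(r"\w+\s+(\w+)([,)])")


def rewrite_log_lines(source: str) -> str:
    """Apply the argument rewrite only on lines that contain a Log.i call."""
    out = []
    for line in source.splitlines(keepends=True):
        if "Log.i(" in line:
            # Turn 'int i,' into 'i:" + i + ",' so the value is logged.
            line = ARG.sub(r'\1:" + \1 + "\2', line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    src = (
        "public void MyFunction(int i, String str, boolean doIt) {\n"
        'Log.i(TAG, "Enter MyFunction(int i, String str, boolean doIt)");\n'
    )
    print(rewrite_log_lines(src))
```

Processing line by line sidesteps the repeated-group problem from the EDIT: a global substitution visits each argument in turn, so there is no need to capture all of them in one match.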