Is this not the correct use of wildcards? I'm attempting to match a String that contains a date. I don't want the returned String to include the date or any text that precedes the matched String.
object FindText extends App{
val toFind = "find1"
val line = "this is find1 the line 1 \n 21/03/2015"
val find = (toFind+".*\\d{2}/\\d{2}/\\d{4}").r
println(find.findFirstIn(line))
}
The output should be "find1 the line 1 \n ", but the String is not found.
Dot does not match newline characters by default. You can set the DOTALL flag to make it match them. (I have also added a positive look-ahead, the (?=...) construct, since you did not want the date to be included in the match.)
val find = (toFind+"""(?s).*(?=\d{2}/\d{2}/\d{4})""").r
(Note also that in Scala you do not need to escape special characters in strings enclosed in triple quotes ... pretty neat.)
The problem lies with the newline in the test string. By default, .* does not match newlines. Replacing it with .*\\n?.* should fix it. One could also use the DOTALL flag (?s) in the regex, which lets . match newlines:
val find = ("(?s)"+toFind+".*\\d{2}/\\d{2}/\\d{4}").r
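For comparison, here is a minimal sketch of the same two ideas (DOTALL plus a look-ahead) in Python rather than Scala. The variable names are illustrative only, and re.escape is an extra precaution that the original Scala code does not take.

```python
import re

line = "this is find1 the line 1 \n 21/03/2015"
to_find = "find1"
date = r"\d{2}/\d{2}/\d{4}"

# Default mode: "." stops at the newline, so the date is never reached.
without_dotall = re.search(re.escape(to_find) + ".*" + date, line)

# DOTALL lets "." cross the newline; the look-ahead (?=...) checks for the
# date without consuming it, so the date stays out of the match.
with_dotall = re.search(re.escape(to_find) + ".*(?=" + date + ")", line, re.DOTALL)

print(without_dotall)              # no match object
print(repr(with_dotall.group(0)))  # the text from "find1" up to the date
```

Because .* is greedy, a line containing several dates would be matched up to the last one; use .*? if the first date should end the match.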
Given string:
some_function(inputId = "select_something"),
(...)
some_other_function(inputId = "some_other_label")
I would like to arrive at:
some_function(inputId = ns("select_something")),
(...)
some_other_function(inputId = ns("some_other_label"))
The key change is the ns( ... ) call wrapped around the quoted string that follows inputId.
Regex
So far, I have come up with this regex:
:%substitute/\(inputId\s=\s\)\(\"[a-zA-Z]"\)/\1ns(/2/cgI
However, when deployed, it produces an error:
E488: Trailing characters
A simpler version of that regex works, the syntax:
:%substitute/\(inputId\s=\s\)/\1ns(/cgI
would correctly insert ns( after finding inputId = and create the string
some_other_function(inputId = ns("some_other_label")
Challenge
I'm struggling to match the remaining part of the string, e.g. "select_something"), and return it as:
"select_something")).
You have many problems with your regex.
[a-zA-Z] will only match one letter. Presumably you want to match everything up to the next ", so you'll need a \+, and you'll need to match underscores too. I would recommend \w\+, unless characters other than [a-zA-Z0-9_] might be in the string, in which case I would use .\{-}.
You have a /2 instead of \2. This is why you're getting E488.
I would do this:
:%s/\(inputId = \)\(".\{-}"\)/\1ns(\2)/cgI
Or use the start-of-match atom, \zs:
:%s/inputId = \zs\".\{-}"/ns(&)/cgI
You can use a negated character class "[^"]*" to match a quoted string:
%s/\(inputId\s*=\s*\)\("[^"]*"\)/\1ns(\2)/g
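Outside Vim, the negated-character-class approach carries over to other regex engines. A minimal sketch in Python, where wrap_input_ids is a hypothetical helper name and not part of any tool mentioned above:

```python
import re

# Group 1: "inputId", optional spaces, "=", optional spaces.
# Group 2: a double-quoted string with no embedded double quotes.
INPUT_ID = re.compile(r'(inputId\s*=\s*)("[^"]*")')

def wrap_input_ids(text: str) -> str:
    """Wrap the quoted value after inputId in ns( ... )."""
    return INPUT_ID.sub(r'\1ns(\2)', text)

print(wrap_input_ids('some_function(inputId = "select_something"),'))
```

Keeping the closing quote inside group 2 is what lets the replacement put the closing parenthesis after it.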
I am trying to work on regular expressions. I have a mainframe file which has several fields. I have a flat file parser which distinguishes several types of records based on the first three letters of every line. How do I write a regular expression that matches lines whose first three letters are 'CTR'?
Beginning of line or beginning of string?
Start and end of string
/^CTR.*$/
/ = delimiter
^ = start of string
CTR = literal CTR
$ = end of string
.* = zero or more of any character except newline
Start and end of line
/^CTR.*$/m
/ = delimiter
^ = start of line
CTR = literal CTR
$ = end of line
.* = zero or more of any character except newline
m = enables multi-line mode, this sets regex to treat every line as a string, so ^ and $ will match start and end of line
While in multi-line mode you can still match the start and end of the string with the permanent anchors \A and \Z
/\ACTR.*\Z/m
\A = means start of string
CTR = literal CTR
.* = zero or more of any character except newline
\Z = end of string
m = enables multi-line mode
As such, another way to match the start of the line would be like this:
/(\A|\r|\n|\r\n)CTR.*/
or
/(^|\r|\n|\r\n)CTR.*/
\r = carriage return / old Mac OS newline
\n = line-feed / Unix/Mac OS X newline
\r\n = windows newline
Note: if you are going to use the backslash \ in a program string that supports escaping, such as a PHP double-quoted string "", you need to escape the backslashes first.
So to run \r\nCTR.* you would write it as "\\r\\nCTR.*"
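The difference between string anchors and line anchors is easy to check in code. A minimal sketch in Python (the question does not name a language, so this choice and the sample text are assumptions):

```python
import re

text = "HDR header\nCTR first\nCTR second"

# Without the multi-line flag, ^ only matches at the start of the string,
# and the string starts with "HDR", so nothing is found.
string_anchored = re.findall(r"^CTR.*$", text)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
line_anchored = re.findall(r"^CTR.*$", text, re.MULTILINE)

# \A always means start of string, even in multi-line mode.
start_only = re.findall(r"\ACTR.*", text, re.MULTILINE)

print(string_anchored)
print(line_anchored)
print(start_only)
```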
^CTR
or
^CTR.*
edit:
To be more clear: ^CTR will match the start of a line followed by those chars. If all you want to do is test a line itself (and you already have the line in hand), then that is all you really need. But if this is the case, you may be better off using a prefab substr() type function. I don't know what language you are using. But if you are trying to match and grab the line, you will need something like .* or .*$, depending on what language/regex function you are using.
Regex symbol to match at beginning of a line:
^
Add the string you're searching for (CTR) to the regex like this:
^CTR
Example: regex
That should be enough!
However, if you need to get the text from the whole line in your language of choice, add a "match anything" pattern .*:
^CTR.*
Example: more regex
If you want to get crazy, use the end of line matcher
$
Add that to the growing regex pattern:
^CTR.*$
Example: lets get crazy
Note: Depending on how and where you're using regex, you might have to use a multi-line modifier to get it to match multiple lines. There could be a whole discussion on the best strategy for picking lines out of a file to process them, and some of the strategies would require this:
Multi-line flag m (this is specified in various ways in various languages/contexts)
/^CTR.*/gm
Example: we had to use m on regex101
Try ^CTR.*, which literally means start of line, CTR, anything.
This will be case-sensitive. How to make it case-insensitive depends on your programming language; alternatively, use ^[Cc][Tt][Rr].* if cross-environment case-insensitivity matters.
^CTR.*$
matches a line starting with CTR.
Not sure how to apply that to your file on your server, but typically, the regex to match the beginning of a string would be :
^CTR
The ^ means beginning of string / line
There are ambiguities in the question.
What is your input string? Is it the entire file? Or is it 1 line at a time? Some of the answers are assuming the latter. I want to answer the former.
What would you like to return from your regular expression? Just a true / false on whether a match was made? Or do you want to extract the entire line that begins with CTR? I'll assume you only want a true / false match.
To do this, we just need to determine if the CTR occurs at either the start of a file, or immediately following a new line.
/(?:^|\n)CTR/
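A minimal sketch of that true / false test in Python, assuming the whole file has already been read into one string; has_ctr_record is a hypothetical name used only for illustration:

```python
import re

def has_ctr_record(file_text: str) -> bool:
    """True if CTR appears at the start of the text or right after a newline."""
    return re.search(r"(?:^|\n)CTR", file_text) is not None

print(has_ctr_record("HDR 001\nCTR 002\n"))   # CTR starts the second line
print(has_ctr_record("HDR 001\nXCTR 002\n"))  # CTR is not at a line start
```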
(?i)^[ \r\n]*CTR
(?i) -- case insensitive -- Remove if case sensitive.
[ \r\n] -- ignore space and new lines
* -- 0 or more times the same
CTR -- the string the line starts with.
I am trying to parse a file that contains parameter attributes. The attributes are set up like this:
w=(nf*40e-9)*ng
but also like this:
par_nf=(1) * (ng)
The issue is, all of these parameter definitions are on a single line in the source file, and they are separated by spaces. So you might have a situation like this:
pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) par_nf=(1) * (ng) plorient=0
The current algorithm just splits the line on spaces and then for each token, the name is extracted from the LHS of the = and the value from the RHS. My thought is if I can create a Regex match based on spaces within parameter declarations, I can then remove just those spaces before feeding the line to the splitter/parser. I am having a tough time coming up with the appropriate Regex, however. Is it possible to create a regex that matches only spaces within parameter declarations, but ignores the spaces between parameter declarations?
Try this RegEx:
(?<=^|\s) # Start of each formula (start of line OR [space])
(?:.*?) # Attribute Name
= # =
(?: # Formula
(?!\s\w+=) # DO NOT Match [space] Word Characters = (Attr. Name)
[^=] # Any Character except =
)* # Formula Characters repeated any number of times
When checking formula characters, it uses a negative lookahead to check for a Space, followed by Word Characters (Attribute Name) and an =. If this is found, it will stop the match. The fact that the negative lookahead checks for a space means that it will stop without a trailing space at the end of the formula.
Live Demo on Regex101
Thanks to @Andy for the tip:
In this case I'll probably just match on the parameter name and equals, but replace the preceding whitespace with some other "parse-able" character to split on, like so:
(\s*)\w+[a-zA-Z_]=
Now my first capturing group can be used to insert something like a colon, semicolon, or line-break.
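A variation on the same idea is to split directly on the whitespace that precedes a parameter name, instead of replacing it first. This is a minimal sketch in Python under the assumption that names consist of word characters and that values never contain a space followed by name=; parse_params is a hypothetical helper, not the existing parser.

```python
import re

def parse_params(line: str) -> dict:
    """Split on whitespace that is directly followed by 'name=' and build a dict."""
    # The look-ahead keeps 'name=' in the next token; spaces inside a value
    # such as "(1) * (ng)" are not followed by 'name=' and are left alone.
    tokens = re.split(r"\s+(?=\w+=)", line.strip())
    params = {}
    for token in tokens:
        name, _, value = token.partition("=")
        params[name] = value.replace(" ", "")  # drop spaces inside the value
    return params

line = "pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) par_nf=(1) * (ng) plorient=0"
for name, value in parse_params(line).items():
    print(name, "->", value)
```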
You need to add Perl tag. :-( Maybe this will help:
I ended up using this in C#. The idea was to break the line into name/value pairs, using a negative lookahead on the next key to stop one match and start a new one. I hope this helps.
// Requires: using System.Linq; using System.Text.RegularExpressions;
var data = @"pd=2.0*(84e-9+(1.0*nf)*40e-9) nf=ng m=1 par=(1) par_nf=(1) * (ng) plorient=0";
var pattern = @"
(?<Key>[a-zA-Z_\s\d]+) # Key is any alpha, digit and _
= # = is a hard anchor
(?<Value>[.*+\-\\\/()\w\s]+) # Value is any combinations of text with space(s)
(\s|$) # Soft anchor of either a \s or EOB
((?!\s[a-zA-Z_\d\s]+\=)|$) # Negative lookahead to stop matching if a space then key then equal found or EOB
";
Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture)
.OfType<Match>()
.Select(mt => new
{
LHS = mt.Groups["Key"].Value,
RHS = mt.Groups["Value"].Value
});
I want to write a regex that will match any time the substring "my-app" is encountered inside any given string.
I have the following Groovy code:
String regex = ".*my-app*"
String str = getStringFromUserInput()
if(str.matches(regex)) {
println "Match!"
} else {
println "Doesn't match..."
}
When getStringFromUserInput() returns a string like "blahmy-appfizz", the code above still reports Doesn't match.... So I figured that hyphens must be a special character in regexes and tried changing the regex to:
String regex = ".*my--app*"
But still nothing has changed. Any ideas as to where I'm going wrong?
The hyphen is not a special character here (it only has a special meaning inside a character class).
matches validates the entire input. Try:
String regex = ".*my-app.*"
Note that p* matches zero or more p's and p.* matches a p followed by zero or more chars (other than line breaks).
This assumes getStringFromUserInput() does not leave any line break char in the input. If it does, you'd need to do a trim() to get rid of it, since .* does not match line break chars.
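The whole-input behaviour of matches can be illustrated outside Groovy as well. A minimal sketch in Python, where re.fullmatch plays the role of matches and re.search looks for the substring anywhere (the variable names are illustrative):

```python
import re

s = "blahmy-appfizz"

# ".*my-app*" ends with "p*" (zero or more p's), so the trailing "fizz"
# cannot be consumed and a whole-input match fails.
whole_original = re.fullmatch(r".*my-app*", s)

# Adding the dot, ".*my-app.*", lets the tail be consumed.
whole_fixed = re.fullmatch(r".*my-app.*", s)

# Searching anywhere in the input needs no leading or trailing ".*" at all.
anywhere = re.search(r"my-app", s)

print(whole_original is None, whole_fixed is not None, anywhere is not None)
```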
String.contains seems like a simpler solution than a regex, e.g.
String stringFromUser = 'my-app'
assert 'foomy-appfoo'.contains(stringFromUser)
assert !'foo'.contains(stringFromUser)
I have a text file with the following structure:
KEYWORD0 DataKey01-DataValue01 DataKey02-DataValue02 ... DataKey0N-DataValue0N
KEYWORD1 DataKey11-DataValue11 DataKey12-DataValue12 DataKey13-DataValue13
_________DataKey14-DataValue14 DataKey1N-DataValue1N (1)
// It is significant that the additional datakeys are on a new line
(1) the underline is not part of the data. I used it to align the data.
Question: How do I use a regex to convert my data to this format?
<KEYWORD0>
<DataKey00>DataValue00</DataKey00>
<DataKey01>DataValue01</DataKey01>
<DataKey02>DataValue02</DataKey02>
<DataKey0N>DataValue0N</DataKey0N>
</KEYWORD0>
<KEYWORD1>
<DataKey10>DataValue10</DataKey10>
<DataKey11>DataValue11</DataKey11>
<DataKey12>DataValue12</DataKey12>
<DataKey13>DataValue12</DataKey13>
<DataKey14>DataValue12</DataKey14>
<DataKey1N>DataValue1N</DataKey1N>
</KEYWORD1>
Regex is for masochists. Here is a very simple text parser in VB.NET instead (converted from C#, so check for bugs):
Imports System.IO
Imports System.Xml
Public Class MyFileConverter
Public Sub Parse(inputFilename As String, outputFilename As String)
Using reader As New StreamReader(inputFilename)
Using writer As New StreamWriter(outputFilename)
Parse(reader, writer)
End Using
End Using
End Sub
Public Sub Parse(reader As TextReader, writer As TextWriter)
Dim line As String
Dim state As Integer = 0
Dim xmlWriter As New XmlTextWriter(writer)
xmlWriter.WriteStartDocument()
xmlWriter.WriteStartElement("Keywords")
' Root element required for conformance
While (InlineAssignHelper(line, reader.ReadLine())) IsNot Nothing
If line.Length = 0 Then
If state > 0 Then
xmlWriter.WriteEndElement()
End If
state = 0
Continue While
End If
Dim parts As String() = line.Split(Function(c) [Char].IsWhiteSpace(c), StringSplitOptions.RemoveEmptyEntries)
Dim index As Integer = 0
If state = 0 Then
state = 1
' The first token on the line is the keyword; consume it before the key-value loop
xmlWriter.WriteStartElement(parts(index))
index += 1
End If
While index < parts.Length
Dim keyvalue As String() = parts(index).Split("-"C)
xmlWriter.WriteStartElement(keyvalue(0))
xmlWriter.WriteString(keyvalue(1))
xmlWriter.WriteEndElement()
index += 1
End While
End While
If state > 0 Then
xmlWriter.WriteEndElement()
End If
xmlWriter.WriteEndElement()
xmlWriter.WriteEndDocument()
End Sub
Private Shared Function InlineAssignHelper(Of T)(ByRef target As T, value As T) As T
target = value
Return value
End Function
End Class
Note that I added a root element to the XML because .Net XML objects only like reading and writing conformant XML.
Also note that the code uses an extension I wrote for String.Split.
^(\w)\s*((\w)\s*)(\r\n^\s+(\w)\s*)*
This is starting to get in the neighborhood but I think this is just easier to do in a programming language... just process the file line by line...
You need to use the Groups and Matches feature of Regex in .NET and apply something like:
([A-Z\d]+)(\s([A-Za-z\d]+)\-([A-Za-z\d]+))*
1. Find a Match and select the first Group to find the KEYWORD
2. Loop through the Matches of Group 3 and 4 to catch the DataKey and DataValue for that KEYWORD
3. Go to 1
If the DataValue and DataKey items can't contain < or > or '-' chars or spaces, you can do something like this:
Read your file into a string and do a replaceAll with a regex similar to this: ([^- \t]+)-([^- \t]+) and use this as the replacement: <$1>$2</$1>. This will convert something like DataKey01-DataValue01 into <DataKey01>DataValue01</DataKey01>.
After that you need to run another global replace, this time with the regex ^([^ \t]+)(\s+(?:<[^>]+>[^<]+</[^>]+>[\s\n]*)+), and replace with <$1>$2</$1> again.
This should do the trick.
I don't program in VB.NET, so I have no idea if the actual syntax is correct (you might need to double or quadruple the \ in some cases). You should make sure to enable the Multiline option for the second pass.
To explain:
([^- \t]+)-([^- \t]+)
([^- \t]+) will match any string of chars not containing a space, - or \t. This is marked as $1 (notice the parentheses around it)
- will match the - char
([^- \t]+) will again match any string of chars not containing a space, - or \t. This is marked as $2 (notice the parentheses around it)
The replacement will just convert a matched ab-cd string into <ab>cd</ab>
After this step the file looks like:
KEYWORD0 <DataKey00>DataValue00</DataKey00> <DataKey01>DataValue01</DataKey01>
<DataKey02>DataValue02</DataKey02> <DataKey0N>DataValue0N</DataKey0N>
KEYWORD1 <DataKey10>DataValue10</DataKey10> <DataKey11>DataValue11</DataKey11>
<DataKey12>DataValue12</DataKey12> <DataKey13>DataValue12</DataKey13>
<DataKey14>DataValue12</DataKey14> <DataKey1N>DataValue1N</DataKey1N>
^([^ \t]+)(\s+(?:<[^>]+>[^<]+</[^>]+>[\s\n]*)+)
^([^ \t]+) mark and match any string of chars other than space or \t, beginning at the start of the line (this is $1)
( begin a mark
\s+ white space
(?: non marked group starting here
<[^>]+> match an open xml tag: <ab>
[^<]+ match the inside of a tag: cd
</[^>]+> match a closing tag: </ab>
[\s\n]* some optional white space or newlines
)+ close the non marked group and repeat at least one time
) close the mark (this is $2)
The replacement is straight forward now.
Hope it helps.
But you should probably try to make a simple parser if this is not a one off job :)
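The two passes described above can be tried out quickly. This is a minimal sketch in Python rather than VB.NET; to_xml is a hypothetical name, and the first pattern uses [^-\s] instead of [^- \t] so that a value at the end of a line cannot swallow the newline (a small deviation from the answer, made as an assumption).

```python
import re

def to_xml(text: str) -> str:
    """Two regex passes: key-value pairs first, then the keyword wrapper."""
    # Pass 1: Key-Value -> <Key>Value</Key>
    text = re.sub(r"([^-\s]+)-([^-\s]+)", r"<\1>\2</\1>", text)
    # Pass 2: a keyword at the start of a line, followed by one or more tagged
    # pairs (possibly continued on following lines), is wrapped in <KEYWORD>.
    return re.sub(
        r"^([^ \t]+)(\s+(?:<[^>]+>[^<]+</[^>]+>[\s\n]*)+)",
        r"<\1>\2</\1>",
        text,
        flags=re.MULTILINE,
    )

sample = (
    "KEYWORD0 K01-V01 K02-V02\n"
    "KEYWORD1 K11-V11\n"
    "         K12-V12\n"
)
print(to_xml(sample))
```

The output keeps the original whitespace between elements and has no root element, so it would still need wrapping before an XML parser accepts it.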