Using regular expression to findi any character or no character? - regex

I'm trying to create a regular expression to find all lines that contain a specific character for example "a". Using
^(.+)a
will only render the lines that don't start with the character "a", but contain them. Is there a way to express any characters or no characters?

I think the regex a should work in most line-by-line matchers.
See linepogl's answer for character-matching the whole line.

Use
^.*a.*$
The + means at least one, while the * means none or more.

The answer is /a/. Its fairly rediculous to have the matching variable actually reproduce the contents of the entire source string.

Related

Regex: Trying to extract all values (separated by new lines) within an XML tag

I have a project that demands extracting data from XML files (values inside the <Number>... </Number> tag), however, in my regular expression, I haven't been able to extract lines that had multiple data separated by a newline, see the below example:
As you can see above, I couldn't replicate the multiple lines detection by my regular expression.
If you are using a script somewhere, your first plan should be to use a XML parser. Almost every language has one and it should be far more accurate compared to using regex. However, if you just want to use regex to search for strings inside npp, then you can use \s+ to capture multiple new lines:
<Number>(\d+\s)+<\/Number>
https://regex101.com/r/MwvBxz/1
I'm not sure I fully understand what you are trying to do so if this doesn't do it then let me know what you are going for.
You can use this find+replace combo to remove everything which is not a digit in between the <Number> tag:
Find:
.*?<Number>(.*?)<\/Number>.*
Replace:
$1
finally i was able to find the right regular expression, I'll leave it below if anyone needs it:
<Type>\d</Type>\n<Number>(\d+\n)+(\d+</Number>)
Explanation:
\d: Shortcut for digits, same as [1-9]
\n: Newline.
+: Find the previous element 1 to many times.
Have a good day everybody,
After giving it some more thought I decided to write a second answer.
You can make use of look arounds:
(?<=<Number>)[\d\s]+(?=<\/Number>)
https://regex101.com/r/FiaTKD/1

Assistance with a regular expression

I am not good with regular expressions, and I could use some help with a couple of expressions I am working on. I have a line of text, such as Text here then 999-99 and I'd like to isolate that number sequence at the end. It could be either 999-99 or 999-99-9. The following seems to work:
\d{3}-\d{2}(-\d{1})?
But I notice that it really just seems to be searching anywhere within the text, as I can add text after the number sequence and it still matches. This needs to be more strict, so that the line must end with this exact sequence, and nothing after it. I tried ending with $ instead of ?, but that never seems to create a match (it always returns false).
I could also use some help with character replacement. I am working on a program which deals with OCR scanning, and occasionally the string value that comes back contains undisplayable characters, represented by the ܀ symbol. Is there a regular expression which will replace the ܀ characters with a space?
Try this regular expression.
([\d-]+)$
This should work. Just end your regex with $. It represents end of line
\d{3}-\d{2}(-\d{1})?$
Use the word-boundary metacharacter, \b:
\b\d{3}-\d{2}(-\d)?\b
You can also remove the {1} from the last \d since it's redundant.

Single Regex for filtering roman numerals from the text files

I am stuck in between of a problem where only one pass of regular expression is allowed( some old hard code). I need the regex for roman numerals.
I have tried the standard one i.e. ^(?i)M*(D?C{0,3}|C[DM])(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$, but the problem is it allows null('') values also.
Is there any way around to check is problem?
To require that at least one character must be present, you can use a lookahead (?=.) at the start of your regular expression:
^(?=.)(?i)M*(D?C{0,3}|C[DM])(L?X{0,3}|X[LC])(V?I{0,3}|I[VX])$
Another solution is to separately test that your string is not the empty string.
I like this one:
\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b

Regular expression question

I have some text like this:
dagGeneralCodes$_ctl1$_ctl0
Some text
dagGeneralCodes$_ctl2$_ctl0
Some text
dagGeneralCodes$_ctl3$_ctl0
Some text
dagGeneralCodes$_ctl4$_ctl0
Some text
I want to create a regular expression that extracts the last occurrence of dagGeneralCodes$_ctl[number]$_ctl0 from the text above.
the result should be: dagGeneralCodes$_ctl4$_ctl0
Thanks in advance
Wael
This should do it:
.*(dagGeneralCodes\$_ctl\d\$_ctl0)
The .* at the front is greedy so initially it will grab the entire input string. It will then backtrack until it finds the last occurrence of the text you want.
Alternatively you can just find all the matches and keep the last one, which is what I'd suggest.
Also, specific advice will probably need to be given depending on what language you're doing this in. In Java, for example, you will need to use DOTALL mode to . matches newlines because ordinarily it doesn't. Other languages call this multiline mode. Javascript has a slightly different workaround for this and so on.
You can use:
[\d\D]*(dagGeneralCodes\$_ctl\d+\$_ctl0)
I'm using [\d\D] instead of . to make it match new-line as well. The * is used in a greedy way so that it will consume all but the last occurrence of dagGeneralCodes$_ctl[number]$_ctl0.
I really like using this Regular Expression Cheatsheet; it's free, a single page, and printed, fits on my cube wall.

Regex AND'ing

I have to two strings that I want to match everything that doesn't equal them, the first string can be followed by a number of characters. I tried something like this, negating two ors and negating that result.
?!(?!^.*[^Factory]$|?![^AppName])
Any ideas?
Try this regular expression:
(?!.*Factory$|.*AppName)^.*
This matches every string that does not end with Factory and does not contain AppName.
dfa's answer is by far the best option. But if you can't use it for some reason, try:
^(?!.*Factory|AppName)
It's very difficult to determine from your question and your regex what you're trying to do; they seem to imply opposite behaviors. The regex I wrote will not match if Factory appears anywhere in the string, or AppName appears at the beginning of it.
what about
if (!match("(Factory|AppName)")) {
// your code
}
Would it work if you looked for the existence of those two strings and then negated the regex?