Regular expression for rest of line after first x characters - regex

I have a bunch of lines with IDs as the first six characters, and data I don't need after. Is there a way to identify everything after the ID section so Find and Replace can replace it with whitespace?

/.{6}\K.*//
If you want something more specific, please be more specific in your question.
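For anyone doing this in a script rather than an editor, here is a minimal Python sketch of the same idea. Python's built-in re module does not support \K, so this version captures the first six characters and puts them back instead. The sample lines are made up for illustration.

```python
import re

# Hypothetical sample lines: a six-character ID followed by unwanted data.
lines = "ABC123 some data I do not need\nXYZ789 more unwanted data"

# Capture the first six characters of each line and drop the rest.
# re.MULTILINE makes ^ match at the start of every line; "." does not
# match a newline by default, so ".*" stops at the end of each line.
ids_only = re.sub(r"^(.{6}).*", r"\1", lines, flags=re.MULTILINE)

print(ids_only)
```

Swap the replacement string for r"\1 " (or similar) if you want whitespace left behind instead of nothing.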

Related

Regex: Trying to extract all values (separated by new lines) within an XML tag

I have a project that demands extracting data from XML files (values inside the <Number>... </Number> tag), however, in my regular expression, I haven't been able to extract lines that had multiple data separated by a newline, see the below example:
As you can see above, I couldn't replicate the multiple lines detection by my regular expression.
If you are using a script somewhere, your first plan should be to use an XML parser. Almost every language has one, and it will be far more accurate than regex. However, if you just want to use regex to search for strings inside Notepad++, then you can use \s to match the newlines between the values:
<Number>(\d+\s)+<\/Number>
https://regex101.com/r/MwvBxz/1
I'm not sure I fully understand what you are trying to do so if this doesn't do it then let me know what you are going for.
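To illustrate the XML parser route mentioned above, here is a rough Python sketch using the standard library's xml.etree.ElementTree. The <Record> wrapper element and the sample values are assumptions, since the original XML was not posted.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; <Record> is an assumed root element.
xml_text = """<Record>
<Type>1</Type>
<Number>111
222
333</Number>
</Record>"""

root = ET.fromstring(xml_text)

# Collect every whitespace-separated value found inside any <Number> element.
numbers = [
    value
    for element in root.iter("Number")
    for value in (element.text or "").split()
]

print(numbers)
```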
You can use this find+replace combo to remove everything which is not a digit in between the <Number> tag:
Find:
.*?<Number>(.*?)<\/Number>.*
Replace:
$1
Finally I was able to find the right regular expression. I'll leave it below in case anyone needs it:
<Type>\d</Type>\n<Number>(\d+\n)+(\d+</Number>)
Explanation:
\d: Shortcut for digits, same as [0-9]
\n: Newline.
+: Find the previous element 1 to many times.
Have a good day everybody.
After giving it some more thought I decided to write a second answer.
You can make use of look arounds:
(?<=<Number>)[\d\s]+(?=<\/Number>)
https://regex101.com/r/FiaTKD/1
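The lookaround pattern also works outside an editor. A small Python sketch, assuming the tag content is only digits and whitespace (the sample text is made up):

```python
import re

# Hypothetical input with several values inside one <Number> tag.
xml_text = "<Type>1</Type>\n<Number>111\n222\n333</Number>"

# The lookbehind and lookahead assert the surrounding tags without
# including them in the match, so group(0) is just the tag content.
match = re.search(r"(?<=<Number>)[\d\s]+(?=</Number>)", xml_text)

# Split the matched block on whitespace to get the individual values.
values = match.group(0).split() if match else []

print(values)
```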

KNIME regex expression to return 6th line

I have a column with string values present in several lines. I would like to only have the values in the 6th line, all the lines have varying lengths, but all the cells in the column have the information I need in the 6th line.
I am honestly absolutely new and have no background in Java nor KNIME - I have scoured this forum and other internet sources, and none seem to tackle what I need in KNIME specifically - I found something similar but it doesn't work in KNIME:
Regex for nth line in a text file
Your answer will probably need to be broken into two parts:
How to do a regex search in KNIME
How to do a regex search for the 6th line
I can help with the regex search, but I don't know KNIME.
To start with, you want to know how to search for a single line, which is:
([^\n]*\n)
This looks for
*: 0 or more of
[^\n]: anything that isn't a new line
followed by \n: a new line
and (): groups them together into a single match
We can then expand this into ([^\n]*\n){5}([^\n]*\n){1}, which creates two capture groups: the first matches each of the first 5 lines in turn (and ends up holding the 5th), the second holds the 6th line.
If KNIME supports Non-Capturing groups you can then expand that into the following so that you only have one matching capture group. You can decide for yourself which you like best.
(?:[^\n]*\n){5}([^\n]*\n){1}
I've created an example you can test on RegExr
Regardless of which way you go, make sure to document the regex with comments, or stick it into a variable with a very clear name, since regexes aren't particularly human readable.
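I can't show this in KNIME, but as a sanity check of the pattern itself, here is a small Python sketch. The cell content is invented, and the trailing \n has been dropped from the capture group so the 6th line does not have to end with a newline. A plain line split is shown next to it for comparison.

```python
import re

# Hypothetical cell content with seven lines.
cell = "line 1\nline 2\nline 3\nline 4\nline 5\nline 6\nline 7\n"

# Skip five complete lines (non-capturing), then capture the sixth.
# re.match anchors the pattern at the start of the string.
SIXTH_LINE = re.compile(r"(?:[^\n]*\n){5}([^\n]*)")
match = SIXTH_LINE.match(cell)
sixth = match.group(1) if match else None

# Alternative without regex: split into lines and index (0-based).
sixth_alt = cell.splitlines()[5]

print(sixth, sixth_alt)
```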

Regex for excluding lines starting with // from a word counter

I am building a novel-writing tool that includes in-line annotations designated by "//" a la JavaScript.
I want to be able to count all of the words that don't belong to an annotation (and therefore belong to the 'real' novel) so that a writer can use this to track their word count goals.
For word counts so far, I've been using: /\S+/g
I've successfully found a way to exclude full lines with a // prefix, using /^(?!\/\/).+$/gm
But,
They don't work together, i.e. /\S+^(?!\/\/).+$/gm
How would I exclude words between a // and the end of a line? i.e. These words are included.//but these aren't
Some example text with all cases:
// Scene Name - This is a scene description.
// !Location
// #John #David
Hello, I am very grateful to the Stack Overflow community for teaching me how to fix every problem I've ever had. //wow good content
And here's some more text. This is 30 words.
What am I missing?
[Edit: I am using /\S+/g for the word count regex, not /\w+/g, which counts contractions as two words]
I suggest you divide the operation in two, first you replace using the following (simple) regex:
/\/\/.*/gm
It simply matches any 2 slashes followed by any characters.
Just replace the matches with an empty string. Now you have clean text without annotations, and you can use your word-counting regex to count the words.
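The question is about JavaScript, but the two-step idea is the same in any language. A quick Python sketch using the example text from the question:

```python
import re

text = """// Scene Name - This is a scene description.
// !Location
// #John #David
Hello, I am very grateful to the Stack Overflow community for teaching me how to fix every problem I've ever had. //wow good content
And here's some more text. This is 30 words."""

# Step 1: remove everything from "//" to the end of each line.
# "." does not match a newline by default, so each match stays on one line.
without_annotations = re.sub(r"//.*", "", text)

# Step 2: count runs of non-whitespace characters as words.
word_count = len(re.findall(r"\S+", without_annotations))

print(word_count)  # 30
```

One thing to keep in mind: this treats any "//" as the start of an annotation, so a URL such as http://example.com inside the novel text would be cut off too.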
This pattern should be what you need. ^.+?(?=//)|^(?!//).+
Demo
Let me know if you have any questions.

Regex Match String With Repeating Characters

I am looking to use regex to match a string that has multiple instances of the same text. So for instance in this example:
Some text goes here 357313 More text goes here 654321
Some text goes here 123456 More text goes here 123456
Some text goes here 123456 More text goes here 654321
I would want it to match the second option and not the first and third options. I am fairly new to regex but have spent hours looking online to try and figure out if there is a solution to this problem. The strings are not known in order to use them in the search, I need to use regex to figure out if they match or not.
Any help or assistance would be appreciated!
Thanks!
This matches a line like
[some characters][some digits][some characters][the same digits as before][some characters]
/.+(\d+).+\1.+/
Is that what you are searching for?
Edit:
/[^\d]+(\d+)[^\d]+\1[^\d]+/
to make sure the [some characters] are not digits.
I believe you want something like the following, where it assumes you want one or more matches of some text followed by your unique string.
/^(.+123456){1,}$/
I just realized you may actually be looking to find strings that contain the same sequence of characters more than once. This doesn't really seem like a good fit for regex to me. While it may be possible for more advanced regex users, I would say that it may not be a good idea to write such a complicated regex. I would refer you to http://en.wikipedia.org/wiki/Longest_common_substring_problem, which may have information that applies here.
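If the repeated text is always a whole number, as in the question's examples, a backreference is enough. Here is a rough Python sketch under that assumption. Note that inside the pattern the backreference is written \1, and the \b word boundaries force whole numbers to be compared so that a single shared digit does not count as a repeat.

```python
import re

# Assumption: the repeated text is a complete run of digits.
REPEATED_NUMBER = re.compile(r"\b(\d+)\b.*\b\1\b")

lines = [
    "Some text goes here 357313 More text goes here 654321",
    "Some text goes here 123456 More text goes here 123456",
    "Some text goes here 123456 More text goes here 654321",
]

# Keep only the lines in which some number appears at least twice.
matches = [line for line in lines if REPEATED_NUMBER.search(line)]

print(matches)
```

Only the second line is kept, since it is the only one where the same number occurs twice.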

Regex: How to dynamically get words after the first word and not the last word in a '_' separated string?

Working on a migrations class in php.
If I have a string like this:
create_users_roles_table
and I want to get the words between the first and the last word correctly, plus being able to get the word correctly if there's only one word in between, like:
create_users_table
How do I go about that?
I've done:
(\B)_([a-zA-Z]+)_?([a-zA-Z]+)_table
and that works fine when I do create_users_roles_table
and produces users and roles.
But when only doing create_users_table it produces user and s.
Obviously I need it to produce only users.
Anyone?
I think it should read
(\B)_([a-zA-Z]+)_?([a-zA-Z]+)?_table
But this won't work if there are three words in between. I'd suggest stripping the first and last words and then splitting the rest separately, since I don't think regular expressions can handle a variable number of capture groups.
If you can be sure of how many words there can be, you can always hard-code this. For three or fewer words you can use
(\B)_([a-zA-Z]+)(?:_([a-zA-Z]+))?(?:_([a-zA-Z]+))?_table
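The strip-and-split suggestion above handles any number of words. The question is about PHP, but here is the idea as a short Python sketch; middle_words is a made-up helper name, and it assumes the name always has the shape verb_..._table.

```python
def middle_words(migration_name: str) -> list:
    """Return the words between the first and last '_'-separated parts."""
    parts = migration_name.split("_")
    # Drop the leading verb ("create") and the trailing "table".
    return parts[1:-1]


print(middle_words("create_users_roles_table"))
print(middle_words("create_users_table"))
```

In PHP the same approach would use explode and array_slice.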