Extract data from a table of content with regex [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Consider the following string, which is a table of contents:
Table of Content
Name abc ......... 20
Name fghkjkj kjkj . 31
Name.with.dot ..... 45
I want to extract the section names 'Name abc', 'Name fghkjkj kjkj' and 'Name.with.dot'.
I haven't found the right regex to achieve that yet. Any insights?

I think the following should work:
^.*?(?= \.+ \d+$)
assuming you're working line by line or have MULTILINE mode enabled. The positive lookahead assertion makes sure that we end the match as soon as only dots and a number follow on the line.
Explanation:
^ # Start of line
.*? # Match any number of characters, as few as possible
(?= # Look ahead to assert that the following matches from here:
[ ] # a space
\.+ # one or more dots
[ ] # a space
\d+ # a number
$ # End of line
) # End of lookahead
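
As a minimal sketch of the lookahead approach above, here is the same pattern applied in Python with MULTILINE mode enabled. The variable names (`toc`, `names`) are illustrative only, and the sample text is the one from the question.

```python
import re

# Sample table of contents taken from the question.
toc = """Table of Content
Name abc ......... 20
Name fghkjkj kjkj . 31
Name.with.dot ..... 45"""

# MULTILINE makes ^ and $ match at the start and end of every line.
# The lazy .*? stops as soon as " <dots> <number>" is all that remains on the line.
pattern = re.compile(r"^.*?(?= \.+ \d+$)", re.MULTILINE)

names = pattern.findall(toc)
print(names)  # ['Name abc', 'Name fghkjkj kjkj', 'Name.with.dot']
```

The heading line "Table of Content" is skipped because it has no dot leader and page number for the lookahead to find.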

This positive lookahead based regex should work:
^.+?(?= +\.+ +\d+$)
Live Demo: http://www.rubular.com/r/B5EdXF3SIz

This will do the trick:
^Name[ .]\w+(?:[. ]\w+)?
Explanation:
^ # Start of string
Name # Literal string 'Name'
[ .] # Space or period
\w+ # One or more word characters
(?: # Start non-capturing group
[ .] # Space or period
\w+ # One or more word characters
) # Close non-capturing group
? # Make previous group optional
Live demo here.

Related

removing specific thousand separator in specific column in notepad++ [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have a set of numbers formatted like 1.8789 and would like the output to become 1878.9.
These numbers are in a specific column, and there are millions of lines to update.
I didn't find any similar question that solves this.

For the exact style/precision of number you gave, you may try the following find and replace, in regex mode:
Find: (\d+)\.(\d{3})(\d+)
Replace: $1$2.$3
Demo

Try this code ...
Find: (?<=1)\.(\d{3})
Replace with: $1.

As far as I understand, you want to change only the last column. Here is a way to go:
Ctrl+H
Find what: \.(\d{3})(?=\d*$)
Replace with: $1.
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
\. # a dot
(\d{3}) # group 1, 3 digits
(?= # positive lookahead, make sure we have after:
\d* # 0 or more digits
$ # end of line
) # end lookahead
Replacement:
$1 # content of group 1, 3 digits
. # a dot
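
For readers who want to try the same substitution outside Notepad++, here is a rough Python equivalent. The function name `shift_decimal` and the sample rows are made up for illustration; the question's real data set is not shown.

```python
import re

def shift_decimal(line: str) -> str:
    """Move the dot three digits to the right, but only in the last column.

    The lookahead (?=\\d*$) requires that only digits follow up to the end
    of the line, so dots in earlier columns are left untouched.
    """
    return re.sub(r"\.(\d{3})(?=\d*$)", r"\1.", line)

# Hypothetical rows: the last column holds the value to fix.
rows = ["A 1.8789", "B 12.3456", "1.2345 1.8789", "C 0.5"]
fixed = [shift_decimal(r) for r in rows]
print(fixed)  # ['A 1878.9', 'B 12345.6', '1.2345 1878.9', 'C 0.5']
```

A value with fewer than three digits after the dot (the last row) is left unchanged, because `\d{3}` cannot match.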

Regex for italian tags [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm working on a regex to capture #users.
In Italian, there is sometimes something like l'#Orazio.
I can't find the right way to capture this.
I'm using this pattern: /(?<=^|\s)(#(\S+))/
This is my online tester:
https://regex101.com/r/s5BTm0/12
As you can see, I have an issue with case 4.
Any tips?

You may use
(?<!\S)(?:\w+['’])?#(\S+)
(?<!\w)(?:\w+['’])?#(\S+)
See the regex demo
Details
(?<!\S) - whitespace or start of string must appear immediately to the left of the current location
(?<!\w) - a location that is not immediately preceded by a word char
(?:\w+['’])? - an optional sequence of 1+ word chars and then ' or ’
# - a # char
(\S+) - Capturing group 1: one or more non-whitespace chars.
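
A quick way to check the second pattern is to run it in Python. The sample sentence below is invented for illustration and is not one of the cases from the linked tester.

```python
import re

# (?<!\w)      : not immediately preceded by a word character
# (?:\w+['’])? : optional elided article such as l' or dell’
# #(\S+)       : the tag itself, captured without the leading '#'
TAG = re.compile(r"(?<!\w)(?:\w+['’])?#(\S+)")

# Hypothetical input covering a plain tag, two elided articles, and a non-tag.
text = "ciao #Mario con l'#Orazio e dell’#Anna ma non abc#def"

tags = TAG.findall(text)
print(tags)  # ['Mario', 'Orazio', 'Anna']
```

Because `\S+` takes every non-whitespace character, trailing punctuation such as a comma would end up in the capture; a narrower class like `\w+` may be preferable if that matters for your data.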

Use an optional group:
(?<=^|\s)(?:\w?'?)(#(\S+))
https://regex101.com/r/s5BTm0/5
For more than 1 letter in front, use:
(?<=^|\s)(?:(\w*')?)(#(\S+))
https://regex101.com/r/s5BTm0/8
And to match all your updated cases:
https://regex101.com/r/s5BTm0/10
For different types of apostrophes:
(?<=^|\s)(?:(\w*['´])?)(#(\S+))
https://regex101.com/r/s5BTm0/13

Regex to move return value to function parameter [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have many lines like the following one:
PlayerInfo[playerid][pValue] = cache_get_value_name_int(i, "field");
However, due to library changes, I now need to replace this line with the following:
cache_get_value_name_int(i, "field", PlayerInfo[playerid][pValue]);
The problem is that PlayerInfo[playerid][pValue] is the part that changes: it differs from line to line. The same goes for "field".
I have a lot of lines that need replacing, at least a couple of hundred, so I want a regex to replace them easily.
Any solutions for this?

In Notepad++ you can use this regex:
([^\s]+) = cache_get_value_name_int\(i,\s*("[^"]+")\s*\);
It searches for some number of non-space characters (captured as group 1), followed by = (you might want to use \s*=\s* if spacing can vary), followed by cache_get_value_name_int(i,, a string enclosed in " (captured as group 2) and then a trailing ) and ;.
and replace it with
cache_get_value_name_int\(i, $2, $1\);
Note that you may need to add \s* in places to account for different spacing.
If the value i can also change, you can use this regex which captures that string as well:
([^\s]+) = cache_get_value_name_int\((\w+),\s*("[^"]+")\s*\);
and replace it with:
cache_get_value_name_int\($2, $3, $1\);
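
If the files are easier to process with a script than in the editor, the same rewrite can be sketched in Python. The helper name `migrate` is hypothetical, and the pattern uses \s*=\s* so that spacing around the equals sign may vary. Note that Python replacement strings use \1, \2, \3 and do not need escaped parentheses.

```python
import re

# Group 1: assignment target, group 2: row index, group 3: quoted field name.
PATTERN = re.compile(
    r'(\S+)\s*=\s*cache_get_value_name_int\((\w+),\s*("[^"]+")\s*\);'
)

def migrate(line: str) -> str:
    """Move the assignment target into the call as its third argument."""
    return PATTERN.sub(r'cache_get_value_name_int(\2, \3, \1);', line)

old = 'PlayerInfo[playerid][pValue] = cache_get_value_name_int(i, "field");'
new = migrate(old)
print(new)
# cache_get_value_name_int(i, "field", PlayerInfo[playerid][pValue]);
```

Lines that do not match the pattern are returned unchanged, and leading indentation is preserved because the match starts at the first non-space character.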

Ctrl+H
Find what: ^(\S+) = (cache_get_value_name_int\(\w+, "\w+")
Replace with: $2, $1
check Wrap around
check Regular expression
Replace all
Explanation:
^ # beginning of line
(\S+) # group 1, 1 or more non space characters
= # space, equal sign, space
( # start group 2
cache_get_value_name_int\( # literally
\w+ # 1 or more word characters
, # comma, space
"\w+" # 1 or more word characters surrounded with quotes
) # end group 2
Replacement:
$2 # content of group 2
, # comma & space
$1 # content of group 1
Result for given example:
cache_get_value_name_int(i, "field", PlayerInfo[playerid][pValue]);

Filter lines based on range of value, using regex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
What regex will match only the rows whose value falls within a range (e.g. 20-25 days) in raw text data like the sample below?
[product-1][arbitrary-text][expiry-17days]
[product-2][arbitrary-text][expiry-22days]
[product-3][arbitrary-text][expiry-29days]
[product-4][arbitrary-text][expiry-25days]
[product-5][arbitrary-text][expiry-10days]
[product-6][arbitrary-text][expiry-12days]
[product-7][arbitrary-text][expiry-20days]
[product-8][arbitrary-text][expiry-26days]
'product' and 'expiry' text is static (doesn't change), while their corresponding values change.
'arbitrary-text' is also different for each line/product. So in the sample above, the regex should only match/return lines which have the expiry between 20-25 days.
Expected regex matches:
[product-2][arbitrary-text][expiry-22days]
[product-4][arbitrary-text][expiry-25days]
[product-7][arbitrary-text][expiry-20days]
Thanks.

Please check the following regex:
/(.*-2[0-5]days\]$)/gm
( # start capturing group
.* # matches any character (except newline)
- # matches hyphen character literally
2 # matches digit 2 literally
[0-5] # matches any digit between 0 to 5
days # matches the character days literally
\] # matches the character ] literally
$ # assert position at end of a line
) # end of the capturing group
Do note the use of -2[0-5]days to make sure that it doesn't match:
[product-7][arbitrary-text][expiry-222days] # won't match this
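
As a small check of this pattern against the question's sample, here is a sketch in Python. The outer capturing group is dropped because `findall` already returns the whole match when the pattern has no groups.

```python
import re

# Sample rows copied from the question.
data = """[product-1][arbitrary-text][expiry-17days]
[product-2][arbitrary-text][expiry-22days]
[product-3][arbitrary-text][expiry-29days]
[product-4][arbitrary-text][expiry-25days]
[product-5][arbitrary-text][expiry-10days]
[product-6][arbitrary-text][expiry-12days]
[product-7][arbitrary-text][expiry-20days]
[product-8][arbitrary-text][expiry-26days]"""

# A hyphen, a 2, a digit from 0 to 5, then "days]" at the end of the line.
pattern = re.compile(r"^.*-2[0-5]days\]$", re.MULTILINE)

matches = pattern.findall(data)
for line in matches:
    print(line)
```

This prints the product-2, product-4 and product-7 rows. A value such as expiry-222days is rejected because "days" must follow the second digit directly.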

I tested this one and it works on the sample data:
/[2-2]+[0-5]/g
[2-2] matches only the digit 2 (it is equivalent to a literal 2), which restricts the first digit to the twenties.
[0-5] requires the second digit to be between 0 and 5.
Note that the + allows a run of several 2s (e.g. 222), and the pattern is not anchored, so it can match digits anywhere on the line.
Edit: to match the entire line character by character, this should do it for you.
\[\w*\-\d*\]\s*\[\w*\-[2-2]+[0-5]\w*\]
Edit2: updated to account for Arbitrary text ...
\[(\w*-\d*)\]+\s*\[(\w*\-\w*)\]\s*\[(\w*\-[2-2]+[0-5]\w*)\]
edit3: Updated to match any character for the arbitrary-text.
\[(\w*-\d*)\]\s*\[(.*)\]\s*\[(\w*\-[2-2][0-5]\w*)\]

.*\D2[0-5]d.*
.* matches everything.
\D prevents numbers like 123 and 222 from being valid matches.
2[0-5] covers the range.
d so it doesn't match the product number.
I pasted your sample text into http://regexr.com
It's a useful tool for building regular expressions.

You can try this one :
/(.*-2[0-5]days\]$)/gm
try it HERE

String mask in perl for know format variations [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am trying to pull information from a list of folders that are organised in a logical manner but have optional parts.
Below is my folder structure with optional fields noted inside <> :
artist - album_nr. album_title <(type)> <(issue_info)> (year) [quality]
So some examples of directories would be named like this
Emperor - 03. Reverence (EP) (1997) [flac]
Emperor - 05b. IX Equilibrium (reissue 2007) (1999) [cue-flac]
Exodus - 01a. Bonded By Blood (1985) [cue-flac]
Exodus - 01b. Bonded By Blood (remaster 2008) (1985) [cue-flac]
Exodus - 03.Tempo of the Damned (EP) (remaster 2008) (1985) [cue-flac]
I need a regex that will correctly pull the relevant parts into an array for further processing, but I am struggling, mostly because of the optional fields.
At most, the array will contain 7 pieces of information and 5 pieces of information at the very least.
If anyone can help me I will be extremely grateful and it will save me a lot of manual effort.

Using extended notation for legibility:
my $re = qr/
([^-]+?) # artist
\h* #
- # literal '-'
\h* #
([0-9]+[a-z]?) # album number
\. # literal '.'
\h* #
([^(]+?) # album title
\h* #
(?:\(([^)]+)\))? # type (optional)
\h* #
(?:\(([^)]+)\))? # issue info (optional)
\h* #
\(([^)]+)\) # year
\h* #
\[(.+)\] # quality
/x;
Note that this regex always returns seven values (on match) because there are seven captures.
The "trick" to the optional parts you said you were having trouble with is to navigate among capturing, non-capturing, and literal parentheses. Those portions of the regex break down as follows:
(?: # begin non-capturing grouping (for '?' quantifier at the end)
\( # literal '('
( # begin capture
[^)]+ # any character other than ')', one or more times
) # end capture
\) # literal ')'
) # end non-capturing grouping
? # zero or one quantifier (make everything in group optional)
Edit: In the comments, Jerry correctly points out that there's potential ambiguity about what matched when only one of the optional fields (type or issue info) is present in the data. This can be fixed by making the regex less permissive (at the risk of failing to match some data -- always check whether or not a match was successful). This works for the sample data you provided:
(?:\((\w+\h+[0-9]{4}+)\))? # issue info (optional)
If we do that, it also seems prudent to make the year more restrictive as well.
\(([0-9]{4})\) # year
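
For comparison, here is a rough Python port of the stricter variant, offered as a sketch rather than a drop-in replacement. Three things differ from the Perl above and are assumptions of this sketch: Python's `re` has no `\h`, so `[ \t]` is used instead; the pattern is anchored with `^` and `$`; and the type field is narrowed to a single word (`\w+`), because otherwise a lone "(remaster 2008)" would be captured as the type rather than the issue info.

```python
import re

ALBUM = re.compile(r"""
    ^
    ([^-]+?) [ \t]* - [ \t]*                 # artist, then literal '-'
    ([0-9]+[a-z]?) \. [ \t]*                 # album number, then literal '.'
    ([^(]+?) [ \t]*                          # album title
    (?:\((\w+)\))? [ \t]*                    # type (optional, single word)
    (?:\((\w+[ \t]+[0-9]{4})\))? [ \t]*      # issue info (optional)
    \(([0-9]{4})\) [ \t]*                    # year
    \[(.+)\]                                 # quality
    $
""", re.VERBOSE)

# Two of the directory names from the question.
full = ALBUM.match("Exodus - 03.Tempo of the Damned (EP) (remaster 2008) (1985) [cue-flac]")
partial = ALBUM.match("Exodus - 01b. Bonded By Blood (remaster 2008) (1985) [cue-flac]")

print(full.groups())
print(partial.groups())
```

Optional fields that are absent come back as `None`, so each match still yields seven values. As with the Perl version, always check that the match succeeded before reading the groups.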
