Regex: Finding all line breaks without an " as previous character

I have a file with a bunch of data looking like this:
"sc14b61ecf5ef162","sc14b61b07ba1690","1264806000","1264806000","780","1080","Navn arrangement:
Dørene åpner:
Arr.start:
Arr:slutt:
Dørene stenger:
HA (navn):
HA (tlf):
Type arrangement: (her: om konsert, gjerne sjanger)
Forvetet antall gjester:"
"sc14b61f9e35f569","sc14b61bf07647db","1265583600","1265583600","1020","1260","Nord/Sør
Foredrag
Ønsker skjenking"
This repeats itself many times (with different data). I would like it to look like this:
"sc14b61ecf5ef162","sc14b61b07ba1690","1264806000","1264806000","780","1080","Navn arrangement:Dørene åpner:Arr.start:Arr:slutt:Dørene stenger:HA (navn):HA (tlf):Type arrangement: (her: om konsert, gjerne sjanger)Forvetet antall gjester:"
"sc14b61f9e35f569","sc14b61bf07647db","1265583600","1265583600","1020","1260","Nord/Sør Foredrag Ønsker skjenking"
I think what I need is some way to remove every line break that does not have a " in front of it, but my regex is weak.
I'm using Textwrangler (the text editor for OS X).

This is called negative look behind. This should do the trick.
(?<!")\n
Per @ax in the comments, on a Mac you may need to change the \n to \r, like so:
(?<!")\r
If that still doesn't work, you may need to combine the two:
(?<!")\r\n
One of these should meet your needs.
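For anyone doing this outside the editor, here is a minimal Python sketch of the same negative lookbehind. The function name join_broken_lines is hypothetical, and normalizing \r\n and \r to \n before matching is an assumption made so that one pattern covers all three line-ending styles.

```python
import re

def join_broken_lines(text):
    """Remove every line break that is not directly preceded by a double quote."""
    # Assumption: normalize Windows (\r\n) and classic Mac (\r) endings to \n first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # (?<!") is a negative lookbehind: \n matches only if the previous character is not ".
    return re.sub(r'(?<!")\n', "", normalized)

sample = '"a","b","Nord\nForedrag\nSkjenking"\n"c","d","x\ny"'
print(join_broken_lines(sample))
```

Pass " " instead of "" as the replacement if the joined fragments should be separated by a space.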

Related

Format a text file by regex match and replace

I have a text file that looks like the following:
Chanelle
Jettie
Winnie
Jen
Shella
Krysta
Tish
Monika
Lynwood
Danae
2649
2466
2890
2224
2829
2427
2816
2648
2833
2453
I need to make it look like this
Chanelle 2649
Jettie 2466
... ...
I tried a lot in the Sublime editor but couldn't figure out the regex to do that. Can somebody demonstrate whether it can be done?

I tested the following in Notepad++ but it should work universally.
Use this as the search string:
(?:(\s+[A-Za-z]+)(\r?\n))((?:\s*[A-Za-z]*\r?\n)+)\s+(\d+)
and this as the replacement:
$1 $4$2$3
Running a replace once handles one line at a time. If you run it repeatedly, it will keep replacing lines until no matching lines are left.
Alternatively, you can use this as the replacement if you want to have the values aligned by tabs, but it's not going to match in all cases:
$1\t\t$4$2$3

While the regex answer by SeinopSys will work, you don't need a regex to do this - instead, you can take advantage of Sublime's multiple cursors.
1. Place your cursor at the beginning of line 1, then hold down Shift+↓ to select all the names.
2. Hit Ctrl+Shift+L (Selection -> Split into Lines) to split the selection into lines.
3. Press Ctrl+C to copy.
4. Place your cursor on line 11 (the first number line) and press Ctrl+Shift+↓ (Windows/OS X) or Alt+Shift+↓ (Linux) to place a cursor at the beginning of each number line.
5. Hit Ctrl+V to paste the names before the numbers.
You can now delete the names at the top and you're all set. Alternatively, you could use Ctrl+X to cut the names in step 3.
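If the file is too large to edit by hand, the same pairing can be scripted without any regex. This is only a sketch: it assumes the layout from the question (all names first, then an equal number of all-digit lines), and the function name pair_names_with_numbers is made up for illustration.

```python
def pair_names_with_numbers(text):
    """Pair the i-th name line with the i-th number line."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: a line made only of digits is a number, anything else is a name.
    names = [line for line in lines if not line.isdigit()]
    numbers = [line for line in lines if line.isdigit()]
    # zip stops at the shorter list, so unmatched leftovers are silently dropped.
    return [f"{name} {number}" for name, number in zip(names, numbers)]

sample = "Chanelle\nJettie\nWinnie\n2649\n2466\n2890\n"
print("\n".join(pair_names_with_numbers(sample)))
```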

RegEx stops working when adding too many characters

So I'm trying to return every event inside an icr file (a calendar file based on vCalendar [.vcs]) with a regex (inside AutoIt). An event inside an icr file starts with the line BEGIN:VEVENT and ends with END:VEVENT. I read the file into a variable x and replace every newline in x with '[n', so the regex looks something like (BEGIN:VEVENT\[n(?:\[n|[^\[]+)+END:VEVENT) (BEGIN, then one or more occurrences of either a newline or a run of characters other than [, then END).
This works fine when I insert something like 'foo[nBEGIN:VEVENT[ndata[nEND:VEVENT[nbar', but here comes the problem: I have two test strings; the upper one returns a result, the lower one doesn't:
1[nBEGIN:VEVENT[ndata1[nEND:VEVENT[nxxxxxxxxxxx[BEGIN:VEVENT[ndata2[nEND:VEVENT
1[nBEGIN:VEVENT[ndata1[nEND:VEVENT[nxxxxxxxxxxxx[BEGIN:VEVENT[ndata2[nEND:VEVENT
You can test it for yourself at regex101.com

Try using this pattern, it will not limit what is inside the VEVENT[n:
(BEGIN:VEVENT\[ndata(?:\[n|[^\[])+END:VEVENT)
Example: http://regex101.com/r/zL2sK1

For everyone too lazy to take a look at the comments: this is the solution l'L'l came up with, with some tweaking by me:
(?:BEGIN:VEVENT)\[n(.+?)\[.(?:END:VEVENT)
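Here is a small Python sketch of the lazy-quantifier approach, run against the longer test string from the question. It keeps the question's '[n' newline placeholder and uses \[n before END:VEVENT rather than the \[. above, which is my own variation. The name extract_events is hypothetical. The nested quantifier in the original pattern, (?:\[n|[^\[]+)+, can backtrack heavily on longer input, which is a plausible (not confirmed) reason it stopped matching; a single lazy .+? avoids that.

```python
import re

# A lazy .+? grows one character at a time until "[nEND:VEVENT" follows.
EVENT_PATTERN = re.compile(r"BEGIN:VEVENT\[n(.+?)\[nEND:VEVENT")

def extract_events(flattened):
    """Return the body of every VEVENT in text whose newlines were replaced by '[n'."""
    return EVENT_PATTERN.findall(flattened)

test_string = (
    "1[nBEGIN:VEVENT[ndata1[nEND:VEVENT[nxxxxxxxxxxxx"
    "[BEGIN:VEVENT[ndata2[nEND:VEVENT"
)
print(extract_events(test_string))
```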

VB.Net Beginner: Replace with Wildcards, Possibly RegEx?

I'm converting a text file to a Tab-Delimited text file, and ran into a bit of a snag. I can get everything I need to work the way I want except for one small part.
One field I'm working with has the home addresses of the subjects as a single entry ("1234 Happy Lane Somewhere, St 12345") and I need each broken down by Street(Tab)City(Tab)State(Tab)Zip. The one part I'm hung up on is the Tab between the State and the Zip.
I've been using input=input.Replace throughout, and it's worked well so far, but I can't think of how to untangle this one. The wildcards I'm used to don't seem to be working, I can't replace ("?? #####") with ("??" + ControlChars.Tab + "#####")...which I honestly didn't expect to work, but it's the only idea on the matter I had.
I've read a bit about using Regex, but have no experience with it, and it seems a bit...overwhelming.
Is Regex my best option for this? If not, are there any other suggestions on solutions I may have missed?
Thanks for your time. :)
EDIT: Here's what I'm using so far. It makes some edits to the line in question, taking care of spaces, commas, and other text I don't need, but I've got nothing for the State/Zip situation; I've a bad habit of wiping something if it doesn't work, but I'll append the last thing I used to the very end, if that'll help.
If input Like "Guar*###/###-####" Then
input = input.Replace("Guar:", "")
input = input.Replace(" ", ControlChars.Tab)
input = input.Replace(",", ControlChars.Tab)
input = "C" + ControlChars.Tab + strAccount + ControlChars.Tab + input
End If
input = System.Text.RegularExpressions.Regex.Replace(" #####", ControlChars.Tab + "#####") <-- Just one example of something that doesn't work.
This is what's written to input in this example
" Guar: LASTNAME,FIRSTNAME 999 E 99TH ST CITY,ST 99999 Tel: 999/999-9999"
And this is what I can get as a result so far
C 99999/9 LASTNAME FIRSTNAME 999 E 99TH ST CITY ST 99999 999/999-9999
With everything being exactly what I need besides the "ST 99999" bit (with actual data obviously omitted for privacy and professional whatnots).
UPDATE: Just when I thought it was all squared away, I've got another snag. The raw data gives me this.
# TERMINOLOGY ######### ##/##/#### # ###.##
And the end result is giving me this, because this is a chunk of data that was just fine as-is...before I removed the Tabs. Now I need a way to replace them after they've been removed, or to omit this small group of code from a document-wide Tab genocide I initiate the code with.
#TERMINOLOGY###########/##/########.##
Would a variant on rgx.Replace work best here? Or can I copy the code to a variable, remove Tabs from the document, then insert the variable without losing the tabs?

I think what you're looking for is
Dim rgx As New System.Text.RegularExpressions.Regex(" (\d{5})(?!\d)")
input = rgx.Replace(input, ControlChars.Tab + "$1")
The first line compiles the regular expression. The \d matches a digit, and the {5}, as you can guess, matches 5 repetitions of the previous atom. The parentheses surrounding the \d{5} are known as a capture group, and they put whatever was captured into a pseudovariable named $1. The (?!\d) is a more advanced concept known as a negative lookahead assertion; it peeks at the next character to check that it's not a digit (because then it could be a 6-or-more digit number, where the first 5 happened to get matched). Another version is
" (\d{5})\b"
where the \b is a word boundary, disallowing alphanumeric characters following the digits.
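To see the pattern's behaviour in isolation, here is the same regex in a short Python sketch (Python only because it is quick to run; the VB.Net call above is the one to use in the actual program). The name tab_before_zip is hypothetical.

```python
import re

# A space, then exactly five digits captured as group 1, not followed by a sixth digit.
ZIP_PATTERN = re.compile(r" (\d{5})(?!\d)")

def tab_before_zip(line):
    """Replace the space in front of a 5-digit group with a tab."""
    # "\t" is a real tab character; r"\1" re-inserts the captured digits.
    return ZIP_PATTERN.sub("\t" + r"\1", line)

print(repr(tab_before_zip("999 E 99TH ST CITY ST 99999 999/999-9999")))
```

Note that any other space-separated 5-digit number on the line (a 5-digit house number, for example) would be treated the same way, so this assumes the ZIP is the only such group.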

Complex regex situation

I have a results list that looks like this:
1lemon_king9mumu (2-1), YearofHell (2-0), kriswithak (2-1)0.44440.75000.4444
2mumu6lemon_king (1-2), MogwaiAC (2-0), Dathanja (2-1)0.66670.62500.5655
3MogwaiAC6Dathanja (2-0), mumu (0-2), Jebnarf (2-1)0.55560.57140.5417
4Jebnarf6YearofHell (2-1), kriswithak (2-0), MogwaiAC (1-2)0.44440.62500.4266
5YearofHell3Jebnarf (1-2), lemon_king (0-2), Mig82 (2-1)0.66670.37500.6012
6Dathanja3MogwaiAC (0-2), Mig82 (2-1), mumu (1-2)0.55560.37500.5417
7Mig823Bye, Dathanja (1-2), YearofHell (1-2)0.33330.42860.3750
8kriswithak0Jebnarf (0-2), lemon_king (1-2)0.83330.20000.6875
I want to be able to pull the username of the person AFTER the rank (first number) but it is mashed together with points gained by the player, as well as their first opponent.
For example, the first person's name is "Lemon_king", and his opponents were "Mumu", "YearofHell" and "Kriswithak". The numbers on the right are irrelevant for me, but the major problem I have is that the number of points won by the player is there. Lemon_King wins 9 points for first place. I would normally just get the name by looking for the string between 1 and 9, but players' usernames can have a 9 in them as well.
Can anyone think of a good solution to this problem to be able to grab the person's username?
Thanks

I think you'd need a list of the usernames to compare against; it doesn't look like the results list is "regular" enough for a regular expression.
For example the line
7Mig823Bye, Dathanja
Could be "Mig82" 3 points vs "Bye, Dathanja", but it could also be "Mig8", 23 points, "Bye, Dathanja" or "Mig8", 2 points, "3Bye, Dathanja".
Is that correct? Because if it is, you aren't going to get away with a simple solution.
Edit: Wilson commented that getting the list of usernames might be an option. In that case, something like the following might work:
/^\d+?(username1|username2|username3)\d+?(username1|username2|username3)/
It will probably take some fiddling to get right.
Here's a plnkr demonstrating it with the data you provided: http://plnkr.co/edit/nJeGfbfHgvh5zJcTWRXS?p=preview
That said, a regex might not be the right tool for this job.
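A rough Python sketch of the known-usernames idea follows. Everything named here is an assumption for illustration: the function parse_result_line, the way the username list is obtained, and treating "Bye" as a possible first opponent (taken from the sample data).

```python
import re

def parse_result_line(line, usernames):
    """Return (rank, username, points) or None, given the list of known usernames."""
    # Longest names first, so a name is never cut short by one that is its prefix.
    ordered = sorted(usernames, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # rank, known username, points, and a lookahead for the first opponent.
    pattern = re.compile(rf"^(\d+)({alternatives})(\d+?)(?={alternatives}|Bye)")
    match = pattern.match(line)
    if match is None:
        return None
    rank, name, points = match.groups()
    return int(rank), name, int(points)

names = ["lemon_king", "mumu", "YearofHell", "kriswithak",
         "MogwaiAC", "Dathanja", "Jebnarf", "Mig82"]
print(parse_result_line("7Mig823Bye, Dathanja (1-2), YearofHell (1-2)0.33330.42860.3750", names))
```

Because the username must be one of the known names, the "Mig8" with 23 points reading from the example above is ruled out.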

As far as I can tell, you want something like
(?x) # allow whitespace and comments just like
# any real programming language
^ # beginning of line
( \d+ ) # starts with one or more digits: CAPTURE 1
(?= \D ) # must have a non-digit following
( \w+ ) # capture one or more "word" characters: CAPTURE 2
( \d ) # next is a single digit: CAPTURE 3
(?= \D ) # must have a non-digit following
( \w+ ) # capture one or more "word" characters: CAPTURE 4
# now add things for the rest of the line if you want
Your username should now be in the second capture. I’ve been a tad more careful than strictly necessary, but if you end up munging this, you may need that. I’ve also put in all the captures in case you want to move stuff around or pull more stuff out.
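For reference, this is how the commented pattern might look in Python, where re.VERBOSE plays the role of (?x). It is a sketch under the same assumption as the pattern itself: the points are a single digit. The name split_result_line is hypothetical.

```python
import re

LINE_PATTERN = re.compile(r"""
    ^          # beginning of line
    (\d+)      # rank: one or more digits            (capture 1)
    (?=\D)     # must be followed by a non-digit
    (\w+)      # username: greedy "word" characters  (capture 2)
    (\d)       # points: a single digit              (capture 3)
    (?=\D)     # must be followed by a non-digit
    (\w+)      # first opponent                      (capture 4)
""", re.VERBOSE)

def split_result_line(line):
    """Return (rank, username, points, first_opponent) as strings, or None."""
    match = LINE_PATTERN.match(line)
    return match.groups() if match else None

print(split_result_line("7Mig823Bye, Dathanja (1-2), YearofHell (1-2)0.33330.42860.3750"))
```

Because the username group is greedy, the regex settles on the longest username that still leaves a digit and an opponent behind it, so "Mig82" is preferred over "Mig8".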

Please provide a bit more information. If you want the text between the first number and the second number:
[0-9]+([^0-9]+)
The first group will contain the first username, as long as the username itself contains no digits.
Please comment on this (so I can check) and edit your question with more detail though.

I wouldn't use regex. It will be a pain to debug, and you'll never be 100% certain you've covered all the edge cases.
Try doing 'manual' parsing using your language of choice's built in string manipulation functions.

Regexp: Keyword followed by value to extract

I had this question a couple of times before, and I still couldn't find a good answer..
In my current problem, I have a console program output (string) that looks like this:
Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3
Now I want to extract those numbers and to check if there were failures. (That's a gacutil.exe output, btw.) In other words, I want to match any number [0-9]+ in the string that is preceded by 'failures = '.
How would I do that? I want to get the number only. Of course I can match the whole thing like /failures = [0-9]+/ .. and then trim the first characters with length("failures = ") or something like that. The point is, I don't want to do that, it's a lame workaround.
Because it's odd; if my pattern-to-match-but-not-into-output ("failures = ") comes after the thing I want to extract ([0-9]+), there is a way to do it:
pattern(?=expression)
To show the absurdity of this, if the whole file was processed backwards, I could use:
[0-9]+(?= = seruliaf)
... so, is there no forward-way? :T

pattern(?=expression) is a positive lookahead. What you are looking for is a positive lookbehind, which is written (?<=expression)pattern. Not all regex flavors support this feature, so it depends on which language you are using.
More information is available at regular-expressions.info; for a comparison of lookaround support, scroll about two thirds of the way down that page.
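As one concrete case, Python's re module supports fixed-width lookbehind, so the "forward" version could be sketched like this. The function name count_failures is hypothetical, and the sample text is the output from the question.

```python
import re

output = """Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3"""

def count_failures(text):
    """Return the number that follows 'failures = ', or None if it is absent."""
    # (?<=failures = ) asserts the prefix without making it part of the match.
    match = re.search(r"(?<=failures = )[0-9]+", text)
    return int(match.group()) if match else None

print(count_failures(output))
```

In flavors without lookbehind, a capture group such as failures = ([0-9]+) gives the same number via group 1.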

If your console output does actually look like that throughout, try splitting the string on "=" when the word "failure" is found, then get the last element (or the 2nd element). You did not say what your language is, but any decent language with string splitting capability would do the job. For example
gacutil.exe.... | ruby -F"=" -ane "print $F[-1] if /failure/"
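The same split-on-"=" idea as a small Python sketch, assuming the output really has the "label = number" shape shown in the question. The function name failures_from_output is made up for illustration.

```python
def failures_from_output(text):
    """Find the first line mentioning 'failure' and return the number after '='."""
    for line in text.splitlines():
        if "failure" in line:
            # Take the last '='-separated field and strip surrounding whitespace.
            return int(line.split("=")[-1].strip())
    return None

sample = (
    "Number of assemblies processed = 1200\n"
    "Number of assemblies uninstalled = 1197\n"
    "Number of failures = 3\n"
)
print(failures_from_output(sample))
```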
