I have a text file like the one below, where each line contains an = in the middle.
User name = user1
Date expire = Oct 20, 2019
I want to find the line containing Date expire and replace everything to the right of the =, which is the date, with something else (for example, Oct 25, 2019) via sed.
I know the basic sed 's/foo/bar/g' usage, but that works on fixed strings; here I want to change part of the line by keying off a special character.
How can I do that?
Could you please try the following:
sed '/Date expire/s/\(.*= \).*/\1your_new_text_here/' Input_file
This uses sed's mechanism of storing matched regex values in a temporary buffer: on lines matching Date expire, everything up to and including the "= " is captured into the first group, the rest of the line is matched but not captured, and the whole line is then substituted with the captured group followed by the new value.
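For example, applied to the sample input above (saved as Input_file) with the date from the question as the replacement text:
$ cat Input_file
User name = user1
Date expire = Oct 20, 2019
$ sed '/Date expire/s/\(.*= \).*/\1Oct 25, 2019/' Input_file
User name = user1
Date expire = Oct 25, 2019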
For the life of me, I can't figure out the combination of regular-expression characters to parse out the part of the string I want. The string comes from a for loop that yields one line at a time out of 400 thousand lines (out of order); I find the string by matching the unique number passed in by an array-based for loop.
For each string, I'm trying to extract a date number (such as 20151212 below).
Given the following examples of the strings (pulled from a CSV file with 400k+ lines):
String1:
314513,,Jr.,John,Doe,652622,U51523144,,20151212,A,,,,,,,
String2:
365422,johnd#blankity.com,John,Doe.,Jr,987235,U23481,z725432,20160221,,,,,,,,
String3:
6231,,,,31248,U51523144,,,CB,,,,,,,
There are several complications here...
Some names have a "," in them, so those lines end up with more than 15 commas.
We don't know the value of the date, just that it is in a date format such as (get-date).tostring("yyyyMMdd").
For those who can think of a better way...
We are given two CSV files to match. Algorithmic steps:
Look in CSV file 1 for the ID number (found in the 2nd column)
** No ID numbers will be blank in CSV file 1
Look in CSV file 2 and match the ID number from CSV file 1. On that same line, get the date. Once we have the date, append it to the 5th column of CSV file 1, in the same row as the ID number
** Note: CSV file 2 will have $null for some of the values in the ID number column
I'm open to suggestions (including using the Import-Csv cmdlet, though I am not yet familiar with its flags or with the for-loop syntax for working with its values).
You could try something like this:
,(19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01]),
This will match all dates in the given format from 1900 to 2099. It is also specific enough to rule out most other random numbers, although without a larger sample of data it's impossible to say for sure.
Then in PowerShell:
gc data.csv | where { $_ -match ",((19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])),"} | % { $matches[1] }
In the PowerShell match we added capturing parentheses around the part we want and reference that group by its group number in the $matches collection.
If you are only interested in matching one line based on a preceding id, you could use a lookbehind. For example,
$id=314513; # Or maybe U23481
gc c:\temp\reg.txt | where { $_ -match "(?<=$id.*),((19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])),"} | % { $matches[1] }
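For the two-file matching part the question is open to suggestions on, here is a rough sketch using Get-Content and -split rather than Import-Csv (the sample rows show no header line). The file names, the assumption that the ID is the 1st field of CSV file 2 and the 2nd field of CSV file 1, and the choice of the 5th column for the date are all guesses to be adjusted to the real layouts:

# Build an ID -> date lookup from CSV file 2, using the date regex from above.
$dateRegex = ',((19|20)[0-9]{2}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])),'
$dateById = @{}
foreach ($line in Get-Content 'csv2.csv') {
    $id = ($line -split ',')[0]            # assumed: ID is the 1st field of CSV 2
    if ($id -and $line -match $dateRegex) { # skip blank/$null IDs
        $dateById[$id] = $matches[1]
    }
}

# Walk CSV file 1 and fill in the 5th column where the ID has a date.
Get-Content 'csv1.csv' | ForEach-Object {
    $fields = $_ -split ','
    $id = $fields[1]                        # 2nd column of CSV 1
    if ($dateById.ContainsKey($id)) {
        $fields[4] = $dateById[$id]         # 5th column of CSV 1
    }
    $fields -join ','
} | Set-Content 'csv1-updated.csv'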
I have a file containing a large number of protein sequences. Each sequence is headed by an initial "protein ID number" (a GI number, for those who know them). I am using an awk command that lets me print between two regular expressions. With this, I can enter a list of GI numbers into the first regex field, each separated by a "|". The second regex (ABC123) is a marker I added after every protein so that the awk range match has an end point.
Therefore the code I am using is as follows:
awk '/GI1|GI2|GI3|GI4|GIX.../,/ABC123/' database.txt > output.txt
As you can see from the above code, I am searching within database.txt and writing to a new file. The problem is that when I open output.txt, the list of GIs is in the wrong order. In output.txt I need them to appear in the same order as they occur in the first regex field, i.e.
GI1
GI2
GI3...
Instead, they occur in the order in which they are found in database.txt, so in output.txt they look all jumbled, i.e.
GI3
GI4
GI1
GI2
GI5
Does anyone know how I can get the list of GIs in the output file to match the same order as the list of GIs I input in the 1st regex field?
Try this command:
awk '/GI1|GI2|GI3|GI4|GIX.../,/ABC123/' database.txt | sort -k1.3,1.3 > output.txt
Now your output.txt contains the sorted list.
The specification 1.3,1.3 says that the sort key starts at field 1, position 3, and ends at the same place.
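As a toy illustration of that key specification (the lines below are invented, with the GI as the first whitespace-separated field), sorting on character 3 of field 1 gives:
$ printf 'GI3 seq-three\nGI1 seq-one\nGI2 seq-two\n' | sort -k1.3,1.3
GI1 seq-one
GI2 seq-two
GI3 seq-three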
I am trying to import a TSV file into a MySQL database, but I am having trouble since the file has no unique delimiter to identify where a new row starts. The only unique identifier is a date, followed by a space, followed by a time. Example: 6/19/2010 16:04:43
Could someone please point me in the right direction or help me make a bash script that puts a semicolon ";" in front of that string. So the end result will be ;6/19/2010 16:04:43
The tricky part is that in this file there will be other date fields and other time fields but this is the only string that will have a space in between the two.
sed 's#[0-9]\{1,2\}/[0-9]\{1,2\}/[0-9]\{4\} #;&#g' file > resultfile
Test before using.
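For instance, with some invented text around the timestamp, the & in the replacement keeps the matched date and just prefixes it with the semicolon:
$ echo 'foo 6/19/2010 16:04:43 bar' | sed 's#[0-9]\{1,2\}/[0-9]\{1,2\}/[0-9]\{4\} #;&#g'
foo ;6/19/2010 16:04:43 bar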
I have a requirement.
I have some files in a folder, among which some of the file names look like, say,
EUDataFiles20100503.txt, MigrateFiles20101006.txt.
Basically these are the files that I need to work upon.
Now I have a config file where the file pattern type is given as
EUDataFilesYYYYMMDD, MigrateFilesYYYYMMDD.
Basically the idea is that the user can configure the file pattern, and based on the pattern mentioned I need to search for the matching files in the folder.
i.e. at runtime the YYYYMMDD will be replaced by the year, month, and day values. It does not matter what the dates are (but no time stamps; only dates).
The EUDataFiles or MigrateFiles names are fixed.
i.e. if the folder has a file named EUDataFile20100504.txt (year 2010, month 05, day 04), I should ignore it, since it is not EUDataFiles20100504.txt (kindly note that the expected name is plural, File(s) and not File, which is why the system should ignore the file).
Similarly, if the pattern given is EUDataFilesYYYYMMDD and the file present is of the form EUDataFilesYYYYDDMM, the system should also ignore it.
How can I solve this problem? Is it doable using regular expressions (replacing the pattern at runtime)?
If so, could anyone help me out?
I am using C# 3.0 and the .NET Framework 3.5.
Thanks
You could construct a regex from your basic file name plus (depending on the pattern) sub-regexes.
The sub-regexes could be
yyyy = @"\d{4}"
(unless you want to restrict a certain year range)
mm = @"(1[0-2]|0[1-9])"
dd = @"(3[01]|[12][0-9]|0[1-9])"
Build your regex by adding them in the correct order:
re = @"\AEUDataFiles" + yyyy + mm + dd + @"\.txt\Z"
Then you can check whether the filename(s) you've found match the regex:
foundMatch = Regex.IsMatch(subjectString, re);
Of course, this isn't a validation for correct dates (20100231 would pass), but that's probably not a problem in this case.
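Putting it together, a minimal sketch of how this could be wired up (the folder path, the prefix list, and the variable names here are assumptions, not taken from the question):

using System;
using System.IO;
using System.Text.RegularExpressions;

class FilePatternDemo
{
    static void Main()
    {
        // Assumed: the prefixes come from the config file; the folder path is made up.
        string[] prefixes = { "EUDataFiles", "MigrateFiles" };
        string folder = @"C:\data";

        string yyyy = @"\d{4}";
        string mm = @"(1[0-2]|0[1-9])";
        string dd = @"(3[01]|[12][0-9]|0[1-9])";

        foreach (string prefix in prefixes)
        {
            // \A and \Z force the whole file name to be prefix + yyyy + mm + dd + .txt,
            // nothing more and nothing less.
            Regex re = new Regex(@"\A" + Regex.Escape(prefix) + yyyy + mm + dd + @"\.txt\Z");

            foreach (string path in Directory.GetFiles(folder, prefix + "*.txt"))
            {
                string name = Path.GetFileName(path);
                if (re.IsMatch(name))
                {
                    Console.WriteLine("Matched: " + name);
                }
            }
        }
    }
}

Regex.Escape is only there in case a configured prefix ever contains regex metacharacters; for the fixed names above it changes nothing.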