Remove everything before first # in mysql field - regex

Excuse my ignorance.
I need to replace all data in a mysql field before and including the first # .
example field = golfers
data at the first hole the golfer missed a 9 inch putt and said "#hit it bad
new data hit it bad

update table set new_column_name = substring(column_name, instr(column_name, '#') + 1);

Related

Regex (re2 googlesheets) multiple values in multiline cell

Getting stuck on how to read and pretty up these values from a multiline cell via arrayformula.
Im using regex as preceding line can vary.
just formulas please, no custom code
The first column looks like a set of these:
```
[config]
name = the_name
texture = blah.dds
cost = 1000
[effect0]
value = 1000
type = ATTR_A
[effect1]
value = 8
type = ATTR_B
[feature0]
name = feature_blah
[components]
0 = comp_one,1
[resources]
res_one = 1
res_five = 1
res_four = 1
<br/>
Where to be useful elsewhere, at minimum it needs each [tag] set ([effect\d], [feature\d], ect) to be in one column each, for example the 'effects' column would look like:
ATTR_A:1000,ATTR_B:8
and so on.
Desired output can also be seen in the included spreadsheet
<br/>
<b>Here is the example spreadsheet:</b>
https://docs.google.com/spreadsheets/d/1arMaaT56S_STTvRr2OxCINTyF-VvZ95Pm3mljju8Cxw/edit?usp=sharing
**Current REGEXREPLACE**
Kinda works, finds each 'type' and 'value' great, just cant figure out how to extract just that from the rest, tried capture (and non-capturing) groups before and after but didnt work
=ARRAYFORMULA(REGEXREPLACE($A3:$A,"[\n.][effect\d][\n.](.)\n(.)","1:$1 2:$2"))
**Current SUBSTITUTE + REGEXEXTRACT + REGEXREPLACE**
A different approach entirely, also kinda works, longer form though and left with having to parse the values out of that string, where got stuck again. Idea was to use this to simplify, then regexreplace like above. Getting stuck removing content around the final matches though, and if can do that then above approach is fine too.
// First ran a substitute
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE($A3:$A,char(10),";"),";;",char(10)))
// Then variation of this (gave up on single line 'effect/d' so broke it up to try and get it working)
=ARRAYFORMULA(IF(A3:A<>"",IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect0]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect1]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect2]);(.)$")&";;"),""))
// Then use regexreplace like above
=ARRAYFORMULA(REGEXREPLACE($B3:$B,"value = (.);type = (.);;","1:$1 2:$2"))
**--EDIT--**
Also, as my updated 'Desired Output' sheet shows (see timestamped comment below), bonus kudos if you can also extract just the values of matching 'type's to those extra columns (see spreadsheet).
All good if you cant though, just realized would need that too for lookups.
**--END OF EDIT--**
<br/>
Ive tried dozens of things, discarding each in turn, had a quick look in version history to grab out two promising attempts and shared them in separate sheets.
One of these also used SUBSTITUTE to simplify input column, im happy for a solution using either RAW or the SUBSTITUTE results.
<br/>
**Potentially Useful links:**
https://github.com/google/re2/wiki/Syntax
<br/>
<b>Just some more words:</b>
I also have looked at dozens of stackoverflow and google support pages, so tried both REGEXEXTRACT and REGEXREPLACE, both promising but missing that final tweak. And i tried dozens of tweaks already on both.
Any help would be great, and hopefully help others in future since examples with spreadsheets are great since every new REGEX seems to be a new adventure ;)
<br/>
P.S. if we can think of better title for OP, please say in comment or your answer :)
paste in B3:
=ARRAYFORMULA(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
IF(C3:E<>"", C2:E2&":"&C3:E, )),,999^99))), " ", ", "))
paste in C3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&C2)))
paste in D3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&D2)))
paste in E3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&E2)))
paste in F3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[feature\d+\]\nname = (.*)")))
paste in G3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[components\]\n\d+ = (.*)")))
paste in H3:
=ARRAYFORMULA(IFNA(REGEXREPLACE(INDEX(SPLIT(REGEXEXTRACT(
REGEXREPLACE(A3:A, "\n", ", "), "\[resources\], (.*)"), "["),,1), ", , $", )))
spreadsheet demo
This was a fun exercise. :-)
Caveat first: I have added some "input data". Examples:
[feature1]
name = feature_active_spoiler2
[components]
0 = spoiler,1
1 = spoilerA, 2
So the output has "extra" output.
See the tab ADW's Solution.

Renaming files with no fixed char length in Python

I am currently learning Python 2.7 and am really impressed by how much it can do.
Right now, I'm working my way through basics such as functions and loops. I'd reckon a more 'real-world' problem would spur me on even further.
I use a satellite recording device to capture TV shows etc to hard drive.
The naming convention is set by the device itself. It makes finding the shows you want to watch after the recording more difficult to find as the show name is preceded with lots of redundant info...
The recordings (in .mts format) are dumped into a folder called "HBPVR" at the root of the drive. I'd be running the script on my Mac when the drive is connected to it.
Example.
"Channel_4_+1-15062015-2100-Exams__Cheating_the_....mts"
or
"BBC_Two_HD-19052015-2320-Newsnight.mts"
I included the double-quotes.
I'd like a Python script that (ideally) would remove the broadcaster name, reformat the date info, strip the time info and then put the show's name to the front of the file name.
E.g "BBC_Two_HD-19052015-2320-Newsnight.mts" ->> "Newsnight 19 May 2015.mts"
What may complicate matters is that the broadcaster names are not all of equal length.
The main pattern is that broadcaster name runs up until the first hyphen.
I'd like to be able to re-run this script at later points for newer recordings and not have already renamed recordings renamed further.
Thanks.
Try this:
import calendar
input = "BBC_Two_HD-19052015-2320-Newsnight.mts"
# Remove broadcaster name
input = '-'.join(input.split("-")[1:])
# Get show name
show = ''.join(' '.join(input.split("-")[2:]).split(".mts")[:-1])
# Get time string
timestr = ''.join(input.split("-")[0])
day = int(''.join(timestr[0:2])) # The day is the first two digits
month = calendar.month_name[int(timestr[2:4])] # The month is the second two digits
year = timestr[4:8] # The year is the third through sixth digits
# And the new string:
new = show + " " + str(day) + " " + month + " " + year + ".mts"
print(new) # "Newsnight 19 May 2015.mts"
I wasn't quite sure what the '2320' was, so I chose to ignore it.
Thanks Coder256.
That has given me a bit more insight into how Python can actually help solve real world (first world!) problems like mine.
It tried it out with some different combos of broadcaster and show names and it worked.
I would like though to use the script to rename a batch of recordings/files inside the folder from time to time.
The script did throw and error when processing an already re-named recording, which is to be expected I guess. Should the renamed file have a special character at the start of its name to help avoid this happening?
e.g "_Newsnight 19 May 2015.mts"
Or is there a more aesthetically pleasing way of doing this, with special chars being added on etc.
Thanks.
One way to approach this, since you have a defined pattern is to use regular expressions:
>>> import datetime
>>> import re
>>> s = "BBC_Two_HD-19052015-2320-Newsnight.mts"
>>> ts, name = re.findall(r'.*?-(\d{8}-\d{4})-(.*?)\.mts', s)[0]
>>> '{} {}.mts'.format(name, datetime.datetime.strptime(ts, '%d%m%Y-%H%M').strftime('%d %b %Y'))
'Newsnight 19 May 2015.mts'

Get information from a TXT File

I have a txt file that has 7 columns and I am trying to extract data from. Essentially there is a column that has a lot of minimum values, a column solely consists of dashes, column of only maximum values, and a few others that I would like to break into their own lists (I think thats the way to go). Any help would be much appreciated. Thanks!
Edit: Sorry I should have been clearer. I am using Python 3.5, grabbing right from the txt and using split actually. I guess I should ask where to go from there. I currently have it loading a file and using split(). End game I would like to be able to put each column into its own list so I can calculate averages, percentages, etc. Thanks again, sorry about the bad initial post, its my first time posting here.
file = open("year2000.txt")
for line in file:
z = line.strip()
z = line.find(" ")
min_sal1 = line[:z]
min_sal2 = min_sal1.replace(',', '')
min_sal3 = min_sal2.find('.')
min_sal4 = min_sal1[:min_sal3]
min_sal = int(min_sal4)
print(min_sal4)
y = z.find(' ', 2)
x = z.find(' ', 3)
max_sal = line[y:x]
print(max_sal)
After running this, I get a list of all min salarys like it should, however for max values I am getting just a bunch of blank lines. I also plan on putting each type of value into their own lists. Thanks

Parsing Unbalanced Text into Tables in R

I am trying to pull data from some text files on the SEC's EDGAR webpage and I keep running into a similar problem where there are tables that visually look very simple in the text file, but I have trouble parsing them into something useful in R. In particular, I can't seem to figure out how to balance some of the tables when there are either values missing in a column, especially at the end.
The approach I've taken so far is to read in the text files with readLines and split the strings based on the tab delimiters, but this doesn't always work when there are missing values. Is there a better approach or some way to intelligently coerce each row into a data frame? I can't seem to get rbind.fill to work in this case.
Here is my most recent attempt:
raw.data = readLines("http://www.sec.gov/Archives/edgar/data/1349353/0001349353-13-000002.txt")
# parse basic document information
companyName = gsub("\t\tCOMPANY CONFORMED NAME:\t\t\t","",raw.data[grep("\t\tCOMPANY CONFORMED NAME:\t\t\t",raw.data)])
cik = gsub("\t\tCENTRAL INDEX KEY:\t\t\t","",raw.data[grep("\t\tCENTRAL INDEX KEY:\t\t\t",raw.data)])
secfilename = gsub("<FILENAME>","",raw.data[grep("<FILENAME>",raw.data)])
# trim down to table
table13f = raw.data[(grep("<TABLE>",raw.data)+1):(grep("</TABLE>",raw.data)-1)]
table13f = table13f[!grepl("INFORMATION TABLE",table13f, ignore.case=TRUE)]
table13f = table13f[!grepl("VOTING AUTHORITY",table13f, ignore.case=TRUE)]
table13f = table13f[!grepl("NAME OF ISSUER",table13f, ignore.case=TRUE)]
table13f = table13f[nchar(table13f)>0]
# extract data vectors
splittable = strsplit(table13f,"\t")
splittable2 = data.frame(splittable)
Thanks in advance for the help and/or advice!
You should be able to parse the last table13f string using the following line:
data <- read.csv(text=table13f,header = T,quote = "\"", sep = "\t", fill = T)

Access 2010 Query add text to end of existing text if condition is met

I have a column of data, diagnosis codes to be exact. the problem is that when the data is imported it turns 111.0 into 111 (or any whole number). I am wondering if there is an update query I can run that will add the ".0" to the end of any value that is 3 characters long. I had a problem of it stripping a value from 008.45 to 8.45 but I figured that part out using:
UPDATE Master SET DIAGNOSIS01 = LEFT("00", 3-LEN(DIAGNOSIS01)) + DIAGNOSIS01
WHERE LEN(DIAGNOSIS01)<3 AND Len(DIAGNOSIS01)>0;
I got that from here on stackoverflow. Is there a variation of this update query I can use to add to the right if it's only 3 digits?
Additional info... formats of the values in this column include xxx.x or xxx.xx with x being a number
When it comes to sql I am very new so please treat me like I'm 3... ;)
UPDATE Master
SET Master.DIAGNOSIS01 = IIf(Len([Master].[DIAGNOSIS01])=3,[Master].[DIAGNOSIS01] & ".0",[Master].[DIAGNOSIS01]);