I now have 250 million lines of text from a database.
I want to highlight certain values, but only those in the third column.
I use this regex: \b1011(3[1-9]\d[1-9]|[4]\d\d\d|5[0-8][0-3][0-6])\b to highlight all values between 10113101 and 10115836.
Can one exclude the numbers from column 4?
Edit: by "column" I mean the text between the spaces:
1 2 3 4 5 ..... columns
307607 1317011864 10113101 -25 13135611 2700 0 0 0 12 0 0 0 walk029h.rwx
2264 910115836 10114632 -15 20111192 900 0 0 0 11 0 0 0 walk029.rwx
326169 1010523891 10115836 -1 20911192 0 0 0 0 11 0 0 0 walk12h.rwx
38718 826265392 10113628 0 10114603 2700 0 0 0 11 0 0 0 street2.rwx
241512 1317011864 636346 0 10113987 900 0 0 0 12 0 0 0 walk029h.rwx
38718 826266129 10113448 0 10114310 900 0 0 0 10 0 0 0 tree5m.rwx
38718 826266243 10113898 0 10114810 900 0 0 0 10 0 0 0 tree9m.rwx
This pattern will capture the numbers you want in the third column only. Refer to capture group 1 for their values.
^(?:\S+\s){2}\b(1011(?:3[1-9]\d{2}|4\d{3}|5[0-8][0-3][0-6]))\b.*
All I did was modify yours to add the prefix and remove some redundancy.
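If a script is an option alongside (or instead of) editor highlighting, here is a minimal Python sketch that checks the third column numerically rather than with a regex. It assumes, as in the edit above, that columns are the text between whitespace; the name third_column_in_range is hypothetical. A numeric comparison also sidesteps a gap in the digit-by-digit character classes: 5[0-8][0-3][0-6] skips some in-range values such as 10115099, because 9 is not in [0-3].

```python
# Range from the question (inclusive on both ends).
LOW, HIGH = 10113101, 10115836


def third_column_in_range(line, low=LOW, high=HIGH):
    """Return True if the third whitespace-separated column is an integer
    within [low, high]; columns are "the text between the spaces"."""
    fields = line.split()
    if len(fields) < 3:
        return False
    try:
        value = int(fields[2])   # third column, 0-based index 2
    except ValueError:
        return False             # not a number, so not a match
    return low <= value <= high
```

For 250 million lines, iterate over the file lazily (for line in f: ...) so memory use stays constant.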
I'm stumped here. I have this measure.
Contains = IF(CONTAINS(Flags, Flags[FlagsConcat], SELECTEDVALUE(SlicerFlags[FlagNames])),1,-1)
The concatenated column looks something like this
Flags
id Bco Nat Gur Ga An Sim Oak Ort FlagsConcat
1826 0 0 0 0 0 0 1 1 Oakpoint,Orthoselect
1784 0 0 0 0 0 0 1 1 Oakpoint,Orthoselect
1503 0 0 0 1 0 0 0 0 Guardian
1502 0 0 0 1 0 0 0 0 Guardian
1500 0 0 0 1 0 0 0 0 Guardian
1499 0 0 0 1 0 0 0 0 Guardian
1326 0 0 0 1 0 0 0 0 Guardian
925 0 0 0 1 0 0 0 0 Guardian
And here are the values I am grabbing with SELECTEDVALUE():
FlagNames
Benco
National
Guardian-Simply Clear
Guardian
Angel Align
Simply Clear
Oakpoint
Orthoselect
None
If I select Guardian in the SlicerFlags table, I get a return value of 1, but if I select either Oakpoint or Orthoselect, I get -1, even though there are 2 rows in the table that have either word in the FlagsConcat column. I tried putting spaces before and after the comma, but that made no difference. Does anyone know why CONTAINS() isn't returning true when looking for Oakpoint or Orthoselect? Thanks in advance.
Jeroen's code is correct. CONTAINS() returns TRUE only when some row's FlagsConcat value is exactly equal to the selected name, so "Guardian" matches, but "Oakpoint" never equals "Oakpoint,Orthoselect". CONTAINSSTRING() checks for a substring instead:
Contains = COUNTROWS(FILTER(Flags, CONTAINSSTRING(Flags[FlagsConcat], SELECTEDVALUE(SlicerFlags[FlagNames]))))+0
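To see why the two functions behave differently, here is a small Python sketch (not DAX) that mimics them on a hypothetical stand-in for the FlagsConcat column. contains_exact mirrors CONTAINS(), which needs a row value equal to the search value; contains_substring mirrors CONTAINSSTRING(), which is case-insensitive and only needs the text to appear somewhere. A substring match would also treat "Guardian" as present in a value like "Guardian-Simply Clear", one of the slicer names in the question; the contains_token variant splits on commas to avoid that. All names here are illustrative.

```python
# Hypothetical stand-in for the FlagsConcat column from the question.
flags_concat = ["Oakpoint,Orthoselect", "Oakpoint,Orthoselect", "Guardian"]


def contains_exact(values, name):
    # Like CONTAINS(): some row value must equal `name` exactly.
    return any(v == name for v in values)


def contains_substring(values, name):
    # Like CONTAINSSTRING(): `name` only has to appear inside a row value,
    # compared case-insensitively.
    return any(name.lower() in v.lower() for v in values)


def contains_token(values, name):
    # Stricter variant: split each value on commas and compare whole names.
    return any(name in (part.strip() for part in v.split(",")) for v in values)
```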
I have an array with sections of touching values in it. For example:
0 0 1 0 0 0 0 0 0 0
0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 2 2 2 0 0
0 0 0 0 0 0 0 2 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
0 0 3 0 0 0 0 0 0 0
From this, I created a set of af::arrays: minX, maxX, minY, maxY. These define the box that encloses each group.
So for this example:
minX would be: [1,5,2] // 1 for label(1), 5 for label(2) and 2 for label(3)
maxX would be: [3,7,2] // 3 for label(1), 7 for label(2) and 2 for label(3)
minY would be: [0,3,7] // 0 for label(1), 3 for label(2) and 7 for label(3)
maxY would be: [1,4,9] // 1 for label(1), 4 for label(2) and 9 for label(3)
So if you take the i'th element from each of those arrays, you get the upper-left/lower-right bounds of a box that encloses the corresponding label.
I would like to use these values to pull subarrays out of this larger array. My goal is to put the values enclosed in the boxes into a flat list. I have also calculated, in GPU memory, how many entries each box needs, using the max/min X/Y values. So in this example, the flat list should be:
result=[0 1 0 1 1 1 2 2 2 0 0 2 3 3 3]
where the first 6 entries are from the box
______
|0 1 0 |
|1 1 1 |
------
the second 6 entries are from the box
______
|2 2 2 |
|0 0 2 |
------
and the final three entries are from the box
___
| 3 |
| 3 |
| 3 |
---
I cannot figure out how to index into this af::array using min/max values that reside in GPU memory (I do not want to transfer them to the CPU). I tried to see whether gfor/seq would work, but it appears that af::seq cannot take array data, and nothing I tried with af::index worked either.
I am able to change how I represent min/max (I could store indices for upper left/lower right) but my main goal is to do this efficiently on the GPU without moving data back and forth between the GPU and CPU.
How can this be achieved efficiently with ArrayFire?
Thank you for your help
How did you get this far? Which language are you using?
I guess you could tile the results along the 3rd dimension to handle each region separately, and end up with min/max vectors in GPU memory.
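A minimal sketch of the index arithmetic, in plain Python rather than ArrayFire (nothing here is ArrayFire API). Every output slot k is computed independently from the box sizes, a prefix sum of the per-box counts, and a search for the box that contains k. Because no step loops over boxes, each step corresponds to a data-parallel primitive (scan, search, integer arithmetic, gather), which is what would let the index array be built on the GPU without a round trip to the CPU. Mapping each step to specific ArrayFire functions is left as an assumption; for instance, the prefix sum would presumably be af::accum. Keep in mind that ArrayFire stores arrays column-major, so its linear indices differ from the row-major a[row][col] used here. The function name extract_boxes is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def extract_boxes(a, min_x, max_x, min_y, max_y):
    """Flatten the contents of every bounding box, row by row, into one list.

    a is indexed a[row][col], i.e. a[y][x] (row-major), matching the
    question's expected result. Box i spans rows min_y[i]..max_y[i] and
    columns min_x[i]..max_x[i], inclusive.
    """
    widths = [hi - lo + 1 for lo, hi in zip(min_x, max_x)]
    heights = [hi - lo + 1 for lo, hi in zip(min_y, max_y)]
    counts = [w * h for w, h in zip(widths, heights)]
    ends = list(accumulate(counts))                  # inclusive prefix sum
    starts = [e - c for e, c in zip(ends, counts)]   # exclusive prefix sum

    def gather(k):
        # Each output slot k is computed independently of all others,
        # which is what makes this mappable to one GPU thread per entry.
        b = bisect_right(ends, k)        # first box whose end offset > k
        local = k - starts[b]            # position inside box b
        row = min_y[b] + local // widths[b]
        col = min_x[b] + local % widths[b]
        return a[row][col]

    total = ends[-1] if ends else 0
    return [gather(k) for k in range(total)]
```

With the 10x10 array and the minX/maxX/minY/maxY values from the question, this reproduces the flat list given there.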
I have a text file like this:
    6.2341   -0.4024   -2.0936 Cl 0 0 0 0 0 0 0 0 0 0 0 0
    0.1148   -3.7525    1.0392 S 0 0 0 0 0 0 0 0 0 0 0 0
   -2.5441   -0.8745    1.3714 F 0 0 0 0 0 0 0 0 0 0 0 0
The format is: columns 1-10, 11-20 and 21-30 hold the x, y, z coordinates in (10.4) format, i.e. width 10 with 4 digits after the decimal point; column 31 is always a space; the atom type starts at column 32; the remaining columns are not important.
However, for some unknown reason, the atom type field is right-shifted by two columns, like this:
    6.2341   -0.4024   -2.0936   Cl 0 0 0 0 0 0 0 0 0 0 0 0
    0.1148   -3.7525    1.0392   S 0 0 0 0 0 0 0 0 0 0 0 0
   -2.5441   -0.8745    1.3714   F 0 0 0 0 0 0 0 0 0 0 0 0
How can I use sed and a regular expression to match these lines and delete the two extra spaces?
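sed -r 's/^(.{30})  /\1/' file will do the trick.
Group the first 30 characters, match the two extra spaces that follow, and replace the whole match with the grouped characters. The ^ anchor keeps the match at the start of the line, and correctly formatted lines (where column 32 holds the atom type rather than a space) are left untouched.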
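If you don't mind using neither sed nor regular expressions, you can simply use cut to remove the two offending characters (note that --complement is a GNU cut option, and that cut removes columns 31-32 from every line, shifted or not):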
$ cut --complement -c31,32 file
    6.2341   -0.4024   -2.0936 Cl 0 0 0 0 0 0 0 0 0 0 0 0
    0.1148   -3.7525    1.0392 S 0 0 0 0 0 0 0 0 0 0 0 0
   -2.5441   -0.8745    1.3714 F 0 0 0 0 0 0 0 0 0 0 0 0
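For comparison, a minimal Python sketch of the same fix, assuming the fixed-width layout described in the question (1-based columns 1-30 for the coordinates, a space at column 31, the atom type from column 32). Like the anchored sed command, fix_shifted_atom_type only touches lines whose columns 31-32 are both spaces; parse_atom_line shows how the (10.4) fields can then be read by position. Both function names are hypothetical.

```python
def fix_shifted_atom_type(line):
    """Drop the two extra spaces at (1-based) columns 31-32 of a shifted line.

    Python slices are 0-based, so columns 31-32 are line[30:32]. Correctly
    formatted lines, where column 32 is the atom type, are returned unchanged.
    """
    if line[30:32] == "  ":
        return line[:30] + line[32:]
    return line


def parse_atom_line(line):
    """Read x, y, z (three 10-character fields) and the atom type."""
    x = float(line[0:10])
    y = float(line[10:20])
    z = float(line[20:30])
    atom = line[31:].split()[0]   # atom type starts at column 32
    return x, y, z, atom
```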
I have a text file organized into columns that looks like this:
# v Col 50
EE84 1484.74 1364.99 62.5 2 1 0 1
EE85 505.23 841.63 60. 2 1 0 1
EE86 945.95 913.39 100. 1 0 0 0
P3 972.44 1126.12 100. 1 0 0 0
P28 980.0 1119.0 100. 1 0 0 0
P100 964.03 1125.93 100. 1 0 0 0
P102 963.49 1133.71 100. 1 0 0 0
P106 974.06 1150.73 100. 1 0 0 0
P108 1017.36 1062.47 100. 1 0 0 0
P109 965.31 1151.14 100. 1 0 0 0
The file has several hundred lines.
I need to add a value, say 0, at column 50 of each line in the file, so that it looks like this:
# v Col 50
EE84 1484.74 1364.99 62.5 0 2 1 0 1
EE85 505.23 841.63 60. 0 2 1 0 1
EE86 945.95 913.39 100. 0 1 0 0 0
P3 972.44 1126.12 100. 0 1 0 0 0
P28 980.0 1119.0 100. 0 1 0 0 0
P100 964.03 1125.93 100. 0 1 0 0 0
P102 963.49 1133.71 100. 0 1 0 0 0
P106 974.06 1150.73 100. 0 1 0 0 0
P108 1017.36 1062.47 100. 0 1 0 0 0
P109 965.31 1151.14 100. 0 1 0 0 0
I could paste the file into LibreOffice Calc, add the column and paste it back, but that messes up the column alignment.
I'm using Sublime Text 3 as my text editor, which enables the user to apply regex commands.
Which regex command could I use to do this?
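Press Ctrl+H to open Find and Replace, and enable Regular Expression.
Find What: ^[^#].{48}\K
Replace With: 0
^[^#] skips the # header line, .{48} consumes the next 48 characters (up to column 49), and \K discards the text matched so far, so the 0 is inserted at column 50 instead of replacing anything.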
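A minimal Python sketch of the first answer's approach, assuming "column 50" means character position 50 of a fixed-width line (which is what ^[^#].{48}\K does). Python's re module has no \K, so the prefix is captured and written back. COL50 and insert_at_column_50 are hypothetical names. Passing value="0 " instead of "0" would add a separating space, if the alignment needs one.

```python
import re

# Mirrors ^[^#].{48}\K: capture the first 49 characters and write them back
# in front of the value. [^#\n] skips the "#" header line; excluding \n too
# keeps an empty line from letting the match run on into the next line.
COL50 = re.compile(r"^([^#\n].{48})", re.MULTILINE)


def insert_at_column_50(text, value="0"):
    """Insert value at (1-based) character column 50 of every data line.

    Lines shorter than 49 characters are left unchanged, because . does
    not match a newline.
    """
    return COL50.sub(lambda m: m.group(1) + value, text)
```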
If "column 50" instead means the 50th whitespace-separated field, you can search using this regex:
/^((?:\S+\s+){49})/gm
Replace with this expression:
"${1}0 "
I have been trying for about 2 hours, and I'm not sure whether what I want to do is even possible.
I have a large file with data that looks like this:
43034452 LONGSHIRTPAIETTE 17.30
27.90
0110
COLOR : : : : :
: : :
-11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43034453 LONG SHIRT PAI ETTE 16.40
25.90
0110
COLOR : : : : :
: : :
-3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43034454 BASIC 4.99
8.90
0110
COLOR : : : : :
: : :
-5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(The file has 36k rows.)
What I want is to clean this whole thing up.
In the end, the rows should look like this:
43034452;LONGSHIRTPAIETTE;17.30;27.90;0110
43034453;LONG SHIRT PAI ETTE;16.40;25.90;0110
43034454;BASIC;4.99;8.90;0110
So there is a lot of data that I don't need. I'm using Notepad++ to do my regex.
My regex string looks like ([0-9]*)\s{6,}([A-Z]*)\s*([0-9\.]*)\s*([0-9\.]*)\s*([0-9]*) at the moment.
This captures the first number followed by at least 6 spaces. (It has to be like this because some rows start with FF, which is not a letter. It's some kind of character I can't identify, but if I let Notepad++ show all characters, I see FF; most likely it is the form-feed control character.)
So as a result I get
\1: 43034452
\2: LONGSHIRTPAIETTE
\3: 17.30
\4: 27.90
\5: 0110
as expected, but on the next row the description stops at the first space. If I add \s to the pattern, it also selects all the spaces after the description. And I obviously can't say "only one space", can I?
So my question is, can I use regex to get a selection like the one I want?
If so, what am I doing wrong?
Try this:
([0-9]+)\s{6,}((?:[A-Z]+\ )+)\s*([0-9\.]+)\s+([0-9\.]+)\s+([0-9]+)
Note a few things:
The *s are tightened to + where appropriate, so each column must contain at least one character (or actual whitespace).
The use of a non-capturing group to repeat one or more instances of a word then a space.
Use the regex below:
([0-9]*)\s{6,}([A-Z]+(?:\s+[A-Z]+)*)\s*([0-9\.]*)\s*([0-9\.]*)\s*([0-9]*).*?(?=\n\S|$)
and then replace the match with \1;\2;\3;\4;\5
Don't forget to enable the DOTALL modifier s (in Notepad++, tick ". matches newline").
Your approach is correct; just replace * with + (one or more) in your regex.
/([0-9]+)\s{6,}([A-Z ]+)\s+([0-9\.]+)\s+([0-9\.]+)\s+([0-9]+)/g
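A minimal Python sketch of the same idea, assuming the layout implied by the question's own pattern: the article number followed by at least six whitespace characters, description words separated by single spaces, then three numeric fields separated by whitespace (which may include newlines). The key piece is ([A-Z]+(?: [A-Z]+)*): it allows exactly one space between words, so it stops at the wider gap before the price instead of swallowing it. Joining the groups with ";" matches the \1;\2;\3;\4;\5 replacement. Python's \s also matches the form-feed character, so a leading FF is simply skipped by the unanchored search. RECORD, to_csv_lines and the sample spacing in the test are hypothetical, since the original spacing was lost.

```python
import re

RECORD = re.compile(
    r"(\d+)\s{6,}"              # article number, then the wide gap
    r"([A-Z]+(?: [A-Z]+)*)"     # description: words joined by exactly one space
    r"\s+([\d.]+)"              # first price
    r"\s+([\d.]+)"              # second price
    r"\s+(\d+)"                 # code such as 0110
)


def to_csv_lines(text):
    """Return one "a;b;c;d;e" line per record found in text."""
    return [";".join(m.groups()) for m in RECORD.finditer(text)]
```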