How to mix regex and string in value.replace in OpenRefine / GoogleRefine?


I'm trying to add "+33 " and remove the leading "0" from a French phone number such as 04 35 73 84 93, to get +33 4 35 73 84 93. The data is a database of contacts in which one field contains only the phone number.
I tried:
value.replace(/^'0'/,'+33 ')
There is no error, but the result is the same as the original.
I thought it would be very simple (I am a beginner with OpenRefine), but it seems I am missing something bigger here!
Can anyone help? I searched quite a lot, and this seems so simple that no one talks about it!
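
One likely cause is the quotes inside the slashes: /^'0'/ looks for a 0 wrapped in literal quote characters, which the phone number never contains, so nothing is replaced. The sketch below shows the same idea in Python rather than GREL; the function name to_international is hypothetical. In GREL the equivalent would presumably be value.replace(/^0/, '+33 '), though that has not been tested here.

```python
import re


def to_international(phone: str, prefix: str = "+33 ") -> str:
    """Replace a single leading 0 with the country prefix."""
    # ^0 anchors the match at the start of the string; note that there
    # are no quote characters inside the pattern itself.
    return re.sub(r"^0", prefix, phone)


def broken_attempt(phone: str) -> str:
    """Mirror of the original attempt: the quotes are part of the pattern."""
    # This only matches input that literally starts with '0' (with quotes).
    return re.sub(r"^'0'", "+33 ", phone)


if __name__ == "__main__":
    print(to_international("04 35 73 84 93"))
    print(broken_attempt("04 35 73 84 93"))  # unchanged, like the original
```

Numbers that do not start with 0 are left as they are, so running the transform twice does no harm.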

Related

Is there a way to put a section of a line at the start of every subsequent line using regular expressions?

I have a text file in which there is a line with the category and then all items of that category in lines below it. This is followed by 2 empty lines and then the title of the next category and more items in the category. I want to know how I could use regular expressions (specifically with Notepad++) in order to put the category at the start of each of the item's lines so I can save the file as a CSV or TAB file.
I started by isolating one of the categories as such:
Городищенский поссовет 1541
Арабовщина 535
Болтичи 11
Бриксичи 59
Великое Село 160
Гарановичи 34
Грибовщина 3
Душковцы 5
Зеленая 182
Кисели 97
Колдычево 145
Конюшовщина 16
Микуличи 31
Мостытычи 18
Насейки 5
Новоселки 45
Омневичи 53
Поручин 43
Пруды 24
Станкевичи 42
Ясенец 33
I then got as far as searching for
(.+)(поссовет)(\t\d{4}\r\n)(^.*$\r\n)
and replacing with
$1$2\t$4
which makes the first line
Арабовщина 535
turn into
Городищенский поссовет Арабовщина 535
which is what I want to happen to the rest of the lines, but I couldn't get any further.
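
A single search-and-replace struggles here because the number of item lines per category varies. As an alternative to Notepad++, here is a minimal Python sketch. It assumes that blocks are separated by blank lines, that the first line of each block is the category followed by its total, and that every item line ends in a number. The function name prefix_categories is hypothetical.

```python
import re


def prefix_categories(text: str) -> list[str]:
    """Prefix every item line with the category that heads its block."""
    rows = []
    # Assumption: blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # Drop the trailing total from the category line, e.g. " 1541".
        category = re.sub(r"\s+\d+$", "", lines[0])
        for item in lines[1:]:
            # Assumption: each item is "<name> <count>"; the name may
            # itself contain spaces, so split on the last whitespace run.
            name, count = item.rsplit(None, 1)
            rows.append(f"{category}\t{name}\t{count}")
    return rows


if __name__ == "__main__":
    sample = "Городищенский поссовет 1541\nАрабовщина 535\nВеликое Село 160\n"
    for row in prefix_categories(sample):
        print(row)
```

The rows are tab-separated, so the result can be saved directly as a TAB file.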

Input keeps missing record at end of line

I am working through The Little SAS Book. Below is code from the book, along with the raw data. When I run it, the final data set is missing the last value on each line: it drops 75 and 56 and records them as missing ("."). Could anyone point out what the problem might be? When I add spaces after 75 and 56 at the line ends, the problem goes away.
DATA class;
INFILE 'c:\MyRawData\Scores.dat';
INPUT Score @@;
RUN;
PROC UNIVARIATE DATA = class;
VAR Score;
TITLE;
RUN;
Data in that file:
56 78 84 73 90 44 76 87 92 75
85 67 90 84 74 64 73 78 69 56
87 73 100 54 81 78 69 64 73 65
After running, the result looks more like this:
56 78 84 73 90 44 76 87 92 .
85 67 90 84 74 64 73 78 69 .
87 73 100 54 81 78 69 64 73 65

My suspicion is that something is wrong with your line endings: either you have a spurious character, or your end-of-line marker isn't what SAS expects. Most likely you are using a Windows file and running SAS on Unix, so you have
75CRLF85
and since Unix uses only LF for line terminator, it sees "75CR" endofline "85", not "75" endofline "85" like it should.
In that case you can either do what you did and add a space (though that will likely still leave some 'blank' records in there), or use the TERMSTR= option in your INFILE statement (for example TERMSTR=CRLF) to tell SAS how to read the file properly.
Otherwise, you may have some spurious end characters - for example, if you pasted this from the web, it's possible you have a non-breaking space that is not converted to a regular space.
You can find out by doing this:
data _null_;
infile 'c:\rawdata\myfile.dat';
input @;
put _infile_ $HEX60.;
run;
The 60 is twice the length of the line, since each character prints as two hex digits. The output tells you what SAS is seeing. What you should see:
3536203738203834203733203930203434203736203837203932203735
3835203637203930203834203734203634203733203738203639203536
383720373320313030203534203831203738203639203634203733203635
Digits in ASCII are 30+digit, so 35 is a 5, 36 is a 6, etc. Space is 20. The first line:
35|36|20|37|38|20|38|34|20|37|33|20| ...
so 5 6 space 7 8 space 8 4 space 7 3 space. If you see something else after the 37 35, then you know there is a problem. You might see any of the following:
0A = Line feed.
0D = Carriage return.
A0 = Nonbreaking (web) space.
There are lots of other things you could see, but those are the most likely to trip you up. Pasting from the web is often a problem.
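
If SAS is not at hand, the same check can be sketched outside it. The following Python snippet is only an illustration: it reads the file in binary mode, prints each line as hex, and flags the three bytes listed above. The names SUSPECTS, hex_dump_line, find_suspects and report are hypothetical. The A0 check assumes a single-byte encoding such as Latin-1; in UTF-8 a non-breaking space appears as C2 A0.

```python
# Bytes named in the answer above, keyed by their value.
SUSPECTS = {0x0A: "LF", 0x0D: "CR", 0xA0: "NBSP"}


def hex_dump_line(raw: bytes) -> str:
    """Return the bytes of one line as an uppercase hex string."""
    return raw.hex().upper()


def find_suspects(raw: bytes) -> list[str]:
    """List the suspicious bytes found in the line, in order of appearance."""
    return [SUSPECTS[b] for b in raw if b in SUSPECTS]


def report(path: str) -> None:
    """Print the line number, hex dump and suspects for each line of a file."""
    # Binary mode keeps the line terminators, so a Windows line ending
    # shows up as CR followed by LF, and a Unix one as LF alone.
    with open(path, "rb") as fh:
        for number, raw in enumerate(fh, start=1):
            print(number, hex_dump_line(raw), find_suspects(raw))
```

A line that reports CR before LF when SAS is running on Unix would point to the line-ending explanation above.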

Regular Expression to capture a TMatchCollection of Paragraphs (Using Delphi XE 6)

I'm trying to capture a collection of paragraphs that look like those shown below.
I would like to capture each paragraph in a separate collection. I have figured out how to capture each line independently, but not the full paragraph.
I'm using PCRE engine.
Any help would be greatly appreciated. I think there may also be line breaks at the end of each line, if that makes a difference. Some paragraphs may be 5 lines long, others as short as 2 lines.
FORECAST VALID 04/0000Z 33.8N 77.3W
MAX WIND 85 KT...GUSTS 105 KT.
64 KT... 20NE 20SE 0SW 20NW.
50 KT... 40NE 50SE 20SW 40NW.
34 KT...100NE 110SE 70SW 60NW.
FORECAST VALID 04/1200Z 36.3N 74.4W
MAX WIND 90 KT...GUSTS 110 KT.
64 KT... 30NE 30SE 0SW 20NW.
50 KT... 50NE 50SE 30SW 40NW.
34 KT...100NE 110SE 80SW 70NW.
FORECAST VALID 05/0000Z 39.4N 70.2W
MAX WIND 60 KT...GUSTS 75 KT.
50 KT... 60NE 80SE 60SW 60NW.
34 KT...100NE 130SE 110SW 90NW.
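
One possible approach, sketched here in Python rather than Delphi, is to start each match at a line beginning with FORECAST VALID and keep consuming following lines until the next line that starts the same way. This assumes every paragraph opens with FORECAST VALID; the names PARAGRAPH and split_forecasts are hypothetical. The pattern uses only multiline anchors and a negative lookahead, so it should carry over to PCRE with the multiline option, but that has not been tested in Delphi.

```python
import re

# One paragraph: a "FORECAST VALID" line, then any number of following
# lines that do not themselves start a new paragraph.
PARAGRAPH = re.compile(
    r"^FORECAST VALID[^\n]*(?:\n(?!FORECAST VALID)[^\n]*)*",
    re.MULTILINE,
)


def split_forecasts(text: str) -> list[str]:
    """Return each forecast paragraph as one multi-line string."""
    # Normalise Windows line endings so the pattern only deals with \n.
    text = text.replace("\r\n", "\n")
    # strip() drops any trailing newline or blank lines caught by the match.
    return [m.group(0).strip() for m in PARAGRAPH.finditer(text)]
```

Each returned string holds one whole paragraph, whether it is 2 lines or 5 lines long.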

using numeric or alphabetic codes in statements; for use in "if" statements

I am wondering how to do something in COBOL. I am trying to write a program that uses IF statements to output matching records from a data file. I have not done it this way before; what I need to do is define codes for the different data values.
blue = 1
brown = 2.
So I tried it like this but it wouldn't work. This I have declared in the master-record:
01 COLOR-IN PIC (9)
05 BLUE VALUE 1.
05 BROWN VALUE 2.
Then I figured I could just write an if statement like
IF COLOR-IN = BLUE
PERFORM 200-OUTPUT.
So what I am asking is: how do I make the colors equal a numeric or alphabetic code? What kind of statement should I write?
I figured it out. I used the 88 statements. Like this
88 MALE VALUE 'M'.
But I have another problem. The output does list the records that meet the IF criteria; however, I need the program to hold the actual hair and eye colors, so that when it runs it prints the color names instead of 1 or 2. Can anyone give me an example or a hint on how to do that?

+1 for learning about 88s. They are very useful.
A table (array) of labels that correspond to your values is what you're looking for. If you use alphabetic codes, as in your
88 MALE VALUE 'M' case, then your table has an entry for the value and for the label.
01 INPUT-VALUE PIC X(1).
88 MALE VALUE "M".
88 FEMALE VALUE "F".
01 LABELS-AND-VALUES-AREA.
05 LABELS-AND-VALUES.
07 ONE-LABEL-AND-VALUE OCCURS 2.
09 ONE-LABEL PIC X(6).
09 ONE-VALUE PIC X(1).
05 FILLER REDEFINES LABELS-AND-VALUES
VALUE "MALE  MFEMALEF".
01 I PIC S9(4) COMP.
01 DISPLAY-LABEL PIC x(6).
MOVE "?" TO DISPLAY-LABEL
PERFORM VARYING I FROM 1 BY 1 UNTIL I > 2
IF INPUT-VALUE = ONE-VALUE(I)
MOVE ONE-LABEL(I) TO DISPLAY-LABEL
END-IF
END-PERFORM
If you use numerics for your input values, you can skip the lookup and go right to the label you want.
01 INPUT-VALUE PIC 9(1).
88 MALE VALUE "1".
88 FEMALE VALUE "2".
88 VALID-INPUT VALUE "1", "2".
01 LABELS-AND-VALUES-AREA.
05 LABELS-AND-VALUES.
07 ONE-LABEL-AND-VALUE OCCURS 2.
09 ONE-LABEL PIC X(6).
05 FILLER REDEFINES LABELS-AND-VALUES
VALUE "MALE  FEMALE".
01 DISPLAY-LABEL PIC x(6).
IF VALID-INPUT
MOVE ONE-LABEL(INPUT-VALUE) TO DISPLAY-LABEL
ELSE
MOVE "?" TO DISPLAY-LABEL
END-IF
For this case, you might want to add some code for missing/unknown data.
Update
I added some code to handle missing/unknown data.

Separate address-chunk: making 3 columns out of one

I have a spreadsheet in Calc with some records. There is a column that contains the following information:
Ecole Saint-Exupery
Rue Saint-Malo 24
67544 Paris
I need to have those lines divided into at least three columns:
name: Ecole Saint-Exupery
street: Rue Saint-Malo 24
postal code and town 67544 Paris
Or, even better, with the postal code and town divided into two separate columns.
Question: is this possible? Can (or should) I do this in Calc (OpenDocument format)?
Do I need to use a regex and Perl, or can I solve this without a regex?
Note: in the end I need to transfer the data into a MySQL database.
I look forward to a tip.
Greetings
BTW: you can see all of this in a real-world live demo: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750 - see the fields
Schulname
Straße
PLZ Ort
These fields contain three things: the name, the street, and the postal code with the town.
Question: can this be divided into parts? If you copy the information and paste it into Calc, you get all of it in a single cell. How can I divide and separate that information into three cells, or even four?
BTW, I tried translating the information to hex code; see the following:
Staatl. Realschule Grafenau
Rachelweg 20
94481 Grafenau
00000000: 53 74 61 61 74 6C 2E 20 52 65 61 6C 73 63 68 75
00000010: 6C 65 20 47 72 61 66 65 6E 61 75 20 0A 52 61 63
00000020: 68 65 6C 77 65 67 20 32 30 0A 39 34 34 38 31 20
00000030: 20 47 72 61 66 65 6E 61 75 20 20
but I do not know if this helps here.
Can you help me solve the problem? Do I need a regex?
Many thanks in advance for any and all help!

You may not need a regex. You should be able to take the contents of the cell in question and split it up using the newline character that is present. I am not familiar with calc, but if there is a split() or explode() function that returns an array, then splitting on a newline will yield the 3 pieces you are looking for.
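
The hex dump in the question shows 0A (line feed) between the three parts, which supports splitting on the newline. As an illustration outside Calc, here is a minimal Python sketch; it assumes each cell holds exactly three lines and that the last line is the postal code followed by the town. The function name split_address and the dictionary keys are hypothetical.

```python
def split_address(cell: str) -> dict[str, str]:
    """Split a three-line address cell into name, street, postal code, town."""
    # Split on line breaks and drop stray whitespace around each part.
    lines = [ln.strip() for ln in cell.splitlines() if ln.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 lines, got {len(lines)}")
    name, street, last = lines
    # Assumption: the postal code is the first whitespace-separated token
    # on the last line, and everything after it is the town.
    postal_code, town = last.split(None, 1)
    return {
        "name": name,
        "street": street,
        "postal_code": postal_code,
        "town": town,
    }


if __name__ == "__main__":
    cell = "Staatl. Realschule Grafenau \nRachelweg 20\n94481  Grafenau  "
    print(split_address(cell))
```

The four values can then be written out as CSV and loaded into MySQL.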
