Replace comma between quotes in CSV with Regex

We have for example a string like this:
"COURSE",247,"28/4/2016 12:53 Europe/Brussels",1,"Verschil tussen merk, product en leveranciersverantwoordelijke NL","Active Enro"
The goal is to replace the comma inside a quoted field (as in "merk, product") while keeping the commas that separate fields (as in ",", ", and ,"), so that we can split the file correctly.
Any suggestions?
Kind regards

First of all, you should check the article "Understanding CSV files and their handling in ABAP".
For a one-time job, you can use this regex (but note that it may not work well with longer strings, so use it only as a last resort):
,(?!(?:[^"]*"[^"]*")*[^"]*$)
See the regex demo
Pattern details:
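, - a comma that...
(?! - is not followed by...
(?: - start of a non-capturing group that matches
[^"]* - zero or more chars other than "
" - a double quote
[^"]*" - see above
)* - zero or more sequences of the above grouped patterns
[^"]* - zero or more chars other than "
$ - end of string
) - end of negative lookahead
In other words, the comma matches only when an odd number of double quotes follows it, which means it sits inside a quoted field.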
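To see the pattern in action outside ABAP, here is a minimal sketch in Python applied to the sample line from the question. The helper name replace_quoted_commas and the choice of ";" as the replacement character are assumptions made for illustration only, and the sketch assumes every quoted field is properly closed.

```python
import re

# Pattern from the answer: a comma that is NOT followed by an even number
# of double quotes up to the end of the string, i.e. a comma inside quotes.
COMMA_IN_QUOTES = re.compile(r',(?!(?:[^"]*"[^"]*")*[^"]*$)')


def replace_quoted_commas(line, replacement=";"):
    """Replace commas inside quoted fields; leave separator commas alone."""
    return COMMA_IN_QUOTES.sub(replacement, line)


# Sample line taken from the question.
line = ('"COURSE",247,"28/4/2016 12:53 Europe/Brussels",1,'
        '"Verschil tussen merk, product en leveranciersverantwoordelijke NL",'
        '"Active Enro"')

fixed = replace_quoted_commas(line)

# After the replacement, a plain split on "," yields one entry per field.
fields = fixed.split(",")
```

Because the lookahead rescans the rest of the line for every comma, the cost grows quickly with line length, which is why the answer recommends it only for one-off jobs.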

I've found a better solution than regex: the class CL_RSDA_CSV_CONVERTER. No need to reinvent the wheel.
See code below:
TYPES: BEGIN OF ttab,
rec(1000) TYPE c,
END OF ttab.
TYPES: BEGIN OF tdat,
userid(100) TYPE c,
activeuser(100) TYPE c,
firstname(100) TYPE c,
lastname(100) TYPE c,
middlename(100) TYPE c,
supervisor(100) TYPE c,
supervisor_firstname(100) TYPE c,
supervisor_lastname(100) TYPE c,
supervisor_middle(100) TYPE c,
scheduled_offering_id(100) TYPE c,
description(100) TYPE c,
domain(100) TYPE c,
registration(100) TYPE c,
current_registration(100) TYPE c,
max_registration(100) TYPE c,
item_type(100) TYPE c,
item_id(100) TYPE c,
item_revision_date(100) TYPE c,
revision_number(100) TYPE c,
title(100) TYPE c,
status(100) TYPE c,
start_date(100) TYPE c,
end_date(100) TYPE c,
location(100) TYPE c,
instructor_fistname(100) TYPE c,
instructor_lastname(100) TYPE c,
instructor_middlename(100) TYPE c,
column_number(100) TYPE c,
label(100) TYPE c,
value(100) TYPE c,
description2(100) TYPE c,
start_date_short(100) TYPE c,
begda TYPE begda,
start_time(100) TYPE c,
start_time_24_hour(100) TYPE c,
start_12_hour_type(100) TYPE c,
start_timezone(100) TYPE c,
end_date_short(100) TYPE c,
endda TYPE endda,
end_time(100) TYPE c,
end_time_24_hour(100) TYPE c,
end_12_hour_type(100) TYPE c,
end_timezone(100) TYPE c,
pernr TYPE pernr_d,
END OF tdat.
CONSTANTS: co_delete TYPE pspar-actio VALUE 'DEL',
co_attendance TYPE string VALUE '2002',
co_att_prelp TYPE prelp-infty VALUE '2002',
co_att_subty TYPE string VALUE '3000'.
DATA:
itab TYPE TABLE OF ttab WITH HEADER LINE,
idat TYPE TABLE OF tdat WITH HEADER LINE,
lw_idat LIKE LINE OF idat,
lw_found_training LIKE LINE OF idat,
file_str TYPE string,
lv_uname TYPE syuname,
lo_person TYPE REF TO zhr_cl_pa_person,
lv_input_time TYPE tims,
lv_output_time TYPE tims,
lv_day(2) TYPE c,
lv_month(2) TYPE c,
lv_year(4) TYPE c,
lv_time(6) TYPE c,
lv_abap_date TYPE string,
lv_lock_return LIKE bapireturn1,
ls_attendance LIKE bapihrabsatt_in,
lt_attendance_output TYPE TABLE OF bapiret2,
ls_return LIKE bapireturn,
ls_return1 LIKE bapireturn1,
lt_absatt_data TYPE TABLE OF pprop,
lw_absatt_data LIKE LINE OF lt_absatt_data,
lt_pa2002 TYPE TABLE OF pa2002,
lw_pa2002 LIKE LINE OF lt_pa2002,
lw_msg TYPE bapireturn1,
lt_p2002 TYPE TABLE OF p2002,
lw_p2002 LIKE LINE OF lt_p2002,
lc_pgmid TYPE old_prog VALUE 'ZKA_TEXT_UPDATE',
lr_upd_cluster TYPE REF TO cl_hrpa_text_cluster,
ls_text TYPE hrpad_text,
ls_pskey TYPE pskey,
lt_text_194 TYPE hrpad_text_tab,
lv_text TYPE string,
lo_ref TYPE REF TO cx_hrpa_invalid_parameter,
lw_struct TYPE tdat,
lo_csv TYPE REF TO cl_rsda_csv_converter.
CALL METHOD cl_rsda_csv_converter=>create
RECEIVING
r_r_conv = lo_csv.
CREATE OBJECT lr_upd_cluster.
*--------------------------------------------------*
* selection screen design
*-------------------------------------------------*
SELECTION-SCREEN BEGIN OF BLOCK selection1 WITH FRAME.
PARAMETERS: p_file TYPE localfile.
SELECTION-SCREEN SKIP.
SELECTION-SCREEN BEGIN OF LINE.
SELECTION-SCREEN COMMENT 4(51) text-002.
PARAMETERS p_futatt AS CHECKBOX DEFAULT 'X'.
SELECTION-SCREEN END OF LINE.
SELECTION-SCREEN BEGIN OF LINE.
SELECTION-SCREEN COMMENT 4(51) text-001.
PARAMETERS p_active AS CHECKBOX DEFAULT 'X'.
SELECTION-SCREEN END OF LINE.
SELECTION-SCREEN END OF BLOCK selection1.
*--------------------------------------------------*
* at selection screen for field
*-------------------------------------------------*
AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_file.
CALL FUNCTION 'KD_GET_FILENAME_ON_F4'
EXPORTING
static = 'X'
CHANGING
file_name = p_file.
*--------------------------------------------------*
* start of selection
*-------------------------------------------------*
START-OF-SELECTION.
file_str = p_file.
CALL FUNCTION 'GUI_UPLOAD'
EXPORTING
filename = file_str
TABLES
data_tab = itab
EXCEPTIONS
file_open_error = 1
file_read_error = 2
no_batch = 3
gui_refuse_filetransfer = 4
invalid_type = 5
no_authority = 6
unknown_error = 7
bad_data_format = 8
header_not_allowed = 9
separator_not_allowed = 10
header_too_long = 11
unknown_dp_error = 12
access_denied = 13
dp_out_of_memory = 14
disk_full = 15
dp_timeout = 16
OTHERS = 17.
*--------------------------------------------------------------------*
* Delete file headers
*--------------------------------------------------------------------*
DELETE itab INDEX 1.
*--------------------------------------------------*
* process and display output
*-------------------------------------------------*
LOOP AT itab .
CLEAR idat.
CALL METHOD lo_csv->csv_to_structure
EXPORTING
i_data = itab-rec
IMPORTING
e_s_data = lw_struct.
MOVE-CORRESPONDING lw_struct TO idat.
APPEND idat.
ENDLOOP.

Read the file using a CSV reader.
Replace the commas in each field.
Write the file using a CSV writer.
You don’t need regular expressions for this task.
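The three steps above can be sketched with Python's standard csv module. This is only an illustration of the approach, not ABAP code; the function name strip_field_commas and the ";" replacement are hypothetical choices, and the sketch works on in-memory text rather than on a real file.

```python
import csv
import io


def strip_field_commas(csv_text, replacement=";"):
    """Parse CSV text, replace commas inside each field, and write it back."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in reader:
        # The reader has already split the row correctly, so any comma
        # still present in a field was inside quotes in the input.
        writer.writerow([field.replace(",", replacement) for field in row])
    return out.getvalue()


# Sample line taken from the question.
sample = ('"COURSE",247,"28/4/2016 12:53 Europe/Brussels",1,'
          '"Verschil tussen merk, product en leveranciersverantwoordelijke NL",'
          '"Active Enro"\n')

result = strip_field_commas(sample)
```

Once no field contains a comma, the writer no longer needs to quote anything, so the output can be split on "," directly.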

Related

Alert deprecated: Stdlib.String.set

The following code returns an error and says that the syntax is deprecated. What is the correct way to change a character in a string?
let hello = "Hello!" ;;
hello.[1] <- 'a' ;;
Alert deprecated: Stdlib.String.set
Use Bytes.set instead.
Error: This expression has type string but an expression was expected of type
bytes
Strings are immutable (or at least soon they will be), so you can't change their contents. You can, of course, create a copy of a string with the one character different, e.g.,
let with_nth_char m c =
String.mapi (fun i b -> if i = m then c else b)
and
# with_nth_char 1 'E' "hello";;
- : string = "hEllo"
But if you need to change characters in place, then you shouldn't use the string data type; instead, rely on bytes, which is the type for mutable strings. You can use Bytes.of_string and Bytes.to_string to translate strings to bytes and vice versa.

Form character*(*) for character variable generation

I have a third party script for a subroutine that I need to work with. This subroutine is as follows
Subroutine COpen(io, Name )
Character*(*) Name
Character*1023 NameIn, NameOut
NameIn = Trim(Name)//' '
Call Get_OrMakeFileName( NameIn, NameOut )
Open(io,file=NameOut,access="APPEND")
End
I don't understand the Character*(*) Name syntax. Isn't the typical way to declare string variables simply character :: name*4, with the *4 part designating the number of characters? Can anyone please explain the purpose of this alternate syntax? What kind of object does it generate?
In short: character*(*) declares a character variable of assumed length.
There are a number of ways of declaring the length of a character variable. One, as seen in the question's code, is
character*1023 ...
where a literal constant follows the *. Equivalent to that is
character([len=]1023) ...
(len= being optional). In this case the length needn't be a literal constant.
These two forms declare a variable of a particular length. There are two other forms of a length for a character variable:
assumed length - character([len=]*) ... ;
deferred length - character([len=]:) ....
Like with character*1023 the assumed and deferred length declarations may be written in this style:
character*(*) ... ! Assumed length
character*(:) ... ! Deferred length
character*(1023) ... ! For completeness
Well, what does "assumed length" mean?
For a dummy argument such as Name, its length is taken from the length of the actual argument of the procedure. With character :: Name*4 the dummy argument is of length 4, regardless of the length of the actual argument passed to the subroutine (as long as that is at least 4). When the dummy is of assumed length, it is of length 12 if the actual argument is of length 12, and so on.
Although not in the question, a character named constant may also assume its length from the defining expression:
character*(*), parameter :: label='This long'
Deferred length is left to other questions.

Differences with CFscript calling value methods

I'm working with ColdFusion and CFScript. At the moment I've no problems, but noticed that I can call values in 3 ways:
Value
'Value'
'#Value#'
What are the differences between them? Thanks in advance!
Value
CF searches for a variable called Value (case insensitive) starting with the VARIABLES scope and then progressing through other scopes (like URL and FORM), stopping at the first variable found.
'Value'
A literal string with the characters V, a, l, u and e.
'#Value#'
A string in which Value will be evaluated (CF evaluates anything between # signs). If the variable Value (case insensitive) is a so-called simple value, the variable will be cast to a string. Otherwise, an exception is thrown, since non-simple (i.e., complex) values are not automatically cast to strings. This is basically equivalent to '' & Value & '' (string concatenation).
Value = 'Hello World !!';
writeOutput(Value);
>> Hello World !!
writeOutput('Value');
>> Value
writeOutput('#Value#');
>> Hello World !!
writeOutput( evaluate('Value') );
>> Hello World !!

Extract a single character from a Fortran string

I need a program to convert from base a to base b, where base a and b could be from 2 to 36.
My idea was to use strings as the numbers, convert to base 10 as an intermediary, and then convert from base 10 to base b. As I'm new to Fortran, I don't quite understand functions and substrings. Right now I'm getting the error:
intToChar = cadena(int,int)
1
Error: Unclassifiable statement at (1)
in the following code:
CHARACTER FUNCTION intToChar(int)
IMPLICIT NONE
INTEGER, INTENT(IN) :: int
CHARACTER(LEN = 36) :: cadena
cadena = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
intToChar = cadena(int,int)
END FUNCTION intToChar
I'm following this tutorial
The syntax to select a substring from a character variable uses a colon :, not a comma ,. The line the compiler is complaining about should be:
intToChar = cadena(int:int)
This will select the single character at position int from cadena, which appears to be your goal with that function.
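For comparison, the overall approach from the question (a digit table plus an integer intermediary) can be sketched in Python. The names DIGITS, int_to_char, to_base and convert are illustrative assumptions, not part of the original program. One difference is worth keeping in mind when translating back: Fortran substrings are 1-based, so for a digit value in 0..35 the matching Fortran expression would be cadena(int+1:int+1).

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def int_to_char(value):
    """Return the digit character for a value in 0..35."""
    # Python slices are 0-based and end-exclusive, so this picks exactly
    # one character, like a single-character Fortran substring.
    return DIGITS[value:value + 1]


def to_base(number, base):
    """Convert a non-negative integer to its digit string in base 2..36."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(int_to_char(remainder))
    # Remainders come out least significant digit first.
    return "".join(reversed(digits))


def convert(text, base_a, base_b):
    """Convert a digit string from base_a to base_b via an integer."""
    return to_base(int(text, base_a), base_b)
```

The intermediary here is an ordinary integer rather than a base 10 string, which avoids a second round of digit parsing.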

Idea working in types

Please see the details in my previous question.
1) cpf0.ml:
type string = char list
type name = string
type symbol =
| Symbol_name of name
2) problem.ml:
type symbol =
| Ident of Cpf0.string
In problem.ml there are now two definitions of the type string, and of course this gives me an error. Is it possible to make them the same type? I need an idea.
module Str = struct type t = string end;;
module StrOrd = Ord.Make (Str);;
module StrSet = Set.Make (StrOrd);;
module StrMap = Map.Make (StrOrd);;
module SymbSet = Set.Make (SymbOrd);;
let rec ident_of_symbol = function
| Ident s -> s
let idents_of_symbols s =
SymbSet.fold (fun f s -> StrSet.add (ident_of_symbol f) s) s StrSet.empty;;
This expression has type Cpf0.string = char list but an expression was expected of type Util.StrSet.elt = string
You can use the name "string" for different types in different modules if you like, though (as Basile Starynkevitch points out) it's confusing. It would be better to pick a different name. If you really need to reuse the name, you can specify the module every time. If you don't specify a module, you'll get the predefined meaning (or the meaning from the innermost opened module).
It seems to me the problem in your quoted code is that this line:
module Str = struct type t = string end;;
doesn't specify a module name for string, so it refers to the predefined string. It seems possible you wanted to say:
module Str = struct type t = Cpf0.string end;;
It's hard to tell, however. There's not enough context for me to really understand what you're trying to do.
string is a predefined type in OCaml (i.e., in the Pervasives module); it is, e.g., the type of string literal constants like "this string". Use some other name (otherwise you, and anyone reading your code, will be very confused).