Split sample in Stata


I have a variable X containing 3100 values.
I need to split X into two variables, Y and Z: Y should contain the first 1500 values of X and Z the rest.
I'm not sure whether this works with
split X
or some other command.

Did you try it?
split is for splitting strings and for splitting them into parts according to their contents.
You appear to want something like separate X, by(_n <= 1500) followed by renaming if you wish. Two generate statements would also work fine.

Related

Remove Multiple Elements in a Python List Once

(Using Python 3)
Given this list named numList: [1,1,2,2,3,3,3,4].
I want to remove exactly one instance of “1” and “3” from numList.
In other words, I want a function that will turn numList into: [1,2,2,3,3,4].
What function will let me remove an X number of elements from a Python list once per element I want to remove?
(The elements I want to remove are guaranteed to exist in the list)
For the sake of clarity, I will give more examples:
[1,2,3,3,4]
Remove 2 and 3
[1,3,4]
[3,3,3]
Remove 3
[3,3]
[1,1,2,2,3,4,4,4,4]
Remove 2, 3 and 4
[1,1,2,4,4,4]
I’ve tried doing this:
numList = [1,2,2,3,3,4,4,4]
remList = [2,3,4]
for x in remList:
    numList.remove(x)
This turns numList to [1,2,3,4,4] which is what I want. However, this has a complexity of:
O(len(numList) * len(remList))
This is a problem because remList and numList can have a length of 10^5. The program will take a long time to run. Is there a built-in function that does what I want faster?
Also, I would prefer the optimum function which can do this job in terms of space and time because the program needs to run in less than a second and the size of the list is large.

Your approach:
for x in rem_list:
    num_list.remove(x)
is intuitive, and unless the lists are going to be very large I might do that because it is easy to read.
One alternative would be:
result = []
for x in num_list:
    if x in rem_list:
        rem_list.remove(x)
    else:
        result.append(x)
This would be O(len(rem_list) * len(num_list)) in the worst case, and it can be faster than the first solution if len(rem_list) < len(num_list). Note that it empties rem_list as it goes.
If rem_list was guaranteed to not contain any duplicates (as per your examples) you could use a set instead and the complexity would be O(len(num_list)).
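If rem_list can contain duplicates, one way to keep the single pass is to count the pending removals instead of storing them in a set. This is only a minimal sketch, not a built-in: the name remove_once_each is made up for illustration, and it assumes the list elements are hashable. It should run in roughly O(len(num_list) + len(rem_list)) time.

```python
from collections import Counter


def remove_once_each(num_list, rem_list):
    """Return a copy of num_list with one occurrence dropped per entry in rem_list."""
    # pending[x] is how many copies of x still have to be removed.
    pending = Counter(rem_list)
    result = []
    for x in num_list:
        if pending[x] > 0:
            pending[x] -= 1  # skip this occurrence
        else:
            result.append(x)
    return result


print(remove_once_each([1, 2, 2, 3, 3, 4, 4, 4], [2, 3, 4]))
```

Unlike the loop above, this leaves both input lists untouched and builds a new list.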

How to read a file creating a list

I have some code, looks like this:
main :-
    open('input.txt', read, Input),
    repeat,
    read_line_to_codes(Input, Line),
    maplist(my_representation, Line, FinalLine),
    (   Line \= end_of_file -> writeln(FinalLine), fail ; true ),
    close(Input).
FinalLine is a list of integers, including some underscores (based on the input file). Since this loops, I am wondering how to dynamically, each iteration of the loop, add the FinalLine list to another list. Basically this will create a list of lists.
And since I know the specifications of my input file, I know it loops 16 times, therefore I want a list of 16 lists. So although I don't know how to do this, I am pretty sure the best way would be to make a predicate that I call, instead of the output I am doing now (writeln(FinalLine)), to dynamically create this list of lists.
Hope this makes sense. Would appreciate any help, thanks!

While historically I/O in Prolog was often presented in terms of repeat/fail loops, recursion is often (almost always?) the superior way to implement iteration. Especially if you need to remember data from one iteration to the next; failing causes backtracking, which unbinds your variables from the previously computed data. On backtracking you lose any data you had not saved away using yet more impure constructs. Recursion is simpler.
Recursion forces you to decompose the program into more than one predicate, but that is a good idea anyway. For example, separating opening the stream from reading it makes your program more reusable and more testable, because streams may be constructed from things other than files.
% dummy
my_representation(Codes, Result) :-
    atom_codes(Result, Codes).

stream_representations(Input, Lines) :-
    read_line_to_codes(Input, Line),
    (   Line == end_of_file
    ->  Lines = []
    ;   my_representation(Line, FinalLine),
        Lines = [FinalLine | FurtherLines],
        stream_representations(Input, FurtherLines)
    ).

main :-
    open('input.txt', read, Input),
    stream_representations(Input, Lines),
    close(Input),
    writeln(Lines).
Test input file:
hello
world
hello, world!
this file ends here
Test run:
?- main.
[hello,world,hello, world!,this file ends here]
true.

A 2-dimensional list of lists is a matrix, and the easiest way to visualize it in your head is a table consisting of rows and columns. Here's a piece of example code to clarify the concept.
This code generates a matrix from the number of columns you want, MaxX, and the number of rows you want, MaxY. Each position in the matrix holds a cell(point(X,Y)) coordinate so the output is easier to visualize.
% If MaxY has been reached on the Y axis, stop
generate_matrix([],_,_,MaxY,MaxY) :- !.
% Otherwise generate one row, then start the next row at X = 0
generate_matrix([Row|Tp],X,MaxX,Y,MaxY) :-
    generate_row(Row, X, MaxX, Y),
    Y1 is Y+1,
    generate_matrix(Tp,0,MaxX,Y1,MaxY).

% If MaxX has been reached on the X axis, stop
generate_row([], MaxX,MaxX,_) :- !.
generate_row([cell(point(X,Y))|T], X, MaxX, Y) :-
    XNew is X + 1,
    generate_row(T, XNew, MaxX, Y).
You could easily replace cell(point(X,Y)) with whatever you want to place there instead. I hope this clarifies the concept for you; be sure to ask for clarification if it doesn't.
Test query to generate a 10 by 10 matrix/grid/table/2-dimensional list of lists:
%generate_matrix(Matrix, 0, MaxX, 0, MaxY)
generate_matrix(Matrix, 0, 10, 0, 10).
You can add dimensions to represent more complex structures. Adding a Z-axis gives you a 3D-cube.

root mean square (RMSD) of two datasets

I'm dragging along in python, learning so slow but making progress. Have hit a wall, and don't even know where to start on this.
I have other scripts that get me to where I am now: two output CSV files with multiple rows containing 4 numbers each. The first number is an identifier integer, the other three are X, Y, Z coordinates.
Now the OTHER file is the same thing, with the same set of identifier integers, but a different set of X, Y, Z coordinates.
For each identifier integer, I want to calculate the RMSD between the X,Y,Z. In other words, I think I need to do (X2-X1)^2 + (Y2-Y1)^2 + (Z2-Z1)^2 then take the square root of that. This will give me a float as an output answer, which I'd like to write into an output file of two columns: one with the identifier integer, and the second with the output from this script.
I actually have no idea where to start on this one.. I've never had to work with two files at once. Gah!
thanks so much!!
sorry I have no script to even start here!
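One possible starting point, as a rough sketch only: read each file into a dictionary keyed by the identifier, then compute the square root of the summed squared differences for every identifier present in both. Note that the quantity described above is the Euclidean distance between the two points. The function names and any file names passed in are placeholders, and the sketch assumes comma-separated files with no header row.

```python
import csv
import math


def read_coords(path):
    """Map each identifier to its (x, y, z) tuple from a 4-column CSV file."""
    coords = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            coords[int(row[0])] = tuple(float(v) for v in row[1:4])
    return coords


def distances(first, second):
    """Distance per identifier: sqrt((x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2)."""
    result = {}
    for ident, point_a in first.items():
        if ident in second:
            point_b = second[ident]
            result[ident] = math.sqrt(
                sum((b - a) ** 2 for a, b in zip(point_a, point_b))
            )
    return result


def write_distances(path_a, path_b, out_path):
    """Write two columns: identifier, distance (paths are supplied by the caller)."""
    result = distances(read_coords(path_a), read_coords(path_b))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for ident in sorted(result):
            writer.writerow([ident, result[ident]])
```

Calling write_distances with the two input paths and an output path would produce the two-column file; identifiers missing from either file are silently skipped in this sketch.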

Getting the value of a number cell with VBA in Excel

So I have section columns which look like this:
8.01
8.02
8.03
8.04
8.05
8.06
8.07
8.08
8.09
8.10
And so on and so forth. I have it set up so that it will always show the trailing zeroes (8.10 and 8.20) but when I use VBA to get the value of the cell it still shows 8.1 and 8.2. I'm using
x = Cells.Value
but it won't work the way I need it to. I have to keep the column as number for sorting and other reasons so changing the type isn't really an option I don't think. How do I assign a number with a trailing zero to a variable in VBA? Do I need to run a test case with REGEX or something?

Following up from the comments, this works:
x = Cells(1,1).Text
or
x = Format(Cells(1,1).Value,"0.00")
You can change "0.00" to any other format you want, e.g.
x = Format(Cells(1,1).Value, "Standard")
The code above assumes that x is declared as String.

regex matching multiple values when they might not exist

I am trying to write a preg_match_all to match horse race distances.
My source lists races as:
xmxfxy
I want to match the m value, the f value, the y value. However different races will maybe only have m, or f, or y, or two of them or even all three.
// e.g. $raw = '5f213y';
preg_match_all('/(\d{1,})m|(\d{1,})f|(\d{1,})y/', $raw, $distance);
The above sort of works, but for some reason the matches appear in unpredictable positions in the returned array. I guess it is because it is running the match 3 times, once for each OR. How do I match all three (that may or may not exist) in a single run?
EDIT
A full sample string is:
Hardings Catering Services Handicap (Div I) Cl6 5f213y

If I understand you correctly, you're processing listings (like the one in your question) one at a time. If that's the case, you should be using preg_match, not preg_match_all, and the regex should match the whole "distance" code, not individual components of it. Try this:
preg_match('#\b(?:(?<M>\d+)m|(?<F>\d+)f|(?<Y>\d+)y){1,3}\b#',
$raw, $distance);
The results are now stored in a one-dimensional array, but you don't need to worry about the group numbers anyway; you can access them by name instead (e.g., $distance['M'], $distance['F'], $distance['Y']).
Note that, while this regex matches codes with one, two, or three components, it doesn't require the letters to be unique. There's nothing to stop it from matching something like 1m2m3m (a weakness shared by your own approach, by the way).
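The same idea (match the whole distance code first, then pick out its components) can be sketched in Python for comparison. This is an illustration under assumptions, not a port of the PHP above: parse_distance, TOKEN and PART are made-up names, and the sketch assumes the distance code is a standalone word such as 5f213y.

```python
import re

# A whole distance code: one to three number+unit pairs forming a single word.
TOKEN = re.compile(r"\b(?:\d+[mfy]){1,3}\b")
# One component of the code: digits followed by its unit letter.
PART = re.compile(r"(\d+)([mfy])")


def parse_distance(text):
    """Return the m, f and y values of the first distance code (None if absent)."""
    result = {"m": None, "f": None, "y": None}
    token = TOKEN.search(text)
    if token is None:
        return result
    for number, unit in PART.findall(token.group(0)):
        result[unit] = int(number)
    return result


print(parse_distance("Hardings Catering Services Handicap (Div I) Cl6 5f213y"))
```

Because the values are keyed by unit letter, a missing component simply stays None instead of shifting the positions of the others. As with the regex above, a repeated unit such as 1m2m3m is not rejected; the last value wins.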

You can use "?" to make each part optional:
preg_match_all('/((\d{1,})m)?|((\d{1,})f)?|((\d{1,})y)?/', $raw, $distance);

If I understand what you're asking correctly, you would like to get each number from these values separately? This works for me:
$input = "Hardings Catering Services Handicap (Div I) Cl6 5f213y";
preg_match_all('/((\d+)(m|f|y))/', $input, $matches);
After the preg_match_all() executes, $matches[2] holds an array of the numbers that matched (in this case, $matches[2][0] is 5 and $matches[2][1] is 213).
If all three values exist, m will be in $matches[2][0], f in $matches[2][1], and y in $matches[2][2]. If any values are missing, the next value gets bumped up a spot. It may also come in handy that $matches[3] will hold an array of the corresponding letter matched on, so if you need to check whether it was an m, f, or y, you can.
If this isn't what you're after, please provide an example of the output you would like to see for this or another sample input.
