I have the following text:
KN_Divers_Blau | -6.429897 8.010333 -0.80 0.422 -22.14101 9.28122
Ind_ROA_Ave | .3407456 .3389998 1.01 0.315 -.3241539 1.005645
Ind_Tobin_Q1_Ave | -.5065654 .2104229 -2.41 0.016 -.9192797 -.0938511
Ind_Growth_Ave | -1.404911 1.852805 -0.76 0.448 -5.038922 2.229101
Pat_Dum | -18.31015 5.452194 -3.36 0.001 -29.00385 -7.616457
|
year |
1981 | -5.575117 2.805975 -1.99 0.047 -11.07863 -.0715993
1982 | -6.171125 5.447273 -1.13 0.257 -16.85517 4.512919
1983 | -11.8282 8.84588 -1.34 0.181 -29.17812 5.521726
1984 | -20.39602 11.73682 -1.74 0.082 -43.41611 2.624069
1985 | -23.7097 14.29652 -1.66 0.097 -51.75028 4.330874
1986 | -29.43432 16.51849 -1.78 0.075 -61.83297 2.964339
1987 | -35.30922 18.5138 -1.91 0.057 -71.62137 1.002936
1988 | -49.09056 19.95166 -2.46 0.014 -88.22289 -9.958242
1989 | -53.98487 21.88913 -2.47 0.014 -96.91725 -11.05248
1990 | -67.58938 23.41111 -2.89 0.004 -113.5069 -21.67185
1991 | -78.59984 25.52294 -3.08 0.002 -128.6594 -28.54026
1992 | -88.89806 28.22778 -3.15 0.002 -144.2628 -33.53332
1993 | -98.40131 31.35391 -3.14 0.002 -159.8975 -36.90512
1994 | -102.953 33.25041 -3.10 0.002 -168.1689 -37.73712
1995 | -116.2812 37.25681 -3.12 0.002 -189.355 -43.20726
1996 | -118.0298 38.76035 -3.05 0.002 -194.0527 -42.00698
1997 | -118.4325 38.4338 -3.08 0.002 -193.8149 -43.05017
1998 | -123.8912 37.96394 -3.26 0.001 -198.352 -49.43038
1999 | -128.3908 39.44807 -3.25 0.001 -205.7626 -51.01913
2000 | -133.2699 40.31404 -3.31 0.001 -212.3401 -54.19972
2001 | -126.159 37.63045 -3.35 0.001 -199.9658 -52.35232
2002 | -119.8247 36.05833 -3.32 0.001 -190.5479 -49.10146
2003 | -109.2157 34.54755 -3.16 0.002 -176.9758 -41.45563
2004 | -114.1801 33.58204 -3.40 0.001 -180.0465 -48.31378
2005 | 0 (omitted)
|
_cons | -187.8645 62.81122 -2.99 0.003 -311.0597 -64.66936
KN_Divers_Blau | -6.57637 8.068413 -0.82 0.415 -22.4014 9.248663
Ind_ROA_Ave | .3641781 .3411348 1.07 0.286 -.3049088 1.033265
Ind_Tobin_Q1_Ave | -.5070564 .2105863 -2.41 0.016 -.9200911 -.0940217
Ind_Growth_Ave | -1.424116 1.871656 -0.76 0.447 -5.095101 2.246869
Pat_Dum | -18.51642 5.463958 -3.39 0.001 -29.23319 -7.799652
|
year |
1981 | -4.660021 2.721933 -1.71 0.087 -9.998702 .678659
1982 | -5.557028 5.497126 -1.01 0.312 -16.33885 5.224794
1983 | -10.63977 8.795378 -1.21 0.227 -27.89063 6.611104
1984 | -18.76668 11.39263 -1.65 0.100 -41.1117 3.578331
1985 | -23.61831 14.32697 -1.65 0.099 -51.71861 4.481984
1986 | -29.10203 16.61986 -1.75 0.080 -61.6995 3.495445
1987 | -34.29028 18.46377 -1.86 0.063 -70.50431 1.923745
1988 | -48.44084 19.75174 -2.45 0.014 -87.18104 -9.70065
1989 | -54.73721 22.04372 -2.48 0.013 -97.97281 -11.50162
1990 | -67.16001 23.65404 -2.84 0.005 -113.554 -20.76601
1991 | -77.92565 25.97627 -3.00 0.003 -128.8744 -26.97694
1992 | -88.53438 28.49949 -3.11 0.002 -144.432 -32.63673
1993 | -97.72113 31.57967 -3.09 0.002 -159.6601 -35.78213
1994 | -102.3819 33.38187 -3.07 0.002 -167.8557 -36.90815
1995 | -115.8907 37.23702 -3.11 0.002 -188.9258 -42.85566
1996 | -118.6755 39.02702 -3.04 0.002 -195.2214 -42.12961
1997 | -118.675 38.75563 -3.06 0.002 -194.6886 -42.66145
1998 | -124.622 38.53307 -3.23 0.001 -200.1991 -49.04492
1999 | -128.1722 39.91359 -3.21 0.001 -206.4569 -49.88741
2000 | -133.1516 40.6607 -3.27 0.001 -212.9017 -53.40144
2001 | -126.7362 38.51777 -3.29 0.001 -202.2833 -51.18914
2002 | -119.7739 36.83191 -3.25 0.001 -192.0145 -47.53344
2003 | -108.5075 34.97694 -3.10 0.002 -177.1097 -39.90524
2004 | -111.8748 33.35352 -3.35 0.001 -177.2929 -46.45662
2005 | 0 (omitted)
|
_cons | -178.691 61.08993 -2.93 0.003 -298.5101 -58.87189
And am looking for a regex expression that removes all the years, so the output would look something like this:
KN_Divers_Blau | -6.429897 8.010333 -0.80 0.422 -22.14101 9.28122
Ind_ROA_Ave | .3407456 .3389998 1.01 0.315 -.3241539 1.005645
Ind_Tobin_Q1_Ave | -.5065654 .2104229 -2.41 0.016 -.9192797 -.0938511
Ind_Growth_Ave | -1.404911 1.852805 -0.76 0.448 -5.038922 2.229101
Pat_Dum | -18.31015 5.452194 -3.36 0.001 -29.00385 -7.616457
|
|
_cons | -187.8645 62.81122 -2.99 0.003 -311.0597 -64.66936
KN_Divers_Blau | -6.57637 8.068413 -0.82 0.415 -22.4014 9.248663
Ind_ROA_Ave | .3641781 .3411348 1.07 0.286 -.3049088 1.033265
Ind_Tobin_Q1_Ave | -.5070564 .2105863 -2.41 0.016 -.9200911 -.0940217
Ind_Growth_Ave | -1.424116 1.871656 -0.76 0.447 -5.095101 2.246869
Pat_Dum | -18.51642 5.463958 -3.39 0.001 -29.23319 -7.799652
|
|
_cons | -178.691 61.08993 -2.93 0.003 -298.5101 -58.87189
I'm extremely new to regex, and have tried searching for the following and replacing it with nothing in Notepad++:
year.*[\n].*1981.*[\n].*2005
The problem is that it also matches all the values between the first year column and the second one and removes everything in between.
Is there a way to have the search find each year column separately? (So in my example, it would find and replace the year column twice in total.)
Many thanks in advance.
You may use the following pattern:
^\s*year.*(?:[\r\n]+\s*\d{4}\b.*)*[\r\n]+
...and replace with an empty string.
Breakdown:
^ Beginning of line.
\s* Match zero or more whitespace characters.
year.* Match "year" followed by any number of characters.
(?: Start of a non-capturing group.
[\r\n]+ Match one or more line-break characters.
\s* Match zero or more whitespace characters.
\d{4}\b.* Match four digits followed by any number of characters.
) Close the non-capturing group.
* Match zero or more occurrences of the previous group.
[\r\n]+ Match one or more line-break characters.
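If you want to sanity-check the pattern outside Notepad++, here is a small Python sketch; the sample text is abbreviated from the output above, and Python's re agrees with Notepad++'s engine on every construct this pattern uses:

```python
import re

# Abbreviated sample of the Stata output shown above.
text = (
    "Pat_Dum | -18.31015\n"
    "        |\n"
    "   year |\n"
    "   1981 | -5.575117\n"
    "   2005 | 0 (omitted)\n"
    "        |\n"
    "  _cons | -187.8645\n"
)

# Same pattern as above; re.MULTILINE makes ^ match at each line start.
pattern = r"^\s*year.*(?:[\r\n]+\s*\d{4}\b.*)*[\r\n]+"
cleaned = re.sub(pattern, "", text, flags=re.MULTILINE)
print(cleaned)
```

The "year" header line and every 4-digit line following it are removed, while the bare "|" separators and the _cons row survive.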
I have the following text:
gvkey |
1017 | .9610464 1.04128 0.92 0.356 -1.079825 3.001917
1018 | -.0599428 1.306879 -0.05 0.963 -2.621379 2.501493
1021 | -.0766854 .9906029 -0.08 0.938 -2.018231 1.86486
1034 | -2.678616 1.308118 -2.05 0.041 -5.24248 -.1147511
1056 | 1.694514 .9563385 1.77 0.076 -.1798751 3.568903
1065 | 1.106467 .9584568 1.15 0.248 -.7720734 2.985008
10001 | .7988226 1.019213 0.78 0.433 -1.198799 2.796444
10010 | .8203764 .9429188 0.87 0.384 -1.02771 2.668463
10022 | 1.590896 .9615904 1.65 0.098 -.2937862 3.475579
10030 | .0067641 .9798901 0.01 0.994 -1.913785 1.927313
10039 | 3.767551 .9168058 4.11 0.000 1.970645 5.564458
10056 | 2.29646 .9789753 2.35 0.019 .3777042 4.215217
10066 | 2.635614 .9398462 2.80 0.005 .7935496 4.477679
10088 | 1.679799 .930843 1.80 0.071 -.1446195 3.504218
10089 | -16.62772 1.017178 -16.35 0.000 -18.62135 -14.63409
10093 | .3149815 .9174881 0.34 0.731 -1.483262 2.113225
10097 | 2.976634 .9224759 3.23 0.001 1.168615 4.784654
10107 | -.1184532 .9405728 -0.13 0.900 -1.961942 1.725036
10115 | 1.899066 .9165281 2.07 0.038 .102704 3.695428
208068 | -1.236473 .9326577 -1.33 0.185 -3.064448 .5915026
209341 | -.804362 .9516883 -0.85 0.398 -2.669637 1.060913
213449 | -1.248011 .9460252 -1.32 0.187 -3.102186 .6061647
220546 | -4.424031 .9431063 -4.69 0.000 -6.272485 -2.575576
221821 | -.9759739 .9240414 -1.06 0.291 -2.787062 .8351139
222111 | -3.733076 .9440901 -3.95 0.000 -5.583458 -1.882693
223098 | -2.892674 1.158793 -2.50 0.013 -5.163865 -.6214818
242977 | -1.324193 .9371738 -1.41 0.158 -3.161019 .5126345
|
_cons | .1156292 .915384 0.13 0.899 -1.678491 1.909749
------------------------------------------------------------------------------------
gvkey |
1017 | .9610464 1.04128 0.92 0.356 -1.079825 3.001917
1018 | -.0599428 1.306879 -0.05 0.963 -2.621379 2.501493
1021 | -.0766854 .9906029 -0.08 0.938 -2.018231 1.86486
1034 | -2.678616 1.308118 -2.05 0.041 -5.24248 -.1147511
1056 | 1.694514 .9563385 1.77 0.076 -.1798751 3.568903
1065 | 1.106467 .9584568 1.15 0.248 -.7720734 2.985008
10001 | .7988226 1.019213 0.78 0.433 -1.198799 2.796444
10010 | .8203764 .9429188 0.87 0.384 -1.02771 2.668463
10022 | 1.590896 .9615904 1.65 0.098 -.2937862 3.475579
10030 | .0067641 .9798901 0.01 0.994 -1.913785 1.927313
10039 | 3.767551 .9168058 4.11 0.000 1.970645 5.564458
10056 | 2.29646 .9789753 2.35 0.019 .3777042 4.215217
10066 | 2.635614 .9398462 2.80 0.005 .7935496 4.477679
10088 | 1.679799 .930843 1.80 0.071 -.1446195 3.504218
10089 | -16.62772 1.017178 -16.35 0.000 -18.62135 -14.63409
10093 | .3149815 .9174881 0.34 0.731 -1.483262 2.113225
10097 | 2.976634 .9224759 3.23 0.001 1.168615 4.784654
10107 | -.1184532 .9405728 -0.13 0.900 -1.961942 1.725036
10115 | 1.899066 .9165281 2.07 0.038 .102704 3.695428
208068 | -1.236473 .9326577 -1.33 0.185 -3.064448 .5915026
209341 | -.804362 .9516883 -0.85 0.398 -2.669637 1.060913
213449 | -1.248011 .9460252 -1.32 0.187 -3.102186 .6061647
220546 | -4.424031 .9431063 -4.69 0.000 -6.272485 -2.575576
221821 | -.9759739 .9240414 -1.06 0.291 -2.787062 .8351139
222111 | -3.733076 .9440901 -3.95 0.000 -5.583458 -1.882693
223098 | -2.892674 1.158793 -2.50 0.013 -5.163865 -.6214818
242977 | -1.324193 .9371738 -1.41 0.158 -3.161019 .5126345
|
_cons | .1156292 .915384 0.13 0.899 -1.678491 1.909749
------------------------------------------------------------------------------------
And am looking for a regex expression that removes all the gvkeys, so the output would look something like this:
|
_cons | .1156292 .915384 0.13 0.899 -1.678491 1.909749
------------------------------------------------------------------------------------
|
_cons | .1156292 .915384 0.13 0.899 -1.678491 1.909749
------------------------------------------------------------------------------------
I'm very new to regex, and have tried searching for the following and replacing it with nothing in Notepad++:
gvkey.*[\n].*_cons
The problem is that it also matches all the values between the first gvkey column and the second one and removes everything in between.
Is there a way to have the search find each gvkey column separately? (So in my example, it would find and replace the gvkey column twice in total.)
Many thanks in advance.
You may use this regex for search in MULTILINE mode (it is called the ". matches newline" option in Notepad++):
^\h*gvkey[\s\S]*?\R(?=\h+\|)
And replace with an empty string.
RegEx Details:
^: Line start
\h*: matches 0 or more horizontal whitespaces
gvkey: Matches gvkey string
[\s\S]*?: Matches 0 or more of any character including newlines (lazy)
\R: Matches any newlines
(?=\h+\|): Positive lookahead to assert that we have one or more horizontal whitespaces followed by a pipe character ahead of us
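Python's re module does not support \h or \R, but the same idea can be checked by translating them to [ \t] and \n (assuming Unix line endings); the sample text here is abbreviated from the question:

```python
import re

# Abbreviated sample; \h becomes [ \t] and \R becomes \n in Python's re.
text = (
    "       gvkey |\n"
    "        1017 | .9610464\n"
    "      242977 | -1.324193\n"
    "             |\n"
    "       _cons | .1156292\n"
)

# Lazy [\s\S]*? stops at the first newline that is followed by
# whitespace and a pipe, i.e. the bare "|" separator line.
pattern = r"^[ \t]*gvkey[\s\S]*?\n(?=[ \t]+\|)"
cleaned = re.sub(pattern, "", text, flags=re.MULTILINE)
print(cleaned)
```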
Why don't you just try this?
Find what: ^(?!.*_con).*\s
Replace with: blank
I have a panel dataset with the following years:
tab year
year | Freq. Percent Cum.
------------+-----------------------------------
2000 | 31 12.55 12.55
2001 | 31 12.55 25.10
2002 | 30 12.15 37.25
2003 | 31 12.55 49.80
2004 | 31 12.55 62.35
2005 | 31 12.55 74.90
2006 | 31 12.55 87.45
2007 | 31 12.55 100.00
------------+-----------------------------------
Total | 247 100.00
When I do xtreg dv iv i.year, I see that year 2000 is not included, as well as 2007:
xtreg local_gr rtxdum i.year
note: 2007.year omitted because of collinearity
Random-effects GLS regression Number of obs = 247
Group variable: province_n~e Number of groups = 31
R-sq: Obs per group:
within = 0.6194 min = 7
between = 0.0016 avg = 8.0
overall = 0.2356 max = 8
Wald chi2(7) = 341.51
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
local_gr | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rtxdum | -753799.7 291543.7 -2.59 0.010 -1325215 -182384.5
|
year |
2001 | 388246 291543.7 1.33 0.183 -183169.2 959661.2
2002 | 745406.4 294294.5 2.53 0.011 168599.8 1322213
2003 | 1175610 291543.7 4.03 0.000 604194.4 1747025
2004 | 1773982 291543.7 6.08 0.000 1202567 2345397
2005 | 2600005 291543.7 8.92 0.000 2028589 3171420
2006 | 4425318 291543.7 15.18 0.000 3853903 4996734
2007 | 0 (omitted)
|
_cons | 1564670 447832.4 3.49 0.000 686934.1 2442405
-------------+----------------------------------------------------------------
sigma_u | 2217878.8
sigma_e | 1150064.9
rho | .78809251 (fraction of variance due to u_i)
------------------------------------------------------------------------------
The message says 2007 was omitted due to collinearity, but I don't understand why year 2000 does not show up in the results.
Because it is the base level. You can see it by using the allbaselevels option:
webuse nlswork, clear
xtset idcode
xtreg ln_w grade tenure i.race not_smsa south, allbaselevels
Random-effects GLS regression Number of obs = 28,091
Group variable: idcode Number of groups = 4,697
R-sq: Obs per group:
within = 0.1005 min = 1
between = 0.4498 avg = 6.0
overall = 0.3305 max = 15
Wald chi2(6) = 6509.50
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
grade | .07605 .0018128 41.95 0.000 .0724969 .0796031
tenure | .0361319 .0006298 57.37 0.000 .0348975 .0373663
|
race |
white | 0 (base)
black | -.0530121 .0102916 -5.15 0.000 -.0731832 -.0328409
other | .0762678 .0415911 1.83 0.067 -.0052492 .1577849
|
not_smsa | -.1289554 .0074296 -17.36 0.000 -.1435172 -.1143936
south | -.0786512 .0075533 -10.41 0.000 -.0934555 -.063847
_cons | .6759773 .0244723 27.62 0.000 .6280125 .7239421
-------------+----------------------------------------------------------------
sigma_u | .26440074
sigma_e | .30295598
rho | .43235646 (fraction of variance due to u_i)
------------------------------------------------------------------------------
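The same behaviour is not specific to Stata: whenever a categorical variable is expanded into indicator (dummy) variables, one level must be left out as the reference category, or the dummies would sum to the constant column and be collinear with it. A minimal pure-Python sketch of dummy encoding (an illustration only, not Stata code) shows why the first level never gets its own column:

```python
# Levels of the categorical variable, as in the question's panel.
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007]
base = years[0]  # Stata's i.year picks the first level as the base by default

# Each remaining level gets its own indicator column; the base gets none,
# so its effect is absorbed into the constant (_cons).
observations = [2000, 2001, 2002, 2000, 2007]
indicators = {
    y: [int(obs == y) for obs in observations]
    for y in years
    if y != base
}
print(sorted(indicators))  # 2000 is absent: it is the base level
```

(The separate "2007.year omitted because of collinearity" note in the question is a different mechanism; this sketch only illustrates the base-level omission of 2000.)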
Suppose you have several variables:
+--------------------------+------------+------------+-----------+-------+
| | Population | Median_Age | Sex_Ratio | GDP |
| Country | | | | |
+--------------------------+------------+------------+-----------+-------+
| United States of America | 3999 | | 1.01 | 16000 |
+--------------------------+------------+------------+-----------+-------+
| Afghanistan | 544 | 19 | 0.97 | 4456 |
+--------------------------+------------+------------+-----------+-------+
| China | 5000 | 26 | 0.96 | 10000 |
+--------------------------+------------+------------+-----------+-------+
Let us suppose that Median_Age under United States of America is empty.
How do I replace this missing value with 27 if Country contains United, or United States?
Here's a modified example that better illustrates the solution:
clear
input strL Country Population Median_Age Sex_Ratio GDP
"United States of America" 3999 . 1.01 5000
"Afghanistan" 544 19 0.97 457
"United Emirates" 7546 44 7.01 2000
"China" 10000 26 0.96 3400
"United Fictionary Nation" 6789 . 8.03 7689
end
list, abbreviate(10)
+-----------------------------------------------------------------------+
| Country Population Median_Age Sex_Ratio GDP |
|-----------------------------------------------------------------------|
1. | United States of America 3999 . 1.01 5000 |
2. | Afghanistan 544 19 .97 457 |
3. | United Emirates 7546 44 7.01 2000 |
4. | China 10000 26 .96 3400 |
5. | United Fictionary Nation 6789 . 8.03 7689 |
+-----------------------------------------------------------------------+
replace Median_Age = 27 if ( strmatch(Country, "*United States*") | ///
strmatch(Country, "*United*") ) & ///
missing(Median_Age)
list, abbreviate(10)
+-----------------------------------------------------------------------+
| Country Population Median_Age Sex_Ratio GDP |
|-----------------------------------------------------------------------|
1. | United States of America 3999 27 1.01 5000 |
2. | Afghanistan 544 19 .97 457 |
3. | United Emirates 7546 44 7.01 2000 |
4. | China 10000 26 .96 3400 |
5. | United Fictionary Nation 6789 27 8.03 7689 |
+-----------------------------------------------------------------------+
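For readers outside Stata, the same conditional fill can be sketched in plain Python; fnmatchcase mirrors strmatch's *-wildcard semantics (this is an illustration of the logic, not Stata code, and the row dicts are abbreviated from the example):

```python
from fnmatch import fnmatchcase

rows = [
    {"Country": "United States of America", "Median_Age": None},
    {"Country": "Afghanistan", "Median_Age": 19},
    {"Country": "United Fictionary Nation", "Median_Age": None},
]

# Mirror of: replace Median_Age = 27 if strmatch(Country, "*United*")
#            & missing(Median_Age)
for row in rows:
    if row["Median_Age"] is None and fnmatchcase(row["Country"], "*United*"):
        row["Median_Age"] = 27

print([r["Median_Age"] for r in rows])
```

As in the Stata version, only rows that are both missing and wildcard-matched get filled; Afghanistan's existing value is untouched.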
I am having a hard time getting the following measure to work. I am trying to change the target based on a date filter. My filter is the Workday column, where Workday is a standard date column and sMonth is a month column formatted as a whole number. I am looking to keep the slicer granular in order to work by day, so adding custom columns with month and year and basing the measure on those would help. This is what I have tried, and I couldn't get it to work:
Cars Inspected =
VAR
selectedMonth = MONTH(SELECTEDVALUE('All Cars Inspected'[Workday]))
RETURN CALCULATE(SUM(Targets[Target]),
FILTER(Targets,Targets[Location]="Texas"),
FILTER(Targets,Targets[Description]="CarsInspected"),
FILTER(Targets,Targets[sMonth]=selectedMonth))
I would appreciate if someone would suggest a different way of achieving the same result.
Later edit:
This is a mock-up of what I am trying to achieve:
The total cars get filtered by the Workday. I would like to make the Targets/Ranges dynamic: when the slicer gets adjusted, everything else adjusts with it.
My tables look like this:
+-----------+--------------------+----------+
| Workday | TotalCarsInspected | Location |
+-----------+--------------------+----------+
| 4/4/2017 | 1 | Texas |
| 4/11/2017 | 149 | Texas |
| 4/12/2017 | 129 | Texas |
| 4/13/2017 | 201 | Texas |
| 4/14/2017 | 4 | Texas |
| 4/15/2017 | 6 | Texas |
+-----------+--------------------+----------+
+----------+--------+----------+---------------+--------+-----+--------+
| TargetID | sMonth | Location | Description | Target | Red | Yellow |
+----------+--------+----------+---------------+--------+-----+--------+
| 495 | 1 | Texas | CarsInspected | 3636 | 0.5 | 0.75 |
| 496 | 2 | Texas | CarsInspected | 4148 | 0.5 | 0.75 |
| 497 | 3 | Texas | CarsInspected | 4861 | 0.5 | 0.75 |
| 498 | 4 | Texas | CarsInspected | 4938 | 0.5 | 0.75 |
| 499 | 5 | Texas | CarsInspected | 5094 | 0.5 | 0.75 |
| 500 | 6 | Texas | CarsInspected | 5044 | 0.5 | 0.75 |
| 501 | 7 | Texas | CarsInspected | 5043 | 0.5 | 0.75 |
| 502 | 8 | Texas | CarsInspected | 4229 | 0.5 | 0.75 |
| 503 | 9 | Texas | CarsInspected | 4311 | 0.5 | 0.75 |
| 504 | 10 | Texas | CarsInspected | 4152 | 0.5 | 0.75 |
| 505 | 11 | Texas | CarsInspected | 3592 | 0.5 | 0.75 |
| 506 | 12 | Texas | CarsInspected | 3748 | 0.5 | 0.75 |
+----------+--------+----------+---------------+--------+-----+--------+
Let the Value for your gauge be the sum of TotalCarsInspected and set the Maximum value to the following measure:
Cars Inspected =
VAR selectedMonth = MONTH(MAX('All Cars Inspected'[Workday]))
RETURN LOOKUPVALUE(Targets[Target],
Targets[Location], "Texas",
Targets[Description], "CarsInspected",
Targets[sMonth], selectedMonth)
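In case it helps to see the lookup logic spelled out, here is a rough Python equivalent of what LOOKUPVALUE does over the Targets table (the table is abbreviated from the question; this is an illustration, not DAX):

```python
# Abbreviated Targets table from the question.
targets = [
    {"sMonth": 3, "Location": "Texas", "Description": "CarsInspected", "Target": 4861},
    {"sMonth": 4, "Location": "Texas", "Description": "CarsInspected", "Target": 4938},
]

def lookup_target(rows, location, description, month):
    """Return the Target of the single row matching all three criteria.

    Returns None when there is no unique match; DAX's LOOKUPVALUE
    similarly returns blank (or errors on multiple matches).
    """
    matches = [
        r["Target"]
        for r in rows
        if r["Location"] == location
        and r["Description"] == description
        and r["sMonth"] == month
    ]
    return matches[0] if len(matches) == 1 else None

print(lookup_target(targets, "Texas", "CarsInspected", 4))
```

The measure above feeds MONTH(MAX(...[Workday])) in as the month criterion, so as the Workday slicer moves, a different Targets row is picked up.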
I am new to Stata and still learning.
I have a var shaped like that :
+-------+
| Phase |
+-------+
| I |
+-------+
| I |
+-------+
| II |
+-------+
| III |
+-------+
| II |
+-------+
My goal is to draw a histogram with the possible values (I, II, III) on the x-axis and the count of each (2, 2, 1) on the y-axis.
I thought I could make a loop and store the count of each possible value in an array, but arrays do not seem to be implemented in Stata.
Is there any function already implemented that does what I want, or do I have to write one to find the distinct values, count them, and then draw the histogram?
Thank you.
Edit:
processed.p |
hase | Freq. Percent Cum.
------------+-----------------------------------
I | 266 0.92 0.92
I/II | 1,006 3.50 4.42
II | 10,867 37.76 42.18
II/III | 344 1.20 43.37
III | 9,248 32.13 75.51
IV | 6,984 24.27 99.77
NA | 65 0.23 100.00
------------+-----------------------------------
Total | 28,780 100.00
I found a way of counting distinct values.
I found the solution:
tab processedphase, matcell(x)
in order to obtain
processed.p |
hase | Freq. Percent Cum.
------------+-----------------------------------
I | 266 0.92 0.92
I/II | 1,006 3.50 4.42
II | 10,867 37.76 42.18
II/III | 344 1.20 43.37
III | 9,248 32.13 75.51
IV | 6,984 24.27 99.77
NA | 65 0.23 100.00
------------+-----------------------------------
Total | 28,780 100.00
then:
matrix list x
svmat x
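For comparison, the count-the-distinct-values step that tab ..., matcell(x) performs can be done in one line in, say, Python with collections.Counter (just to show the idea, not a Stata replacement; the data here is the small five-row example from the question):

```python
from collections import Counter

# The Phase column from the small example at the top of the question.
phases = ["I", "I", "II", "III", "II"]

# Counter tabulates distinct values and their frequencies in one pass,
# much like Stata's tab stores frequencies into a matrix via matcell().
counts = Counter(phases)
print(sorted(counts.items()))
```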