How can I import and use labels from one Stata file in the current one?

I have file aa with a variable x which is labeled with value label x_lab. I would like to use this value label on the variable x of Stata file bb:
use bb, clear
label value x x_lab
How can I import the value label x_lab?

You can use label save, which saves value labels in a do-file:
label save x_lab using label.do
use bb, clear
do label.do
See Stata help for label.

This technique didn't work for me, as I wanted the variable labels (created with e.g. label var connected "connected household"), not the value labels.
Instead I used this advice: http://statalist.1588530.n2.nabble.com/st-How-to-export-variables-window-td3937733.html
*************
sysuse auto, clear
log using mylog, name(newlog) replace
foreach var of varlist _all {
    di _col(3) "`var'" _col(20) "`:var label `var''"
}
log close newlog
// translate from the proprietary format
translate mylog.smcl mylog.txt, replace
!start mylog.txt
*************
To fix the labels that extended over multiple lines so they each used a single one, I then replaced the "\n> " of the oversized labels with nothing (in regex mode in Atom). From there I could easily save to TSV.
Specifically (a scripted sketch of these steps follows below):
Clean up header and footer text in the logfile output.
On Mac: use "\n" instead of "\r\n".
On Windows: first remove the wrapped-line markers: "\r\n> " --> ""
then strip whitespace at the beginning of a line: "\r\n " --> "\r\n"
then convert runs of 3 or more spaces in the middle to tabs: "   +" --> "\t"
(Manually fix any remaining errors around the tabs if some are left.)
save as mylog.tsv
open in Excel, and use the table of labels as needed.
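If you prefer to script those replacements instead of doing them in an editor, here is a minimal Python 3 sketch of the same steps (the file names come from the log example above; the exact patterns are assumptions you may need to tweak, and the header/footer lines still need trimming):
import re

# read the translated log; newline="" keeps \r\n intact on Windows
with open("mylog.txt", newline="") as f:
    text = f.read()

# remove the line-wrap continuations ("> " after a line break)
text = re.sub(r"\r?\n> ", "", text)
# strip whitespace at the beginning of each line
text = re.sub(r"(\r?\n)[ \t]+", r"\1", text)
# convert runs of 3 or more spaces between columns to tabs
text = re.sub(r" {3,}", "\t", text)

with open("mylog.tsv", "w", newline="") as f:
    f.write(text)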

Related

Extracting the 'end' of a string, conditions or regular expression?

I have a data table (it can be extracted to text or a spreadsheet). One column has random text containing areas in square metres, which I want to copy to a new column in hectares (so: parse the text and divide by 10,000).
e.g.
Deposited Plan 172499, 53,310 m2
Deposited Plan 166167, 853 m2
This plan has no area stated
Section 21 Block I Wellington District, 403,573 m2
Output column should have:
5.3310
0.0853
40.3573
Is there a way I can automate this in LibreOffice Calc, or with a regular expression editor like TextCrawler? Or perhaps using an AutoIt script?
Try with this:
/([0-9]+,*[0-9]+\sm2)$/
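If you end up scripting it, here is a minimal Python sketch of the same idea (the regex is a variant of the one above, the sample rows are taken from the question, and the output formatting is an assumption):
import re

rows = [
    "Deposited Plan 172499, 53,310 m2",
    "Deposited Plan 166167, 853 m2",
    "This plan has no area stated",
    "Section 21 Block I Wellington District, 403,573 m2",
]

for row in rows:
    m = re.search(r"([0-9][0-9,]*)\s*m2$", row)
    if m:
        sqm = int(m.group(1).replace(",", ""))  # drop thousands separators
        print("%.4f" % (sqm / 10000.0))         # square metres -> hectares
    else:
        print("")                               # no area stated

This prints 5.3310, 0.0853, an empty line, and 40.3573, matching the output column above.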

How to read in a table that depends on two sets previously defined

I am optimizing the choice of letters, given the surfaces they require in the laser cutter, to maximize the total frequency of the words they can form. I wrote this program for GLPK:
set unicodes;
param surfaces{u in unicodes};
table data IN "CSV" "surfaces.csv": unicodes <- [u], surfaces~s;
set words;
param frequency{w in words}, integer;
table data IN "CSV" "words.csv": words <- [word], frequency~frequency;
Then I want to supply a table giving, for each word, the count of each character, identified by its unicode. The sets words and unicodes are already defined. According to page 42 of the manual, I can omit the set and the delimiter:
table name alias IN driver arg . . . arg : set <- [fld, ..., fld], par~fld, ..., par~fld;
...
set is the name of an optional simple set called control set. It can be omitted along with the
delimiter <-;
So I write this:
param spectrum{w in words, u in unicodes} >= 0;
table data IN "CSV" "spectrum.csv": words~word, unicodes~unicode, spectrum~spectrum;
I get the error:
Reading model section from lp...
lp:19: delimiter <- missing where expected
Context: ..., u in unicodes } >= 0 ; table data IN '...' '...' : words ~
If I write:
table data IN "CSV" "spectrum.csv": [words, unicodes] <- [word, unicode], spectrum~spectrum;
I get the error:
Reading model section from lp...
lp:19: syntax error in table statement
Context: ...} >= 0 ; table data IN '...' '...' : [ words , unicodes ] <-
How can I read in a table with data on two sets already defined?
Notes: the CSV files are similar to this:
surfaces.csv:
u,s
41,1
42,1.5
43,1.2
words.csv:
word,frequency
abc,10
spectrum.csv:
word,unicode,spectrum
abc,1,41
abc,2,42
abc,3,43
I found the answer with AMPL, A Mathematical Programming Language, which is a superset of GNU MathProg. I needed to define a set with the links between words and unicodes, and use that set as the control set when reading the table:
set links within {words, unicodes};
param spectrum{links} >= 0;
table data IN "CSV" "spectrum.csv": links <- [word, unicode], spectrum~spectrum;
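If it helps to see what the control set ends up containing, here is a small Python sketch (not MathProg) that mimics what the table statement does: it collects the (word, unicode) pairs from spectrum.csv into a links set and keys the parameter values by those pairs:
import csv

links = set()   # the control set: (word, unicode) pairs
spectrum = {}   # param spectrum{links}

with open("spectrum.csv", newline="") as f:
    for row in csv.DictReader(f):
        key = (row["word"], row["unicode"])
        links.add(key)
        spectrum[key] = float(row["spectrum"])

print(sorted(links))  # [('abc', '1'), ('abc', '2'), ('abc', '3')]
print(spectrum)       # values keyed by the pairs in links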
And now I get:
...
INTEGER OPTIMAL SOLUTION FOUND
Time used: 0.0 secs
Memory used: 0.1 Mb (156430 bytes)
The "optional set" in the documentation is still misleading and I filed a bug report. For reference, the AMPL book is free to download and I used the transportation model scattered in page 47 in Section 3.2, page 173 in section 10.1, and page 179 in section 10.2.

How to assign the maximum amount of strings to a macro automatically?

My question's title may be a little bit ambiguous.
Previously, I wanted to "acquire a complete list of subdirs" and then read the files in these subdirs into Stata (see this post and this post).
Thanks to @Roberto Ferrer's great suggestion, I almost managed to do this. But I then encountered another problem: because I have so many separate files, the length of the local macro seems to hit its upper bound. After the command local n: word count `filelist', Stata sends an error message:
macro substitution results in line that is too long.
The line resulting from substituting macros would be longer than allowed. The maximum allowed length is 645,216 characters, which is calculated on the basis of set maxvar. You can change that in Stata/SE and Stata/MP. What follows is relevant only if you are using Stata/SE or Stata/MP.
The maximum line length is defined as 16 more than the maximum macro length, which is currently 645,200 characters. Each unit increase in set maxvar increases the length maximums by 129. The maximum value of set maxvar is 32,767. Thus, the maximum line length may be set up to 4,227,159 characters if you set maxvar to its largest value.
r(920);
When I reduce the number of subdirs to 5, Stata works fine. Since I have roughly 100 subdirs, I would have to repeat the actions about 20 times. That is manageable, but I still want to know whether I can fully automate this process: more specifically, "exhaust" the maximum allowable macro length, import those files, and then add the next group of subdirs.
Below you can find my code:
//====================================
//=== read and clean projects data ===
//====================================
version 14
set linesize 80
set more off
clear
macro drop _all
set linesize 200
cd G:\Data_backup\Soufang_data

*----------------------------------
* Read all files within directory
*----------------------------------
* Import the first worksheets 1:"项目首页" 2:"项目概况" 3:"成交详情"
* worksheet1
filelist, directory("G:\Data_backup\Soufang_data") pattern(*.xlsx)
* pattern(*.xlsx) prevents importing other file types (.doc or .dta)
gen tag = substr(reverse(dirname),1,6) == "esuoh/"
keep if tag==1
gen path = dirname+"\"+filename
qui valuesof path if tag==1
local filelist = r(values)
split dirname, parse("\" "/")
ren dirname4 citylist
drop dirname1-dirname3 dirname5
qui valuesof citylist if tag==1
local city = r(values)
local count = 1
local n: word count `filelist'
forval i = 1/`n' {
    local file : word `i' of `filelist'
    local cityname : word `i' of `city'
    ** don't add .xlsx after `file'; the suffix is already included
    ** write "`file'" rather than `file'; I don't know why, but it works
    qui import excel using "`file'", clear
    cap qui sxpose, clear
    cap qui drop in 1/1
    gen city = "`cityname'"
    if `count'==1 {
        save house.dta, replace emptyok
    }
    else {
        qui append using house
        qui save house.dta, replace emptyok
    }
    local ++count
}
Thank you.
You do not need to store the whole list of files in a macro. filelist creates a dataset of the files that you want to work with. Just save it and reload it for each file you want to process. You also use a very inefficient way to append datasets: as the appended dataset grows, the cost of reloading and saving it becomes very high and can slow the whole process to a crawl.
Here's a sketch of how to process your Excel files
filelist, directory(".") pattern(*.xlsx)
save "myfiles.dta", replace
local n = _N
forval i = 1/`n' {
    use in `i' using "myfiles.dta", clear
    local f = dirname + "/" + filename
    qui import excel using "`f'", clear
    tempfile res`i'
    save "`res`i''"
}
clear
forval i = 1/`n' {
    append using "`res`i''"
}
save "final.dta", replace

How to parse/pull specific data out of a file with Python

I have an interesting issue I am trying to solve and I have taken a good stab at it but need a little help. I have a squishy file that contains some Lua code. I am trying to read this file and build a file path out of it. However, depending on where this file was generated, it may contain some information or it may miss some. Here is an example of the squishy file I need to parse.
Module "foo1"
Module "foo2"
Module "common.command" "common/command.lua"
Module "common.common" "common/common.lua"
Module "common.diagnostics" "common/diagnostics.lua"
Here is the code I have written to read the file and search for the lines containing Module. You will see that there are three different sections or columns to this file. If you look at line 3 you will have "Module" for column1, "common.command" for column2 and "common/command.lua" for column3.
Taking Column3 as an example: if data exists in the 3rd column, then I just need to strip the quotes off and grab the data in Column3; in this case it would be common/command.lua. If there is no data in Column3, then I need to get the data out of Column2, replace the period (.) with os.path.sep, and tack a .lua extension onto the file. For example, common.common would become common/common.lua.
squishyContent = []
if os.path.isfile(root + os.path.sep + "squishy"):
    self.Log("Parsing Squishy")
    with open(root + os.path.sep + "squishy") as squishyFile:
        lines = squishyFile.readlines()
    for line in lines:
        if line.startswith("Module "):
            path = line.replace('Module "', '').replace('"', '').replace("\n", '').replace(".", "/") + ".lua"
Just need some examples/help in getting through this.
This might sound silly, but the easiest approach is to convert everything you told us about your task to code.
for line in lines:
    # if the line doesn't start with "Module ", ignore it
    if not line.startswith('Module '):
        continue
    # As you said, there are up to 3 columns separated by blanks,
    # so split the text into columns (split() also drops the trailing newline).
    line = line.split()
    # if there are more than 2 columns, use the 3rd column's text (minus the quotes "")
    if len(line) > 2:
        line = line[2].strip('"')
    # otherwise, ...
    else:
        line = line[1]                         # use the 2nd column's text
        line = line.strip('"')                 # remove the quotes ""
        line = line.replace('.', os.path.sep)  # replace . with the path separator
        line += '.lua'                         # and add .lua
    print line  # prove it works.
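Run on the sample squishy file above (on a system where os.path.sep is /), this should print something like:
foo1.lua
foo2.lua
common/command.lua
common/common.lua
common/diagnostics.lua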
With a simple problem like this, it's easy to make the program do exactly what you yourself would do if you did the task manually.

Adding FreeText annotation to PDF

I am using podofo for PDF operations such as adding annotations and signatures, as per my requirements, in my iOS application. I first tried the only sample available for the podofo library, which works great. But the problem with the sample is that the annotations it adds don't show in previewers like Google, Adobe Reader etc. That's a problem.
From a few Adobe guidelines I found that a FreeText annotation requires an Appearance key in order to appear. I tried analyzing raw PDF files in a text editor to compare a PDF with correct annotations against the podofo-created annotations. I found AP/N keys with an encoded stream object for the annotation, which were missing from the podofo sample.
After more searching I found podofo's own sample and tried to use its code, which seems to do this correctly, but it didn't work either. I know I am missing something, but I am not sure what or where. Please have a look at the code below:
+(void)createFreeTextAnnotationOnPage:(NSInteger)pageIndex doc:(PdfMemDocument*)aDoc rect:(CGRect)aRect borderWidth:(double)bWidth title:(NSString*)title content:(NSString*)content bOpen:(Boolean)bOpen color:(UIColor*)color {
    PoDoFo::PdfMemDocument *doc = (PoDoFo::PdfMemDocument *) aDoc;
    PoDoFo::PdfPage* pPage = doc->GetPage(pageIndex);
    if (! pPage) {
        // couldn't get that page
        return;
    }

    PoDoFo::PdfRect rect;
    rect.SetBottom(aRect.origin.y);
    rect.SetLeft(aRect.origin.x);
    rect.SetHeight(aRect.size.height);
    rect.SetWidth(aRect.size.width);

    PoDoFo::PdfString sTitle(reinterpret_cast<const PoDoFo::pdf_utf8*>([title UTF8String]));
    PoDoFo::PdfString sContent(reinterpret_cast<const PoDoFo::pdf_utf8*>([content UTF8String]));
    PoDoFo::PdfFont* pFont = doc->CreateFont( "Helvetica", new PoDoFo::PdfIdentityEncoding( 0, 0xffff, true ) );

    std::ostringstream oss;
    oss << "BT" << std::endl << "/" << pFont->GetIdentifier().GetName()
        << " " << pFont->GetFontSize()
        << " Tf " << std::endl;
    [APDFManager WriteStringToStream:sContent :oss :pFont];
    oss << "Tj ET" << std::endl;

    PoDoFo::PdfDictionary fonts;
    fonts.AddKey(pFont->GetIdentifier().GetName(), pFont->GetObject()->Reference());
    PoDoFo::PdfDictionary resources;
    resources.AddKey( PoDoFo::PdfName("Fonts"), fonts );

    PoDoFo::PdfAnnotation* pAnnotation =
        pPage->CreateAnnotation( PoDoFo::ePdfAnnotation_FreeText, rect );
    pAnnotation->SetTitle( sTitle );
    pAnnotation->SetContents( sContent );
    //pAnnotation->SetAppearanceStream( &xObj );
    pAnnotation->GetObject()->GetDictionary().AddKey( PoDoFo::PdfName("DA"), PoDoFo::PdfString(oss.str()) );
    pAnnotation->GetObject()->GetDictionary().AddKey( PoDoFo::PdfName("DR"), resources );
}

+(void) WriteStringToStream:(const PoDoFo::PdfString &)rsString :(std::ostringstream &)oss :(PoDoFo::PdfFont*)pFont
{
    PoDoFo::PdfEncoding* pEncoding = new PoDoFo::PdfIdentityEncoding( 0, 0xffff, true );
    PoDoFo::PdfRefCountedBuffer buffer = pEncoding->ConvertToEncoding( rsString, pFont );
    PoDoFo::pdf_long lLen = 0;
    char* pBuffer = NULL;
    std::auto_ptr<PoDoFo::PdfFilter> pFilter = PoDoFo::PdfFilterFactory::Create( PoDoFo::ePdfFilter_ASCIIHexDecode );
    pFilter->Encode( buffer.GetBuffer(), buffer.GetSize(), &pBuffer, &lLen );

    oss << "<";
    oss << std::string( pBuffer, lLen );
    oss << ">";

    free( pBuffer );
    delete pEncoding;
}
Can anyone in the SO universe please tell me what's wrong with the above code, and how to add a correct FreeText annotation so that it appears correctly everywhere?
Many thanks.
The annotation in question looks like this:
19 0 obj
<<
/Type/Annot
/Contents(þÿ M Y A N N O T A T I O N)
/DA(BT\n/Ft18 12 Tf \n 1 0 0 rg \n<002D003900000021002E002E002F0034002100340029002F002E>Tj ET\n)
/DR<</Fonts<</Ft18 18 0 R>>>>
/M(D:20140616141406+05'00')
/P 4 0 R
/Rect[ 188.814117 748.970520 467.849731 795.476456]
/Subtype/FreeText
/T(þÿ A n n o t a t e P D F)
>>
endobj
Three observations:
It has a Default Appearance but not APpearance streams.
The contents of the Default Appearance are invalid.
The Default Resources are in the wrong object.
Item 1 may cause the appearance not to render in many simple viewers which only show finalized stuff (page content, annotation appearances, ...) but don't create appearances from the Default Appearance. You should, therefore, also supply an appearance stream.
Items 2 and 3 may cause the appearance not to render in more complex viewers which do try to create appearances from the Default Appearance and Default Resources but expect the DA to be correct and the DR correctly located. You should, therefore, correct the DA and move the DR.
In detail...
1 - Default Appearance but not APpearance streams
While according to the specification ISO 32000-1 the DA is required for a free text annotation and the AP is not, simpler PDF viewers may not have built-in code to create an appearance stream from the default appearance.
This is not completely surprising: while in the case of your PDF there is not much to do, applying the default appearance to some content can imply calculating the best size for text to fit into some area, and similar tasks. Thus, simple, incomplete viewers tend not to implement this.
2 - Default Appearance contents are invalid
Your DA string contains BT and ET operators. If you look at section 12.7.3.3 Variable Text of ISO 32000-1, though, you'll see that during appearance creation the contents of DA are embedded into a BT .. ET envelope:
The appearance stream includes the following section of marked content, which represents the portion of the stream that draws the text:
/Tx BMC % Begin marked content with tag Tx
q % Save graphics state
… Any required graphics state changes, such as clipping …
BT % Begin text object
… Default appearance string ( DA ) …
… Text-positioning and text-showing operators to show the variable text …
ET % End text object
Q % Restore graphics state
EMC % End marked content
The default appearance string (DA) contains any graphics state or text state operators needed to establish the graphics state parameters, such as text size and colour, for displaying the field’s variable text. Only operators that are allowed within text objects shall occur in this string
But BT and ET are not allowed inside another BT .. ET text object!
Furthermore you add the text content inside your DA. As you see above, the text drawing operations are added right after your DA contents. Thus, you're in danger of having duplicate texts eventually.
3 - Default Resources dislocation
You have the Default Resources in the annotation dictionary. But the section 12.7.3.3 Variable Text of ISO 32000-1 mentioned above indicates:
The specified font value shall match a resource name in the Font entry of the default resource dictionary (referenced from the DR entry of the interactive form dictionary).
Thus, your DR will be ignored; those resources are expected elsewhere. So your choice of font will at best be ignored.
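To make items 2 and 3 concrete, a corrected version of the annotation above would look roughly like this (a sketch only: the DA keeps just the state operators, the BT/ET envelope and the text-showing operations are gone, the font resource has moved to the DR of the interactive form dictionary, and the appearance stream object 20 0 is assumed):
19 0 obj
<<
/Type/Annot
/Subtype/FreeText
/Contents(þÿ M Y A N N O T A T I O N)
/DA(/Ft18 12 Tf 1 0 0 rg)
/M(D:20140616141406+05'00')
/P 4 0 R
/Rect[ 188.814117 748.970520 467.849731 795.476456]
/T(þÿ A n n o t a t e P D F)
/AP<</N 20 0 R>>
>>
endobj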
I am working on similar things. I tried generating the appearance stream manually, but found it difficult. Actually the podofo sample code you posted above works, but the way it adds the appearance stream is wrong. You can't use SetAppearanceStream either; that is broken too.
podofo's PdfPainter can draw text; it generates the text stream. It looks like it works for a PdfPage only, but it actually works for an XObject too. It's really a hidden feature!
My code sample:
PdfFont *pFont = ...;
// Add XObject
PdfXObject xObj(borderPdfRect, pPdfMemDocument);
PdfPainter painter;
painter.SetPage(&xObj);
painter.Save(); // Save graphics settings
// Draw text
painter.SetFont(pFont);
painter.GetFont()->SetFontSize(fontSize);
painter.SetColor(self.textColor.color.red, self.textColor.color.green, self.textColor.color.blue);
PdfString pdfStr(reinterpret_cast<const pdf_utf8*>([self.text UTF8String]));
painter.DrawMultiLineText(textPdfRect, pdfStr);
painter.Restore();
painter.FinishPage();
// Add xObj as appearance stream. Don't use SetAppearanceStream
PdfDictionary dict;
dict.AddKey("N", xObj.GetObject()->Reference());
pTextAnno->GetObject()->GetDictionary().AddKey("AP", dict);