LLVM Pass to Parse Debug Info - llvm

I've been writing some different LLVM passes to learn more about the system and I've run into something that I can't seem to find a definitive reference on: is there any way to iterate over all the debug symbols in an IR file when they're available?
In particular, I'd like to be able to parse this:
!6 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "conf_struct", file: !7, line: 3, size: 64, elements: !8)
I've tried using the llvm::DebugInfoFinder and that has methods to return counts/etc. but it doesn't seem to have anything to return data on a DICompositeType.
I thought there might be something of use in getIdentifiedStructTypes but that doesn't seem to get any of the metadata associated with the structs.
Thank you for any guidance with this!

Related

How to attach debug information into an instruction in a LLVM Pass

I am trying to collect some information from my LLVM optimization pass during runtime. In other words, I want to know the physical address of a specific IR instruction after compilation. So my idea is to convert the LLVM metadata into LLVM DWARF data that can be used during runtime. Instead of attaching the filename and line numbers, I want to attach my own information. My question falls into two parts:
Here is a code that can get the Filename and Line number of an instruction:
if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction
unsigned Line = Loc->getLine();
StringRef File = Loc->getFilename();
StringRef Dir = Loc->getDirectory();
bool ImplicitCode = Loc->isImplicitCode();
}
But How can I set this fields? I could not find a relevant function.
How can I see the updated Debug Information during (filename and line numbers) runtime? I used -g for compiling but still I do not see the Debug Information.
Thanks
The function you need it setDebugLoc() and the info is only included in the result if you include enough of it. The module verifier will tell you what you're missing. These two lines might also be what's tripping you up.
module->addModuleFlag(Module::Warning, "Dwarf Version", dwarf::DWARF_VERSION);
module->addModuleFlag(Module::Warning, "Debug Info Version", DEBUG_METADATA_VERSION);

weka: how to generate libsvm training parameter

I am running libsvm through weka. Its output accuracy looks good to me, so I am planning to write a svm model by myself. However, weka didn't generate any training parameter, such as number of support vector. Therefore i cannot do anything. Searching the web, i found somebody said it would generate some parameters like the following:
optimization finished, #iter = 27
nu = 0.058475864943863545
obj = -1.871013102744184, rho = -0.19357337828800944
nSV = 9, nBSV = 0 `enter code here`
Total nSV = 9
but how come i didn't see any of them? any step that i missed? please help me. Thanks a lot.
Weka writes the output you mentioned to stderr.
So if you have started weka.sh or weka.bat from a terminal (or "command window" if you are on Windows), you should see that output appear in your terminal window after clicking "classify"
If you want to have access to this information via scripts, you can
redirect the output to a file and read in that file.
Here is how to edit the startup file weka.sh / weka.bat.
Edit this line (it is probably the last line) in order to write log info to a file instead of the terminal window:
java -cp $CP -Xmx8092m weka.gui.GUIChooser 2>>/opt/weka-stable/weka.log &
You can also add a properties file to your home directory to add more fine-grained behaviour.
https://weka.wikispaces.com/Properties+file
(You probably can also access information via the Weka Java API somehow, but you did not ask for that)

Identifying a Programming Language

So I have a software program that for reasons that are beyond this post, I will not include but to put it simply, I'd like to "MOD" the original software. The program is launched from a Windows Application named ViaNet.exe with accompanying DLL files such as ViaNetDll.dll. The Application is given an argument such as ./Statup.cat. There is also a WatchDog process that uses the argument ./App.cat instead of the former.
I was able to locate a log file buried in my Windows/Temp folder for the ViaNet.exe Application. Looking at the log it identifies files such as:
./Utility/base32.atc:_Encode32 line 67
./Utilities.atc:MemFun_:Invoke line 347
./Utilities.atc:_ForEachProperty line 380
./Cluster/ClusterManager.atc:ClusterManager:GetClusterUpdates line 1286
./Cluster/ClusterManager.atc:ClusterManager:StopSync line 505
./Cluster/ClusterManager.atc:ConfigSynchronizer:Update line 1824
Going to those file locations reveal files by those names, but not ending with .atc but instead .cat. The log also indicates some sort of Class, Method and Line # but .cat files are in binary form.
Searching the program folder for any files with the extension .atc reveals three -- What I can assume are uncompiled .cat files -- files. Low and behold, once opened it's obviously some sort of source code -- with copyright headers, lol.
global ConfigFolder, WriteConfigFile, App, ReadConfigFile, CreateAssocArray;
local mgrs = null;
local email = CreateAssocArray( null);
local publicConfig = ReadConfigFile( App.configPath + "\\publicConfig.dat" );
if ( publicConfig != null )
{
mgrs = publicConfig.cluster.shared.clusterGroup[1].managers[1];
local emailInfo = publicConfig.cluster.shared.emailServer;
if (emailInfo != null)
{
if (emailInfo.serverName != "")
{
email.serverName = emailInfo.serverName;
}
if (emailInfo.serverEmailAddress != "")
{
email.serverEmailAddress = emailInfo.serverEmailAddress;
}
if (emailInfo.adminEmailAddress != null)
{
email.adminEmailAddress = emailInfo.adminEmailAddress;
}
}
}
if (mgrs != null)
{
WriteConfigFile( ConfigFolder + "ZoneInfo.dat", mgrs);
}
WriteConfigFile( ConfigFolder + "EmailInfo.dat", email);
So to end this as simply as possible, I'm trying to find out two things. #1 What Programming Language is this? and #2 Can the .cat be decompiled back to .atc. files? -- and vice versa. Looking at the log it would appear that the Application is decoding/decompiling the .cat files already to interpret them verses running them as bytecode/natively. Searching for .atc on Google results in AutoCAD. But looking at the results, shows it to be some sort of palette files, nothing source code related.
It would seem to me that if I can program in this unknown language, let alone, decompile the existing stuff, I might get lucky with modding the software. Thanks in advance for any help and I really really hope someone has an answer for me.
EDIT
So huge news people, I've made quite an interesting discovery. I downloaded a patch from the vendor, it contained a batch file that was executing ViaNet.exe Execute [Patch Script].atc. I quickly discovered that you can use Execute to run both .atc and .cat files equally, same as with no argument. Once knowing this I assumed that there must be various arguments you can try, well after a random stroke of luck, there is one. That being Compile [Script].atc. This argument will compile also any .atc file to .cat. I've compiled the above script for comparison: http://pastebin.com/rg2YM8Q9
So I guess the goal now is to determine if it's possible to decompile said script. So I took a step further and was successful at obtaining C++ pseudo code from the ViaNet.exe and ViaNetDll.dll binaries, this has shed tons of understanding on the proprietary language and it's API they use. From what I can tell each execution is decompiled first then ran thru the interpreter. They also have nicknamed their language ATCL, still no idea what it stands for. While searching the API, I found several debug methods with names like ExecuteFile, ExecuteString, CompileFile, CompileString, InspectFunction and finally DumpObjCode. With the DumpObjCode method I'm able to perform some sort of dump of script files. Dump file for above script: http://pastebin.com/PuCCVMPf
I hope someone can help me find a pattern with the progress I made. I'm trying my best to go over the pseudo code but I don't know C++, so I'm having a really hard time understanding the code. I've tried to seperate what I can identify as being the compile script subroutines but I'm not certain: http://pastebin.com/pwfFCDQa
If someone can give me an idea of what this code snippet is doing and if it looks like I'm on the right path, I'd appreciate it. Thank you in advanced.

Efficient and stable YAML parser for cocos2d-x

I am developing a game using cocos2d-x and C++, and I need to load a bunch of YAML files for this application. I tried using the yaml-cpp library with quite good results.
The problem is that this library seems to be very unstable (at least under cocos2d-x on iOS), since almost 20% of the time it fails loading the same YAML file throwing "end of map not found", "invalid map element", or errors like these ones.
I followed the HowToParseADocument guide, so I think I got it correct. But, since it's not 100% reliable, I am looking for something more stable. Eg:
long size = 0;
unsigned char *yaml = FileUtils::getInstance()->getFileData("file.yml", "r", &size);
std::stringstream is;
is << yaml;
YAML::Parser parser(is);
YAML::Node doc;
while(parser.GetNextDocument(doc)) {
instance->settings = doc.Clone();
}
The parser usally breaks at the parser.GetNextDocument(doc) call. The document I am trying to read is plain YAML with key: value lists in this simple form:
# Comment
section1:
param1: value1
param2: value2
# Comment
section2:
param1: value1
param2: value2
Edit
I am not allowed to disclose the content of the original YAML file, but I can give you some information:
It only contains maps, and not arrays, aliases or other particular constructs
Those values are integers, float or strings
It has been linted with this free tool, with success.
The code I used to read it, posted up there, it's always in that form, and I do not modify it to make the app run correctly. It's just that the app starts and works or starts and does not work. Since I am changing nothing in the middle, I really do not understand what's happening.
It's a bit hard to guess at the solution because you won't provide an actual example, but:
Who owns the data at the unsigned char* returned by getFileData? If that function itself owns the data, then it is no longer valid after the function returns, and so all sorts of crazy stuff might happen.
To validate what's happening here (beyond looking at the implementation of getFileData), you could print out is.string() before calling YAML::Parser parser(is); and see if that prints the expected YAML.

llvm.stackprotect of LLVM

I just get started with LLVM. I am reading the code for stack protection which is located in lib/CodeGen/StackProtector.cpp. In this file, the InsertStackProtectors function will insert a call to llvm.stackprotect to the code:
// entry:
// StackGuardSlot = alloca i8*
// StackGuard = load __stack_chk_guard
// call void #llvm.stackprotect.create(StackGuard, StackGuardSlot)
// ...(Skip some lines)
CallInst::
Create(Intrinsic::getDeclaration(M, Intrinsic::stackprotector),
Args, "", InsPt);
This llvm.strackprotect(http://llvm.org/docs/LangRef.html#llvm-stackprotector-intrinsic) seems to be an intrinsic function of llvm, so I tried to find the source code of this function. However, I cannot find it...
I do find one line definition of this function in include/llvm/IR/Intrinsics.td, but it does not tell how it is implemented.
So my questions are:
Where can I find the code for this llvm.strackprotect function?
What is the purpose of these *.td files?
Thank you very much!
The .td file is LLVM's use of code-generation to reduce the amount of boilerplate code. In this particular case, ./include/llvm/IR/Intrinsics.gen is generated in the build directory and contains code describing the intrinsics specified in the .td file.
As for stackprotector, there's a bunch of code in the backend for handling it. See for instance lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp - in SelectionDAGBuilder::visitIntrinsicCall it generates the actual DAG nodes that implement this intrinsic