Finding a symbol in a referenced assembly using Roslyn Code Analysis - roslyn

I need to search the referenced assemblies of the compilation under analysis for types that have the classname specified, irrespective of the namespace it is within.
Essentially I need a function that searches the referenced assemblies in the same way the below method searches the compilation under analysis.
context.SemanticModel.Compilation.GetSymbolsWithName(classNameToFind, SymbolFilter.Type);
Is there a way to load a compilation abstraction using the assembly name? I am ok with crude ways as long as I do not have to use reflection....so that I can continue to work with ISymbol.

I don't think we have a special API for this; you can just go to the Compilation's GlobalNamespace and manually walk the namespace/type hierarchy.

Related

Is there a standard way to ensure that a piece of code is executed at global scope?

I have some code I want to execute at global scope. So, I can use a global variable in a compilation unit like this:
int execute_global_code();
namespace {
int dummy = execute_global_code();
}
The thing is that if this compilation unit ends up in a static library (or a shared one with -fvisibility=hidden), the linker may decide to eliminate dummy, as it isn't used, and with it my global code execution.
So, I know that I can use concrete solutions based on the specific context: specific compiler (pragma include), compilation unit location (attribute visibility default), surrounding code (say, make an dummy use of dummy in my code).
The question is, is there a standard way to ensure execute_global_code will be executed that can fit in a single macro which will work regardless of the compilation unit placement (executable or lib)? ie: only standard c++ and no user code outside of that macro (like a dummy use of dummy in main())
The issue is that the linker will use all object files for linking a binary given to it directly, but for static libraries it will only pull those object files which define a symbol that is currently undefined.
That means that if all the object files in a static library contain only such self-registering code (or other code that is not referenced from the binary being linked) - nothing from the entire static library shall be used!
This is true for all modern compilers. There is no platform-independent solution.
A non-intrusive to the source code way to circumvent this using CMake can be found here - read more about it here - it will work if no precompiled headers are used. Example usage:
doctest_force_link_static_lib_in_target(exe_name lib_name)
There are some compiler-specific ways to do this as grek40 has already pointed out in a comment.

How to GetSemanticModel for any syntax tree in referenced projects of project compilation

When I compile a project (using MSBuildWorkspace.Create().OpenProjectAsync().GetCompilationAsync(); I get a Compilation back which includes x SyntaxTrees for the x files within that project. But that project also references other projects. I can see the those projects within the References property of the Compilation, but it seems that calling Compliation.GetSemanticModel does not look through these.
For an arbitrary SyntaxTree that is somewhere within my project (including referenced projects) do I have to manually search through each referenced Compilation of the root project or is there are helper method or some other approach to this?
You have to call GetSemanticModel for the compilation that contains that tree. So if project A depends on B, and you've walked over to a Syntax Tree in B, you have to use the compilation for B to ask about B. That's because A doesn't actually "know" about anything in B per se, it's an opaque box.
Your best opportunity is probably to use the workspace API more. When you called OpenProjectAsync(), grab the Solution object from the .Solution property. From there you can walk across all the projects more directly, and if you have a tree you can also do Solution.GetDocument(tree).GetSemanticModelAsync().

How to best avoid double thunking in C++/CLI native types

Tradionally, I've been using MFC extension dll's and importing/exporting using dllimport/dllexport.
However, when the dll is changed to use /clr, this method becomes costly as calls can result in a double-thunk. I'm taking a massive performance hit at the moment and need to stop the double-thunks. The solution I've seen described suggests making sure everything is using __clrcall convention, but this does not work with dllexport.
Microsoft's own section on double-thunking suggests:
Similarly, if you export (dllexport, dllimport) a managed function, a native entry point is generated and any function that imports and calls that function will call through the native entry point. To avoid double thunking in this situation, do not use native export/import semantics; simply reference the metadata via #using (see #using Directive (C++)).
To me, this reads as though I can remove dllexport/dllimport from my classes and stick a #using in my stdafx.h. For native types, however, this results in LNK2028 (unresolved token) and LNK2019 (unresolved external symbol). It makes no difference whether I include the .lib in the linker or not; I still get this error.
So, my question is how best to avoid double-thunking and import native types from a C++/CLI library?
Regards
Nick
** UPDATE **
Some updates from testing.
As soon as the dll is compiled with /clr, double thunking occurs for native types (using dllexport/dllimport).
This can be mitigated by turning CLR support off on a file by file basis. This is a pain, and sometimes native types use the clr so this can't be done everywhere. And the callee must be compiled native too for it to work.
Methods can be marked __clrcall but this will result in a compilation error when mixed with dllexport. However, I have managed to make the following code work without a double-thunk:
// MFCCLRLIB_API is defined in the library only (as dllexport)
// but NOT defined when using (dllimport)
// MFCCLRLIB_CALL is defined as empty in the library,
// but __clrcall when using.
#ifndef _MFCCLRLIB
#define MFCCLRLIB_API
#define MFCCLRLIB_CALL __clrcall
#endif
class MFCCLRLIB_API ThunkHack
{
public:
ThunkHack();
ThunkHack(const ThunkHack&);
~ThunkHack();
};
class MFCCLRLIB_API ThunkHackCaller
{
public:
ThunkHackCaller(void);
~ThunkHackCaller(void);
virtual void MFCCLRLIB_CALL UseThunkClass(ThunkHack thunk);
};
This compiles and I can use the caller class now from outside the library and it doesn't result in a double thunk. Which is what I want. I'm concerned, however, that this is not the way to do it; I haven't read anything that suggests this approach is safe.
I'd really like some guidelines on how to used mixed-mode C++ libraries effeciently to avoid performance hits of the like we are seeing.
-Nick

How do I rectify inconsistent Unresolved External errors? [duplicate]

This question already has answers here:
What is an undefined reference/unresolved external symbol error and how do I fix it?
(39 answers)
Closed 9 years ago.
I'm trying to wrap my head around C++ developement using the SFML library. I'm following a tutorial (http://www.gamefromscratch.com/page/Game-From-Scratch-CPP-Edition-Part-7.aspx), and using visual studio 2010.
A problem I keep running into regards unresolved externals. I'm really struggling with this, because unlike most errors I run into, it doesn't seem to a) have anything to do with the code, and b) doesn't behave consistently. Rather than give y'all a specific example and ask for help solving that one example, I'm hoping to develop a more reliable way of attacking these problems. I'll give you an outline of a common occurance though.
I have a solution with 8 header files and 8 cpp files that correspond to them. The solution is stable: It compiles and runs with no errors or warnings.
I'll go into a header file and add this line:
virtual void DoNothing();
I'll then go into the matching cpp file and write the method:
void DoNothing(){};
I compile and run, and get 5 unresolved external errors. They don't point to any line of code, so I don't really know how to fix them, but I obviously did something wrong. Fair enough. Trying to get back to a stable state, I delete the two lines of code I had inserted, and compile. Even though the code is identical to the last stable state, I get the same unresolved external errors.
Trying random things, I go into another cpp file and reverse the order of two included header files. The game compiles now. If I switch the order of the included header files back, it compiles.
What the hell are unresolved external errors? Why don't they seem to behave consistently with the code I've entered? How do I read them to find out what the problem is, and how do I avoid them in the first place?
Thank you.
ps: If there are more specific details I should provide, please just let me know.
"Unresolved External" errors mean that your code is referring to something (usually a function or method, but can be a variable too) that does not exist. These are link errors, and not compile errors; that's why you don't get a line number and more helpful error messages.
Let me give you a little background on how C++ code is turned into an executable (and keep in mind that I'm simplifying things a bit.)
Each C++ source file (and not header file) in your project is compiled separately. A ".cpp" file and all the headers it includes are compiled into what is called an object file or object code. (These files have a ".obj" or ".o" extension.) You can also think of library files (that is ".lib" files on Windows and ".a" files on Linux) as a collection of these object files, stored for later use.
To produce the executable programs (e.g. the EXE or DLL file on Windows) all these object files are linked together are voila!
Now, the important thing here is that each source file is compiled in isolation and independent from other source files. So, if the code in one file calls a function that is implemented in another file, the compiler won't see the actual body of that function and can only assume that as long as the declaration of the called function is visible (i.e. the prototype, i.e. the line you write in headers,) then these files are going to be linked together eventually and will leave the task of actually making the call to the linker. This usually means that as long as you include the right headers, your compiler is going to be happy.
But the linker is going to be more tenacious and pedantic. At link time, you really really need to provide the body (i.e. the implementation) of all the functions that you use all over the project. It is your task to make sure that all the right object files and libraries are linked together and the implementation of each used function exists somewhere among them exactly once (no more, no less.)
This brings us to your problem. When you get an "unresolved external" linker error, this means that the body of a function you've called does not exist anywhere in object files and libraries that you are linking together.
Obviously, one of two things is happening. Either you have included the header for an external library, but have forgotten to link in the library file itself (which is not your problem here) or you've declared (i.e. written the prototype for) a function but have forgotten to implement its body.
Keep in mind that the linker is really strict here. If you declare something like this in your class:
class Foo {
void bar (int x);
};
and then in your ".cpp" file, implement this function:
void bar (int x)
{
// Do nothing
}
then you'll get an unresolved external error if you actually call Foo::bar() anywhere in your program, because the implemented bar() is not a method of Foo (you should have implemented void Foo::bar (int x) {}.) Similar things happen if you slightly misspell or get the type of the arguments wrong or whatnot.
Reading linker errors and making sense from them can be hard. Sometimes, the name that the linker is complaining about (the "symbol" it says it can't find) is all mangled beyond recognition. This has to do with *Application Binary Interface*s (ABI) and several decades of history and precedence. Anyways, most of the time, if you look closely and the link error message, you can see what the function name was and check your code (or libraries) and try again.
Also, though it's rare, it sometimes happens that in order to solve some link issues, you have to resort to completely rebuilding your project.
Every time I've seen behavior like this it has been because of a circular reference between projects. For example, project A has a reference to an object/symbol implemented in project B while at the same time project B has a reference to an object/symbol from project A. Every time you build your solution, the tools have to compile one project first, then the other. If you make a change to the second project to be compiled, the first one cannot see the change on the first round of compilations and the build fails. If you manage to manually build project B (against a now obsolete copy of library B), then the solution starts to build correctly. More complex cycles are possible (e.g. A depends on B, which depends on C, which depends on A). You don't mention multiple projects explicitly, but I bet you have them.
These circular references are common on large solutions that have been around for a long time and have grown slowly over time. People get in habit of adding links from everything to everything because they need one function from here, a struct from there...
Hunt down these dependencies. You should be able to do a full clean rebuild from nothing but the source code. Your dependency tree should look like... Well, a tree; not a graph.

Is it possible to write COM code in a static library and then link it to a DLL?

I am currently working on a project that has a number of COM objects written in C++ with ATL.
Currently, they are all defined in .cpp and .idl files that are directly compiled into the COM DLL.
To allow unit tests to be written easier, I am planning on moving the implementation of the COM objects out into a separate static library. That library can then be linked in to the main DLL, and the separate unit test project.
I am assuming that there's nothing particularly special about the code generated by ATL, and that this will work much like all other C++ code when it comes to linking with static libraries. However, I don't have too much actual knowledge of ATL myself so don't know if this is really the case.
Will this work as I'm expecting? Or are there pitfalls that I should look out for?
There are gotchas since LIBs are pulled in only if they are referenced, as opposed to OBJs which are explicitly included.
Larry Osterman discussed some of the subtleties a few years ago:
When I moved my code into a library, what happened to my ATL COM
objects?
A caveat: This post discusses details of how ATL7 works. For other
version of ATL, YMMV. The general principals apply for all
versions, but the details are likely to be different.
My group’s recently been working on reducing the number of DLLs
that make up the feature we’re working on (going from somewhere
around 8 to 4). As a part of this, I’ve spent the past couple of
weeks consolidating a bunch of ATL COM DLL’s.
To do this, I first changed the DLLs to build libraries, and then
linked the libraries together with a dummy DllInit routine (which
basically just called CComDllModule::DllInit()) to make the DLL.
So far so good. Everything linked, and I got ready to test the new
DLL.
For some reason, when I attempted to register the DLL, the
registration didn’t actually register the COM objects. At that
point, I started kicking my self for forgetting one of the
fundamental differences between linking objects together to make an
executable and linking libraries together to make an executable.
To explain, I’ve got to go into a bit of how the linker works. When
you link an executable (of any kind), the linker loads all the
sections in the object files that make up the executable. For each
extdef symbol in the object files, it starts looking for a public
symbol that matches the symbol.
Once all of the symbols are matched, the linker then makes a second
pass combining all the .code sections that have identical contents
(this has the effect of collapsing template methods that expand into
the same code (this happens a lot with CComPtr)).
Then a third pass is run. The third pass discards all of the
sections that have not yet been referenced. Since the sections
aren’t referenced, they’re not going to be used in the resulting
executable, so to include them would just bloat the executable.
Ok, so why didn’t my ATL based COM objects get registered? Well,
it’s time to play detective.
Well, it turns out that you’ve got to dig a bit into the ATL code to
figure it out.
The ATL COM registration logic gets picked in the CComModule
object. Within that object, there’s a method
RegisterClassObjects, which redirects to
AtlComModuleRegisterClassObjects. This function walks a list of
_ATL_OBJMAP_ENTRY structures and calls the RegisterClassObject
on each structure. The list is retrieved from the
m_ppAutoObjMapFirst member of the CComModule (ok, it’s really a
member of the _ATL_COM_MODULE70, which is a base class for the
CComModule). So where did that field come from?
It’s initialized in the constructor of the CAtlComModule, which
gets it from the __pobjMapEntryFirst global variable. So where’s
__pobjMapEntryFirst field come from?
Well, there are actually two fields of relevance,
__pobjMapEntryFirst and __pobjMapEntryLast.
Here’s the definition for the __pobjMapEntryFirst:
__declspec(selectany) __declspec(allocate("ATL$__a")) _ATL_OBJMAP_ENTRY* __pobjMapEntryFirst = NULL;
And here’s the definition for __pobjMapEntryLast:
__declspec(selectany) __declspec(allocate("ATL$__z")) _ATL_OBJMAP_ENTRY* __pobjMapEntryLast = NULL;
Let’s break this one down:
__declspec(selectany): __declspec(selectany) is a directive to
the linker to pick any of the similarly named items from the section
– in other words, if a __declspec(selectany) item is found
in multiple object files, just pick one, don’t complain about it
being multiply defined.
__declspec(allocate("ATL$__a")): This one’s the one that makes
the magic work. This is a declaration to the compiler, it tells the
compiler to put the variable in a section named "ATL$__a" (or
"ATL$__z").
Ok, that’s nice, but how does it work?
Well, to get my ATL based COM object declared, I included the
following line in my header file:
OBJECT_ENTRY_AUTO(<my classid>, <my class>)
OBJECT_ENTRY_AUTO expands into:
#define OBJECT_ENTRY_AUTO(clsid, class) \
__declspec(selectany) ATL::_ATL_OBJMAP_ENTRY __objMap_##class = {&clsid, class::UpdateRegistry, class::_ClassFactoryCreatorClass::CreateInstance, class::_CreatorClass::CreateInstance, NULL, 0, class::GetObjectDescription, class::GetCategoryMap, class::ObjectMain }; \
extern "C" __declspec(allocate("ATL$__m")) __declspec(selectany) ATL::_ATL_OBJMAP_ENTRY* const __pobjMap_##class = &__objMap_##class; \
OBJECT_ENTRY_PRAGMA(class)
Notice the declaration of __pobjMap_##class above – there’s
that declspec(allocate("ATL$__m")) thingy again. And that’s where
the magic lies. When the linker’s laying out the code, it sorts
these sections alphabetically – so variables in the ATL$__a
section will occur before the variables in the ATL$__z section.
So what’s happening under the covers is that ATL’s asking the linker
to place all the __pobjMap_<class name> variables in the
executable between __pobjMapEntryFirst and __pobjMapEntryLast.
And that’s the crux of the problem. Remember my comment above about
how the linker works resolving symbols? It first loads all the items
(code and data) from the OBJ files passed in, and resolves all the
external definitions for them. But none of the files in the wrapper
directory (which are the ones that are explicitly linked) reference
any of the code in the DLL (remember, the wrapper doesn’t do much more
than simply calling into ATL’s wrapper functions – it doesn’t
reference any of the code in the other files.
So how did I fix the problem? Simple. I knew that as soon as the
linker pulled in the module that contained my COM class definition,
it'd start resolving all the items in that module. Including the
__objMap_<class>, which would then be added in the right location so that ATL would be able to pick it up. I put a dummy function call
called ForceLoad<MyClass> inside the module in the library, and
then added a function called CallForceLoad<MyClass> to my DLL
entry point file (note: I just added the function – I didn’t
call it from any code).
And voila, the code was loaded, and the class factories for my COM
objects were now auto-registered.
What was even cooler about this was that since no live code called
the two dummy functions that were used to pull in the library, pass
three of the linker discarded the code!