Make IR variable numbering contiguous - llvm

After I add new instructions to the LLVM IR, the SSA notation numbering of variables does not remain contiguous.
E.g.:
%mul=
%mul1=
%mul2=
If I add a new 'mul' instruction after %mul using
CreateMul(op1,opt2,"mul");
then the output becomes as follows: OUTPUT:
%mul=
%mul3=
%mul1=
%mul2=
Is there any way to make the numbering contiguous in the IR?

Do you want the library to go on and modify the names of all variables after your insertion point just to make the numbering contiguous? Why would you need this (?) - IMHO it doesn't make much sense. Keep in mind that the numbering is meaningless semantically - it's just a simple way to generate unique names.
If you insist, you can always just duplicate all instructions following the insertion point and they will have new variable names assigned. By "duplicate" I mean create new instructions which are clones of the existing ones and re-insert them into the IR instead of the existing ones.

Related

Dictionary access optimization describen - does it take place anywhere?

Most of us compiler nerds have read Google's paper on V8 object property access where the resulting technique is just (in-)directly accessing an array member. My question is:
Does anyone optimize the dictionary access the same way (assigning a fixed index to a (compile time) fixed key)? It doesn't have to be applicable everywhere but perhaps when it's compilation-unit wide? Or the dictionary is readonly? Or between the compilation units at a pass? Whatever, maybe even unrolling the dict. access or inlineing it using a fixed array index instead of a key.
I do know how constant time lookup dictionaries work, but maybe the proposed optimization takes place to further boost the compiled languages (e.g. C++) where the hardware is coached to deal with V-table-like structures at runtime.
Please, if you know any of that, give me a hunch. Thank you much!
TL;DR I want to know of an existing way to optimize dictionary access, (e.g. accessing std::map via array index), not of internal struct/object arrangment in a particular language
Doubtful
While this could in theory be possible (being that std::map implementation is part of the Standard library) I know of no C++ compiler that performs such a trick.
And they do not really have to: if you want array indexing in C++, you pick an array and index it (possibly with named constants).

C++ - Map-like data structure with structural sharing/immutability

Functional programming languages often work on immutable data structures but stay efficient by structural sharing. E.g. you work on some map of information, if you insert an element, you will not modify the existing map but create a new updated version. To avoid massive copying and memory usage, the map will share (as good as possible) the unchanged data between both instances.
I would be interested if there exists some template library providing such a map like data structure for C++. I searched a bit and found nothing, beside internal classes in LLVM.
A Copy On Write b+tree sounds like what your looking for. It basically creates a new snapshot of itself every time it gets modified but it shares unmodified leaf nodes between versions. Most of the implementations I've seen tend to be baked into append only database log files. CouchDB has a very nice write up on them. They are however "relatively easy", as far as map data structures go, to implement.
You can use an ordinary map, but marking every element with a timestamp or "map version number". If you want to remove elements too, use two marks. If you might reinsert removed elements, then you need a list of values and pairs of marks per element.
For example, you search for the key "foo", and you find that it had the value 5 in versions 0 to 3 (included), then it was "removed", and then it had the value -8 in versions 9 to current.
This eats a lot of memory and time, though.

Clojure list vs. vector vs. set

I'm new to Clojure. Apologies if it's a stupid question!
Should I use a set instead of a vector or list each time I don't care about the order of the items? What is the common criteria to decide between these three when order is not necessary?
It really depends on how you will be using the items.
If you will be searching for items, use a set.
If you will be processing it sequentially use a list.
If you will be chopping it into even sized chunks (like while sorting) use a vector.
If you need to count the length use a vector.
If you will be typing these by hand use a vector (to save a little quoting)
In practice most of the processing I see involves turning the data into a seq and processing that so the distinctions between list and vector are often a matter personal taste.
In general, you want a set when your primary concern is "Is this thing in this group?" Besides not preserving order, sets also only hold a given value once. So if you care about the precise placement of values, a vector is more what you want. If you mainly care about testing for membership, a set is more appropriate.
Yes, use a set. Unless you have very, very good reasons to pick something else (performance, memory use, ...) the set is the correct choice.
Remember that programming is primarily about communicating with the human reader of your code and not with the computer. By using a set you make it totally clear that the order of elements is irrelevant (and you are not expecting duplicate values) helping the reader understand your intentions and your own mental mind set.

Could the S of S.O.L.I.D be extended for every single element of the code?

The S of the famous Object Oriented Programming design stands for:
Single responsibility principle, the notion that an object should have
only a single responsibility.
I was wondering, can this principle, be extended even to arrays, variables, and all the elements of a program?
For example, let's say we have:
int A[100];
And we use it to store the result of a function, but somehow we use the same A[100] to check, for example, what indexes of A have we already checked and elaborated.
Could this be considered wrong? Shouldn't we create another element to store, for example, the indexes that we have already checked? Isn't this an hint of future messy code?
PS: I'm sorry if my question is not comprehensible but English is not my primary language. If you have any problem understanding the point of it please let me know in a comment below.
If same A instance is used in different program code portions you must follow this principle. If A is a auxiliary variable, local one for example, I think you don't need to be care about it.
If you are tracking the use of bits of the array that have been updated, then you probably shouldn't be using an array, but a map instead.
In any case, if you need that sort of extra control over the array, then basically, you should be considering a class that contains both the contents of the array and the various information about what has and hasn't been done. So your array becomes local to the class object, as do your controls, and voila. You have single responsibility again.

What are some good methods to replace string names with integer hashes

Usually, entities and components or other parts of the game code in data-driven design will have names that get checked if you want to find out which object you're dealing with exactly.
void Player::Interact(Entity *myEntity)
{
if(myEntity->isNearEnough(this) && myEntity->GetFamilyName() == "guard")
{
static_cast<Guard*>(myEntity)->Say("No mention of arrows and knees here");
}
}
If you ignore the possibility that this might be premature optimization, it's pretty clear that looking up entities would be a lot faster if their "name" was a simple 32 bit value instead of an actual string.
Computing hashes out of the string names is one possible option. I haven't actually tried it, but with a range of 32bit and a good hashing function the risk of collision should be minimal.
The question is this: Obviously we need some way to convert in-code (or in some kind of external file) string-names to those integers, since the person working on these named objects will still want to refer to the object as "guard" instead of "0x2315f21a".
Assuming we're using C++ and want to replace all strings that appear in the code, can this even be achieved with language-built in features or do we have to build an external tool that manually looks through all files and exchanges the values?
Jason Gregory wrote this on his book :
At Naughty Dog, we used a variant of the CRC-32 algorithm to hash our strings, and we didn't encounter a single collision in over two years of development on Uncharted: Drake's Fortune.
So you may want to look into that.
And about the build step you mentioned, he also talked about it. They basically encapsulate the strings that need to be hashed in something like:
_ID("string literal")
And use an external tool at build time to hash all the occurrences. This way you avoid any runtime costs.
This is what enums are for. I wouldn't dare to decide which resource is best for the topic, but there are plenty to choose from: https://www.google.com/search?q=c%2B%2B+enum
I'd say go with enums!
But if you already have a lot of code already using strings, well, either just keep it that way (simple and usually enough fast on a PC anyway) or hash it using some kind of CRC or MD5 into an integer.
This is basically solved by adding an indirection on top of a hash map.
Say you want to convert strings to integers:
Write a class wraps both an array and a hashmap. I call these classes dictionaries.
The array contains the strings.
The hash map's key is the string (shared pointers or stable arrays where raw pointers are safe work as well)
The hash map's value is the index into the array the string is located, which is also the opaque handle it returns to calling code.
When adding a new string to the system, it is searched for already existing in the hashmap, returns the handle if present.
If the handle is not present, add the string to the array, the index is the handle.
Set the string and the handle in the map, and return the handle.
Notes/Caveats:
This strategy makes getting the string back from the handle run in constant time (it is merely an array deference).
handle identifiers are first come first serve, but if you serialize the strings instead of the values it won't matter.
Operator[] overloads for both the key and the value are fairly simple (registering new strings, or getting the string back), but wrapping the handle with a user-defined class (wrapping an integer) adds a lot of much needed type safety, and also avoids ambiguity if you want the key and the values to be the same types (overloaded[]'s wont compile and etc)
You have to store the strings in RAM, which can be a problem.