Roslyn: Reconstruct code from BasicBlocks in a CFG

In Roslyn, a control flow graph has a list of BasicBlocks that constitute the program. Is it possible to get the code statements from each block in order to reconstruct the entire code?

To my knowledge, this is not possible using Roslyn's built-in methods.
I achieved a similar result by constructing a dictionary that maps each block to the set of line numbers corresponding to it, using string matching. Additional filtering was done on top of that to improve accuracy for blocks that include locks, etc.
The result is not 100 percent accurate, but it comes pretty close in most cases.

Related

Fast and frequent file access while executing C++ code

I am looking for suggestions on how best to implement my code for the following requirements. During execution of my C++ code, I frequently need to access data stored in a dictionary, which itself is stored in a text file. The dictionary contains 100 million entries, and at any point in time, my code may query the data corresponding to any particular entry among those 100 million. There is no particular pattern to the queries, and not all entries are queried during the lifetime of the program. Also, the dictionary will remain unchanged during the program's lifetime. The data corresponding to each entry is not all of the same length. My dictionary's file size is ~24 GB, and I have only 16 GB of RAM. I need my application to be very fast, so I would like to know how best to implement such a system so that read access times are minimized.
I am also the one creating the dictionary, so I do have the flexibility to break it down into several smaller volumes. While thinking about what I can do, I came up with the following options, but I am not sure if either is good.
If I store the line offset for each entry in my dictionary from the beginning of the file, then to read the data for the corresponding entry, I can jump directly to the corresponding offset. Is there a way to do this using, say, ifstream without looping through all lines until the offset line? A quick search on the web seems to suggest this is not possible, at least with ifstream; are there other ways this can be done?
The other extreme thought was to create a single file for each entry in the dictionary, so I would have 100 million files. This approach has the obvious drawback of the overhead of opening and closing file streams.
In general, I am not convinced that either of the approaches I have in mind is good, so I would like some suggestions.
Well, if you only need key-value access, and if the data is larger than what fits in memory, the answer is a NoSQL database: a hash-type index for the key and arbitrary values. If you have no other constraints, like concurrent access from many clients or extended scalability, you can roll your own. The most important question for a custom NoSQL database is the expected number of keys, which determines the size of the index file. You can find rather good hashing algorithms around, and you will have to make a trade-off between a larger index file and a higher risk of collisions. In any case, unless you want to use a terabyte-sized index file, your code must be prepared for possible collisions.
A detailed explanation with examples is far beyond what I can write in a SO answer, but this should give you a starting point.
The next optimization is deciding what should be cached in memory. That depends on how you expect the queries to arrive. If the same key is unlikely to be queried more than once, you can probably just rely on the OS and filesystem cache, and a slight improvement would be memory-mapped files; otherwise, caching (of the index and/or the values) makes sense. Here again you can choose and implement a caching algorithm.
Or, if you think that is too complex for little gain, you can check whether one of the free NoSQL databases meets your requirements...
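For illustration, here is a minimal sketch of such a custom store in C++ (the class name, file layout, and fixed-size index record are assumptions, not a standard). The index, mapping a key hash to a byte offset and length, is small enough to keep in memory (100 million records at 20 bytes each is ~2 GB of raw index data), and std::ifstream::seekg then jumps straight to the record, with no line-by-line looping:

    // Minimal sketch of a custom on-disk key-value store (all names and the
    // file layout are illustrative assumptions). The index file holds
    // fixed-size records: 64-bit key hash, 64-bit byte offset, 32-bit length.
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Entry {
        std::uint64_t offset;   // byte offset of the value in the data file
        std::uint32_t length;   // length of the value in bytes
    };

    class DiskDict {
    public:
        DiskDict(const std::string& indexPath, const std::string& dataPath)
            : data_(dataPath, std::ios::binary) {
            std::ifstream idx(indexPath, std::ios::binary);
            std::uint64_t key, off;
            std::uint32_t len;
            while (idx.read(reinterpret_cast<char*>(&key), sizeof key) &&
                   idx.read(reinterpret_cast<char*>(&off), sizeof off) &&
                   idx.read(reinterpret_cast<char*>(&len), sizeof len)) {
                index_.emplace(key, Entry{off, len});
            }
        }

        // Returns the raw bytes for a key hash, or an empty vector if absent.
        // NOTE: hash collisions are ignored here; real code must handle them,
        // as this answer points out.
        std::vector<char> lookup(std::uint64_t keyHash) {
            auto it = index_.find(keyHash);
            if (it == index_.end()) return {};
            data_.seekg(static_cast<std::streamoff>(it->second.offset));  // direct jump
            std::vector<char> buf(it->second.length);
            data_.read(buf.data(), static_cast<std::streamsize>(buf.size()));
            return buf;
        }

    private:
        std::unordered_map<std::uint64_t, Entry> index_;
        std::ifstream data_;
    };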
Once you decide to use an on-disk data structure, this becomes less a C++ question and more a system-design question. You want to implement a disk-based dictionary.
The factors to consider from now on are: what are your disk parameters? Is it an SSD or an HDD? What is your average lookup rate per second? Are you fine with latencies of 20 µs to 10 ms for your Lookup() method?
On-disk dictionaries require random disk seeks. Such seeks have a latency of dozens of microseconds on an SSD and 3-10 ms on an HDD. There is also a limit on how many such seeks you can make per second; you can read this article, for example. The CPU stops being the bottleneck and IO becomes important.
If you want to pursue this direction, there are state-of-the-art C++ libraries that give you an on-disk key-value store (no need for an out-of-process database), or you can do something simple yourself.
If your application is a batch process rather than a server/UI program, i.e. you have another finite stream of items that you want to join against your dictionary, then I recommend reading about external algorithms like hash join or MapReduce. In those cases it's possible to organize your data in such a way that, instead of one huge 24 GB dictionary, you have 10 dictionaries of 2.4 GB each and sequentially load and join each one of them (a sketch follows below). But to say more, I need to understand what kind of problem you are trying to solve.
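A rough sketch of that partitioning step, with hypothetical file names and record format (the key is assumed to be the text before the first tab on each line). Each record is routed to one of ten volumes by hashing its key, so each 2.4 GB volume can later be loaded whole and joined against the group of queries that hash to it:

    // Sketch of hash-partitioning the dictionary into ten volumes
    // (file names and record format are hypothetical).
    #include <cstddef>
    #include <fstream>
    #include <functional>
    #include <istream>
    #include <string>
    #include <vector>

    std::size_t volumeOf(const std::string& key, std::size_t numVolumes = 10) {
        return std::hash<std::string>{}(key) % numVolumes;
    }

    void partitionDictionary(std::istream& in) {
        std::vector<std::ofstream> parts;
        for (int i = 0; i < 10; ++i)
            parts.emplace_back("dict_part_" + std::to_string(i) + ".txt");
        std::string line;
        while (std::getline(in, line)) {
            const std::string key = line.substr(0, line.find('\t'));
            parts[volumeOf(key)] << line << '\n';
        }
        // At query time, group queries by volumeOf(key) and load one
        // 2.4 GB volume at a time to answer its group.
    }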
To summarize, you need to design your system before coding the solution. Using mmap, tries, or the other tricks mentioned in the comments is a local optimization (if that); they are unlikely to be game-changers. I would not rush into exploring them before doing back-of-the-envelope computations to understand the main direction.

Scanning files around 58 GB in ColdFusion and finding a word count in that file

I have written code that loops through the file, scans the words, and finds the count of a provided word. But this only works when the data is somewhat less than 1 GB, with the request timeout set to around 30 minutes. Can anyone suggest a better solution for scanning the file and counting the word, so that it can handle such a huge amount of data without my having to increase the request timeout beyond 30 minutes? Is this possible in ColdFusion, or should I look at some other technology?
My Two Cents:
When it comes to processing large files in ColdFusion (or Lucee), I always consider using an alternative technology.
In order to process the file, you would need to load it into memory ahead of processing. If you were working directly in Java, you would use a buffer object and only keep the current line in memory at a time. If you delegate that work to ColdFusion, you have no guarantee that the lines become eligible for garbage collection, so it's possible that you end up eating your memory and slowing things down. In addition, you need to store each unique word in some sort of structure along with its word count, which means you'll have a few words used many times and many words used only once; another (potentially) large consumer of memory.
With these things considered, my go-to for larger files is either to use Java directly, to ensure that I'm using the correct buffering technique, or to create a utility in an alternative tool such as Python or Ruby and delegate the work via cfexecute. That said, there's no reason you can't do this in CF; you just need to look at what part of the process is slowing you down, deal with that first, and optimize (a streaming sketch follows below). Could you offload the job into a cfthread and pick up the results later? Would a better data structure or method of storage suit?
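To illustrate the buffering point, here is the streaming idea sketched in C++ (purely illustrative; the same line-at-a-time approach applies in Java or in a utility driven via cfexecute). Only the current line is ever held in memory, and since only one word matters here, a single running count replaces the per-word table:

    // Streaming word-count sketch: buffered, one line in memory at a time.
    #include <fstream>
    #include <sstream>
    #include <string>

    long long countWord(const std::string& path, const std::string& target) {
        std::ifstream in(path);          // buffered stream; no whole-file load
        std::string line, word;
        long long count = 0;
        while (std::getline(in, line)) { // previous lines become collectible
            std::istringstream words(line);
            while (words >> word)
                if (word == target) ++count;
        }
        return count;
    }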
tl;dr
It's impossible to be any more specific without seeing the data and the code.

Memory block alternative that ISN'T fixed in minor time step?

I have a model with some inputs that are fed into a CMEX S-Function via a Memory block; the S-Function provides outputs based on these inputs, and those outputs are fed back into the S-Function as inputs. Classic algebraic loop scenario. I was using a Memory block to prevent this because our solver is usually variable-step.
According to the MathWorks documentation, the Memory block (and the Unit Delay block as well) is fixed in minor time steps in terms of its output. I realize that the inputs can be either discrete or continuous, but the output only changes at major time steps and is held fixed during minor time steps.
Now, for various reasons, it is important to us that everything in the model gets updated in both minor and major time steps. I came up with a way of using the PWork vector to store the previous values of state points and forward them to the CMEX S-Function that we use. This prevents algebraic loops from occurring and also keeps the values available in minor time steps, but it is not as elegant as I would like.
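For what it's worth, a minimal sketch of this PWork approach as a Level-2 C MEX S-function follows (the function name, port widths, and update policy are illustrative assumptions, not the actual model code):

    /* Sketch: break the algebraic loop by latching the previous value in
       PWork. All names and sizes here are illustrative. */
    #define S_FUNCTION_NAME  prev_value_sfun   /* hypothetical name */
    #define S_FUNCTION_LEVEL 2
    #include "simstruc.h"
    #include <stdlib.h>

    static void mdlInitializeSizes(SimStruct *S)
    {
        ssSetNumSFcnParams(S, 0);
        if (!ssSetNumInputPorts(S, 1)) return;
        ssSetInputPortWidth(S, 0, 1);
        ssSetInputPortDirectFeedThrough(S, 0, 0); /* no feedthrough: no loop */
        if (!ssSetNumOutputPorts(S, 1)) return;
        ssSetOutputPortWidth(S, 0, 1);
        ssSetNumSampleTimes(S, 1);
        ssSetNumPWork(S, 1);                 /* one slot for the stored value */
    }

    static void mdlInitializeSampleTimes(SimStruct *S)
    {
        ssSetSampleTime(S, 0, CONTINUOUS_SAMPLE_TIME);
        ssSetOffsetTime(S, 0, 0.0);
    }

    #define MDL_START
    static void mdlStart(SimStruct *S)
    {
        real_T *prev = (real_T *)malloc(sizeof(real_T));
        *prev = 0.0;
        ssGetPWork(S)[0] = prev;
    }

    /* mdlOutputs also runs at minor time steps, so the forwarded value is
       available there as well. */
    static void mdlOutputs(SimStruct *S, int_T tid)
    {
        real_T *y = ssGetOutputPortRealSignal(S, 0);
        y[0] = *(real_T *)ssGetPWork(S)[0];
    }

    #define MDL_UPDATE
    static void mdlUpdate(SimStruct *S, int_T tid)
    {
        InputRealPtrsType u = ssGetInputPortRealSignalPtrs(S, 0);
        *(real_T *)ssGetPWork(S)[0] = *u[0]; /* latch for the next step */
    }

    static void mdlTerminate(SimStruct *S)
    {
        free(ssGetPWork(S)[0]);
    }

    #ifdef MATLAB_MEX_FILE
    #include "simulink.c"
    #else
    #include "cg_sfun.h"
    #endif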
Does anyone have any alternative suggestions?
PS: As an aside, I believe it is better to use the Memory block regardless of whether my solver is fixed-step or variable-step, because it will internally just become a Unit Delay block for fixed-step solvers. Is this assumption correct?
According to this article, using a fast-response transfer function may achieve a similar result without going into the minor step.

What is the most efficient way to read formatted data from a large file?

Options:
1. Reading the whole file into one huge buffer and parsing it afterwards.
2. Mapping the file to virtual memory.
3. Reading the file in chunks and parsing them one by one.
The file can contain quite arbitrary data but it's mostly numbers, values, strings and so on formatted in certain ways (commas, brackets, quotations, etc).
Which option would give me the greatest overall performance?
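For reference, option 2 on a POSIX system looks roughly like this (a sketch; Windows would use CreateFileMapping/MapViewOfFile instead):

    // Sketch of option 2: map the whole file into virtual memory (POSIX).
    #include <cstddef>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Returns a read-only pointer to the file contents, or nullptr on error.
    const char* mapFile(const char* path, std::size_t* sizeOut) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return nullptr;
        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
        void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                       PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  // the mapping stays valid after closing the descriptor
        if (p == MAP_FAILED) return nullptr;
        *sizeOut = static_cast<std::size_t>(st.st_size);
        return static_cast<const char*>(p);  // munmap() when done parsing
    }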
If the file is very large, you might consider using multiple threads with option 2 or 3. Each thread can handle a single chunk of the file/memory, and you can overlap IO and computation (parsing) this way (see the sketch below).
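A minimal sketch of that idea with option 3, assuming the file can be split at arbitrary byte boundaries (a real parser must also handle records that straddle chunk boundaries):

    // Sketch: N threads each read and parse one byte range of the file.
    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <thread>
    #include <vector>

    void parseRange(const std::string& path, std::streamoff begin, std::streamoff end) {
        std::ifstream in(path, std::ios::binary);
        in.seekg(begin);
        std::vector<char> buf(static_cast<std::size_t>(end - begin));
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        // ... parse buf here, overlapping with the other threads' IO ...
    }

    void parseParallel(const std::string& path, std::streamoff fileSize, unsigned n) {
        std::vector<std::thread> pool;
        const std::streamoff chunk = fileSize / n;
        for (unsigned i = 0; i < n; ++i) {
            const std::streamoff b = i * chunk;
            const std::streamoff e = (i + 1 == n) ? fileSize : b + chunk;
            pool.emplace_back(parseRange, path, b, e);
        }
        for (auto& t : pool) t.join();
    }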
It's hard to give a general answer to your question as choosing the "right" strategy heavily depends on the organization of the data you are reading.
Especially if there's a really huge amount of data to be processed, options 1 and 2 won't work anyway, as the available amount of main memory poses an upper limit on any such attempt.
Most probably the biggest gain in efficiency can be achieved by (re)structuring the data you are going to process.
Checking whether there is any chance to organize the data in a way that saves you from needlessly processing whole chunks would be the first spot I'd try to improve before addressing the problem mentioned in the question.
In terms of efficiency there's nothing but a constant factor to win by choosing any of the mentioned methods, while the right organization of your data might yield much bigger improvements. The bigger the data, the more important your decision becomes.
Some facts about the data that seem interesting enough to take into consideration include:
Is there any regular pattern to the data you are going to process?
Is the data mostly static or highly dynamic?
Does it have to be parsed sequentially or is it possible to process data in parallel?
It makes no sense to read the entire file all at once and then convert from text to binary data; it's more convenient to write, but you run out of memory faster. I would read the text in chunks and convert as you go. The converted data, in binary format instead of text, will likely take up less space than the original source text anyway.
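A sketch of that chunked read-and-convert loop (the chunk size is an assumption to tune):

    // Sketch of option 3: read fixed-size chunks and convert as you go.
    #include <fstream>
    #include <vector>

    void processInChunks(const char* path) {
        std::ifstream in(path, std::ios::binary);
        std::vector<char> buf(1 << 20);  // 1 MiB chunks, an arbitrary choice
        while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
               in.gcount() > 0) {
            const std::streamsize got = in.gcount();
            // ... parse buf[0..got), converting text to binary as you go;
            // carry any partial token over to the next chunk ...
            (void)got;
        }
    }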

Possible to distribute or parallel process a sequential program?

In C++, I've written a mathematical program (for diffusion limited aggregation) where each new point calculated is dependent on all of the preceding points.
Is it possible to have such a program work in a parallel or distributed manner to increase computing speed?
If so, what type of modifications to the code would I need to look into?
EDIT: My source code is available at...
http://www.bitbucket.org/damigu78/brownian-motion/downloads/
filename is DLA_full3D.cpp
I don't mind significant re-writes if that's what it would take. After all, I want to learn how to do it.
If your algorithm is fundamentally sequential, you can't make it fundamentally parallel.
What is the algorithm you are using?
EDIT: Googling "diffusion limited aggregation algorithm parallel" led me here, with the following quote:
DLA, on the other hand, has been shown [9,10] to belong to the class of inherently sequential or, more formally, P-complete problems. Therefore, it is unlikely that DLA clusters can be sampled in parallel in polylog time when restricted to a number of processors polynomial in the system size.
So the answer to your question is "all signs point to no".
Probably. There are parallel versions of most sequential algorithms, and for those sequential algorithms which are not immediately parallelisable there are usually parallel substitutes. This looks like one of those cases where you need to consider parallelisation, or parallelisability, before you choose an algorithm. But unless you tell us a bit (a lot?) more about your algorithm, we can't provide much specific guidance. If it amuses you to watch SOers argue in the absence of hard data, sit back and watch; but if you want answers, edit your question.
The toxiclibs website gives some useful insight into how one DLA implementation is done.
There is cilk, which is an enhancement to the C language (unfortunately not C++ (yet)) that allows you to add some extra information to your code. With just a few minor hints, the compiler can automatically parallelize parts of your code, such as running multiple iterations of a for loop in parallel instead of in series.
Without knowing more about your problem, I'll just say that this looks like a good candidate to implement as a parallel prefix scan (http://en.wikipedia.org/wiki/Prefix_sum). The simplest example of this is an array that you want to make a running sum out of:
1 5 3 2 5 6
becomes
1 6 9 11 16 22
This looks inherently serial (as all the points depend on the ones previous), but it can be done in parallel.
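For what it's worth, C++17 exposes this directly through its parallel algorithms. A sketch of the running-sum example above (whether DLA itself decomposes into a scan is a separate question):

    // Parallel inclusive scan over the running-sum example above (C++17;
    // on GCC, parallel execution policies require linking against TBB).
    #include <execution>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<int> v{1, 5, 3, 2, 5, 6};
        std::inclusive_scan(std::execution::par, v.begin(), v.end(), v.begin());
        for (int x : v) std::cout << x << ' ';   // prints: 1 6 9 11 16 22
    }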
You mention that each step depends on the results of all preceding steps, which makes it hard to parallelize such a program.
I don't know which algorithm you are using, but you could use multithreading for a speedup. Each thread would process one step but would have to wait for results that haven't yet been calculated (though it can work with the already-calculated results if they don't change over time). Essentially, you would need a locking/waiting mechanism so that a worker thread can wait for results that haven't yet been calculated but that it needs before it can continue (a generic sketch follows below).
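A generic sketch of such a waiting mechanism, not specific to DLA: workers block on a condition variable until the results up to the step they need have been published.

    // Generic wait-for-dependency sketch between worker threads.
    #include <condition_variable>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    std::mutex m;
    std::condition_variable cv;
    std::size_t stepsDone = 0;           // results [0, stepsDone) are ready
    std::vector<double> results;

    // Called by a worker that needs the results of the first `step` steps.
    void waitForResults(std::size_t step) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return stepsDone >= step; });
    }

    // Called by whichever thread finishes computing the next step.
    void publishResult(double value) {
        {
            std::lock_guard<std::mutex> lock(m);
            results.push_back(value);
            ++stepsDone;
        }
        cv.notify_all();                 // wake every waiting worker
    }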