Micron SDRAM bit decay (refresh issue)

I am using Micron SDRAM "MT48LC8M16A2P" with Cirrus Logic EP9307 microprocessor.
I am also running an RTOS on the system.
The SDRAM refresh interval is set to 5 us in the processor's refresh register, versus the 15.625 us specified by the datasheet, so refresh should be running more often than required.
The system does not use a low-power mode, so no self-refresh commands are ever sent to the SDRAM.
Observation:
-> Once multitasking starts, I observe bit rot in random sections of the SDRAM. Out of nowhere I hit a data abort after about 10 minutes of runtime.
-> Known sections of data memory change without being written to.
-> I was able to avoid the issue by adding a cyclic refresh task that touches every SDRAM row, so that each row gets explicitly refreshed (see the sketch after this list).
-> However, I still observe bit rot in the memory cells as soon as I connect the emulator to debug the code.
-> Normal read and write operations to the SDRAM show no issues.
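For reference, the workaround task looks roughly like this. This is a minimal sketch only: the base address, row stride, and row count are placeholders that depend on the EP9307's SDRAM address mapping, not actual values from my board.

#include <stdint.h>

#define SDRAM_BASE   0xC0000000u   /* placeholder: SDRAM base address in the memory map */
#define ROW_STRIDE   0x400u        /* placeholder: bytes between consecutive row addresses */
#define ROW_COUNT    4096u         /* MT48LC8M16A2: 4096 rows per bank */

/* Reading one word from every row opens (and therefore refreshes) that row,
   independently of the controller's auto-refresh timer. Called periodically
   from a low-priority RTOS task. Note: assumes this region is uncached (or
   the cache is bypassed); otherwise the reads may never reach the SDRAM. */
void sdram_touch_rows(void)
{
    volatile uint32_t *base = (volatile uint32_t *)SDRAM_BASE;
    for (uint32_t row = 0; row < ROW_COUNT; ++row) {
        (void)base[row * (ROW_STRIDE / sizeof(uint32_t))];
    }
}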
Questions:
-> Just to confirm my suspicion: could this be a refresh issue, and has anyone faced a similar situation?
-> I have only done a one-time configuration of the EP9307's internal SDRAM controller. Is there any configuration that needs to be updated at runtime?
Thanks in advance.
-Gaurav

Related

Why does my program run faster on first launch than on next launches?

I have been working for 2.5 years, in my leisure time, on a personal flight sim project written in C++ using OpenGL on a Windows 7 PC.
I recently had to move to Windows 10. The hardware is exactly the same. I reinstalled Code::Blocks.
It turns out that on the first launch of my project after system start, performance is OK, similar to what I used to see with Windows 7. But the second, third, and all subsequent launches give lower performance, with noticeably less fluid frame rates than the first run, detectable by eye. This never happened with Windows 7.
Every time I start my system, the first run is fast and the next ones are slower.
I had a look at Task Manager during some runs. The first run is handled by one of the 4 cores of my CPU (Core i5-6500) at approximately 85%. For the next runs, the load is spread across the 4 cores. During those slower runs on 4 cores, I tried to modify the affinity and direct my program to only one core, without significant improvement in performance. The selected core was working at full load, though.
My C++ code doesn't explicitly use any threading functions at this stage. From my modest programmer's point of view, there is only one main thread running in main(). In Task Manager, I can see that some 10 to 14 threads are alive when my program runs. I guess (wrongly?) that they are created implicitly by the joysticks, TrackIR, or other communication tasks with the GPU...
Could it come from memory not being correctly freed when my program stops? I thought Windows would free it properly, even if I forgot some 'delete' after using 'new'.
Has anyone encountered a similar situation? Any explanations come to mind?
Any suggestions to better understand these facts? Obviously, my ultimate goal is a consistent performance level regardless of the number of launches.
(Task Manager screenshots of the first and second runs were attached here.)
Well, I ran into problems when switching clients to Windows 10 at my work, too. Here are a few I encountered, all because Windows 10 changed its process scheduling, creating a lot of issues such as:
lock-free ("blockless") thread synchronization techniques from older Windows versions no longer working
A well-placed Sleep() sometimes helps. By the way, similar problems were encountered when switching from Windows 2000 to Windows XP.
huge slowdowns and frequent multi-second freezes in older single-threaded apps
Usually, setting the affinity to a single core solves this. You can also do it in Task Manager just to check, and if it helps you can do it in code too. Here is an example of how to do it with WinAPI (see also: Cache size estimation on your system?):
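A minimal WinAPI sketch of the idea (the mask 0x1 pins to core 0 and is only an illustration):

#include <windows.h>

int main()
{
    // Pin the whole process to core 0 (bit 0 of the affinity mask).
    if (!SetProcessAffinityMask(GetCurrentProcess(), 0x1))
        return 1;

    // Alternatively, pin only the current thread.
    SetThreadAffinityMask(GetCurrentThread(), 0x1);

    // ... run the workload here ...
    return 0;
}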
messed-up driver timings causing zombie processes, or even total freezes and/or BSODs
I deal with USB in my work and it is sometimes a nightmare on Windows 10. On top of this, Windows 10 tends to force the wrong drivers onto devices (graphics cards, custom USB systems, etc.).
apps being automatically frozen or closed if they do not respond to their WndProc in time
In Windows 10 the timeout is much, much smaller than in older versions. If this is your case, you can try running in compatibility mode for an older Windows (set in the icon's properties on the desktop), although that does not help with #1 and #2, or change the app's code to respond faster. For example, in VCL you can call ProcessMessages from inside blocking code to remedy this... Or you can use threads for the heavy lifting; just be careful with rendering and WinAPI, as calling some WinAPI functions (anything window/visual related) from outside the main thread causes havoc...
On top of all this, old IDEs (especially for MCUs) no longer work properly, and newer ones are usually much worse to work with (or unusable because they lack functionality present in older versions), so I have stayed faithful to Windows 7 for development purposes.
If none of the above helps, try to log how long some of your tasks take; it might show you which part of the code is the problem. I usually do this with a timing graph like this:
Both the x and y axes are time, and each task has its own color and row in the graph. The graph scrolls in time (to the left in my case) and has an adjustable time scale. The numbers show the current and maximum (or sliding-average) values.
This way I can see whether a task is taking too much time or even overlapping its next execution. Peaks are also clearly visible, and all of this runs at runtime without any debug tools that might change the execution behavior.
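For the logging itself (without the graph), a minimal sketch is to timestamp each task with std::chrono and print its duration; the task names in the usage comment are placeholders:

#include <chrono>
#include <cstdio>

// Run one task and print how long it took, in milliseconds.
template <typename F>
void timed(const char *name, F &&task)
{
    auto t0 = std::chrono::steady_clock::now();
    task();
    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("%s: %.3f ms\n", name, ms);
}

// Usage inside the main loop (illustrative names):
//   timed("update", [&]{ update_simulation(); });
//   timed("render", [&]{ render_frame(); });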

Cassandra crashes with Out Of Memory within minutes after starting

We have a Cassandra cluster with 3 nodes and replication factor 3 on AWS using EC2Snitch.
Instance type is c5.2xlarge (8 core and 16GB RAM).
The cluster had been working fine, but since yesterday evening the Cassandra process on all the nodes suddenly started crashing. They are set to restart automatically, but they then crash with an Out of Memory (heap space) error within 1 to 3 minutes of starting.
Heap configs:
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
After this, we tried increasing the node size to r5.4xlarge (128 GB memory) and assigned a 64 GB heap, but the same thing still happens, irrespective of whether all 3 nodes are started or only one node at a time. We noted that the first garbage collection happens after some time, then further collections follow within seconds of each other, failing to free any more memory, until the process eventually crashes.
We are not sure what is being pulled into memory immediately after starting.
Other parameters:
Cassandra version: 2.2.13
Database size: 250 GB
hinted_handoff_enabled: true
commitlog_segment_size_in_mb: 64
memtable_allocation_type: offheap_buffers
Any help here, would be appreciated.
Edit:
We found that there is a particular table which, when queried, causes the Cassandra node to crash.
cqlsh:my_keyspace> select count(*) from my_table ;
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
So we think it is related to the data in this particular table being corrupt or very large.
Thanks.
Some quick observations:
If you're building a new cluster, use the latest 3.11.x version. There's no point in building new on 2.2.
Based on your settings, it looks like you're using CMS GC. If you're not overly familiar with GC tuning, you may get more stability by switching to G1, and not specifying a HEAP_NEWSIZE (G1 figures out Eden sizing on its own).
If you're stuck on CMS, the guidance of setting HEAP_NEWSIZE at 100 MB x cores is wrong. To avoid new->old gen promotion, set HEAP_NEWSIZE to 40%-50% of the total heap size and increase MaxTenuringThreshold to something like 6-8.
On a 16 GB RAM machine with CMS GC, I would use an 8 GB heap, and flip memtable_allocation_type: offheap_buffers back to heap_buffers.
Set commitlog_segment_size_in_mb back to 32. Usually when folks need to mess with that, it's to lower it, unless you've also changed max_mutation_size_in_kb.
You haven't mentioned what the application is doing when the crash happens. I suspect that a write-heavy load is happening. In that case, you may need more than 3 nodes, or look at rate-limiting the number of in-flight writes on the application side.
Additional info to help you:
CASSANDRA-8150 - A Cassandra committer discussion on good JVM settings.
Amy's Cassandra 2.1 Tuning Guide - Amy Tobey's admin guide has a lot of wisdom on good default settings for cluster configuration.
Edit
We are using G1 GC.
It is very, VERY important that you not set a heap new size (Xmn) with G1. Make sure that gets commented out.
select count(*) from my_table ;
Yes, unbound queries (queries without WHERE clauses) will absolutely put undue stress on a node. Especially if the table is huge. These types of queries are something that Cassandra just doesn't do well. Find a way around using/needing this result.
You might be able to engineer this to work by setting your paging size smaller (driver side), or by using something like Spark. Or maybe by querying by token range and totaling the result on the app side. But you'll be much better off not doing it.
In addition to the GC and memory tuning suggestions by #aaron, you should also check that you are using the right compaction strategy for your data.
https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/configChooseCompactStrategy.html#Whichcompactionstrategyisbest
You should also check for corrupt SSTables, as trying to fetch corrupted data will also manifest in the same way (see, for example, https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/tools/toolsScrub.html).

Webcam disabled by AVG antivirus

I am creating a program with Qt on Windows using OpenCV.
When I launch the program, sometimes the webcam won't start: cam.open(0) returns 1 (open successful) but the frames are empty. I spent many hours on this and finally pinpointed the problem to "AVG Protection". About 50% of the time I launched the program, I was receiving empty frames. With AVG uninstalled, it works 100% of the time.
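A minimal sketch that reproduces the check described above (assuming OpenCV's VideoCapture; the frame count and return codes are arbitrary):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cam;
    if (!cam.open(0))                 // open itself reports success even when frames never arrive
        return 1;

    cv::Mat frame;
    int emptyFrames = 0;
    for (int i = 0; i < 30; ++i) {    // grab a handful of frames as a sanity check
        cam.read(frame);
        if (frame.empty())
            ++emptyFrames;            // with AVG interfering, every frame comes back empty
    }
    return (emptyFrames == 30) ? 2 : 0;   // 2 = camera "open" but delivering no data
}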
I guess AVG was sometimes detecting my program as malicious or something. I tried adding the program to the AVG exceptions, but it did not change anything.
Could I do something programmatically to prevent this?
Thank you very much,
Alex
We have the same problem. A few days ago, customers started reporting that our application no longer receives video data from the camera.
No errors or exceptions are raised.
Yesterday I reproduced this issue on a laptop with AVG installed. From what I can find in the logs, the camera is found and started by the application, but it does not receive any frames from it. Specifically, the presentFrame() method in dscamerasession.cpp is no longer called back by the Windows process, as it used to be.
There is a question at AVG support:
https://support.avg.com/answers#!/feedtype=SINGLE_QUESTION_DETAIL&dc=All&criteria=ALLQUESTIONS&id=906b0000000DlgTAAS
Their answer is:
Please follow the instructions to change the Firewall settings to check the status : Open AVG Zen -> Internet Security -> Click on Menu in the top right corner -> Settings -> Components -> Firewall -> Customize -> Network Profiles -> Change the networks from Public to Private (If it is in Private, change it to Public).
Then check whether you are able to access it without any issues

Profiling a legacy application

I am using an old version of a Metastorm workflow designer.
We support this while we rewrite it in Microsoft technologies.
After a few changes, the "MAP" (*.epc) has become exceedingly slow to work with and to "PUBLISH".
The publish writes the map and its binaries to the DB, which a service then picks up and executes.
However, the publish now "hangs" and never completes: what used to take 15 minutes runs for more than 3 hours without finishing.
I can see the CPU is being hammered, but memory seems fine.
I ran Process Monitor, but it does not show me much, which leads me to believe either that the process is doing something out of the ordinary or that the map has grown to a point that is leading it to destruction.
My question: How else can I profile this black box exe?

How to Disable Dynamic Frequency Scaling?

I would like to do some microbenchmarks, and try to do them right. Unfortunately dynamic frequency scaling makes benchmarking highly unreliable.
Is there a way to programmatically (C++, Windows) find out whether dynamic frequency scaling is enabled? If so, can it be disabled from a program?
I've tried using a warm-up phase that loads the CPU at 100% for a second before the actual benchmark runs, but this turned out not to be reliable either.
UPDATE: Even when I disable SpeedStep in the BIOS, CPU-Z shows that the frequency changes between 1995 and 2826 MHz.
In general, you need to do the following steps:
Call CallNtPowerInformation() with SystemPowerCapabilities as the InformationLevel parameter, set lpInputBuffer to NULL and nInputBufferSize to 0, then point lpOutputBuffer at a SYSTEM_POWER_CAPABILITIES structure and set nOutputBufferSize to the size of that structure. After this first call, the SYSTEM_POWER_CAPABILITIES structure contains the current system power capabilities. To check whether the system supports processor throttling, read the value of ProcessorThrottle (see the sketch after these steps).
There are two other members we are interested in: ProcessorMinThrottle and ProcessorMaxThrottle. They represent the minimum and maximum levels of system processor throttling supported, expressed as percentages. If both members are already at 100%, CPU throttling is currently disabled, so you don't need to reconfigure it.
To disable CPU throttling, you need to set ProcessorMinThrottle and ProcessorMaxThrottle to 100%. To do this, call CallNtPowerInformation() again with SystemPowerCapabilities as the InformationLevel parameter, but this time set lpInputBuffer to a SYSTEM_POWER_CAPABILITIES structure in which the two members have been set to 100%. I'm sure you know what to do next.
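A minimal sketch of the query step from the first item above (read-only; link against PowrProf.lib; checking the return value against 0 assumes STATUS_SUCCESS == 0):

#include <windows.h>
#include <powrprof.h>
#include <cstdio>
#pragma comment(lib, "PowrProf.lib")   // MSVC; otherwise add powrprof to the linker inputs

int main()
{
    SYSTEM_POWER_CAPABILITIES caps = {};
    auto st = CallNtPowerInformation(SystemPowerCapabilities,
                                     nullptr, 0,              // no input buffer for a query
                                     &caps, sizeof(caps));    // output: capabilities structure
    if (st != 0)                                              // 0 == STATUS_SUCCESS
        return 1;

    std::printf("ProcessorThrottle:    %d\n", caps.ProcessorThrottle);
    std::printf("ProcessorMinThrottle: %d%%\n", caps.ProcessorMinThrottle);
    std::printf("ProcessorMaxThrottle: %d%%\n", caps.ProcessorMaxThrottle);
    return 0;
}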
In a non-programmatic way, you can also get/set Windows power options using the built-in command-line tool PowerCfg.
Further Reading
Power Management
Power Management Functions
So far, none of the CallNtPowerInformation options above worked for me. The relevant ProcessorThrottle field of SYSTEM_POWER_CAPABILITIES was FALSE, and changing some SYSTEM_POWER_POLICY settings didn't work.
However, https://www.geeks3d.com/20170213/how-to-disable-intel-turbo-boost-technology-on-a-notebook/#_24 outlines a way to make an option available in the power management settings.
With ProcMon, I was able to trace it back to the following registry manipulations:
Read the ActivePowerScheme SZ value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\User\PowerSchemes to get the active power plan (see the sketch after this list)
Set the ACSettingIndex and/or DCSettingIndex DWORD under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Power\User\PowerSchemes\<above active power plan GUID>\54533251-82be-4824-96c1-47b60b740d00\be337238-0d82-4146-a960-4f3749d470c7 to 0 (Disabled, or whatever you choose) instead of 2 (High)
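The first step (reading the active scheme GUID) needs no special rights; a minimal sketch, with the value and key names taken from the list above:

#include <windows.h>
#include <cstdio>
#pragma comment(lib, "Advapi32.lib")   // MSVC; registry API

int main()
{
    char guid[128] = {};
    DWORD size = sizeof(guid);
    // Read the GUID of the active power plan (step 1 above).
    LSTATUS rc = RegGetValueA(HKEY_LOCAL_MACHINE,
                              "SYSTEM\\CurrentControlSet\\Control\\Power\\User\\PowerSchemes",
                              "ActivePowerScheme",
                              RRF_RT_REG_SZ, nullptr, guid, &size);
    if (rc != ERROR_SUCCESS)
        return 1;
    std::printf("Active power scheme: %s\n", guid);
    // Writing ACSettingIndex/DCSettingIndex under this GUID (step 2) needs ownership
    // changes or admin rights, which is why powercfg below is the preferable route.
    return 0;
}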
Unfortunately, the relevant keys are owned by the system, which means you either have to prompt the user (who must have admin access) to change the permissions of the key, or you have to use powercfg to manipulate the setting. The latter is preferable and actually seems to work, even without admin access (courtesy of https://learn.microsoft.com/en-us/windows-server/administration/performance-tuning/hardware/power/power-performance-tuning#processor-performance-boost-mode):
powercfg -setacvalueindex scheme_current sub_processor PERFBOOSTMODE 0
powercfg -setdcvalueindex scheme_current sub_processor PERFBOOSTMODE 0
powercfg -setactive scheme_current
In Windows XP and later, CPU speed is managed by the power policy. Doesn't setting "Max performance" mode in the Windows power management dialog turn off the scaling?
There are also some third-party tools, SpeedSwitchXP for example.
Programmatically, this could be done, I suppose, using the CallNtPowerInformation function.