I am using free-format Fortran 90 and the Intel ifort compiler to create a user subroutine that is then used in a finite element calculation in ABAQUS.
This routine works just fine on a single core.
However, when running in parallel mode (MPI), the simulation crashes with a segmentation fault (signal 11).
I tracked it down to the following code lines.
This version causes the crash:
BBAR1 = J**(-2d0/3d0)*( MATMUL( F1,TRANSPOSE(F1) ) )
whereas this version works:
BBAR1 = ( MATMUL( F1,TRANSPOSE(F1) ) )
BBAR1 = J**(-2d0/3d0)*BBAR1
It is driving me crazy. Does anyone have any idea why that is?
Grateful for any advice.
I finally found the solution to this.
Today I tried downgrading my Intel compiler from ifort 14.0 to ifort 12.1.5.
Now everything works just fine.
This is really odd. Somehow the parallel solver in ABAQUS does not work properly with the newest Intel release.
TL;DR: I want to change the version of LLDB that CLion (v2016.3.5) uses to LLDB v3.8.1. Can I do this? If so, how?
Longer explanation of the question:
CLion is a C++ IDE that I've been using for a few years now. Recently, they released version 2016.3.X. When they went from 2016.2 to 2016.3, they changed the built-in LLDB version from v3.8.1 to v3.9.0. This has caused a problem for me, as v3.9.0 doesn't seem to work correctly.
When I, say, run print some_var_name (while at a breakpoint), I get the error below:
Assertion failed: (D->getCachedLinkage() == LV.getLinkage()), function
getLVForDecl, file
/Applications/buildAgent/work/92515a49514b3993/lldb/llvm/tools/clang/lib/AST/Decl.cpp,
line 1360.
The source of this file can be found here: https://clang.llvm.org/doxygen/Decl_8cpp_source.html
My options are:
(1) Figure out why that error is happening. Creating a simple "hello world" program and debugging it seems to work, which tells me it has something to do with my code base. But I have over 20,000 lines of code, so figuring out what's causing it would be extremely time consuming. LLDB version 3.8.1 seems like a faster/easier fix since it was working just fine for me in the past.
(2) Use an old version of CLion (which, by default, utilizes LLDB version 3.8.1)
(3) Get the new(er) version(s) of CLion to use LLDB version 3.8.1.
Thanks for any help/guidance.
I assumed you could just enter the path in the debugger preferences page.
I am a Mac OS 10.11 (El Capitan) user. With OMNeT++ 4.6, whenever I tried to build a simulation I always got "Simulation terminated with exit code: 139" and could not do anything about it. I thought that installing 5.0 would fix everything, but now I get something like this:
Simulation terminated with exit code: 132
Working directory: /Users/JL_Data/omnetpp-5.0/samples/tictoc
Command line: tictoc -r 0 --debug-on-errors=false omnetpp.ini
Environment variables:
PATH=/Users/JL_Data/omnetpp-5.0/bin::/usr/bin:/bin:/usr/sbin:/sbin
DYLD_LIBRARY_PATH=/Users/JL_Data/omnetpp-5.0/lib::
OMNETPP_IMAGE_PATH=/Users/JL_Data/omnetpp-5.0/images
And when I try to open a simulation in the terminal I get:
Illegal instruction: 4
Do you have any idea what I can do about this problem? I tried to find something on the internet, but after a day of searching I still have no idea.
If you need more information, please let me know.
As it stands, your question is not completely clear, since answering it requires familiarity with omnet++ and probably some experience installing and setting it up. However, let me make a couple of guesses.
First, Illegal instruction. This usually occurs when the binary was built for an architecture different from the one it is being run on; e.g., when SSE2 or AVX instructions are present in the binary code but are not supported by the CPU.
See, for example, this SO question:
Find which assembly instruction caused an Illegal Instruction error without debugging
There is also a question that discusses exactly your problem, namely, "Illegal instruction: 4" on OS X:
What is the "Illegal Instruction: 4" error and why does "-mmacosx-version-min=10.x" fix it?
Now, since omnet++ appears to be an open source project, I expect it to have a mailing list and/or an IRC channel. Indeed, here is the communications page on the official website, which links to a Google Groups-based mailing list:
https://omnetpp.org/get-involved
https://groups.google.com/forum/#!forum/omnetpp
I advise you to get in touch with the developers with a thorough description of your problem, since the chances of them knowing the solution are significantly higher than the chances that a user on SO has faced the same problem installing the same version of omnet++ on the same version of Mac OS X.
I have a Fortran code that I have to run, but unfortunately I don't have any experience with Fortran. I tried to run the code using different Fortran compilers and nothing works.
Here is the link for the code: http://cpc.cs.qub.ac.uk/summaries/adpw.
It would be great if someone could tell me which Fortran compiler I should use.
Here are the details:
When I try to run with gfortran:
gfortran numcbas.f < numcbas_c.data
Segmentation fault: 11
and when I run with g77:
g77 numcbas.f < numcbas_c.data
ld: warning: -macosx_version_min not specified, assuming 10.10
ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not allowed in code signed PIE, but used in __start from /usr/lib/crt1.o.
To fix this warning, don't compile with -mdynamic-no-pic or link with -Wl,-no_pie
And here is the start of the code:
program NUMCBAS
IMPLICIT DOUBLE PRECISION (A-H,O-Z)
C     MAIN DRIVING ROUTINE
CHARACTER*120 TITLE
DIMENSION IBUG(3),HRXS(10),IRXS(10)
COMMON /BASCON/ HRX(10),IRX(10),NIX,IRA
DATA TITLE /' '/
DATA HRXS/1.D-02,2.D-02,2.605D-02,7*0.D0/
DATA IRXS/30,120,500,7*0/,IBUG/3*0/
INTEGER :: NFTA=6,LUNUMB=13, LVAL=0
DOUBLE PRECISION :: BTOL=0.2D0, TINY=1.D-11
DOUBLE PRECISION :: ECMAX=10.D0, RLIM=10.D0, CHARGE=0.D0
NAMELIST /INPUT/ TITLE,LUNUMB,NIX,IRX,HRX,lval,IBUG,BTOL,
* TINY,ECMAX,RLIM,CHARGE
WRITE (6,1000)
and the input file:
&INPUT
TITLE='IONIC TARGET',
lval=0, ECMAX = 5.00D0,
RLIM = 12.0D0, CHARGE=1.0D0,/
It seems to me you are completely misunderstanding the processes of compilation and running.
These lines are suspicious:
gfortran numcbas.f < numcbas_c.data
g77 numcbas.f < numcbas_c.data
There is no reason to redirect a data file to the compiler command. The compiler first has to create an executable program, which you can then run with your data. By default a file ./a.out is created, and you then run it with
./a.out < some_data_to_stdin
It is very strange that you get a Segmentation fault from running gfortran without any other error message. Are you sure the commands you show above are exactly what you are running?
The Situation
We have a board with a TI DM3730 processor (also known from the Beagleboard) with a Cortex-A8 core (r3p2), in use with the following parameters:
Beagleboard Reference Design: Beagleboard-xM Rev-C
Kernel version: 3.2.8
Open CV library: 2.4.6
U-Boot: uboot-2013.04
Toolchain: Sourcery CodeBench ARM 2011.03
Buildroot: 2012.02
The setup is derived from this blog.
Now we have written a program (in C++, compiled with GCC version 4.5.2) which uses the OpenCV library (to calculate some scores using support vector machines) and which behaves in a strange way:
The program runs on the board in its own process with defined test data: it repeatedly produces correct results.
The program runs in two or more processes (with the same defined test data): the results start to become wrong in each process, and processes die with segfaults. The last remaining process runs correctly again.
The program runs in its own process (with the same defined test data again), while another process changes some exposure settings of an attached camera: the program starts to produce wrong results.
So we assume this is a very low level floating point problem.
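To make the symptom concrete, here is a minimal, hypothetical consistency check (not our actual scoring code) that repeats one single-precision computation and flags any run-to-run difference; running several copies of it concurrently would be one way to look for the corruption described above:
#include <cstdio>
int main() {
    // Repeat the same single-precision computation many times and
    // compare against the first result; any mismatch means something
    // is corrupting floating-point state between iterations.
    volatile float x = 1.0e-3f;
    float reference = 0.0f;
    for (int iter = 0; iter < 1000000; ++iter) {
        float acc = 0.0f;
        for (int i = 0; i < 100; ++i)
            acc += x * static_cast<float>(i);
        if (iter == 0) {
            reference = acc;
        } else if (acc != reference) {
            std::printf("mismatch at iteration %d: %.9g != %.9g\n",
                        iter, acc, reference);
            return 1;
        }
    }
    std::printf("all iterations consistent\n");
    return 0;
}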
What we tried
The complete system (all libraries, kernel, boot loader, etc.) has been compiled with the compiler flags suggested on pandorawiki.org regarding Floating_Point_Optimization:
-O3 -mcpu=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
-ffast-math -fsingle-precision-constant
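As a small illustration of why these flags can change results (a toy example of our own, with made-up values): single-precision addition is not associative, and -ffast-math permits the compiler to reassociate it anyway.
#include <cstdio>
int main() {
    // Under strict IEEE semantics the two lines below print different
    // values: (a + b) + c == 1, but a + (b + c) == 0, because b + c
    // rounds back to -1.0e8f. -ffast-math lets the compiler pretend
    // addition is associative and rewrite one form into the other.
    volatile float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
    std::printf("(a + b) + c = %g\n", (a + b) + c);
    std::printf("a + (b + c) = %g\n", a + (b + c));
    return 0;
}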
We tried to enable L1NEON in the Cortex-A8 aux ctrl register according to the Beagle board FAQ and tried the other options mentioned there as well, but unfortunately to no avail.
All three different behaviors are reproducible, but not in the form of a minimal working example.
The same program source and the first and second scenario run correctly on Windows (using Visual Studio) and on a desktop running Linux (GCC), so it's probably not something our code does.
So the questions are now:
Are there any other known bugs with this setup and floating point operations which we are not aware of?
Are there any known compiler options which should be set or omitted which can lead to the observed results?
If a MWE would be helpful, we will look into providing one.
Any clues are welcome.
OK, we now use an up-to-date buildroot (2014.08) with the included toolchain (arm-buildroot-linux-uclibcgnueabi-), Linux kernel 3.9.11, boost 1.55, Qt 4.8.6, and still OpenCV 2.4.6.
When compiling, we optimize for size (-Os), and for target optimization we only use -pipe.
The following compiler flags are not used anymore:
-mcpu=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp -ffast-math -fsingle-precision-constant
Unfortunately, we still don't know the exact reason for the original problem, but we are quite happy that the problem went away with this setup.
So maybe this answer helps some poor soul in the future... ;)
I have a C++ project that was built and runs in Visual Studio.
When I try to run it on Unix, it gives me
Abort (Core Dumped)
I am using g++ version 3.2.2.
How do I fix this program? It needs to run in Linux.
The first step is to learn how to use gdb or one of the other excellent debuggers for Linux: compile with -g, run the program under gdb, and after the crash use the backtrace command.
That should tell you exactly which source line caused the problem. Then work back from there.
Other than that, we can't really help without seeing that source code. Psychic debugging, whilst useful, is not a highly developed field of endeavour :-)
@All
Thanks a lot for your responses. I really appreciate it.
My program worked with g++ 4.2.3. It was aborting with g++ 3.2.2.
The code that gave me the correct output in Visual Studio was:
foundOpen = inStr.find("(");
foundClose = inStr.find(")");
string inGate;
inGate = inStr.substr(++foundOpen,foundClose-foundOpen);
But using g++, I had to make a small change to the substr call:
foundOpen = inStr.find("(");
foundClose = inStr.find(")");
string inGate;
inGate = inStr.substr(++foundOpen,foundClose-foundOpen-1);
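For reference, here is a self-contained sketch of the substr arithmetic involved (the input string is made up; std::string::substr takes a start position and a character count, and that behavior is the same in both compilers):
#include <iostream>
#include <string>
int main() {
    std::string inStr = "AND(a,b)";                       // made-up input
    std::string::size_type foundOpen  = inStr.find("(");  // position 3
    std::string::size_type foundClose = inStr.find(")");  // position 7
    // substr(pos, count) returns count characters starting at pos.
    // After ++foundOpen, foundClose - foundOpen is exactly the number
    // of characters strictly between the two parentheses.
    std::string inGate = inStr.substr(++foundOpen, foundClose - foundOpen);
    std::cout << inGate << '\n';                          // prints "a,b"
    return 0;
}
With these semantics the first variant already excludes both parentheses, and the extra -1 drops one more character, so if the two variants really behave differently between compilers, it is worth printing the raw input and the two positions on each platform.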
I am also a beginner with Linux and don't know how to use gdb. Are there any good tutorials for learning gdb?
I'll take a flying guess: your program uses getch(), you found the function in the -lcurses or -lncurses library and are linking against it, but your program crashes as you said.
The trouble is, that function requires a certain amount of setup to work, unlike the similarly named but rather different function that is available on Windows.
Welcome to the real world - different platforms have different functions in the standard APIs; sometimes, two platforms have a function with the same name but different meanings.
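To illustrate, here is a minimal curses program with the setup getch() expects (assuming the ncurses development headers are installed; link with -lncurses):
#include <ncurses.h>
int main() {
    initscr();                          // the required setup step
    printw("Press any key to exit...");
    refresh();                          // flush output to the screen
    getch();                            // safe now: curses is initialized
    endwin();                           // restore the terminal
    return 0;
}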
Another wild guess: boolean initialization; we got bitten by this one. The boolean was initialized automatically by VC++2003, but on Linux it was not (so it was either true or false, flip a coin...).
It took a while to debug, since in our case it did not crash and the failure was intermittent. I wanted to slap the programmer on the head for not initializing his variable!
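A minimal sketch of the kind of bug described, with made-up names:
#include <iostream>
struct Settings {
    bool verbose;        // never initialized anywhere
    Settings() {}        // the constructor forgets to set it
};
int main() {
    Settings s;
    // Reading s.verbose is undefined behavior: one compiler or build
    // configuration may happen to zero the memory (reads as false),
    // another may leave garbage there, so this branch can flip
    // between platforms without ever crashing.
    if (s.verbose)
        std::cout << "verbose mode on\n";
    return 0;
}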