Getting FR_3085 ERROR when working in Informatica

In a lot of the task flow jobs I'm running, I constantly get the following error:
FR_3085 ERROR: Row [1]: 2-th character is a null character, which is not allowed in a text input file
It usually occurs in data synchronization tasks, but I sometimes see it in mapping configuration tasks as well. How do I resolve this error?

This error occurs when your flat file contains NULL characters.
One way to resolve it is to use OS utilities to strip the NULL characters from the flat file before the task reads it; which utility to use depends on the OS you're running.
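For example, on Linux/UNIX a minimal sketch with tr could look like the following (source_file.txt is just a placeholder for your flat file):

# strip NUL bytes from the flat file before the Informatica task reads it
tr -d '\000' < source_file.txt > source_file_clean.txt
mv source_file_clean.txt source_file.txt

On Windows you would use an equivalent tool (for example a PowerShell or editor-based replace of the NUL character).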

Related

AWS Lambda Extension throws exit status 127 (/usr/bin/env: node : No such file or directory)

I am creating a Lambda extension to get secret values from Secrets Manager, using this as a template:
https://github.com/hariohmprasath/aws-lambda-extensions
I have zipped the files into the following structure:
extension.zip
--> extensions
    --> secret-extension
--> secret-extension
    --> node_modules
    --> extensions-api.js
    --> index.js
    --> package.json
    --> package-lock.json
    --> secrets.js
Error:
{
  "errorMessage": "RequestId: e5c06575-cf7d-46c0-b168-624e8e9cf572 Error: exit status 127",
  "errorType": "Extension.Crash"
}
The error is: /usr/bin/env: node : No such file or directory
At the top of the index.js file is the shebang line #!/usr/bin/env node (so that the file is interpreted by node).
The runtime environment is Node.js 12, and I have tried 14 as well (the extension documentation says the Node 12 runtime is required).
What could be causing this issue?
The Lambda runtime is a Node runtime, so node should be installed.
I have run ls and /usr/bin/env exists.
I know node exists within the runtime, as node -v returns v14.20.0 or v12.22.11.
I am on a Windows machine creating the extension (I don't think the deployment could be causing this just because it was written on a Windows machine).
Any help would be appreciated.
So I found out it has to do with a custom environment they are using for the example provided by AWS. Instead I went the route of a runtime-independent solution, which has worked as expected.
Documentation
I suspect the issue you may have been encountering is the same as mine, and that issue was:
The #!/usr/bin/env node line had the whitespace characters \r\n at the end, which obviously cannot be seen unless your editor displays them; this is how Windows handles new lines (*nix systems use just \n). When the Lambda reads the line, it tries to interpret it as #!/usr/bin/env node\r, which obviously doesn't exist, so it can't run the file via node.
The problem with the logs is that they won't render the \r as such. Depending on where you view them, one of two things can happen:
It interprets \r as a new-line character and just prints the whitespace, which is easy to miss in the log message; OR (which is what happened to me):
It shows just : No such file or directory, because the \r is interpreted as a carriage return, which takes the cursor to the beginning of the line and the new characters overwrite what was printed before them.
I am pretty confident this is your issue. I will admit I didn't solve this 100% on my own: a person on my team had a similar issue with whitespace characters, and only after a lot of head-banging did I think of it and confirm the issue using hexdump -C.
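For anyone hitting the same thing, a quick sketch of how to confirm and fix it from a *nix shell (or Git Bash/WSL on Windows); index.js here stands in for whichever script carries the shebang:

# look for a trailing 0d (carriage return) at the end of the shebang line
hexdump -C index.js | head -n 2

# strip CRLF line endings in place (dos2unix index.js does the same job)
sed -i 's/\r$//' index.js

After that, re-zip and re-deploy the extension.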

Parallel Writes to NFS-backed File

UPDATE: I had each node write to a separate file, and when the separate files were concatenated together the result was correct. I also updated the code to attempt a channel flush and file sync after each write of a single record, but now there are still issues between Nodes 0 and 1. If I make Node 0 sleep for a few seconds before it starts its iteration of the coforall loop, the records come out correct. If not, the last few hundred bytes of Node 0's records seem to be reliably overwritten with NULL bytes, up to the start of Node 1's records. The issues between Node 1 and Node 2, and between Node 2 and Node 3, no longer seem to show up.
Additionally, if I suppress either Node 0 or Node 1 from writing, I see the fully formed records from the unsuppressed node written correctly to the file. In the case that Node 1 is suppressed, I see 9,997 100-byte records (999,700 correct bytes) followed by NULL bytes in the file where Node 1's suppressed records would go. In the case that Node 0 is suppressed, I see exactly 999,700 NULL bytes in the file, after which Node 1's records begin.
Original Post:
I'm trying to troubleshoot an issue with parallel writes from different nodes to a shared NFS-backed file on disk. At the moment, I suspect that something is wrong with the way writes to the disk happen on the NFS server.
I'm working on adapting MPI+C code that uses pwrite to write to coordinated chunks of a file. If I try to have the equivalent locales in Chapel write to the file inside of a coforall loop, I end up with the bits of the file around the node boundaries messed up - usually the final few hundred bytes of each node's data are garbled. However, if I have just one locale iterate through the data on all locales and write it, the data comes out correctly. That is, I use the same data structures to calculate the offsets, but only Locale 0 seeks to that offset and performs the writes.
I've verified that the offsets into the file that each locale writes to do not overlap, and I'm using a single channel per task, defined from within the on loc do block, so that tasks don't share a single channel.
Are there known issues with writing to a file from different locales? A lot of the documentation makes it seem like this is known to be safe, but my unsubstantiated guess is that there is an issue with caching of file contents: when examining the incorrect data, the incorrect bits appear to be the data that was originally in the file at that location when the program started.
I've included the relevant routine below, in case you easily spot something I missed. To make this serial, I convert the coforall loc in Locales and on loc do block into a for j in 0..numLocales-1 loop, and replace here.id with j. Please let me know what else would help get to the bottom of this. Thanks!
proc write_share_of_data(data_filename: string, ref record_blocks) throws {
  coforall loc in Locales {
    on loc do {
      var data_file: file = open(data_filename, iomode.cwr);
      var data_writer = data_file.writer(kind=ionative, locking=false);
      var line: [1..100] uint(8);
      const indices = record_blocks[here.id].D;
      var local_record_offset = + reduce record_blocks[0..here.id-1].D.size;
      writeln("Loc ", here.id, ": record offset is ", local_record_offset);
      var local_bytes_offset = terarec.terarec_width_disk * local_record_offset;
      data_writer.seek(start=local_bytes_offset);
      for i in indices {
        var write_rec: terarec_t = record_blocks[here.id].records[i];
        line[1..10] = write_rec.key;
        line[11..98] = write_rec.value;
        line[99] = 13;  // \r
        line[100] = 10; // \n
        data_writer.write(line);
        lines_written += 1;
      }
      data_file.fsync();
      data_writer.close();
      data_file.close();
    }
  }
  return;
}
Adding an answer here that solved my particular problem, though it doesn't explain the behavior seen. I ended up changing the outer loop from coforall loc in Locales to for loc in Locales. This isn't too big of an issue since it's all writing to one file anyway - I doubt that multiple locales can actually make much headway in all attempting to write concurrently to a single file on an NFS server. As a result, the change still allows nodes to write the data they have locally to NFS, rather than forcing Node 0 to collect and then write the data on behalf of all locales. This amounts to only adding idle time to the write operation commensurate with the time it takes Locale 0 to start the remote task on other nodes when the previous node has finished writing, which for the application at hand is not a concern.
Have you tried specifying start/end in file.writer instead of using seek? Does that change anything? What about specifying the end offset for the channel.seek call? Does it matter if the file is created and has the appropriate size before you start?
Other than that, I wonder if this issue would appear for both NFS and Lustre. If it appears for both, it might well be a Chapel bug. It sounds from your description that the MPI+C program was using this same pattern successfully, which points to it being a bug. But have you run the C code doing this on your setup? If a Chapel bug seems most likely after further investigation, we would appreciate a bug report issue with a reproducer.
I know that NFS does not always do what one would like, in terms of data consistency. It's my understanding that it has "close to open" semantics but it's unclear to me what that means in the context of opening a file and writing to a particular region within it, in parallel from different locales.
From Why NFS Sucks by Olaf Kirch:
An NFS client is permitted to cache changes locally and send them to
the server whenever it sees fit. This sort of lazy write-back greatly
helps write performance, but the flip side is that everyone else will
be blissfully unaware of these change before they hit the server. To
make things just a little harder, there is also no requirement for a
client to transmit its cached write in any particular fashion, so
dirty pages can (and often will be) written out in random order.
I read two implications from this paragraph that are relevant to your situation here:
The writes you do on different locales can be observed by the NFS server in an arbitrary order. (However as I understand it, the data should be sent to the server by the time your fsync call returns).
These writes are done at an OS page granularity (usually 4k). (Note that this is more a hypothesis I am making than it is a fact. It should be tested or further investigated).
It would be interesting to check if 2. is a plausible explanation for the behavior you are seeing. For example, you could explore having each locale operate on a multiple of 4096 records (or potentially try writing records of 4096 bytes each) and see if that changes the behavior. If 2 is indeed the explanation, it should be possible to create a C program that demonstrates the behavior as well.
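One quick, low-effort check along those lines (the file names below are hypothetical; use whatever your serial and parallel runs actually produce) is to find the first byte at which a known-good serial output and a parallel output differ, and see whether that offset sits on or near a 4 KiB page boundary:

# cmp -l prints the 1-based offset of every differing byte; take the first one
cmp -l serial_run.dat parallel_run.dat | head -n 1

If the first mismatch consistently lands at or near a multiple of 4096, that would support the page-granularity hypothesis.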

GATE_Using for Thesis_Run-time Error

When I try to run a corpus pipeline on my language resources, it throws the error below (even though I follow the order Document Reset, English Tokeniser, Sentence Splitter).
Can someone help me with the process to debug this run-time error?
Error:
gate.creole.ExecutionException: No sentences or tokens to process in document Password_Safe-window1.txt_0003E
Please run a sentence splitter and tokeniser first!
at gate.creole.POSTagger.execute(POSTagger.java:257)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.SerialController.runComponent(SerialController.java:225)
at gate.creole.SerialController.executeImpl(SerialController.java:157)
at gate.creole.SerialAnalyserController.executeImpl(SerialAnalyserController.java:223)
at gate.creole.SerialAnalyserController.execute(SerialAnalyserController.java:126)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1759)
at java.lang.Thread.run(Thread.java:745)
Edit:
The files are not empty. When I tried to implement dedek's suggestion, it threw no errors, but it raised one more problem:
Exception in thread "ApplicationViewer1" java.lang.OutOfMemoryError: Java heap space
I think it is because your document is empty.
Can you confirm that?
There is a run-time parameter failOnMissingInputAnnotations on the POSTagger; set it to false and it should be OK.
See also the docs:
failOnMissingInputAnnotations - if set to false, the PR will not fail with an ExecutionException if no input Annotations are found and instead only log a single warning message per session and a debug message per document that has no input annotations (run-time, default = true).
Concerning the OutOfMemoryError: Java heap space, see the following questions:
Getting OOM while using GATE on large data set
GATE PersistenceManager.loadObjectFromFile outofmemory error while loading .gapp files
JAVA PermGem memory
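As a general note, heap-space errors usually just mean the JVM maximum heap is too small for the corpus being processed, and the usual remedy is to raise -Xmx wherever the GATE process is launched. A rough example only (the jar and class names are placeholders for your own application):

# give the JVM 4 GB of heap instead of the default
java -Xmx4g -cp your-gate-app.jar com.example.YourGateApp

If you are launching GATE Developer rather than your own Java process, the same -Xmx value is set in its launcher configuration instead.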

C++ SunOS ofstream error

Folks,
we collect large amounts of data and create error, status, and info log files to let us know what's going on. We use ofstreams to write to these files. After some period of time (days), we get a file error (indicated by a .good() call) on one of the ofstreams. In the affected log file, it appears that the write of a single line begins but is interrupted by a write of the exact same line. For example,
### Random Line of Text 1 ###
### Random Line of Text 2 ###
### Random Line of Text 3### Random Line of Text 3 ###
Each file/ofstream has a single thread that does the actual writing. We don't flush, for performance reasons, and shouldn't have to.
It's always the same type of error.
It only happens on one of the three machines running the same code, and we don't see any I/O errors, but maybe we're not looking in the right place.
Thanks for your time.

Strange semantic error

I have reinstalled emacs 24.2.50 on a new Linux host and started a new dotEmacs config based on magnars' emacs configuration. Since I have used CEDET with some success in my previous workflow, I started configuring it. However, there is some strange behaviour whenever I load a C++ source file.
[This Part Is Solved]
As expected, semantic parses all included files (and during the initial setup parses all files specified by the semantic-add-system-include variables), but it prints an error message that goes like this:
WARNING: semantic-find-file-noselect called for /usr/include/c++/4.7/vector while in set-auto-mode for /usr/include/c++/4.7/vector. You should call the responsible function into 'mode-local-init-hook'.
In the above example the error is printed for the STL vector, but a corresponding error message is printed for every file included by the one I'm visiting, and for any subsequent includes. As a result it takes quite a long time to finish, and unfortunately the process is repeated any time I open a new buffer.
[This Problem Is Solved Too]
Furthermore, it looks like the parsing doesn't really work: when I place the point on a non-C primitive type (i.e. not int, double, float, etc.), instead of printing the type's definition in the modeline it prints an error message like
Idle Service Error semantic-idle-local-symbol-highlight-idle-function: "#<buffer DEPFETResolutionAnalysis.cc> - Wrong type argument: stringp, (((0) \"IndexMap\"))"
Idle Service Error semantic-idle-summary-idle-function: "#<buffer DEPFETResolutionAnalysis.cc> - Wrong type argument: stringp, ((\"fXBetween\" 0 nil nil))"
where DEPFETResolutionAnalysis.cc is the file and buffer I'm currently editing, and IndexMap and fXBetween are types defined in files included (directly or indirectly) by the file I'm editing.
I have not tested any further features of CEDET/semantic as the problem is pretty annoying. My cedet config can be found here.
EDIT: With the help of Alex Ott I kinda solved the first problem. It was due to my horrible cedet initialisation. See his first answer for the proper way to configure CEDET!
There still remains the problem with the Idle Service Error (which, when enabling global-semantic-idle-local-symbol-highlight-mode, occurs permanently, not only when checking the definition of the type at point).
And there is the new problem of how to disable the site-wide init file(s).
EDIT2: I have executed semantic-debug-idle-function in a buffer where the problem occurs and it produces a ~700kb [sic!] output. It looks like it is performing some operations on a data container which, by the looks of it, contains information on all the symbols defined in the parsed files. As I have parsed a rather large package (~20MB of source files) this table is rather large. Can semantic handle a database that large, or is this impossible and the reason for my problem?
EDIT3: Deleting the content of ~/.semanticdb and reparsing all includes did the trick. I still need to disable the site-wide init files, but as this is not related to CEDET I will close this question (the question related to the site-wide init files can be found here).
You need to change your init file so that it loads CEDET only once, not in a hook that is called for every .h/.hpp/.c/.cpp file. You can use this config as the base, and read more in the following article.
The problem you have is caused by Semantic trying to analyze header files: when it tries to open them, its initialization routines are called again, and again...
The first problem was solved by correctly configuring CEDET, which is described on Alex Ott's homepage. His answer solves this first problem. The config file specified in his answer is a great start for a nice config; I have used the very same to configure CEDET for my needs.
The second problem vanished once I updated CEDET from 1.1 to the bazaar (repository) version, which is explained here and in Alex's article. Additionally, one must delete the contents of the directory ~/.semanticdb (which contains the semantic database and was corrupted, I guess).
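For reference, clearing that cache from a shell comes down to something like:

# wipe the cached semantic database; semantic rebuilds it the next time the includes are parsed
rm -rf ~/.semanticdb/*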
I'd like to thank Alex Ott for his help and sticking with me throughout my journey to the solution :)