Clang C++ new operator large dynamic array slow compile time

I am facing a very unusual issue using clang-1200.0.32.21 on macOS Catalina 10.15.7.
Essentially, this line takes a long time to compile and has extremely high RAM usage (peaking at around 10 GB):
m_Table = new MapGenerator::Block[MAP_DIAMETER * MAP_HEIGHT * MAP_DIAMETER]();
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 50.8973 seconds (55.3352 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
34.9917 (100.0%) 15.9056 (100.0%) 50.8973 (100.0%) 55.3352 (100.0%) Clang front-end timer
34.9917 (100.0%) 15.9056 (100.0%) 50.8973 (100.0%) 55.3352 (100.0%) Total
Changing it to the following instantly fixes it:
uint32_t table_size = Map::MAP_DIAMETER * Map::MAP_HEIGHT * Map::MAP_DIAMETER * sizeof(MapGenerator::Block);
m_Table = reinterpret_cast<MapGenerator::Block*>(malloc(table_size));
memset(m_Table, 0, table_size);
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 1.3105 seconds (2.1116 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
1.1608 (100.0%) 0.1497 (100.0%) 1.3105 (100.0%) 2.1116 (100.0%) Clang front-end timer
1.1608 (100.0%) 0.1497 (100.0%) 1.3105 (100.0%) 2.1116 (100.0%) Total
If you are curious, these are the relevant definitions:
enum : int { MAP_HEIGHT = 15 };
enum : int { MAP_DIAMETER = 1000 };
union Block
{
struct
{
uint8_t valid : 1; // block valid
/* used to determine visible faces */
uint8_t v_top : 1;
uint8_t v_bottom : 1;
uint8_t v_front : 1;
uint8_t v_back : 1;
uint8_t v_right : 1;
uint8_t v_left : 1;
uint8_t v_base : 1;
uint8_t discard : 1; // delete flag
};
uint16_t bits;
};
Block* m_Table;
Is there any logical reason why new takes this long to compile? To my understanding the two versions should not compile differently. Also, I do not have this issue with the MSVC (Microsoft C++) compiler on Windows.
EDIT:
This is a minimal reproducible sample:
#include <cstdint>
#include <cstdlib>
#include <cstring>
enum : int { MAP_HEIGHT = 15 };
enum : int { MAP_DIAMETER = 1000 };
union Block
{
struct
{
uint8_t valid : 1; // block valid
/* used to determine visible faces */
uint8_t v_top : 1;
uint8_t v_bottom : 1;
uint8_t v_front : 1;
uint8_t v_back : 1;
uint8_t v_right : 1;
uint8_t v_left : 1;
uint8_t v_base : 1;
uint8_t discard : 1; // delete flag
};
uint16_t bits;
};
Block* table = nullptr;
// UNCOMMENT THIS:
// #define USE_MALLOC
int main(int argc, char* argv[])
{
#ifndef USE_MALLOC
table = new Block[MAP_DIAMETER * MAP_HEIGHT * MAP_DIAMETER]();
#else
uint32_t table_size = MAP_DIAMETER * MAP_HEIGHT * MAP_DIAMETER * sizeof(Block);
table = reinterpret_cast<Block*>(malloc(table_size));
memset(table, 0, table_size);
#endif
(void) table;
#ifndef USE_MALLOC
delete[] table;
#else
free(table);
#endif
return 0;
}
I compile it with the command:
g++ -std=c++17 -g -Wall -Werror -ftime-report -c test.cpp -o test.o
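For completeness, here is a vector-based variant (my own untested sketch, not part of the original question) that keeps zero-initialization and adds RAII, while the per-element initialization happens in an ordinary run-time loop inside the library rather than in anything the front end has to expand:
#include <cstdint>
#include <vector>
enum : int { MAP_HEIGHT = 15 };
enum : int { MAP_DIAMETER = 1000 };
union Block { /* same bit-field struct as above */ uint16_t bits; };
int main()
{
    // std::vector value-initializes (zeroes) its elements at run time,
    // and the storage is released automatically at end of scope
    std::vector<Block> table(MAP_DIAMETER * MAP_HEIGHT * MAP_DIAMETER);
    (void) table;
    return 0;
}
I have not timed this against the clang build in question, so treat it as a workaround sketch rather than a measured result.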

Related

What is it that configures the clock_gettime resolution?

I'm experimenting with the time management of Linux on a Raspberry Pi. For that I'm looking at clock_gettime (and clock_getres) for CLOCK_REALTIME.
I noticed that clock_getres always reports nanosecond resolution (it returns 0 for tv_sec and 1 for tv_nsec), and the values returned by clock_gettime point in that direction as well (e.g. I get values like 1642070078.415542996), but only if the system is fully booted with systemd etc. running! When I put my test program in place of init (e.g. init=/test-clock in cmdline.txt), getres still returns 1 for tv_nsec, but all values returned by clock_gettime have only microsecond resolution!
test-program:
#include <stdint.h>
#include <stdio.h>
#include <time.h>
void test_clock(const clockid_t id, const char *const name)
{
struct timespec ts { 0, 0 };
uint64_t n = 0;
struct timespec start { 0, 0 }, end { 0, 0 };
clock_gettime(id, &start);
uint64_t start_ts = start.tv_sec * uint64_t(1000000000) + start.tv_nsec, end_ts = 0;
do {
clock_gettime(id, &end);
n++;
end_ts = end.tv_sec * uint64_t(1000000000) + end.tv_nsec;
}
while(end_ts - start_ts < 2500000000);
struct timespec now { 0, 0 };
int ns = 0;
for(int i=0; i<1000; i++) {
clock_gettime(id, &now);
if (now.tv_nsec % 1000)
ns++;
}
if (clock_getres(id, &ts))
fprintf(stderr, "clock_getres(%d) failed (%s)\n", id, name);
else
printf("%s:\t%ld.%09ld\t%f/s\t%d\t%ld.%09ld\n", name, ts.tv_sec, ts.tv_nsec, n / 2.5, ns, now.tv_sec, now.tv_nsec);
}
int main(int argc, char *argv[])
{
printf("clock\tresolution\tcalls/s\t# with ns\tnow\n");
test_clock(CLOCK_REALTIME, "CLOCK_REALTIME");
test_clock(CLOCK_TAI, "CLOCK_TAI");
test_clock(CLOCK_MONOTONIC, "CLOCK_MONOTONIC");
test_clock(CLOCK_MONOTONIC_RAW, "CLOCK_MONOTONIC_RAW");
return 0;
}
Compile with:
g++ -Ofast -ggdb3 -o test-clock test-clock.cpp -lrt
Expected output:
clock resolution calls/s # with ns now
CLOCK_REALTIME: 0.000000001 48881594.400000/s 1000 1642071062.213603835
CLOCK_TAI: 0.000000001 49500959.200000/s 1000 1642071101.713668922
CLOCK_MONOTONIC: 0.000000001 49248353.200000/s 1000 2402707.303582035
CLOCK_MONOTONIC_RAW: 0.000000001 47072281.600000/s 1000 2402705.604860726
What I see when starting test-clock as init replacement:
clock resolution calls/s # with ns now
CLOCK_REALTIME: 0.000000001 853001.200000/s 0 19.216404000
CLOCK_TAI: 0.000000001 736536.000000/s 0 21.718848000
CLOCK_MONOTONIC: 0.000000001 853367.200000/s 0 24.220166000
CLOCK_MONOTONIC_RAW: 0.000000001 855598.800000/s 0 26.721360000
The 4th column tells me that no readings had nanosecond resolution.
So what I would like to know is: how can I configure the kernel/glibc/whatever so that it gives me nanosecond resolution at boot as well?
Any ideas?
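One diagnostic idea (my own suggestion, not from the original post): on Linux the effective clock_gettime granularity usually follows the active clocksource, and the kernel can switch clocksources during boot, so it may be worth printing it in both environments. A minimal sketch, assuming the standard sysfs path:
#include <fstream>
#include <iostream>
#include <string>
int main()
{
    // The currently active clocksource; a high-resolution source
    // (e.g. arch_sys_counter on the Pi) behaves very differently from
    // a jiffies-based fallback.
    std::ifstream f("/sys/devices/system/clocksource/clocksource0/current_clocksource");
    std::string src;
    std::getline(f, src);
    std::cout << "current clocksource: " << src << "\n";
    return 0;
}
If the two runs report different clocksources, that would point at driver/initialization order rather than glibc configuration.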

TensorRT increasing memory usage (leak?)

I have a loop where I parse an ONNX model with TensorRT, create an engine, and run inference.
I make sure I call x->destroy() on all objects and I call cudaFree for each cudaMalloc.
Yet I keep seeing memory usage grow in nvidia-smi over consecutive iterations.
I'm really not sure where the problem comes from. The cuda-memcheck tool reports no leaks either.
Running Ubuntu 18.04, TensorRT 7.0.0, CUDA 10.2 and using a GTX 1070.
The code and the ONNX file, along with a CMakeLists.txt, are available on this repo.
Here's the code:
#include <cstdlib> // for atoi
#include <memory>
#include <iostream>
#include <cuda_runtime_api.h>
#include <NvOnnxParser.h>
#include <NvInfer.h>
class Logger : public nvinfer1::ILogger
{
void log(Severity severity, const char* msg) override
{
// suppress info-level messages
if (severity != Severity::kINFO)
std::cout << msg << std::endl;
}
};
int main(int argc, char * argv[])
{
Logger gLogger;
auto builder = nvinfer1::createInferBuilder(gLogger);
const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = builder->createNetworkV2(explicitBatch);
auto config = builder->createBuilderConfig();
auto parser = nvonnxparser::createParser(*network, gLogger);
parser->parseFromFile("../model.onnx", static_cast<int>(0));
builder->setMaxBatchSize(1);
config->setMaxWorkspaceSize(128 * (1 << 20)); // 128 MiB
auto engine = builder->buildEngineWithConfig(*network, *config);
builder->destroy();
network->destroy();
parser->destroy();
config->destroy();
for(int i=0; i< atoi(argv[1]); i++)
{
auto context = engine->createExecutionContext();
void* deviceBuffers[2]{0};
int inputIndex = engine->getBindingIndex("input_rgb:0");
constexpr int inputNumel = 1 * 128 * 64 * 3;
int outputIndex = engine->getBindingIndex("truediv:0");
constexpr int outputNumel = 1 * 128;
//TODO: Remove batch size hardcoding
cudaMalloc(&deviceBuffers[inputIndex], 1 * sizeof(float) * inputNumel);
cudaMalloc(&deviceBuffers[outputIndex], 1 * sizeof(float) * outputNumel);
cudaStream_t stream;
cudaStreamCreate(&stream);
float inBuffer[inputNumel] = {0};
float outBuffer[outputNumel] = {0};
cudaMemcpyAsync(deviceBuffers[inputIndex], inBuffer, 1 * sizeof(float) * inputNumel, cudaMemcpyHostToDevice, stream);
context->enqueueV2(deviceBuffers, stream, nullptr);
cudaMemcpyAsync(outBuffer, deviceBuffers[outputIndex], 1 * sizeof(float) * outputNumel, cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);
cudaFree(deviceBuffers[inputIndex]);
cudaFree(deviceBuffers[outputIndex]);
cudaStreamDestroy(stream);
context->destroy();
}
engine->destroy();
return 0;
}
It looks like the issue was coming from the repeated IExecutionContext creation, despite destroying it at the end of every iteration. Creating and destroying the context together with the engine fixed the issue for me. Nevertheless, it could still be a bug where context creation leaks a small amount of memory that accumulates over time. I filed a GitHub issue.
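For reference, a minimal sketch of that fix (the buffer setup, enqueue, and synchronization inside the loop stay exactly as in the original code):
// Hoist the execution context out of the loop so it is created and
// destroyed exactly once, alongside the engine.
auto engine = builder->buildEngineWithConfig(*network, *config);
auto context = engine->createExecutionContext(); // once, not per iteration
for (int i = 0; i < atoi(argv[1]); i++)
{
    // ... cudaMalloc, cudaMemcpyAsync, context->enqueueV2, cudaStreamSynchronize,
    //     cudaFree, cudaStreamDestroy, as in the original loop body ...
}
context->destroy(); // torn down together with the engine
engine->destroy();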

undefined symbol error when loading a module, using AWS C++ SDK, in Freeswitch

I am asking this question again, as the mods decided to close my original question as a duplicate within minutes of it being asked (and it was also down-voted!!). I have since gone through all 33 answers of the question that was supposed to contain my solution, but it didn't help. So I am asking again.
I am trying to build a FreeSWITCH module to add text-to-speech functionality using AWS Polly & the AWS C++ SDK.
The dev environment is Debian 8 with g++ 4.9.2. The AWS C++ SDK was built using the instructions here, except that I turned off shared libs (producing .a lib files).
The AWS C++ SDK was built as recommended here (basically C++ code with C++ linkage). mod_polly.cpp is built with C++ linkage as well, producing mod_polly.so. It does refer to some C headers & functions. It was built with:
g++ -shared -o mod_polly.so -L/usr/local/lib/ -laws-cpp-sdk-polly -laws-cpp-sdk-core -fPIC -g -ggdb -std=c++11 -Wall -Werror -I/usr/src/freeswitch/src/include/ -I/usr/src/freeswitch/libs/libteletone/src/ mod_polly.cpp
Source below:
extern "C" {
#include <switch.h>
}
#include <fstream>
#define BIG_ENDIAN_SYSTEM (*(uint16_t *)"\0\xff" < 0x100)
#define REVERSE_BYTES(...) do for(size_t REVERSE_BYTES=0; REVERSE_BYTES<sizeof(__VA_ARGS__)>>1; ++REVERSE_BYTES)\
((unsigned char*)&(__VA_ARGS__))[REVERSE_BYTES] ^= ((unsigned char*)&(__VA_ARGS__))[sizeof(__VA_ARGS__)-1-REVERSE_BYTES],\
((unsigned char*)&(__VA_ARGS__))[sizeof(__VA_ARGS__)-1-REVERSE_BYTES] ^= ((unsigned char*)&(__VA_ARGS__))[REVERSE_BYTES],\
((unsigned char*)&(__VA_ARGS__))[REVERSE_BYTES] ^= ((unsigned char*)&(__VA_ARGS__))[sizeof(__VA_ARGS__)-1-REVERSE_BYTES];\
while(0)
#include <aws/core/Aws.h>
#include <aws/core/auth/AWSCredentials.h>
#include <aws/core/client/ClientConfiguration.h>
#include <aws/core/utils/Outcome.h>
#include <aws/polly/PollyClient.h>
#include <aws/polly/model/SynthesizeSpeechRequest.h>
#include <aws/polly/model/SynthesizeSpeechResult.h>
#include <aws/polly/model/TextType.h>
#include <aws/polly/model/LanguageCode.h>
#include <aws/polly/model/OutputFormat.h>
#include <aws/polly/model/VoiceId.h>
typedef uint32_t DWORD; // 32-bit unsigned integer (unsigned long would be 64-bit on this platform)
typedef uint16_t WORD; // 16-bit unsigned integer
struct riff // Data Bytes Total
{
char chunkID[4]; // "RIFF" 4 4
DWORD riffSize; // file size - 8 4 8
char typeID[4]; // "WAVE" 4 12
char formatChunkID[4]; // "fmt " 4 16
DWORD formatChunkSize; // 16 bytes 4 20
WORD formatTag; // 2 22
WORD noOfChannels; // 2 24
DWORD samplesPerSec; // 4 28
DWORD bytesPerSec; // 4 32
WORD blockAlign; // 2 34
WORD bitsPerSample; // 2 36
char dataChunkID[4]; // "data" 4 40
DWORD dataChunkSize; // not fixed 4 44
};
static struct {
switch_mutex_t *mutex;
switch_thread_rwlock_t *running_rwlock;
switch_memory_pool_t *pool;
int running;
} process;
static struct {
Aws::Auth::AWSCredentials *credentials;
Aws::Client::ClientConfiguration *config;
Aws::SDKOptions *options;
} globals;
switch_loadable_module_interface_t *MODULE_INTERFACE;
static char *supported_formats[SWITCH_MAX_CODECS] = { 0 };
/* Prototypes */
SWITCH_MODULE_LOAD_FUNCTION(mod_polly_load);
SWITCH_MODULE_SHUTDOWN_FUNCTION(mod_polly_shutdown);
SWITCH_MODULE_DEFINITION(mod_polly, mod_polly_load, mod_polly_shutdown, NULL);
// ------------------------------------------------------------------------------------------------------------------
/* Implementation */
std::ostream& operator<<(std::ostream& out, const riff& h)
{
if BIG_ENDIAN_SYSTEM {
struct riff hdr = std::move(h);
REVERSE_BYTES(hdr.riffSize);
REVERSE_BYTES(hdr.formatChunkSize);
REVERSE_BYTES(hdr.formatTag);
REVERSE_BYTES(hdr.noOfChannels);
REVERSE_BYTES(hdr.samplesPerSec);
REVERSE_BYTES(hdr.bytesPerSec);
REVERSE_BYTES(hdr.blockAlign);
REVERSE_BYTES(hdr.bitsPerSample);
REVERSE_BYTES(hdr.dataChunkSize);
return out
.write(hdr.chunkID, 4)
.write((const char *)&hdr.riffSize, 4)
.write(hdr.typeID, 4)
.write(hdr.formatChunkID, 4)
.write((const char *)&hdr.formatChunkSize, 4)
.write((const char *)&hdr.formatTag, 2)
.write((const char *)&hdr.noOfChannels, 2)
.write((const char *)&hdr.samplesPerSec, 4)
.write((const char *)&hdr.bytesPerSec, 4)
.write((const char *)&hdr.blockAlign, 2)
.write((const char *)&hdr.bitsPerSample, 2)
.write(hdr.dataChunkID, 4)
.write((const char *)&hdr.dataChunkSize, 4);
} else {
return out
.write(h.chunkID, 4)
.write((const char *)&h.riffSize, 4)
.write(h.typeID, 4)
.write(h.formatChunkID, 4)
.write((const char *)&h.formatChunkSize, 4)
.write((const char *)&h.formatTag, 2)
.write((const char *)&h.noOfChannels, 2)
.write((const char *)&h.samplesPerSec, 4)
.write((const char *)&h.bytesPerSec, 4)
.write((const char *)&h.blockAlign, 2)
.write((const char *)&h.bitsPerSample, 2)
.write(h.dataChunkID, 4)
.write((const char *)&h.dataChunkSize, 4);
}
}
riff init_pcm_header(std::ostream& in)
{
// get length of file
in.seekp(0, in.end);
DWORD sz = in.tellp();
in.seekp(0, in.beg);
struct riff result = {
{'R','I','F','F'}, // chunkID
sz + 0x24, // riffSize (size of stream + 0x24) or (file size - 8)
{'W','A','V','E'}, // typeID
{'f','m','t',' '}, // formatChunkID
16, // formatChunkSize
1, // formatTag (PCM)
1, // noOfChannels (mono)
8000, // samplesPerSec (8KHz)
16000, // bytesPerSec ((Sample Rate * BitsPerSample * Channels) / 8)
2, // blockAlign ((bits per sample * channels) / 8)
16, // bitsPerSample (multiples of 8)
{'d','a','t','a'}, // dataChunkID
sz // dataChunkSize (sample size)
};
return result;
}
struct voice_sync {
char* session_uuid;
Aws::IOStream *audio_stream;
switch_size_t blockAlign;
};
typedef struct voice_sync voice_sync_t;
static switch_status_t polly_file_open(switch_file_handle_t *handle, const char *path)
{
voice_sync_t *sync_info = (voice_sync_t*)calloc(1, sizeof(voice_sync_t)); /* zero-init so session_uuid starts out NULL */
sync_info->audio_stream = new Aws::StringStream(std::ios::in | std::ios::out | std::ios::binary);
handle->private_info = sync_info;
handle->samplerate = 8000;
handle->channels = 1;
handle->pos = 0;
handle->format = 0;
handle->sections = 0;
handle->seekable = 0;
handle->speed = 0.5;
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "submitting text [%s] to polly", path);
Aws::Polly::PollyClient polly_client(*globals.credentials, *globals.config);
Aws::Polly::Model::SynthesizeSpeechRequest request;
request.SetLanguageCode(Aws::Polly::Model::LanguageCode::en_US);
request.SetOutputFormat(Aws::Polly::Model::OutputFormat::pcm);
request.SetSampleRate("8000");
request.SetTextType(Aws::Polly::Model::TextType::text); // or ssml
request.SetVoiceId(Aws::Polly::Model::VoiceId::Matthew);
request.SetText(path);
if (handle->params) {
// get the session UUID for this channel
// note: this doesn't fire for a standard call session in the audio context; is there a way to make sure it is there?
const char *uuid = switch_event_get_header(handle->params, "session");
if (!zstr(uuid)) {
sync_info->session_uuid = switch_core_strdup(handle->memory_pool, uuid);
switch_log_printf(SWITCH_CHANNEL_UUID_LOG(sync_info->session_uuid), SWITCH_LOG_DEBUG, "Polly linked to session %s\n", sync_info->session_uuid);
}
}
sync_info->audio_stream->clear();
// sync_info->audio_stream.open(filename.c_str(), std::ios::out | std::ios::binary);
auto outcome = polly_client.SynthesizeSpeech(request);
// Output operation status
if (outcome.IsSuccess()) {
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "received audio response for %s", request.GetServiceRequestName());
Aws::Polly::Model::SynthesizeSpeechResult& result = ((Aws::Polly::Model::SynthesizeSpeechResult&)(outcome));
Aws::IOStream* audio_stream = &result.GetAudioStream();
// this is raw PCM so we need to add a wav header!
riff header = init_pcm_header(*audio_stream);
*sync_info->audio_stream << header;
// transfer audio data into the stream
*sync_info->audio_stream << audio_stream->rdbuf();
sync_info->audio_stream->seekp(0, sync_info->audio_stream->beg);
// update handle information about audio stream
handle->samplerate = header.samplesPerSec;
handle->channels = header.noOfChannels;
handle->format = header.formatTag;
handle->duration = header.dataChunkSize / header.bytesPerSec +1;
handle->samples_in = header.dataChunkSize / header.blockAlign +1;
sync_info->blockAlign = header.blockAlign;
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "polly audio stream ready; duration: %ld secs", handle->duration);
return SWITCH_STATUS_SUCCESS;
}
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_ERROR, "something went wrong retrieving audio from polly");
return SWITCH_STATUS_FALSE;
}
static switch_status_t polly_file_close(switch_file_handle_t *handle)
{
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "closing polly audio stream");
voice_sync_t *sync_info = (voice_sync_t*)handle->private_info;
//sync_info->audio_stream->close(); -- doesnt exist on stringstream
delete sync_info->audio_stream;
if (sync_info->session_uuid) {
switch_safe_free(sync_info->session_uuid);
}
switch_safe_free(sync_info);
handle->private_info = NULL;
return SWITCH_STATUS_SUCCESS;
}
static switch_status_t polly_file_read(switch_file_handle_t *handle, void *data, size_t *len)
{
voice_sync_t *sync_info = (voice_sync_t*)handle->private_info;
switch_size_t bytes;
sync_info->audio_stream->read((char *)data, *len * sync_info->blockAlign);
if ((bytes = sync_info->audio_stream->gcount()) <= 0) {
return SWITCH_STATUS_FALSE;
}
*len = bytes / sync_info->blockAlign;
return SWITCH_STATUS_SUCCESS;
}
SWITCH_MODULE_LOAD_FUNCTION(mod_polly_load)
{
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "Initializing polly audio interface");
supported_formats[0] = (char*)"polly";
/*
switch_application_interface_t *app_interface;
switch_api_interface_t *api_interface;
*/
switch_file_interface_t *file_interface;
*module_interface = switch_loadable_module_create_module_interface(pool, modname);
file_interface = (switch_file_interface_t*)switch_loadable_module_create_interface(*module_interface, SWITCH_FILE_INTERFACE);
file_interface->interface_name = modname;
file_interface->extens = supported_formats;
file_interface->file_open = polly_file_open;
file_interface->file_close = polly_file_close;
file_interface->file_read = polly_file_read;
MODULE_INTERFACE = *module_interface;
memset(&process, 0, sizeof(process));
memset(&globals, 0, sizeof(globals));
process.pool = pool;
switch_thread_rwlock_create(&process.running_rwlock, pool);
switch_mutex_init(&process.mutex, SWITCH_MUTEX_NESTED, pool);
globals.options = new Aws::SDKOptions();
globals.options->loggingOptions.logLevel = Aws::Utils::Logging::LogLevel::Debug;
globals.credentials = new Aws::Auth::AWSCredentials();
globals.credentials->SetAWSAccessKeyId("your aws key");
globals.credentials->SetAWSSecretKey("your aws secret");
globals.config = new Aws::Client::ClientConfiguration();
globals.config->region = "eu-west-1"; // Ireland
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "Initializing aws api");
Aws::InitAPI(*globals.options);
switch_thread_rwlock_wrlock(process.running_rwlock);
process.running = 1;
switch_thread_rwlock_unlock(process.running_rwlock);
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "Ready to rock!");
/* indicate that the module should continue to be loaded */
return SWITCH_STATUS_SUCCESS;
}
SWITCH_MODULE_SHUTDOWN_FUNCTION(mod_polly_shutdown)
{
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "Shutting down polly audio interface");
switch_thread_rwlock_wrlock(process.running_rwlock);
process.running = 0;
switch_thread_rwlock_unlock(process.running_rwlock);
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "Closing aws api");
Aws::ShutdownAPI(*globals.options);
delete globals.credentials;
delete globals.config;
delete globals.options;
switch_log_printf(SWITCH_CHANNEL_LOG, SWITCH_LOG_DEBUG, "Module shutdown finished");
return SWITCH_STATUS_UNLOAD;
}
Now when I try to load this in FreeSWITCH, it throws an error:
2019-07-31 22:00:51.918181 [CRIT] switch_loadable_module.c:1522 Error Loading module /usr/local/freeswitch/mod/mod_polly.so
/usr/local/freeswitch/mod/mod_polly.so: undefined symbol: _ZNK3Aws35AmazonSerializableWebServiceRequest7GetBodyEv
Freeswitch is C code with C++ guards in header files (extern "C" declaration).
Looking at symbols in mod_polly.so
readelf -Ws mod_polly.so | grep _ZNK3Aws35AmazonSerializableWebServiceRequest7GetBodyEv
66: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZNK3Aws35AmazonSerializableWebServiceRequest7GetBodyEv
590: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZNK3Aws35AmazonSerializableWebServiceRequest7GetBodyEv
Now my basic understanding of the post here tells me that the symbol is referenced in the .so file but left undefined (UND), so FreeSWITCH cannot resolve it when loading the module.
This error very likely has to do with mixing C/C++ code, but looking at this and this hasn't helped me figure out how to fix it.
I do not want to rebuild FreeSWITCH just to load my module, and I am thinking I shouldn't have to, as that would make this project unscalable.
What am I missing here?
PS:
readelf -Ws libaws-cpp-sdk-core.a | grep AmazonSerializableWebServiceRequest7GetBodyEv
165: 0000000000000000 716 FUNC GLOBAL DEFAULT 42 _ZNK3Aws35AmazonSerializableWebServiceRequest7GetBodyEv
The symbol is defined in libaws-cpp-sdk-core.a, which is part of the compilation command for mod_polly.cpp.
@Sam V: it turns out it was a library-ordering issue in the build command. Changing the ordering to
g++ -shared -o mod_polly.so -fPIC -g -ggdb -std=c++11 -Wall -Werror -I/usr/src/freeswitch/src/include/ -I/usr/src/freeswitch/libs/libteletone/src/ mod_polly.cpp -L/usr/local/lib/ -laws-cpp-sdk-polly -laws-cpp-sdk-core
fixed the problem. Your first comment was the key. Thanks.
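The rule behind the fix, for anyone landing here: GNU ld scans static archives left to right and only extracts objects that satisfy symbols referenced so far, so -l flags for static libraries must come after the source/object files that use them. With the libraries listed first, nothing referenced them yet and no objects were pulled in; and because -shared permits unresolved symbols at link time, the problem only surfaced when FreeSWITCH loaded the module. A contrived illustration with a hypothetical libfoo.a:
g++ main.cpp -lfoo    # works: main.o's undefined symbols are resolved from libfoo.a
g++ -lfoo main.cpp    # libfoo.a is scanned before anything needs it, so symbols stay undefined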

Undefined reference when a function is listed in a header file, but fine if I copy and paste code directly

I am working on a large project in C++ that is proprietary, so I can't actually share the source code. I have most of the code compiled, but there is one function, in a particular file, that is giving me quite a bit of trouble.
I've created a simple example that shows what the problem is. We have:
WTPSHORT.H
#ifndef WTP_H
#define WTP_H 1
#define VERSION "Version 2.1"
#define DATE "Nov., 2001"
/* ANSI C header files */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <fcntl.h>
#include <ctype.h>
#define TRUE 1
#define FALSE 0
/************ Data structures for Water Treatment Plant ***************/
struct Effluent { /****** Data Packet for All Unit Processes *******/
/* Operating data: */
double DegK; /* Temperature (Deg K) */
double Flow; /* Average flow (MGD) */
double Peak; /* Max hourly flow (MDG) */
/* Unit process counters: */
short cl2cnt; /* Number of times chlorine added. */
/* Measurable Water Quality Parameters: */
double pH; /* [H+]=pow(10,-pH) (-) */
/* More variable definitions go here */
double beta_br; /* Constant for chlorine to bromine reactivity ratio */
double time_step; /* Time step (hrs) */
}; /************ End of struct Effluent ************/
/*****************************************/
struct ProcessTrain { /* Control Structure for Process Train */
struct UnitProcess *head; /* First UnitProcess in ProcessTrain */
struct UnitProcess *null; /* Always NULL */
struct UnitProcess *tail; /* Last UnitProcess in ProcessTrain */
char file_name[120]; /* Full path and extension */
}; /*****************************************/
struct UnitProcess { /********** Treatment Process ***************/
struct UnitProcess *next; /* Double Linked list */
struct UnitProcess *prev; /* " " " */
short type; /* Defined unit process types */
short pad; /* Maintain 32 bit alignment of pointers */
union { /* Design and operating parameters: */
void *ptr;
struct Influent *influent;
struct Mechdbp *mechdbp; //FOR MECH MODEL
struct Alum *alum;
struct Gac *gac;
struct Filter *filter;
struct Basin *basin;
// struct Membrane *membrane;
struct Mfuf *mfuf;
struct Nf *nf;
struct Iron *iron;
struct chemical *chemical;
struct clo2 *clo2;
struct lime *lime;
/*struct WTP_effluent *wtp_effluent; No longer needed - WJS, 11/98 */
struct Avg_tap *avg_tap;
struct End_of_system *end_of_system;
} data;
struct Effluent eff;
};
struct Influent { /* Raw Water Data */
double pH; /* (-) */
double temp; /* Average temperature (C) */
double low_temp; /* Low temperature for disinfection (C) */
double toc; /* (mg/L) */
double uv254; /* (1/cm) */
double bromide; /* (mg/L) */
double alkalinity; /* (mg/L as CaCO3) */
double calcium; /* Calcium Hardness (mg/L as CaCO3) */
double hardness; /* Total Hardness (mg/L as CaCO3) */
double nh3; /* Ammonia (mg/L as N) */
double ntu; /* Turbidity */
double crypto_req; /* Crypto Log removal + Log Inact. required*/
double clo2_crypto_ct_mult; /* Multiplier */
double peak_flow; /* Peak Hourly Flow for disinfection (MGD) */
double avg_flow; /* Average Flow (MGD) */
int swflag; /* TRUE=Surface Water; FALSE=Ground Water */
// char *run_name;
};
void s1_s2_est(struct UnitProcess *unit);
/* define(s) for UnitProcess.type */
#define VACANT 0
#define INFLUENT 1
#define RAPID_MIX 2
#define SLOW_MIX 3
#define SETTLING_BASIN 4
#define FILTER 5
#define BASIN 6
#define CONTACT_TANK 7
#define CLEARWELL 8
#define O3_CONTACTOR 9
#define GAC 10
#define MFUF_UP 11
#define NF_UP 12
#endif
And then there are two source files in the project:
s1s2_est.c
/* s1s2_est.c -- December, 2000*/
#include "WTPSHORT.H"
void s1_s2_est(struct UnitProcess *unit)
{
double toc, uva, s1_0, s2h_0, s2star_0, s2t_0, s1_f, s2h_f, s2star_f, s2t_f, H;
struct Effluent *eff;
eff = &unit->eff;
/* Get these inputs */
toc = eff->TOC;
uva = eff->UV;
s1_0 = eff->s1;
s2h_0 = eff->s2h;
s2star_0 = eff->s2_star;
H = pow(10.0, -eff->pH);
s2t_0 = s2h_0 + s2star_0;
s1_f = s1_0;
s2t_f = s2t_0;
s2star_f = s2star_0;
s2h_f = s2h_0;
if(eff->s1_s2_estflag == 'C')
{
/* Safety check */
if (toc < 0.0) toc = 0.0;
if (uva < 0.0) uva = 0.0;
s1_f = 5.05 * pow(toc, 0.57) * pow(uva, 0.54);
s2t_f = 13.1 * pow(toc, 0.38) * pow(uva, 0.40);
/* No increases in the S values allowed */
if(s1_f > s1_0 ) s1_f = s1_0;
if(s2t_f > s2t_0) s2t_f = s2t_0;
/* Speciate S2 */
s2h_f = s2t_f * eff->k21r * H / (eff->k21f + eff->k21r * H);
s2star_f = s2t_f * eff->k21f / (eff->k21f + eff->k21r * H);
}
if(eff->s1_s2_estflag != 'C' && unit->type == INFLUENT)
{/* Speciate S2 in raw water*/
s2h_f = s2t_f * eff->k21r * H / (eff->k21f + eff->k21r * H);
s2star_f = s2t_f * eff->k21f / (eff->k21f + eff->k21r * H);
}
/* Update Effluent data structure */
eff->s1 = s1_f;
eff->s2h = s2h_f;
eff->s2_star = s2star_f;
}/* End subroutine "s1_s2_est()"*/
and then
main.cpp
#include <stdio.h>
#include "WTPSHORT.H"
int main(int argc, char **argv)
{
UnitProcess *myunit;
s1_s2_est(myunit);
printf("done\n");
return 0;
}
When compiling and linking I see this error:
C:\WINDOWS\system32\cmd.exe /C "C:/MinGW/bin/mingw32-make.exe -j8 SHELL=cmd.exe -e -f Makefile"
"----------Building project:[ simple - Debug ]----------"
mingw32-make.exe[1]: Entering directory 'C:/Users/joka0958/Desktop/wtp/compiledwtp/simple'
C:/MinGW/bin/g++.exe -c "C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/main.cpp" -g -O0 -Wall -o ./Debug/main.cpp.o -I. -I.
C:/MinGW/bin/gcc.exe -c "C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/s1s2_est.c" -g -O0 -Wall -o ./Debug/s1s2_est.c.o -I. -I.
C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/main.cpp: In function 'int main(int, char**)':
C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/main.cpp:7:22: warning: 'myunit' is used uninitialized in this function [-Wuninitialized]
s1_s2_est(myunit);
^
C:/MinGW/bin/g++.exe -o ./Debug/simple #"simple.txt" -L.
./Debug/main.cpp.o: In function `main':
C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/main.cpp:7: undefined reference to `s1_s2_est(UnitProcess*)'
collect2.exe: error: ld returned 1 exit status
mingw32-make.exe[1]: [Debug/simple] Error 1
mingw32-make.exe: [All] Error 2
simple.mk:78: recipe for target 'Debug/simple' failed
mingw32-make.exe[1]: Leaving directory 'C:/Users/joka0958/Desktop/wtp/compiledwtp/simple'
Makefile:4: recipe for target 'All' failed
2 errors, 1 warnings
So the question is: Why am I getting an undefined reference?
I realize this is one of those errors that probably masks another problem, but I have really exhausted all possibilities, in my mind, of what could be causing the problem. Note that this is part of a larger project where many other functions compile and link properly.
By the way I am using Codelite with the MinGW compiler on Windows 10.
I'm sorry for all the consternation this question caused. It turns out that there were C++-specific functions within this file, but because the file was named with a *.c extension, CodeLite defaulted to the C compiler for this particular source file. Once I changed the filename, the code compiled and linked successfully. Thanks for all of your useful suggestions!
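For readers who hit the same linker error with a file that really is plain C, the usual fix is the other way around: give the prototype C linkage in the shared header, so the C-compiled definition and the C++ caller agree on the (unmangled) symbol name. A sketch of how WTPSHORT.H could declare it:
#ifdef __cplusplus
extern "C" {
#endif
void s1_s2_est(struct UnitProcess *unit);
#ifdef __cplusplus
}
#endif
The demangled name in the error message, s1_s2_est(UnitProcess*), is the telltale sign that the C++ side was looking for a C++-mangled symbol while the object file provided a C one.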

Pointer to peripheral hardware as template parameter

I want to access STM32F0 peripheral registers through C++ templates. A GPIO port is defined as follows by the vendor header file:
Excerpt from stm32f0xx.h:
#define __IO volatile //!< Defines 'read / write' permissions
typedef struct
{
__IO uint32_t MODER;
__IO uint16_t OTYPER;
uint16_t RESERVED0;
__IO uint32_t OSPEEDR;
__IO uint32_t PUPDR;
__IO uint16_t IDR;
uint16_t RESERVED1;
__IO uint16_t ODR;
uint16_t RESERVED2;
__IO uint32_t BSRR;
__IO uint32_t LCKR;
__IO uint32_t AFR[2];
__IO uint16_t BRR;
uint16_t RESERVED3;
} GPIO_TypeDef;
#define PERIPH_BASE ((uint32_t)0x40000000)
#define AHB2PERIPH_BASE (PERIPH_BASE + 0x08000000)
#define GPIOA_BASE (AHB2PERIPH_BASE + 0x00000000)
#define GPIOA ((GPIO_TypeDef *) GPIOA_BASE)
I created a template class for output handling.
main.cpp:
template <uintptr_t port, uint8_t pin>
class Output {
public:
static void set() {
GPIO_TypeDef *castedPort = reinterpret_cast<GPIO_TypeDef *>(port);
castedPort->ODR = (1 << pin);
}
};
int main(void)
{
Output<GPIOA_BASE, 5>::set();
while(1)
{
}
}
This code runs fine if I compile it with the Launchpad g++ for ARM. But I want to test my code
with GoogleTest, so I wrote a test for it and tried to compile it.
intArgument.cpp:
#include "gtest/gtest.h"
typedef struct {
/* see above definition */
} GPIO_TypeDef;
uint32_t gpioa[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
template <uintptr_t port, int pin>
class Output {
public:
static void set() {
GPIO_TypeDef * castedPort = reinterpret_cast<GPIO_TypeDef *>(port);
castedPort->ODR = (1 << pin);
}
};
TEST(OutputTest, OutputDataRegisterWritten) {
Output<gpioa, 5>::set();
GPIO_TypeDef * port = reinterpret_cast<GPIO_TypeDef *>(gpioa);
EXPECT_EQ((1 << 5), port->ODR);
}
int main(int argc, char **argv) {
::testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}
But now the compile fails. A reinterpret_cast from a pointer to an integer is not allowed here, because the result is no longer a constant expression usable as a non-type template argument.
fabian#ubuntu:~/workspace/stackOverflowQuestion$ g++ -std=c++11 intArgument.cpp -lgtest -pthread -o intptrArgument.out
intArgument.cpp: In member function ‘virtual void OutputTest_OutputDataRegisterWritten_Test::TestBody()’:
intArgument.cpp:23:18: error: conversion from ‘uint32_t* {aka unsigned int*}’ to ‘long unsigned int’ not considered for non-type template argument
Output<gpioa, 5>::set();
^
intArgument.cpp:23:18: error: could not convert template argument ‘gpioa’ to ‘long unsigned int’
intArgument.cpp:23:26: error: invalid type in declaration before ‘;’ token
Output<gpioa, 5>::set();
So I tried changing the type of port to GPIO_TypeDef *.
pointerArgument.cpp:
typedef struct {
/* see above definition */
} GPIO_TypeDef;
GPIO_TypeDef gpioa;
// using GPIO_TypeDef * as template argument
template <GPIO_TypeDef * port, int pin>
class Output {
public:
static void set() {
port->ODR = (1 << pin);
}
};
TEST(OutputTest, OutputDataRegisterWritten) {
Output<&gpioa, 5>::set();
EXPECT_EQ((1 << 5), gpioa.ODR);
}
int main(int argc, char **argv) {
::testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}
It compiles and the test passes.
fabian#ubuntu:~/workspace/stackOverflowQuestion$ g++ -std=c++11 pointerArgument.cpp -lgtest -pthread -o pointerArgument.out
fabian#ubuntu:~/workspace/stackOverflowQuestion$ ./pointerArgument.out
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from OutputTest
[ RUN ] OutputTest.OutputDataRegisterWritten
[ OK ] OutputTest.OutputDataRegisterWritten (0 ms)
[----------] 1 test from OutputTest (1 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1 ms total)
[ PASSED ] 1 test.
But this approach fails with the ARM compiler:
main.cpp
template <GPIO_TypeDef * port, uint8_t pin>
class Output {
public:
static void set() {
port->ODR = (1 << pin);
}
};
int main(void)
{
Output<GPIOA, 5>::set();
while(1)
{
}
}
compiler error:
[cc] main.cpp:13:17: error: '1207959552u' is not a valid template argument for 'GPIO_TypeDef*' because it is not the address of a variable
[cc] main.cpp:13:25: error: invalid type in declaration before ';' token
I understand both errors, but is there any way to get this to work? I searched for compiler flags but did not find any that might change this behaviour. A #define TESTING combined with #ifdef/#ifndef might work, but I don't like it, because then the tested code differs from the production code. Perhaps there is a nicer solution?
Used Compilers:
g++ (i686-posix-dwarf-rev3, Built by MinGW-W64 project), 4.9-2014q4 by Launchpad for STM32F0XX
g++ (Ubuntu 4.9.2-0ubuntu1~14.04) 4.9.2 for Testing
With the linker flag -Wl,-section-start you can define the start address of a given section. So first I forced my register mocks into their own section:
GPIO_TypeDef gpioa __attribute__((section(".myTestRegisters")));
GPIO_TypeDef gpiob __attribute__((section(".myTestRegisters")));
I also defined const uintptr_t address values for the section start and for each register.
const uintptr_t myTestRegisterSectionAddress = 0x8000000;
const uintptr_t gpioaAddress = myTestRegisterSectionAddress;
const uintptr_t gpiobAddress = myTestRegisterSectionAddress + sizeof(GPIO_TypeDef);
These values can then be used as template parameters.
template <uintptr_t port, int pin>
class Output {
public:
static void set() {
GPIO_TypeDef * castedPort = reinterpret_cast<GPIO_TypeDef *>(port);
castedPort->ODR = (1 << pin);
}
};
TEST(OutputTest, OutputDataRegisterWritten) {
Output<gpioaAddress, 5>::set();
EXPECT_EQ(1 << 5, gpioa.ODR);
Output<gpiobAddress, 10>::set();
EXPECT_EQ(1 << 10, gpiob.ODR);
}
int main(int argc, char **argv) {
::testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}
To force a start address for a section you can use -Wl,-section-start=.{sectionName}={startAddress}.
So in my case I use:
g++ intArgument.cpp -std=c++11 -g -o intArgument.out -lgtest -pthread -Wl,-section-start=.myTestRegisters=0x8000000
Running the application results in:
fabian#ubuntu:~/workspace/stackOverflowQuestion$ ./intArgument.out
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from OutputTest
[ RUN ] OutputTest.OutputDataRegisterWritten
[ OK ] OutputTest.OutputDataRegisterWritten (0 ms)
[----------] 1 test from OutputTest (0 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (0 ms total)
[ PASSED ] 1 test.
And objdump shows the following information:
fabian#ubuntu:~/workspace/stackOverflowQuestion$ objdump -S -j .myTestRegisters intArgument.out
intArgument.out: file format elf64-x86-64
Disassembly of section .myTestRegisters:
0000000008000000 <gpioa>:
...
000000000800000c <gpiob>:
Note that this does not work with optimizations enabled, because the variables within the section can be reordered.