I'm developing both the client (C) and server (C++) sides of an RF connection. I need to send a float value, but the architecture requires me to arrange each message in a struct that limits me to three uint8_t parameters: p0, p1, p2. My solution was to break the float into an array of four uint8_ts, send it in two separate messages, and use p0 to identify whether a message contains the first or second half.
So far I have something like this:
Server (C++):
sendFloat(float f)
{
messageStruct msg1, msg2;
uint8_t* array = (uint8_t*)(&f);
msg1.p0 = 1; //1 means it's the first half
msg1.p1 = array[0];
msg1.p2 = array[1];
msg2.p0 = 0; //0 means it's the second half
msg2.p1 = array[2];
msg2.p2 = array[3];
sendOverRf(msg1);
sendOverRf(msg2);
}
Client(C):
processReceivedMessage (uint32_t id, uint32_t byteA, uint32_t byteB) //(p0,p1,p2) are routed here
{
static uint32_t firstHalfOfFloat;
uint32_t secondHalfOfFloat;
float combinedFloat;
if(id == 1) //first half
{
firstHalfOfFloat = (byteA << 8) | byteB;
}
else //second half
{
secondHalfOfFloat = (byteA << 8) | byteB;
combinedFloat = (float)((firstHalfOfFloat << 16) | secondHalfOfFloat);
}
writeFloatToFile(combinedFloat);
}
Then, on request, the client must send that float back:
Client(C):
sendFloatBack(uint8_t firstHalfIdentifier) // is commanded twice by server with both 0 and 1 ids
{
messageStruct msg;
float f = getFloatFromFile();
uint8_t* array = (uint8_t*)(&f);
msg.p0 = firstHalfIdentifier;
if(firstHalfIdentifier == 1) //First half
{
msg.p1 = array[0];
msg.p2 = array[1];
}
else //Second half
{
msg.p1 = array[2];
msg.p2 = array[3];
}
sendOverRf(msg);
}
and finally the Server (C++) gets the value back:
retrieveFunc()
{
float f;
uint32_t firstHalf;
uint32_t secondHalf;
messageStruct msg = receiveOverRf();
firstHalf = (msg.p1 << 8) | msg.p2;
msg = receiveOverRf();
firstHalf = (msg.p1 << 8) | msg.p2;
f = (firstHalf << 16) | secondHalf;
}
but I'm getting really wrong values back. Any help would be great.
Unions are a very convenient way to disassemble a float into individual bytes and later put the bytes back together again. Here's some example code showing how you can do it:
#include <stdio.h>
#include <stdint.h>
typedef union {
uint8_t _asBytes[4];
float _asFloat;
} FloatBytesConverter;
int main(int argc, char** argv)
{
FloatBytesConverter fbc;
fbc._asFloat = 3.14159;
printf("Original float value is: %f\n", fbc._asFloat);
printf("The bytes of the float are: %u, %u, %u, %u\n"
, fbc._asBytes[0]
, fbc._asBytes[1]
, fbc._asBytes[2]
, fbc._asBytes[3]);
// Now let's put the float back together from the individual bytes
FloatBytesConverter ac;
ac._asBytes[0] = fbc._asBytes[0];
ac._asBytes[1] = fbc._asBytes[1];
ac._asBytes[2] = fbc._asBytes[2];
ac._asBytes[3] = fbc._asBytes[3];
printf("Restored float is %f\n", ac._asFloat);
return 0;
}
memcpy is your friend.
#include <stdint.h>
#include <string.h>
float toFloat(const uint8_t *arr)
{
float result;
memcpy(&result, arr, sizeof(result));
return result;
}
uint8_t *toArray(const float x, uint8_t * const arr)
{
memcpy(arr, &x, sizeof(x));
return arr;
}
void sendFloat(float f)
{
messageStruct msg1, msg2;
uint8_t array[4];
toArray(f, array);
msg1.p0 = 1; //1 means it's the first half
msg1.p1 = array[0];
msg1.p2 = array[1];
msg2.p0 = 0; //0 means it's the second half
msg2.p1 = array[2];
msg2.p2 = array[3];
sendOverRf(msg1);
sendOverRf(msg2);
}
float retrieveFunc(void)
{
float f;
uint8_t array[4];
messageStruct msg = receiveOverRf();
array[0] = msg.p1;
array[1] = msg.p2;
msg = receiveOverRf();
array[2] = msg.p1;
array[3] = msg.p2;
return toFloat(array);
}
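For context on why the client code in the question produces wrong values: the expression (float)((firstHalfOfFloat << 16) | secondHalfOfFloat) converts the integer numerically, so the result is the integer's value rounded to a float, not the float whose bytes were sent. The sketch below contrasts the two operations. It assumes a 32-bit IEEE 754 float, and convertValue and reinterpretBits are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Numeric conversion: treats the integer as a number and converts that
// number to float. This is what a (float) cast on an integer does.
inline float convertValue(std::uint32_t bits) {
    return static_cast<float>(bits);
}

// Bit reinterpretation: copies the four bytes unchanged into a float.
// This is what the memcpy-based toFloat() above does.
inline float reinterpretBits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With the bit pattern 0x3F800000 (1.0f in IEEE 754), reinterpretBits returns 1.0f while convertValue returns 1065353216.0f, which matches the kind of "really wrong" value described in the question.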
Well, bytes are bytes as far as architectures are concerned.
We assume that we're using IEEE 754 (or whatever) on both sides.
We can put one float (4 bytes) into or out of one uint32_t with memcpy.
But we have to deal with processor endianness.
Edit:
problem with this is that p0, p1, p2 are all uint8_ts. –
TheBigJabronie
Oops, my bad. I've updated the code to use only bytes [I've left my original/incorrect answer below for reference].
Here is the updated code. The functions will work on either host, regardless of the endianness of each architecture.
Note that your main issue is the encoding of the float. So, the code below assumes that the packets arrive intact (i.e. retry/resend is done in the lower RF layer).
void
sendFloat(float f)
{
messageStruct msg;
uint32_t i32;
assert(sizeof(float) == sizeof(uint32_t));
// get bytes of the float in native endian order
memcpy(&i32,&f,sizeof(i32));
// handle endianness
i32 = htonl(i32);
// send MSW half
msg.p0 = 1;
msg.p1 = i32 >> 24;
msg.p2 = i32 >> 16;
sendOverRf(msg);
// send LSW half
msg.p0 = 2;
msg.p1 = i32 >> 8;
msg.p2 = i32 >> 0;
sendOverRf(msg);
}
float
recvFloat(void)
{
uint32_t i32 = 0;
float f;
messageStruct msg;
// NOTE: the two packets _should_ come in the same order as the sender, but
// we'll handle out of order packets to be complete
for (int rcount = 0; rcount < 2; ++rcount) {
msg = receiveOverRf();
uint32_t tmp = msg.p1;
tmp <<= 8;
tmp |= msg.p2;
switch (msg.p0) {
case 1:
i32 |= tmp << 16;
break;
case 2:
i32 |= tmp << 0;
break;
}
}
// handle endianness
i32 = ntohl(i32);
// get bytes into float
memcpy(&f,&i32,sizeof(float));
return f;
}
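One caveat, offered tentatively: shifts operate on the integer's value rather than on its memory layout, so i32 >> 24 already yields the most significant byte on any host. Combining the shifts with htonl()/ntohl() cancels out when both hosts share the same endianness, but it may leave the bytes swapped when they differ. A minimal sketch that relies on shifts alone follows. It assumes both sides use 32-bit IEEE 754 floats whose byte order matches their integer byte order, and packFloatBE and unpackFloatBE are hypothetical names, not part of the answer.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: serialize the float's bit pattern, most significant
// byte first, using value shifts only (no htonl needed).
inline std::array<std::uint8_t, 4> packFloatBE(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // native-order bit pattern as a value
    std::array<std::uint8_t, 4> out{};
    out[0] = static_cast<std::uint8_t>(bits >> 24);
    out[1] = static_cast<std::uint8_t>(bits >> 16);
    out[2] = static_cast<std::uint8_t>(bits >> 8);
    out[3] = static_cast<std::uint8_t>(bits);
    return out;
}

// Hypothetical helper: rebuild the value from bytes received in that order.
inline float unpackFloatBE(const std::array<std::uint8_t, 4>& b) {
    const std::uint32_t bits = (static_cast<std::uint32_t>(b[0]) << 24) |
                               (static_cast<std::uint32_t>(b[1]) << 16) |
                               (static_cast<std::uint32_t>(b[2]) << 8) |
                               static_cast<std::uint32_t>(b[3]);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

In the two-message scheme, bytes 0 and 1 would travel in p1 and p2 of the first message and bytes 2 and 3 in the second.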
My original code and further assumptions.
We can do this in a single message with room to spare.
Here is the code I would use.
void
sendFloat(float f)
{
messageStruct msg;
uint32_t i32;
assert(sizeof(float) == sizeof(uint32_t));
// get bytes of the float in native endian order
memcpy(&i32,&f,sizeof(i32));
// handle endianness
i32 = htonl(i32);
// means we're sending a float
msg.p0 = CMD_FLOAT;
msg.p1 = i32;
msg.p2 = 0;
sendOverRf(msg);
}
float
recvFloat(void)
{
uint32_t i32;
float f;
messageStruct msg = receiveOverRf();
// ensure we got correct message
if (msg.p0 != CMD_FLOAT)
exit(1);
// get int in network order
i32 = msg.p1;
// handle endianness
i32 = ntohl(i32);
// get bytes into float
memcpy(&f,&i32,sizeof(float));
return f;
}
I am trying to generate a simple, constant sine tone using SDL_audio. I have a small helper class that can be called to turn the tone on/off, change the frequency, and change the wave shape. I have followed some examples I could find on the web and got the following:
beeper.h
#pragma once
#include <SDL.h>
#include <SDL_audio.h>
#include <cmath>
#include "logger.h"
class Beeper {
private:
//Should there be sound right now
bool soundOn = true;
//Type of wave that should be generated
int waveType = 0;
//Tone that the wave will produce (may or may not be applicable based on wave type)
float waveTone = 440;
//Running index for sampling
float samplingIndex = 0;
//These are useful variables that cannot be changed outside of this file:
//Volume
const Sint16 amplitude = 32000;
//Sampling rate
const int samplingRate = 44100;
//Buffer size
const int bufferSize = 1024;
//Samples a sine wave at a given index
float sampleSine(float index);
//Samples a square wave at a given index
float sampleSquare(float index);
public:
//Initializes SDL audio, audio device, and audio specs
void initializeAudio();
//Function called by SDL audio_callback that fills stream with samples
void generateSamples(short* stream, int length);
//Turn sound on or off
void setSoundOn(bool soundOnOrOff);
//Set timbre of tone produced by beeper
void setWaveType(int waveTypeID);
//Set tone (in Hz) produced by beeper
void setWaveTone(int waveHz);
};
beeper.cpp
#include <beeper.h>
void fillBuffer(void* userdata, Uint8* _stream, int len) {
short * stream = reinterpret_cast<short*>(_stream);
int length = len;
Beeper* beeper = (Beeper*)userdata;
beeper->generateSamples(stream, length);
}
void Beeper::initializeAudio() {
SDL_AudioSpec desired, returned;
SDL_AudioDeviceID devID;
SDL_zero(desired);
desired.freq = samplingRate;
desired.format = AUDIO_S16SYS; //16-bit audio
desired.channels = 1;
desired.samples = bufferSize;
desired.callback = &fillBuffer;
desired.userdata = this;
devID = SDL_OpenAudioDevice(SDL_GetAudioDeviceName(0,0), 0, &desired, &returned, SDL_AUDIO_ALLOW_FORMAT_CHANGE);
SDL_PauseAudioDevice(devID, 0);
}
void Beeper::generateSamples(short *stream, int length) {
int samplesToWrite = length / sizeof(short);
for (int i = 0; i < samplesToWrite; i++) {
if (soundOn) {
if (waveType == 0) {
stream[i] = (short)(amplitude * sampleSine(samplingIndex));
}
else if (waveType == 1) {
stream[i] = (short)(amplitude * 0.8 * sampleSquare(samplingIndex));
}
}
else {
stream[i] = 0;
}
//INFO << "Sampling index: " << samplingIndex;
samplingIndex += (waveTone * M_PI * 2) / samplingRate;
//INFO << "Stream input: " << stream[i];
if (samplingIndex >= (M_PI*2)) {
samplingIndex -= M_PI * 2;
}
}
}
void Beeper::setSoundOn(bool soundOnOrOff) {
soundOn = soundOnOrOff;
//if (soundOnOrOff) {
// samplingIndex = 0;
//}
}
void Beeper::setWaveType(int waveTypeID) {
waveType = waveTypeID;
//samplingIndex = 0;
}
void Beeper::setWaveTone(int waveHz) {
waveTone = waveHz;
//samplingIndex = 0;
}
float Beeper::sampleSine(float index) {
double result = sin((index));
//INFO << "Sine result: " << result;
return result;
}
float Beeper::sampleSquare(float index)
{
float unSquaredSin = sin(index); // keep the fraction; an int would truncate almost every sample to 0
if (unSquaredSin >= 0) {
return 1;
}
else {
return -1;
}
}
The callback function is being called and the generateSamples function is loading data into the stream, but I cannot hear anything but a very slight click at irregular periods. I have had a look at the data inside the stream and it follows a pattern that I would expect for a scaled sine wave with a 440 Hz frequency. Is there something obvious that I am missing? I did notice that the size of the stream is double what I put when declaring the SDL_AudioSpec and calling SDL_OpenAudioDevice. Why is that?
Answered my own question! When opening the audio device I used the flag SDL_AUDIO_ALLOW_FORMAT_CHANGE which meant that SDL was actually using a float buffer instead of the short buffer that I expected. This was causing issues in a couple of places that were hard to detect (the stream being double the amount of bytes I was expecting should have tipped me off). I changed that parameter in SDL_OpenAudioDevice() to 0 and it worked as expected!
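The doubled stream size follows from simple arithmetic: the callback's length is in bytes, so the same 1024 samples take twice as many bytes as 32-bit floats as they do as 16-bit integers. The sketch below only illustrates that calculation; callbackBytes and samplesInBuffer are hypothetical helpers, not SDL functions. In a real program, comparing the format field of the returned SDL_AudioSpec with the requested one after SDL_OpenAudioDevice() would likely reveal such a change.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes in one callback buffer: samples * channels * bytes per sample.
constexpr std::size_t callbackBytes(std::size_t samples, std::size_t channels,
                                    std::size_t bytesPerSample) {
    return samples * channels * bytesPerSample;
}

// Number of samples to write, derived from the byte length the callback
// receives and the sample size of the format actually in use.
constexpr std::size_t samplesInBuffer(std::size_t lengthBytes,
                                      std::size_t bytesPerSample) {
    return lengthBytes / bytesPerSample;
}
```

For a mono buffer of 1024 samples this gives 2048 bytes with Sint16 samples and 4096 bytes with float samples, which is the factor of two noticed in the question.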
My yaw and pitch values aren't working when I try to use trigonometry. Even if the math is wrong, my aimbot application should at least aim at different angles when I move, but that is not the case: it stays stuck looking at one angle. When I run the application, my yaw value gets infinitesimally close to 0, so I think my calculations are going outside the yaw value interval.
Note that the interval for yaw is [ 0, 360 ] and for pitch is [ -90, 90 ].
#include "proc.h"
#include <iostream>
#include <thread>
#include <chrono>
#include <math.h>
#include <tgmath.h>
#define PI 3.14159265
//WORD m_XPos = 0x04;
//WORD m_YPos = 0x08;
//WORD m_ZPos = 0xC
//WORD YawVal = 0x0040;
//WORD PitchVal = 0x0044;
int main()
{
// Linking the game to my code essentially here
DWORD procID;
procID = getProcID(L"ac_client.exe");
HANDLE handle = 0;
handle = OpenProcess(PROCESS_ALL_ACCESS, FALSE, procID);
//base address for the player and base address for the enemy
uintptr_t localBaseAddress = 0x50F4F4;
uintptr_t entityBaseAddress = 0x50F4F8;
// setting up a link to get the values of playerx,y,z and view angles
std::vector<int> playerOffsets = { 0x4 };
uintptr_t playerX = findDMAAddress(handle, localBaseAddress, playerOffsets);
playerOffsets = { 0x8 };
uintptr_t playerY = findDMAAddress(handle, localBaseAddress, playerOffsets);
playerOffsets = { 0xC };
uintptr_t playerZ = findDMAAddress(handle, localBaseAddress, playerOffsets);
playerOffsets = { 0x40 };
uintptr_t playerYaw = findDMAAddress(handle, localBaseAddress, playerOffsets);
playerOffsets = { 0x44 };
uintptr_t playerPitch = findDMAAddress(handle, localBaseAddress, playerOffsets);
// setting up a link to get enemy x,y,z
/* note this is just one enemy which is the second enemy in the entity list
chain */
std::vector<int> enemyOffsets = { 0x8, 0x4 };
uintptr_t enemyX = findDMAAddress(handle, entityBaseAddress, enemyOffsets);
enemyOffsets = { 0x8, 0x8 };
uintptr_t enemyY = findDMAAddress(handle, entityBaseAddress, enemyOffsets);
enemyOffsets = { 0x8, 0xC };
uintptr_t enemyZ = findDMAAddress(handle, entityBaseAddress, enemyOffsets);
float myX = 0;
float myY = 0;
float myZ = 0;
float botX = 0;
float botY = 0;
float botZ = 0;
float placeholderX = 0;
float placeholderY = 0;
float placeholderZ = 0;
float magnitude = 0;
float tempYawValue = 0;
float tempPitchValue = 0;
float yawValue = 0;
float pitchValue = 0;
while (true)
{
// assign values to the floats i initialized previously for calculations
ReadProcessMemory(handle, (BYTE*)enemyX, &botX, sizeof(botX), 0);
ReadProcessMemory(handle, (BYTE*)enemyY, &botY, sizeof(botY), 0);
ReadProcessMemory(handle, (BYTE*)enemyZ, &botZ, sizeof(botZ), 0);
ReadProcessMemory(handle, (BYTE*)playerX, &myX, sizeof(myX), 0);
ReadProcessMemory(handle, (BYTE*)playerY, &myY, sizeof(myY), 0);
ReadProcessMemory(handle, (BYTE*)playerZ, &myZ, sizeof(myZ), 0);
ReadProcessMemory(handle, (BYTE*)playerYaw, &tempYawValue, sizeof(tempYawValue), 0);
ReadProcessMemory(handle, (BYTE*)playerPitch, &tempPitchValue, sizeof(tempPitchValue), 0);
//getting origin to get distance between me and enemy
placeholderX = botX - myX;
placeholderY = botY - myY;
placeholderZ = botZ - myZ;
magnitude = sqrt(pow(placeholderX, 2) + pow(placeholderY, 2) + pow(placeholderZ, 2));
//math for view angles
yawValue = (atan2(tempYawValue, tempPitchValue)) * (180 / PI);
pitchValue = acos(placeholderZ / magnitude) * (180 / PI);
// constantly writing to player view angles memory using the angles I calculated
WriteProcessMemory(handle, (BYTE*)playerYaw, &yawValue, sizeof(yawValue), 0);
WriteProcessMemory(handle, (BYTE*)playerPitch, &pitchValue, sizeof(pitchValue), 0);
}
}
The shader takes an SSBO of Photons that have a position, direction, wavelength and intensity and each thread is responsible for tracing exactly one photon through the grid, where at each grid cell the photon hits, the intensity is accumulated for each wavelength to create a spectral distribution for each grid cell.
The problem is that the shader works perfectly for 100,000 photons, but doesn't return a result for 1,000,000 photons.
I looked into the sizes of the SSBOs, and all were within my GPU's (NVIDIA Quadro P6000) limit of 2GB:
SSBO Grid size: 1.5GB
SSBO Photons Size: 0.02GB
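These two figures can be reproduced from the constants that appear in the source further down (a 1024 x 1024 grid, 360 wavelengths of one uint each, 24 bytes per photon). The sketch below is only a sanity check of that arithmetic; the constant and function names are hypothetical.

```cpp
#include <cstdint>

// Values taken from the shader and host code in this question.
constexpr std::uint64_t kTexWidth = 1024;
constexpr std::uint64_t kTexHeight = 1024;
constexpr std::uint64_t kNumWavelengths = 740 - 380;  // 360 bins
constexpr std::uint64_t kPhotonBytes = 24;            // 6 * 4 bytes

// Size of the grid SSBO: one uint32 per wavelength per pixel.
constexpr std::uint64_t pixelBufferBytes() {
    return kTexWidth * kTexHeight * kNumWavelengths * sizeof(std::uint32_t);
}

// Size of the photon SSBO for a given photon count.
constexpr std::uint64_t photonBufferBytes(std::uint64_t numPhotons) {
    return numPhotons * kPhotonBytes;
}
```

The grid comes to 1,509,949,440 bytes (about 1.5GB) and one million photons to 24,000,000 bytes (about 0.02GB), consistent with the list above.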
If I change the logic in some places, it works with one million photons (see the capitalized comments in getRayRectangleExitEdge() and main()).
I currently can't see any explanation of why the shader fails for 1,000,000 photons but works for 100,000 photons. The logic is the same and the buffer sizes are within limits. (That the buffer size isn't the problem is also confirmed by the fact that it works when the logic is changed.)
Below is the source code. If you want to try it yourself here is the code on github: https://github.com/TheJhonny007/TextureTracerDebug
Compute Shader:
#version 430
#extension GL_EXT_compute_shader: enable
#extension GL_EXT_shader_storage_buffer_object: enable
#extension GL_ARB_compute_variable_group_size: enable
const uint TEX_WIDTH = 1024u;
const uint TEX_HEIGHT = TEX_WIDTH;
const uint MIN_WAVELENGTH = 380u;
const uint MAX_WAVELENGTH = 740u;
const uint NUM_WAVELENGTHS = MAX_WAVELENGTH - MIN_WAVELENGTH;
// Size: 24 bytes -> ~40,000,000 photons per available gigabyte of ram
struct Photon {
vec2 position;// m
vec2 direction;// normalized
uint wavelength;// nm
float intensity;// 0..1 should start at 1
};
layout(std430, binding = 0) buffer Photons {
Photon photons[];
};
// Size: 1440 bytes -> ~700,000 pixels per available gigabyte of ram
struct Pixel {
uint intensityAtWavelengths[NUM_WAVELENGTHS];// [0..1000]
};
layout(std430, binding = 1) buffer Pixels {
//Pixel pixels[TEX_WIDTH][TEX_HEIGHT];
// NVIDIAs linker takes ages to link if the sizes are specified :(
Pixel[] pixels;
};
uniform float xAxisScalingFactor;
vec2 getHorizontalRectangleAt(int i) {
float x = pow(float(i), xAxisScalingFactor);
float w = pow(float(i + 1), xAxisScalingFactor);
return vec2(x, w);
}
uniform float rectangleHeight;
struct Rectangle {
float x;
float y;
float w;
float h;
};
layout (local_size_variable) in;
void addToPixel(uvec2 idx, uint wavelength, uint intensity) {
if (idx.x >= 0u && idx.x < TEX_WIDTH && idx.y >= 0u && idx.y < TEX_HEIGHT) {
uint index = (idx.y * TEX_WIDTH) + idx.x;
atomicAdd(pixels[index].intensityAtWavelengths[wavelength - MIN_WAVELENGTH], intensity);
}
}
/// Returns the rectangle at the given indices.
Rectangle getRectangleAt(ivec2 indices) {
vec2 horRect = getHorizontalRectangleAt(indices.x);
return Rectangle(horRect.x, rectangleHeight * float(indices.y), horRect.y, rectangleHeight);
}
uniform float shadowLength;
uniform float shadowHeight;
/// Returns the indices of the rectangle at the given location
ivec2 getRectangleIdxAt(vec2 location) {
int x = 0;
int y = int(location.y / rectangleHeight);
return ivec2(x, y);
}
float getRayIntersectAtX(Photon ray, float x) {
float slope = ray.direction.y / ray.direction.x;
return slope * (x - ray.position.x) + ray.position.y;
}
ivec2 getRayRectangleExitEdge(Photon ray, Rectangle rect) {
float intersectHeight = getRayIntersectAtX(ray, rect.x + rect.w);
// IF ONE OF THE FIRST TWO CONDITIONS GETS REMOVED IT WORKS WITH 1'000'000 PHOTONS OTHERWISE ONLY 100'000 WHY?
if (intersectHeight < rect.y) {
return ivec2(0, -1);
} else if (intersectHeight > rect.y + rect.h) {
return ivec2(0, 1);
} else {
return ivec2(1, 0);
}
}
void main() {
uint gid = gl_GlobalInvocationID.x;
if (gid >= photons.length()) return;
Photon photon = photons[gid];
ivec2 photonTexIndices = getRectangleIdxAt(photon.position);
while (photonTexIndices.x < TEX_WIDTH && photonTexIndices.y < TEX_HEIGHT &&
photonTexIndices.x >= 0 && photonTexIndices.y >= 0) {
// need to convert to uint for atomic add operations...
addToPixel(uvec2(photonTexIndices), photon.wavelength, uint(photon.intensity * 100.0));
ivec2 dir = getRayRectangleExitEdge(photon, getRectangleAt(photonTexIndices));
photonTexIndices += dir;
// When the ray goes out of bounds on the bottom then mirror it to simulate rays coming from
// the other side of the planet. This works because of the rotational symmetry of the system.
// IF COMMENTED OUT IT WORKS WITH 1'000'000 PHOTONS OTHERWISE ONLY 100'000 WHY?
if (photonTexIndices.y < 0) {
photonTexIndices.y = 0;
photon.position.y *= -1.0;
photon.direction.y *= -1.0;
}
}
}
Tracer.hpp
#ifndef TEXTURE_TRACER_HPP
#define TEXTURE_TRACER_HPP
#include <glm/glm.hpp>
#include <random>
namespace gpu {
// 6 * 4 = 24 Bytes
struct Photon {
glm::vec2 position; // m
glm::vec2 direction; // normalized
uint32_t waveLength; // nm
float intensity; // 0..1 should start at 1
};
class TextureTracer {
public:
TextureTracer();
uint32_t createShadowMap(size_t numPhotons);
private:
void initTextureTracer();
void traceThroughTexture(uint32_t ssboPhotons, size_t numPhotons);
Photon emitPhoton();
std::vector<Photon> generatePhotons(uint32_t count);
struct {
uint32_t uRectangleHeight;
uint32_t uShadowLength;
uint32_t uShadowHeight;
uint32_t uXAxisScalingFactor;
} mTextureTracerUniforms;
uint32_t mTextureTracerProgram;
std::mt19937_64 mRNG;
std::uniform_real_distribution<> mDistributionSun;
std::uniform_int_distribution<uint32_t> mDistributionWavelength;
std::bernoulli_distribution mDistributionBoolean;
};
} // namespace gpu
#endif // TEXTURE_TRACER_HPP
Tracer.cpp
#include "TextureTracer.hpp"
#include <GL/glew.h>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <random>
#include <string>
#include <vector>
void GLAPIENTRY MessageCallback(GLenum source, GLenum type, GLuint id,
GLenum severity, GLsizei length,
const GLchar *message, const void *userParam) {
if (type == GL_DEBUG_TYPE_ERROR)
fprintf(stderr, "GL ERROR: type = 0x%x, severity = 0x%x, message = %s\n",
type, severity, message);
else
fprintf(stdout, "GL INFO: type = 0x%x, severity = 0x%x, message = %s\n",
type, severity, message);
}
namespace gpu {
const double TEX_HEIGHT_TO_RADIUS_FACTOR = 4;
const double TEX_SHADOW_LENGTH_FACTOR = 8;
const uint32_t TEX_WIDTH = 1024u;
const uint32_t TEX_HEIGHT = TEX_WIDTH;
const double RADIUS = 6'371'000.0;
const double RADIUS_FACTORED = RADIUS * TEX_HEIGHT_TO_RADIUS_FACTOR;
const double SUN_RADIUS = 695'510'000.0;
const double DIST_TO_SUN = 149'600'000'000.0;
const double ATMO_HEIGHT = 42'000.0;
std::string loadShader(const std::string &fileName) {
std::ifstream shaderFileStream(fileName, std::ios::in);
if (!shaderFileStream.is_open()) {
std::cerr << "Could not load the GLSL shader from '" << fileName << "'!"
<< std::endl;
exit(-1);
}
std::string shaderCode;
while (!shaderFileStream.eof()) {
std::string line;
std::getline(shaderFileStream, line);
shaderCode.append(line + "\n");
}
return shaderCode;
}
void TextureTracer::initTextureTracer() {
mTextureTracerProgram = glCreateProgram();
uint32_t rayTracingComputeShader = glCreateShader(GL_COMPUTE_SHADER);
std::string code = loadShader("../resources/TextureTracer.glsl");
const char *shader = code.c_str();
glShaderSource(rayTracingComputeShader, 1, &shader, nullptr);
glCompileShader(rayTracingComputeShader);
glAttachShader(mTextureTracerProgram, rayTracingComputeShader);
glLinkProgram(mTextureTracerProgram);
mTextureTracerUniforms.uRectangleHeight =
glGetUniformLocation(mTextureTracerProgram, "rectangleHeight");
mTextureTracerUniforms.uShadowHeight =
glGetUniformLocation(mTextureTracerProgram, "shadowHeight");
mTextureTracerUniforms.uShadowLength =
glGetUniformLocation(mTextureTracerProgram, "shadowLength");
mTextureTracerUniforms.uXAxisScalingFactor =
glGetUniformLocation(mTextureTracerProgram, "xAxisScalingFactor");
glDetachShader(mTextureTracerProgram, rayTracingComputeShader);
glDeleteShader(rayTracingComputeShader);
}
TextureTracer::TextureTracer()
: mRNG(1L), mDistributionSun(
std::uniform_real_distribution<>(-SUN_RADIUS, SUN_RADIUS)),
mDistributionWavelength(
std::uniform_int_distribution<uint32_t>(380, 739)),
mDistributionBoolean(std::bernoulli_distribution(0.5)) {
glEnable(GL_DEBUG_OUTPUT);
glDebugMessageCallback(MessageCallback, nullptr);
initTextureTracer();
}
double raySphereDistance(glm::dvec2 origin, glm::dvec2 direction,
glm::dvec2 center, double radius) {
glm::dvec2 m = origin - center;
double b = glm::dot(m, direction);
double c = glm::dot(m, m) - (radius * radius);
if (c > 0.0 && b > 0.0)
return -1.0;
double discr = b * b - c;
// A negative discriminant corresponds to ray missing sphere
if (discr < 0.0)
return -1.0;
// Ray now found to intersect sphere, compute smallest t value of intersection
return glm::max(0.0, -b - glm::sqrt(discr));
}
Photon TextureTracer::emitPhoton() {
std::uniform_real_distribution<> distributionEarth(0.0, ATMO_HEIGHT);
glm::dvec2 target = {0.0, RADIUS + distributionEarth(mRNG)};
double d;
do {
d = glm::length(glm::dvec2(mDistributionSun(mRNG), mDistributionSun(mRNG)));
} while (d > SUN_RADIUS);
glm::dvec2 startPosition =
glm::dvec2(-DIST_TO_SUN, mDistributionBoolean(mRNG) ? d : -d);
glm::dvec2 direction = glm::normalize(target - startPosition);
startPosition +=
direction * raySphereDistance(startPosition, direction, {0.0, 0.0},
RADIUS + ATMO_HEIGHT);
return {glm::vec2(0.0, startPosition.y), glm::vec2(direction),
mDistributionWavelength(mRNG), 1.0f};
}
std::vector<Photon> TextureTracer::generatePhotons(uint32_t count) {
std::vector<Photon> photons(count);
std::generate(photons.begin(), photons.end(),
[this]() { return emitPhoton(); });
return photons;
}
void TextureTracer::traceThroughTexture(uint32_t ssboPhotons,
size_t numPhotons) {
glUseProgram(mTextureTracerProgram);
glUniform1f(mTextureTracerUniforms.uRectangleHeight,
RADIUS_FACTORED / TEX_HEIGHT);
const double shadowLength =
TEX_SHADOW_LENGTH_FACTOR * (DIST_TO_SUN * RADIUS) / (SUN_RADIUS - RADIUS);
glUniform1f(mTextureTracerUniforms.uShadowLength, shadowLength);
glUniform1f(mTextureTracerUniforms.uShadowHeight, RADIUS_FACTORED);
const double xAxisScalingFactor =
glm::log(shadowLength) / glm::log(static_cast<double>(TEX_WIDTH));
glUniform1f(mTextureTracerUniforms.uXAxisScalingFactor,
static_cast<float>(xAxisScalingFactor));
const uint32_t MIN_WAVELENGTH = 380u;
const uint32_t MAX_WAVELENGTH = 740u;
const uint32_t NUM_WAVELENGTHS = MAX_WAVELENGTH - MIN_WAVELENGTH;
size_t pixelBufferSize =
TEX_WIDTH * TEX_HEIGHT * NUM_WAVELENGTHS * sizeof(uint32_t);
uint32_t ssboPixels;
glGenBuffers(1, &ssboPixels);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssboPixels);
glBufferData(GL_SHADER_STORAGE_BUFFER, pixelBufferSize, nullptr,
GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssboPhotons);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssboPixels);
const uint32_t numThreads = 32u;
const uint32_t numBlocks = numPhotons / numThreads;
std::cout << "numBlocks: " << numBlocks << std::endl;
glDispatchComputeGroupSizeARB(numBlocks, 1, 1, numThreads, 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
struct Pixel {
uint32_t intensityAtWavelengths[NUM_WAVELENGTHS];
};
std::vector<Pixel> pixels(TEX_WIDTH * TEX_HEIGHT);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssboPixels);
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, pixelBufferSize,
pixels.data());
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
for (int y = 0; y < TEX_HEIGHT; ++y) {
printf("%4i | ", y);
for (int x = 0; x < TEX_WIDTH; ++x) {
Pixel p = pixels[y * TEX_WIDTH + x];
int counter = 0;
for (uint32_t i : p.intensityAtWavelengths) {
counter += i;
}
if (counter == 0) {
printf(" ");
} else if (counter > 100'000'000) {
printf("%4s", "\u25A0");
} else if (counter > 10'000'000) {
printf("%4s", "\u25A3");
} else if (counter > 1'000'000) {
printf("%4s", "\u25A6");
} else if (counter > 100'000) {
printf("%4s", "\u25A4");
} else {
printf("%4s", "\u25A1");
}
}
std::cout << std::endl;
}
glDeleteBuffers(1, &ssboPixels);
glUseProgram(0);
}
uint32_t TextureTracer::createShadowMap(size_t numPhotons) {
std::vector<Photon> photons = generatePhotons(numPhotons);
uint32_t ssboPhotons;
glGenBuffers(1, &ssboPhotons);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssboPhotons);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(Photon) * photons.size(),
photons.data(), GL_DYNAMIC_COPY);
traceThroughTexture(ssboPhotons, photons.size());
glDeleteBuffers(1, &ssboPhotons);
glDeleteProgram(mTextureTracerProgram);
glDisable(GL_DEBUG_OUTPUT);
glDebugMessageCallback(nullptr, nullptr);
return 0;
}
}
main.cpp
#include <GL/glew.h>
#include <GL/glut.h>
#include "TextureTracer.hpp"
int main(int argc, char *argv[]) {
glutInit(&argc, argv);
glutCreateWindow("OpenGL needs a window o.O");
glewInit();
auto mapper = gpu::TextureTracer();
// WITH 100'000 PHOTONS IT WORKS, WITH 1'000'000 PHOTONS NOT WHY?
mapper.createShadowMap(100'000);
return 0;
}
Operating systems cancel GPU program executions if they take too long. On Windows it is generally two seconds and on Linux it is five seconds most of the time, but it can vary.
This is to detect GPU programs that are stuck and cancel them. There are different methods to get around this timeout, but they all require admin/root privileges, which are not always available.
If possible, the execution can be split up into multiple invocations, as in the following snippet:
const uint32_t passSize = 2048u;
const uint32_t numPasses = (numPhotons / passSize) + 1;
const uint32_t numThreads = 64u;
const uint32_t numBlocks = passSize / numThreads;
glUniform1ui(glGetUniformLocation(mTextureTracerProgram, "passSize"), passSize);
for (uint32_t pass = 0u; pass < numPasses; ++pass) {
glUniform1ui(glGetUniformLocation(mTextureTracerProgram, "pass"), pass);
glDispatchComputeGroupSizeARB(numBlocks, 1, 1, numThreads, 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
glFlush();
glFinish();
}
The glFlush() and glFinish() calls are important or the executions will get bundled together and the OS triggers a timeout anyways.
In the shader you just need to access the right sections of the input data like so:
// other stuff
uniform uint pass;
uniform uint passSize;
void main() {
uint gid = gl_GlobalInvocationID.x;
uint passId = pass * passSize + gid;
if (passId >= photons.length()) return;
Photon photon = photons[passId];
// rest of program
}
That is all.
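The index bookkeeping behind the multi-pass approach can be checked on the CPU. The sketch below is an illustration under assumptions, not part of the original fix: passCount, globalIndex and isActive are hypothetical helpers that mirror the host loop and the shader's bounds check.

```cpp
#include <cstdint>

// Number of passes needed to cover numItems, using ceiling division.
// The snippet above uses numItems / passSize + 1 instead, which issues one
// extra, empty pass when numItems is an exact multiple of passSize; the
// shader's bounds check makes that harmless.
constexpr std::uint32_t passCount(std::uint32_t numItems, std::uint32_t passSize) {
    return (numItems + passSize - 1) / passSize;
}

// Index into the photon buffer for invocation gid of a given pass,
// mirroring passId = pass * passSize + gid in the shader.
constexpr std::uint32_t globalIndex(std::uint32_t pass, std::uint32_t passSize,
                                    std::uint32_t gid) {
    return pass * passSize + gid;
}

// Mirrors the shader's early return for indices past the end of the buffer.
constexpr bool isActive(std::uint32_t index, std::uint32_t numItems) {
    return index < numItems;
}
```

For 1,000,000 photons and a pass size of 2048, this gives 489 passes; the last pass handles indices 999,424 through 999,999 and the remaining invocations in it return early.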
If you want to disable the OS timeouts here is a relevant post for Linux: https://stackoverflow.com/a/30520538/5543884
And here is a post regarding Windows: https://stackoverflow.com/a/29759823/5543884
I'm having some very weird issues using the following hardware elements:
Arduino Uno
Wi-Fi shield
GPS receiver
Accelerometer
Barometer
I wanted to off-load the sensor readings to an SD card as needed, but before I can even code the SD functions, the mere inclusion of the SD.h library renders my code useless.
My code is as follows:
#include <SoftwareSerial.h>
#include <TinyGPS.h>
#include <SD.h>
/* This sample code demonstrates the normal use of a TinyGPS object.
It requires the use of SoftwareSerial, and assumes that you have a
4800-baud serial GPS device hooked up on pins 3(rx) and 4(tx).
*/
//For barometer
#include <Wire.h>
#define BMP085_ADDRESS 0x77 // I2C address of BMP085
const unsigned char OSS = 2; // Oversampling Setting
// Calibration values
int ac1;
int ac2;
int ac3;
unsigned int ac4;
unsigned int ac5;
unsigned int ac6;
int b1;
int b2;
int mb;
int mc;
int md;
// b5 is calculated in bmp085GetTemperature(...), this variable is also used in bmp085GetPressure(...)
// So ...Temperature(...) must be called before ...Pressure(...).
long b5;
//End of barometer
//Accelerometer
// These constants describe the pins. They won't change:
const int xpin = A1; // x-axis of the accelerometer
const int ypin = A2; // y-axis
const int zpin = A3; // z-axis (only on 3-axis models)
//end of accel
TinyGPS gps;
SoftwareSerial nss(3, 4);
static void gpsdump(TinyGPS &gps);
static bool feedgps();
static void print_float(float val, float invalid, int len, int prec);
static void print_int(unsigned long val, unsigned long invalid, int len);
static void print_date(TinyGPS &gps);
static void print_str(const char *str, int len);
void setup()
{
//Make sure the analog-to-digital converter takes its reference voltage from
// the AREF pin
analogReference(EXTERNAL);
pinMode(xpin, INPUT);
pinMode(ypin, INPUT);
pinMode(zpin, INPUT);
//Barometer
Wire.begin();
bmp085Calibration();
//GPS
Serial.begin(115200);
nss.begin(57600);
Serial.print("Testing TinyGPS library v. "); Serial.println(TinyGPS::library_version());
Serial.println("by Mikal Hart");
Serial.println();
Serial.print("Sizeof(gpsobject) = "); Serial.println(sizeof(TinyGPS));
Serial.println();
Serial.println("Sats HDOP Latitude Longitude Fix Date Time Date Alt Course Speed Card Distance Course Card Chars Sentences Checksum");
Serial.println(" (deg) (deg) Age Age (m) --- from GPS ---- ---- to London ---- RX RX Fail");
Serial.println("--------------------------------------------------------------------------------------------------------------------------------------");
}
void loop()
{
//Accelerometer
Serial.print( analogRead(xpin));
Serial.print("\t");
//Add a small delay between pin readings. I read that you should
//do this but haven't tested the importance
delay(1);
Serial.print( analogRead(ypin));
Serial.print("\t");
//add a small delay between pin readings. I read that you should
//do this but haven't tested the importance
delay(1);
Serial.print( analogRead(zpin));
Serial.print("\n"); // delay before next reading:
bool newdata = false;
unsigned long start = millis();
// Every second we print an update
while (millis() - start < 1000)
{
if (feedgps())
newdata = true;
}
//barometer
float temperature = bmp085GetTemperature(bmp085ReadUT()); //MUST be called first
float pressure = bmp085GetPressure(bmp085ReadUP());
float atm = pressure / 101325; // "standard atmosphere"
float altitude = calcAltitude(pressure); //Uncompensated calculation - in Meters
Serial.print("Temperature: ");
Serial.print(temperature, 2); //display 2 decimal places
Serial.println(" C");
Serial.print("Pressure: ");
Serial.print(pressure, 0); //whole number only.
Serial.println(" Pa");
Serial.print("Standard Atmosphere: ");
Serial.println(atm, 4); //display 4 decimal places
Serial.print("Altitude: ");
Serial.print(altitude, 2); //display 2 decimal places
Serial.println(" M");
Serial.println();//line break
//end of barometer
gpsdump(gps);
}
static void gpsdump(TinyGPS &gps)
{
float flat, flon;
unsigned long age, date, time, chars = 0;
unsigned short sentences = 0, failed = 0;
static const float LONDON_LAT = 51.508131, LONDON_LON = -0.128002;
print_int(gps.satellites(), TinyGPS::GPS_INVALID_SATELLITES, 5);
print_int(gps.hdop(), TinyGPS::GPS_INVALID_HDOP, 5);
gps.f_get_position(&flat, &flon, &age);
print_float(flat, TinyGPS::GPS_INVALID_F_ANGLE, 9, 5);
print_float(flon, TinyGPS::GPS_INVALID_F_ANGLE, 10, 5);
print_int(age, TinyGPS::GPS_INVALID_AGE, 5);
print_date(gps);
print_float(gps.f_altitude(), TinyGPS::GPS_INVALID_F_ALTITUDE, 8, 2);
print_float(gps.f_course(), TinyGPS::GPS_INVALID_F_ANGLE, 7, 2);
print_float(gps.f_speed_kmph(), TinyGPS::GPS_INVALID_F_SPEED, 6, 2);
print_str(gps.f_course() == TinyGPS::GPS_INVALID_F_ANGLE ? "*** " : TinyGPS::cardinal(gps.f_course()), 6);
print_int(flat == TinyGPS::GPS_INVALID_F_ANGLE ? 0UL : (unsigned long)TinyGPS::distance_between(flat, flon, LONDON_LAT, LONDON_LON) / 1000, 0xFFFFFFFF, 9);
print_float(flat == TinyGPS::GPS_INVALID_F_ANGLE ? 0.0 : TinyGPS::course_to(flat, flon, LONDON_LAT, LONDON_LON), TinyGPS::GPS_INVALID_F_ANGLE, 7, 2);
print_str(flat == TinyGPS::GPS_INVALID_F_ANGLE ? "*** " : TinyGPS::cardinal(TinyGPS::course_to(flat, flon, LONDON_LAT, LONDON_LON)), 6);
gps.stats(&chars, &sentences, &failed);
print_int(chars, 0xFFFFFFFF, 6);
print_int(sentences, 0xFFFFFFFF, 10);
print_int(failed, 0xFFFFFFFF, 9);
Serial.println();
}
static void print_int(unsigned long val, unsigned long invalid, int len)
{
char sz[32];
if (val == invalid)
strcpy(sz, "*******");
else
sprintf(sz, "%lu", val); // val is unsigned long, so use %lu rather than %ld
sz[len] = 0;
for (int i=strlen(sz); i<len; ++i)
sz[i] = ' ';
if (len > 0)
sz[len-1] = ' ';
Serial.print(sz);
feedgps();
}
static void print_float(float val, float invalid, int len, int prec)
{
char sz[32];
if (val == invalid)
{
strcpy(sz, "*******");
sz[len] = 0;
if (len > 0)
sz[len-1] = ' ';
for (int i=7; i<len; ++i)
sz[i] = ' ';
Serial.print(sz);
}
else
{
Serial.print(val, prec);
int vi = abs((int)val);
int flen = prec + (val < 0.0 ? 2 : 1);
flen += vi >= 1000 ? 4 : vi >= 100 ? 3 : vi >= 10 ? 2 : 1;
for (int i=flen; i<len; ++i)
Serial.print(" ");
}
feedgps();
}
static void print_date(TinyGPS &gps)
{
int year;
byte month, day, hour, minute, second, hundredths;
unsigned long age;
gps.crack_datetime(&year, &month, &day, &hour, &minute, &second, &hundredths, &age);
if (age == TinyGPS::GPS_INVALID_AGE)
Serial.print("******* ******* ");
else
{
char sz[32];
sprintf(sz, "%02d/%02d/%02d %02d:%02d:%02d ",
month, day, year, hour, minute, second);
Serial.print(sz);
}
print_int(age, TinyGPS::GPS_INVALID_AGE, 5);
feedgps();
}
static void print_str(const char *str, int len)
{
int slen = strlen(str);
for (int i=0; i<len; ++i)
Serial.print(i<slen ? str[i] : ' ');
feedgps();
}
static bool feedgps()
{
while (nss.available())
{
if (gps.encode(nss.read()))
return true;
}
return false;
}
// Stores all of the bmp085's calibration values into global variables
// Calibration values are required to calculate temp and pressure
// This function should be called at the beginning of the program
void bmp085Calibration()
{
Serial.write("\n\nCalibrating ... ");
ac1 = bmp085ReadInt(0xAA);
ac2 = bmp085ReadInt(0xAC);
ac3 = bmp085ReadInt(0xAE);
ac4 = bmp085ReadInt(0xB0);
ac5 = bmp085ReadInt(0xB2);
ac6 = bmp085ReadInt(0xB4);
b1 = bmp085ReadInt(0xB6);
b2 = bmp085ReadInt(0xB8);
mb = bmp085ReadInt(0xBA);
mc = bmp085ReadInt(0xBC);
md = bmp085ReadInt(0xBE);
Serial.write("Calibrated\n\n");
}
// Calculate temperature in deg C
float bmp085GetTemperature(unsigned int ut){
long x1, x2;
x1 = (((long)ut - (long)ac6)*(long)ac5) >> 15;
x2 = ((long)mc << 11)/(x1 + md);
b5 = x1 + x2;
float temp = ((b5 + 8)>>4);
temp = temp /10;
return temp;
}
// Calculate pressure given up
// calibration values must be known
// b5 is also required so bmp085GetTemperature(...) must be called first.
// Value returned will be pressure in units of Pa.
long bmp085GetPressure(unsigned long up){
long x1, x2, x3, b3, b6, p;
unsigned long b4, b7;
b6 = b5 - 4000;
// Calculate B3
x1 = (b2 * ((b6 * b6)>>12))>>11; // shift b6*b6 first, as in the B4 step below
x2 = (ac2 * b6)>>11;
x3 = x1 + x2;
b3 = (((((long)ac1)*4 + x3)<<OSS) + 2)>>2;
// Calculate B4
x1 = (ac3 * b6)>>13;
x2 = (b1 * ((b6 * b6)>>12))>>16;
x3 = ((x1 + x2) + 2)>>2;
b4 = (ac4 * (unsigned long)(x3 + 32768))>>15;
b7 = ((unsigned long)(up - b3) * (50000>>OSS));
if (b7 < 0x80000000)
p = (b7<<1)/b4;
else
p = (b7/b4)<<1;
x1 = (p>>8) * (p>>8);
x1 = (x1 * 3038)>>16;
x2 = (-7357 * p)>>16;
p += (x1 + x2 + 3791)>>4;
long temp = p;
return temp;
}
// Read 1 byte from the BMP085 at 'address'
char bmp085Read(byte address)
{
Wire.beginTransmission(BMP085_ADDRESS);
Wire.write(address);
Wire.endTransmission();
Wire.requestFrom(BMP085_ADDRESS, 1);
while(!Wire.available()) {};
return Wire.read();
}
// Read 2 bytes from the BMP085
// First byte will be from 'address'
// Second byte will be from 'address'+1
int bmp085ReadInt(byte address)
{
unsigned char msb, lsb;
Wire.beginTransmission(BMP085_ADDRESS);
Wire.write(address);
Wire.endTransmission();
Wire.requestFrom(BMP085_ADDRESS, 2);
while(Wire.available()<2)
;
msb = Wire.read();
lsb = Wire.read();
return (int) msb<<8 | lsb;
}
// Read the uncompensated temperature value
unsigned int bmp085ReadUT(){
unsigned int ut;
// Write 0x2E into Register 0xF4
// This requests a temperature reading
Wire.beginTransmission(BMP085_ADDRESS);
Wire.write((byte)0xF4);
Wire.write((byte)0x2E);
Wire.endTransmission();
// Wait at least 4.5 ms
delay(5);
// Read two bytes from registers 0xF6 and 0xF7
ut = bmp085ReadInt(0xF6);
return ut;
}
// Read the uncompensated pressure value
unsigned long bmp085ReadUP(){
unsigned char msb, lsb, xlsb;
unsigned long up = 0;
// Write 0x34+(OSS<<6) into register 0xF4
// Request a pressure reading w/ oversampling setting
Wire.beginTransmission(BMP085_ADDRESS);
Wire.write(0xF4);
Wire.write(0x34 + (OSS<<6));
Wire.endTransmission();
// Wait for conversion, delay time dependent on OSS
delay(2 + (3<<OSS));
// Read register 0xF6 (MSB), 0xF7 (LSB), and 0xF8 (XLSB)
msb = bmp085Read(0xF6);
lsb = bmp085Read(0xF7);
xlsb = bmp085Read(0xF8);
up = (((unsigned long) msb << 16) | ((unsigned long) lsb << 8) | (unsigned long) xlsb) >> (8-OSS);
return up;
}
void writeRegister(int deviceAddress, byte address, byte val) {
Wire.beginTransmission(deviceAddress); // Start transmission to device
Wire.write(address); // Send register address
Wire.write(val); // Send value to write
Wire.endTransmission(); // End transmission
}
int readRegister(int deviceAddress, byte address){
int v;
Wire.beginTransmission(deviceAddress);
Wire.write(address); // Register to read
Wire.endTransmission();
Wire.requestFrom(deviceAddress, 1); // Read a byte
while(!Wire.available()) {
// waiting
}
v = Wire.read();
return v;
}
float calcAltitude(float pressure){
float A = pressure/101325;
float B = 1/5.25588;
float C = pow(A,B);
C = 1 - C;
C = C /0.0000225577;
return C;
}
Granted, right now it is merely a conglomeration of multiple example sketches, but they work: I get a sampled reading from the accelerometer, the GPS unit and the barometer once a second. However, once I simply add the line #include <SD.h> to the sketch, it fails to run correctly and the serial monitor does not display anything. I have similar versions of the above sketch (omitted as they are much lengthier), and I get the same result: either jumbled text or nothing on the serial monitor. If I comment out the line that includes the SD.h library, everything works fine.
Are there known issues or conflicts with the SD.h library? And yes, I am NOT using the pins needed for SD access (digital pin #4) for my sensor connections.
UPDATE:
I at least figured out that it has something to do with the SoftSerial (SoftSerial.h) library and the use of the SoftSerial object (which I called nss). I can load all the libraries and get everything to work if I do not call nss.begin. Is there a reason why that would conflict?
It turns out I was out of memory. Having Serial go unresponsive like that is a common symptom. This link is ultimately what I used to trace and confirm my memory issue.
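To make the out-of-memory diagnosis more concrete, here is a rough back-of-the-envelope tally in plain C++ (it is meant for a desktop compiler, not for the board). It assumes a part with 2 KB of SRAM, such as an ATmega328-class board, and a 512-byte SD block buffer. Both figures, the names kSramBytes, kSdBufferBytes, literalBytes, headerBytes and remainingBytes, and the shortened strings are illustrative assumptions, not values measured from the sketch in the question. The underlying point is that on AVR boards string literals passed to Serial.print occupy SRAM unless they are kept in flash, so long table headers add up quickly.

```cpp
#include <cstddef>

// Assumed figures for illustration only; check your own board's datasheet.
constexpr std::size_t kSramBytes = 2048;     // e.g. an ATmega328-class board
constexpr std::size_t kSdBufferBytes = 512;  // one SD block cached in RAM (assumption)

// Bytes a string literal occupies, including its terminating '\0'.
template <std::size_t N>
constexpr std::size_t literalBytes(const char (&)[N]) {
    return N;
}

// SRAM consumed by a few header-style literals similar to the ones the sketch prints.
constexpr std::size_t headerBytes() {
    return literalBytes("Sats HDOP Latitude Longitude Fix Date Time")
         + literalBytes("Testing TinyGPS library v. ")
         + literalBytes("by Mikal Hart");
}

// What would be left for the stack, the heap and every other global.
constexpr std::size_t remainingBytes() {
    return kSramBytes - kSdBufferBytes - headerBytes();
}
```

With figures like these, the SD buffer and the constant strings take a sizeable share of a 2 KB part before the stack and the libraries' own globals are counted. Wrapping constant strings in the F() macro is a common way to keep them in flash instead.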
The first thing would be to check the Arduino site. The SD documentation (here) mentions that communication between the microcontroller and the SD card uses SPI (documentation here), which takes place on digital pins 11, 12 and 13. I wouldn't be surprised if this was the source of your problems with the Serial monitor.
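As a quick sanity check of the pin concern, the following plain C++ snippet (desktop-compilable) flags digital pins that collide with the ones the SD card needs. It assumes an Uno-class board with SPI on digital pins 11, 12 and 13 as stated above, plus pin 4 as the SD chip select mentioned in the question. The names kSdPins, isReservedForSd and firstConflict are hypothetical helpers made up for this illustration, not part of any library.

```cpp
#include <cstddef>

// Pins assumed to be claimed by the SD card on an Uno-class board:
// chip select on 4 (per the question), SPI on 11 (MOSI), 12 (MISO), 13 (SCK).
constexpr int kSdPins[] = {4, 11, 12, 13};

// True if a digital pin is needed by the SD wiring assumed above.
bool isReservedForSd(int pin) {
    for (int reserved : kSdPins) {
        if (pin == reserved) {
            return true;
        }
    }
    return false;
}

// Returns the first pin in `pins` that clashes with the SD wiring, or -1 if none does.
int firstConflict(const int* pins, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (isReservedForSd(pins[i])) {
            return pins[i];
        }
    }
    return -1;
}
```

Passing in the digital pins a sketch actually uses (for example the software serial RX/TX pair) shows at a glance whether any of them overlaps with the SD card's pins.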
Judging from some comments in Sd2Card.h, it might be tricky to get your setup to work properly:
/**
* Define MEGA_SOFT_SPI non-zero to use software SPI on Mega Arduinos.
* Pins used are SS 10, MOSI 11, MISO 12, and SCK 13.
*
* MEGA_SOFT_SPI allows an unmodified Adafruit GPS Shield to be used
* on Mega Arduinos. Software SPI works well with GPS Shield V1.1
* but many SD cards will fail with GPS Shield V1.0.
*/
Even if you set MEGA_SOFT_SPI to a non-zero value, you'd probably still fail to pass the (defined(__AVR_ATmega1280__)||defined(__AVR_ATmega2560__)) check.
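The interaction described above can be reproduced with a small stand-alone C++ snippet. It is only a sketch of how such a guard behaves, assuming the library combines the two conditions with &&; the macro DEMO_SOFTWARE_SPI and the constant kUsesSoftwareSpi are made-up names for illustration and do not come from Sd2Card.h.

```cpp
// User setting under test: a non-zero value asks for software SPI.
#define MEGA_SOFT_SPI 1

// The guard only fires when the setting is non-zero AND the compiler is
// targeting a Mega-class chip (avr-gcc predefines these macros for those targets).
#if MEGA_SOFT_SPI && (defined(__AVR_ATmega1280__) || defined(__AVR_ATmega2560__))
#define DEMO_SOFTWARE_SPI 1
#else
#define DEMO_SOFTWARE_SPI 0
#endif

// Compile-time result of the check for the current build target.
constexpr bool kUsesSoftwareSpi = (DEMO_SOFTWARE_SPI != 0);
```

Compiled for any target other than an ATmega1280 or ATmega2560 (an Uno, or a desktop compiler), kUsesSoftwareSpi is false even though MEGA_SOFT_SPI is 1, which is the situation this answer warns about.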
I would suggest trying your same sketch without the TinyGPS to try to pinpoint the issue.
Also, check out this sketch; it seems to be doing something similar to what you're doing, so maybe you can fix yours based on what's done there.
Use pin 4 for CS and change the MOSI, MISO and SCK pins in the SD library's Sd2Card.h; hopefully that will get rid of the problem.