PVlib: Temperature coefficients - pvlib

I'm modeling a module not found in the databases, so the data I got on the module is from the datasheet.
What is the recommended way of doing this?
The temperature coefficients in the datasheet is given with delta tempearature percentage gain.
in pvlib it's delta temperature * temperature coeficient.
If I understand this correctly I would use an exponential function to calculate the short circuit current given a temperature like so:
Isc * (1 + a_isc^(temperature - referance temperature))
while the way pvlib does it is linear? %/C vs. A/C
Anyway, how to give the best number to pvlib?
Edit: link to datasheet
https://www.jasolar.com.cn/uploadfile/2020/1128/20201128025603510.pdf
Edit2: Example conversion
From datasheet:
Isc = 11.58A
a_Isc = 0.044%/C
(STC values)
conversion:
a_Isc / 100 * 11.58 = 0,0051 A/C

Isc * (1 + a_isc^(temperature - referance temperature))
Maybe exponential-style temperature coefficients like that exist, but I've never seen one in PV. This particular equation doesn't make much sense anyway -- when temperature == reference temperature, it should return Isc right? But this equation would return 2*Isc.
Anyway, to give these coefficients to pvlib, checking the units is the main thing. Temperature coefficients can be "raw" and have units A/°C, V/°C, or W/°C, or they can be normalized by their STC values and have units %/°C or 1/°C. The values in that datasheet are normalized (values in %/°C, convert to 1/°C by dividing by 100). Coefficients in 1/°C can be converted to non-normalized units by multiplying by the corresponding STC value.
It's a shame that pvlib is not consistent in what units it wants; some pvlib functions want the coefficients to be normalized and some don't. You have to check the documentation of the function you're using to see what units it wants. For example the documentation for pvlib.pvsystem.calcparams_cec says:
alpha_sc (float) – The short-circuit current temperature coefficient
of the module in units of A/C.
And pvlib.pvsystem.pvwatts_dc (docs):
gamma_pdc (numeric) – The temperature coefficient of power.
Typically -0.002 to -0.005 per degree C. [1/C]

Related

Basic example of how to do numerical integration in C++

I think most people know how to do numerical derivation in computer programming, (as limit --> 0; read: "as the limit approaches zero").
//example code for derivation of position over time to obtain velocity
float currPosition, prevPosition, currTime, lastTime, velocity;
while (true)
{
prevPosition = currPosition;
currPosition = getNewPosition();
lastTime = currTime;
currTime = getTimestamp();
// Numerical derivation of position over time to obtain velocity
velocity = (currPosition - prevPosition)/(currTime - lastTime);
}
// since the while loop runs at the shortest period of time, we've already
// achieved limit --> 0;
This is the basic building block for most derivation programming.
How can I do this with integrals? Do I use a for loop and add or what?
Numerical derivation and integration in code for physics, mapping, robotics, gaming, dead-reckoning, and controls
Pay attention to where I use the words "estimate" vs "measurement" below. The difference is important.
Measurements are direct readings from a sensor.
Ex: a GPS measures position (meters) directly, and a speedometer measures speed (m/s) directly.
Estimates are calculated projections you can obtain through integrating and derivating (deriving) measured values.
Ex: you can derive position measurements (m) with respect to time to obtain speed or velocity (m/s) estimates, and you can integrate speed or velocity measurements (m/s) with respect to time to obtain position or displacement (m) estimates.
Wait, aren't all "measurements" actually just "estimates" at some fundamental level?
Yeah--pretty much! But, they are not necessarily produced through derivations or integrations with respect to time, so that is a bit different.
Also note that technically, virtually nothing can truly be measured directly. All sensors get reduced down to a voltage or a current, and guess how you measure a current?--a voltage!--either as a voltage drop across a tiny resistance, or as a voltage induced through an inductive coil due to current flow. So, everything boils down to a voltage. Even devices which "measure speed directly" may be using pressure (pitot-static tube on airplane), doppler/phase shift (radar or sonar), or looking at distance over time and then outputting speed. Fluid speed, or speed with respect to fluid such as air or water, can even be measured via a hot wire anemometer by measuring the current required to keep a hot wire at a fixed temperature, or by measuring the temperature change of the hot wire at a fixed current. And how is that temperature measured? Temperature is just a thermo-electrically-generated voltage, or a voltage drop across a diode or other resistance.
As you can see, all of these "measurements" and "estimates", at the low level, are intertwined. However, if a given device has been produced, tested, and calibrated to output a given "measurement", then you can accept it as a "source of truth" for all practical purposes and call it a "measurement". Then, anything you derive from that measurement, with respect to time or some other variable, you can consider an "estimate". The irony of this is that if you calibrate your device and output derived or integrated estimates, someone else could then consider your output "estimates" as their input "measurements" in their system, in a sort of never-ending chain down the line. That's being pedantic, however. Let's just go with the simplified definitions I have above for the time being.
The following table is true, for example. Read the 2nd line, for instance, as: "If you take the derivative of a velocity measurement with respect to time, you get an acceleration estimate, and if you take its integral, you get a position estimate."
Derivatives and integrals of position
Measurement, y Derivative Integral
Estimate (dy/dt) Estimate (dy*dt)
----------------------- ----------------------- -----------------------
position [m] velocity [m/s] - [m*s]
velocity [m/s] acceleration [m/s^2] position [m]
acceleration [m/s^2] jerk [m/s^3] velocity [m/s]
jerk [m/s^3] snap [m/s^4] acceleration [m/s^2]
snap [m/s^4] crackle [m/s^5] jerk [m/s^3]
crackle [m/s^5] pop [m/s^6] snap [m/s^4]
pop [m/s^6] - [m/s^7] crackle [m/s^5]
For jerk, snap or jounce, crackle, and pop, see: https://en.wikipedia.org/wiki/Fourth,_fifth,_and_sixth_derivatives_of_position.
1. numerical derivation
Remember, derivation obtains the slope of the line, dy/dx, on an x-y plot. The general form is (y_new - y_old)/(x_new - x_old).
In order to obtain a velocity estimate from a system where you are obtaining repeated position measurements (ex: you are taking GPS readings periodically), you must numerically derivate your position measurements over time. Your y-axis is position, and your x-axis is time, so dy/dx is simply (position_new - position_old)/(time_new - time_old). A units check shows this might be meters/sec, which is indeed a unit for velocity.
In code, that would look like this, for a system where you're only measuring position in 1-dimension:
double position_new_m = getPosition(); // m = meters
double position_old_m;
// `getNanoseconds()` should return a `uint64_t timestamp in nanoseconds, for
// instance
double time_new_sec = NS_TO_SEC((double)getNanoseconds());
double time_old_sec;
while (true)
{
position_old_m = position_new_m;
position_new_m = getPosition();
time_old_sec = time_new_sec;
time_new_sec = NS_TO_SEC((double)getNanoseconds());
// Numerical derivation of position measurements over time to obtain
// velocity in meters per second (mps)
double velocity_mps =
(position_new_m - position_old_m)/(time_new_sec - time_old_sec);
}
2. numerical integration
Numerical integration obtains the area under the curve, dy*dx, on an x-y plot. One of the best ways to do this is called trapezoidal integration, where you take the average dy reading and multiply by dx. This would look like this: (y_old + y_new)/2 * (x_new - x_old).
In order to obtain a position estimate from a system where you are obtaining repeated velocity measurements (ex: you are trying to estimate distance traveled while only reading the speedometer on your car), you must numerically integrate your velocity measurements over time. Your y-axis is velocity, and your x-axis is time, so (y_old + y_new)/2 * (x_new - x_old) is simply velocity_old + velocity_new)/2 * (time_new - time_old). A units check shows this might be meters/sec * sec = meters, which is indeed a unit for distance.
In code, that would look like this. Notice that the numerical integration obtains the distance traveled over that one tiny time interval. To obtain an estimate of the total distance traveled, you must sum all of the individual estimates of distance traveled.
double velocity_new_mps = getVelocity(); // mps = meters per second
double velocity_old_mps;
// `getNanoseconds()` should return a `uint64_t timestamp in nanoseconds, for
// instance
double time_new_sec = NS_TO_SEC((double)getNanoseconds());
double time_old_sec;
// Total meters traveled
double distance_traveled_m_total = 0;
while (true)
{
velocity_old_mps = velocity_new_mps;
velocity_new_mps = getVelocity();
time_old_sec = time_new_sec;
time_new_sec = NS_TO_SEC((double)getNanoseconds());
// Numerical integration of velocity measurements over time to obtain
// a distance estimate (in meters) over this time interval
double distance_traveled_m =
(velocity_old_mps + velocity_new_mps)/2 * (time_new_sec - time_old_sec);
distance_traveled_m_total += distance_traveled_m;
}
See also: https://en.wikipedia.org/wiki/Numerical_integration.
Going further:
high-resolution timestamps
To do the above, you'll need a good way to obtain timestamps. Here are various techniques I use:
In C++, use my uint64_t nanos() function here.
If using Linux in C or C++, use my uint64_t nanos() function which uses clock_gettime() here. Even better, I have wrapped it up into a nice timinglib library for Linux, in my eRCaGuy_hello_world repo here:
timinglib.h
timinglib.c
Here is the NS_TO_SEC() macro from timing.h:
#define NS_PER_SEC (1000000000L)
/// Convert nanoseconds to seconds
#define NS_TO_SEC(ns) ((ns)/NS_PER_SEC)
If using a microcontroller, you'll need to read an incrementing periodic counter from a timer or counter register which you have configured to increment at a steady, fixed rate. Ex: on Arduino: use micros() to obtain a microsecond timestamp with 4-us resolution (by default, it can be changed). On STM32 or others, you'll need to configure your own timer/counter.
use high data sample rates
Taking data samples as fast as possible in a sample loop is a good idea, because then you can average many samples to achieve:
Reduced noise: averaging many raw samples reduces noise from the sensor.
Higher-resolution: averaging many raw samples actually adds bits of resolution in your measurement system. This is known as oversampling.
I write about it on my personal website here: ElectricRCAircraftGuy.com: Using the Arduino Uno’s built-in 10-bit to 16+-bit ADC (Analog to Digital Converter).
And Atmel/Microchip wrote about it in their white-paper here: Application Note AN8003: AVR121: Enhancing ADC resolution by oversampling.
Taking 4^n samples increases your sample resolution by n bits of resolution. For example:
4^0 = 1 sample at 10-bits resolution --> 1 10-bit sample
4^1 = 4 samples at 10-bits resolution --> 1 11-bit sample
4^2 = 16 samples at 10-bits resolution --> 1 12-bit sample
4^3 = 64 samples at 10-bits resolution --> 1 13-bit sample
4^4 = 256 samples at 10-bits resolution --> 1 14-bit sample
4^5 = 1024 samples at 10-bits resolution --> 1 15-bit sample
4^6 = 4096 samples at 10-bits resolution --> 1 16-bit sample
See:
So, sampling at high sample rates is good. You can do basic filtering on these samples.
If you process raw samples at a high rate, doing numerical derivation on high-sample-rate raw samples will end up derivating a lot of noise, which produces noisy derivative estimates. This isn't great. It's better to do the derivation on filtered samples: ex: the average of 100 or 1000 rapid samples. Doing numerical integration on high-sample-rate raw samples, however, is fine, because as Edgar Bonet says, "when integrating, the more samples you get, the better the noise averages out." This goes along with my notes above.
Just using the filtered samples for both numerical integration and numerical derivation, however, is just fine.
use reasonable control loop rates
Control loop rates should not be too fast. The higher the sample rates, the better, because you can filter them to reduce noise. The higher the control loop rate, however, not necessarily the better, because there is a sweet spot in control loop rates. If your control loop rate is too slow, the system will have a slow frequency response and won't respond to the environment fast enough, and if the control loop rate is too fast, it ends up just responding to sample noise instead of to real changes in the measured data.
Therefore, even if you have a sample rate of 1 kHz, for instance, to oversample and filter the data, control loops that fast are not needed, as the noise from readings of real sensors over very small time intervals will be too large. Use a control loop anywhere from 10 Hz ~ 100 Hz, perhaps up to 400+ Hz for simple systems with clean data. In some scenarios you can go faster, but 50 Hz is very common in control systems. The more-complicated the system and/or the more-noisy the sensor measurements, generally, the slower the control loop must be, down to about 1~10 Hz or so. Self-driving cars, for instance, which are very complicated, frequently operate at control loops of only 10 Hz.
loop timing and multi-tasking
In order to accomplish the above, independent measurement and filtering loops, and control loops, you'll need a means of performing precise and efficient loop timing and multi-tasking.
If needing to do precise, repetitive loops in Linux in C or C++, use the sleep_until_ns() function from my timinglib above. I have a demo of my sleep_until_us() function in-use in Linux to obtain repetitive loops as fast as 1 KHz to 100 kHz here.
If using bare-metal (no operating system) on a microcontroller as your compute platform, use timestamp-based cooperative multitasking to perform your control loop and other loops such as measurements loops, as required. See my detailed answer here: How to do high-resolution, timestamp-based, non-blocking, single-threaded cooperative multi-tasking.
full, numerical integration and multi-tasking example
I have an in-depth example of both numerical integration and cooperative multitasking on a bare-metal system using my CREATE_TASK_TIMER() macro in my Full coulomb counter example in code. That's a great demo to study, in my opinion.
Kalman filters
For robust measurements, you'll probably need a Kalman filter, perhaps an "unscented Kalman Filter," or UKF, because apparently they are "unscented" because they "don't stink."
See also
My answer on Physics-based controls, and control systems: the many layers of control

How can I estimate the DC output of a solar plant consisting of multiple modules and inverters in PVLib?

I'm using the ModelChain class to estimate DC and AC values for a fictitious solar plant. Input parameters include module, inverter, number of strings, number of modules, number of inverters, albedo, PVGIS TMY data, etc. I apply simple math to calculate number of modules per string and number of strings per inverter then I create one PVSystem object consisting of a single PVArray, per inverter. The I run the ModelChain model for each inverter and, for simplicity, add up the AC output to estimate the total AC for all arrays like this:
for idx in range(0, num_of_inverters):
array = {
'name': f'pvsystem-{idx+1}-array',
'mount': mount,
'module': module_name,
'module_parameters': module_parameters,
'module_type': module_type,
'albedo': albedo,
'strings': strings_per_inverter,
'modules_per_string': modules_per_string,
'temperature_model_parameters': temperature_model_parameters,
}
pvsystem=pvlib.pvsystem.PVSystem(arrays=[pvlib.pvsystem.Array(**array)], inverter_parameters=inverter_parameters)
mc = pvlib.modelchain.ModelChain(pvsystem, location)
mc.run_model(tmy_weather)
total_ac += mc.results.ac.sum()
According to PVLib documentation, the AC output is yearly in Watts hour.
But now I need to get the DC output as well (yearly in Watts hours) so I can calculate the DC/AC ratio. Running mc.results.dc gives me a Dataframe with several values (columns) that are hard to grasp for a newbie like me:
i_sc : Short-circuit current (A)
i_mp : Current at the maximum-power point (A)
v_oc : Open-circuit voltage (V)
v_mp : Voltage at maximum-power point (V)
p_mp : Power at maximum-power point (W)
i_x : Current at module V = 0.5Voc, defines 4th point on I-V curve for modeling curve shape
i_xx : Current at module V = 0.5(Voc+Vmp), defines 5th point on I-V curve for modeling curve shape
I tried using p_mp and adding it up: mc.results.dc['p_mp'].sum() but the output is much bigger than the estimated AC. I usually expect the DC/AC ratio to be somewhere > 1 and <= 1.5, roughly. However, I'm getting DC values that are like 3-5 times bigger which probably means I'm doing something wrong.
Example: 1 string, 1 inverter, 10 modules per string:
Output (yearly):
AC: 869.61kW
DC: 3326.36kW
Ratio: 3.83
Any help is appreciated.
As for why the total DC and AC generation values are so different, it's because the inverter is way undersized for the array. The inverter is rated for 250 W maximum, which is not much more than what a single module produces at STC (calculate by Impo * Vmpo as below, or noticing the "220" in the module name), and you have ten modules total. So the inverter will be saturated at even very low light, and the total AC production will be severely curtailed as a result. I think if you make a plot (mc.results.ac.plot()) you will see that the daily inverter output curve is clipped at 250 W while the simulated DC power can be nearly 10x higher. It's always a good idea to plot your time series when things aren't making sense!
In [23]: pvlib.pvsystem.retrieve_sam('cecinverter')['ABB__MICRO_0_25_I_OUTD_US_208__208V_']['Paco']
Out[23]: 250.0
In [24]: pvlib.pvsystem.retrieve_sam('sandiamod')['Canadian_Solar_CS5P_220M___2009_'][['Impo', 'Vmpo']]
Out[24]:
Impo 4.54629
Vmpo 48.3156
Name: Canadian_Solar_CS5P_220M___2009_, dtype: object
A couple other notes:
Please be careful about units:
Summing (really, integrating) an hourly time series of power (Watts) produces energy (Watt-hours). An annual output in kW doesn't make sense, since kW is for power and power is an instantaneous rate of energy generation. If this is new to you, it might be helpful to think about speed vs distance: a car might be traveling at 60mph at any given time point, but the total distance it travels in a year is measured in miles, not mph. Power is energy per unit time just like speed is distance per unit time.
Summing voltages (num_of_inverters * mc.results.dc['v_mp'].sum()) makes no sense that I can see. Volt-hours doesn't seem like a useful unit to me outside of some very specialized power electronics engineering contexts.
The term "DC/AC ratio" is typically understood to mean the ratio of rated capacities, not annual productions. So for the example in your gist, the DC/AC ratio would be calculated as (220 W/module * 10 modules/string * 2 strings/inverter = 4400 W DC) / (250 W AC) = 17.6 (which is a crazy DC/AC ratio).

Compute FFT in frequency axis when signal is in rawData in Matlab

I have a signal of frequency 10 MHz sampled at 100 MS/sec. How to compute FFT in matlab in terms of frequency when my signal is in rawData (length of this rawData is 100000), also
what should be the optimum length of NFFT.(i.e., on what factor does NFFT depend)
why does my Amplitude (Y axis) change with NFFT
whats difference between NFFT, N and L. How to compute length of a signal
How to separate Noise and signal from a single signal (which is in rawData)
Here is my code,
t=(1:40);
f=10e6;
fs=100e6;
NFFT=1024;
y=abs(rawData(:1000,2));
X=abs(fft(y,NFFT));
f=[-fs/2:fs/NFFT:(fs/2-fs/NFFT)];
subplot(1,1,1);
semilogy(f(513:1024),X(513:1024));
axis([0 10e6 0 10]);
As you can find the corresponding frequencies in another post, I will just answer your other questions:
Including all your data is most of the time the best option. fft just truncates your input data to the requested length, which is probably not what you want. If you known the period of your input single, you can truncate it to include a whole number of periods. If you don't know it, a window (ex. Hanning) may be interesting.
If you change NFFT, you use more data in your fft calculation, which may change the amplitude for a given frequency slightly. You also calculate the amplitude at more frequencies between 0 and Fs/2 (half of the sampling frequency).
Question is not clear, please provide the definition of N and L.
It depends on your application. If the noise is at the same frequency as your signal, you are not able to separate it. Otherwise, you can a filter (ex. bandpass) to extract the frequencies of interest.

My neural net learns sin x but not cos x

I have build my own neural net and I have a weird problem with it.
The net is quite a simple feed-forward 1-N-1 net with back propagation learning. Sigmoid is used as activation function.
My training set is generated with random values between [-PI, PI] and their [0,1]-scaled sine values (This is because the "Sigmoid-net" produces only values between [0,1] and unscaled sine -function produces values between [-1,1]).
With that training-set, and the net set to 1-10-1 with learning rate of 0.5, everything works great and the net learns sin-function as it should. BUT.. if I do everything exately the same way for COSINE -function, the net won't learn it. Not with any setup of hidden layer size or learning rate.
Any ideas? Am I missing something?
EDIT: My problem seems to be similar than can be seen with this applet. It won't seem to learn sine-function unless something "easier" is taught for the weights first (like 1400 cycles of quadratic function). All the other settings in the applet can be left as they initially are. So in the case of sine or cosine it seems that the weights need some boosting to atleast partially right direction before a solution can be found. Why is this?
I'm struggling to see how this could work.
You have, as far as I can see, 1 input, N nodes in 1 layer, then 1 output. So there is no difference between any of the nodes in the hidden layer of the net. Suppose you have an input x, and a set of weights wi. Then the output node y will have the value:
y = Σi w_i x
= x . Σi w_i
So this is always linear.
In order for the nodes to be able to learn differently, they must be wired differently and/or have access to different inputs. So you could supply inputs of the value, the square root of the value (giving some effect of scale), etc and wire different hidden layer nodes to different inputs, and I suspect you'll need at least one more hidden layer anyway.
The neural net is not magic. It produces a set of specific weights for a weighted sum. Since you can derive a set weights to approximate a sine or cosine function, that must inform your idea of what inputs the neural net will need in order to have some chance of succeeding.
An explicit example: the Taylor series of the exponential function is:
exp(x) = 1 + x/1! + x^2/2! + x^3/3! + x^4/4! ...
So if you supplied 6 input notes with 1, x1, x2 etc, then a neural net that just received each input to one corresponding node, and multiplied it by its weight then fed all those outputs to the output node would be capable of the 6-term taylor expansion of the exponential:
in hid out
1 ---- h0 -\
x -- h1 --\
x^2 -- h2 ---\
x^3 -- h3 ----- y
x^4 -- h4 ---/
x^5 -- h5 --/
Not much of a neural net, but you get the point.
Further down the wikipedia page on Taylor series, there are expansions for sin and cos, which are given in terms of odd powers of x and even powers of x respectively (think about it, sin is odd, cos is even, and yes it is that straightforward), so if you supply all the powers of x I would guess that the sin and cos versions will look pretty similar with alternating zero weights. (sin: 0, 1, 0, -1/6..., cos: 1, 0, -1/2...)
I think you can always compute sine and then compute cosine externally. I think your concern here is why the neural net is not learning the cosine function when it can learn the sine function. Assuming that this artifact if not because of your code; I would suggest the following:
It definitely looks like an error in the learning algorithm. Could be because of your starting point. Try starting with weights that gives the correct result for the first input and then march forward.
Check if there is heavy bias in your learning - more +ve than -ve
Since cosine can be computed by sine 90 minus angle, you could find the weights and then recompute the weights in 1 step for cosine.

Converting an FFT to a spectogram

I have an audio file and I am iterating through the file and taking 512 samples at each step and then passing them through an FFT.
I have the data out as a block 514 floats long (Using IPP's ippsFFTFwd_RToCCS_32f_I) with real and imaginary components interleaved.
My problem is what do I do with these complex numbers once i have them? At the moment I'm doing for each value
const float realValue = buffer[(y * 2) + 0];
const float imagValue = buffer[(y * 2) + 1];
const float value = sqrt( (realValue * realValue) + (imagValue * imagValue) );
This gives something slightly usable but I'd rather some way of getting the values out in the range 0 to 1. The problem with he above is that the peaks end up coming back as around 9 or more. This means things get viciously saturated and then there are other parts of the spectrogram that barely shows up despite the fact that they appear to be quite strong when I run the audio through audition's spectrogram. I fully admit I'm not 100% sure what the data returned by the FFT is (Other than that it represents the frequency values of the 512 sample long block I'm passing in). Especially my understanding is lacking on what exactly the compex number represents.
Any advice and help would be much appreciated!
Edit: Just to clarify. My big problem is that the FFT values returned are meaningless without some idea of what the scale is. Can someone point me towards working out that scale?
Edit2: I get really nice looking results by doing the following:
size_t count2 = 0;
size_t max2 = kFFTSize + 2;
while( count2 < max2 )
{
const float realValue = buffer[(count2) + 0];
const float imagValue = buffer[(count2) + 1];
const float value = (log10f( sqrtf( (realValue * realValue) + (imagValue * imagValue) ) * rcpVerticalZoom ) + 1.0f) * 0.5f;
buffer[count2 >> 1] = value;
count2 += 2;
}
To my eye this even looks better than most other spectrogram implementations I have looked at.
Is there anything MAJORLY wrong with what I'm doing?
The usual thing to do to get all of an FFT visible is to take the logarithm of the magnitude.
So, the position of the output buffer tells you what frequency was detected. The magnitude (L2 norm) of the complex number tells you how strong the detected frequency was, and the phase (arctangent) gives you information that is a lot more important in image space than audio space. Because the FFT is discrete, the frequencies run from 0 to the nyquist frequency. In images, the first term (DC) is usually the largest, and so a good candidate for use in normalization if that is your aim. I don't know if that is also true for audio (I doubt it)
For each window of 512 sample, you compute the magnitude of the FFT as you did. Each value represents the magnitude of the corresponding frequency present in the signal.
mag
/\
|
| ! !
| ! ! !
+--!---!----!----!---!--> freq
0 Fs/2 Fs
Now we need to figure out the frequencies.
Since the input signal is of real values, the FFT is symmetric around the middle (Nyquist component) with the first term being the DC component. Knowing the signal sampling frequency Fs, the Nyquist frequency is Fs/2. And therefore for the index k, the corresponding frequency is k*Fs/512
So for each window of length 512, we get the magnitudes at specified frequency. The group of those over consecutive windows form the spectrogram.
Just so people know I've done a LOT of work on this whole problem. The main thing I've discovered is that the FFT requires normalisation after doing it.
To do this you average all the values of your window vector together to get a value somewhat less than 1 (or 1 if you are using a rectangular window). You then divide that number by the number of frequency bins you have post the FFT transform.
Finally you divide the actual number returned by the FFT by the normalisation number. Your amplitude values should now be in the -Inf to 1 range. Log, etc, as you please. You will still be working with a known range.
There are a few things that I think you will find helpful.
The forward FT will tend to give larger numbers in the output than in the input. You can think of it as all of the intensity at a certain frequency being displayed at one place rather than being distributed through the dataset. Does this matter? Probably not because you can always scale the data to fit your needs. I once wrote an integer based FFT/IFFT pair and each pass required rescaling to prevent integer overflow.
The real data that are your input are converted into something that is almost complex. As it turns out buffer[0] and buffer[n/2] are real and independent. There is a good discussion of it here.
The input data are sound intensity values taken over time, equally spaced. They are said to be, appropriately enough, in the time domain. The output of the FT is said to be in the frequency domain because the horizontal axis is frequency. The vertical scale remains intensity. Although it isn't obvious from the input data, there is phase information in the input as well. Although all of the sound is sinusoidal, there is nothing that fixes the phases of the sine waves. This phase information appears in the frequency domain as the phases of the individual complex numbers, but often we don't care about it (and often we do too!). It just depends upon what you are doing. The calculation
const float value = sqrt((realValue * realValue) + (imagValue * imagValue));
retrieves the intensity information but discards the phase information. Taking the logarithm essentially just dampens the big peaks.
Hope this is helpful.
If you are getting strange results then one thing to check is the documentation for the FFT library to see how the output is packed. Some routines use a packed format where real/imaginary values are interleaved, or they may begin at the N/2 element and wrap around.
For a sanity check I would suggest creating sample data with known characteristics, eg Fs/2, Fs/4 (Fs = sample frequency) and compare the output of the FFT routine with what you'd expect. Try creating both a sine and cosine at the same frequency, as these should have the same magnitude in the spectrum, but have different phases (ie the realValue/imagValue will differ, but the sum of squares should be the same.
If you're intending on using the FFT though then you really need to know how it works mathematically, otherwise you're likely to encounter other strange problems such as aliasing.