1.4.2.5. Beam Deflections

## Develop a Better Model

### Sinusoidal Model

The lag plot and autocorrelation plot in the previous section strongly suggested that a sinusoidal model might be appropriate. The basic sinusoidal model is:
$$Y_{i} = C + \alpha\sin{(2\pi\omega T_{i} + \phi)} + E_i$$
where C is a constant defining the mean level, α is the amplitude of the sine function, ω is the frequency, $$T_i$$ is a time variable, $$\phi$$ is the phase, and $$E_i$$ is the error term. This sinusoidal model can be fit using non-linear least squares.

To obtain a good fit, sinusoidal models require good starting values for C, the amplitude, and the frequency.

### Good Starting Value for C

A good starting value for C can be obtained by calculating the mean of the data. If the data show a trend, i.e., the assumption of constant location is violated, we can replace C with a linear or quadratic least squares fit. That is, the model becomes
$$Y_{i} = (B_0 + B_1 T_{i}) + \alpha\sin{(2\pi\omega T_{i} + \phi)} + E_i$$
or
$$Y_{i} = (B_0 + B_1 T_{i} + B_2 T^{2}_{i}) + \alpha\sin{(2\pi\omega T_{i} + \phi)} + E_i$$
Since our data did not show any meaningful change of location, we can fit the simpler model with C equal to the mean. From the summary output on the previous page, the mean is -177.44.
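One way to check whether a constant C is adequate is to compare the sample mean with a straight-line fit. The following is a minimal sketch using NumPy. It runs on a synthetic series, which is an assumption for illustration only: the real beam deflection data are not reproduced here, and the values 200, -177, 360, 0.3026, 1.5, and 50 are chosen simply to resemble the case study.

```python
import numpy as np

# Synthetic stand-in for the series (assumption, NOT the real data).
rng = np.random.default_rng(0)
t = np.arange(1, 201)
y = -177.0 + 360.0 * np.sin(2 * np.pi * 0.3026 * t + 1.5) + rng.normal(0, 50, t.size)

c_start = y.mean()                      # starting value for C
slope, intercept = np.polyfit(t, y, 1)  # linear trend B0 + B1*T

# Change in level implied by the fitted trend over the whole series.
drift = slope * (t[-1] - t[0])
print(f"mean = {c_start:.2f}, drift over series = {drift:.2f}")
```

If the drift is small compared with the amplitude of the oscillation, the constant-C model is a reasonable choice.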
### Good Starting Value for Frequency

The starting value for the frequency can be obtained from the spectral plot, which shows that the dominant frequency is about 0.3.
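The dominant frequency can also be read off numerically rather than from a plot. The sketch below computes a simple periodogram with NumPy's FFT and picks the frequency with the largest power. The series is synthetic (an assumption, not the real data), and the raw periodogram is only one of several possible spectral estimates.

```python
import numpy as np

# Synthetic stand-in for the series (assumption, NOT the real data).
rng = np.random.default_rng(0)
t = np.arange(1, 201)
y = -177.0 + 360.0 * np.sin(2 * np.pi * 0.3026 * t + 1.5) + rng.normal(0, 50, t.size)

# Periodogram: squared magnitude of the FFT of the mean-removed series.
power = np.abs(np.fft.rfft(y - y.mean())) ** 2
freqs = np.fft.rfftfreq(t.size, d=1.0)  # cycles per observation, 0 to 0.5

dominant = freqs[np.argmax(power)]
print(f"dominant frequency is about {dominant:.3f}")
```

With 200 observations the frequency grid has a spacing of 0.005, so this estimate is coarse and benefits from refinement.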
### Complex Demodulation Phase Plot

The complex demodulation phase plot can be used to refine this initial estimate for the frequency.

For the complex demodulation plot, if the lines slope from left to right, the frequency should be increased. If the lines slope from right to left, it should be decreased. A relatively flat (i.e., horizontal) slope indicates a good frequency. We could generate the demodulation phase plot for 0.3 and then use trial and error to obtain a better estimate for the frequency. To simplify this, we generate 16 of these plots on a single page starting with a frequency of 0.28, increasing in increments of 0.0025, and stopping at 0.3175.

### Interpretation

The plots start with lines sloping from left to right but gradually change to a right-to-left slope. The flattest slope occurs at a frequency of 0.3025 (third row, second column). The complex demodulation phase plot restricts the phase to the range $$-\pi/2$$ to $$\pi/2$$, which is why the plot appears to show some breaks.
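The trial-and-error search over 16 frequencies can be sketched in code. The example below is an illustration under stated assumptions: the series is synthetic, the helper `demodulate` and its 20-point moving-average filter are hypothetical choices, and the phase is unwrapped rather than restricted to $$-\pi/2$$ to $$\pi/2$$ as in the plots. For each trial frequency it measures the slope of the demodulated phase and selects the frequency with the flattest slope.

```python
import numpy as np

def demodulate(y, t, f0, window=20):
    """Complex demodulation at trial frequency f0 (hypothetical helper).

    Shifts the component near f0 down to zero frequency, then smooths
    with a moving average to suppress everything else.
    """
    z = (y - y.mean()) * np.exp(-2j * np.pi * f0 * t)
    kernel = np.ones(window) / window
    return np.convolve(z, kernel, mode="valid")

# Synthetic stand-in for the series (assumption, NOT the real data).
rng = np.random.default_rng(0)
t = np.arange(1, 201)
y = -177.0 + 360.0 * np.sin(2 * np.pi * 0.3026 * t + 1.5) + rng.normal(0, 50, t.size)

trial = 0.28 + 0.0025 * np.arange(16)  # 0.28, 0.2825, ..., 0.3175
slopes = []
for f0 in trial:
    phase = np.unwrap(np.angle(demodulate(y, t, f0)))
    # Slope of the phase in radians per observation.
    slopes.append(np.polyfit(np.arange(phase.size), phase, 1)[0])
slopes = np.array(slopes)

best = trial[np.argmin(np.abs(slopes))]
print(f"flattest phase slope at frequency {best:.4f}")
```

In this sketch the phase slope is approximately $$2\pi$$ times the difference between the true and trial frequencies, so a positive slope means the trial frequency is too low and a negative slope means it is too high.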
### Good Starting Values for Amplitude

The complex demodulation amplitude plot is used to find a good starting value for the amplitude. In addition, this plot indicates whether the amplitude is constant over the entire range of the data or whether it varies. If the plot is essentially flat, i.e., zero slope, then it is reasonable to assume a constant amplitude in the non-linear model. However, if the slope varies over the range of the plot, we may need to adjust the model to be:
$$Y_{i} = C + (B_0 + B_1 T_{i})\sin{(2\pi\omega T_{i} + \phi)} + E_i$$
That is, we replace α with a function of time. A linear fit is specified in the model above, but this can be replaced with a more elaborate function if needed.
### Complex Demodulation Amplitude Plot

The complex demodulation amplitude plot for this data shows that:

1. The amplitude is fixed at approximately 390.
2. There is a short start-up effect.
3. There is a change in amplitude at around x=160 that should be investigated for an outlier.

In terms of a non-linear model, the plot indicates that fitting a single constant for α should be adequate for this data set.
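The amplitude reading can be sketched numerically as well. The example below is a minimal illustration on a synthetic series (an assumption, not the real data) with a hypothetical 20-point moving-average filter. After demodulation at 0.3025 the smoothed signal has magnitude of about half the sinusoid's amplitude, so the sketch doubles it, takes the median as a starting value, and fits a line to check whether the amplitude drifts.

```python
import numpy as np

# Synthetic stand-in for the series (assumption, NOT the real data).
rng = np.random.default_rng(0)
t = np.arange(1, 201)
y = -177.0 + 360.0 * np.sin(2 * np.pi * 0.3026 * t + 1.5) + rng.normal(0, 50, t.size)

f0 = 0.3025  # frequency chosen from the phase plots
z = (y - y.mean()) * np.exp(-2j * np.pi * f0 * t)
smooth = np.convolve(z, np.ones(20) / 20, mode="valid")

# The low-pass component has magnitude alpha / 2, hence the factor of 2.
amplitude = 2 * np.abs(smooth)

amp_start = np.median(amplitude)
amp_slope = np.polyfit(np.arange(amplitude.size), amplitude, 1)[0]
print(f"starting amplitude = {amp_start:.1f}, slope = {amp_slope:.3f} per observation")
```

A slope near zero supports a single constant for α. A clearly non-zero slope would point toward the time-varying amplitude model instead.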
### Fit Results

Using starting estimates of 0.3025 for the frequency, 390 for the amplitude, and -177.44 for C, the following parameters were estimated.

| Coefficient | Estimate | Standard Error | t-Value |
|---|---|---|---|
| C | -178.786 | 11.02 | -16.22 |
| AMP | -361.766 | 26.19 | -13.81 |
| FREQ | 0.302596 | 0.1510E-03 | 2005.00 |
| PHASE | 1.46536 | 0.4909E-01 | 29.85 |

Residual Standard Deviation = 155.8484
Residual Degrees of Freedom = 196
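A fit of this kind can be sketched with a plain Gauss-Newton iteration in NumPy. This is an illustration, not necessarily the algorithm that produced the table above. The series is synthetic (an assumption), the function names are hypothetical, and the starting amplitude and phase are taken from a linear least squares fit at the fixed trial frequency, because the text does not say how a starting phase was chosen.

```python
import numpy as np

def model(params, t):
    """C + amp * sin(2*pi*freq*t + phase)."""
    c, amp, freq, phase = params
    return c + amp * np.sin(2 * np.pi * freq * t + phase)

def jacobian(params, t):
    """Partial derivatives of the model with respect to (C, amp, freq, phase)."""
    c, amp, freq, phase = params
    arg = 2 * np.pi * freq * t + phase
    return np.column_stack([
        np.ones(t.size),
        np.sin(arg),
        amp * 2 * np.pi * t * np.cos(arg),
        amp * np.cos(arg),
    ])

def gauss_newton(y, t, start, iterations=25):
    """Plain Gauss-Newton: repeatedly solve the linearised least squares problem."""
    params = np.asarray(start, dtype=float)
    for _ in range(iterations):
        residuals = y - model(params, t)
        step, *_ = np.linalg.lstsq(jacobian(params, t), residuals, rcond=None)
        params = params + step
    return params

# Synthetic stand-in for the series (assumption, NOT the real data).
rng = np.random.default_rng(0)
t = np.arange(1, 201, dtype=float)
y = -177.0 + 360.0 * np.sin(2 * np.pi * 0.3026 * t + 1.5) + rng.normal(0, 50, t.size)

# Starting values: mean for C, trial frequency from the demodulation step.
c0, f0 = y.mean(), 0.3025
# Assumption: amplitude and phase starts from a linear fit at fixed frequency f0,
# using a*sin(x) + b*cos(x) = R*sin(x + atan2(b, a)) with R = hypot(a, b).
basis = np.column_stack([np.sin(2 * np.pi * f0 * t), np.cos(2 * np.pi * f0 * t)])
(a, b), *_ = np.linalg.lstsq(basis, y - c0, rcond=None)
start = [c0, np.hypot(a, b), f0, np.arctan2(b, a)]

c, amp, freq, phase = gauss_newton(y, t, start)
residuals = y - model((c, amp, freq, phase), t)
dof = t.size - 4  # observations minus the four fitted parameters
resid_sd = np.sqrt(np.sum(residuals**2) / dof)
print(f"C={c:.2f} AMP={amp:.2f} FREQ={freq:.6f} PHASE={phase:.4f} SD={resid_sd:.2f}")
```

With 200 observations and four parameters the residual degrees of freedom are 196, matching the output above. In practice a library routine such as `scipy.optimize.curve_fit` could replace the hand-written loop; the undamped iteration here relies on the starting values already being close.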
### Model

From the fit results, our proposed model is:
$$\hat{Y}_i = -178.786 - 361.766\sin[2\pi(0.302596)T_i + 1.46536]$$
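The negative amplitude estimate does not conflict with the positive starting value of 390. Because $$-\sin(\theta) = \sin(\theta + \pi)$$, the same fitted curve can be written with a positive amplitude and the phase shifted by $$\pi$$:
$$\hat{Y}_i = -178.786 + 361.766\sin[2\pi(0.302596)T_i + 1.46536 + \pi] \approx -178.786 + 361.766\sin[2\pi(0.302596)T_i + 4.60695]$$
Both forms give identical predicted values, so the sign of the amplitude only reflects which of the two equivalent solutions the fitting routine converged to.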
We will evaluate the adequacy of this model in the next section.