function approximation

29 Sep 2016

On fitting in R: the first part is a polynomial fit I wrote myself; the later sections study R's built-in tools.

Numerical Analysis Assignment 4: Function Approximation

I. Objectives

1. Master the least-squares method for fitting polynomials to data.

2. Be able to compute the interpolating trigonometric polynomial of a function.

II. Problems

(The problem statements were given as images in the original post; the requirements below restate them.)

III. Requirements

  1. Use the least-squares method to find the cubic and quartic fitting polynomials for the data given in problem (1), and plot the fitted curves.

  2. Find the degree-16 interpolating trigonometric polynomial of \(f(x)=x^2\cos(x)\) on the interval \([-\pi,\pi]\), plot it, and compare with the graph of \(f(x)\).

  3. For \(f(x)=x^2\cos(x)\) on \([-\pi,\pi]\), sample the function at a number of points, fit a least-squares polynomial of suitable degree to the sampled values, compute the error, and compare with the degree-16 interpolating trigonometric polynomial from the previous item (a sketch of that interpolation follows this list).
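Since no code for items 2 and 3 appears below, here is a minimal sketch of item 2 under standard assumptions: \(N=2\cdot 16+1\) equally spaced nodes on \([-\pi,\pi)\) and the usual discrete formulas for the coefficients (the node placement is my assumption, not taken from the assignment).

# Degree-16 trigonometric interpolation of f(x) = x^2*cos(x) on [-pi, pi)
# with N = 2n+1 equally spaced nodes (standard construction; illustrative sketch)
f <- function(x) x^2 * cos(x)
n <- 16
N <- 2 * n + 1
xj <- -pi + 2 * pi * (0:(N - 1)) / N   # equally spaced nodes
fj <- f(xj)
a <- sapply(0:n, function(k) 2 / N * sum(fj * cos(k * xj)))   # cosine coefficients
b <- sapply(1:n, function(k) 2 / N * sum(fj * sin(k * xj)))   # sine coefficients
p <- function(x) {                     # evaluate the trigonometric polynomial
  kx <- outer(x, 1:n)                  # matrix of k*x values
  drop(a[1] / 2 + cos(kx) %*% a[-1] + sin(kx) %*% b)
}
xs <- seq(-pi, pi, length.out = 200)
max(abs(p(xs) - f(xs)))                # maximum deviation on a fine grid

Since \(f\) is even, the sine coefficients come out numerically zero; plotting p(xs) against f(xs) gives the comparison item 2 asks for.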

Programs and Code

1. Cubic and quartic fitting polynomials
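The code below builds and solves the normal equations directly. For the monomial basis \(1, x, \dots, x^k\) and data \((x_m, y_m)\), \(m = 1, \dots, n\), the coefficients \(a_0, \dots, a_k\) satisfy

\[
\sum_{j=0}^{k}\left(\sum_{m=1}^{n} x_m^{\,i+j}\right) a_j = \sum_{m=1}^{n} x_m^{\,i}\, y_m, \qquad i = 0, 1, \dots, k,
\]

that is, \(Ga = d\) with \(G_{ij} = \sum_m x_m^{\,i+j}\) and \(d_i = \sum_m x_m^{\,i} y_m\); mat and d in the code are exactly \(G\) and \(d\).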

x<-c(0,0.9,1.9,3.0,3.9,5)
y<-c(0,10,30,50,80,110)
fitting<-function(x,y,k){#x, y are the data; k is the degree of the fitting polynomial
  #basis values at the data points: row i holds x^(i-1)
  xx<-matrix(data=NA,nrow=k+1,ncol = length(x))
  xx[1,]<-rep(1,length(x))
  for (i in 2:(k+1)) {
    xx[i,]<-x^(i-1)
  }
  #Gram matrix G of the normal equations: G[i,j] = sum(x^(i-1) * x^(j-1))
  mat<-matrix(data=NA,nrow=k+1,ncol = k+1)
  for (i in 1:nrow(mat)) {
    for (j in 1:ncol(mat)) {
      mat[i,j]<-sum(xx[i,]*xx[j,])
    }
  }
  #right-hand side d: d[i] = sum(x^(i-1) * y)
  d<-vector(length = ncol(mat))
  for (i in 1:ncol(mat)) {
    d[i]<-sum(xx[i,]*y)
  }
  #solve the normal equations G a = d
  result=solve(mat) %*% d
  return(result)
}
fitting(x=x,y=y,k=1)
##           [,1]
## [1,] -7.855048
## [2,] 22.253761
x<-c(1,1,2,3,3,3,4,5)
y<-c(4,4,4.5,6,6,6,8,8.5)
fitting(x=x,y=y,k=1)
##          [,1]
## [1,] 2.564815
## [2,] 1.203704
#both test cases pass

2. Solving the problem

x<-c(0,0.1,0.2,0.3,0.5,0.8,1)
y<-c(1,0.41,0.5,0.61,0.91,2.02,2.46)
cubic<-fitting(x=x,y=y,k=3)
quartic<-fitting(x=x,y=y,k=4)
#plot the raw data and both fitted curves
df<-data.frame(x=x,y=y,type=rep("raw data",length(x)))
#note: x and y are overwritten below with the evaluation grid and the
#polynomial values; the lm() checks further down therefore refit to these values
x<-seq(from=0,to=1,by=0.03)
y<-cubic[1]+cubic[2]*x+cubic[3]*x^2+cubic[4]*x^3
df<-rbind(df,data.frame(x=x,y=y,type=rep("cubic polynomial",length(x))))
y<-quartic[1]+quartic[2]*x+quartic[3]*x^2+quartic[4]*x^3+quartic[5]*x^4
df<-rbind(df,data.frame(x=x,y=y,type=rep("quartic polynomial",length(x))))
library(ggplot2)
ggplot(data = df,aes(x=x,y=y,color=type))+geom_point()+geom_line()

#check against R's built-in lm; note that x and y now hold the evaluation
#grid and the quartic's values, so these fits are to the 34 polynomial
#predictions rather than the 7 raw data points
summary(lm(y~1+I(x)+I(x^2)+I(x^3)))
## 
## Call:
## lm(formula = y ~ 1 + I(x) + I(x^2) + I(x^3))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.018814 -0.012885 -0.001366  0.012917  0.032789 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.909932   0.009639   94.40   <2e-16 ***
## I(x)        -4.533367   0.085590  -52.97   <2e-16 ***
## I(x^2)      12.673780   0.202641   62.54   <2e-16 ***
## I(x^3)      -6.621960   0.134450  -49.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01563 on 30 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
## F-statistic: 2.103e+04 on 3 and 30 DF,  p-value: < 2.2e-16
summary(lm(y~1+I(x)+I(x^2)+I(x^3)+I(x^4)))
## Warning in summary.lm(lm(y ~ 1 + I(x) + I(x^2) + I(x^3) + I(x^4))):
## essentially perfect fit: summary may be unreliable
## 
## Call:
## lm(formula = y ~ 1 + I(x) + I(x^2) + I(x^3) + I(x^4))
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.683e-15 -3.795e-16 -6.146e-17  4.104e-16  2.283e-15 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  9.427e-01  6.775e-16  1.392e+15   <2e-16 ***
## I(x)        -5.299e+00  9.784e-15 -5.416e+14   <2e-16 ***
## I(x^2)       1.627e+01  4.108e-14  3.962e+14   <2e-16 ***
## I(x^3)      -1.233e+01  6.279e-14 -1.964e+14   <2e-16 ***
## I(x^4)       2.885e+00  3.145e-14  9.174e+13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.33e-16 on 29 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 4.426e+30 on 4 and 29 DF,  p-value: < 2.2e-16

Coefficients of the cubic and quartic polynomials

Cubic polynomial coefficients: 0.926587, -4.6590662, 12.8147251, -6.6220615

Quartic polynomial coefficients: 0.942721, -5.2987276, 16.2747397, -12.3348255, 2.8852857

From the results, the cubic and quartic fits differ little. Both deviate noticeably from the raw data at the first few points and track it well afterwards. A quick residual check follows.
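A small sketch (my own addition) to quantify that comparison; it assumes the raw data are re-entered, since x and y were overwritten by the plotting code above:

x<-c(0,0.1,0.2,0.3,0.5,0.8,1)
y<-c(1,0.41,0.5,0.61,0.91,2.02,2.46)
pred3<-cubic[1]+cubic[2]*x+cubic[3]*x^2+cubic[4]*x^3
pred4<-quartic[1]+quartic[2]*x+quartic[3]*x^2+quartic[4]*x^3+quartic[5]*x^4
sum((y-pred3)^2)#residual sum of squares, cubic fit
sum((y-pred4)^2)#residual sum of squares, quartic fit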

Checking with R's statistical routines reproduces the coefficients. Note, however, that the quartic summary is flagged "essentially perfect fit: summary may be unreliable" (the fit is to the quartic's own predicted values), so its significance tests cannot be taken at face value, while every term of the cubic fit is cleanly significant. On statistical grounds the cubic fit is the safer choice!

3. Discussion and shortcomings

  1. In the plot I have not yet managed to keep the raw data points from being joined into a line while still drawing the fitted curves as lines (one possible fix is sketched after this list). ggplot2 is a sizeable learning project in its own right, with its own grammar, so I will not dig into it further for now.

  2. It is also worth learning R's own built-in fitting functions; my notes on them follow.
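On point 1, a minimal sketch of one way to do it, assuming the df data frame built above: give each layer its own data, so points come only from the raw subset and lines only from the fitted subsets.

library(ggplot2)
#raw data as points only, fitted polynomials as lines only
ggplot(mapping = aes(x=x,y=y,color=type))+
  geom_point(data=subset(df,type=="raw data"))+
  geom_line(data=subset(df,type!="raw data"))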




Learning R's fitting functions

Here I study only the lm function; see the reference at the end for the others.


Fit a linear model to data

z<-lm(y~x)
summary(z)#lm performs the same least-squares estimation as the hand-written fit
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31134 -0.22394 -0.01884  0.17011  0.77532 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.16740    0.08882   1.885   0.0686 .  
## x            2.11470    0.15423  13.711 6.05e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2647 on 32 degrees of freedom
## Multiple R-squared:  0.8545, Adjusted R-squared:   0.85 
## F-statistic:   188 on 1 and 32 DF,  p-value: 6.048e-15

Linear models for fixed effects are implemented in the lm method in R. You can pass a data frame to the lm command, using a formula to indicate the model you want to fit.

z <- lm(response ~ explanatory, data = mydata)
z <- lm(response ~ explanatory, data = mydata, na.action = na.exclude)

The resulting object (which I've named z) is an lm object containing all the results. You use additional commands to pull out these results, including residuals and predicted values. The argument na.action = na.exclude is optional; it tells R to keep track of cases having missing values, in which case the residuals and predicted values will have NAs inserted for those cases. Otherwise R just drops missing cases.
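A tiny illustration of the na.exclude behavior, on made-up data (mydata2 is hypothetical):

mydata2 <- data.frame(x = 1:5, y = c(2.1, NA, 6.2, 7.9, 10.3))
z2 <- lm(y ~ x, data = mydata2, na.action = na.exclude)
resid(z2)   # element 2 is NA rather than being silently dropped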

Here are some of the most useful functions to extract results from the lm object:

summary(z)     # parameter estimates and overall model fit
# estimates of slope, intercept, SE's
plot(z)        # plots of residuals, normal quantiles, leverage
coef(z)        # model coefficients (means, slopes, intercepts): point estimates
confint(z, level = 0.95)     # confidence intervals for parameters: interval estimates
#95% conf. intervals for slope and intercept
resid(z)       # residuals
predict(z)     # predicted values
predict(z, newdata = mynewdata) # see below
fitted(z)      # predicted values
anova(z1, z2)  # compare fits of 2 models, "full" vs "reduced"
anova(z)       # ANOVA table (** terms tested sequentially **)
#To test the null hypothesis of zero slope with the ANOVA table

The command predict(z, newdata = mynewdata) will use the model to predict values for new observations contained in the data frame mynewdata. The explanatory variables (the X variables) in mynewdata must have exactly the same names as in mydata.

#example
mydata<-data.frame(
  x=c(0,0.9,1.9,3.0,3.9,5),
  y=c(0,10,30,50,80,110))
plot(y ~ x, data = mydata)
z <- lm(y ~ x, data = mydata)
xnew <- range(mydata$x)
yhat <- predict(z, newdata = data.frame(x = xnew))#how to supply new x values for prediction
lines(yhat ~ xnew)#result shown below

# In the case of simple linear regression you can add the fitted line to a
# scatter plot of the data using the abline function:
# plot(response ~ explanatory, data = mydata)
# abline(z)

Add confidence bands to your previous scatter plot:

plot(y ~ x, data = mydata)
xnew <- seq(min(mydata$x), max(mydata$x), length.out = 100)
# note: without a newdata argument, predict returns intervals at the original
# six x values only, so xnew is not actually used until the next example
ynew <- data.frame(predict(z,
                   interval = "confidence", level = 0.95))#interval = c("none", "confidence", "prediction")
str(ynew)
## 'data.frame':    6 obs. of  3 variables:
##  $ fit: num  -7.86 12.17 34.43 58.91 78.93 ...
##  $ lwr: num  -22.2 1.1 25.8 50.3 68.2 ...
##  $ upr: num  6.47 23.24 43.05 67.53 89.69 ...
lines(ynew$lwr ~ mydata$x, lty = 2)
lines(ynew$upr ~ mydata$x, lty = 2)

Adding prediction intervals instead is similar:

plot(y ~ x, data = mydata)
xnew <- seq(min(mydata$x), max(mydata$x), length.out = 100)
ynew <- data.frame(predict(z, newdata = data.frame(x = xnew), 
                   interval = "confidence", level = 0.95))
# note: as written this reproduces the confidence bands above; for genuine
# prediction intervals pass interval = "prediction" instead
str(ynew)
## 'data.frame':    100 obs. of  3 variables:
##  $ fit: num  -7.86 -6.73 -5.61 -4.48 -3.36 ...
##  $ lwr: num  -22.2 -20.9 -19.5 -18.2 -16.9 ...
##  $ upr: num  6.47 7.4 8.32 9.25 10.18 ...
lines(ynew$lwr ~ xnew, lty = 2)
lines(ynew$upr ~ xnew, lty = 2)

The intervals might not all squeeze onto the plot unless you adjust the limits of the y-axis using the ylim = option in the plot command.

Basic types of linear models

The simplest linear model fits a constant, the mean, to a single variable. This is useful if you want to obtain an estimate of the mean with a standard error and confidence interval, or want to test the null hypothesis that the mean is zero.

That is, regression on a constant alone (an intercept-only model).

#z <- lm(y ~ 1, data = mydata)
z <- lm(y ~ 1)
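A quick illustration with a small made-up vector: the intercept estimate equals the sample mean, and the usual extractors give its interval estimate and the test of a zero mean.

y <- c(4, 4, 4.5, 6, 6, 6, 8, 8.5)   # example data (the same values reappear below)
z <- lm(y ~ 1)
coef(z)      # equals mean(y)
confint(z)   # 95% confidence interval for the mean
summary(z)   # t-test of the null hypothesis that the mean is zero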

The most familiar linear model is the linear regression of y on x.

# z <- lm(y ~ x, data = mydata)
z <- lm(y ~ x)#一次项回归

To fit a linear regression model without an intercept term, i.e., a regression through the origin, add “-1” to the model statement:

# z <- lm(y ~ x - 1, data = mydata)
z <- lm(y ~ x - 1)

If A is a categorical variable (factor or character) rather than a numeric variable, then the model corresponds to a single-factor ANOVA instead.

x<-c(1,1,2,3,3,3,4,5)
y<-c(4,4,4.5,6,6,6,8,8.5)
#z <- lm(y ~ A, data = mydata)
A<-c("A","A","A","B","B","C","C","B")
z <- lm(y ~ A)
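To view the resulting single-factor ANOVA table, use the anova extractor listed earlier:

anova(z)   # ANOVA table for the categorical model above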

More complicated models include more variables and their interactions:

z1 <- lm(y ~ x1 + x2, data = mydata)          # no interaction between x1 and x2
z2 <- lm(y ~ x1 + x2 + x1:x2, data = mydata)  # interaction term present
z2 <- lm(y ~ x1 * x2, data = mydata)          # interaction term present

Analysis of covariance models include both numeric and categorical variables. The linear model in this case is a separate linear regression for each group of the categorical variable. Interaction terms between these two types of variables, if included in the model, fit different linear regression slopes; otherwise the same slope is forced upon every group.

z <- lm(y ~ x + A + x:A, data = mydata)
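A sketch of the comparison this describes, reusing the x, y, and A vectors from the ANOVA example above: fit the common-slope and separate-slopes models and compare them with anova.

z_common   <- lm(y ~ x + A)        # one slope shared by every group
z_separate <- lm(y ~ x + A + x:A)  # a separate slope for each group
anova(z_common, z_separate)        # test whether the slopes differ among groups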

References

http://www.zoology.ubc.ca/~schluter/R/fit-model/

Other good references

On polynomial fitting and related topics:

http://blog.sina.com.cn/s/blog_6fbfcfb50102va2k.html