Least-Squares Approximation

Often an exact solution to a problem in applied mathematics is difficult or impossible to obtain. However, it is usually just as useful to find an approximation to a solution. In particular, finding “linear approximations” is a powerful technique in applied mathematics. One basic case is the situation where a system of linear equations has no solution, and it is desirable to find a “best approximation” to a solution to the system.

We begin by defining the “best approximation” in a natural way, and showing that computing the best approximation reduces to solving a related system of linear equations called the normal equations. Next, we demonstrate a common application where a collection of data points is approximated by a line (or a curve). We conclude this section by showing that QR-factorization provides us with a more efficient way to solve the normal equations and compute the best approximation.

Best Approximate Solutions

Consider the matrix equation $A\vec{x}=\vec{b}$. A quick examination of the last two rows should convince you that this equation has no solutions. In other words, $\vec{b}$ is not in the span of the columns of $A$.

If $\vec{x}$ were an exact solution to $A\vec{x}=\vec{b}$, then $\vec{b}-A\vec{x}$ would be $\vec{0}$. Since the equation does not have a solution, we will attempt to find the next best thing to a solution by finding $\vec{z}$ such that $\|\vec{b}-A\vec{z}\|$ is as small as possible. The quantity $\|\vec{b}-A\vec{z}\|$ is called the error.

The following GeoGebra interactive will help you understand the geometry behind finding $\vec{z}$.

RIGHT-CLICK and DRAG to rotate the image for a better view.

Record your best guess for $\vec{z}$ – you will have a chance to check your answer in Example ex:leastSquares1.

What did you discover about the geometry of minimizing $\|\vec{b}-A\vec{z}\|$? Select all that apply.

$\vec{b}-A\vec{z}$ is orthogonal to the plane spanned by the columns of $A$.
$\vec{b}-A\vec{z}$ is orthogonal to $\vec{b}$.
$\vec{b}-A\vec{z}$ is orthogonal to $A\vec{z}$.
$A\vec{z}$ is orthogonal to $\vec{b}$.
$A\vec{z}$ is an orthogonal projection of $\vec{b}$ onto $\text{col}(A)$.

Our geometric observations will help us develop a method for finding $\vec{z}$.

Suppose $A$ is an $m\times n$ matrix, and $\vec{b}$ is a column vector in $\mathbb{R}^m$. Consider the matrix equation $A\vec{x}=\vec{b}$. If this equation does not have a solution, we can attempt to find a best approximation by finding $\vec{z}$ which minimizes the error, $\|\vec{b}-A\vec{z}\|$. The expression $\vec{b}-A\vec{z}$ is also sometimes called the residual. The error (or the residual) is given in terms of a vector norm. Recall that our definition of the norm involves the sum of squares of the vector components. When we minimize the norm, we minimize the sum of squares. This is why the method we are describing is often referred to as least squares. We will explore this idea further later in this section.
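To make the connection between the error and a sum of squares concrete, here is a minimal Octave sketch. The matrix, vector, and candidate solution below are hypothetical, chosen only for illustration; the point is that the norm of the residual is exactly the square root of the sum of its squared components.

% Hypothetical A, b, and candidate z; not taken from the exploration above.
A = [1 0; 0 1; 1 1];
b = [1; 2; 4];                 % b is not in col(A), so A*x = b has no solution
z = [1; 2];                    % a candidate approximate solution
r = b - A*z;                   % the residual b - A*z
err = norm(r);                 % the error ||b - A*z||
err_check = sqrt(sum(r.^2));   % square root of the sum of squares; same value
disp([err, err_check]);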

In the case when $\text{col}(A)$ is a subspace of $\mathbb{R}^3$, we can see geometrically that $\vec{z}$ is the best approximation if and only if $A\vec{z}$ is an orthogonal projection of $\vec{b}$ onto $\text{col}(A)$, and the error is the magnitude of $\vec{b}-A\vec{z}$, as shown below.

What we observed above holds in general. We will use this fact to find $\vec{z}$.

Every vector in $\text{col}(A)$ can be written in the form $A\vec{x}$ for some $\vec{x}$ in $\mathbb{R}^n$. Our goal is to find $\vec{z}$ such that $A\vec{z}$ is the orthogonal projection of $\vec{b}$ onto $\text{col}(A)$. By Corollary cor:orthProjOntoW, every vector in $\text{col}(A)$ is orthogonal to $\vec{b}-A\vec{z}$. This means $\vec{b}-A\vec{z}$ is in the orthogonal complement of $\text{col}(A)$, which is $\text{null}\left(A^T\right)$.

Therefore, we have
$$A^T\left(\vec{b}-A\vec{z}\right)=\vec{0}.$$
Rearranging gives the system
$$A^TA\vec{z}=A^T\vec{b}.\qquad\text{(eq:normalForZ)}$$

Since $\vec{b}-A\vec{z}$ is normal to the subspace $\text{col}(A)$, we call the system of linear equations in (eq:normalForZ) the normal equations for $\vec{z}$. If $A^TA$ is invertible, then we can write
$$\vec{z}=\left(A^TA\right)^{-1}A^T\vec{b}.\qquad\text{(eq:leastSquaresZ)}$$
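As a quick numerical illustration of the normal equations, here is a hedged Octave sketch. The system below is hypothetical (it is not the system from Exploration exp:leastSq1); the sketch solves $A^TA\vec{z}=A^T\vec{b}$ and checks that the residual is orthogonal to the columns of $A$.

% Hypothetical inconsistent system A*x = b.
A = [1 0; 0 1; 1 1];
b = [3; 2; 2];
z = (A' * A) \ (A' * b);   % solve the normal equations (A'A)z = A'b
r = b - A*z;               % residual
disp(z);                   % the best approximate solution
disp(A' * r);              % approximately the zero vector: r is orthogonal to col(A)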

We will return to the question of invertibility of $A^TA$ in Theorem th:ATAinverse. For now, let’s revisit the problem posed in Exploration exp:leastSq1.

We now come back to the question of when $A^TA$ is invertible.

Proof
Let $A$ be a matrix with linearly independent columns. We will show that $A^TA\vec{x}=\vec{0}$ has only the trivial solution. For $\vec{x}$, a solution of $A^TA\vec{x}=\vec{0}$, we have
$$\|A\vec{x}\|^2=(A\vec{x})\cdot(A\vec{x})=(A\vec{x})^T(A\vec{x})=\vec{x}^TA^TA\vec{x}=\vec{x}^T\vec{0}=0.$$
Therefore $A\vec{x}=\vec{0}$. By linear independence of the columns of $A$ we conclude that $\vec{x}=\vec{0}$.

We summarize our findings in the following theorem.

Application of Least Squares to Curve Fitting

In Curve Fitting, we discussed how to fit a function to a set of data points so that the graph of the function passes through each of the points. We also discussed why doing so is sometimes impossible (for example, when two points lie one directly above the other) and may not even be desirable (overfitting). In this section we will learn how to approximate a collection of data points with a line (or a curve) that fits the “trend” of the points. We will start with data that fit a linear pattern.

Consider the points , and . These points do not lie on a straight line, but they have a general upward linear trend. (Typically there would be many more points to consider, but we will limit our exploration to what we can do by hand.) Our goal is to find a line that fits these points as closely as possible.

We are looking for a function of the form $y=ax+b$ such that the following infeasible system is satisfied as closely as possible

From the first part of this section we know how to find a best approximation. By Theorem th:bestApprox, we have
$$\vec{z}=\left(A^TA\right)^{-1}A^T\vec{b}.$$

According to our computations, the resulting line is the one that best fits the data. Let’s take a look.

We found this fit by minimizing $\|\vec{b}-A\vec{z}\|$. We will now investigate the meaning of this expression in relation to the line and the data points.

Observe that each entry of $\vec{b}-A\vec{z}$ is the signed vertical distance between a particular data point and the line.

Instead of computing the error, $\|\vec{b}-A\vec{z}\|$, we will compute $\|\vec{b}-A\vec{z}\|^2$ to avoid the square root.

Minimizing $\|\vec{b}-A\vec{z}\|$ also minimizes $\|\vec{b}-A\vec{z}\|^2$. Therefore, what we have minimized is the sum of squares of the vertical distances between the data points and the line. The following GeoGebra interactive will help you explore this idea.

In Exploration exp:leastSq2 we discovered that $\|\vec{b}-A\vec{z}\|^2$ is the sum of squares of the vertical distances between the given data points and the proposed line. By minimizing $\|\vec{b}-A\vec{z}\|^2$, we minimize the sum of squares of the vertical distances. This observation holds in general. Given a collection of points $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$, to find a linear function of the form $y=ax+b$ that best fits the points, we would find a best approximate solution $\vec{z}=\begin{bmatrix}a\\b\end{bmatrix}$ to the system
$$\begin{bmatrix}x_1&1\\x_2&1\\\vdots&\vdots\\x_n&1\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix}=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}$$
by minimizing
$$\|\vec{b}-A\vec{z}\|^2=\left(y_1-(ax_1+b)\right)^2+\left(y_2-(ax_2+b)\right)^2+\dots+\left(y_n-(ax_n+b)\right)^2.$$
A geometric interpretation of $\|\vec{b}-A\vec{z}\|^2$ is shown below.

The line we obtain in this fashion is called a line of best fit or a trendline, and the method we used is referred to as the method of least squares.
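The entire trendline computation fits in a few lines of Octave. The data points below are hypothetical, used only to illustrate the setup; each row of the design matrix is $[x_i \; 1]$, matching the system above.

% Hypothetical data points (x_i, y_i); not the points used earlier in this section.
x = [0; 1; 2; 3];
y = [1; 2; 2; 4];
A = [x, ones(length(x), 1)];   % rows [x_i, 1] for the model y = a*x + b
z = (A' * A) \ (A' * y);       % z = [a; b]
printf("line of best fit: y = %gx + %g\n", z(1), z(2));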

We can apply the method of least squares to find best-fitting non-linear functions.
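For instance, to fit a quadratic $y=ax^2+bx+c$ by least squares, only the design matrix changes: each row becomes $[x_i^2 \; x_i \; 1]$. Here is a minimal Octave sketch with hypothetical data.

% Hypothetical data; least-squares fit of a quadratic y = a*x^2 + b*x + c.
x = [-1; 0; 1; 2; 3];
y = [ 2; 1; 2; 5; 9];
A = [x.^2, x, ones(length(x), 1)];
z = (A' * A) \ (A' * y);       % z = [a; b; c]
printf("y = %gx^2 + %gx + %g\n", z(1), z(2), z(3));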

QR-Factorization: A Quicker Way to do Least Squares

When solving the normal equations in (eq:normalForZ), it is advantageous to have a QR-factorization of $A$. For then we can write
$$A^TA\vec{z}=A^T\vec{b}$$
$$(QR)^T(QR)\vec{z}=(QR)^T\vec{b}$$
$$R^TQ^TQR\vec{z}=R^TQ^T\vec{b}$$
$$R^TR\vec{z}=R^TQ^T\vec{b}$$
(using the fact that $Q^TQ=I$).

Since $R$ is invertible, $R^T$ also has an inverse, and multiplying on the left by $\left(R^T\right)^{-1}$ yields
$$R\vec{z}=Q^T\vec{b}.$$

This last equation is easily solved by back-substitution, since $R$ is upper triangular. This greatly reduces the amount of computations we need to make, as we will observe by using Octave in our final example of the section.
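Here is a hedged Octave sketch of this approach, using a hypothetical system rather than the one from the final example. The economy-size factorization is computed with qr(A, 0), and $R\vec{z}=Q^T\vec{b}$ is then solved by back-substitution, which the backslash operator performs automatically when the matrix is triangular.

% Hypothetical system; least squares via QR instead of forming A'A.
A = [1 0; 0 1; 1 1];
b = [3; 2; 2];
[Q, R] = qr(A, 0);     % economy-size QR: A = Q*R, Q'*Q = I, R upper triangular
z = R \ (Q' * b);      % back-substitution on R*z = Q'*b
disp(z);               % matches the solution of the normal equations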

Practice Problems

Find the best approximation to a solution to the system of equations. Enter answers in fraction form below.
Find a linear function of best fit for each of the following sets of data points. Examine how well your line fits the points by typing the equation of the line into the Desmos window.
Enter your answers in fraction form.

Modify the Octave code in Example ex:leastSquaresPolyRevisited to retry the problem in Example ex:leastSquares3. Is the QR-factorization approach quicker?
Use Octave to find the least squares approximating quadratic function for the following data points. Round your answers to three decimal places.
If $A$ is an $m\times n$ matrix, it can be proved that there exists a unique $n\times m$ matrix $A^{+}$ satisfying the following four conditions: $AA^{+}A=A$; $A^{+}AA^{+}=A^{+}$; $AA^{+}$ and $A^{+}A$ are symmetric. The matrix $A^{+}$ is called the Moore-Penrose inverse.
(a)
If $A$ is square and invertible, show that $A^{+}=A^{-1}$.
(b)
If $\operatorname{rank} A=m$, show that $A^{+}=A^{T}\left(AA^{T}\right)^{-1}$.
(c)
If $\operatorname{rank} A=n$, show that $A^{+}=\left(A^{T}A\right)^{-1}A^{T}$. (Notice that this is exactly the matrix that appeared when we solved the normal equations, arriving at Equation (eq:leastSquaresZ).)
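If you would like to check these identities numerically, Octave's built-in pinv computes the Moore-Penrose inverse. The sketch below uses a hypothetical full-column-rank matrix to compare pinv(A) with the formula from part (c); this is only a numerical sanity check, not a proof.

% Hypothetical 3x2 matrix with linearly independent columns (rank 2).
A = [1 0; 0 1; 1 1];
P1 = pinv(A);            % built-in Moore-Penrose inverse
P2 = (A' * A) \ A';      % the formula (A'A)^(-1) A' from part (c)
disp(norm(P1 - P2));     % approximately zero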

Text Source

A portion of this section has been adapted from W. Keith Nicholson, Linear Algebra with Applications, Lyryx 2021-A, Open Edition, pp. 308-319.