## Does a set of points define a line? - [Solved]

General FreeBASIC programming questions.
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas

### Does a set of points define a line? - [Solved]

Does anyone know how to determine if a set of points maps relatively close to a straight line?

(I'm obviously not a math guy)
:)

Thanks
Last edited by cbruce on Sep 23, 2007 21:59, edited 1 time in total.
Plasma
Posts: 191
Joined: May 27, 2005 5:22
Location: Earth
Contact:
A linear regression will do that.
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas
Thanks Plasma, but after a lot of reading, it does not appear that linear regression is the answer to my question.

I may be incorrect, but it seems that linear regression will only best fit a line to a set of points. What I need to know is if my set of points "is" a straight line (or relatively close).

I could be getting a set of points that consist of either:
- a straight (or close) line of points
- an arc of points
- a widely dispersed cloud of points

I need to be able to determine if the set of points is either a "line" or "something else".
Basic Coder
Posts: 180
Joined: Aug 02, 2006 23:37
Location: Australia
cbruce wrote:Thanks Plasma, but after a lot of reading, it does not appear that linear regression is the answer to my question.

I may be incorrect, but it seems that linear regression will only best fit a line to a set of points. What I need to know is if my set of points "is" a straight line (or relatively close).

I could be getting a set of points that consist of either:
- a straight (or close) line of points
- an arc of points
- a widely dispersed cloud of points

I need to be able to determine if the set of points is either a "line" or "something else".

It would help if you could make the context of the problem clearer.

* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *

Choose a point above, draw a line through the point and
see how it can be part of many lines (which line do you
want) and notice the ratio of the change in coordinates
will remain constant.

--
Basic Coder
Richard
Posts: 3047
Joined: Jan 15, 2007 20:44
Location: Australia
Here is a simple linear regression that also computes the coefficient of determination, rr = r squared.
If rr = 1.0000 then it is a perfectly straight line. As the fit gets worse rr falls towards zero.

Code: Select all

` '=============================================================' Linear Regression. y = ax+b, rr = coefficient of determination' If rr = 1 then line is perfectly straight, 0 is not straight' if rr is in between then points are sort of in line.'=============================================================Dim As Integer i, n = 5     ' number of data pointsDim As Double x(1 to n) , y(1 to n)Dim As Double SumX=0, SumY=0, SumXY=0, SumXX=0, SumYY=0Dim As Double a, b, rr' Initialise the x,y data pointsFor i = 1 To n    x(i) = i    y(i) = 3*i + 2Next i' Calculate sigma x, y, xy, xx and yyFor i = 1 To n    SumX += x(i)    SumY += y(i)    SumXY += x(i)*y(i)    SumXX += x(i)*x(i)    SumYY += y(i)*y(i)Next i' compute coefficientsb = (SumXY-(SumX*SumY/n))/(SumXX-(SumX*SumX/n))a = (SumY/n)-b*(SumX/n)rr = (SumXY-(SumX*SumY/n))^2/((SumXX-(SumX*SumX/n))*(SumYY-(SumY*SumY/n)))Print "Y intercept,  a = "; aPrint "Slope dy/dx,  b = "; bPrint "Fit Quality, rr = "; rrSleep`

This code is untested. Good luck.
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas
Hello Basic Coder,

Using your example, it's more like I can receive one of the following sets of points:

Code: Select all

`A.*    *      *          *                * B.* * *            * * * C.*   *    *    *  ** D.* * * *   * *         * * * * * * * * * * * * * * * * * *     * * *   *   * *   *   * *   *   * *   * *   * `

I would like to interpret A. and B. as straight lines. Induction would then allow me to determine that C. was an arc and D. was "something else", simply by the size of the data sets.

I hope that makes it clearer - And thank you for your time.

P.S. - It looks like Richard may have set me aright...
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas
Hello Richard,

So that is what coefficient of determination was! - (I told you I wasn't a geometer)...

I'll try it out. Thank you very much...
KristopherWindsor
Posts: 2428
Joined: Jul 19, 2006 19:17
Location: Sunnyvale, CA
Contact:
One way to do it, is to define the line first, then make sure all points are near that line.
This works, but the column of selected points seems to have a variable with. The solution, although I don't know how to do it, is to use something like "< 40 * slope" instead of just "< 40."

Code: Select all

`' Draw some points, color the ones in a line' by Kristopher WindsorConst screenx = 800, screeny = 600, upoint = 5000Type apoint  As Integer x, yEnd TypeScreenres screenx, screeny, 32Dim As apoint points(1 To upoint)Dim As Integer pcolor, yshouldbeDim As Double x1, y1, x2, y2, slopeRandomize TimerFor a As Integer = 1 To upoint  With points(a)    .x = Rnd * screenx    .y = Rnd * screeny  End WithNext aDo  Getmouse points(1).x, points(1).y    'trick to make sure the line is not vertical  points(1).x += points(1).x = points(2).x    'define the line by the first two points  x1 = points(1).x  y1 = points(1).y  x2 = points(2).x  y2 = points(2).y  slope = (y1 - y2) / (x1 - x2)  Screenlock  Cls  'draw red line connecting first two points  Line (x1, y1) - (x2, y2), &HFFFF0000    'display points as small circles  'white > is on line; green > not on line  For a As Integer = 1 To upoint    With points(a)      yshouldbe = y1 + (.x-x1) * slope      If Abs(.y - yshouldbe) < 40 Then pcolor = &HFFFFFFFF Else pcolor = &HFF00FF00      Circle (.x, .y), 2, pcolor,,,, f    End With  Next a  ScreenunlockLoop Until Inkey = Chr(27)System`
notthecheatr
Posts: 1759
Joined: May 23, 2007 21:52
Location: Cut Bank, MT
Contact:
Linear Regression, if I remember correctly, does fit a line to the data - and it also returns a value of how close the line actually is to the actual data.

You want to find out if an arbitrary line is close to the data? Use linear regression to find the line nearest to the data, then figure out how close it is to the line you're testing - probably by comparing the slope and intercept of the two lines, and setting a certain threshold value to decide what "close enough" is. This isn't quite as accurate as KristopherWindsor's method, but it might be good enough and I think it would be faster.

Another method is to draw the line you want and the data points on the screen and eyeball it :)
KristopherWindsor
Posts: 2428
Joined: Jul 19, 2006 19:17
Location: Sunnyvale, CA
Contact:
Here is my attempt to find the "average line" for a set of points. It is better to use an average line, then to use the line connecting the first two points.
I don't think this code works quite right, since it is my own algorithm to find the average line, but I haven't found any big problems yet, so this should work. :-)

Code: Select all

`' Draw some points, color the ones in a line v2.0' by Kristopher WindsorConst screenx = 800, screeny = 600, upoint = 60Type apoint  As Integer x, yEnd TypeScreenres screenx, screeny, 32Dim As apoint points(1 To upoint)Dim As Integer pcolor, yshouldbe, mousepoint, mx, myDim As Double x1, y1, x2, y2, slope, lx, ly, slopesign, slopetempRandomize TimerFor a As Integer = 2 To upoint  With points(a)    .x = Rnd * screenx    .y = Rnd * screeny  End WithNext aDo  Getmouse mx, my  If Abs(points(mousepoint).x - mx) > 2 Or Abs(points(mousepoint).y - my) > 2 Then    mousepoint += 1    If mousepoint > upoint Then mousepoint = 1    With points(mousepoint)      .x = mx: .y = my    End With  End If    'define the line by the average of the points  lx = 0: ly = 0: slope = 0  For a As Integer = 1 To upoint    With points(a)      'create a line for every point to the following point      x1 = points(a).x      y1 = points(a).y      x2 = points(Iif(a = upoint, 1, a + 1)).x      y2 = points(Iif(a = upoint, 1, a + 1)).y      'accumulate the total x and y coords, and the slope      lx += .x      ly += .y      slopetemp = (y1 - y2) / (x1 - x2 + (x1 = x2)) 'trick to make sure the slope is always defined      slope += Abs(slopetemp)      slopesign += slopetemp    End With  Next a  'convert totals into averages  lx /= upoint  ly /= upoint  slope /= upoint  slope *= Sgn(slopesign)  'create the average line with the average coords and slope (only used for display)  x1 = lx - 300  y1 = ly - 300 * slope  x2 = lx + 300  y2 = ly + 300 * slope    Screenlock  Cls  Print "Slope: ", slope    'draw yellow midpoint  Circle (lx, ly), 4, &HFFFFFF00,,,, f    'draw red "average" line  Line (x1, y1) - (x2, y2), &HFFFF0000    'display points as small circles  'white > is on line; green > not on line  For a As Integer = 1 To upoint    With points(a)      yshouldbe = ly + (.x - lx) * slope      If Abs(.y - yshouldbe) < 40 Then pcolor = &HFFFFFFFF Else pcolor = &HFF00FF00      Circle (.x, .y), 2, pcolor,,,, f    End With  Next a  Screenunlock    Sleep 20Loop Until Inkey = Chr(27)System`
Richard
Posts: 3047
Joined: Jan 15, 2007 20:44
Location: Australia
Here is a major rewrite of my original. Instead of rr it gives you the RMS error measured perpendicular to the line of best fit. If RMS is zero then the fit is perfect. It is based on the original linear regression but RMS does a much better job than rr as an estimate of deviation from straightness. Have more fun!

Code: Select all

`'===============================================================' Best Line Fit. y = mx+c, RMS = Root Mean Square error. This'  algorithm will handles diagonal, vertical and horizontal lines.'===============================================================Dim As Integer i, n = 5     ' number of data points' Initialise the x,y data pointsDim As Double x(1 To n) , y(1 To n)Print " i     x(i)      y(i) "For i = 1 To n    x(i) = i    y(i) = 2 * i + 3    ' 2 * i * i + 3    Print Using "###  ###.####  ###.#### "; i; x(i); y(i)Next iPrint' Calculate sigma x, y, xy, xx and yyDim As Double SumX = 0, SumY = 0, SumXY = 0, SumXX = 0, SumYY = 0For i = 1 To n    SumX += x(i)    SumY += y(i)    SumXY += x(i)*y(i)    SumXX += x(i)*x(i)Next i' compute coefficients, double solves division by zero and infinityDim As Double c, m, dx, dydx = SumXX - SumX * SumX / ndy = SumXY - SumX * SumY / nm = dy / dxc = (SumY / n) - (m * SumX / n)' precompute conjugate of unity vector Dim as double Ux, Uy, scalescale = sqr(dx*dx+dy*dy)Ux = dx / scaleUy = -dy / scale' sum of squares = perpendicular distance from best fit lineDim As Double e, sos = 0, RMSFor i = 1 To n    e = Ux * (y(i) - c) + Uy * x(i)   ' vector rotation by slope    sos += e * eNext iRMS = sqr(sos / n) ' hence the term Root of the Mean SquarePrint Using " Y intercept, c =####.######"; cPrint Using " Slope dy/dx, m =####.######"; mPrint Using " Sum of squares =######.####"; sosPrint Using "    RMS  Error  =######.####"; RMSSleep`
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:
Will this be good for you? End result if pretty similar to KristopherWindsor 1st solution, but with a much different approach.

Code: Select all

`Const radius = 30Const numpoints = 5000Const xres = 800Const yres = 600Screenres 800, 600, 32Function randINT(first As Integer, last As Integer) As Integer    Function = Cint(Rnd * (last - first) + first)End FunctionType pt2d    x As Integer    y As Integer    mycolor As UintegerEnd TypeDim Shared As pt2d points(numpoints)For i As Integer = 0 To numpoints - 1    points(i).x = randINT(0, xres - 1)    points(i).y = randINT(0, yres - 1)NextSub MyLine(x1 As Integer, y1 As Integer, x2 As Integer, y2 As Integer)    Dim As Integer dx2, a, dy2, b, dx1, dy1, i1, i2, i   Dim As Integer x, y, d      DX2 = 1   A = X2 - X1   If A < 0 Then      A = -A      DX2 = -1    End If      DY2 = 1   B = Y2 - Y1   If B < 0 Then      B = -B      DY2 = -1    End If      DX1 = DX2   DY1 = 0      If A < B Then        Swap a, b      DX1 = 0      DY1 = DY2    End If      I1 = B * 2   D = I1 - A   I2 = D - A   X = X1   Y = Y1      For I = 0 To A        For j As Integer = 0 To numpoints - 1            Dim As Single dx, dy, dist            dx = x - points(j).x            dy = y - points(j).y            dist = dx * dx + dy * dy            If dist < radius * radius Then points(j).mycolor = Rgb(0, 255, 0)        Next                Pset (x, y)      If D < 0 Then         X += DX1         Y += DY1         D += I1        Else         X += DX2         Y += DY2         D += I2        End If    NextEnd SubDim As Integer x, y, x1, y1, x2, y2, buttonsDo    Getmouse x, y, , buttons    If buttons And 1 Then        x1 = x        y1 = y    End If    If buttons And 2 Then        x2 = x        y2 = y    End If        Screenlock    Cls        MyLine x1, y1, x2, y2    For i As Integer = 0 To numpoints - 1        Pset(points(i).x, points(i).y), points(i).mycolor        points(i).mycolor = Rgb(255, 0, 0)    Next        Screenunlock    Sleep 50Loop Until Inkey = Chr(27)`

Left mouse click to select head of line, right click to select tail of line.
Change consts "radius" and "numpoints" to whatever suits you.

This is just a bresenhams line algorithm with addition of checking for close points for every pixel of line.
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas

### Does a set of points define a line? - [Solved]

Thank you folks so much!!!

duke4e's solution is the one that worked for every data set I've gotten so far. Really elegant (simple) once I figured out what Bresenham's algorithm was and how it worked. And the check to see if my data points simply fell in an area was way cool.

I still don't think I would have figured out what duke4e and Kristopher intended from reviewing their code. Seeing the graphics made all the difference in the world.

FreeBASIC people are great!
cbruce