Does anyone know how to determine if a set of points maps relatively close to a straight line?
(I'm obviously not a math guy)
:)
Thanks
Does a set of points define a line?  [Solved]
Does a set of points define a line?  [Solved]
Last edited by cbruce on Sep 23, 2007 21:59, edited 1 time in total.
A linear regression will do that.
Thanks Plasma, but after a lot of reading, it does not appear that linear regression is the answer to my question.
I may be incorrect, but it seems that linear regression will only best fit a line to a set of points. What I need to know is if my set of points "is" a straight line (or relatively close).
I could be getting a set of points that consist of either:
 a straight (or close) line of points
 an arc of points
 a widely dispersed cloud of points
I need to be able to determine if the set of points is either a "line" or "something else".
I may be incorrect, but it seems that linear regression will only best fit a line to a set of points. What I need to know is if my set of points "is" a straight line (or relatively close).
I could be getting a set of points that consist of either:
 a straight (or close) line of points
 an arc of points
 a widely dispersed cloud of points
I need to be able to determine if the set of points is either a "line" or "something else".

 Posts: 180
 Joined: Aug 02, 2006 23:37
 Location: Australia
cbruce wrote:Thanks Plasma, but after a lot of reading, it does not appear that linear regression is the answer to my question.
I may be incorrect, but it seems that linear regression will only best fit a line to a set of points. What I need to know is if my set of points "is" a straight line (or relatively close).
I could be getting a set of points that consist of either:
 a straight (or close) line of points
 an arc of points
 a widely dispersed cloud of points
I need to be able to determine if the set of points is either a "line" or "something else".
It would help if you could make the context of the problem clearer.
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
Choose a point above, draw a line through the point and
see how it can be part of many lines (which line do you
want) and notice the ratio of the change in coordinates
will remain constant.

Basic Coder
Here is a simple linear regression that also computes the coefficient of determination, rr = r squared.
If rr = 1.0000 then it is a perfectly straight line. As the fit gets worse rr falls towards zero.
This code is untested. Good luck.
If rr = 1.0000 then it is a perfectly straight line. As the fit gets worse rr falls towards zero.
Code: Select all
'=============================================================
' Linear Regression. y = ax+b, rr = coefficient of determination
' If rr = 1 then line is perfectly straight, 0 is not straight
' if rr is in between then points are sort of in line.
'=============================================================
Dim As Integer i, n = 5 ' number of data points
Dim As Double x(1 to n) , y(1 to n)
Dim As Double SumX=0, SumY=0, SumXY=0, SumXX=0, SumYY=0
Dim As Double a, b, rr
' Initialise the x,y data points
For i = 1 To n
x(i) = i
y(i) = 3*i + 2
Next i
' Calculate sigma x, y, xy, xx and yy
For i = 1 To n
SumX += x(i)
SumY += y(i)
SumXY += x(i)*y(i)
SumXX += x(i)*x(i)
SumYY += y(i)*y(i)
Next i
' compute coefficients
b = (SumXY(SumX*SumY/n))/(SumXX(SumX*SumX/n))
a = (SumY/n)b*(SumX/n)
rr = (SumXY(SumX*SumY/n))^2/((SumXX(SumX*SumX/n))*(SumYY(SumY*SumY/n)))
Print "Y intercept, a = "; a
Print "Slope dy/dx, b = "; b
Print "Fit Quality, rr = "; rr
Sleep
This code is untested. Good luck.
Hello Basic Coder,
Using your example, it's more like I can receive one of the following sets of points:
I would like to interpret A. and B. as straight lines. Induction would then allow me to determine that C. was an arc and D. was "something else", simply by the size of the data sets.
I hope that makes it clearer  And thank you for your time.
P.S.  It looks like Richard may have set me aright...
Using your example, it's more like I can receive one of the following sets of points:
Code: Select all
A.
*
*
*
*
*
B.
* * *
* * *
C.
*
*
*
*
*
*
D.
* * * * * *
* * * * * * *
* * * * * * * * *
* * * * * *
* * * * *
* * * * * *
I would like to interpret A. and B. as straight lines. Induction would then allow me to determine that C. was an arc and D. was "something else", simply by the size of the data sets.
I hope that makes it clearer  And thank you for your time.
P.S.  It looks like Richard may have set me aright...

 Posts: 2428
 Joined: Jul 19, 2006 19:17
 Location: Sunnyvale, CA
 Contact:
One way to do it, is to define the line first, then make sure all points are near that line.
This works, but the column of selected points seems to have a variable with. The solution, although I don't know how to do it, is to use something like "< 40 * slope" instead of just "< 40."
This works, but the column of selected points seems to have a variable with. The solution, although I don't know how to do it, is to use something like "< 40 * slope" instead of just "< 40."
Code: Select all
' Draw some points, color the ones in a line
' by Kristopher Windsor
Const screenx = 800, screeny = 600, upoint = 5000
Type apoint
As Integer x, y
End Type
Screenres screenx, screeny, 32
Dim As apoint points(1 To upoint)
Dim As Integer pcolor, yshouldbe
Dim As Double x1, y1, x2, y2, slope
Randomize Timer
For a As Integer = 1 To upoint
With points(a)
.x = Rnd * screenx
.y = Rnd * screeny
End With
Next a
Do
Getmouse points(1).x, points(1).y
'trick to make sure the line is not vertical
points(1).x += points(1).x = points(2).x
'define the line by the first two points
x1 = points(1).x
y1 = points(1).y
x2 = points(2).x
y2 = points(2).y
slope = (y1  y2) / (x1  x2)
Screenlock
Cls
'draw red line connecting first two points
Line (x1, y1)  (x2, y2), &HFFFF0000
'display points as small circles
'white > is on line; green > not on line
For a As Integer = 1 To upoint
With points(a)
yshouldbe = y1 + (.xx1) * slope
If Abs(.y  yshouldbe) < 40 Then pcolor = &HFFFFFFFF Else pcolor = &HFF00FF00
Circle (.x, .y), 2, pcolor,,,, f
End With
Next a
Screenunlock
Loop Until Inkey = Chr(27)
System

 Posts: 1759
 Joined: May 23, 2007 21:52
 Location: Cut Bank, MT
 Contact:
Linear Regression, if I remember correctly, does fit a line to the data  and it also returns a value of how close the line actually is to the actual data.
You want to find out if an arbitrary line is close to the data? Use linear regression to find the line nearest to the data, then figure out how close it is to the line you're testing  probably by comparing the slope and intercept of the two lines, and setting a certain threshold value to decide what "close enough" is. This isn't quite as accurate as KristopherWindsor's method, but it might be good enough and I think it would be faster.
Another method is to draw the line you want and the data points on the screen and eyeball it :)
You want to find out if an arbitrary line is close to the data? Use linear regression to find the line nearest to the data, then figure out how close it is to the line you're testing  probably by comparing the slope and intercept of the two lines, and setting a certain threshold value to decide what "close enough" is. This isn't quite as accurate as KristopherWindsor's method, but it might be good enough and I think it would be faster.
Another method is to draw the line you want and the data points on the screen and eyeball it :)

 Posts: 2428
 Joined: Jul 19, 2006 19:17
 Location: Sunnyvale, CA
 Contact:
Here is my attempt to find the "average line" for a set of points. It is better to use an average line, then to use the line connecting the first two points.
I don't think this code works quite right, since it is my own algorithm to find the average line, but I haven't found any big problems yet, so this should work. :)
I don't think this code works quite right, since it is my own algorithm to find the average line, but I haven't found any big problems yet, so this should work. :)
Code: Select all
' Draw some points, color the ones in a line v2.0
' by Kristopher Windsor
Const screenx = 800, screeny = 600, upoint = 60
Type apoint
As Integer x, y
End Type
Screenres screenx, screeny, 32
Dim As apoint points(1 To upoint)
Dim As Integer pcolor, yshouldbe, mousepoint, mx, my
Dim As Double x1, y1, x2, y2, slope, lx, ly, slopesign, slopetemp
Randomize Timer
For a As Integer = 2 To upoint
With points(a)
.x = Rnd * screenx
.y = Rnd * screeny
End With
Next a
Do
Getmouse mx, my
If Abs(points(mousepoint).x  mx) > 2 Or Abs(points(mousepoint).y  my) > 2 Then
mousepoint += 1
If mousepoint > upoint Then mousepoint = 1
With points(mousepoint)
.x = mx: .y = my
End With
End If
'define the line by the average of the points
lx = 0: ly = 0: slope = 0
For a As Integer = 1 To upoint
With points(a)
'create a line for every point to the following point
x1 = points(a).x
y1 = points(a).y
x2 = points(Iif(a = upoint, 1, a + 1)).x
y2 = points(Iif(a = upoint, 1, a + 1)).y
'accumulate the total x and y coords, and the slope
lx += .x
ly += .y
slopetemp = (y1  y2) / (x1  x2 + (x1 = x2)) 'trick to make sure the slope is always defined
slope += Abs(slopetemp)
slopesign += slopetemp
End With
Next a
'convert totals into averages
lx /= upoint
ly /= upoint
slope /= upoint
slope *= Sgn(slopesign)
'create the average line with the average coords and slope (only used for display)
x1 = lx  300
y1 = ly  300 * slope
x2 = lx + 300
y2 = ly + 300 * slope
Screenlock
Cls
Print "Slope: ", slope
'draw yellow midpoint
Circle (lx, ly), 4, &HFFFFFF00,,,, f
'draw red "average" line
Line (x1, y1)  (x2, y2), &HFFFF0000
'display points as small circles
'white > is on line; green > not on line
For a As Integer = 1 To upoint
With points(a)
yshouldbe = ly + (.x  lx) * slope
If Abs(.y  yshouldbe) < 40 Then pcolor = &HFFFFFFFF Else pcolor = &HFF00FF00
Circle (.x, .y), 2, pcolor,,,, f
End With
Next a
Screenunlock
Sleep 20
Loop Until Inkey = Chr(27)
System
Here is a major rewrite of my original. Instead of rr it gives you the RMS error measured perpendicular to the line of best fit. If RMS is zero then the fit is perfect. It is based on the original linear regression but RMS does a much better job than rr as an estimate of deviation from straightness. Have more fun!
Code: Select all
'===============================================================
' Best Line Fit. y = mx+c, RMS = Root Mean Square error. This
' algorithm will handles diagonal, vertical and horizontal lines.
'===============================================================
Dim As Integer i, n = 5 ' number of data points
' Initialise the x,y data points
Dim As Double x(1 To n) , y(1 To n)
Print " i x(i) y(i) "
For i = 1 To n
x(i) = i
y(i) = 2 * i + 3 ' 2 * i * i + 3
Print Using "### ###.#### ###.#### "; i; x(i); y(i)
Next i
Print
' Calculate sigma x, y, xy, xx and yy
Dim As Double SumX = 0, SumY = 0, SumXY = 0, SumXX = 0, SumYY = 0
For i = 1 To n
SumX += x(i)
SumY += y(i)
SumXY += x(i)*y(i)
SumXX += x(i)*x(i)
Next i
' compute coefficients, double solves division by zero and infinity
Dim As Double c, m, dx, dy
dx = SumXX  SumX * SumX / n
dy = SumXY  SumX * SumY / n
m = dy / dx
c = (SumY / n)  (m * SumX / n)
' precompute conjugate of unity vector
Dim as double Ux, Uy, scale
scale = sqr(dx*dx+dy*dy)
Ux = dx / scale
Uy = dy / scale
' sum of squares = perpendicular distance from best fit line
Dim As Double e, sos = 0, RMS
For i = 1 To n
e = Ux * (y(i)  c) + Uy * x(i) ' vector rotation by slope
sos += e * e
Next i
RMS = sqr(sos / n) ' hence the term Root of the Mean Square
Print Using " Y intercept, c =####.######"; c
Print Using " Slope dy/dx, m =####.######"; m
Print Using " Sum of squares =######.####"; sos
Print Using " RMS Error =######.####"; RMS
Sleep
Will this be good for you? End result if pretty similar to KristopherWindsor 1st solution, but with a much different approach.
Left mouse click to select head of line, right click to select tail of line.
Change consts "radius" and "numpoints" to whatever suits you.
This is just a bresenhams line algorithm with addition of checking for close points for every pixel of line.
Code: Select all
Const radius = 30
Const numpoints = 5000
Const xres = 800
Const yres = 600
Screenres 800, 600, 32
Function randINT(first As Integer, last As Integer) As Integer
Function = Cint(Rnd * (last  first) + first)
End Function
Type pt2d
x As Integer
y As Integer
mycolor As Uinteger
End Type
Dim Shared As pt2d points(numpoints)
For i As Integer = 0 To numpoints  1
points(i).x = randINT(0, xres  1)
points(i).y = randINT(0, yres  1)
Next
Sub MyLine(x1 As Integer, y1 As Integer, x2 As Integer, y2 As Integer)
Dim As Integer dx2, a, dy2, b, dx1, dy1, i1, i2, i
Dim As Integer x, y, d
DX2 = 1
A = X2  X1
If A < 0 Then
A = A
DX2 = 1
End If
DY2 = 1
B = Y2  Y1
If B < 0 Then
B = B
DY2 = 1
End If
DX1 = DX2
DY1 = 0
If A < B Then
Swap a, b
DX1 = 0
DY1 = DY2
End If
I1 = B * 2
D = I1  A
I2 = D  A
X = X1
Y = Y1
For I = 0 To A
For j As Integer = 0 To numpoints  1
Dim As Single dx, dy, dist
dx = x  points(j).x
dy = y  points(j).y
dist = dx * dx + dy * dy
If dist < radius * radius Then points(j).mycolor = Rgb(0, 255, 0)
Next
Pset (x, y)
If D < 0 Then
X += DX1
Y += DY1
D += I1
Else
X += DX2
Y += DY2
D += I2
End If
Next
End Sub
Dim As Integer x, y, x1, y1, x2, y2, buttons
Do
Getmouse x, y, , buttons
If buttons And 1 Then
x1 = x
y1 = y
End If
If buttons And 2 Then
x2 = x
y2 = y
End If
Screenlock
Cls
MyLine x1, y1, x2, y2
For i As Integer = 0 To numpoints  1
Pset(points(i).x, points(i).y), points(i).mycolor
points(i).mycolor = Rgb(255, 0, 0)
Next
Screenunlock
Sleep 50
Loop Until Inkey = Chr(27)
Left mouse click to select head of line, right click to select tail of line.
Change consts "radius" and "numpoints" to whatever suits you.
This is just a bresenhams line algorithm with addition of checking for close points for every pixel of line.
Does a set of points define a line?  [Solved]
Thank you folks so much!!!
duke4e's solution is the one that worked for every data set I've gotten so far. Really elegant (simple) once I figured out what Bresenham's algorithm was and how it worked. And the check to see if my data points simply fell in an area was way cool.
I still don't think I would have figured out what duke4e and Kristopher intended from reviewing their code. Seeing the graphics made all the difference in the world.
FreeBASIC people are great!
cbruce
duke4e's solution is the one that worked for every data set I've gotten so far. Really elegant (simple) once I figured out what Bresenham's algorithm was and how it worked. And the check to see if my data points simply fell in an area was way cool.
I still don't think I would have figured out what duke4e and Kristopher intended from reviewing their code. Seeing the graphics made all the difference in the world.
FreeBASIC people are great!
cbruce
Who is online
Users browsing this forum: SamL and 12 guests