## Does a set of points define a line? - [Solved]

General FreeBASIC programming questions.
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas

### Does a set of points define a line? - [Solved]

Does anyone know how to determine if a set of points maps relatively close to a straight line?

(I'm obviously not a math guy)
:)

Thanks
Last edited by cbruce on Sep 23, 2007 21:59, edited 1 time in total.
Plasma
Posts: 191
Joined: May 27, 2005 5:22
Location: Earth
Contact:
A linear regression will do that.
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas
Thanks Plasma, but after a lot of reading, it does not appear that linear regression is the answer to my question.

I may be incorrect, but it seems that linear regression will only best fit a line to a set of points. What I need to know is if my set of points "is" a straight line (or relatively close).

I could be getting a set of points that consist of either:
- a straight (or close) line of points
- an arc of points
- a widely dispersed cloud of points

I need to be able to determine if the set of points is either a "line" or "something else".
Basic Coder
Posts: 180
Joined: Aug 02, 2006 23:37
Location: Australia
cbruce wrote:Thanks Plasma, but after a lot of reading, it does not appear that linear regression is the answer to my question.

I may be incorrect, but it seems that linear regression will only best fit a line to a set of points. What I need to know is if my set of points "is" a straight line (or relatively close).

I could be getting a set of points that consist of either:
- a straight (or close) line of points
- an arc of points
- a widely dispersed cloud of points

I need to be able to determine if the set of points is either a "line" or "something else".

It would help if you could make the context of the problem clearer.

* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *
* * * * * * * * *

Choose a point above, draw a line through the point and
see how it can be part of many lines (which line do you
want) and notice the ratio of the change in coordinates
will remain constant.

--
Basic Coder
Richard
Posts: 3047
Joined: Jan 15, 2007 20:44
Location: Australia
Here is a simple linear regression that also computes the coefficient of determination, rr = r squared.
If rr = 1.0000 then it is a perfectly straight line. As the fit gets worse rr falls towards zero.

Code: Select all

'=============================================================
' Linear Regression. y = ax+b, rr = coefficient of determination
' If rr = 1 then line is perfectly straight, 0 is not straight
' if rr is in between then points are sort of in line.
'=============================================================
Dim As Integer i, n = 5     ' number of data points
Dim As Double x(1 to n) , y(1 to n)
Dim As Double SumX=0, SumY=0, SumXY=0, SumXX=0, SumYY=0
Dim As Double a, b, rr

' Initialise the x,y data points
For i = 1 To n
x(i) = i
y(i) = 3*i + 2
Next i

' Calculate sigma x, y, xy, xx and yy
For i = 1 To n
SumX += x(i)
SumY += y(i)
SumXY += x(i)*y(i)
SumXX += x(i)*x(i)
SumYY += y(i)*y(i)
Next i

' compute coefficients
b = (SumXY-(SumX*SumY/n))/(SumXX-(SumX*SumX/n))
a = (SumY/n)-b*(SumX/n)
rr = (SumXY-(SumX*SumY/n))^2/((SumXX-(SumX*SumX/n))*(SumYY-(SumY*SumY/n)))

Print "Y intercept,  a = "; a
Print "Slope dy/dx,  b = "; b
Print "Fit Quality, rr = "; rr

Sleep

This code is untested. Good luck.
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas
Hello Basic Coder,

Using your example, it's more like I can receive one of the following sets of points:

Code: Select all

A.
*
*
*
*
*

B.
* * *
* * *

C.
*
*
*
*
*
*

D.
* * * *   * *
* * * * * * *
* * * * * * * * *
* *     * * *   *
* *   *   * *
*   * *   * *   *

I would like to interpret A. and B. as straight lines. Induction would then allow me to determine that C. was an arc and D. was "something else", simply by the size of the data sets.

I hope that makes it clearer - And thank you for your time.

P.S. - It looks like Richard may have set me aright...
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas
Hello Richard,

So that is what coefficient of determination was! - (I told you I wasn't a geometer)...

I'll try it out. Thank you very much...
KristopherWindsor
Posts: 2428
Joined: Jul 19, 2006 19:17
Location: Sunnyvale, CA
Contact:
One way to do it, is to define the line first, then make sure all points are near that line.
This works, but the column of selected points seems to have a variable with. The solution, although I don't know how to do it, is to use something like "< 40 * slope" instead of just "< 40."

Code: Select all

' Draw some points, color the ones in a line
' by Kristopher Windsor

Const screenx = 800, screeny = 600, upoint = 5000

Type apoint
As Integer x, y
End Type

Screenres screenx, screeny, 32

Dim As apoint points(1 To upoint)
Dim As Integer pcolor, yshouldbe
Dim As Double x1, y1, x2, y2, slope

Randomize Timer
For a As Integer = 1 To upoint
With points(a)
.x = Rnd * screenx
.y = Rnd * screeny
End With
Next a

Do
Getmouse points(1).x, points(1).y

'trick to make sure the line is not vertical
points(1).x += points(1).x = points(2).x

'define the line by the first two points
x1 = points(1).x
y1 = points(1).y
x2 = points(2).x
y2 = points(2).y
slope = (y1 - y2) / (x1 - x2)

Screenlock
Cls
'draw red line connecting first two points
Line (x1, y1) - (x2, y2), &HFFFF0000

'display points as small circles
'white > is on line; green > not on line
For a As Integer = 1 To upoint
With points(a)
yshouldbe = y1 + (.x-x1) * slope
If Abs(.y - yshouldbe) < 40 Then pcolor = &HFFFFFFFF Else pcolor = &HFF00FF00
Circle (.x, .y), 2, pcolor,,,, f
End With
Next a
Screenunlock
Loop Until Inkey = Chr(27)

System

notthecheatr
Posts: 1759
Joined: May 23, 2007 21:52
Location: Cut Bank, MT
Contact:
Linear Regression, if I remember correctly, does fit a line to the data - and it also returns a value of how close the line actually is to the actual data.

You want to find out if an arbitrary line is close to the data? Use linear regression to find the line nearest to the data, then figure out how close it is to the line you're testing - probably by comparing the slope and intercept of the two lines, and setting a certain threshold value to decide what "close enough" is. This isn't quite as accurate as KristopherWindsor's method, but it might be good enough and I think it would be faster.

Another method is to draw the line you want and the data points on the screen and eyeball it :)
KristopherWindsor
Posts: 2428
Joined: Jul 19, 2006 19:17
Location: Sunnyvale, CA
Contact:
Here is my attempt to find the "average line" for a set of points. It is better to use an average line, then to use the line connecting the first two points.
I don't think this code works quite right, since it is my own algorithm to find the average line, but I haven't found any big problems yet, so this should work. :-)

Code: Select all

' Draw some points, color the ones in a line v2.0
' by Kristopher Windsor

Const screenx = 800, screeny = 600, upoint = 60

Type apoint
As Integer x, y
End Type

Screenres screenx, screeny, 32

Dim As apoint points(1 To upoint)
Dim As Integer pcolor, yshouldbe, mousepoint, mx, my
Dim As Double x1, y1, x2, y2, slope, lx, ly, slopesign, slopetemp

Randomize Timer
For a As Integer = 2 To upoint
With points(a)
.x = Rnd * screenx
.y = Rnd * screeny
End With
Next a

Do
Getmouse mx, my
If Abs(points(mousepoint).x - mx) > 2 Or Abs(points(mousepoint).y - my) > 2 Then
mousepoint += 1
If mousepoint > upoint Then mousepoint = 1
With points(mousepoint)
.x = mx: .y = my
End With
End If

'define the line by the average of the points
lx = 0: ly = 0: slope = 0
For a As Integer = 1 To upoint
With points(a)
'create a line for every point to the following point
x1 = points(a).x
y1 = points(a).y
x2 = points(Iif(a = upoint, 1, a + 1)).x
y2 = points(Iif(a = upoint, 1, a + 1)).y
'accumulate the total x and y coords, and the slope
lx += .x
ly += .y
slopetemp = (y1 - y2) / (x1 - x2 + (x1 = x2)) 'trick to make sure the slope is always defined
slope += Abs(slopetemp)
slopesign += slopetemp
End With
Next a
'convert totals into averages
lx /= upoint
ly /= upoint
slope /= upoint
slope *= Sgn(slopesign)
'create the average line with the average coords and slope (only used for display)
x1 = lx - 300
y1 = ly - 300 * slope
x2 = lx + 300
y2 = ly + 300 * slope

Screenlock
Cls
Print "Slope: ", slope

'draw yellow midpoint
Circle (lx, ly), 4, &HFFFFFF00,,,, f

'draw red "average" line
Line (x1, y1) - (x2, y2), &HFFFF0000

'display points as small circles
'white > is on line; green > not on line
For a As Integer = 1 To upoint
With points(a)
yshouldbe = ly + (.x - lx) * slope
If Abs(.y - yshouldbe) < 40 Then pcolor = &HFFFFFFFF Else pcolor = &HFF00FF00
Circle (.x, .y), 2, pcolor,,,, f
End With
Next a
Screenunlock

Sleep 20
Loop Until Inkey = Chr(27)

System
Richard
Posts: 3047
Joined: Jan 15, 2007 20:44
Location: Australia
Here is a major rewrite of my original. Instead of rr it gives you the RMS error measured perpendicular to the line of best fit. If RMS is zero then the fit is perfect. It is based on the original linear regression but RMS does a much better job than rr as an estimate of deviation from straightness. Have more fun!

Code: Select all

'===============================================================
' Best Line Fit. y = mx+c, RMS = Root Mean Square error. This
'  algorithm will handles diagonal, vertical and horizontal lines.
'===============================================================
Dim As Integer i, n = 5     ' number of data points

' Initialise the x,y data points
Dim As Double x(1 To n) , y(1 To n)
Print " i     x(i)      y(i) "
For i = 1 To n
x(i) = i
y(i) = 2 * i + 3    ' 2 * i * i + 3
Print Using "###  ###.####  ###.#### "; i; x(i); y(i)
Next i
Print

' Calculate sigma x, y, xy, xx and yy
Dim As Double SumX = 0, SumY = 0, SumXY = 0, SumXX = 0, SumYY = 0
For i = 1 To n
SumX += x(i)
SumY += y(i)
SumXY += x(i)*y(i)
SumXX += x(i)*x(i)
Next i

' compute coefficients, double solves division by zero and infinity
Dim As Double c, m, dx, dy
dx = SumXX - SumX * SumX / n
dy = SumXY - SumX * SumY / n
m = dy / dx
c = (SumY / n) - (m * SumX / n)

' precompute conjugate of unity vector
Dim as double Ux, Uy, scale
scale = sqr(dx*dx+dy*dy)
Ux = dx / scale
Uy = -dy / scale

' sum of squares = perpendicular distance from best fit line
Dim As Double e, sos = 0, RMS
For i = 1 To n
e = Ux * (y(i) - c) + Uy * x(i)   ' vector rotation by slope
sos += e * e
Next i
RMS = sqr(sos / n) ' hence the term Root of the Mean Square

Print Using " Y intercept, c =####.######"; c
Print Using " Slope dy/dx, m =####.######"; m
Print Using " Sum of squares =######.####"; sos
Print Using "    RMS  Error  =######.####"; RMS

Sleep
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:
Will this be good for you? End result if pretty similar to KristopherWindsor 1st solution, but with a much different approach.

Code: Select all

Const numpoints = 5000

Const xres = 800
Const yres = 600
Screenres 800, 600, 32

Function randINT(first As Integer, last As Integer) As Integer
Function = Cint(Rnd * (last - first) + first)
End Function

Type pt2d
x As Integer
y As Integer
mycolor As Uinteger
End Type

Dim Shared As pt2d points(numpoints)

For i As Integer = 0 To numpoints - 1
points(i).x = randINT(0, xres - 1)
points(i).y = randINT(0, yres - 1)
Next

Sub MyLine(x1 As Integer, y1 As Integer, x2 As Integer, y2 As Integer)
Dim As Integer dx2, a, dy2, b, dx1, dy1, i1, i2, i
Dim As Integer x, y, d

DX2 = 1
A = X2 - X1
If A < 0 Then
A = -A
DX2 = -1
End If

DY2 = 1
B = Y2 - Y1
If B < 0 Then
B = -B
DY2 = -1
End If

DX1 = DX2
DY1 = 0

If A < B Then
Swap a, b
DX1 = 0
DY1 = DY2
End If

I1 = B * 2
D = I1 - A
I2 = D - A
X = X1
Y = Y1

For I = 0 To A
For j As Integer = 0 To numpoints - 1
Dim As Single dx, dy, dist
dx = x - points(j).x
dy = y - points(j).y
dist = dx * dx + dy * dy
If dist < radius * radius Then points(j).mycolor = Rgb(0, 255, 0)
Next

Pset (x, y)
If D < 0 Then
X += DX1
Y += DY1
D += I1
Else
X += DX2
Y += DY2
D += I2
End If
Next
End Sub

Dim As Integer x, y, x1, y1, x2, y2, buttons
Do
Getmouse x, y, , buttons
If buttons And 1 Then
x1 = x
y1 = y
End If
If buttons And 2 Then
x2 = x
y2 = y
End If

Screenlock
Cls

MyLine x1, y1, x2, y2
For i As Integer = 0 To numpoints - 1
Pset(points(i).x, points(i).y), points(i).mycolor
points(i).mycolor = Rgb(255, 0, 0)
Next

Screenunlock
Sleep 50
Loop Until Inkey = Chr(27)

Left mouse click to select head of line, right click to select tail of line.
Change consts "radius" and "numpoints" to whatever suits you.

This is just a bresenhams line algorithm with addition of checking for close points for every pixel of line.
cbruce
Posts: 136
Joined: Sep 12, 2007 19:13
Location: Dallas, Texas

### Does a set of points define a line? - [Solved]

Thank you folks so much!!!

duke4e's solution is the one that worked for every data set I've gotten so far. Really elegant (simple) once I figured out what Bresenham's algorithm was and how it worked. And the check to see if my data points simply fell in an area was way cool.

I still don't think I would have figured out what duke4e and Kristopher intended from reviewing their code. Seeing the graphics made all the difference in the world.

FreeBASIC people are great!
cbruce