In the beginning, there was data

Crestfallen by the seeming impossibility of recovering by beloved game from high school, I turned my attention to the original motivation: developing machine learning on the Apple ][+. Grabbing my manuals, I got to work.

Apple 2+ manuals

Every solid machine learning project begins with data. My old clunker, however, is cut off from the world. I never had a modem but I recall one of my best friends using the modem on his Apple ][ to hack long distance access codes. (He since went on to pursue a successful career in telecommunications… naturally.) Even if I had a modem, what would I connect it to? Do those services still even exists? (I’m sure they do but I’m not going to figure that out today.) In the meantime, the plan is to create synthetic data locally.

I’m sure everyone is familiar with the chart comparing different classification algorithms. Since two-dimensional binary classification seems like a great place to start, I’ll need a simple way to graph the data.

Classifier comparison

Fortunately, the Apple ][+ comes with a high-resolution graphics screen (HGR) that is 280 pixels across by 160 pixels high. (There is another mode with 192 vertical pixels but I wanted leave the bottom 4 lines of the text window visible for running output.) There are 8 color options for each pixel, but I have a green screen monitor, so I set everything to HCOLOR=7 (white2).

Drawing the axes

Deciding to stick to the positive quadrant, I wrote a subroutine (Applesoft BASIC does not have methods) to simply display the axes. Interestingly, coordinate (0,0) is the upper left corner of the screen while (279,159) is the lower right, so HPLOT 0,159 TO 279,159 draws the x-axis.

    500 REM  == SET GRAPHICS AND DRAW Y AXIS WITH TICK MARKS ==
    510 HGR : HCOLOR=7
    520 HPLOT 0,159 TO 279,159
    530 HPLOT 0,159 TO 0,0
    540 J = 0
    550 FOR I = 159 TO 0 STEP -10
    560   HPLOT 0,I TO 1,I
    570   IF J / 5 = INT(J / 5) THEN HPLOT 0,I TO 2,I
    580   IF J / 10 = INT(J / 10) THEN HPLOT 0,I TO 3,I
    590   J = J + 1
    600 NEXT I
    610 J = 0
    620 FOR I = 0 TO 279 STEP 10
    630   HPLOT I,159 TO I,158
    640   IF J / 5 = INT(J / 5) THEN HPLOT I,159 TO I,157
    650   IF J / 10 = INT(J / 10) THEN HPLOT I,159 TO I,156
    660   J = J + 1
    670 NEXT I
    680 RETURN

The first FOR I loop adds tick marks along the y-axis. I decided to get fancy and add elongated tick marks at every 5th unit and even longer ticks marks at every 10 unit. I use J to count the ticks. The second FOR I loop does the same for the x-axis. The final result looks like this:

x- and y-axis of graph

Synthesizing the data

With a particular fondness for all thing Gaussian, I decided to create two sets of data points with Gaussian distributions. They look pretty and they’re fun.

    10  HOME : VTAB 21
    20  PI = 3.14159265
    30  GOSUB 500 : REM  DRAW AXIS
    40  DIM AM%(2) : REM  -- MEAN
    50  DIM BM%(2)
    60  DIM AS%(2) : REM  -- STD
    70  DIM BS%(2)
    80  REM  == HYPERPARAMETERS ==
    90  AN = 100 : REM  # OF POINTS
    100 BN = 50
    110 AM%(1) = 100 : AM%(2) = 50
    120 BM%(1) = 200 : BM%(2) = 100
    130 AS%(1) = 30 : AS%(2) = 10
    140 BS%(1) = 50 : BS%(2) = 25
    150 AC = 0.5
    160 BC = -0.5
    170 REM  == END HYPERPARAMETERS ==

HOME clears the screen and VTAB 21 moves the cursor to the 21st text line on the screen, which will be the first line under the graphics after HGR. The Apple ][+ doesn’t come with a constant for π so I set that up using all the precision available.

AM% is a two element array of integers to store, respectively, the x-mean and y-mean values for the A data set. (The % sign makes the variable an integer which save space but actually reduced performance because mathematical operations on the Apple ][+ convert integers to real numbers and then back again.) AS% does the same for the x- and y-standard deviations. AC is a real number correlation coefficient ∈ [-1, 1]. Finally, AN is the number of elements in the A dataset. The corresponding B data set hyperparameters are similar. (In Applesoft BASIC only the first two characters of a variable name are ‘considered’ so you have to be careful of collisions.)

Box-Muller to the rescue

The Apple ][+ can generate uniform random variables from 0 to 0.999999999 using RND(), however, to get standard normal random variables we’ll have to use the Box-Muller transform.

Given two independent samples, \(u_1\) and \(u_2\), chosen from a uniform distribution on the unit interval (0, 1), we can get two independent random variables, \(z_0\) and \(z_1\), with a standard normal distribution with the following:

\[z_0 = R \cos(\theta) = \sqrt{-2 \ln(u_1)} \cos(2 \pi u_2) \\ z_1 = R \sin(\theta) = \sqrt{-2 \ln(u_1)} \sin(2 \pi u_2)\]

Here is the code to do that.

900 REM  == BOX-MULLER TRANSFORM ==
910 U1 = RND(1)
920 U2 = RND(1)
930 R = SQR(-2 * LOG(U1))
940 TH = 2 * PI * U2
950 Z0 = R * COS(TH)
960 Z1 = R * SIN(TH)
970 RETURN

From there, to obtain a 2D Gaussian with mean \(\mu_x, \mu_y\) and covariance matrix \(\Sigma\), we’ll need to apply the following, where \(\sigma_x\) and \(\sigma_y\) are the standard deviations in the \(x\) and \(y\) directions, and \(\rho\) is the correlation coefficient:

\[x = \mu_x + \sigma_x z_1 \\ y = \mu_y + \rho \sigma_y z_1 + \sqrt{1 - \rho^2} \sigma_y z_2\]

Here is the code that generates all the samples using these transformations.

200 DIM AX%(AN,2)
210 DIM BX%(BN,2)
220 FOR I = 1 TO AN
230   GOSUB 900 : REM FETCH STANDARD NORMAL RANDOM VALUES
240   X% = AM%(1) + AS%(1) * Z0
250   Y% = AM%(2) + AC * AS%(2) * Z0 + SQR(1 - AC ^ 2) * AS%(2) * Z1
260   AX%(I,1) = X%
270   AX%(I,2) = Y%
280   PRINT "POINT IN SET A "; I; " AT ("; X%; ","; Y%; ")"
290   GOSUB 700 : REM  DRAW A +
300 NEXT I
310 FOR I = 1 TO BN
320   GOSUB 900 : REM FETCH STANDARD NORMAL RANDOM VALUES
330   X% = BM%(1) + BS%(1) * Z0
340   Y% = BM%(2) + BC * BS%(2) * Z0 + SQR(1 - BC ^ 2) * BS%(2) * Z1
350   BX%(I,1) = X%
360   BX%(I,2) = Y%
370   PRINT "POINT IN SET B "; I; " AT ("; X%; ","; Y%; ")"
380   GOSUB 800 : REM  DRAW A BOX
390 NEXT I
400 END

I decided to store the data samples in AX% and BX% as integers to save on memory. Since they are all randomly generated, the extra precision won’t make much of a difference anyway. I’m using 2D arrays where the first axis is the number of samples (e.g., AN) and the second axis is the feature dimension (e.g., 2 for \(x\) and \(y\)).

You can see calling GOSUB 700 and GOSUB 800 to plot the points. Here is the final code for that.

700 REM == DRAW + AT X,Y ==
710 IF X% < 1 OR X% > 270 THEN RETURN
720 IF Y% < 1 OR Y% > 150 THEN RETURN
730 HPLOT X% - 1,Y% TO X% + 1,Y%
740 HPLOT X%,Y% - 1 TO X%,Y% + 1
750 RETURN
800 REM == DRAW BOX AT X,Y ==
810 IF X% < 1 OR X% > 270 THEN RETURN
820 IF Y% < 1 OR Y% > 150 THEN RETURN
830 HPLOT X% - 1,Y% - 1 TO X% + 1,Y% - 1
840 HPLOT X% - 1,Y%: HPLOT X% + 1,Y%
850 HPLOT X% - 1,Y% + 1 TO X% + 1,Y% + 1
860 RETURN

I first verify that the data point is on the visible region of the graph (plotting off the screen will throw an error) and then draw either a “+” or a “□”.

Running the code

Putting it all together, it takes the Apple ][+ about a minute to generate and plot 150 data points. The end result looks like this.

Final plot of synthesized data

Watch this video if you’d like to see it run, in all it’s glory.

Now that this is out of the way, we’ll start by implementing one of the simplest and easiest to understand machine learning algorithms.

Share on

Twitter Facebook LinkedIn

Mark Cramer