Appendix 7
Statistics of the training set and networks performance for June 1999
(Note that these nets were trained only where the note pitch is different from the previous pitch, so there was no need for an output unit representing interval 0. There are also a few minor differences in the input and output representations.)
I compared the average of the network expectations (over the training set) to the average percentage of the original training set, and found that they are almost exactly the same, even though the averages of the individual output units differ.
Actually, in many cases the network expectation values are below 0 or above 1 (but usually very close to 0 or 1). I don't know the reason for that.
All the examples below contain results of 2 different nets, both of them with 2 hidden layers: the first network with 40 and 15 units in its hidden layers, and the second network with 40 and 20.
The note_begin and note_change training set average percentages
(for 2 training sets of 6000 randomly chosen notes each) are:
47.9667 39.4333
47.9667 39.4333
the network average expectations are:
47.9972 39.1036
48.3672 39.3660
same, but ignoring values below 0 and above 1:
47.9983 39.4299
48.3050 39.6656
the percent of network expectations above 0.5:
44.6333 31.8500
45.3667 30.4500
the percent of network expectations above 0 and below 1:
90.0833 92.1000
91.0500 92.9667
the percent of network expectations below 0:
4.9833 6.0333
4.7667 5.1167
the percent of network expectations below -0.1:
1.9000 1.8000
1.5333 1.8500
the percent of network expectations below -0.2:
0.6000 0.4000
0.3000 0.4167
the percent of network expectations below -0.3:
0.1500 0.1000
0.1000 0.1500
the percent of network expectations above 1:
4.9333 1.8667
4.1833 1.9167
the percent of network expectations above 1.1:
1.7500 0.6167
1.7833 0.7000
the percent of network expectations above 1.2:
0.5500 0.2500
0.8000 0.1833
the percent of network expectations above 1.3:
0.1500 0.0667
0.2000 0.0500
Total success for those expectations (a value below 0.5 counts as 0, above 0.5 as 1):
77.8667 78.1167
76.4333 77.0167
failure when result should be 1:
12.7333 14.7333
13.0833 15.9833
failure when result should be 0:
9.4000 7.1500
10.4833 7.0000
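The success/failure breakdown above (threshold at 0.5) can be computed as in this minimal Python sketch (the project's own code is elsewhere; the function name and signature here are illustrative only):

```python
def binary_success_stats(expectations, targets, threshold=0.5):
    """Split predictions into success / failure-when-1 / failure-when-0,
    as percentages of the set (an expectation above the threshold counts
    as 1, at or below it as 0)."""
    n = len(expectations)
    fail1 = sum(1 for e, t in zip(expectations, targets)
                if t == 1 and e <= threshold)
    fail0 = sum(1 for e, t in zip(expectations, targets)
                if t == 0 and e > threshold)
    success = n - fail1 - fail0
    return (100.0 * success / n, 100.0 * fail1 / n, 100.0 * fail0 / n)
```

The three returned percentages correspond to the three rows of each block above, and always sum to 100.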
The interval training set average percentages (for 2 training sets
of 6000 randomly chosen notes each, with the note_change unit on) are:
(-18 to -7)(-6 to -3)(-2 to -1)(+1 to +2)(+3 to +6)(+7 to +18)
3.0000 13.6000 37.6000 27.2000 13.8000 4.6000
2.9000 13.6000 37.6000 27.0000 13.8000 4.8000
the network average expectations are:
3.2000 13.9000 37.2000 27.5000 13.8000 4.6000
3.2000 13.8000 37.5000 26.8000 13.4000 4.6000
same, but ignoring values below 0:
4.4000 14.6000 37.9000 27.9000 14.9000 6.7000
5.4000 15.4000 38.1000 27.4000 14.6000 6.3000
the percent of units with maximum value:
0.0000 6.5000 55.7000 25.4000 9.4000 2.8000
0.2000 4.2000 56.3000 27.3000 8.9000 2.8000
the average percent of units above 0.5 * maximum value:
0.1000 9.7000 46.4000 28.2000 11.8000 3.5000
1.1000 10.5000 45.2000 28.4000 11.1000 3.3000
the average percent of units above 0.4 * maximum value:
0.3000 10.7000 43.8000 28.7000 12.4000 3.8000
1.6000 12.1000 42.8000 27.9000 11.9000 3.5000
the average percent of units above 0.3 * maximum value:
0.8000 12.0000 41.2000 28.7000 12.9000 4.2000
2.2000 13.4000 40.3000 27.3000 12.7000 3.8000
the average percent of units above 0.2 * maximum value:
1.9000 12.9000 38.7000 27.8000 13.5000 4.8000
3.2000 14.1000 38.1000 26.7000 13.2000 4.4000
the average percent of units above 0.1 * maximum value:
3.4000 13.6000 36.5000 26.7000 13.9000 5.8000
4.5000 14.3000 36.2000 25.9000 13.5000 5.2000
the percent of network expectations above 0.5:
0.0000 1.3000 30.5000 16.2000 4.2000 1.4000
0.3000 0.7000 29.0000 15.6000 3.6000 1.0000
the percent of network expectations above 0:
70.8500 88.2000 91.9500 93.7167 83.6833 66.7167
66.1000 84.4000 93.7000 90.6000 82.4000 64.7000
the percent of network expectations above -0.1:
97.1000 97.4000 97.6000 98.7000 96.5000 93.1000
92.6000 94.9000 98.0000 97.6000 96.6000 96.2000
the percent of network expectations above -0.2:
99.5000 99.4000 99.1000 99.6000 99.3000 98.5000
98.5000 98.1000 99.2000 99.6000 98.8000 99.2000
Total success for those expectations (maximum value):
97.0000 86.7000 70.5000 79.5000 88.4000 96.2000
97.0000 86.7000 70.8000 81.1000 87.9000 96.1000
failure when result should be 1:
2.9000 10.2000 5.6000 11.1000 8.0000 2.8000
2.8000 11.3000 5.2000 9.2000 8.4000 2.8000
failure when result should be 0:
0.0000 3.0000 23.8000 9.3000 3.5000 0.9000
0.1000 1.9000 23.8000 9.6000 3.5000 0.9000
Total success in choosing the right unit:
maximum value: 59.2% success, above 0.5: 39.1% success.
maximum value: 60.0% success, above 0.5: 36.8% success.
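"Choosing the right unit by maximum value" is simply an argmax over the output units; a small illustrative sketch (names are mine, not from the project code):

```python
def choose_unit(outputs):
    """Return the index of the output unit with the maximum value."""
    best = 0
    for i in range(1, len(outputs)):
        if outputs[i] > outputs[best]:
            best = i
    return best

def percent_correct(all_outputs, target_indices):
    """Percent of examples where the maximum-value unit is the right one."""
    hits = sum(1 for outs, t in zip(all_outputs, target_indices)
               if choose_unit(outs) == t)
    return 100.0 * hits / len(all_outputs)
```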
The pitch training set average percentages (for 2 training sets
of 6000 randomly chosen notes each, with the note_change unit on) are:
(C ) (C#) (D ) (D#) (E ) (F ) (F#) (G ) (G#) (A ) (A#) (B )
16.3 0.3 15.0 0.3 19.6 9.7 0.6 8.9 1.8 15.1 0.3 11.5
16.2 0.2 15.1 0.3 19.7 9.9 0.7 9.0 1.9 14.9 0.2 11.3
the network average expectations are:
16.2 0.1 15.1 0.4 19.7 9.6 0.5 9.0 1.8 15.3 0.2 11.6
16.2 0.4 14.9 0.2 19.7 10.1 0.9 9.1 1.9 14.9 -0.1 11.3
same, but ignoring values below 0:
17.2 2.3 15.7 2.6 20.9 10.3 2.3 9.4 4.0 16.0 2.4 12.2
17.2 2.4 15.6 2.0 20.5 11.2 2.2 10.0 3.5 16.0 2.5 12.2
the percent of units with maximum value:
18.4 0.0 19.5 0.0 22.2 6.8 0.0 6.1 0.4 14.9 0.0 11.3
17.5 0.0 18.7 0.0 19.2 8.0 0.0 7.3 0.2 14.3 0.0 14.4
the average percent of units above 0.5 * maximum value:
18.5 0.1 16.6 0.3 23.9 6.6 0.0 5.9 1.0 15.7 0.2 10.8
18.6 0.0 15.5 0.1 21.8 8.5 0.0 7.1 0.6 15.9 0.1 11.3
the average percent of units above 0.4 * maximum value:
18.2 0.1 15.8 0.6 22.9 7.5 0.1 6.4 1.5 15.3 0.4 10.6
18.3 0.0 15.4 0.1 21.4 8.8 0.1 7.3 0.9 15.8 0.2 11.1
the average percent of units above 0.3 * maximum value:
17.4 0.4 15.1 0.9 21.7 8.4 0.3 7.0 2.1 14.9 0.7 10.5
17.5 0.1 15.0 0.3 20.7 9.4 0.2 7.7 1.3 15.5 0.6 11.0
the average percent of units above 0.2 * maximum value:
16.4 0.8 14.5 1.4 20.2 8.9 0.8 7.5 2.7 14.6 1.0 10.7
16.4 0.6 14.5 0.7 19.7 9.9 0.4 8.4 2.0 14.9 1.2 10.9
the average percent of units above 0.1 * maximum value:
15.3 1.5 13.9 1.8 18.8 8.9 1.5 8.0 3.2 14.2 1.6 10.6
15.4 1.5 13.8 1.3 18.4 9.8 1.2 8.6 2.7 14.2 1.9 10.7
the percent of network expectations above 0.5:
5.3 0.0 5.2 0.0 12.0 0.7 0.0 0.2 0.1 6.6 0.0 1.1
5.5 0.0 4.7 0.1 11.8 2.3 0.0 1.0 0.1 6.0 0.0 1.0
the percent of network expectations above 0:
81.9 54.3 83.6 51.6 83.0 84.5 54.2 85.9 58.9 87.4 55.8 85.4
82.1 57.4 84.7 52.2 86.1 79.7 61.5 80.4 61.3 84.0 49.0 79.4
the percent of network expectations above -0.1:
97.2 94.2 99.1 95.2 95.7 98.9 96.5 99.4 93.8 97.8 94.2 98.8
97.0 95.5 98.5 97.8 97.3 97.3 98.2 98.0 97.1 96.5 92.8 98.2
the percent of network expectations above -0.2:
99.8 99.8 99.9 99.5 99.5 99.9 99.7 99.9 99.7 99.7 99.4 99.9
99.6 99.5 99.8 99.8 99.5 99.5 99.8 99.8 99.6 99.2 98.9 99.8
Total success for those expectations (maximum value):
83.2 99.6 85.1 99.7 84.0 91.2 99.3 91.1 98.4 86.7 99.7 88.8
84.6 99.7 85.4 99.6 84.7 90.5 99.2 90.0 98.3 86.7 99.7 87.9
failure when result should be 1:
7.3 0.3 5.1 0.2 6.6 5.8 0.6 5.8 1.5 6.7 0.3 5.6
7.0 0.2 5.4 0.3 7.9 5.6 0.7 5.8 1.7 6.9 0.2 4.5
failure when result should be 0:
9.4 0.0 9.7 0.0 9.2 2.9 0.0 3.0 0.0 6.5 0.0 5.4
8.3 0.0 9.0 0.0 7.3 3.8 0.0 4.1 0.0 6.3 0.0 7.5
(Notice that the units for rare pitches never reach the maximum value.)
Total success in choosing the right unit:
maximum value: 53.7% success, above 0.5: 23.5% success.
maximum value: 53.3% success, above 0.5: 24.5% success.
Conclusions:
Not surprisingly, the network average expectations are very close to the training set average, because the bias is trained to be as close to the average as possible. In the first network (2 output units), we take any value below 0.5 to be 0 and any value above 0.5 to be 1 (I don't see any other possibility). But if we do the same for the other nets, or take the maximum value as 1 and all the rest as 0, we lose a lot. The only solution I can think of is to use the expectations randomly, giving higher probability to higher values. We can also use only values above X_max * maximum value (X_max can be any value between 0 and 1; the lower X_max is, the more "surprises" we get), which gives results with percentages similar to the training set percentages. But I'm not sure this solution is good enough.
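The random-selection idea above can be sketched as weighted sampling over the expectations, clipped at 0 and combined with the X_max cutoff. This is only an illustration of the idea in Python (the names are hypothetical, not the project's actual code):

```python
import random

def sample_unit(expectations, x_max=0.0):
    """Pick an output unit at random, with probability proportional to its
    expectation value.  Negative values and values below
    x_max * max(expectations) are excluded, so a lower x_max
    allows more 'surprises'."""
    cutoff = x_max * max(expectations)
    weights = [max(e, 0.0) if e >= cutoff else 0.0 for e in expectations]
    total = sum(weights)
    if total == 0.0:                     # nothing passed: fall back to argmax
        return max(range(len(expectations)), key=expectations.__getitem__)
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if w > 0.0 and r <= acc:
            return i
    return max(range(len(weights)), key=weights.__getitem__)
```

With x_max = 1.0 this degenerates to choosing the maximum unit; with x_max = 0.0 every unit with a positive expectation gets a chance proportional to its value.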
I still want to check whether this choice of interval classes is good. Maybe we should choose different classes, or even separate units for each interval. Also, we may want to join the interval network with the first network, together with the note_change and note_begin units, or choose a different representation for the interval units.
For the time being, I think the biggest problem is getting rid of the differences between the percentages (and average values in the training set) of different units, because it is very difficult for the network to work correctly with such big differences.
I also want to check the network expectations for songs not in the training set, and determine the "quality" of a song according to the network's success. Of course, this "quality" applies only to songs that are similar to the songs in the training set and in the same key (C major/A minor).
We can also decide to get rid of the rare pitches (C#, D#, F# and A# together appear in 1.13% of the notes, and G# appears in 1.84%, for a total of 2.97%), simplifying the network a lot. If we do that, we will have only 7 or 8 pitch classes instead of 12, and we can reduce the number of input and output units. But songs with rare pitches are usually more interesting, and we could lose that.
I also want to add another feature to the user interface: letting the user select which pitches to use and which not to use. For example, the user could choose to use only the 7 common pitches. I think I will not give this information to the network, but use it to select the pitches from the network's output. So we will have an array of 12 values between 0 and 1, and we will multiply the pitch output units by these values.
Alternatively, we can have an array X_max[0..11] (X_max should be between 0 and 1 for used pitches, or above 1 for unused pitches), and use only the pitch output unit values above X_max times the maximum value. I think this is the best solution to the problem of differences between percentages: by giving different X_max values to different pitches, we can give more chances to the rare pitches (or fewer chances, if we don't want rare pitches). Maybe we can do the same for the intervals too.
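Both variants (the 0..1 user mask and the per-pitch X_max cutoffs) can be sketched together in one function; again, this is only an illustration in Python, not the actual implementation:

```python
def select_pitches(pitch_outputs, mask, x_max):
    """Filter the pitch output units before choosing a pitch.
    pitch_outputs: the pitch output unit values (12 in the project);
    mask: per-pitch factors in [0, 1], where 0 disables a pitch;
    x_max: per-pitch cutoffs -- a unit is kept only if its masked value
    exceeds x_max[i] times the maximum masked value (so x_max > 1
    effectively disables that pitch)."""
    masked = [o * m for o, m in zip(pitch_outputs, mask)]
    peak = max(masked)
    return [v if v > x_max[i] * peak else 0.0 for i, v in enumerate(masked)]
```

The surviving (non-zero) values can then be fed to whatever selection rule is used afterwards (maximum value, or weighted random sampling).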
I found an algorithm to calculate the optimal X_max array; see the attached "create_X_max.m" file for the implementation.