Appendix 7
Statistics of the training set and networks performance for June 1999
(Note that these nets were trained only where the note pitch is different from the previous pitch, so there was no need for an output unit representing interval 0. There are also a few minor differences in the input and output representations.)
I compared the average of the network expectations (over the training set) to the average percentage of the original training set, and found that they are almost exactly the same, even though the averages of the individual output units differ.
Actually, in many cases the network expectation values are below 0 or above 1 (but usually very close to 0 or 1). I don't know the reason for that.
All the examples below contain results of 2 different nets, both of them with 2 hidden layers: the first network with 40 and 15 units in its hidden layers, and the second network with 40 and 20.
The note_begin and note_change training set average percentages
(for 2 training sets of 6000 randomly chosen notes each) are:
47.9667 39.4333
47.9667 39.4333
the network average expectations are:
47.9972 39.1036
48.3672 39.3660
same, but ignoring values below 0 and above 1:
47.9983 39.4299
48.3050 39.6656
the percent of network expectations above 0.5:
44.6333 31.8500
45.3667 30.4500
the percent of network expectations above 0 and below 1:
90.0833 92.1000
91.0500 92.9667
the percent of network expectations below 0:
4.9833 6.0333
4.7667 5.1167
the percent of network expectations below -0.1:
1.9000 1.8000
1.5333 1.8500
the percent of network expectations below -0.2:
0.6000 0.4000
0.3000 0.4167
the percent of network expectations below -0.3:
0.1500 0.1000
0.1000 0.1500
the percent of network expectations above 1:
4.9333 1.8667
4.1833 1.9167
the percent of network expectations above 1.1:
1.7500 0.6167
1.7833 0.7000
the percent of network expectations above 1.2:
0.5500 0.2500
0.8000 0.1833
the percent of network expectations above 1.3:
0.1500 0.0667
0.2000 0.0500
Total success for those expectations (a value below 0.5 counts as 0, above 0.5 as 1):
77.8667 78.1167
76.4333 77.0167
failure when result should be 1:
12.7333 14.7333
13.0833 15.9833
failure when result should be 0:
9.4000 7.1500
10.4833 7.0000
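The success/failure breakdown above (threshold at 0.5) can be computed as in this minimal Python sketch (the project's own code is elsewhere; the function name and signature here are illustrative only):

```python
def binary_success_stats(expectations, targets, threshold=0.5):
    """Split predictions into success / failure-when-1 / failure-when-0,
    as percentages of the set (an expectation above the threshold counts
    as 1, at or below it as 0)."""
    n = len(expectations)
    fail1 = sum(1 for e, t in zip(expectations, targets)
                if t == 1 and e <= threshold)
    fail0 = sum(1 for e, t in zip(expectations, targets)
                if t == 0 and e > threshold)
    success = n - fail1 - fail0
    return (100.0 * success / n, 100.0 * fail1 / n, 100.0 * fail0 / n)
```

The three returned percentages correspond to the three rows of each block above, and always sum to 100.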
The interval training set average percentages (for 2 training sets
of 6000 randomly chosen notes each, with the note_change unit on) are:
(-18 to -7)(-6 to -3)(-2 to -1)(+1 to +2)(+3 to +6)(+7 to +18)
3.0000 13.6000 37.6000 27.2000 13.8000 4.6000
2.9000 13.6000 37.6000 27.0000 13.8000 4.8000
the network average expectations are:
3.2000 13.9000 37.2000 27.5000 13.8000 4.6000
3.2000 13.8000 37.5000 26.8000 13.4000 4.6000
same, but ignoring values below 0:
4.4000 14.6000 37.9000 27.9000 14.9000 6.7000
5.4000 15.4000 38.1000 27.4000 14.6000 6.3000
the percent of units with maximum value:
0.0000 6.5000 55.7000 25.4000 9.4000 2.8000
0.2000 4.2000 56.3000 27.3000 8.9000 2.8000
the average percent of units above 0.5 * maximum value:
0.1000 9.7000 46.4000 28.2000 11.8000 3.5000
1.1000 10.5000 45.2000 28.4000 11.1000 3.3000
the average percent of units above 0.4 * maximum value:
0.3000 10.7000 43.8000 28.7000 12.4000 3.8000
1.6000 12.1000 42.8000 27.9000 11.9000 3.5000
the average percent of units above 0.3 * maximum value:
0.8000 12.0000 41.2000 28.7000 12.9000 4.2000
2.2000 13.4000 40.3000 27.3000 12.7000 3.8000
the average percent of units above 0.2 * maximum value:
1.9000 12.9000 38.7000 27.8000 13.5000 4.8000
3.2000 14.1000 38.1000 26.7000 13.2000 4.4000
the average percent of units above 0.1 * maximum value:
3.4000 13.6000 36.5000 26.7000 13.9000 5.8000
4.5000 14.3000 36.2000 25.9000 13.5000 5.2000
the percent of network expectations above 0.5:
0.0000 1.3000 30.5000 16.2000 4.2000 1.4000
0.3000 0.7000 29.0000 15.6000 3.6000 1.0000
the percent of network expectations above 0:
70.8500 88.2000 91.9500 93.7167 83.6833 66.7167
66.1000 84.4000 93.7000 90.6000 82.4000 64.7000
the percent of network expectations above -0.1:
97.1000 97.4000 97.6000 98.7000 96.5000 93.1000
92.6000 94.9000 98.0000 97.6000 96.6000 96.2000
the percent of network expectations above -0.2:
99.5000 99.4000 99.1000 99.6000 99.3000 98.5000
98.5000 98.1000 99.2000 99.6000 98.8000 99.2000
Total success for those expectations (maximum value):
97.0000 86.7000 70.5000 79.5000 88.4000 96.2000
97.0000 86.7000 70.8000 81.1000 87.9000 96.1000
failure when result should be 1:
2.9000 10.2000 5.6000 11.1000 8.0000 2.8000
2.8000 11.3000 5.2000 9.2000 8.4000 2.8000
failure when result should be 0:
0.0000 3.0000 23.8000 9.3000 3.5000 0.9000
0.1000 1.9000 23.8000 9.6000 3.5000 0.9000
Total success in choosing the right unit:
maximum value: 59.2% success, above 0.5: 39.1% success.
maximum value: 60.0% success, above 0.5: 36.8% success.
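"Choosing the right unit by maximum value" is simply an argmax over the output units; a small illustrative sketch (names are mine, not from the project code):

```python
def choose_unit(outputs):
    """Return the index of the output unit with the maximum value."""
    best = 0
    for i in range(1, len(outputs)):
        if outputs[i] > outputs[best]:
            best = i
    return best

def percent_correct(all_outputs, target_indices):
    """Percent of examples where the maximum-value unit is the right one."""
    hits = sum(1 for outs, t in zip(all_outputs, target_indices)
               if choose_unit(outs) == t)
    return 100.0 * hits / len(all_outputs)
```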
The pitch training set average percentages (for 2 training sets
of 6000 randomly chosen notes each, with the note_change unit on) are:
(C ) (C#) (D ) (D#) (E ) (F ) (F#) (G ) (G#) (A ) (A#) (B )
16.3 0.3 15.0 0.3 19.6 9.7 0.6 8.9 1.8 15.1 0.3 11.5
16.2 0.2 15.1 0.3 19.7 9.9 0.7 9.0 1.9 14.9 0.2 11.3
the network average expectations are:
16.2 0.1 15.1 0.4 19.7 9.6 0.5 9.0 1.8 15.3 0.2 11.6
16.2 0.4 14.9 0.2 19.7 10.1 0.9 9.1 1.9 14.9 -0.1 11.3
same, but ignoring values below 0:
17.2 2.3 15.7 2.6 20.9 10.3 2.3 9.4 4.0 16.0 2.4 12.2
17.2 2.4 15.6 2.0 20.5 11.2 2.2 10.0 3.5 16.0 2.5 12.2
the percent of units with maximum value:
18.4 0.0 19.5 0.0 22.2 6.8 0.0 6.1 0.4 14.9 0.0 11.3
17.5 0.0 18.7 0.0 19.2 8.0 0.0 7.3 0.2 14.3 0.0 14.4
the average percent of units above 0.5 * maximum value:
18.5 0.1 16.6 0.3 23.9 6.6 0.0 5.9 1.0 15.7 0.2 10.8
18.6 0.0 15.5 0.1 21.8 8.5 0.0 7.1 0.6 15.9 0.1 11.3
the average percent of units above 0.4 * maximum value:
18.2 0.1 15.8 0.6 22.9 7.5 0.1 6.4 1.5 15.3 0.4 10.6
18.3 0.0 15.4 0.1 21.4 8.8 0.1 7.3 0.9 15.8 0.2 11.1
the average percent of units above 0.3 * maximum value:
17.4 0.4 15.1 0.9 21.7 8.4 0.3 7.0 2.1 14.9 0.7 10.5
17.5 0.1 15.0 0.3 20.7 9.4 0.2 7.7 1.3 15.5 0.6 11.0
the average percent of units above 0.2 * maximum value:
16.4 0.8 14.5 1.4 20.2 8.9 0.8 7.5 2.7 14.6 1.0 10.7
16.4 0.6 14.5 0.7 19.7 9.9 0.4 8.4 2.0 14.9 1.2 10.9
the average percent of units above 0.1 * maximum value:
15.3 1.5 13.9 1.8 18.8 8.9 1.5 8.0 3.2 14.2 1.6 10.6
15.4 1.5 13.8 1.3 18.4 9.8 1.2 8.6 2.7 14.2 1.9 10.7
the percent of network expectations above 0.5:
5.3 0.0 5.2 0.0 12.0 0.7 0.0 0.2 0.1 6.6 0.0 1.1
5.5 0.0 4.7 0.1 11.8 2.3 0.0 1.0 0.1 6.0 0.0 1.0
the percent of network expectations above 0:
81.9 54.3 83.6 51.6 83.0 84.5 54.2 85.9 58.9 87.4 55.8 85.4
82.1 57.4 84.7 52.2 86.1 79.7 61.5 80.4 61.3 84.0 49.0 79.4
the percent of network expectations above -0.1:
97.2 94.2 99.1 95.2 95.7 98.9 96.5 99.4 93.8 97.8 94.2 98.8
97.0 95.5 98.5 97.8 97.3 97.3 98.2 98.0 97.1 96.5 92.8 98.2
the percent of network expectations above -0.2:
99.8 99.8 99.9 99.5 99.5 99.9 99.7 99.9 99.7 99.7 99.4 99.9
99.6 99.5 99.8 99.8 99.5 99.5 99.8 99.8 99.6 99.2 98.9 99.8
Total success for those expectations (maximum value):
83.2 99.6 85.1 99.7 84.0 91.2 99.3 91.1 98.4 86.7 99.7 88.8
84.6 99.7 85.4 99.6 84.7 90.5 99.2 90.0 98.3 86.7 99.7 87.9
failure when result should be 1:
7.3 0.3 5.1 0.2 6.6 5.8 0.6 5.8 1.5 6.7 0.3 5.6
7.0 0.2 5.4 0.3 7.9 5.6 0.7 5.8 1.7 6.9 0.2 4.5
failure when result should be 0:
9.4 0.0 9.7 0.0 9.2 2.9 0.0 3.0 0.0 6.5 0.0 5.4
8.3 0.0 9.0 0.0 7.3 3.8 0.0 4.1 0.0 6.3 0.0 7.5
(Notice that the units for rare pitches never reach the maximum value.)
Total success in choosing the right unit:
maximum value: 53.7% success, above 0.5: 23.5% success.
maximum value: 53.3% success, above 0.5: 24.5% success.
Conclusions:
Not surprisingly, the network average expectations are very close to the training set average, because the bias is trained to be as close to the average as possible. In the first network (2 output units), we take any value below 0.5 to be 0 and any value above 0.5 to be 1 (I don't see any other possibility). But if we do the same for the other nets, or take the maximum value as 1 and all the rest as 0, we lose a lot. The only solution I can think of is to use the expectations randomly, giving higher probability to higher values. We can also use only values above X_max * maximum value (X_max can be any value between 0 and 1; the lower X_max is, the more "surprises" we get), which gives results with percentages similar to the training set percentages. But I'm not sure this solution is good enough.
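The random-selection idea above can be sketched as weighted sampling over the expectations, clipped at 0 and combined with the X_max cutoff. This is only an illustration of the idea in Python (the names are hypothetical, not the project's actual code):

```python
import random

def sample_unit(expectations, x_max=0.0):
    """Pick an output unit at random, with probability proportional to its
    expectation value.  Negative values and values below
    x_max * max(expectations) are excluded, so a lower x_max
    allows more 'surprises'."""
    cutoff = x_max * max(expectations)
    weights = [max(e, 0.0) if e >= cutoff else 0.0 for e in expectations]
    total = sum(weights)
    if total == 0.0:                     # nothing passed: fall back to argmax
        return max(range(len(expectations)), key=expectations.__getitem__)
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if w > 0.0 and r <= acc:
            return i
    return max(range(len(weights)), key=weights.__getitem__)
```

With x_max = 1.0 this degenerates to choosing the maximum unit; with x_max = 0.0 every unit with a positive expectation gets a chance proportional to its value.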
I still want to check whether this choice of interval classes is good. Maybe we should choose different classes, or even separate units for each interval. Also, we may want to join the interval network with the first network, together with the note_change and note_begin units, or choose a different representation for the interval units.
For the time being, I think the biggest problem is getting rid of the differences between the percentages (and average values in the training set) of different units, because it is very difficult for the network to work correctly with such big differences.
I also want to check the network expectations for songs not in the training set, and determine the "quality" of a song according to the network's success. Of course, this "quality" applies only to songs that are similar to the songs in the training set and in the same key (C major/A minor).
We can also decide to get rid of the rare pitches (C#, D#, F# and A# together appear in 1.13% of the notes, and G# appears in 1.84%, for a total of 2.97%), simplifying the network a lot. If we do that, we will have only 7 or 8 pitch classes instead of 12, and we can reduce the number of input and output units. But songs with rare pitches are usually more interesting, and we could lose that.
I also want to add another feature to the user interface: letting the user select which pitches to use and which not to use. For example, the user could choose to use only the 7 common pitches. I think I will not give this information to the network, but use it to select the pitches from the network's output. So we will have an array of 12 values between 0 and 1, and we will multiply the pitch output units by these values.
Alternatively, we can have an array X_max[0..11] (X_max should be between 0 and 1 for used pitches, or above 1 for unused pitches), and use only the pitch output unit values above X_max times the maximum value. I think this is the best solution to the problem of differences between percentages: by giving different X_max values to different pitches, we can give more chances to the rare pitches (or fewer chances, if we don't want rare pitches). Maybe we can do the same for the intervals too.
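Both variants (the 0..1 user mask and the per-pitch X_max cutoffs) can be sketched together in one function; again, this is only an illustration in Python, not the actual implementation:

```python
def select_pitches(pitch_outputs, mask, x_max):
    """Filter the pitch output units before choosing a pitch.
    pitch_outputs: the pitch output unit values (12 in the project);
    mask: per-pitch factors in [0, 1], where 0 disables a pitch;
    x_max: per-pitch cutoffs -- a unit is kept only if its masked value
    exceeds x_max[i] times the maximum masked value (so x_max > 1
    effectively disables that pitch)."""
    masked = [o * m for o, m in zip(pitch_outputs, mask)]
    peak = max(masked)
    return [v if v > x_max[i] * peak else 0.0 for i, v in enumerate(masked)]
```

The surviving (non-zero) values can then be fed to whatever selection rule is used afterwards (maximum value, or weighted random sampling).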
I found an algorithm to calculate the optimal X_max array; see the attached "create_X_max.m" file for the implementation.