TOKYO OLYMPIC TRIATHLON RACE ANALYSIS – WOMEN'S INDIVIDUAL
DID THE STARTING POSITION REALLY SHAPE THE RACE?
On social media after the women’s individual triathlon in Tokyo, we read several claims that the race was heavily conditioned by the starting position on the pontoon. The thesis was backed by an analysis, essentially a heat map built by comparing each athlete's time in the Olympic race with her average gap in the first leg of the two previous World Triathlon Series races. That analysis is strongly biased and methodologically weak for several reasons, and by itself it does not prove the thesis. Let’s see why.
Starting order of the women's Olympic race. Positions were chosen in rank order, from the 1st to the last qualified athlete.
CONTEXT AND ENVIRONMENT
First of all, it should be remembered that triathlon is a sport that requires a high degree of adaptation to contingent situations, not only in terms of environmental variables, which athletes must be able to read and manage, but also in terms of energy management over the distance, which from race to race, even with the same format, often varies by several hundred meters. For the swim leg in particular, it is wrong to think of it as a mere transposition of pool dynamics into open water: the course can never be replicated, group swimming has its own peculiarities, with turns at the buoys and, more and more often, an Australian exit that is not necessarily at the midpoint of the leg (in Tokyo the two swim splits were 950 and 550 meters respectively), and wetsuits may or may not be used.
It would therefore be naïve to claim that the conditions of the competition field are the same for everyone at all times, and all the more naïve to compare the athletes' performance (i.e. what they expressed in the race from a physiological and technical point of view) with the results (i.e. the outcome of one’s own performance and that of others, in the contingent environmental context).
Nonetheless, a simple but rational analysis lets us assess whether the start had a significant impact on the Olympic race in question or, more precisely, whether such an impact is probable or unlikely, even with all the limits imposed by the scarcity of available data.
The start of the WTS race in Leeds, where the swim leg was raced with wetsuits, unlike in Tokyo
WHY USING PREVIOUS RACES IS NOT A GOOD IDEA
One of the golden rules of statistics is to compare like with like (the famous “apples with apples and pears with pears”). If things are not the same you can try to normalize them, but often this cannot be done, especially when little data is available. The analysis used to support the thesis that the pontoon position conditioned the race is partial and not statistically solid, and it does not allow conclusions to be drawn, because it compares three events that differ in course conditions, start list and, reasonably, the athletes' approach to the race. It also applies a data-processing step (averaging the gaps in the two WTS races) that further reduces the information available.
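To make the last point concrete, here is a minimal Python sketch of how averaging hides information. The two athletes and their gaps are invented for illustration and are not taken from any WTS race.

```python
from statistics import mean

# Hypothetical first-leg gaps (seconds behind the leader) in two earlier races.
# These numbers are invented for illustration only.
gaps = {
    "athlete_A": (10.0, 30.0),  # two very different races
    "athlete_B": (20.0, 20.0),  # two identical races
}

# Averaging collapses each pair of gaps into a single number.
averaged = {name: mean(values) for name, values in gaps.items()}

# The race-to-race variation is exactly what the average throws away.
spread = {name: max(values) - min(values) for name, values in gaps.items()}

print(averaged)
print(spread)
```

Both hypothetical athletes end up with the same average even though their two races tell very different stories, which is why we prefer not to build conclusions on averaged gaps.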
APPLES WITH APPLES
Instead of drawing on data arbitrarily, we chose to use only what we know from the Olympic race, so that the data compared are certainly homogeneous: Olympic qualification rank (a robust index of athlete quality and an input to our model), starting position on the pontoon (the variable we want to check, another input to the model), and position and gaps at the two splits and over the whole swim leg (which also contain the effect of the variable we want to check). Sorting by starting position and building the very useful heat map (Figure 3), we can already see five things:
1) The athletes with the best rank chose the right side of the pontoon (lower numbers)
2) The last 5 athletes (left end of the pontoon) performed much better than the athletes in the center, and similarly to those positioned in the “offending” section on the right
3) It seems to be very difficult to separate the effect of the Olympic ranking from the effect of the pontoon
4) “At the bottom” of the pontoon Jeffcoat, Kingma and Barthelemy manage to enter the top 18
5) Lopes and Perriault, who started in the middle of the group, are 2nd and 12th
Heat map of starting position, split 1, split 2 and final positions in the swim leg in Tokyo
To simplify further, we built a matrix that counts how many athletes from each of the three “input” blocks at the start (18 right, 18 center, 18 left) fall into each of the three “output” blocks at the two splits and over the whole swim leg. For each group we then calculated the median rank.
In the first split (which should be the one most affected by the starting position), 13 out of 18 athletes retained their starting “group”, 5 out of 18 in the middle group, while 3 athletes from the “left” group had already entered the top 18 by the end of the first split.
Finally, we reported the median position at the first split for each starting group.
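The counting matrix and the group medians described above can be sketched in a few lines of Python. This is only an illustration, not the workflow we ran in CAT: the six athletes, their values and the block size of 2 (instead of 18) are invented to keep the example small, and the names `block_index`, `START_LABELS` and `OUTPUT_LABELS` are our own.

```python
from collections import Counter
from statistics import median

BLOCK_SIZE = 2  # the article uses blocks of 18; 2 keeps the toy data small

# (athlete, Olympic rank, start position, rank at split 1): invented toy values
athletes = [
    ("A", 1, 1, 1),
    ("B", 2, 2, 4),
    ("C", 3, 3, 2),
    ("D", 5, 4, 5),
    ("E", 4, 5, 3),
    ("F", 6, 6, 6),
]

START_LABELS = ("right", "center", "left")   # low start numbers are on the right
OUTPUT_LABELS = ("top", "middle", "bottom")  # blocks of the split ranking


def block_index(position, block_size=BLOCK_SIZE):
    """Return 0, 1 or 2 for the first, second or third block of positions."""
    return (position - 1) // block_size


# Count athletes for each (start block, output block) pair.
matrix = Counter()
for _, _, start_pos, split_rank in athletes:
    key = (START_LABELS[block_index(start_pos)],
           OUTPUT_LABELS[block_index(split_rank)])
    matrix[key] += 1

# Median Olympic rank and median split-1 rank for each start block.
medians = {}
for i, label in enumerate(START_LABELS):
    group = [a for a in athletes if block_index(a[2]) == i]
    medians[label] = (median(a[1] for a in group),
                      median(a[3] for a in group))

for start in START_LABELS:
    print(start, [matrix[(start, out)] for out in OUTPUT_LABELS], medians[start])
```

Each printed row corresponds to one starting group: the three counts show where its athletes ended up, and the pair of medians lets the quality index be read next to the split result.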
In this summary table (Figure 4), the difference in the athlete quality index along the pontoon is evident, but so is the fact that athletes who started from the center and the left managed to move ahead, and that, at least on average, there are no striking differences between the first split (different starting points) and the second split (same starting point).
Summary of the field values and of the splits for the three macro groups: right, center and left of the pontoon
A DIFFERENT PERSPECTIVE
When the heat map is ordered by position at the end of the first swim split, the theoretical outliers with respect to the starting position emerge even more clearly: Jeffcoat, Ackermann, Kingma, Thorpe and Barthelemy, but also the opposite outliers (athletes who started on the right and finished outside the top 18).
Comparison of starting position, ranking at the two swim splits and overall order of the swim leg in Tokyo
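The reordering and the search for outliers in both directions can be sketched as follows. Again this is a hedged Python illustration on invented data: the athlete labels, positions and the block size of 2 (18 in the article) are assumptions, and `block_index` is a hypothetical helper.

```python
BLOCK_SIZE = 2  # the article uses 18; 2 keeps the toy data small

# (athlete, start position, rank at split 1): invented toy values
athletes = [
    ("A", 1, 2),
    ("B", 2, 5),
    ("C", 3, 1),
    ("D", 4, 4),
    ("E", 5, 6),
    ("F", 6, 3),
]


def block_index(position, block_size=BLOCK_SIZE):
    """Return 0 for the first block of positions, 1 for the second, and so on."""
    return (position - 1) // block_size


# Reorder the table by rank at the end of the first split.
by_split = sorted(athletes, key=lambda a: a[2])
order = [name for name, _, _ in by_split]

# Outliers moving up: started outside the first block, finished inside it.
moved_up = [name for name, start, rank in by_split
            if block_index(start) > 0 and block_index(rank) == 0]

# Opposite outliers: started in the first block, finished outside it.
moved_down = [name for name, start, rank in by_split
              if block_index(start) == 0 and block_index(rank) > 0]

print(order)
print(moved_up)
print(moved_down)
```

Listing both directions side by side is what keeps the reading balanced: a few athletes gaining from the far end of the pontoon matter as much as a few losing from the supposedly favorable end.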
A COMPLEX ANSWER
A qualitative analysis that respects the context and the data therefore leads us to at least two conclusions:
1) Those who started in the group of 18 athletes to the right of the pontoon did not enjoy any special advantages
2) It is difficult to separate the effect of the Olympic rank from the effect of the starting position
However, moving from qualitative to quantitative analysis with Principal Component Analysis (PCA), we can clarify the contribution of each variable to the phenomenon examined. On Principal Component 1 the contributions, in descending order of importance, are: the gap at the second split and the second split (directly related to the total gap), substantially equal (weight 0.49), followed by Olympic ranking (weight 0.40) and, last, starting position (0.31), the latter two obviously inversely correlated with the total gap.
Therefore we cannot exclude some influence of the starting position on the result of the swim leg, but we can say with confidence that this influence was residual.
Loadings on Component 1
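For readers who want to see how loadings on the first component are obtained, here is a minimal sketch in plain Python. It is not the CAT implementation and does not reproduce our numbers: the three variables and their values for six hypothetical athletes are invented, autoscaling of the columns is an assumption about preprocessing, and the first component is found with a simple power iteration.

```python
import math

# Invented toy values for six hypothetical athletes (not the Tokyo data).
variables = {
    "olympic_rank": [1, 2, 3, 4, 5, 6],
    "start_position": [2, 1, 3, 5, 4, 6],
    "swim_gap_s": [0, 5, 4, 12, 9, 20],
}


def autoscale(column):
    """Centre a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def correlation_matrix(columns):
    """Correlation matrix of columns that are already autoscaled."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]


def first_component(matrix, iterations=500):
    """Leading eigenvector (unit length) and eigenvalue by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        vi * sum(m * x for m, x in zip(row, v)) for vi, row in zip(v, matrix)
    )
    return v, eigenvalue


scaled = [autoscale(col) for col in variables.values()]
corr = correlation_matrix(scaled)
loadings, eigenvalue = first_component(corr)
explained = eigenvalue / len(variables)  # share of total variance on PC1

for name, weight in zip(variables, loadings):
    print(f"{name}: {weight:.2f}")
print(f"variance explained by PC1: {explained:.0%}")
```

The loadings form a unit-length vector, so a larger absolute value means a larger contribution of that variable to the component. This is the sense in which the weights quoted above should be read.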
*Calculations and plots were produced with the CAT software, Chemometric Agile Tool (R-based)
Race times and rankings: https://olympics.com/tokyo-2020/olympic-games/en/results/triathlon/results-women-s-individual-fnl-000100-.htm
Kim H. Esbensen, Dominique Guyot, Frank Westad, Lars P. Houmoller, Multivariate Data Analysis: In Practice: An Introduction to Multivariate Data Analysis and Experimental Design, Multivariate Data Analysis, 2002. ISBN 8299333032, 9788299333030
R. Leardi, C. Melzi, G. Polotti, CAT (Chemometric Agile Tool), freely downloadable at http://gruppochemiometria.it/index.php/software