Suppose I’m running a race. I have a fixed amount of energy to “spend” on running fast. But the energy cost of running fast varies during the race (e.g., it’s high on hills and low on flats). How should I pace myself to minimize my race time?

This post discusses my optimal pacing problem. I describe it mathematically, derive its solution in simple and general settings, and analyze these solutions’ properties. I assume energy costs are deterministic, whereas my next post allows them to be random.

The optimal pacing problem

My race consists of $N$ “laps” $n \in \{1, 2, \ldots, N\}$ with equal lengths. I start with $k_1 = k > 0$ units of energy and finish the race with none. Running lap $n$ at speed $s_n$ costs $c_n s_n$ units of energy, where $c_n > 0$ varies with $n$.

My goal is to find the speed sequence $(s_n)_{n=1}^N$ that minimizes my total time¹ $T \equiv \sum_{n=1}^N \frac{1}{s_n}$ subject to the dynamic energy constraint $k_{n+1} = k_n - c_n s_n$, boundary conditions $k_1 = k$ and $k_{N+1} = 0$, and non-negativity constraints $s_n \ge 0$.
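
To make the setup concrete, here is a minimal numerical sketch that hands the problem to a generic constrained optimizer. The number of laps, the costs, and the initial energy are illustrative values chosen for the sketch.

```python
# Numerical sketch of the pacing problem. The costs c and energy k are
# illustrative values chosen for this sketch.
import numpy as np
from scipy.optimize import minimize

c = np.array([4.0, 1.0, 2.0, 1.0])  # energy cost per unit of speed on each lap
k = 10.0                            # initial energy

def total_time(s):
    return np.sum(1.0 / s)

# Finish with no energy left over: sum_n c_n * s_n = k.
energy_constraint = {"type": "eq", "fun": lambda s: c @ s - k}

result = minimize(
    total_time,
    x0=np.full(c.size, k / c.sum()),  # constant-speed initial guess
    constraints=[energy_constraint],
    bounds=[(1e-6, None)] * c.size,   # keep speeds strictly positive
    method="SLSQP",
)
print(result.x)              # numerically optimal speeds
print(total_time(result.x))  # numerically optimal total time
```

The closed-form solution derived below should reproduce these numbers up to numerical tolerance.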

Solving the two-lap case

We can build intuition by solving the case with $N = 2$. Then the dynamic constraint and boundary conditions imply $$T = \frac{c_1}{k - k_2} + \frac{c_2}{k_2},$$ where $k_2$ is the energy I choose to leave for the second lap. It satisfies the first-order condition $\partial T/\partial k_2 = 0$, which we can write as $$\frac{c_1}{(k - k_2)^2} = \frac{c_2}{k_2^2}.$$ The left-hand side is the marginal cost (in units of total time) of using less energy in the first lap. The right-hand side is the marginal benefit of using more energy in the second lap. The first-order condition balances this marginal cost and benefit. It determines how I should smooth my energy consumption across laps.

Rearranging the first-order condition for $k_2$ gives $$k_2 = \frac{\sqrt{c_2}}{\sqrt{c_1} + \sqrt{c_2}}\, k.$$ So I should spend my energy proportionally to the square roots of the costs I face. For example, if $c_1 = 4c_2$ then I should spend two thirds of my energy on the first lap and one third on the second. This leads me to run twice as fast on the second lap and makes my total time equal $9c_2/k$. In contrast, if I spent energy proportionally to costs then I would spend four fifths on the first lap and one fifth on the second. I would run at a constant speed and my total time would equal $10c_2/k$. That strategy would be optimal if the costs were constant at their mean $5c_2/2$. But they aren't constant: they vary by a factor of four. Square-root scaling takes advantage of this variation. It makes me run slow when it's expensive to run fast.
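
Here's a quick check of this arithmetic, normalizing $c_2 = 1$ and $k = 1$ (so $c_1 = 4$):

```python
# Check the two-lap example with c1 = 4*c2, normalizing c2 = 1 and k = 1.
import numpy as np

c1, c2, k = 4.0, 1.0, 1.0

# Optimal split: spend energy in proportion to sqrt(c1) : sqrt(c2) = 2 : 1.
k2 = np.sqrt(c2) / (np.sqrt(c1) + np.sqrt(c2)) * k  # energy left for lap 2
s1, s2 = (k - k2) / c1, k2 / c2
print(k - k2, k2)        # 2/3 of energy on lap 1, 1/3 on lap 2
print(s2 / s1)           # 2.0: twice as fast on the (cheap) second lap
print(1 / s1 + 1 / s2)   # 9.0 = 9 * c2 / k

# Spending proportionally to costs gives constant speed and a longer time.
e1, e2 = 4 * k / 5, k / 5
print(c1 / e1, c2 / e2)  # equal lap times over equal-length laps: constant speed
print(c1 / e1 + c2 / e2) # 10.0 = 10 * c2 / k
```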

Solving the general case

The results and intuitions from the case with $N = 2$ generalize to cases with $N > 2$. But those cases require more powerful solution methods. I explain two: using the Hamiltonian and using the Bellman equation. The first is faster, but the second is more intuitive and extends naturally to a setting with random costs.

Using the Hamiltonian

The Hamiltonian for my optimal pacing problem is $$H \equiv \frac{1}{s_n} - \lambda_{n+1} c_n s_n,$$ where $\lambda_{n+1}$ is a costate that satisfies $\lambda_{n+1} - \lambda_n = -\partial H/\partial k_n$ for each $n$. But $\partial H/\partial k_n = 0$ and so $\lambda_{n+1} = \lambda$ is constant. Substituting it into the first-order condition $\partial H/\partial s_n = 0$ gives $s_n = 1/\sqrt{-\lambda c_n}$. Now the dynamic constraint and boundary conditions imply $\sum_{n=1}^N c_n s_n = k$, from which it follows that $\sqrt{-\lambda} = \frac{1}{k} \sum_{n=1}^N \sqrt{c_n}$ and therefore $$s_n = \frac{k}{\sqrt{c_n} \sum_{m=1}^N \sqrt{c_m}}$$ for each $n$. Then my total time equals $$T = \frac{1}{k} \left( \sum_{n=1}^N \sqrt{c_n} \right)^2.$$ For example, letting $N = 2$ and $c_1 = 4c_2$ yields the optimal time $T = 9c_2/k$ described above.
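
These closed forms are easy to code up. A small sketch, reusing the normalization $c_1 = 4$, $c_2 = 1$, and $k = 1$ from the two-lap example (the function names are my own):

```python
# Closed-form optimal speeds and total time from the Hamiltonian approach.
import numpy as np

def optimal_speeds(c, k):
    """s_n = k / (sqrt(c_n) * sum_m sqrt(c_m))."""
    root_c = np.sqrt(c)
    return k / (root_c * root_c.sum())

def optimal_time(c, k):
    """T = (sum_n sqrt(c_n))**2 / k."""
    return np.sqrt(c).sum() ** 2 / k

c = np.array([4.0, 1.0])  # the two-lap example with c1 = 4*c2 and c2 = 1
k = 1.0
print(optimal_speeds(c, k))  # [1/6, 1/3]: the second lap is run twice as fast
print(optimal_time(c, k))    # 9.0 = 9 * c2 / k, as above
```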

Using the Bellman equation

The dynamic constraint implies $s_n = \frac{k_n - k_{n+1}}{c_n}$ for each $n$. Consequently, the cost sequence $(c_n)_{n=1}^N$ and “remaining energy” sequence $(k_{n+1})_{n=0}^N$ uniquely determine the speed sequence $(s_n)_{n=1}^N$. So if $V_n \equiv \sum_{m=n}^N \frac{1}{s_m}$ denotes the time spent running laps $n$ through $N$ when I pace myself optimally, then $V_n$ must satisfy the Bellman equation $$V_n = \min_{k_{n+1}} \left\{ \frac{c_n}{k_n - k_{n+1}} + V_{n+1} \right\}.$$ This equation echoes my objective in the two-lap case. Intuitively, my optimal speeds in the $N$-lap case solve a sequence of two-lap problems, where the second “lap” is the remainder of my race.
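
One way to check this equation numerically is by brute force: discretize the remaining-energy levels and compute $V_n$ backwards from the terminal condition. A rough sketch, with the same illustrative costs as in the first code block:

```python
# Grid-based backward induction on the Bellman equation
# V_n = min over k_{n+1} of c_n / (k_n - k_{n+1}) + V_{n+1}.
# Costs, energy, and grid size are illustrative choices.
import numpy as np

c = np.array([4.0, 1.0, 2.0, 1.0])
k = 10.0
N = len(c)

grid = np.linspace(0.0, k, 2001)           # candidate remaining-energy levels
V = np.full((N + 2, grid.size), np.inf)    # V[n, i] approximates V_n(grid[i])
V[N + 1, 0] = 0.0                          # terminal condition: k_{N+1} = 0

for n in range(N, 0, -1):
    for i in range(1, grid.size):
        choices = grid[:i]                 # feasible k_{n+1} strictly below k_n
        V[n, i] = np.min(c[n - 1] / (grid[i] - choices) + V[n + 1, :i])

print(V[1, -1])                   # approximate minimal total time from k_1 = k
print(np.sqrt(c).sum() ** 2 / k)  # closed-form minimal time, for comparison
```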

We can solve the Bellman equation using the method of undetermined coefficients. Suppose $V_{n+1} = a_{n+1}/k_{n+1}$ for some $a_{n+1} \ge 0$. Then, under optimal pacing, we have $$0 = \frac{\partial}{\partial k_{n+1}} \left( \frac{c_n}{k_n - k_{n+1}} + V_{n+1} \right) = \frac{c_n}{(k_n - k_{n+1})^2} - \frac{a_{n+1}}{k_{n+1}^2}$$ and therefore $$k_{n+1} = \frac{\sqrt{a_{n+1}}}{\sqrt{a_{n+1}} + \sqrt{c_n}}\, k_n.$$ Substituting this recurrence into the Bellman equation gives $V_n = a_n/k_n$, where $a_n \equiv \left( \sqrt{a_{n+1}} + \sqrt{c_n} \right)^2$ and $a_{N+1} = 0$. Solving recursively gives $\sqrt{a_n} = \sum_{m=n}^N \sqrt{c_m}$ for each $n$, from which it follows that $$k_{n+1} = \frac{\sum_{m=n+1}^N \sqrt{c_m}}{\sum_{m=1}^N \sqrt{c_m}}\, k \quad \text{and} \quad s_n = \frac{k}{\sqrt{c_n} \sum_{m=1}^N \sqrt{c_m}}.$$ So we get the same optimal speed sequence and total time as obtained using the Hamiltonian. We also see the square-root scaling from the two-lap case generalize to the $N$-lap case. For example, if the costs I face in the first half of the race are four times the costs I face in the second, then I should run half as fast in the first half as in the second.
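
The recursion is also easy to implement directly: a backward pass for the $\sqrt{a_n}$, then a forward pass for the energy levels and speeds. A sketch with the same illustrative costs:

```python
# Backward recursion for sqrt(a_n), then a forward pass for k_n and s_n.
# The costs and energy are the same illustrative values as above.
import numpy as np

c = np.array([4.0, 1.0, 2.0, 1.0])
k = 10.0
N = len(c)

# Backward pass: sqrt(a_n) = sqrt(a_{n+1}) + sqrt(c_n), with a_{N+1} = 0.
root_a = np.zeros(N + 2)  # root_a[n] stores sqrt(a_n)
for n in range(N, 0, -1):
    root_a[n] = root_a[n + 1] + np.sqrt(c[n - 1])

# Forward pass: k_{n+1} = sqrt(a_{n+1}) / (sqrt(a_{n+1}) + sqrt(c_n)) * k_n.
energy = np.zeros(N + 2)  # energy[n] stores k_n
energy[1] = k
for n in range(1, N + 1):
    energy[n + 1] = root_a[n + 1] / (root_a[n + 1] + np.sqrt(c[n - 1])) * energy[n]

speeds = (energy[1:N + 1] - energy[2:N + 2]) / c  # s_n = (k_n - k_{n+1}) / c_n
print(speeds)              # matches k / (sqrt(c_n) * sum_m sqrt(c_m))
print(root_a[1] ** 2 / k)  # V_1 = a_1 / k = (sum_n sqrt(c_n))^2 / k
```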

Solution properties

As explained above, each speed term $s_n$ scales with the inverse square root of the corresponding cost term $c_n$. This scaling takes advantage of the variation in costs faced during my race. But scaling all of the cost terms has a linear effect: doubling each $c_n$ halves each $s_n$ and so doubles my total time $T$. Likewise, doubling my initial energy $k$ doubles each $s_n$ and so halves $T$. These linearities come from the linearity of the dynamic constraint $k_{n+1} = k_n - c_n s_n$.
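
Both linearities are easy to confirm with the closed form (the costs are again illustrative):

```python
# Scaling checks using the closed form T = (sum_n sqrt(c_n))**2 / k.
import numpy as np

def optimal_time(c, k):
    return np.sqrt(c).sum() ** 2 / k

c = np.array([4.0, 1.0, 2.0, 1.0])  # illustrative costs
k = 10.0

print(optimal_time(2 * c, k) / optimal_time(c, k))  # 2.0: doubling costs doubles T
print(optimal_time(c, 2 * k) / optimal_time(c, k))  # 0.5: doubling energy halves T
```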

Rearranging the cost sequence $(c_n)_{n=1}^N$ leads to the same rearrangement of the optimal speed sequence $(s_n)_{n=1}^N$. This is because the sequences satisfy $$\sqrt{c_n}\, s_n = \frac{k}{\sum_{m=1}^N \sqrt{c_m}},$$ the right-hand side of which doesn't change if I rearrange the $c_n$. Nor does my minimized time $T$ change. Intuitively, swapping the laps on which I run slow and fast doesn't change my average pace.
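
This invariance is likewise easy to confirm:

```python
# Rearranging the costs rearranges the optimal speeds and leaves T unchanged.
import numpy as np

def optimal_speeds(c, k):
    return k / (np.sqrt(c) * np.sqrt(c).sum())

c = np.array([4.0, 1.0, 2.0, 1.0])  # illustrative costs
k = 10.0
order = np.array([2, 0, 3, 1])      # an arbitrary rearrangement of the laps

print(optimal_speeds(c, k)[order])  # rearrange the speeds after solving...
print(optimal_speeds(c[order], k))  # ...or the costs before solving: same values
print(np.sqrt(c).sum() ** 2 / k, np.sqrt(c[order]).sum() ** 2 / k)  # equal times
```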

In contrast, variation in costs does improve my average pace. To see how, let $\mathrm{E}[c_n] \equiv \frac{1}{N} \sum_{n=1}^N c_n$ be the empirical mean cost of energy during my race and let $\bar{T} \equiv N^2\, \mathrm{E}[c_n]/k$ be my optimal time when $c_n = \mathrm{E}[c_n]$ for each $n$. Then $$\bar{T} - T = \frac{N^2}{k} \left( \mathrm{E}[c_n] - \frac{1}{N^2} \left( \sum_{n=1}^N \sqrt{c_n} \right)^2 \right) = \frac{N^2}{k} \left( \mathrm{E}\left[ (\sqrt{c_n})^2 \right] - \mathrm{E}\left[ \sqrt{c_n} \right]^2 \right)$$ and therefore $$T = \bar{T} - \frac{N^2}{k} \mathrm{Var}(\sqrt{c_n}),$$ where $\mathrm{Var}(\sqrt{c_n}) = \mathrm{E}[(\sqrt{c_n})^2] - \mathrm{E}[\sqrt{c_n}]^2$ is the empirical variance of the $\sqrt{c_n}$. So applying a mean-preserving spread to the distribution of $c_n$ values lowers my optimal time $T$. But this is not the same as increasing the variance in $c_n$. For example, consider the cost sequences $(c_n)_{n=1}^{100}$ and $(c_n')_{n=1}^{100}$ defined by $$c_n = \begin{cases} 145 & \text{if } n \le 50 \\ 55 & \text{otherwise} \end{cases} \quad \text{and} \quad c_n' = \begin{cases} 200 & \text{if } n \le 20 \\ 75 & \text{otherwise.} \end{cases}$$ Then $\mathrm{E}[c_n] = \mathrm{E}[c_n'] = 100$, while $\mathrm{Var}(c_n) = 2025$ and $\mathrm{Var}(c_n') = 2500$. So the $c_n$ have lower variance than the $c_n'$. But $\mathrm{Var}(\sqrt{c_n}) \approx 5.3$ is larger than $\mathrm{Var}(\sqrt{c_n'}) \approx 4.8$, which means my optimal time is smaller under $(c_n)_{n=1}^{100}$ than under $(c_n')_{n=1}^{100}$. Intuitively, I prefer cost sequences with a mix of highs and lows to sequences with a few sharp highs and lots of mild lows.
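
The numbers in this example are easy to verify (I take $k = 1$, which scales both times by the same factor):

```python
# Verify the 100-lap example: (c'_n) has the higher variance, but (c_n) has the
# higher variance of square roots and hence the lower optimal time. Here k = 1.
import numpy as np

k = 1.0
c = np.array([145.0] * 50 + [55.0] * 50)        # c_n
c_prime = np.array([200.0] * 20 + [75.0] * 80)  # c'_n

for costs in (c, c_prime):
    mean = costs.mean()                   # E[c_n] = 100 for both sequences
    var = costs.var()                     # E[c_n^2] - E[c_n]^2
    var_root = np.sqrt(costs).var()       # Var(sqrt(c_n))
    time = np.sqrt(costs).sum() ** 2 / k  # optimal total time
    print(mean, var, round(var_root, 1), time)
```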


  1. Replacing $T$ with $T/N$, and letting $x = n/N$ and $N \to \infty$, yields a special (linear) case of the problem discussed in my post on negative splits.